Same Architecture, Different Capacity: Optimizer-Induced Spectral Scaling Laws

Nandan Kumar Jha, Brandon Reagen

May 2026

Abstract

Scaling laws usually describe how loss improves with parameters, data, and compute. This work studies a complementary question: how much of an architecture’s nominal feed-forward width becomes realized representational capacity during training. Holding architecture and data fixed, we show that optimizers can induce sharply different spectral scaling behavior across token-frequency regimes. In particular, optimizer choice changes hard-rank scaling, spectral asymmetry, and the extent to which rare-token and mid-frequency representations use available width. The results suggest that capacity is not only specified by architecture; it is realized through learning dynamics.

Type

Preprint

Publication

Under review, 2026

Nandan Kumar Jha

Ph.D., New York University · Representation Learning, Scaling Laws, and High-Dimensional Learning Dynamics
