Scaling laws usually describe how loss improves with parameters, data, and compute. This work studies a complementary question: how much of an architecture’s nominal feed-forward width becomes realized representational capacity during training. Holding architecture and data fixed, we show that optimizers can induce sharply different spectral scaling behavior across token-frequency regimes. In particular, optimizer choice changes hard-rank scaling, spectral asymmetry, and the extent to which rare-token and mid-frequency representations use available width. The results suggest that capacity is not only specified by architecture; it is realized through learning dynamics.
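To make "realized capacity" concrete, the following minimal sketch computes two spectral summaries of a feed-forward activation matrix. The definitions are assumptions for illustration, not necessarily those used in this work: hard rank is taken to be the number of singular values above a relative threshold, and a soft (entropy-based) effective rank is included for comparison. The function name `spectral_ranks`, the threshold `rel_tol`, and the synthetic activations are hypothetical.

```python
import numpy as np


def spectral_ranks(acts, rel_tol=1e-3):
    """Return (hard_rank, soft_rank) for an (n_tokens, width) activation matrix.

    Assumed definitions (illustrative only):
      hard_rank: count of singular values above rel_tol * largest singular value
      soft_rank: exp of the Shannon entropy of the normalized squared spectrum
    """
    # Center over tokens so the spectrum reflects variation, not the mean.
    centered = acts - acts.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    if s[0] == 0.0:
        return 0, 0.0  # degenerate case: constant activations

    hard_rank = int(np.sum(s > rel_tol * s[0]))

    p = s**2 / np.sum(s**2)
    p = p[p > 0]  # drop exact zeros before taking the log
    soft_rank = float(np.exp(-np.sum(p * np.log(p))))
    return hard_rank, soft_rank


rng = np.random.default_rng(0)
width = 64

# Synthetic stand-ins for activations: one layer that uses only 8 directions
# of its nominal width, and one that uses all 64.
low_rank_acts = rng.standard_normal((512, 8)) @ rng.standard_normal((8, width))
full_rank_acts = rng.standard_normal((512, width))

hard_low, soft_low = spectral_ranks(low_rank_acts)
hard_full, soft_full = spectral_ranks(full_rank_acts)
print(f"low-rank layer:  hard={hard_low}, soft={soft_low:.1f}")
print(f"full-rank layer: hard={hard_full}, soft={soft_full:.1f}")
```

Under these assumed definitions, both layers have the same nominal width of 64, but only the second realizes it. Applying such a measure separately to activations grouped by token frequency would be one way to compare frequency regimes.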