Feed-forward networks account for a large fraction of transformer parameters, but parameter count alone does not reveal how effectively their latent width is used. This work studies spectral scaling laws for language-model feed-forward representations, measuring soft and hard spectral ranks as model width, depth, and training conditions vary. The analysis shows that nominal width and realized representational dimension can scale differently, motivating spectral telemetry as a complement to loss-based scaling laws.
An earlier version of this work was presented at the ICML 2025 Actionable Interpretability Workshop (AIW).
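To make the notion of realized representational dimension concrete, the following is a minimal sketch of two common effective-rank measures computed from the spectrum of feed-forward activations. The abstract does not define its soft and hard spectral ranks, so the two measures here (an entropy-based rank and the participation ratio) are stand-ins chosen for illustration, and the function name `spectral_ranks` and the synthetic data are assumptions rather than the authors' method.

```python
import numpy as np


def spectral_ranks(activations):
    """Return (entropy_rank, participation_ratio) for an (n_samples, width) array.

    Both measures are illustrative stand-ins for "spectral rank"; they are
    not taken from the paper.
    """
    x = np.asarray(activations, dtype=np.float64)
    # Center each feature so the spectrum reflects covariance, not the mean.
    x = x - x.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to the covariance eigenvalues.
    singular_values = np.linalg.svd(x, compute_uv=False)
    eig = singular_values ** 2
    total = eig.sum()
    if total == 0.0:
        return 0.0, 0.0
    p = eig / total  # normalized spectrum, sums to 1
    nonzero = p[p > 0]
    # Entropy-based rank: exponential of the Shannon entropy of the spectrum.
    entropy_rank = float(np.exp(-(nonzero * np.log(nonzero)).sum()))
    # Participation ratio: (sum of eigenvalues)^2 / sum of squared eigenvalues.
    participation_ratio = float(1.0 / (p ** 2).sum())
    return entropy_rank, participation_ratio


# Hypothetical example: a nominal width of 64 that only uses 8 latent directions.
rng = np.random.default_rng(0)
latent = rng.standard_normal((2048, 8))
mixing = rng.standard_normal((8, 64))
low_rank_acts = latent @ mixing
entropy_rank, participation_ratio = spectral_ranks(low_rank_acts)
print(f"nominal width: {low_rank_acts.shape[1]}")
print(f"entropy rank: {entropy_rank:.2f}, participation ratio: {participation_ratio:.2f}")
```

In this synthetic case both measures are bounded by the eight latent directions even though the nominal width is 64, which is the kind of gap between nominal width and realized dimension that the abstract describes.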