Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space?

Nandan Kumar Jha, Brandon Reagen

October 2025

Abstract

Feed-forward networks account for a large fraction of transformer parameters, but parameter count alone does not reveal how effectively their latent width is used. This work studies spectral scaling laws for language-model feed-forward representations, measuring soft and hard spectral ranks as model width, depth, and training conditions vary. The analysis shows that nominal width and realized representational dimension can scale differently, motivating spectral telemetry as a complement to loss-based scaling laws.
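The soft and hard spectral ranks mentioned above can be illustrated with a minimal sketch. This page does not define either quantity, so the definitions used here are assumptions: the hard rank is taken to be the participation ratio of the covariance eigenvalues, and the soft rank the exponential of their Shannon entropy. The function name `spectral_ranks` and the `activations` argument (a samples-by-width matrix of feed-forward hidden activations) are hypothetical and chosen only for illustration.

```python
import numpy as np


def spectral_ranks(activations, eps=1e-12):
    """Return (hard_rank, soft_rank) for a (samples, width) activation matrix.

    Assumed definitions (not taken from this page):
      hard rank = participation ratio, (sum l_i)^2 / sum l_i^2
      soft rank = exp(Shannon entropy of the normalized eigenvalues)
    """
    activations = np.asarray(activations, dtype=np.float64)
    # Center each feature so the spectrum reflects covariance, not the mean.
    centered = activations - activations.mean(axis=0, keepdims=True)
    # Squared singular values of the centered matrix are proportional to the
    # covariance eigenvalues; the constant cancels after normalization.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    eigenvalues = singular_values ** 2
    total = eigenvalues.sum()
    if total <= eps:
        # Degenerate case: all activations are identical.
        return 0.0, 0.0
    p = eigenvalues / total
    hard_rank = 1.0 / np.sum(p ** 2)
    # Drop numerically zero entries so that 0 * log(0) is treated as 0.
    nonzero = p[p > eps]
    soft_rank = np.exp(-np.sum(nonzero * np.log(nonzero)))
    return float(hard_rank), float(soft_rank)


if __name__ == "__main__":
    # Synthetic stand-in for hidden activations: 512 samples, width 64,
    # with variance concentrated in the leading directions.
    rng = np.random.default_rng(0)
    scales = np.linspace(1.0, 0.05, 64)
    hidden = rng.standard_normal((512, 64)) * scales
    hard, soft = spectral_ranks(hidden)
    print(f"nominal width: 64, hard rank: {hard:.1f}, soft rank: {soft:.1f}")
```

Under these assumed definitions, both ranks equal the nominal width when variance is spread evenly across all directions and fall toward 1 as variance concentrates in a single direction, which is the kind of gap between nominal width and realized dimension that the abstract describes.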

Type

Conference paper

Publication

Conference on Empirical Methods in Natural Language Processing 2025

Earlier version presented at the ICML 2025 Actionable Interpretability Workshop (AIW).

Nandan Kumar Jha

Ph.D., New York University · Representation Learning, Scaling Laws, and High-Dimensional Learning Dynamics
