Spectral Geometry

Same Architecture, Different Capacity: Optimizer-Induced Spectral Scaling Laws

Shows that the choice of optimizer can determine how much of an FFN's nominal width becomes realized spectral capacity, even when validation loss is matched.

Nandan Kumar Jha, Brandon Reagen

NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks

Introduces eigenspectrum-based tools for tracking how nonlinearities reshape FFN representation geometry across layers and model scales.

Nandan Kumar Jha, Brandon Reagen

Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space?

Studies how effectively LLM feed-forward networks use their latent width, through scaling laws for soft and hard spectral rank.

Nandan Kumar Jha, Brandon Reagen

A Random Matrix Theory Perspective on the Learning Dynamics of Multi-head Latent Attention

Uses random-matrix tools to analyze how multi-head latent attention evolves during training, revealing capacity bottlenecks and representation-geometry shifts.

Nandan Kumar Jha, Brandon Reagen