A Random Matrix Theory Perspective on the Learning Dynamics of Multi-head Latent Attention

Abstract

This work studies Multi-head Latent Attention through the lens of random matrix theory and high-dimensional learning dynamics. By tracking spectral outliers, Marchenko–Pastur deviations, and spike-like structure during training, the analysis probes how latent attention mechanisms reorganize representation geometry relative to standard multi-head attention variants.
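As a rough illustration of what tracking spectral outliers against the Marchenko–Pastur bulk can look like, the following is a minimal sketch, not the paper's actual procedure. It assumes a weight matrix whose entries are compared against an i.i.d. noise baseline with variance `sigma2`; the function names `mp_edges` and `count_outliers` and the tolerance `tol` are hypothetical choices made for this example.

```python
import numpy as np


def mp_edges(n_rows, n_cols, sigma2=1.0):
    """Marchenko-Pastur bulk edges for the eigenvalues of W.T @ W / n_rows,
    where W is n_rows x n_cols with i.i.d. entries of variance sigma2."""
    gamma = n_cols / n_rows  # aspect ratio
    lower = sigma2 * (1.0 - np.sqrt(gamma)) ** 2
    upper = sigma2 * (1.0 + np.sqrt(gamma)) ** 2
    return lower, upper


def count_outliers(W, sigma2=None, tol=0.05):
    """Count eigenvalues of W.T @ W / n above the MP upper edge.

    sigma2 defaults to the empirical entry variance (an assumption of this
    sketch); tol is a relative margin that absorbs finite-size fluctuations
    at the bulk edge.
    """
    n, p = W.shape
    if sigma2 is None:
        sigma2 = float(W.var())
    evals = np.linalg.eigvalsh(W.T @ W / n)  # real, ascending order
    _, upper = mp_edges(n, p, sigma2)
    n_out = int(np.sum(evals > upper * (1.0 + tol)))
    return n_out, evals, upper
```

Applied to checkpoints saved during training, a count that rises from zero would indicate eigenvalues separating from the bulk, which is one simple way to flag spike-like structure under the stated noise baseline.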

Publication
HiLD Workshop at ICML 2025
Nandan Kumar Jha
Ph.D., New York University · Representation Learning, Scaling Laws, and High-Dimensional Learning Dynamics

I study nonlinear representation dynamics in large language models, focusing on how nonlinearities, architecture, and optimization jointly shape representational geometry, scaling behavior, and usable computational capacity.
