A Random Matrix Theory Perspective on the Learning Dynamics of Multi-head Latent Attention

Nandan Kumar Jha, Brandon Reagen

July 2025

Abstract

This work studies Multi-head Latent Attention through the lens of random matrix theory and high-dimensional learning dynamics. By tracking spectral outliers, Marchenko–Pastur deviations, and spike-like structure during training, the analysis probes how latent attention mechanisms reorganize representation geometry relative to standard multi-head attention variants.
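As a rough illustration of what "tracking spectral outliers" against a Marchenko–Pastur baseline can look like, the sketch below counts eigenvalues of a weight matrix's Gram matrix that sit above the Marchenko–Pastur bulk edge. It is a generic NumPy example, not the authors' procedure: the function names (`mp_bulk_edges`, `count_outliers`, `planted_spike`), the 5% tolerance, and the assumption of i.i.d. unit-variance noise entries are all hypothetical choices made for this example.

```python
import numpy as np


def mp_bulk_edges(n_rows, n_cols, sigma2=1.0):
    """Marchenko-Pastur bulk edges for the eigenvalues of W.T @ W / n_rows.

    Assumes W is n_rows x n_cols with i.i.d. entries of variance sigma2
    and n_cols <= n_rows (aspect ratio q = n_cols / n_rows <= 1).
    """
    q = n_cols / n_rows
    lower = sigma2 * (1.0 - np.sqrt(q)) ** 2
    upper = sigma2 * (1.0 + np.sqrt(q)) ** 2
    return lower, upper


def count_outliers(W, sigma2=1.0, tol=0.05):
    """Count eigenvalues of W.T @ W / n_rows above the MP upper edge.

    tol is an illustrative relative margin that absorbs finite-size
    fluctuations of the largest bulk eigenvalue.
    """
    n_rows, n_cols = W.shape
    eigs = np.linalg.eigvalsh(W.T @ W / n_rows)
    _, upper = mp_bulk_edges(n_rows, n_cols, sigma2)
    return int(np.sum(eigs > upper * (1.0 + tol)))


def planted_spike(rng, n_rows, n_cols, strength):
    """Rank-one perturbation standing in for structure learned in training."""
    u = rng.standard_normal(n_rows)
    v = rng.standard_normal(n_cols)
    u /= np.linalg.norm(u)
    v /= np.linalg.norm(v)
    # Scaled by sqrt(n_rows) so the spike survives the 1/n_rows normalization.
    return strength * np.sqrt(n_rows) * np.outer(u, v)
```

Under these assumptions, a pure Gaussian matrix of shape 2000 x 500 should report no outliers, while adding `planted_spike(rng, 2000, 500, 2.0)` should push a single eigenvalue well clear of the bulk edge at 2.25. Applying the same count to checkpoints saved during training is one simple way to watch spikes emerge over time.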

Type

Preprint

Publication

HiLD Workshop at ICML 2025

Nandan Kumar Jha

Ph.D., New York University · Representation Learning, Scaling Laws, and High-Dimensional Learning Dynamics
