Regularizing the Entropy Landscape of Self-Attention: Towards a Soft Inductive Bias in LLMs
Nandan Kumar Jha, Brandon Reagen
December 2025
Abstract
This workshop paper studies entropy regularization for self-attention as a soft inductive bias for large language models, connecting attention entropy dynamics to training stability and representation behavior.
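As a rough illustration of the quantity involved, the sketch below computes the Shannon entropy of each row of a self-attention matrix. It also computes a simple penalty on each head's mean entropy. This is a generic sketch under stated assumptions, not the regularizer used in the paper. The squared-deviation form, the `target` entropy, and the `weight` coefficient are hypothetical choices, and causal masking is omitted for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_entropy(scores):
    """Shannon entropy (in nats) of each attention row.

    scores: pre-softmax attention logits of shape (heads, queries, keys).
    Returns an array of shape (heads, queries).
    """
    p = softmax(scores, axis=-1)
    # Small epsilon avoids log(0) for rows that are (nearly) one-hot.
    return -(p * np.log(p + 1e-12)).sum(axis=-1)


def entropy_penalty(scores, target, weight=0.01):
    """Hypothetical regularizer: squared deviation of each head's mean
    attention entropy from a chosen target, averaged over heads and scaled."""
    per_head = attention_entropy(scores).mean(axis=-1)  # shape (heads,)
    return weight * np.mean((per_head - target) ** 2)
```

For uniform attention over n keys, each row's entropy is log n, the maximum possible. A nearly one-hot row has entropy close to 0. A symmetric penalty like this one pulls heads toward the target from either side; a one-sided hinge that only penalizes entropy collapse would be an equally plausible design.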
Publication
OPT Workshop at NeurIPS 2025

Ph.D., New York University · Representation Learning, Scaling Laws, and High-Dimensional Learning Dynamics
I study nonlinear representation dynamics in large language models, focusing on how nonlinearities, architecture, and optimization jointly shape representational geometry, scaling behavior, and usable computational capacity.