Regularizing the Entropy Landscape of Self-Attention: Towards a Soft Inductive Bias in LLMs

Abstract

This workshop paper studies entropy regularization for self-attention as a soft inductive bias for large language models, connecting attention entropy dynamics to training stability and representation behavior.
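As a rough illustration of the quantity the abstract refers to, the sketch below computes the Shannon entropy of a single query's attention distribution and a simple penalty built on it. This is a minimal sketch in plain Python, not the paper's formulation: the function names, the `target` entropy, the `weight` coefficient, and the squared-deviation form of the penalty are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one row of attention logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(scores):
    """Shannon entropy (in nats) of one query's attention distribution."""
    probs = softmax(scores)
    # Skip exact zeros so that 0 * log(0) is treated as 0.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_penalty(score_rows, target, weight=0.1):
    """Hypothetical regularizer (an assumption, not the paper's loss):
    weight times the mean squared gap between each row's entropy and `target`."""
    entropies = [attention_entropy(row) for row in score_rows]
    return weight * sum((h - target) ** 2 for h in entropies) / len(entropies)


uniform_row = [0.0, 0.0, 0.0, 0.0]   # attends equally to 4 keys
peaked_row = [100.0, 0.0, 0.0, 0.0]  # attends almost entirely to 1 key

h_uniform = attention_entropy(uniform_row)  # maximal entropy for 4 keys
h_peaked = attention_entropy(peaked_row)    # close to zero
penalty = entropy_penalty([uniform_row, peaked_row], target=math.log(4))
```

Under these assumptions, a uniform row attains the maximum entropy for its length and a sharply peaked row has entropy near zero, so the penalty pulls each row's entropy toward the chosen target. In a training setting such a term would be added to the task loss; how the paper actually defines and applies its regularizer is not stated in this summary.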

Publication
OPT Workshop at NeurIPS 2025
Nandan Kumar Jha
Ph.D., New York University · Representation Learning, Scaling Laws, and High-Dimensional Learning Dynamics

I study nonlinear representation dynamics in large language models, focusing on how nonlinearities, architecture, and optimization jointly shape representational geometry, scaling behavior, and usable computational capacity.
