Regularizing the Entropy Landscape of Self-Attention: Towards a Soft Inductive Bias in LLMs

Nandan Kumar Jha, Brandon Reagen

December 2025

Abstract

This workshop paper studies entropy regularization for self-attention as a soft inductive bias for large language models, connecting attention entropy dynamics to training stability and representation behavior.

Type

Preprint

Publication

OPT Workshop at NeurIPS 2025

Nandan Kumar Jha

Ph.D., New York University · Representation Learning, Scaling Laws, and High-Dimensional Learning Dynamics
