Nandan Kumar Jha
Optimization
Same Architecture, Different Capacity: Optimizer-Induced Spectral Scaling Laws
Shows that optimizers can determine how much nominal FFN width becomes realized spectral capacity, even when validation loss is matched.
Nandan Kumar Jha, Brandon Reagen
Regularizing the Entropy Landscape of Self-Attention: Towards a Soft Inductive Bias in LLMs
Studies entropy regularization for self-attention as a soft inductive bias in large language models.
Nandan Kumar Jha, Brandon Reagen