Nandan Kumar Jha
Sisyphus: A Cautionary Tale of Using Low-Degree Polynomial Activations in Privacy-Preserving Deep Learning
Privacy concerns in client-server machine learning have given rise to private inference (PI), where neural inference occurs directly on …
Karthik Garimella, Nandan Kumar Jha, Brandon Reagen
PDF · Cite · Code · Poster · PPML Proceeding
CryptoNite: Revealing the Pitfalls of End-to-End Private Inference at Scale
In this paper, we demonstrate how the current trend in private inference myopically optimizes performance only for a zero arrival rate; in particular, prior work develops mechanisms to mitigate the bottleneck caused by nonlinearities in neural networks. However, in real-world scenarios where inference requests arrive at even a moderate rate, homomorphic encryption becomes the main bottleneck, since it can no longer be pre-processed in an offline computation phase.
Karthik Garimella, Nandan Kumar Jha, Zahra Ghodsi, Siddharth Garg, Brandon Reagen
PDF · Cite
ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models
LayerNorm is a critical component in modern large language models (LLMs) for stabilizing training and ensuring smooth optimization. …
Nandan Kumar Jha, Brandon Reagen
PDF · Cite · Code · Poster
AERO: Softmax-Only LLMs for Efficient Private Inference
In this work, we present a comprehensive analysis of the role of nonlinearities in transformer-based, decoder-only language models. We introduce AERO, a four-step architectural optimization framework that refines existing LLM architectures for efficient PI by systematically removing nonlinearities such as LayerNorm and GELU and reducing FLOP counts. For the first time, we propose a Softmax-only architecture with significantly fewer FLOPs, tailored for efficient PI. Furthermore, we devise a novel entropy regularization technique to improve the performance of Softmax-only models. AERO achieves up to 4.23× communication and 1.94× latency reductions.
Nandan Kumar Jha, Brandon Reagen
PDF · Cite
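To make the AERO idea concrete, here is a minimal, hypothetical sketch of a Softmax-only decoder block with an entropy term on the attention distribution. It is not the paper's released code: the class name `SoftmaxOnlyBlock`, the specific entropy penalty form, and any weighting (e.g. an `entropy_weight` hyperparameter) are illustrative assumptions; the only fidelity to the abstract is that LayerNorm and the GELU feed-forward path are removed, leaving softmax as the sole nonlinearity.

```python
# Hypothetical sketch of a Softmax-only transformer block (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftmaxOnlyBlock(nn.Module):
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        # No LayerNorm and no GELU feed-forward network:
        # softmax is the only nonlinearity in the block.

    def forward(self, x: torch.Tensor):
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split channels into (B, heads, T, head_dim).
        q = q.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        scores = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        # Causal mask for decoder-only modeling.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        scores = scores.masked_fill(mask, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        # Per-row attention entropy; penalizing it during training is one
        # plausible reading of "entropy regularization" (assumed, not the paper's).
        entropy = -(attn * torch.log(attn + 1e-9)).sum(dim=-1).mean()
        out = (attn @ v).transpose(1, 2).reshape(B, T, C)
        return x + self.proj(out), entropy

# A training loop would combine the task loss with the entropy term, e.g.
#   loss = ce_loss + entropy_weight * entropy   # entropy_weight is assumed
```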
Entropy-Guided Attention for Private LLMs
We introduce an information-theoretic framework to characterize the role of nonlinearities in decoder-only language models, laying a …
Nandan Kumar Jha, Brandon Reagen
PDF · Cite · Code · Poster