I am a PhD candidate at the Center for Cybersecurity, New York University (NYU), advised by Prof. Brandon Reagen. My research lies at the intersection of deep learning and applied cryptography (homomorphic encryption and multiparty computation), with a focus on cryptographically secure privacy-preserving machine learning (PPML). As part of the DPRIVE project, I develop novel architectures and algorithms to optimize neural network computations on encrypted data.
In the early stages of my PhD, I led the design of nonlinear-efficient CNNs, introducing ReLU-optimization techniques (DeepReDuce, ICML'21) and methods for redesigning existing CNNs for private inference efficiency (DeepReShape, TMLR'24), including a family of architectures called HybReNets.
My current research focuses on making private LLM inference more practical through architectural optimizations and algorithmic innovations. Specifically, we examine the functional role of nonlinearities from an information-theoretic perspective and develop the AERO framework, which designs nonlinearity-reduced architectures with entropy-guided attention mechanisms. Our preliminary findings have been accepted to PPAI@AAAI'25 and ATTRIB@NeurIPS'24.
Recent talks: We presented our work Entropy and Private Language Models at the NYU CILVR Seminar, and Entropy-Guided Attention for Private LLMs on the AI Fireside Chat.
Besides research, I have served as an (invited) reviewer for NeurIPS (2023, 2024), ICML (2024, 2025), ICLR (2024, 2025), TMLR (2025), AISTATS (2025), CVPR (2024, 2025), ICCV (2025), and AAAI (2025).
I am currently on the job market, graduating in Fall 2025, and seeking research scientist roles at the intersection of LLM science, architectural optimization, and privacy-preserving AI. Feel free to reach out!
Ph.D. in Privacy-preserving Deep Learning, 2020 - present
New York University
M.Tech. (Research Assistant) in Computer Science and Engineering, 2017 - 2020
Indian Institute of Technology Hyderabad
B.Tech. in Electronics and Communication Engineering, 2009 - 2013
National Institute of Technology Surat
We introduce an information-theoretic framework to characterize the role of nonlinearities in decoder-only language models, laying a principled foundation for optimizing transformer architectures tailored to the demands of Private Inference (PI). By leveraging Shannon's entropy as a quantitative measure, we uncover the previously unexplored dual significance of nonlinearities: beyond ensuring training stability, they are crucial for maintaining attention head diversity. Specifically, we find that their removal triggers two critical failure modes: entropy collapse in deeper layers, which destabilizes training, and entropic overload in earlier layers, which leads to under-utilization of Multi-Head Attention's (MHA) representational capacity. We propose an entropy-guided attention mechanism paired with a novel entropy regularization technique to mitigate entropic overload. Additionally, we explore inference-efficient alternatives to layer normalization for preventing entropy collapse and stabilizing the training of LLMs with reduced nonlinearities. Our study bridges the gap between information theory and architectural design, establishing entropy dynamics as a principled guide for developing efficient PI architectures.
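To make the entropy diagnostic concrete, here is a minimal NumPy sketch of the kind of measurement described above: computing the Shannon entropy of each attention head's softmax rows. The function name, array shapes, and thresholds are illustrative assumptions, not the paper's implementation; it only shows how near-zero entropy corresponds to collapse-like behavior and near-maximal entropy (log of the sequence length) to overload-like, uniform attention.

```python
import numpy as np

def attention_entropy(attn, eps=1e-12):
    """Mean Shannon entropy (nats) per attention head.

    attn: array of shape (heads, seq_len, seq_len) whose last axis
    holds softmax-normalized attention distributions. Interpretation
    (illustrative): values near 0 resemble entropy collapse; values
    near log(seq_len) resemble entropic overload (uniform attention).
    """
    p = np.clip(attn, eps, 1.0)                  # avoid log(0)
    row_entropy = -(p * np.log(p)).sum(axis=-1)  # (heads, seq_len)
    return row_entropy.mean(axis=-1)             # (heads,)

# Toy example: one sharply peaked head vs. one uniform head.
seq = 8
peaked = np.full((seq, seq), 1e-6)
np.fill_diagonal(peaked, 1.0)
peaked /= peaked.sum(axis=-1, keepdims=True)     # renormalize rows
uniform = np.full((seq, seq), 1.0 / seq)
h = attention_entropy(np.stack([peaked, uniform]))
# h[0] is near 0; h[1] is near log(8) ≈ 2.08
```

In practice such per-head statistics would be gathered over a batch of real attention maps layer by layer; the toy distributions here just mark the two extremes of the entropy range.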
DeepReDuce is a set of optimizations for the judicious removal of ReLUs to reduce private inference latency, leveraging the heterogeneity of ReLUs in classical networks. DeepReDuce strategically drops ReLUs, reducing their count by up to 4.9× (on CIFAR-100) and 5.7× (on TinyImageNet) for ResNet18 with no loss in accuracy. Compared to the state of the art for private inference, DeepReDuce improves accuracy by up to 3.5% (iso-ReLU) and reduces ReLU count by up to 3.5× (iso-accuracy).