1

NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks

Introduces eigenspectrum-based tools for tracking how nonlinearities reshape FFN representation geometry across layers and model scales.

Nandan Kumar Jha, Brandon Reagen

Circa: Stochastic ReLUs for Private Deep Learning

Circa reduces the runtime overhead of ReLU operation by 1.9x by decoupling the sign evaluation and multiplication steps in the Garbled circuit with no loss in accuracy. Further, it achieves a total of 4.7x runtime reduction by employing the sign approximation in the Garbled circuit by leveraging the error-tolerant properties of neural networks within a 1% accuracy margin.

Zahra Ghodsi, Nandan Kumar Jha, Brandon Reagen, Siddharth Garg

Circa: Stochastic ReLUs for Private Deep Learning

Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space?

Studies how effectively LLM feed-forward networks use latent width through soft- and hard-spectral-rank scaling laws.

Nandan Kumar Jha, Brandon Reagen

DeepReDuce: ReLU Reduction for Fast Private Inference

DeepReDuce is a set of optimizations for the judicious removal of ReLUs to reduce private inference latency by leveraging the ReLUs heterogeneity in classical networks. DeepReDuce strategically drops ReLUs upto 4.9x (on CIFAR-100) and 5.7x (on TinyImageNet) for ResNet18 with no loss in accuracy. Compared to the state-of-the-art for private inference DeepReDuce improves accuracy and reduces ReLU count by up to 3.5% (iso-ReLU) and 3.5×(iso-accuracy), respectively.

Nandan Kumar Jha, Zahra Ghodsi, Siddharth Garg, Brandon Reagen

DeepReDuce: ReLU Reduction for Fast Private Inference

DRACO: Co-Optimizing Hardware Utilization and Performance of DNNs on Systolic Accelerator

Co-optimizes DNN structure for systolic accelerators, improving hardware utilization, inference latency, and energy efficiency without requiring hardware dataflow changes.

Nandan Kumar Jha, Shreyas Ravishankar, Sparsh Mittal, Arvind Kaushik, Dipan Mandal, Mahesh Chandra

DRACO: Co-Optimizing Hardware Utilization and Performance of DNNs on Systolic Accelerator

ULSAM: Ultra-lightweight subspace attention module for compact convolutional neural networks

Introduces an ultra-lightweight subspace-attention module for compact CNNs, reducing parameters and FLOPs while improving accuracy on ImageNet and fine-grained recognition benchmarks.

Rajat Saini, Nandan Kumar Jha, Bedanta Das, Sparsh Mittal, C Krishna Mohan

ULSAM: Ultra-lightweight subspace attention module for compact convolutional neural networks

E2GC: Energy-efficient group convolution in deep neural networks

The number of groups (g) in group convolution (GConv) is selected to boost the predictive performance of deep neural networks (DNNs) in …

Nandan Kumar Jha, Rajat Saini, Subhrajit Nag, Sparsh Mittal

E2GC: Energy-efficient group convolution in deep neural networks

The Ramifications of Making Deep Neural Networks Compact

The recent trend in deep neural networks (DNNs) research is to make the networks more compact. The motivation behind designing compact …

Nandan Kumar Jha, Sparsh Mittal, Govardhan Mattela

The Ramifications of Making Deep Neural Networks Compact

Characterizing and Optimizing End-to-End Systems for Private Inference

Studies private inference as an end-to-end systems problem, identifying bottlenecks across the full stack and optimizing performance beyond isolated model changes.

Karthik Garimella, Zahra Ghodsi, Nandan Kumar Jha, Siddharth Garg, Brandon Reagen

Characterizing and Optimizing End-to-End Systems for Private Inference

DeepReShape: Redesigning Neural Networks for Efficient Private Inference

DeepReShape is the first work to conduct a rigorous characterization of desirable neural network attributes for efficient Private Inference (PI). We discovered that distinct network attributes are required for different ReLU counts; in particular, wider networks are beneficial only for higher ReLU counts, whereas networks with a greater proportion of least-critical ReLU are desirable for lower ReLU counts. Further, we introduced a novel network design principle called “ReLU-equalization” to strategically allocate channels within the network to optimize ReLUs and FLOPs efficiency simultaneously. DeepReShape outperforms the current SOTA (SENets, ICLR'23) by achieving a 2.1% increase in accuracy and a 5.2x faster runtime at iso-ReLU counts on CIFAR-100, and an 8.7x faster runtime at iso-accuracy on the TinyImageNet dataset.

Nandan Kumar Jha, Brandon Reagen

DeepReShape: Redesigning Neural Networks for Efficient Private Inference