FlashDP: Memory-Efficient and High-Throughput DP-SGD Training for Large Language Models
Published in NeurIPS 2024 Workshop, 2024
FlashDP introduces a memory-efficient and high-throughput approach to differentially private stochastic gradient descent (DP-SGD) training of large language models (LLMs).
Our method addresses the significant memory and computational overhead of applying DP-SGD to large-scale models by optimizing per-example gradient computation. FlashDP makes privacy-preserving training more practical and efficient for modern LLM architectures while preserving the same privacy guarantees.
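For context, the sketch below shows the standard per-example clipping step in DP-SGD that makes it memory- and compute-intensive for large models; it is a minimal illustration in plain PyTorch, not FlashDP's optimized implementation, and the model, loss function, and hyperparameters are placeholders.

```python
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y,
                clip_norm=1.0, noise_multiplier=1.0, lr=1e-3):
    """Naive DP-SGD update: per-example gradients are computed, clipped
    individually, summed, and noised. This is the step FlashDP targets;
    the loop below materializes one gradient per example, which is what
    makes vanilla DP-SGD costly for large models."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed_grads = [torch.zeros_like(p) for p in params]

    # One backward pass per example (illustrative; real systems vectorize this,
    # at the cost of storing B copies of the gradients).
    for x, y in zip(batch_x, batch_y):
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        per_example = [p.grad.detach().clone() for p in params]

        # Clip each example's gradient to L2 norm <= clip_norm.
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in per_example))
        scale = (clip_norm / (total_norm + 1e-6)).clamp(max=1.0)
        for acc, g in zip(summed_grads, per_example):
            acc += g * scale

    # Add Gaussian noise calibrated to the clipping norm, then apply the update.
    batch_size = len(batch_x)
    with torch.no_grad():
        for p, g in zip(params, summed_grads):
            noise = torch.randn_like(g) * noise_multiplier * clip_norm
            p -= lr * (g + noise) / batch_size
```

The memory and throughput cost comes from handling a separate gradient for every example before clipping; FlashDP's contribution is to restructure this per-example gradient computation so that DP-SGD scales to LLM-sized models.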
Recommended citation: Liangyu Wang, Junxiao Wang, Jie Ren, Zihang Xiang, David E. Keyes, and Di Wang. (2024). "FlashDP: Memory-Efficient and High-Throughput DP-SGD Training for Large Language Models." NeurIPS workshop 2024.
Download Paper