A curated list of my open-source research projects on efficient LLM training and inference systems.
ZO2: Full-Parameter Fine-Tuning 175B LLMs with 18GB GPU Memory 
A zeroth-order offloading framework that enables memory-efficient, full-parameter fine-tuning of extremely large LLMs.
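The core idea behind zeroth-order fine-tuning can be shown in a few lines. The sketch below is a toy MeZO-style SPSA update on a made-up quadratic loss, not ZO2's actual API: gradients are estimated from two forward passes with a shared random perturbation, and the perturbation is regenerated from a seed rather than stored, so no gradient or activation memory is needed.

```python
import random

# Toy stand-in for a model's loss: L(theta) = sum((theta_i - target_i)^2).
# TARGET and all names here are illustrative, not part of ZO2.
TARGET = [1.0, -2.0, 0.5]

def loss(theta):
    return sum((t, g) == () or (t - g) ** 2 for t, g in zip(theta, TARGET))

def loss(theta):  # noqa: F811 -- simple quadratic loss
    return sum((t - g) ** 2 for t, g in zip(theta, TARGET))

def zo_step(theta, lr=0.01, eps=1e-3, seed=0):
    # Regenerate the same Gaussian perturbation z from a seed instead of
    # storing it -- the memory trick used by MeZO-style methods.
    def perturb(sign):
        rng = random.Random(seed)
        for i in range(len(theta)):
            theta[i] += sign * eps * rng.gauss(0.0, 1.0)

    perturb(+1); l_plus = loss(theta)    # forward pass at theta + eps*z
    perturb(-2); l_minus = loss(theta)   # forward pass at theta - eps*z
    perturb(+1)                          # restore theta in place
    grad_scale = (l_plus - l_minus) / (2 * eps)  # estimates z . grad(L)
    rng = random.Random(seed)
    for i in range(len(theta)):
        theta[i] -= lr * grad_scale * rng.gauss(0.0, 1.0)

theta = [0.0, 0.0, 0.0]
for step in range(500):
    zo_step(theta, seed=step)
```

Because only forward passes and a seed are needed, parameters can live mostly on CPU/NVMe and be streamed to the GPU per step, which is what lets ZO2 fit 175B-scale models in 18 GB.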
Tinytron 
A minimal yet practical pre-training stack for GPT-style models with FA/GQA/MoE support and distributed training utilities (ZeRO-1, Sequence-Expert Joint Parallelism).
Tiny-LLM-Libs
Educational mini-replicas of major distributed training stacks, designed so their core mechanisms can be read and understood quickly.
Tiny-FSDP 
Side-by-side implementations of DDP, ZeRO-3, and FSDP for studying their communication and memory trade-offs.
Tiny-DeepSpeed 
A minimal DDP + ZeRO-1/2/3 training stack with meta initialization and overlap primitives.
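The ZeRO-1 idea that both Tinytron and Tiny-DeepSpeed implement can be sketched in a single-process toy (the names and setup below are illustrative, not either repo's actual API): each rank keeps optimizer state only for its own shard of the parameters, updates that shard, and the updated shards are then all-gathered so every rank ends up with the full parameter vector.

```python
# Toy single-process simulation of ZeRO-1 optimizer-state sharding.
# Real implementations use torch.distributed collectives; here the
# "ranks" are just loop iterations.
WORLD_SIZE = 4
N_PARAMS = 8

params = [float(i) for i in range(N_PARAMS)]   # replicated on every rank
grads = [1.0] * N_PARAMS                       # assume grads already all-reduced
# Each rank stores momentum for only 1/WORLD_SIZE of the parameters.
momentum = {r: [0.0] * (N_PARAMS // WORLD_SIZE) for r in range(WORLD_SIZE)}

def zero1_step(lr=0.1, beta=0.9):
    shard = N_PARAMS // WORLD_SIZE
    updated_shards = []
    for rank in range(WORLD_SIZE):
        lo = rank * shard
        local = []
        for j in range(shard):
            # SGD-with-momentum update using only this rank's state shard
            momentum[rank][j] = beta * momentum[rank][j] + grads[lo + j]
            local.append(params[lo + j] - lr * momentum[rank][j])
        updated_shards.append(local)
    # "All-gather": every rank receives the full updated parameter vector.
    params[:] = [p for s in updated_shards for p in s]

zero1_step()
```

ZeRO-2 additionally shards gradients, and ZeRO-3 shards the parameters themselves, trading extra communication for further memory savings.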
Tiny-Megatron 
An educational TP/DP/2D hybrid-parallel pipeline with custom modules and runtime auto-tuning.
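The basic Megatron-style TP building block is a column-parallel linear layer: the weight's output dimension is split across ranks, each rank computes its slice locally, and the slices are concatenated (an all-gather along the feature dimension). A pure-Python sketch, with made-up numbers and no real communication:

```python
def matmul(x, W):
    # x: [in_features], W: [out_features][in_features] -> [out_features]
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in W]

WORLD_SIZE = 2
# Illustrative full weight: 4 output features, 3 input features.
W = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1],
     [1, 1, 1]]
x = [2.0, 3.0, 4.0]

# Column parallelism: split the output rows of W across ranks.
shards = [W[:2], W[2:]]
partials = [matmul(x, shard) for shard in shards]  # each rank's local output
y_tp = [v for p in partials for v in p]            # all-gather: concat slices
y_ref = matmul(x, W)                               # single-device reference
```

Following a column-parallel layer with a row-parallel one (inputs sharded, outputs all-reduced) lets a two-layer MLP run with a single communication step per direction, which is the pattern Tiny-Megatron walks through.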