Education
Work experience
- July 2025 - Present: Research Intern
- Alibaba Qwen Team
- Duties included: Designed and implemented Canzona, a unified, asynchronous, and load-balanced framework that enables distributed matrix-based optimizers (e.g., Muon, Shampoo, SOAP) in large-scale LLM pretraining under Megatron with ZeRO-1 and tensor parallelism; an illustrative sketch of the optimizer step follows this list.
- Summer 2024: LLM Pretraining Engineer (Intern)
- Aramco
- Duties included: Conducted large-scale LLM pretraining and improved training efficiency through CUDA kernel fusion, asynchronous checkpointing, and distributed parallelism (ZeRO, DP, TP, PP); an asynchronous-checkpointing sketch also follows this list.
- Fall 2022: Research Assistant
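
A minimal, single-GPU sketch of the Muon-style orthogonalized momentum step that a framework like Canzona distributes. The Newton-Schulz coefficients follow the public Muon reference implementation; `newton_schulz_orthogonalize` and `muon_step` are illustrative names and not Canzona's actual API:

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately map a 2-D matrix to the nearest semi-orthogonal matrix."""
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic Newton-Schulz coefficients
    X = G / (G.norm() + 1e-7)          # normalize so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:                     # iterate on the smaller Gram matrix
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_step(param: torch.Tensor, momentum: torch.Tensor, grad: torch.Tensor,
              lr: float = 0.02, beta: float = 0.95) -> None:
    """One hypothetical Muon update on plain tensors (wrap in torch.no_grad()
    when applied to nn.Parameters)."""
    momentum.mul_(beta).add_(grad)                  # heavy-ball momentum
    update = newton_schulz_orthogonalize(momentum)  # orthogonalized direction
    param.add_(update, alpha=-lr)
```

In a distributed setting, the per-matrix orthogonalizations are independent, so a framework of this kind can shard them across ranks and overlap them with communication, which is where load balancing matters.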
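And a hedged sketch of the asynchronous-checkpointing pattern referenced above: snapshot the weights to CPU on the training thread, then let a background thread do the slow `torch.save`. The helper name and interface are assumptions for illustration:

```python
import threading
import torch

def async_checkpoint(model: torch.nn.Module, path: str) -> threading.Thread:
    # Snapshot to CPU synchronously so later optimizer steps cannot mutate
    # the tensors while they are being serialized.
    cpu_state = {k: v.detach().to("cpu", copy=True)
                 for k, v in model.state_dict().items()}
    # The slow disk write happens off the training thread.
    writer = threading.Thread(target=torch.save, args=(cpu_state, path))
    writer.start()
    return writer  # caller should join() before relying on the file
```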
Skills
- PyTorch / LibTorch: In-depth knowledge of PyTorch operators’ workflow and implementation, including the distributed training packages and multi-threaded / stream-based programming (see the CUDA-stream sketch after this list).
- CUDA programming / Triton: Intermediate proficiency in CUDA stream and kernel programming, with a solid understanding of CUDA execution principles (see the Triton kernel sketch after this list).
- DeepSpeed / Megatron: Experience using DeepSpeed and Megatron for distributed training, including manually implemented optimizations on top of both (a minimal ZeRO-1 setup is sketched after this list).
- Programming Languages: Python (mainly for PyTorch), C/C++ (mainly for multi-threading, CUDA programming, and LibTorch).
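
To illustrate the stream programming listed above, a side CUDA stream can overlap a pinned host-to-device copy with compute on the default stream; this is a generic PyTorch pattern, not code from any of the projects listed:

```python
import torch

assert torch.cuda.is_available()
copy_stream = torch.cuda.Stream()
weights = torch.randn(4096, 4096, device="cuda")
host_batch = torch.randn(4096, 4096, pin_memory=True)  # pinned => true async copy

with torch.cuda.stream(copy_stream):
    # Runs concurrently with the matmul issued on the default stream below.
    device_batch = host_batch.to("cuda", non_blocking=True)

activations = weights @ weights  # default stream, overlaps with the copy

# Order the default stream after the copy before consuming device_batch.
torch.cuda.current_stream().wait_stream(copy_stream)
out = activations + device_batch
```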
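A small Triton kernel fusing an elementwise add with a ReLU, the kind of kernel fusion referred to above; names and block size are arbitrary:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def fused_add_relu_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n                       # guard the ragged last block
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, tl.maximum(x + y, 0.0), mask=mask)

def fused_add_relu(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    fused_add_relu_kernel[(triton.cdiv(n, 1024),)](x, y, out, n, BLOCK=1024)
    return out
```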
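And a minimal DeepSpeed ZeRO-1 setup of the kind referenced above; the model and config values are placeholders, and the script is assumed to run under the `deepspeed` launcher:

```python
import deepspeed
import torch

model = torch.nn.Linear(1024, 1024)  # placeholder model
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 1},  # shard optimizer states across ranks
    "bf16": {"enabled": True},
}
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config)

# In the training loop, engine.backward(loss) and engine.step() replace the
# usual loss.backward() / optimizer.step() pair.
```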