CV

Education

Work experience

  • Summer 2024: LLM Pretraining Engineer (Intern)

    • Aramco
    • Duties included: Pretrained large-scale models on a 24×H100 GPU cluster; built a pretraining framework and improved training throughput with CUDA kernel fusion, multi-threaded scheduling, and asynchronous checkpointing.
    • Supervisor: Salma Alsinan
  • Fall 2022: Research Assistant

    • King Abdullah University of Science and Technology
    • Duties included: Research on distributed federated learning
    • Supervisor: Di Wang

Skills

  • PyTorch / LibTorch: In-depth knowledge of PyTorch operators’ workflow and implementation, including the distributed training packages and multi-threaded / stream-based programming.
  • CUDA programming / Triton: Intermediate skill in CUDA stream and kernel programming, with a solid understanding of CUDA principles.
  • DeepSpeed / Megatron: Experience using DeepSpeed and Megatron for distributed training, including manually reimplementing optimizations.
  • Programming Languages: Python (mainly for PyTorch), C/C++ (mainly for multi-threading, CUDA programming, and LibTorch).
