
I am an Assistant Professor at the Halıcıoğlu Data Science Institute and the Department of Computer Science and Engineering (affiliate) at UC San Diego. I lead the Hao AI Lab at UCSD. I am a Sloan Research Fellow (2026), an MIT TR35 (China) nominee (2025), and a recipient of the Google ML and Systems Award (2025) and the OSDI Best Paper Award (2021). My work on FastVideo, DistServe, vLLM, and LMArena has reached millions of users. Here is an extended Bio.
Prospective students and postdocs: I am recruiting new PhD students and postdocs. We also have openings for MS/undergrad research interns. Please check out this page to see how to get involved.
Research
I work at the intersection of machine learning and systems. I am equally interested in designing strong, efficient, and secure machine learning models and algorithms, and in building scalable, practical distributed systems that can support real-world machine learning workloads.
Our Lab (@haoailab) develops open models, algorithms, and systems to democratize access to large models.
Current Projects
- LLM inference and serving systems: DeepConf [ICLR'26], Dynasor [NeurIPS'25], DistServe [OSDI'24], vLLM [SOSP'23]
- Efficient ML architectures/algorithms: d3LLM [Preprint'26], Jacobi Forcing [Preprint'25], VSA/STA [NeurIPS'25, ICML'25]
- Open data, models, and evals: VideoScience-Bench [Preprint'26], FastWan Series, LMGame Bench [ICLR'26, ICLR'25]
- Model-parallel ML Systems: DistCA [MLSys'26], Alpa [OSDI'22, MLSys'23]
Some of my research has been actively developed and maintained as open-source software:
- Dreamverse: Vibe-directing 1080p/30s videos in real time.
- FastVideo: A lightweight framework for accelerating large video diffusion models.
- LMGame: Evaluate and improve AI by repurposing computer games.
- Lookahead Decoding: A parallel LLM decoding method that trades FLOPs for fewer decoding steps.
- vLLM: A high-throughput and memory-efficient inference engine for LLMs.
- Ray Collective: CPU/GPU collective communication primitives on Ray.
Some previous projects:
- FastChat: An open platform for training, serving, and evaluating Large Language Models.
- Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings.
- Vicuna: A series of popular open-source LLM chatbots available in 7B/13B/33B sizes.
- Alpa: Training large-scale neural networks with auto parallelization. Scales to 1000+ GPUs.
- AutoDist: Automatic data-parallel training on TensorFlow.
- DyNet: The Dynamic Neural Network Toolkit.
- Poseidon: Parameter server on distributed GPUs.
Students and Postdocs
Current Members
- Junda Chen, PhD (w/ Tajana Rosing)
- Shao Duan, Undergrad Intern
- Yichao Fu, PhD
- Lanxiang Hu, PhD (w/ Tajana Rosing)
- Yixin Huang, Master Intern
- Mingjia Huo, PhD (w/ Tajana Rosing)
- Kaiqin Kong, Master Intern
- Will Lin, PhD
- Zihang Min, Undergrad Intern
- Jinzhe Pan, Undergrad Intern
- Yu-Yang Qian, Visiting PhD
- Loay Rashid, Master Intern
- David Su, PhD
- Junli Wang, PhD (w/ Prithviraj Ammanabrolu)
- Alex Zhang, Master Intern
- Yuxuan Zhang, Master
- Yiming Zhao, Master
Past Students
- Matthew Noto, Undergrad Intern (2025) -> xAI (2026)
- Wenxuan Tan, Undergrad Intern (2025) -> Bytedance (2026)
- Abhilash Shankarampeta, Master (2025) -> Cisco (2026)
- Rui Ge, Undergrad Intern (2025) -> UT Austin (2026)
- Susan Li, Undergrad Intern (2025) -> NUS (2026)
- Peiyuan Zhang, PhD (Master out) (2024) -> Bytedance (2026)
- Haoyang Yu, Undergrad Intern (2024) -> Harvard University (2026)
- Minghang Deng, Master (2024) -> Ant Group (2026)
- Yonqqi Chen, Master (2024) -> Stealth startup (2025)
- Runlong Su, Master (2024) -> Bytedance (2025)
- Zheyu Fu, Master (2024) -> NVIDIA (2025)
- Ashwin Ramachandran, Master (2024) -> ContextFort (co-founder) (2025)
- Siqi Zhu, Undergrad Intern (2024) -> PhD @ UIUC (2025)
- Anze Xie, Master (2023) -> MBZUAI IFM Lab (2025)
- Hangliang Ding, Undergrad Intern (2024) -> Bytedance (2024)
- Jiangfei Duan, Visiting PhD (2023) -> Alibaba Group (2024)
- Runyu Lu, Undergrad Intern (2023) -> PhD @ UMich (2024)
- Dacheng Li, Master (2020) -> PhD @ UC Berkeley (2023)
- Hexu Zhao, Undergrad Intern (2022) -> PhD @ NYU (2023)
- Yonghao Zhuang, Undergrad Intern (2021) -> PhD @ CMU (2022)
Recent Talks
- 03/2026 Talk at Cerebras
- 02/2026 Tutorial at Nvidia Research Radar Talk Series
- 01/2026 Talk at Nvidia Dynamo Day
- 12/2025 Talk at Workshop on Next Practices in Video Generation and Evaluation @ NeurIPS 2025
- 12/2025 Talk at The First Workshop on Efficient Reasoning @ NeurIPS 2025
- 05/2025 Talk at MBZUAI IFM Launching Event
- 04/2025 Talk at Rutgers Efficient AI Seminar
- 04/2025 Talk at Microsoft Research Asia ACE Talk Series
- 04/2025 Talk at CMU 11868 LLM Systems
- 03/2025 Talk at Bytedance AIP Spearhead Tech Talk Series
- 02/2025 Talk at Faster LLM Inference Seminar @ Weizmann Institute of Science
- 11/2024 Talk at UWaterloo Invited Talk
- 10/2024 Talk at LinkedIn AI Seminar
- 10/2024 Talk at PyTorch Webinar
- 09/2024 Talk at Microsoft GenAI AIMS Talk
- 04/2024 Talk at UChicago AI+System Seminar
- 03/2024 Talk at NSF Open-Source Generative AI (OSGAI) Workshop
- 03/2024 Talk at Essence VC Q1 Virtual Conference: LLM Inference
- 02/2024 Talk at PKU Alumni Association of Northern California (PKUAANC)
- 12/2023 Panel at Instruction Workshop @ NeurIPS 2023
Experience
- Assistant Professor, UC San Diego, 2023 - Present
- Software Engineer, Snowflake, 2023 - Present
- Postdoc, UC Berkeley, 2021 - 2023
- Director of Scalable Machine Learning, Petuum Inc, 2016 - 2021
- Ph.D. Student, Carnegie Mellon University, 2014 - 2020 (on leave 2016 - 2020)