Hao Zhang

Assistant Professor

HDSI, CSE (affiliate)

Email: haozhang AT ucsd.edu

I am an Assistant Professor at the Halıcıoğlu Data Science Institute and the Department of Computer Science and Engineering (affiliate) at UC San Diego. I lead the Hao AI Lab at UCSD. I am a Sloan Research Fellow (2026), an MIT TR35 (China) nominee (2025), and a recipient of the Google ML and Systems Award (2025) and the OSDI Best Paper Award (2021). My work on FastVideo, DistServe, vLLM, and LMArena has reached millions of users. Here is an extended Bio.

Prospective students and postdocs: I am recruiting new PhD students and postdocs. We also have openings for MS/undergrad research interns. Please check out this page to see how to get involved.

Research

I work at the intersection of machine learning and systems. I am equally interested in designing strong, efficient, and secure machine learning models and algorithms, and in building scalable, practical distributed systems that support real-world machine learning workloads.

Our lab (@haoailab) develops open models, algorithms, and systems to democratize access to large models.

Current Projects

LLM inference and serving systems: DeepConf [ICLR'26], Dynasor [NeurIPS'25], DistServe [OSDI'24], vLLM [SOSP'23]
Efficient ML architectures/algorithms: d3LLM [Preprint'26], Jacobi Forcing [Preprint'25], VSA/STA [NeurIPS'25, ICML'25]
Open data, models, and evals: VideoScience-Bench [Preprint'26], FastWan Series, LMGame Bench [ICLR'26, ICLR'25]
Model-parallel ML Systems: DistCA [MLSys'26], Alpa [OSDI'22, MLSys'23]

Some of my research is actively developed and maintained as open-source software:

Dreamverse: Vibe-directing 1080p/30s videos in real time.
FastVideo: A lightweight framework for accelerating large video diffusion models.
LMGame: Evaluate and improve AI by repurposing computer games.
Lookahead Decoding: A parallel LLM decoding method that trades FLOPs for fewer decoding steps.
vLLM: A high-throughput and memory-efficient inference engine for LLMs.
Ray Collective: CPU/GPU collective communication primitives on Ray.

Some previous projects:

FastChat: An open platform for training, serving, and evaluating Large Language Models.
Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings.
Vicuna: A series of popular open-source LLM chatbots available in 7B/13B/33B sizes.
Alpa: Training large-scale neural networks with auto parallelization. Scales to 1000+ GPUs.
AutoDist: Automatic data-parallel training on TensorFlow.
DyNet: The Dynamic Neural Network Toolkit.
Poseidon: Parameter server on distributed GPUs.

Students and Postdocs

Current Members

Junda Chen, PhD (w/ Tajana Rosing)
Shao Duan, Undergrad Intern
Yichao Fu, PhD
Lanxiang Hu, PhD (w/ Tajana Rosing)
Yixin Huang, Master Intern
Mingjia Huo, PhD (w/ Tajana Rosing)
Kaiqin Kong, Master Intern
Will Lin, PhD
Zihang Min, Undergrad Intern
Jinzhe Pan, Undergrad Intern
Yu-Yang Qian, Visiting PhD
Loay Rashid, Master Intern
David Su, PhD
Junli Wang, PhD (w/ Prithviraj Ammanabrolu)
Alex Zhang, Master Intern
Yuxuan Zhang, Master
Yiming Zhao, Master

Past Students

Matthew Noto, Undergrad Intern (2025) -> xAI (2026)
Wenxuan Tan, Undergrad Intern (2025) -> Bytedance (2026)
Abhilash Shankarampeta, Master (2025) -> Cisco (2026)
Rui Ge, Undergrad Intern (2025) -> UT Austin (2026)
Susan Li, Undergrad Intern (2025) -> NUS (2026)
Peiyuan Zhang, PhD (Master out) (2024) -> Bytedance (2026)
Haoyang Yu, Undergrad Intern (2024) -> Harvard University (2026)
Minghang Deng, Master (2024) -> Ant Group (2026)
Yonqqi Chen, Master (2024) -> Stealth startup (2025)
Runlong Su, Master (2024) -> Bytedance (2025)
Zheyu Fu, Master (2024) -> NVIDIA (2025)
Ashwin Ramachandran, Master (2024) -> ContextFort (co-founder) (2025)
Siqi Zhu, Undergrad Intern (2024) -> PhD @ UIUC (2025)
Anze Xie, Master (2023) -> MBZUAI IFM Lab (2025)
Hangliang Ding, Undergrad Intern (2024) -> Bytedance (2024)
Jiangfei Duan, Visiting PhD (2023) -> Alibaba Group (2024)
Runyu Lu, Undergrad Intern (2023) -> PhD @ UMich (2024)
Dacheng Li, Master (2020) -> PhD @ UC Berkeley (2023)
Hexu Zhao, Undergrad Intern (2022) -> PhD @ NYU (2023)
Yonghao Zhuang, Undergrad Intern (2021) -> PhD @ CMU (2022)

Recent Talks

03/2026 Talk at Cerebras
02/2026 Tutorial at Nvidia Research Radar Talk Series
01/2026 Talk at Nvidia Dynamo Day
12/2025 Talk at Workshop on Next Practices in Video Generation and Evaluation @ NeurIPS 2025
12/2025 Talk at The First Workshop on Efficient Reasoning @ NeurIPS 2025
05/2025 Talk at MBZUAI IFM Launching Event
04/2025 Talk at Rutgers Efficient AI Seminar
04/2025 Talk at Microsoft Research Asia ACE Talk Series
04/2025 Talk at CMU 11868 LLM Systems
03/2025 Talk at Bytedance AIP Spearhead Tech Talk Series
02/2025 Talk at Faster LLM Inference Seminar @ Weizmann Institute of Science
11/2024 Talk at UWaterloo Invited Talk
10/2024 Talk at LinkedIn AI Seminar
10/2024 Talk at PyTorch Webinar
09/2024 Talk at Microsoft GenAI AIMS Talk
04/2024 Talk at UChicago AI+System Seminar
03/2024 Talk at NSF Open-Source Generative AI (OSGAI) Workshop
03/2024 Talk at Essence VC Q1 Virtual Conference: LLM Inference
02/2024 Talk at PKU Alumni Association of Northern California (PKUAANC)
12/2023 Panel at Instruction Workshop @ NeurIPS 2023

Experience

Assistant Professor, UC San Diego, 2023 - Present
Software Engineer, Snowflake, 2023 - Present
Postdoc, UC Berkeley, 2021 - 2023
Director of Scalable Machine Learning, Petuum Inc, 2016 - 2021
Ph.D. Student, Carnegie Mellon University, 2014 - 2020 (on leave 2016 - 2020)