Hao Zhang

Assistant Professor

HDSI, CSE (affiliate)

UC San Diego

Email: haozhang AT ucsd.edu

I am an Assistant Professor at the Halıcıoğlu Data Science Institute and the Department of Computer Science and Engineering (affiliate) at UC San Diego, where I lead the Hao AI Lab. I am a Sloan Research Fellow (2026), an MIT TR35 (China) nominee (2025), and a recipient of the Google ML and Systems Award (2025) and the OSDI Best Paper Award (2021). My work on FastVideo, DistServe, vLLM, and LMArena has reached millions of users. Here is an extended Bio.

Prospective students and postdocs: I am recruiting new PhD students and postdocs. We also have openings for MS/undergrad research interns. Please check out this page to see how to get involved.

Research

I work at the intersection of machine learning and systems. I am equally interested in designing strong, efficient, and secure machine learning models and algorithms, and in building scalable, practical distributed systems that support real-world machine learning workloads.

Our lab (@haoailab) develops open models, algorithms, and systems to democratize access to large models.

Current Projects

Some of my research is actively developed and maintained as open source software:

  • Dreamverse: Vibe-directing 1080p/30s videos in real time.
  • FastVideo: A lightweight framework for accelerating large video diffusion models.
  • LMGame: Evaluate and improve AI by repurposing computer games.
  • Lookahead Decoding: A parallel LLM decoding method that trades FLOPs for fewer decoding steps.
  • vLLM: A high-throughput and memory-efficient inference engine for LLMs.
  • Ray Collective: CPU/GPU collective communication primitives on Ray.

Some previous projects:

  • FastChat: An open platform for training, serving, and evaluating Large Language Models.
  • Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings.
  • Vicuna: A series of popular open-source LLM chatbots available in 7B/13B/33B sizes.
  • Alpa: Training large-scale neural networks with auto parallelization. Scales to 1000+ GPUs.
  • AutoDist: Automatic data-parallel training on TensorFlow.
  • DyNet: The Dynamic Neural Network Toolkit.
  • Poseidon: Parameter server on distributed GPUs.

Students and Postdocs

Current Members

Past Students

Recent Talks

Experience

  • Assistant Professor, UC San Diego, 2023 - Present
  • Software Engineer, Snowflake, 2023 - Present
  • Postdoc, UC Berkeley, 2021 - 2023
  • Director of Scalable Machine Learning, Petuum Inc, 2016 - 2021
  • Ph.D. Student, Carnegie Mellon University, 2014 - 2020 (on leave 2016 - 2020)