Hengshuang Zhao


Assistant Professor
Rm424, Chow Yei Ching Building
Department of Computer Science
The University of Hong Kong
Email: hszhao[at]cs.hku.hk

I am an Assistant Professor at the Department of Computer Science of The University of Hong Kong. Previously, I spent wonderful times as a Postdoctoral Researcher at the Computer Science and Artificial Intelligence Laboratory (CSAIL) of MIT, supervised by Prof. Antonio Torralba, and at the Torr Vision Group of the University of Oxford (beautiful Oxford), supervised by Prof. Philip Torr. I obtained my Ph.D. degree from The Chinese University of Hong Kong, supervised by Prof. Jiaya Jia, and my bachelor's degree from Huazhong University of Science and Technology. During my Ph.D., I spent wonderful times as a Research Intern, working with Dr. Xiaohui Shen, Dr. Zhe Lin, Dr. Kalyan Sunkavalli, and Dr. Brian Price at Adobe (San Jose), Prof. Raquel Urtasun at Uber (Toronto), and Dr. Vladlen Koltun at Intel (Santa Clara).

My general research interests cover the broad areas of computer vision, machine learning, and artificial intelligence, with special emphasis on building intelligent visual systems. My research goal is to use artificial intelligence techniques to make machines perceive, understand, imagine, and interact with the surrounding environment, and ultimately to have a strong positive impact on various fields. Our current research interests and focus include: 1. visual scene understanding, perception, reconstruction, representation learning, and multimodal learning; 2. generative modeling and visual content creation, generation, and manipulation (image/video/3D); 3. autonomous driving, embodied AI, robot learning, and LLM applications.

Prospective students: I am looking for self-motivated Ph.D. students, postdoctoral researchers, research assistants, and visiting scholars to work together on exciting, cutting-edge computer vision, machine learning, and artificial intelligence projects. If you are interested in working with me, please drop me an email with your resume. Available Ph.D. scholarships and opportunities include the Hong Kong PhD Fellowship Scheme (HKPFS), the HKU Presidential PhD Scholar Programme (HKUPS), and Postgraduate Scholarships (PGS).

Selected Publications

Google Scholar and Full List.

NeurIPS
Depth Anything V2
Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2024.
[Project] [Paper] [Code] [Demo] [Media]

NeurIPS
Zero-shot Image Editing with Reference Imitation
Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2024.
[Project] [Paper] [Code] [Demo] [Media]

NeurIPS
LiT: Unifying LiDAR "Languages" with LiDAR Translator
Yixing Lao, Tao Tang, Xiaoyang Wu, Peng Chen, Kaicheng Yu, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2024.
[Project] [Paper] [Code]

NeurIPS
SyncVIS: Synchronized Video Instance Segmentation
Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2024.
[Project] [Paper] [Code]

NeurIPS
One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection
Zhenyu Wang, Yali Li, Hengshuang Zhao, Shengjin Wang. (†: corresponding)
Neural Information Processing Systems (NeurIPS), 2024.
[Project] [Paper] [Code]

NeurIPS
LION: Linear Group RNN for 3D Object Detection in Point Clouds
Zhe Liu, Jinghua Hou, Xinyu Wang, Xiaoqing Ye, Jingdong Wang, Hengshuang Zhao, Xiang Bai.
Neural Information Processing Systems (NeurIPS), 2024.
[Project] [Paper] [Code]

arXiv
LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence
Zhuoling Li, Xiaogang Xu, Zhenhua Xu, Ser-Nam Lim, Hengshuang Zhao.
arXiv, 2024.
[Project] [Paper] [Demo] [Video]

ECCV
LivePhoto: Real Image Animation with Text-guided Motion Control
Xi Chen, Zhiheng Liu, Mengting Chen, Yutong Feng, Yu Liu, Yujun Shen, Hengshuang Zhao.
European Conference on Computer Vision (ECCV), 2024.
[Project] [Paper] [Code] [Video]

ECCV
Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
Zheng Zhang, Wenbo Hu, Yixing Lao, Tong He, Hengshuang Zhao.
European Conference on Computer Vision (ECCV), 2024.
[Project] [Paper] [Code]

ECCV
InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping
Zhenhua Xu, Kwan-Yee K. Wong, Hengshuang Zhao.
European Conference on Computer Vision (ECCV), 2024.
[Project] [Paper] [Code] [Video]

ECCV
OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation
Zhenyu Wang, Yali Li, Taichi Liu, Hengshuang Zhao, Shengjin Wang. (†: corresponding)
European Conference on Computer Vision (ECCV), 2024.
[Paper]

ECCV
LogoSticker: Inserting Logos into Diffusion Models for Customized Generation
Mingkang Zhu, Xi Chen, Zhongdao Wang, Hengshuang Zhao, Jiaya Jia.
European Conference on Computer Vision (ECCV), 2024.
[Project] [Paper] [Code]

ECCV
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Zhening Huang, Xiaoyang Wu, Xi Chen, Hengshuang Zhao, Lei Zhu, Joan Lasenby. (†: corresponding)
European Conference on Computer Vision (ECCV), 2024.
[Project] [Paper] [Code] [Video]

ECCV
Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
Longxiang Tang, Zhuotao Tian, Kai Li, Chunming He, Hantao Zhou, Hengshuang Zhao, Xiu Li, Jiaya Jia.
European Conference on Computer Vision (ECCV), 2024.
[Paper] [Code]

TPAMI
UniDetector: Towards Universal Object Detection with Heterogeneous Supervision
Zhenyu Wang, Yali Li, Xi Chen, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao, Shengjin Wang. (†: corresponding)
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024.
[Paper]

RA-L
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K. Wong, Zhenguo Li, Hengshuang Zhao.
IEEE Robotics and Automation Letters (RA-L), 2024.
[Project] [Paper] [Code] [Video]

RA-L
GroupLane: End-to-End 3D Lane Detection With Channel-Wise Grouping
Zhuoling Li, Chunrui Han, Zheng Ge, Jinrong Yang, En Yu, Haoqian Wang, Xiangyu Zhang, Hengshuang Zhao.
IEEE Robotics and Automation Letters (RA-L), 2024.
[Paper] [Code]

CVPR
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao.
Computer Vision and Pattern Recognition (CVPR), 2024.
[Project] [Paper] [Code] [Demo] [Media]

CVPR
AnyDoor: Zero-shot Object-level Image Customization
Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao.
Computer Vision and Pattern Recognition (CVPR), 2024.
[Project] [Paper] [Code] [Demo] [Media]

CVPR Oral
Point Transformer V3: Simpler, Faster, Stronger
Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao.
Computer Vision and Pattern Recognition (CVPR), 2024. Oral
Ranked 1st place in the CVPR 2024 Waymo 3D Semantic Segmentation Challenge.
[Paper] [Code]

CVPR
Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training
Xiaoyang Wu, Zhuotao Tian, Xin Wen, Bohao Peng, Xihui Liu, Kaicheng Yu, Hengshuang Zhao.
Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper] [Code]

CVPR Highlight
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Zhangyang Qi, Ye Fang, Zeyi Sun, Xiaoyang Wu, Tong Wu, Jiaqi Wang, Dahua Lin, Hengshuang Zhao.
Computer Vision and Pattern Recognition (CVPR), 2024. Highlight
[Project] [Paper] [Code]

CVPR Highlight
UniMODE: Universal Monocular 3D Object Detection
Zhuoling Li, Xiaogang Xu, Ser-Nam Lim, Hengshuang Zhao.
Computer Vision and Pattern Recognition (CVPR), 2024. Highlight
[Paper]

CVPR
GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding
Chengyao Wang, Li Jiang, Xiaoyang Wu, Zhuotao Tian, Bohao Peng, Hengshuang Zhao, Jiaya Jia. (†: corresponding)
Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper] [Code]

CVPR
OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation
Bohao Peng, Xiaoyang Wu, Li Jiang, Yukang Chen, Hengshuang Zhao, Zhuotao Tian, Jiaya Jia.
Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper] [Code]

CVPR
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
Yunhan Yang, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Song-Hai Zhang, Hengshuang Zhao, Tong He, Xihui Liu.
Computer Vision and Pattern Recognition (CVPR), 2024.
[Project] [Paper] [Code]

CVPR
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Zhihao Yuan, Jinke Ren, Chun-Mei Feng, Hengshuang Zhao, Shuguang Cui, Zhen Li.
Computer Vision and Pattern Recognition (CVPR), 2024.
[Project] [Paper] [Code] [Video]

CVPR
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu, Tong He, Shixiang Tang, Hengshuang Zhao, Qibo Qiu, Binbin Lin, Xiaofei He, Wanli Ouyang.
Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper] [Code]

ICLR Highlight
Influencer Backdoor Attack on Semantic Segmentation
Haoheng Lan, Jindong Gu, Philip Torr, Hengshuang Zhao.
International Conference on Learning Representations (ICLR), 2024. Highlight
[Paper] [Code]

NeurIPS
FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models
Lihe Yang, Xiaogang Xu, Bingyi Kang, Yinghuan Shi, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2023.
[Paper] [Code]

NeurIPS
Uni3DETR: Unified 3D Detection Transformer
Zhenyu Wang, Yali Li, Xi Chen, Hengshuang Zhao, Shengjin Wang. (†: corresponding)
Neural Information Processing Systems (NeurIPS), 2023.
[Paper] [Code]

NeurIPS
TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation
Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2023.
[Paper] [Code]

NeurIPS
CorresNeRF: Image Correspondence Priors for Neural Radiance Fields
Yixing Lao, Xiaogang Xu, Zhipeng Cai, Xihui Liu, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2023.
[Project] [Paper] [Code]

ICCV
Open-vocabulary Panoptic Segmentation with Embedding Modulation
Xi Chen, Shuang Li, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao.
International Conference on Computer Vision (ICCV), 2023.
[Project] [Paper] [Code]

ICCV
Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning
Lihe Yang, Zhen Zhao, Lei Qi, Yu Qiao, Yinghuan Shi, Hengshuang Zhao.
International Conference on Computer Vision (ICCV), 2023.
[Paper] [Code]

ICCVW
SAM3D: Segment Anything in 3D Scenes
Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, Xihui Liu.
International Conference on Computer Vision Workshop (ICCVW), 2023.
[Paper] [Code]

CVPR
Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning
Xiaoyang Wu, Xin Wen, Xihui Liu, Hengshuang Zhao.
Computer Vision and Pattern Recognition (CVPR), 2023.
[Paper] [Code]

CVPR
Detecting Everything in the Open World: Towards Universal Object Detection
Zhenyu Wang, Yali Li, Xi Chen, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao, Shengjin Wang. (†: corresponding)
Computer Vision and Pattern Recognition (CVPR), 2023.
[Paper] [Code]

NeurIPS
Point Transformer V2: Grouped Vector Attention and Partition-based Pooling
Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2022.
[Paper] [Code]

ECCV
MTFormer: Multi-Task Learning via Transformer and Cross-Task Reasoning
Xiaogang Xu*, Hengshuang Zhao*, Vibhav Vineet, Ser-Nam Lim, Antonio Torralba. (*: equal contribution)
European Conference on Computer Vision (ECCV), 2022.
[Paper] [Code]

CVPR
FocalClick: Towards Practical Interactive Image Segmentation
Xi Chen, Zhiyan Zhao, Yilei Zhang, Manni Duan, Donglian Qi, Hengshuang Zhao.
Computer Vision and Pattern Recognition (CVPR), 2022.
[Paper] [Code]

TPAMI
Fully Convolutional Networks for Panoptic Segmentation with Point-based Supervision
Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, Yukang Chen, Lu Qi, Liwei Wang, Zeming Li, Jian Sun, Jiaya Jia.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.
[Paper] [Code]

TPAMI
Open World Entity Segmentation
Lu Qi, Jason Kuen, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Philip Torr, Zhe Lin, Jiaya Jia.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.
[Project] [Paper] [Code]

NeurIPS
Do Different Tracking Tasks Require Different Appearance Models?
Zhongdao Wang, Hengshuang Zhao, Yali Li, Shengjin Wang, Philip Torr, Luca Bertinetto.
Neural Information Processing Systems (NeurIPS), 2021.
[Project] [Paper] [Code]

ICCV Oral
Point Transformer
Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, Vladlen Koltun.
International Conference on Computer Vision (ICCV), 2021. Oral
[Paper] [Code]

CVPR Oral
Bidirectional Projection Network for Cross Dimension Scene Understanding
Wenbo Hu*, Hengshuang Zhao*, Li Jiang, Jiaya Jia, Tien-Tsin Wong. (*: equal contribution)
Computer Vision and Pattern Recognition (CVPR), 2021. Oral
[Project] [Paper] [Code]

CVPR Oral
Fully Convolutional Networks for Panoptic Segmentation
Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, Liwei Wang, Zeming Li, Jian Sun, Jiaya Jia.
Computer Vision and Pattern Recognition (CVPR), 2021. Oral
[Paper] [Code]

CVPR
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip Torr, Li Zhang.
Computer Vision and Pattern Recognition (CVPR), 2021.
[Project] [Paper] [Code]

CVPR
Exploring Self-attention for Image Recognition
Hengshuang Zhao, Jiaya Jia, Vladlen Koltun.
Computer Vision and Pattern Recognition (CVPR), 2020.
[Paper] [Code]

CVPR Oral
PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation
Li Jiang*, Hengshuang Zhao*, Shaoshuai Shi, Shu Liu, Chi-Wing Fu, Jiaya Jia. (*: equal contribution)
Computer Vision and Pattern Recognition (CVPR), 2020. Oral
[Paper] [Code]

CVPR
PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing
Hengshuang Zhao*, Li Jiang*, Chi-Wing Fu, Jiaya Jia. (*: equal contribution)
Computer Vision and Pattern Recognition (CVPR), 2019.
[Paper] [Code] [Video]

CVPR Oral
UPSNet: A Unified Panoptic Segmentation Network
Yuwen Xiong*, Renjie Liao*, Hengshuang Zhao*, Rui Hu, Min Bai, Ersin Yumer, Raquel Urtasun. (*: equal contribution)
Computer Vision and Pattern Recognition (CVPR), 2019. Oral
[Paper] [Code]

ECCV
PSANet: Point-wise Spatial Attention Network for Scene Parsing
Hengshuang Zhao*, Yi Zhang*, Shu Liu, Jianping Shi, Chen Change Loy, Dahua Lin, Jiaya Jia. (*: equal contribution)
European Conference on Computer Vision (ECCV), 2018.
Ranked 1st place in the CVPR 2018 WAD Drivable Area Segmentation Challenge.
[Project] [Paper] [Caffe] [PyTorch] [Video] [Supp] [Slides in WAD @ CVPR 2018]

ECCV
SegStereo: Exploiting Semantic Information for Disparity Estimation
Guorun Yang*, Hengshuang Zhao*, Jianping Shi, Zhidong Deng, Jiaya Jia. (*: equal contribution)
European Conference on Computer Vision (ECCV), 2018.
[Project] [Paper] [Code] [Video] [Supp]

ECCV
ICNet for Real-Time Semantic Segmentation on High-Resolution Images
Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, Jiaya Jia.
European Conference on Computer Vision (ECCV), 2018.
[Project] [Paper] [Code] [Video] [Supp]

CVPR
Pyramid Scene Parsing Network
Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia.
Computer Vision and Pattern Recognition (CVPR), 2017.
Ranked 1st place in the ECCV 2016 ImageNet Scene Parsing Challenge.
Ranked 1st place in the CVPR 2017 LSUN Semantic Segmentation Challenge.
[Project] [Paper] [Caffe] [PyTorch] [Video] [Slides in ILSVRC2016@ECCV2016]