Publications
Google Scholar and Bib.
PlayerOne: Egocentric World Simulator
Yuanpeng Tu, Hao Luo, Xi Chen, Xiang Bai, Fan Wang, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2025.
Oral
[Project]
[Paper]
[Code]
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
Yujia Zhang, Xiaoyang Wu, Yixing Lao, Chengyao Wang, Zhuotao Tian, Naiyan Wang, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2025.
[Project]
[Paper]
[Code]
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
Xi Chen, Mingkang Zhu, Shaoteng Liu, Xiaoyang Wu, Xiaogang Xu, Yu Liu, Xiang Bai, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2025.
[Project]
[Paper]
[Code]
STAR-R1: Improving Video Perception via Spatio-Temporal Aggregated Reinforcement
Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2025.
[Project]
[Paper]
[Code]
Seg-VAR: Image Segmentation with Visual Autoregressive Modeling
Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2025.
[Project]
[Paper]
[Code]
ROSE: Remove Objects with Side Effects in Videos
Chenxuan Miao, Yutong Feng, Jianshu Zeng, Zixiang Gao, Hantang Liu, Yunfeng Yan, Donglian Qi, Xi Chen, Bin Wang, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2025.
[Project]
[Paper]
[Code]
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
Senqiao Yang, Junyi Li, Xin Lai, Jinming Wu, Wei Li, Zejun Ma, Bei Yu, Hengshuang Zhao, Jiaya Jia.
Neural Information Processing Systems (NeurIPS), 2025.
[Project]
[Paper]
[Code]
Orient Anything V2: Unifying Orientation and Rotation Understanding
Zehan Wang, Ziang Zhang, Jiayang Xu, Jialei Wang, Tianyu Pang, Chao Du, Hengshuang Zhao, Zhou Zhao.
Neural Information Processing Systems (NeurIPS), 2025. Spotlight
[Project]
[Paper]
[Code]
LiteReality: Graphic-Ready 3D Scene Reconstruction from RGB-D Scans
Zhening Huang, Xiaoyang Wu, Fangcheng Zhong, Hengshuang Zhao, Matthias Nießner, Joan Lasenby.
Neural Information Processing Systems (NeurIPS), 2025.
[Project]
[Paper]
[Code]
Mover: Motion-controllable Video Generation via Latent Trajectory Guidance
Ruihang Chu, Yefei He, Zhekai Chen, Shiwei Zhang, Xiaogang Xu, Bin Xia, Dingdong Wang, Hongwei Yi, Xihui Liu, Hengshuang Zhao, Yu Liu, Yingya Zhang, Yujiu Yang.
Neural Information Processing Systems (NeurIPS), 2025.
[Project]
[Paper]
[Code]
GenSpace: Benchmarking Spatially-Aware Image Generation
Zehan Wang, Jiayang Xu, Ziang Zhang, Tianyu Pang, Chao Du, Hengshuang Zhao, Zhou Zhao.
Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2025.
[Project]
[Paper]
[Code]
DiffCamera: Arbitrary Refocusing on Images
Yiyang Wang, Xi Chen, Xiaogang Xu, Yu Liu, Hengshuang Zhao.
SIGGRAPH Asia, 2025.
[Project]
[Paper]
[Code]
Enhancing LLM Knowledge Learning through Generalization
Mingkang Zhu, Xi Chen, Zhongdao Wang, Bei Yu, Hengshuang Zhao, Jiaya Jia.
Empirical Methods in Natural Language Processing (EMNLP), 2025.
[Project]
[Paper]
[Code]
StableDepth: Scene-Consistent and Scale-Invariant Monocular Depth
Zheng Zhang, Lihe Yang, Tianyu Yang, Chaohui Yu, Xiaoyang Guo, Yixing Lao, Hengshuang Zhao.
International Conference on Computer Vision (ICCV), 2025.
Highlight
[Project]
[Paper]
[Code]
DiffDoctor: Diagnosing Image Diffusion Models Before Treating
Yiyang Wang, Xi Chen, Xiaogang Xu, Sihui Ji, Yu Liu, Yujun Shen, Hengshuang Zhao.
International Conference on Computer Vision (ICCV), 2025.
[Project]
[Paper]
[Code]
ViLLa: Video Reasoning Segmentation with Large Language Model
Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao.
International Conference on Computer Vision (ICCV), 2025.
[Paper]
[Code]
DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs
Jiahe Zhao, Rongkun Zheng, Yi Wang, Helin Wang, Hengshuang Zhao.
International Conference on Computer Vision (ICCV), 2025.
[Paper]
[Code]
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
Xin Zhou, Dingkang Liang, Sifan Tu, Xiwu Chen, Yikang Ding, Dingyuan Zhang, Feiyang Tan, Hengshuang Zhao, Xiang Bai.
International Conference on Computer Vision (ICCV), 2025.
[Project]
[Paper]
[Code]
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
Yuanpeng Tu, Hao Luo, Xi Chen, Sihui Ji, Xiang Bai, Hengshuang Zhao.
SIGGRAPH, 2025.
[Project]
[Paper]
[Code]
LayerFlow: A Unified Model for Layer-aware Video Generation
Sihui Ji, Hao Luo, Xi Chen, Yuanpeng Tu, Yiyang Wang, Hengshuang Zhao.
SIGGRAPH, 2025.
[Project]
[Paper]
[Code]
FashionComposer: Compositional Fashion Image Generation
Sihui Ji, Yiyang Wang, Xi Chen, Xiaogang Xu, Hao Luo, Hengshuang Zhao.
SIGGRAPH, 2025.
[Project]
[Paper]
[Code]
DreamMask: Boosting Open-vocabulary Panoptic Segmentation with Synthetic Data
Yuanpeng Tu, Xi Chen, Ser-Nam Lim, Hengshuang Zhao.
SIGGRAPH, 2025.
[Project]
[Paper]
[Code]
VIP: Vision Instructed Pre-training for Robotic Manipulation
Zhuoling Li, Liangliang Ren, Jinrong Yang, Yong Zhao, Xiaoyang Wu, Zhenhua Xu, Xiang Bai, Hengshuang Zhao.
International Conference on Machine Learning (ICML), 2025.
[Project]
[Paper]
[Code]
LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence
Zhuoling Li, Xiaogang Xu, Zhenhua Xu, Ser-Nam Lim, Hengshuang Zhao.
International Conference on Machine Learning (ICML), 2025.
[Project]
[Paper]
[Demo]
[Video]
HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding
Rui Yang, Lin Song, Yicheng Xiao, Runhui Huang, Yixiao Ge, Ying Shan, Hengshuang Zhao.
International Conference on Machine Learning (ICML), 2025.
[Paper]
[Code]
BOOD: Boundary-based Out-Of-Distribution Data Generation
Qilin Liao, Shuo Yang, Bo Zhao, Ping Luo, Hengshuang Zhao.
International Conference on Machine Learning (ICML), 2025.
[Paper]
[Code]
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models
Zehan Wang, Ziang Zhang, Tianyu Pang, Chao Du, Hengshuang Zhao, Zhou Zhao.
International Conference on Machine Learning (ICML), 2025.
[Project]
[Paper]
[Code]
TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
Mingkang Zhu, Xi Chen, Zhongdao Wang, Bei Yu, Hengshuang Zhao, Jiaya Jia.
International Conference on Machine Learning (ICML), 2025.
[Paper]
[Code]
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
Xi Chen, Zhifei Zhang, He Zhang, Yuqian Zhou, Soo Ye Kim, Qing Liu, Yijun Li, Jianming Zhang, Nanxuan Zhao, Yilin Wang, Hui Ding, Zhe Lin, Hengshuang Zhao.
Computer Vision and Pattern Recognition (CVPR), 2025.
Highlight
[Project]
[Paper]
[Code]
DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving
Zhenhua Xu, Yan Bai, Yujia Zhang, Zhuoling Li, Fei Xia, Kwan-Yee K. Wong, Jianqiang Wang, Hengshuang Zhao.
Computer Vision and Pattern Recognition (CVPR), 2025. Highlight
[Project]
[Paper]
[Code]
Sonata: Self-Supervised Learning of Reliable Point Representations
Xiaoyang Wu, Daniel DeTone, Duncan Frost, Tianwei Shen, Chris Xie, Nan Yang, Jakob Engel, Richard Newcombe, Hengshuang Zhao, Julian Straub.
Computer Vision and Pattern Recognition (CVPR), 2025.
Highlight
[Project]
[Paper]
[Code]
SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language
Zehan Wang, Sashuai Zhou, Shaoxuan He, Haifeng Huang, Lihe Yang, Ziang Zhang, Xize Cheng, Shengpeng Ji, Tao Jin, Hengshuang Zhao, Zhou Zhao.
Computer Vision and Pattern Recognition (CVPR), 2025.
[Project]
[Paper]
[Code]
PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation
Zidong Cao, Jinjing Zhu, Weiming Zhang, Hao Ai, Haotian Bai, Hengshuang Zhao, Lin Wang.
Computer Vision and Pattern Recognition (CVPR), 2025.
[Project]
[Paper]
[Code]
Empowering Large Language Models with 3D Situation Awareness
Zhihao Yuan, Yibo Peng, Jinke Ren, Yinghong Liao, Yatong Han, Chun-Mei Feng, Hengshuang Zhao, Guanbin Li, Shuguang Cui, Zhen Li.
Computer Vision and Pattern Recognition (CVPR), 2025.
[Paper]
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
Runhui Huang, Xinpeng Ding, Chunwei Wang, Jianhua Han, Yulong Liu, Hengshuang Zhao, Hang Xu, Lu Hou, Wei Zhang, Xiaodan Liang.
Computer Vision and Pattern Recognition (CVPR), 2025.
[Paper]
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen, Yunhao Gou, Runhui Huang, Zhili Liu, Daxin Tan, Jing Xu, Chunwei Wang, Yi Zhu, Yihan Zeng, Kuo Yang, Dingdong Wang, Kun Xiang, Haoyuan Li, Haoli Bai, Jianhua Han, Xiaohui Li, Weike Jin, Nian Xie, Yu Zhang, James T. Kwok, Hengshuang Zhao, Xiaodan Liang, Dit-Yan Yeung, Xiao Chen, Zhenguo Li, Wei Zhang, Qun Liu, Jun Yao, Lanqing Hong, Lu Hou, Hang Xu.
Computer Vision and Pattern Recognition (CVPR), 2025.
[Project]
[Paper]
[Code]
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
Zehan Wang, Ziang Zhang, Minjie Hong, Hang Zhang, Luping Liu, Rongjie Huang, Xize Cheng, Shengpeng Ji, Tao Jin, Hengshuang Zhao, Zhou Zhao.
International Conference on Learning Representations (ICLR), 2025.
[Project]
[Paper]
[Code]
GPT4Point++: Advancing Unified Point-Language Understanding and Generation
Zhangyang Qi, Ye Fang, Zeyi Sun, Xiaoyang Wu, Tong Wu, Jiaqi Wang, Dahua Lin, Hengshuang Zhao.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.
[Project]
[Paper]
[Code]
AnyDoor: Zero-shot Image Customization with Region-to-region Reference
Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.
[Project]
[Paper]
Towards Unified 3D Object Detection via Algorithm and Data Unification
Zhuoling Li, Xiaogang Xu, Ser-Nam Lim, Hengshuang Zhao.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.
[Project]
[Paper]
UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation
Lihe Yang, Zhen Zhao, Hengshuang Zhao.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.
[Paper]
[Code]
DreamComposer++: Empowering Diffusion Models with Multi-View Conditions for 3D Content Generation
Yunhan Yang, Shuo Chen, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Edmund Y. Lam, Hengshuang Zhao, Tong He, Xihui Liu.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.
[Paper]
PonderV2: Improved 3D Representation with A Universal Pre-training Paradigm
Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.
[Paper]
[Code]
Depth Anything V2
Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2024.
[Project]
[Paper]
[Code]
[Demo]
[Media]
Zero-shot Image Editing with Reference Imitation
Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2024.
[Project]
[Paper]
[Code]
[Demo]
[Media]
LiT: Unifying LiDAR "Languages" with LiDAR Translator
Yixing Lao, Tao Tang, Xiaoyang Wu, Peng Chen, Kaicheng Yu, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2024.
[Project]
[Paper]
[Code]
SyncVIS: Synchronized Video Instance Segmentation
Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2024.
[Paper]
[Code]
One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection
Zhenyu Wang, Yali Li, Hengshuang Zhao†, Shengjin Wang. (†: corresponding)
Neural Information Processing Systems (NeurIPS), 2024.
[Paper]
LION: Linear Group RNN for 3D Object Detection in Point Clouds
Zhe Liu, Jinghua Hou, Xinyu Wang, Xiaoqing Ye, Jingdong Wang, Hengshuang Zhao, Xiang Bai.
Neural Information Processing Systems (NeurIPS), 2024.
[Project]
[Paper]
[Code]
LivePhoto: Real Image Animation with Text-guided Motion Control
Xi Chen, Zhiheng Liu, Mengting Chen, Yutong Feng, Yu Liu, Yujun Shen, Hengshuang Zhao.
European Conference on Computer Vision (ECCV), 2024.
[Project]
[Paper]
[Code]
[Video]
Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
Zheng Zhang, Wenbo Hu, Yixing Lao, Tong He, Hengshuang Zhao.
European Conference on Computer Vision (ECCV), 2024.
[Project]
[Paper]
[Code]
InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping
Zhenhua Xu, Kwan-Yee K. Wong, Hengshuang Zhao.
European Conference on Computer Vision (ECCV), 2024.
[Project]
[Paper]
[Code]
[Video]
OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation
Zhenyu Wang, Yali Li, Taichi Liu, Hengshuang Zhao†, Shengjin Wang. (†: corresponding)
European Conference on Computer Vision (ECCV), 2024.
[Paper]
LogoSticker: Inserting Logos into Diffusion Models for Customized Generation
Mingkang Zhu, Xi Chen, Zhongdao Wang, Hengshuang Zhao, Jiaya Jia.
European Conference on Computer Vision (ECCV), 2024.
[Project]
[Paper]
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Zhening Huang, Xiaoyang Wu, Xi Chen, Hengshuang Zhao†, Lei Zhu, Joan Lasenby. (†: corresponding)
European Conference on Computer Vision (ECCV), 2024.
[Project]
[Paper]
[Code]
[Video]
Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
Longxiang Tang, Zhuotao Tian, Kai Li, Chunming He, Hantao Zhou, Hengshuang Zhao, Xiu Li, Jiaya Jia.
European Conference on Computer Vision (ECCV), 2024.
[Paper]
[Code]
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao.
Computer Vision and Pattern Recognition (CVPR), 2024.
[Project]
[Paper]
[Code]
[Demo]
[Media]
AnyDoor: Zero-shot Object-level Image Customization
Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao.
Computer Vision and Pattern Recognition (CVPR), 2024.
[Project]
[Paper]
[Code]
[Demo]
[Media]
Point Transformer V3: Simpler, Faster, Stronger
Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao.
Computer Vision and Pattern Recognition (CVPR), 2024.
Oral
Ranked 1st place in the CVPR 2024 Waymo 3D Semantic Segmentation Challenge.
[Paper]
[Code]
Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training
Xiaoyang Wu, Zhuotao Tian, Xin Wen, Bohao Peng, Xihui Liu, Kaicheng Yu, Hengshuang Zhao.
Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper]
[Code]
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Zhangyang Qi, Ye Fang, Zeyi Sun, Xiaoyang Wu, Tong Wu, Jiaqi Wang, Dahua Lin, Hengshuang Zhao.
Computer Vision and Pattern Recognition (CVPR), 2024.
Highlight
[Project]
[Paper]
[Code]
UniMODE: Universal Monocular 3D Object Detection
Zhuoling Li, Xiaogang Xu, Ser-Nam Lim, Hengshuang Zhao.
Computer Vision and Pattern Recognition (CVPR), 2024.
Highlight
[Paper]
GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding
Chengyao Wang, Li Jiang, Xiaoyang Wu, Zhuotao Tian, Bohao Peng, Hengshuang Zhao†, Jiaya Jia. (†: corresponding)
Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper]
[Code]
OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation
Bohao Peng, Xiaoyang Wu, Li Jiang, Yukang Chen, Hengshuang Zhao, Zhuotao Tian, Jiaya Jia.
Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper]
[Code]
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
Yunhan Yang, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Song-Hai Zhang, Hengshuang Zhao, Tong He, Xihui Liu.
Computer Vision and Pattern Recognition (CVPR), 2024.
[Project]
[Paper]
[Code]
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Zhihao Yuan, Jinke Ren, Chun-Mei Feng, Hengshuang Zhao, Shuguang Cui, Zhen Li.
Computer Vision and Pattern Recognition (CVPR), 2024.
[Project]
[Paper]
[Code]
[Video]
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu, Tong He, Shixiang Tang, Hengshuang Zhao, Qibo Qiu, Binbin Lin, Xiaofei He, Wanli Ouyang.
Computer Vision and Pattern Recognition (CVPR), 2024.
[Paper]
[Code]
Influencer Backdoor Attack on Semantic Segmentation
Haoheng Lan, Jindong Gu, Philip Torr, Hengshuang Zhao.
International Conference on Learning Representations (ICLR), 2024.
Highlight
[Paper]
[Code]
OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection
Zhangyang Qi, Jiaqi Wang, Xiaoyang Wu, Hengshuang Zhao.
International Conference on 3D Vision (3DV), 2024.
[Paper]
Language-Aware Vision Transformer for Referring Segmentation
Zhao Yang, Jiaqi Wang, Xubing Ye, Yansong Tang, Kai Chen, Hengshuang Zhao, Philip Torr.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024.
[Paper]
UniDetector: Towards Universal Object Detection with Heterogeneous Supervision
Zhenyu Wang, Yali Li, Xi Chen, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao†, Shengjin Wang. (†: corresponding)
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024.
[Paper]
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K. Wong, Zhenguo Li, Hengshuang Zhao.
IEEE Robotics and Automation Letters (RA-L), 2024.
[Project]
[Paper]
[Code]
[Video]
GroupLane: End-to-End 3D Lane Detection With Channel-Wise Grouping
Zhuoling Li, Chunrui Han, Zheng Ge, Jinrong Yang, En Yu, Haoqian Wang, Xiangyu Zhang, Hengshuang Zhao.
IEEE Robotics and Automation Letters (RA-L), 2024.
[Paper]
FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models
Lihe Yang, Xiaogang Xu, Bingyi Kang, Yinghuan Shi, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2023.
[Paper]
[Code]
Uni3DETR: Unified 3D Detection Transformer
Zhenyu Wang, Yali Li, Xi Chen, Hengshuang Zhao†, Shengjin Wang. (†: corresponding)
Neural Information Processing Systems (NeurIPS), 2023.
[Paper]
[Code]
TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation
Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2023.
[Paper]
[Code]
CorresNeRF: Image Correspondence Priors for Neural Radiance Fields
Yixing Lao, Xiaogang Xu, Zhipeng Cai, Xihui Liu, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2023.
[Project]
[Paper]
[Code]
Open-vocabulary Panoptic Segmentation with Embedding Modulation
Xi Chen, Shuang Li, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao.
International Conference on Computer Vision (ICCV), 2023.
[Project]
[Paper]
[Code]
Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning
Lihe Yang, Zhen Zhao, Lei Qi, Yu Qiao, Yinghuan Shi, Hengshuang Zhao.
International Conference on Computer Vision (ICCV), 2023.
[Paper]
[Code]
BT2: Backward-compatible Training with Basis Transformation
Yifei Zhou, Zilu Li, Abhinav Shrivastava, Hengshuang Zhao, Antonio Torralba, Taipeng Tian, Ser-Nam Lim.
International Conference on Computer Vision (ICCV), 2023.
[Paper]
[Code]
SAM3D: Segment Anything in 3D Scenes
Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, Xihui Liu.
International Conference on Computer Vision Workshop (ICCVW), 2023.
[Paper]
[Code]
Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning
Xiaoyang Wu, Xin Wen, Xihui Liu, Hengshuang Zhao.
Computer Vision and Pattern Recognition (CVPR), 2023.
[Paper]
[Code]
Detecting Everything in the Open World: Towards Universal Object Detection
Zhenyu Wang, Yali Li, Xi Chen, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao†, Shengjin Wang. (†: corresponding)
Computer Vision and Pattern Recognition (CVPR), 2023.
[Paper]
[Code]
Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners
Zitian Chen, Yikang Shen, Mingyu Ding, Zhenfang Chen, Hengshuang Zhao, Erik Learned-Miller, Chuang Gan.
Computer Vision and Pattern Recognition (CVPR), 2023.
[Project]
[Paper]
[Code]
Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation
Zhao Yang, Jiaqi Wang, Yansong Tang, Kai Chen, Hengshuang Zhao, Philip Torr.
AAAI Conference on Artificial Intelligence (AAAI), 2023.
[Paper]
Universal Adaptive Data Augmentation
Xiaogang Xu, Hengshuang Zhao.
International Joint Conference on Artificial Intelligence (IJCAI), 2023.
[Paper]
[Code]
PhysFormer++: Facial Video-based Physiological Measurement with SlowFast Temporal Difference Transformer
Zitong Yu, Yuming Shen, Jingang Shi, Hengshuang Zhao, Yawen Cui, Jiehua Zhang, Philip Torr, Guoying Zhao.
International Journal of Computer Vision (IJCV), 2023.
[Paper]
[Code]
Point Transformer V2: Grouped Vector Attention and Partition-based Pooling
Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, Hengshuang Zhao.
Neural Information Processing Systems (NeurIPS), 2022.
[Paper]
[Code]
MTFormer: Multi-Task Learning via Transformer and Cross-Task Reasoning
Xiaogang Xu*, Hengshuang Zhao*, Vibhav Vineet, Ser-Nam Lim, Antonio Torralba. (*: equal contribution)
European Conference on Computer Vision (ECCV), 2022.
[Paper]
[Code]
SegPGD: An Effective and Efficient Adversarial Attack for Evaluating and Boosting Segmentation Robustness
Jindong Gu, Hengshuang Zhao, Volker Tresp, Philip Torr.
European Conference on Computer Vision (ECCV), 2022.
[Paper]
DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation
Xin Lai, Zhuotao Tian, Xiaogang Xu, Yingcong Chen, Shu Liu, Hengshuang Zhao, Liwei Wang, Jiaya Jia.
European Conference on Computer Vision (ECCV), 2022.
[Paper]
[Code]
Towards Visual Social Navigation in Photo-realistic Indoor Scenes
Feng Gao, Hengshuang Zhao, Yu Wang.
Robotics: Science and Systems (RSS) Workshop, 2022.
[Paper]
FocalClick: Towards Practical Interactive Image Segmentation
Xi Chen, Zhiyan Zhao, Yilei Zhang, Manni Duan, Donglian Qi, Hengshuang Zhao.
Computer Vision and Pattern Recognition (CVPR), 2022.
[Paper]
[Code]
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
Zhao Yang, Jiaqi Wang, Yansong Tang, Kai Chen, Hengshuang Zhao, Philip Torr.
Computer Vision and Pattern Recognition (CVPR), 2022.
[Paper]
[Code]
Generalized Few-shot Semantic Segmentation
Zhuotao Tian, Xin Lai, Li Jiang, Michelle Shu, Hengshuang Zhao, Jiaya Jia.
Computer Vision and Pattern Recognition (CVPR), 2022.
[Paper]
[Code]
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
Zitong Yu, Yuming Shen, Jingang Shi, Hengshuang Zhao, Philip Torr, Guoying Zhao.
Computer Vision and Pattern Recognition (CVPR), 2022.
[Paper]
[Code]
Stratified Transformer for 3D Point Cloud Segmentation
Xin Lai, Jianhui Liu, Li Jiang, Liwei Wang, Hengshuang Zhao, Shu Liu, Xiaojuan Qi, Jiaya Jia.
Computer Vision and Pattern Recognition (CVPR), 2022.
[Paper]
[Code]
Prototype-Voxel Contrastive Learning for LiDAR Point Cloud Panoptic Segmentation
Minzhe Liu, Zhou Qiang, Hengshuang Zhao, Jianing Li, Yuan Du, Kurt Keutzer, Li Du, Shanghang Zhang.
International Conference on Robotics and Automation (ICRA), 2022.
[Paper]
Fully Convolutional Networks for Panoptic Segmentation with Point-based Supervision
Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, Yukang Chen, Lu Qi, Liwei Wang, Zeming Li, Jian Sun, Jiaya Jia.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.
[Paper]
[Code]
Open World Entity Segmentation
Lu Qi, Jason Kuen, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Philip Torr, Zhe Lin, Jiaya Jia.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.
[Project]
[Paper]
[Code]
Adaptive Perspective Distillation for Semantic Segmentation
Zhuotao Tian, Pengguang Chen, Xin Lai, Li Jiang, Shu Liu, Hengshuang Zhao, Bei Yu, Ming-Chang Yang, Jiaya Jia.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.
[Paper]
[Code]
Patch-based Separable Transformer for Visual Recognition
Shuyang Sun, Xiaoyu Yue, Hengshuang Zhao, Philip Torr, Song Bai.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.
[Paper]
Do Different Tracking Tasks Require Different Appearance Models?
Zhongdao Wang, Hengshuang Zhao, Yali Li, Shengjin Wang, Philip Torr, Luca Bertinetto.
Neural Information Processing Systems (NeurIPS), 2021.
[Project]
[Paper]
[Code]
Hierarchical Interaction Network for Video Object Segmentation from Referring Expressions
Zhao Yang, Yansong Tang, Luca Bertinetto, Hengshuang Zhao, Philip Torr.
British Machine Vision Conference (BMVC), 2021.
[Paper]
Point Transformer
Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, Vladlen Koltun.
International Conference on Computer Vision (ICCV), 2021.
Oral
[Paper]
[Code]
Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation
Xiaogang Xu, Hengshuang Zhao, Jiaya Jia.
International Conference on Computer Vision (ICCV), 2021.
[Paper]
[Code]
Bidirectional Projection Network for Cross Dimension Scene Understanding
Wenbo Hu*, Hengshuang Zhao*, Li Jiang, Jiaya Jia, Tien-Tsin Wong. (*: equal contribution)
Computer Vision and Pattern Recognition (CVPR), 2021.
Oral
[Project]
[Paper]
[Code]
Fully Convolutional Networks for Panoptic Segmentation
Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, Liwei Wang, Zeming Li, Jian Sun, Jiaya Jia.
Computer Vision and Pattern Recognition (CVPR), 2021.
Oral
[Paper]
[Code]
Distilling Knowledge via Knowledge Review
Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia.
Computer Vision and Pattern Recognition (CVPR), 2021.
[Paper]
[Code]
PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds
Mutian Xu, Runyu Ding, Hengshuang Zhao, Xiaojuan Qi.
Computer Vision and Pattern Recognition (CVPR), 2021.
[Paper]
[Code]
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip Torr, Li Zhang.
Computer Vision and Pattern Recognition (CVPR), 2021.
[Project]
[Paper]
[Code]
Semi-supervised Semantic Segmentation with Directional Context-aware Consistency
Xin Lai, Zhuotao Tian, Li Jiang, Shu Liu, Hengshuang Zhao, Liwei Wang, Jiaya Jia.
Computer Vision and Pattern Recognition (CVPR), 2021.
[Paper]
[Code]
Dual-Cross Central Difference Network for Face Anti-Spoofing
Zitong Yu, Yunxiao Qin, Hengshuang Zhao, Xiaobai Li, Guoying Zhao.
International Joint Conference on Artificial Intelligence (IJCAI), 2021.
[Paper]
Exploring Self-attention for Image Recognition
Hengshuang Zhao, Jiaya Jia, Vladlen Koltun.
Computer Vision and Pattern Recognition (CVPR), 2020.
[Paper]
[Code]
PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation
Li Jiang*, Hengshuang Zhao*, Shaoshuai Shi, Shu Liu, Chi-Wing Fu, Jiaya Jia. (*: equal contribution)
Computer Vision and Pattern Recognition (CVPR), 2020.
Oral
[Paper]
[Code]
Prior Guided Feature Enrichment Network for Few-Shot Segmentation
Zhuotao Tian, Hengshuang Zhao†, Michelle Shu, Zhicheng Yang, Ruiyu Li, Jiaya Jia. (†: corresponding)
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020.
[Paper]
[Code]
Hierarchical Point-Edge Interaction Network for Point Cloud Semantic Segmentation
Li Jiang, Hengshuang Zhao, Shu Liu, Xiaoyong Shen, Chi-Wing Fu, Jiaya Jia.
International Conference on Computer Vision (ICCV), 2019.
[Paper]
PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing
Hengshuang Zhao*, Li Jiang*, Chi-Wing Fu, Jiaya Jia. (*: equal contribution)
Computer Vision and Pattern Recognition (CVPR), 2019.
[Paper]
[Code]
[Video]
UPSNet: A Unified Panoptic Segmentation Network
Yuwen Xiong*, Renjie Liao*, Hengshuang Zhao*, Rui Hu, Min Bai, Ersin Yumer, Raquel Urtasun. (*: equal contribution)
Computer Vision and Pattern Recognition (CVPR), 2019.
Oral
[Paper]
[Code]
Compositing-aware Image Search
Hengshuang Zhao, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Brian Price, Jiaya Jia.
European Conference on Computer Vision (ECCV), 2018.
[Project]
[Paper]
[Supp]
SegStereo: Exploiting Semantic Information for Disparity Estimation
Guorun Yang*, Hengshuang Zhao*, Jianping Shi, Zhidong Deng, Jiaya Jia. (*: equal contribution)
European Conference on Computer Vision (ECCV), 2018.
[Project]
[Paper]
[Code]
[Video]
[Supp]
ICNet for Real-Time Semantic Segmentation on High-Resolution Images
Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, Jiaya Jia.
European Conference on Computer Vision (ECCV), 2018.
[Project]
[Paper]
[Code]
[Video]
[Supp]
Augmented Feedback in Semantic Segmentation under Image Level Supervision
Xiaojuan Qi, Zhengzhe Liu, Jianping Shi, Hengshuang Zhao, Jiaya Jia.
European Conference on Computer Vision (ECCV), 2016.
[Paper]
Rapid and Automatic 3D Body Measurement System based on a GPU-steger Line Detector
Xingjian Liu, Hengshuang Zhao, Guomin Zhan, Kai Zhong, Zhongwei Li, YuhJin Chao, Yusheng Shi.
Applied Optics, 2016.
[Paper]
A High-reflective Surface Measurement Method based on Conoscopic Holography Technology
Xu Cheng, Zhongwei Li, Yusheng Shi, Hengshuang Zhao, Guomin Zhan.
Optical Metrology and Inspection for Industrial Applications III, 2014.
[Paper]