Yonggang Qi 齐勇刚


Beijing University of Posts and Telecommunications (BUPT), Beijing, China

I am currently an associate professor at BUPT. Previously, I was a PhD student in the Pattern Recognition and Intelligent Systems (PRIS) laboratory at BUPT, and I received my PhD in Signal Processing from BUPT in 2015 (supervisor: Professor Jun Guo). From 2019 to 2020, I was a visiting scholar at the SketchX lab headed by Dr. Yi-Zhe Song at the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey. I was also a guest PhD student at Aalborg University, Denmark, in 2013 and a visiting researcher at Sun Yat-sen University, China, in 2014.

My research focuses on computer vision and multimodal learning, with particular interests in human sketch related tasks, image/video generation, and diffusion models. My recent work emphasizes multimodal representation learning, cross-modal alignment, and generative modeling for complex, real-world scenarios.

I plan to recruit one PhD student (via the application-assessment track or the combined master's-PhD track). Interested candidates are welcome to contact me by email with a CV.

Each year I recruit 2-4 master's students (via recommendation or the national entrance exam) and several research interns (3-6 months or longer). Students with a passion for research are welcome to contact me by email with a CV. I am currently recruiting master's students entering in 2026 via the entrance exam; please feel free to get in touch.


News


Selected Publications

*: equal contribution
#: corresponding author

FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation
Jing Zuo, Lingzhou Mu, Fan Jiang, Chengcheng Ma, Mu Xu, Yonggang Qi#.
Computer Vision and Pattern Recognition (CVPR 2026).
Addresses text-only reasoning lacking spatial grounding in VLN; encodes imagined visual tokens into compact latent representations, enabling efficient reasoning-aware navigation without explicit token generation at inference.
[ arXiv ]--[ Project Page ]--[ GitHub Code ]

3DrawAgent: Teaching LLM to Draw in 3D with Early Contrastive Experience
Hongcan Xiao, Xinyue Xiao, Yilin Wang, Yue Zhang, Yonggang Qi#.
Computer Vision and Pattern Recognition (CVPR 2026).
Addresses the lack of 3D spatial drawing ability in LLMs; introduces early contrastive experience to teach LLMs to generate structured, geometry-aware 3D drawings.
[ arXiv ]--[ Project Page ]--[ GitHub Code ]

FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction
Yixiang Dai, Fan Jiang, Chiyu Wang, Mu Xu, Yonggang Qi#.
International Conference on Learning Representations (ICLR 2026).
Addresses geometry inconsistency in video generation; augments frozen video foundation models with a trainable geometric branch to jointly model video latents and implicit 3D fields in a single forward pass.
[ arXiv ]--[ Project Page ]--[ GitHub Code ]

Precise Diffusion Inversion: Towards Novel Samples and Few-Step Models
Jing Zuo, Luoping Cui, Chuang Zhu and Yonggang Qi#.
The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025).
Addresses accumulated inversion errors in DDIM; proposes a precise inversion framework that enables novel sample generation and accelerates few-step diffusion models without retraining.
[ Paper ]--[ GitHub Code ]

Fuse2Match: Training-Free Fusion of Flow, Diffusion, and Contrastive Models for Zero-Shot Semantic Matching
Jing Zuo, Jiaqi Wang, Yonggang Qi# and Yi-Zhe Song.
The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025).
Addresses zero-shot semantic matching across large appearance gaps; training-free fusion of optical flow, diffusion, and contrastive models yields complementary cross-image correspondences without any fine-tuning.
[ Paper ]--[ GitHub Code ]

Autoregressive Video Generation without Vector Quantization
Haoge Deng, Ting Pan, Haiwen Diao, Zhengxiong Luo, Yufeng Cui, Huchuan Lu, Shiguang Shan, Yonggang Qi# and Xinlong Wang#.
International Conference on Learning Representations (ICLR 2025).
Addresses the quality bottleneck of discrete VQ tokenization in autoregressive video generation; replaces VQ with continuous token modeling for higher-fidelity, temporally coherent video synthesis.
[ arXiv ]--[ GitHub Code ]

VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis
Zhipeng Chen, Lan Yang, Yonggang Qi, Honggang Zhang, Kaiyue Pang, Ke Li and Yi-Zhe Song.
AAAI Conference on Artificial Intelligence (AAAI 2025). Oral
Addresses limited and rigid visual control in text-to-image synthesis; proposes a unified framework supporting versatile spatial and semantic control signals (sketch, depth, pose, etc.) with a single model.
[ arXiv ]--[ GitHub Code ]

SAUGE: Taming SAM for Uncertainty-Aligned Multi-Granularity Edge Detection
Xing Liufu, Chaolei Tan, Xiaotong Lin, Yonggang Qi, Jinxuan Li and Jian-Fang Hu.
AAAI Conference on Artificial Intelligence (AAAI 2025).
Addresses the lack of uncertainty awareness in edge detection; tames SAM's multi-granularity segmentation priors to produce uncertainty-aligned, scale-consistent edge maps.

Scale-Adaptive Diffusion Model for Complex Sketch Synthesis
Jijin Hu, Ke Li, Yonggang Qi and Yi-Zhe Song.
International Conference on Learning Representations (ICLR 2024).
Addresses difficulty in synthesizing sketches with complex multi-scale structures; introduces a scale-adaptive diffusion model that dynamically adjusts generation resolution to capture fine-grained spatial details.

Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style
Fengyin Lin*, Mingkang Li*, Da Li, Timothy Hospedales, Yi-Zhe Song and Yonggang Qi.
Computer Vision and Pattern Recognition (CVPR 2023). Highlight
Addresses the black-box nature of cross-modal sketch-based retrieval; proposes a zero-shot SBIR framework that simultaneously retrieves images and generates visual explanations of the matching rationale.

SketchKnitter: Vectorized Sketch Generation with Diffusion Models
Qiang Wang, Haoge Deng, Yonggang Qi#, Da Li and Yi-Zhe Song.
International Conference on Learning Representations (ICLR 2023). Spotlight
Addresses the lack of temporal coherence in vectorized sketch generation; first to apply diffusion models to sequential vector stroke generation, producing human-like sketches with natural drawing order.

A Diffusion-ReFinement Model for Sketch-to-Point Modeling
Di Kong, Qiang Wang, Yonggang Qi#.
The 16th Asian Conference on Computer Vision (ACCV 2022). Oral
Addresses the domain gap between abstract sketches and 3D point clouds; introduces a diffusion-based refinement model that progressively denoises coarse sketch-driven point predictions into detailed 3D shapes.

DiffSketching: Sketch Control Image Synthesis with Diffusion Models
Qiang Wang, Di Kong, Yonggang Qi#.
The 33rd British Machine Vision Conference (BMVC 2022).
Addresses the difficulty of using sparse sketch inputs to guide realistic image generation; leverages diffusion models' strong generative priors to synthesize photorealistic images conditioned on free-hand sketches.

Generative Sketch Healing
Yonggang Qi, Guoyao Su, Qiang Wang, Jie Yang, Kaiyue Pang and Yi-Zhe Song.
International Journal of Computer Vision (IJCV), Springer.
Addresses the problem of incomplete or corrupted sketches; proposes a generative model that infers and restores missing strokes while preserving the original drawing style and semantic structure.
[ Paper ]--[ Project Page ]

SketchLattice: Latticed Representation for Sketch Manipulation
Yonggang Qi*, Guoyao Su*, Pinaki Nath Chowdhury, Mingkang Li and Yi-Zhe Song.
IEEE International Conference on Computer Vision (ICCV), 2021.
Addresses the rigidity of existing sketch representations; introduces a latticed graph structure over strokes that enables flexible, controllable sketch editing and semantic manipulation.

PQA: Perceptual Question Answering
Yonggang Qi*, Kai Zhang*, Aneeshan Sain and Yi-Zhe Song.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
Addresses the absence of perceptual grouping reasoning in VQA; introduces a new task and dataset requiring models to answer questions based on Gestalt perceptual principles applied to visual scenes.

Towards Fine-Grained Sketch-Based 3D Shape Retrieval
Anran Qi, Yulia Gryaditskaya, Jifei Song, Yongxin Yang, Yonggang Qi, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song.
IEEE Transactions on Image Processing (TIP), 2021.
Addresses coarse-grained retrieval in sketch-based 3D shape search; proposes a fine-grained cross-modal embedding with a new benchmark to retrieve instance-specific 3D shapes from hand-drawn sketches.

Towards Practical Sketch-based 3D Shape Generation: The Role of Professional Sketches
Yue Zhong, Yonggang Qi, Yulia Gryaditskaya, Honggang Zhang, and Yi-Zhe Song.
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

SketchHealer: A Graph-to-Sequence Network for Recreating Partial Human Sketches
Guoyao Su, Yonggang Qi, Kaiyue Pang, Jie Yang and Yi-Zhe Song.
The 31st British Machine Vision Virtual Conference (BMVC 2020).
Oral Presentation, 5% acceptance rate

S3NET: Graph Representational Network For Sketch Recognition
Lan Yang, Aneeshan Sain, Linpeng Li, Yonggang Qi, Honggang Zhang and Yi-Zhe Song.
2020 IEEE International Conference on Multimedia and Expo (ICME 2020).

Improved Traffic Sign Detection In Videos Through Reasoning Effective RoI Proposals
Yanting Zhang, Yonggang Qi, Jie Yang and Jenq-Neng Hwang.
2020 IEEE International Conference on Multimedia and Expo (ICME 2020).

Sketch Fewer to Recognize More by Learning A Co-regularized Sparse Representation
Yonggang Qi and Yi-Zhe Song.
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

Unpaired Image-to-Sketch Translation Network for Sketch Synthesis
Yue Zhang, Guoyao Su, Yonggang Qi and Jie Yang.
IEEE Visual Communications and Image Processing (VCIP), 2019.

SketchSegNet+: An End-to-End Learning of RNN for Multi-Class Sketch Semantic Segmentation
Yonggang Qi and Zheng-Hua Tan.
IEEE ACCESS.

Image Retrieval by Dense Caption Reasoning
Xinru Wei, Yonggang Qi, Jun Liu and Fang Liu.
IEEE Visual Communications and Image Processing (VCIP), 2017. Oral

Instance-level Coupled Subspace Learning for Fine-grained Sketch-based Image Retrieval
Peng Xu, Qiyue Yin, Yonggang Qi, Yi-Zhe Song, Zhanyu Ma, Liang Wang and Jun Guo.
European Conference on Computer Vision (ECCV), Workshop on Visual Analysis of Sketches, 2016. Oral

Sketch-based Image Retrieval via Siamese Convolutional Neural Network
Yonggang Qi, Yi-Zhe Song, Honggang Zhang and Jun Liu.
IEEE International Conference on Image Processing (ICIP), 2016.

Making Better Use of Edges via Perceptual Grouping
Yonggang Qi, Yi-Zhe Song, Tao Xiang, Honggang Zhang, Timothy Hospedales, Yi Li and Jun Guo.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

Im2Sketch: Sketch generation by unconflicted perceptual grouping
Yonggang Qi, Jun Guo, Yi-Zhe Song, Tao Xiang, Honggang Zhang and Zheng-Hua Tan.
Neurocomputing

Sketching by Perceptual Grouping
Yonggang Qi, Jun Guo, Yi Li, Honggang Zhang, Tao Xiang and Yi-Zhe Song.
IEEE International Conference on Image Processing (ICIP), 2013.

Team Members

PhD Students
Jing Zuo (左京)
Multimodal Reasoning
Yilin Wang (王怡琳)
Image Generation and Editing
Liyun Peng (彭立云)
Multimodal Understanding and Generation
Yanjie Guo (郭琰杰)
3D Sketch Vision
Master Students
Donglin Nie (倪东霖)
OOD Generation
Yunpeng Zhang (张云鹏)
Avatar Generation
Jiaqi Wang (王家琪)
Image Generation
Junxiao Tang (唐浚潇)
LLM for Motion
Hongcan Xiao (肖红灿)
LLM for 3D
Zundi Ke (柯尊迪)
Visual Abstraction
Xiaobin Zhang (张晓斌)
3D Vision
Xinyue Xiao (肖欣悦)
Image Generation & Editing
Xinyue Zhang (张欣悦)
VLM Post Training
Junwei Liu (刘俊玮)
Image Generation

Alumni

Master Students

Updated Mar. 2026, page created using Bootstrap