publications
publications by category in reverse chronological order, generated by jekyll-scholar.
2025
- [25-TMLR] [Survey] Conditional Image Synthesis with Diffusion Models: A Survey. Zheyuan Zhan, Defang Chen†, Jian-Ping Mei, Zhenghe Zhao, Jiawei Chen, and 3 more authors. Transactions on Machine Learning Research, 2025.
Conditional image synthesis based on user-specified requirements is a key component in creating complex visual content. In recent years, diffusion-based generative modeling has become a highly effective way for conditional image synthesis, leading to exponential growth in the literature. However, the complexity of diffusion-based modeling, the wide range of image synthesis tasks, and the diversity of conditioning mechanisms present significant challenges for researchers to keep up with rapid developments and to understand the core concepts on this topic. In this survey, we categorize existing works based on how conditions are integrated into the two fundamental components of diffusion-based modeling, i.e., the denoising network and the sampling process. We specifically highlight the underlying principles, advantages, and potential challenges of various conditioning approaches during the training, re-purposing, and specialization stages to construct a desired denoising network. We also summarize six mainstream conditioning mechanisms in the sampling process. All discussions are centered around popular applications. Finally, we pinpoint several critical yet still unsolved problems and suggest some possible solutions for future research.
@article{zhan2025conditional,
  topic   = {Survey},
  title   = {Conditional Image Synthesis with Diffusion Models: A Survey},
  author  = {Zhan, Zheyuan and Chen, Defang and Mei, Jian-Ping and Zhao, Zhenghe and Chen, Jiawei and Chen, Chun and Lyu, Siwei and Wang, Can},
  journal = {Transactions on Machine Learning Research},
  year    = {2025},
  month   = may
}
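As a concrete illustration of one widely used sampling-stage conditioning mechanism, a minimal sketch of classifier-free guidance follows. This is not a method introduced by the survey, and the denoiser(x, t, cond) interface and guidance scale are assumptions for illustration only.

def cfg_noise_prediction(denoiser, x_t, t, cond, guidance_scale=3.0):
    """Hypothetical classifier-free guidance combination of a conditional and an
    unconditional noise prediction; denoiser(x, t, cond) is an assumed interface."""
    eps_cond = denoiser(x_t, t, cond)      # prediction with the user-specified condition
    eps_uncond = denoiser(x_t, t, None)    # prediction with the condition dropped
    # Extrapolate away from the unconditional prediction toward the conditional one.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)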
2024
- [24-NeurIPS] [Diffusion] Simple and fast distillation of diffusion models. Zhenyu Zhou, Defang Chen, Can Wang, Chun Chen, and Siwei Lyu. In Advances in Neural Information Processing Systems, 2024.
Diffusion-based generative models have demonstrated their powerful performance across various tasks, but this comes at the cost of slow sampling speed. To achieve both efficient and high-quality synthesis, various distillation-based accelerated sampling methods have been developed recently. However, they generally require time-consuming fine-tuning with elaborate designs to achieve satisfactory performance at a specific number of function evaluations (NFE), making them difficult to employ in practice. To address this issue, we propose Simple and Fast Distillation (SFD) of diffusion models, which simplifies the paradigm used in existing methods and shortens their fine-tuning time by up to 1000×. We begin with a vanilla distillation-based sampling method and boost its performance to the state of the art by identifying and addressing several small yet vital factors affecting the synthesis efficiency and quality. Our method can also achieve sampling with variable NFEs using a single distilled model. Extensive experiments demonstrate that SFD strikes a good balance between sample quality and fine-tuning costs in few-step image generation tasks. For example, SFD achieves 4.53 FID (NFE=2) on CIFAR-10 with only 0.64 hours of fine-tuning on a single NVIDIA A100 GPU.
@inproceedings{zhou2024simple,
  topic     = {diffusion},
  title     = {Simple and fast distillation of diffusion models},
  author    = {Zhou, Zhenyu and Chen, Defang and Wang, Can and Chen, Chun and Lyu, Siwei},
  booktitle = {{Advances in Neural Information Processing Systems}},
  pages     = {40831--40860},
  year      = {2024}
}
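The core recipe, heavily simplified: a student network learns to reproduce in a single evaluation the output that a frozen teacher ODE solver reaches with many small steps. The sketch below is a generic trajectory-distillation update under assumed interfaces (teacher_solver, student); it is not the exact SFD algorithm.

import torch
import torch.nn.functional as F

def distillation_update(student, teacher_solver, x_start, t_start, t_end, optimizer):
    """One hypothetical distillation step: the frozen teacher solver integrates the
    sampling ODE from t_start to t_end with many small steps, and the student is
    trained to reach the same point in a single evaluation."""
    with torch.no_grad():
        target = teacher_solver(x_start, t_start, t_end)   # multi-step teacher output
    prediction = student(x_start, t_start, t_end)          # single student evaluation
    loss = F.mse_loss(prediction, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()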
- [24-ICML] [Diffusion] On the Trajectory Regularity of ODE-based Diffusion Sampling. Defang Chen, Zhenyu Zhou, Can Wang, Chunhua Shen, and Siwei Lyu. In International Conference on Machine Learning, 2024. Early version: A Geometric Perspective on Diffusion Models (May 2023).
Diffusion-based generative models use stochastic differential equations (SDEs) and their equivalent ordinary differential equations (ODEs) to establish a smooth connection between a complex data distribution and a tractable prior distribution. In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models. We characterize an implicit denoising trajectory and discuss its vital role in forming the coupled sampling trajectory with a strong shape regularity, regardless of the generated content. We also describe a dynamic programming-based scheme to make the time schedule in sampling better fit the underlying trajectory structure. This simple strategy requires minimal modification to any given ODE-based numerical solvers and incurs negligible computational cost, while delivering superior performance in image generation, especially in 5∼10 function evaluations.
@inproceedings{chen2024trajectory,
  topic     = {Diffusion},
  title     = {On the Trajectory Regularity of ODE-based Diffusion Sampling},
  author    = {Chen, Defang and Zhou, Zhenyu and Wang, Can and Shen, Chunhua and Lyu, Siwei},
  booktitle = {International Conference on Machine Learning},
  pages     = {7905--7934},
  year      = {2024},
  xgoogle_scholar_id = {Ak0FvsSvgGUC}
}
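For context, a minimal first-order ODE sampler is sketched below, using the common sigma-parameterized probability-flow ODE dx/dt = (x - D(x, t)) / t with D an assumed denoiser interface. The nonuniform timesteps schedule is exactly the quantity a scheme like the paper's dynamic programming would choose; the loop itself is a generic Euler sampler, not the paper's method.

def euler_ode_sample(denoiser, x_init, timesteps):
    """Generic Euler sampler for the ODE dx/dt = (x - D(x, t)) / t.  The nonuniform
    `timesteps` list, from the largest noise level down to (near) zero, is taken as
    given here; a better-fitting schedule simply plugs into this argument."""
    x = x_init
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        d = (x - denoiser(x, t_cur)) / t_cur   # ODE derivative at the current point
        x = x + (t_next - t_cur) * d           # first-order (Euler) step
    return x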
- [24-CVPR] [Diffusion] Fast ODE-based Sampling for Diffusion Models in Around 5 Steps. Zhenyu Zhou, Defang Chen, Can Wang, and Chun Chen. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.
Sampling from diffusion models can be treated as solving the corresponding ordinary differential equations (ODEs), with the aim of obtaining an accurate solution with as few function evaluations (NFE) as possible. Recently, various fast samplers utilizing higher-order ODE solvers have emerged and achieved better performance than the initial first-order one. However, these numerical methods inherently result in certain approximation errors, which significantly degrade sample quality with extremely small NFE (e.g., around 5). In contrast, based on the geometric observation that each sampling trajectory almost lies in a two-dimensional subspace embedded in the ambient space, we propose the Approximate MEan-Direction Solver (AMED-Solver), which eliminates truncation errors by directly learning the mean direction for fast diffusion sampling. Besides, our method can be easily used as a plugin to further improve existing ODE-based samplers. Extensive experiments on image synthesis at resolutions ranging from 32 to 512 demonstrate the effectiveness of our method. With only 5 NFE, we achieve 6.61 FID on CIFAR-10, 10.74 FID on ImageNet 64×64, and 13.20 FID on LSUN Bedroom.
@inproceedings{zhou2024fast,
  topic     = {Diffusion},
  title     = {Fast ODE-based Sampling for Diffusion Models in Around 5 Steps},
  author    = {Zhou, Zhenyu and Chen, Defang and Wang, Can and Chen, Chun},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages     = {7777--7786},
  year      = {2024}
}
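The update form implied by the abstract, under heavy simplification: replace the analytically combined derivatives of a higher-order solver with a single learned mean direction over each step. The direction_net below is hypothetical, and the actual AMED-Solver parameterization (how the network is built and conditioned) differs; this only illustrates the shape of the update.

def mean_direction_step(x, t_cur, t_next, direction_net):
    """Hypothetical one-step update with a learned mean direction: the network
    predicts the average ODE direction over [t_cur, t_next], replacing the local
    Taylor-expansion-based combination used by analytic higher-order solvers."""
    mean_dir = direction_net(x, t_cur, t_next)   # learned average direction over the step
    return x + (t_next - t_cur) * mean_dir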
2022
- [22-CVPR] [Distillation] Knowledge Distillation with the Reused Teacher Classifier. Defang Chen, Jian-Ping Mei, Hailin Zhang, Can Wang, Yan Feng, and Chun Chen. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
Knowledge distillation aims to compress a powerful yet cumbersome teacher model into a lightweight student model without much sacrifice of performance. For this purpose, various approaches have been proposed over the past few years, generally with elaborately designed knowledge representations, which in turn increase the difficulty of model development and interpretation. In contrast, we empirically show that a simple knowledge distillation technique is enough to significantly narrow down the teacher-student performance gap. We directly reuse the discriminative classifier from the pre-trained teacher model for student inference and train a student encoder through feature alignment with a single ℓ2 loss. In this way, the student model is able to achieve exactly the same performance as the teacher model provided that their extracted features are perfectly aligned. An additional projector is developed to help the student encoder match the teacher classifier, which renders our technique applicable to various teacher and student architectures. Extensive experiments demonstrate that our technique achieves state-of-the-art results at the modest cost of a reduced compression ratio due to the added projector.
@inproceedings{chen2022simkd,
  topic     = {Distillation},
  title     = {Knowledge Distillation with the Reused Teacher Classifier},
  author    = {Chen, Defang and Mei, Jian-Ping and Zhang, Hailin and Wang, Can and Feng, Yan and Chen, Chun},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages     = {11933--11942},
  year      = {2022},
  xgoogle_scholar_id = {mNrWkgRL2YcC}
}
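A minimal PyTorch sketch of the reused-classifier idea described above: the student encoder plus a projector is trained with a single ℓ2 feature-alignment loss against the frozen teacher's penultimate features, and the teacher classifier is reused at inference. The linear projector and the dimensions s_dim/t_dim are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ReusedClassifierStudent(nn.Module):
    """Student encoder + projector, with the frozen teacher classifier reused for inference."""
    def __init__(self, student_encoder, teacher_classifier, s_dim=512, t_dim=2048):
        super().__init__()
        self.encoder = student_encoder                 # trainable student backbone
        self.projector = nn.Linear(s_dim, t_dim)       # aligns student/teacher feature dims
        self.classifier = teacher_classifier           # reused teacher classifier, kept frozen
        for p in self.classifier.parameters():
            p.requires_grad_(False)

    def forward(self, x):
        return self.classifier(self.projector(self.encoder(x)))

def feature_alignment_loss(student, teacher_encoder, x):
    """Single L2-style loss between projected student features and frozen teacher features."""
    with torch.no_grad():
        f_teacher = teacher_encoder(x)
    f_student = student.projector(student.encoder(x))
    return F.mse_loss(f_student, f_teacher)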
- [22-ICASSP] Confidence-aware multi-teacher knowledge distillation. Hailin Zhang, Defang Chen, and Can Wang. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2022.
Knowledge distillation is initially introduced to utilize additional supervision from a single teacher model for the student model training. To boost the student performance, some recent variants attempt to exploit diverse knowledge sources from multiple teachers. However, existing studies mainly integrate knowledge from diverse sources by averaging over multiple teacher predictions or combining them using other label-free strategies, which may mislead the student in the presence of low-quality teacher predictions. To tackle this problem, we propose Confidence-Aware Multi-teacher Knowledge Distillation (CA-MKD), which adaptively assigns sample-wise reliability to each teacher prediction with the help of ground-truth labels, so that teacher predictions close to the one-hot labels receive large weights. Besides, CA-MKD incorporates features in intermediate layers to stabilize the knowledge transfer process. Extensive experiments show that our CA-MKD consistently outperforms all compared state-of-the-art methods across various teacher-student architectures.
@inproceedings{zhang2022confidence,
  title     = {Confidence-aware multi-teacher knowledge distillation},
  author    = {Zhang, Hailin and Chen, Defang and Wang, Can},
  booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing},
  pages     = {4498--4502},
  year      = {2022}
}
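A sketch of the confidence-aware weighting described above: each teacher receives a per-sample weight that grows as its prediction gets closer to the ground-truth label (lower cross-entropy), and the student matches the teachers' softened predictions under those weights. The softmax-over-negative-CE weighting and the temperature below are illustrative choices; CA-MKD's exact formulation, including its intermediate-feature term, is richer.

import torch
import torch.nn.functional as F

def confidence_weighted_kd_loss(student_logits, teacher_logits_list, labels, T=4.0):
    """Per-sample, per-teacher confidence weights from agreement with the labels,
    then a weighted KL between the student and each teacher's soft prediction."""
    teacher_logits = torch.stack(teacher_logits_list, dim=0)        # (K, B, C)
    ce = torch.stack([F.cross_entropy(t, labels, reduction="none")
                      for t in teacher_logits], dim=0)              # (K, B)
    weights = F.softmax(-ce, dim=0)                                 # confident teachers weigh more

    log_p_s = F.log_softmax(student_logits / T, dim=-1)             # (B, C)
    kd = 0.0
    for k in range(teacher_logits.size(0)):
        p_t = F.softmax(teacher_logits[k] / T, dim=-1)              # (B, C)
        kl = F.kl_div(log_p_s, p_t, reduction="none").sum(-1)       # per-sample KL, (B,)
        kd = kd + (weights[k] * kl).mean()
    return (T ** 2) * kd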
2021
- [21-ICCV] Distilling holistic knowledge with graph neural networks. Sheng Zhou, Yucheng Wang, Defang Chen, Jiawei Chen, Xin Wang, and 2 more authors. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021.
Knowledge Distillation (KD) aims at transferring knowledge from a larger well-optimized teacher network to a smaller learnable student network. Existing KD methods have mainly considered two types of knowledge, namely individual knowledge and relational knowledge. However, these two types of knowledge are usually modeled independently, and the inherent correlations between them are largely ignored. Integrating both individual and relational knowledge while preserving their inherent correlation is critical for sufficient student network learning. In this paper, we propose to distill novel holistic knowledge based on an attributed graph constructed among instances. The holistic knowledge is represented as a unified graph-based embedding by aggregating individual knowledge from relational neighborhood samples with graph neural networks; the student network is then learned by distilling the holistic knowledge in a contrastive manner. Extensive experiments and ablation studies are conducted on benchmark datasets, and the results demonstrate the effectiveness of the proposed method.
@inproceedings{zhou2021distilling,
  title     = {Distilling holistic knowledge with graph neural networks},
  author    = {Zhou, Sheng and Wang, Yucheng and Chen, Defang and Chen, Jiawei and Wang, Xin and Wang, Can and Bu, Jiajun},
  booktitle = {Proceedings of the IEEE/CVF international conference on computer vision},
  pages     = {10387--10396},
  year      = {2021}
}
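A rough sketch of the idea: build an affinity graph over the instances in a batch, aggregate each instance's neighborhood with a one-step GNN-style propagation to obtain a "holistic" embedding, and align student and teacher holistic embeddings contrastively. The kNN graph, the single propagation step, and the InfoNCE-style loss are simplifications of the paper's attributed-graph GNN, and the sketch assumes student and teacher features have already been projected to a common dimension.

import torch
import torch.nn.functional as F

def holistic_embedding(features, k=4):
    """Aggregate each instance's k nearest neighbors (cosine similarity) in the batch
    with a row-normalized adjacency, a minimal stand-in for a GNN propagation step."""
    f = F.normalize(features, dim=-1)
    sim = f @ f.t()                                   # (B, B) pairwise similarities
    sim.fill_diagonal_(float("-inf"))                 # exclude self from neighbors
    topk = sim.topk(k, dim=-1).indices                # (B, k) neighbor indices
    adj = torch.zeros_like(sim).scatter_(1, topk, 1.0)
    adj = adj / adj.sum(dim=-1, keepdim=True)         # row-normalize
    return adj @ features                             # neighborhood-aggregated embedding

def holistic_contrastive_loss(student_feats, teacher_feats, tau=0.1):
    """Each student holistic embedding should match the teacher holistic embedding of
    the same instance against the other instances in the batch (InfoNCE-style)."""
    h_s = F.normalize(holistic_embedding(student_feats), dim=-1)
    h_t = F.normalize(holistic_embedding(teacher_feats), dim=-1)
    logits = h_s @ h_t.t() / tau                      # (B, B)
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)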
- [21-AAAI] [Distillation] Cross-Layer Distillation with Semantic Calibration. Defang Chen, Jian-Ping Mei, Yuan Zhang, Can Wang, Zhe Wang, Yan Feng, and Chun Chen. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021. Journal version: IEEE Transactions on Knowledge and Data Engineering (TKDE).
Highly-Cited Paper indexed by 2024/2025 Google Scholar Metrics.
Knowledge distillation is a technique to enhance the generalization ability of a student model by exploiting outputs from a teacher model. Recently, feature-map based variants explore knowledge transfer between manually assigned teacher-student pairs in intermediate layers for further improvement. However, layer semantics may vary in different neural networks, and semantic mismatch in manual layer associations will lead to performance degeneration due to negative regularization. To address this issue, we propose Semantic Calibration for cross-layer Knowledge Distillation (SemCKD), which automatically assigns proper target layers of the teacher model for each student layer with an attention mechanism. With a learned attention distribution, each student layer distills knowledge contained in multiple teacher layers rather than a specific intermediate layer for appropriate cross-layer supervision. We further provide theoretical analysis of the association weights and conduct extensive experiments to demonstrate the effectiveness of our approach.
@inproceedings{chen2021cross,
  topic     = {Distillation},
  author    = {Chen, Defang and Mei, Jian{-}Ping and Zhang, Yuan and Wang, Can and Wang, Zhe and Feng, Yan and Chen, Chun},
  title     = {Cross-Layer Distillation with Semantic Calibration},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  pages     = {7028--7036},
  year      = {2021},
  xgoogle_scholar_id = {tH6gc1N1XXoC}
}
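A simplified sketch of attention-based cross-layer assignment: each student layer attends over all candidate teacher layers, and the feature-matching loss is weighted by the learned attention. The pooled-feature query/key projections and the per-pair projectors below are hypothetical modules with assumed shapes, not SemCKD's exact design.

import torch
import torch.nn.functional as F

def cross_layer_attention_loss(student_feats, teacher_feats, projectors, query_projs, key_projs):
    """student_feats / teacher_feats: lists of (B, C, H, W) feature maps.
    query_projs / key_projs: one linear layer per student / teacher layer mapping the
    pooled (B, C) vector to a shared attention dimension.
    projectors[s][t]: module mapping student layer s's map to teacher layer t's shape."""
    q = torch.stack([proj(f.mean(dim=(2, 3)))
                     for proj, f in zip(query_projs, student_feats)], dim=1)   # (B, S, d)
    k = torch.stack([proj(f.mean(dim=(2, 3)))
                     for proj, f in zip(key_projs, teacher_feats)], dim=1)     # (B, T, d)
    attn = F.softmax(q @ k.transpose(1, 2) / q.size(-1) ** 0.5, dim=-1)        # (B, S, T)

    loss = 0.0
    for s_idx, f_s in enumerate(student_feats):
        for t_idx, f_t in enumerate(teacher_feats):
            diff = F.mse_loss(projectors[s_idx][t_idx](f_s), f_t, reduction="none")
            diff = diff.mean(dim=(1, 2, 3))                                     # per-sample error
            loss = loss + (attn[:, s_idx, t_idx] * diff).mean()                 # attention-weighted
    return loss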
2020
- [20-AAAI] [Distillation] Online Knowledge Distillation with Diverse Peers. Defang Chen, Jian-Ping Mei, Can Wang, Yan Feng, and Chun Chen. In Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
Distillation is an effective knowledge-transfer technique that uses predicted distributions of a powerful teacher model as soft targets to train a less-parameterized student model. A pre-trained high capacity teacher, however, is not always available. Recently proposed online variants use the aggregated intermediate predictions of multiple student models as targets to train each student model. Although group-derived targets give a good recipe for teacher-free distillation, group members are homogenized quickly with simple aggregation functions, leading to early saturated solutions. In this work, we propose Online Knowledge Distillation with Diverse peers (OKDDip), which performs two-level distillation during training with multiple auxiliary peers and one group leader. In the first-level distillation, each auxiliary peer holds an individual set of aggregation weights generated with an attention-based mechanism to derive its own targets from predictions of other auxiliary peers. Learning from distinct target distributions helps to boost peer diversity for effectiveness of group-based distillation. The second-level distillation is performed to transfer the knowledge in the ensemble of auxiliary peers further to the group leader, i.e., the model used for inference. Experimental results show that the proposed framework consistently gives better performance than state-of-the-art approaches without sacrificing training or inference complexity, demonstrating the effectiveness of the proposed two-level distillation framework.
@inproceedings{chen2020online,
  topic     = {Distillation},
  title     = {Online Knowledge Distillation with Diverse Peers},
  author    = {Chen, Defang and Mei, Jian-Ping and Wang, Can and Feng, Yan and Chen, Chun},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  pages     = {3430--3437},
  year      = {2020},
  review    = {20AAAI_review.pdf},
  response  = {20AAAI_response.pdf}
}
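A sketch of the two-level scheme described above: each auxiliary peer distills from its own attention-weighted aggregation of the peers' softened predictions (keeping targets diverse across peers), while the group leader, the model kept for inference, distills from the plain ensemble. The scaled dot-product attention over per-peer embeddings is an illustrative stand-in for OKDDip's mechanism, and peer_embeddings is an assumed input.

import torch
import torch.nn.functional as F

def two_level_distillation_losses(peer_logits, leader_logits, peer_embeddings, T=3.0):
    """peer_logits / peer_embeddings: lists of (B, C) logits and (B, d) embeddings, one per
    auxiliary peer; leader_logits: (B, C) logits of the group leader."""
    logits = torch.stack(peer_logits, dim=0)        # (K, B, C)
    emb = torch.stack(peer_embeddings, dim=0)       # (K, B, d)

    # Sample-wise aggregation weights: how much peer k listens to peer l.
    attn = torch.einsum("kbd,lbd->bkl", emb, emb) / emb.size(-1) ** 0.5
    attn = F.softmax(attn, dim=-1)                  # (B, K, K)

    soft = F.softmax(logits / T, dim=-1)            # softened peer predictions
    targets = torch.einsum("bkl,lbc->kbc", attn, soft).detach()   # per-peer aggregated targets

    log_p = F.log_softmax(logits / T, dim=-1)
    peer_loss = F.kl_div(log_p, targets, reduction="none").sum(-1).mean() * T * T

    ensemble = soft.mean(dim=0).detach()            # (B, C) plain ensemble of all peers
    log_leader = F.log_softmax(leader_logits / T, dim=-1)
    leader_loss = F.kl_div(log_leader, ensemble, reduction="none").sum(-1).mean() * T * T
    return peer_loss, leader_loss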