I am a Postdoctoral Scholar at the University of California, Berkeley. I received my Ph.D. from Zhejiang University, and my thesis, titled Knowledge Distillation on Deep Neural Networks, won the Outstanding Doctoral Dissertation award (2%). I worked with SUNY Distinguished Professor Siwei Lyu at the State University of New York at Buffalo.
My research focuses on diffusion-based generative models (theoretical understanding, accelerated sampling) and knowledge distillation (e.g., SimKD, CVPR’22; SemCKD, AAAI’21 and TKDE’23; OnlineKD, AAAI’20). My Google Scholar citations reached 2025 in 2025. I lived in Hangzhou (Paradise on Earth) and Wenzhou (Home of Mathematicians) for more than 25 years.
Text-to-image diffusion models have achieved unprecedented success but still struggle to produce high-quality results under limited sampling budgets. Existing training-free sampling acceleration methods are typically developed independently, leaving the overall performance and compatibility among these methods unexplored. In this paper, we bridge this gap by systematically elucidating the design space, and our comprehensive experiments identify the sampling time schedule as the most pivotal factor. Inspired by the geometric properties of diffusion models revealed through the Frenet-Serret formulas, we propose the constant total rotation schedule (TORS), a scheduling strategy that ensures uniform geometric variation along the sampling trajectory. TORS outperforms previous training-free acceleration methods and produces high-quality images with 10 sampling steps on Flux.1-Dev and Stable Diffusion 3.5. Extensive experiments underscore the adaptability of our method to unseen models, hyperparameters, and downstream applications.
@inproceedings{zhou2026tors,title={Analyzing and Improving Fast Sampling of Text-to-Image Diffusion Models},author={Zhou, Zhenyu and Chen, Defang and Lyu, Siwei and Chen, Chun and Wang, Can},booktitle={European Conference on Computer Vision},year={2026}}
25-JSTAT
Diffusion
Geometric Regularity in Deterministic Sampling of Diffusion-based Generative Models
Diffusion-based generative models employ stochastic differential equations (SDEs) and their equivalent probability flow ordinary differential equations (ODEs) to establish a smooth transformation between complex high-dimensional data distributions and tractable prior distributions. In this paper, we reveal a striking geometric regularity in the deterministic sampling dynamics: each simulated sampling trajectory lies within an extremely low-dimensional subspace, and all trajectories exhibit an almost identical “boomerang” shape, regardless of the model architecture, applied conditions, or generated content. We characterize several intriguing properties of these trajectories, particularly under closed-form solutions based on kernel-estimated data modeling. We also demonstrate a practical application of the discovered trajectory regularity by proposing a dynamic programming-based scheme to better align the sampling time schedule with the underlying trajectory structure. This simple strategy requires minimal modification to existing ODE-based numerical solvers, incurs negligible computational overhead, and achieves superior image generation performance, especially in regions with only 5-10 function evaluations.
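One way to probe the low-dimensional claim above is to measure how much of a trajectory's variance falls on its top principal directions. The following is a minimal sketch under stated assumptions, not the analysis used in the paper: the function name `explained_variance_ratio` is hypothetical, and the trajectory is a synthetic curve built from two random directions plus noise rather than the output of a real diffusion model.

```python
import numpy as np

def explained_variance_ratio(trajectory: np.ndarray) -> np.ndarray:
    """Fraction of variance captured by each principal direction.

    trajectory: array of shape (num_steps, dim), one row per sampling step.
    """
    centered = trajectory - trajectory.mean(axis=0, keepdims=True)
    # Singular values of the centered points measure the spread per direction.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return variance / variance.sum()

# Synthetic stand-in for a sampling trajectory (assumption, NOT model output):
# a curve spanned by two random directions in 1000 dimensions plus small noise.
rng = np.random.default_rng(0)
num_steps, dim = 50, 1000
basis = rng.standard_normal((2, dim))
s = np.linspace(0.0, 1.0, num_steps)
coeffs = np.stack([s, s * (1.0 - s)], axis=1)  # shape (num_steps, 2)
toy_trajectory = coeffs @ basis + 1e-3 * rng.standard_normal((num_steps, dim))

ratio = explained_variance_ratio(toy_trajectory)
top2 = float(ratio[:2].sum())
print(f"variance captured by the top-2 directions: {top2:.6f}")
```

To apply the same check to a real sampler, one would stack the intermediate states of a single run, flattened to vectors, into the `trajectory` array.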
@article{chen2025geometric,title={Geometric Regularity in Deterministic Sampling of Diffusion-based Generative Models},author={Chen, Defang and Zhou, Zhenyu and Wang, Can and Lyu, Siwei},journal={J. Stat. Mech.},year={2025},}
arXiv
Survey
Fully AI-Generated Image Detection: Definition, Recent Advances and Challenges
@article{xu2025detection,title={Fully AI-Generated Image Detection: Definition, Recent Advances and Challenges},author={Xu, Qijie and Wang, Can and Chen, Jiawei and Lyu, Siwei and Chen, Defang},journal={arXiv preprint},year={2025},}
26-AAAI
Diffusion
DICE: Distilling Classifier-Free Guidance into Text Embeddings
Text-to-image diffusion models are capable of generating high-quality images, but these images often fail to align closely with the given text prompts. Classifier-free guidance (CFG) is a popular and effective technique for improving text-image alignment in the generative process. However, using CFG introduces significant computational overhead and deviates from the established theoretical foundations of diffusion models. In this paper, we present DIstilling CFG by enhancing text Embeddings (DICE), a novel approach that removes the reliance on CFG in the generative process while maintaining the benefits it provides. DICE distills a CFG-based text-to-image diffusion model into a CFG-free version by refining text embeddings to replicate CFG-based directions. In this way, we avoid the computational and theoretical drawbacks of CFG, enabling high-quality, well-aligned image generation at a fast sampling speed. Extensive experiments on multiple Stable Diffusion v1.5 variants, SDXL and PixArt-α demonstrate the effectiveness of our method. Furthermore, DICE supports negative prompts for image editing to improve image quality further.
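The computational overhead of CFG mentioned above comes from calling the denoiser twice per sampling step, once with the text embedding and once with a null embedding. Below is a minimal sketch of the standard CFG combination only; it does not implement DICE, and `toy_denoiser`, `cond_emb`, and `null_emb` are hypothetical stand-ins chosen to make the example runnable.

```python
import numpy as np

def cfg_noise_prediction(denoiser, x, t, cond_emb, null_emb, guidance_scale):
    """Standard classifier-free guidance: two denoiser calls per sampling step."""
    eps_cond = denoiser(x, t, cond_emb)    # prediction conditioned on the text
    eps_uncond = denoiser(x, t, null_emb)  # prediction with a null embedding
    # Extrapolate from the unconditional prediction toward the conditional one.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Hypothetical toy denoiser (assumption): only here to exercise the formula.
def toy_denoiser(x, t, emb):
    return 0.1 * x + t * emb

x = np.ones(4)
cond_emb = np.full(4, 2.0)
null_emb = np.zeros(4)

guided = cfg_noise_prediction(
    toy_denoiser, x, t=0.5,
    cond_emb=cond_emb, null_emb=null_emb, guidance_scale=7.5,
)
print(guided)
```

With `guidance_scale=1.0` the combination reduces to the plain conditional prediction, so a CFG-free model needs only one denoiser call per step.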
@inproceedings{zhou2026dice,title={DICE: Distilling Classifier-Free Guidance into Text Embeddings},author={Zhou, Zhenyu and Chen, Defang and Wang, Can and Chen, Chun and Lyu, Siwei},booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},year={2026}}
Diffusion-based generative models have demonstrated powerful performance across various tasks, but this comes at the cost of slow sampling speed. To achieve both efficient and high-quality synthesis, various distillation-based accelerated sampling methods have been developed recently. However, they generally require time-consuming fine-tuning with elaborate designs to achieve satisfactory performance at a specific number of function evaluations (NFE), making them difficult to employ in practice. To address this issue, we propose Simple and Fast Distillation (SFD) of diffusion models, which simplifies the paradigm used in existing methods and shortens their fine-tuning time by up to 1000×. We begin with a vanilla distillation-based sampling method and boost its performance to the state of the art by identifying and addressing several small yet vital factors affecting the synthesis efficiency and quality. Our method can also achieve sampling with variable NFEs using a single distilled model. Extensive experiments demonstrate that SFD strikes a good balance between sample quality and fine-tuning costs in few-step image generation tasks. For example, SFD achieves 4.53 FID (NFE=2) on CIFAR-10 with only 0.64 hours of fine-tuning on a single NVIDIA A100 GPU.
@inproceedings{zhou2024simple,title={Simple and Fast Distillation of Diffusion Models},author={Zhou, Zhenyu and Chen, Defang and Wang, Can and Chen, Chun and Lyu, Siwei},booktitle={Advances in Neural Information Processing Systems},pages={40831--40860},year={2024},}
25-TMLR
Survey
Conditional Image Synthesis with Diffusion Models: A Survey
Conditional image synthesis based on user-specified requirements is a key component in creating complex visual content. In recent years, diffusion-based generative modeling has become a highly effective way for conditional image synthesis, leading to exponential growth in the literature. However, the complexity of diffusion-based modeling, the wide range of image synthesis tasks, and the diversity of conditioning mechanisms present significant challenges for researchers to keep up with rapid developments and to understand the core concepts on this topic. In this survey, we categorize existing works based on how conditions are integrated into the two fundamental components of diffusion-based modeling, i.e., the denoising network and the sampling process. We specifically highlight the underlying principles, advantages, and potential challenges of various conditioning approaches during the training, re-purposing, and specialization stages to construct a desired denoising network. We also summarize six mainstream conditioning mechanisms in the sampling process. All discussions are centered around popular applications. Finally, we pinpoint several critical yet still unsolved problems and suggest some possible solutions for future research.
@article{zhan2025conditional,title={Conditional Image Synthesis with Diffusion Models: A Survey},author={Zhan, Zheyuan and Chen, Defang and Mei, Jian-Ping and Zhao, Zhenghe and Chen, Jiawei and Chen, Chun and Lyu, Siwei and Wang, Can},journal={Transactions on Machine Learning Research},year={2025},}
24-ICML
Diffusion
On the Trajectory Regularity of ODE-based Diffusion Sampling
Diffusion-based generative models use stochastic differential equations (SDEs) and their equivalent ordinary differential equations (ODEs) to establish a smooth connection between a complex data distribution and a tractable prior distribution. In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models. We characterize an implicit denoising trajectory and discuss its vital role in forming the coupled sampling trajectory with a strong shape regularity, regardless of the generated content. We also describe a dynamic programming-based scheme to make the time schedule in sampling better fit the underlying trajectory structure. This simple strategy requires minimal modification to any given ODE-based numerical solvers and incurs negligible computational cost, while delivering superior performance in image generation, especially in 5∼10 function evaluations.
@inproceedings{chen2024trajectory,title={On the Trajectory Regularity of ODE-based Diffusion Sampling},author={Chen, Defang and Zhou, Zhenyu and Wang, Can and Shen, Chunhua and Lyu, Siwei},booktitle={International Conference on Machine Learning},pages={7905--7934},year={2024},}
24-CVPR
Diffusion
Fast ODE-based Sampling for Diffusion Models in Around 5 Steps
Sampling from diffusion models can be treated as solving the corresponding ordinary differential equations (ODEs), with the aim of obtaining an accurate solution with as few function evaluations (NFE) as possible. Recently, various fast samplers utilizing higher-order ODE solvers have emerged and achieved better performance than the initial first-order one. However, these numerical methods inherently result in certain approximation errors, which significantly degrade sample quality with extremely small NFE (e.g., around 5). In contrast, based on the geometric observation that each sampling trajectory almost lies in a two-dimensional subspace embedded in the ambient space, we propose Approximate MEan-Direction Solver (AMED-Solver) that eliminates truncation errors by directly learning the mean direction for fast diffusion sampling. Besides, our method can be easily used as a plugin to further improve existing ODE-based samplers. Extensive experiments on image synthesis with the resolution ranging from 32 to 512 demonstrate the effectiveness of our method. With only 5 NFE, we achieve 6.61 FID on CIFAR-10, 10.74 FID on ImageNet 64×64, and 13.20 FID on LSUN Bedroom.
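The trade-off between NFE and approximation error described above can be seen with a first-order Euler solver on a toy problem. This sketch assumes an EDM-style probability flow ODE, dx/dt = (x - D(x, t)) / t, and Gaussian data with standard deviation `SIGMA_DATA`, for which the ideal denoiser and the exact ODE solution are available in closed form. It illustrates the baseline first-order solver only, not AMED-Solver, and all names and constants are hypothetical.

```python
import numpy as np

SIGMA_DATA = 0.5  # assumed std of the toy Gaussian data distribution

def toy_denoiser(x, t):
    """Closed-form E[x0 | x_t] for x0 ~ N(0, SIGMA_DATA^2 I), x_t = x0 + t * noise."""
    return SIGMA_DATA**2 / (SIGMA_DATA**2 + t**2) * x

def euler_sample(x, times, denoiser):
    """Euler solver for dx/dt = (x - D(x, t)) / t; one denoiser call per step."""
    for t_cur, t_next in zip(times[:-1], times[1:]):
        direction = (x - denoiser(x, t_cur)) / t_cur
        x = x + (t_next - t_cur) * direction
    return x

def exact_solution(x_start, t_start, t_end):
    """Analytic solution of the toy ODE, used as the reference."""
    return x_start * np.sqrt((SIGMA_DATA**2 + t_end**2) / (SIGMA_DATA**2 + t_start**2))

rng = np.random.default_rng(0)
t_max, t_min = 80.0, 0.002  # assumed noise-level range
x_init = t_max * rng.standard_normal(8)
reference = exact_solution(x_init, t_max, t_min)

errors = {}
for nfe in (5, 50, 500):
    # Geometrically spaced time schedule from t_max down to t_min.
    times = np.geomspace(t_max, t_min, nfe + 1)
    errors[nfe] = float(np.abs(euler_sample(x_init, times, toy_denoiser) - reference).max())
    print(f"NFE={nfe:4d}  max abs error={errors[nfe]:.5f}")
```

The time schedule enters only through `times`, so alternative schedules can be compared by swapping the `np.geomspace` call.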
@inproceedings{zhou2024fast,title={Fast ODE-based Sampling for Diffusion Models in Around 5 Steps},author={Zhou, Zhenyu and Chen, Defang and Wang, Can and Chen, Chun},booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},pages={7777--7786},year={2024},}