ReCon: Training-Free Acceleration for Text-to-Image Synthesis with Retrieval of Concept Prompt Trajectories
¹Purdue University, ²Adobe Research
*Equal contribution
[Paper]     [GitHub]

⭐ Our paper has been accepted at ECCV 2024 [resources coming soon]


Outputs from existing retrieval-based acceleration approaches, namely text-based retrieval (Baseline) and noise-based retrieval (ReDI), often fail to represent the input prompt accurately. In contrast, our proposed concept-based retrieval framework produces images with higher fidelity and greater faithfulness to the prompt.

Abstract

Text-to-image diffusion models excel at generating photo-realistic images but are hampered by slow inference. Training-free, retrieval-based acceleration methods, which reuse pre-generated “trajectories,” have been introduced to address this. However, these methods often lack diversity and fidelity because they depend heavily on similarity to stored prompts. To address this limitation, we present ReCon (Retrieving Concepts), a retrieval-based diffusion acceleration method that extracts visual “concepts” from prompts and uses them to form a knowledge base from which adaptable trajectories can be created. As a result, ReCon surpasses existing retrieval-based methods, producing high-fidelity images while reducing the required Neural Function Evaluations (NFEs) by up to 40%. Extensive testing on the MS-COCO, Pick-a-Pic, and DiffusionDB datasets confirms that ReCon consistently outperforms established methods across multiple metrics, including Pick Score, CLIP Score, and Aesthetics Score. A user study further indicates that 76% of images generated by ReCon are rated as having the highest fidelity when compared against two competing methods: a purely text-based retrieval and a noise-similarity-based retrieval.
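To make the retrieval idea above more concrete, the following is a minimal toy sketch and not the ReCon implementation. Everything in it is an assumption made for illustration: the `KNOWLEDGE_BASE` contents, the keyword-matching `extract_concepts`, the averaging rule in `compose_latent`, and the 50-step schedule are all hypothetical stand-ins. It only shows the general shape of concept-level lookup and how skipping early denoising steps translates into an NFE reduction.

```python
# Toy illustration only: hypothetical names and data, not the authors' method.

# Assumed knowledge base: concept -> cached intermediate latent.
# A short list of floats stands in for a real latent tensor.
KNOWLEDGE_BASE = {
    "dog": [0.25, 0.5, 0.75],
    "beach": [0.75, 0.0, 0.25],
}


def extract_concepts(prompt, kb):
    """Naive stand-in for concept extraction: keep prompt words present in the KB."""
    words = prompt.lower().replace(",", " ").split()
    return [w for w in words if w in kb]


def compose_latent(concepts, kb):
    """Combine cached latents by element-wise averaging (an assumed rule)."""
    if not concepts:
        return None  # nothing retrieved: fall back to full generation
    latents = [kb[c] for c in concepts]
    return [sum(vals) / len(latents) for vals in zip(*latents)]


def nfe_reduction(total_steps, skipped_steps):
    """Fraction of function evaluations saved by starting from a cached latent."""
    return skipped_steps / total_steps


concepts = extract_concepts("A dog running on the beach", KNOWLEDGE_BASE)
start_latent = compose_latent(concepts, KNOWLEDGE_BASE)
# Assumed schedule: 50 denoising steps, of which the first 20 are skipped.
saved = nfe_reduction(total_steps=50, skipped_steps=20)
print(concepts, start_latent, f"{saved:.0%} fewer NFEs")
```

In this sketch, a prompt that shares no concept with the knowledge base returns `None` and would simply be generated from scratch, which is one plausible way to handle a retrieval miss.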


BibTeX



@inproceedings{Lu2024ReCon,
  title={ReCon: Training-Free Acceleration for Text-to-Image Synthesis with Retrieval of Concept Prompt Trajectories},
  author={Chen-yi Lu and Shubham Agarwal and Mehrab Tanjim and Kanak Mahadik and Anup Rao and Subrata Mitra and Shiv Kumar Saini and Saurabh Bagchi and Somali Chaterji},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2024}
}


Acknowledgements

This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.