Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang · Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis · SlidesLive

Kategorie

CS

Přihlásit se Kontaktujte nás

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

1. května 2023

Řečníci

O prezentaci

Large-scale diffusion models have demonstrated remarkable performance on text-to-image synthesis (T2I). Despite their ability to generate high-quality and creative images, users still observe images that do not align well with the text input, especially when involving multiple objects. In this work, we strive to improve the compositional skills of existing large-scale T2I models, specifically more accurate attribute binding and better image compositions. We propose to incorporate language structures with the cross-attention layers based on a recently discovered property of diffusion-based T2I models. Our method is implemented on a state-of-the-art model, Stable Diffusion, and achieves better compositional skills in both qualitative and quantitative results. Our structured cross-attention design is also efficient that requires no additional training samples. Lastly, we conduct an in-depth analysis to reveal potential causes of incorrect image compositions and justify the properties of cross-attention layers in the generation process.

Organizátor

Baví vás formát? Nechte SlidesLive zachytit svou akci!

Profesionální natáčení a streamování po celém světě.

Sdílení

Doporučená videa

Prezentace na podobné téma, kategorii nebo přednášejícího