Daniel Geng
Aesthetics

Factorized Diffusion
Daniel Geng*, Aaron Park*, Andrew Owens (2024)

Visit our website for more information and examples!

Factorized Diffusion is a method that conditions different components of an image on different text prompts. For example, conditioning the low and high frequencies on different prompts lets us make images that change appearance when viewed from a distance. These are called hybrid images and were first introduced by Oliva et al. For more examples, please see our hybrid images gallery.
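One way to picture this conditioning, assuming it is applied to the diffusion model's noise estimates at each denoising step, is to take the low-pass part of the estimate for one prompt and the high-pass part of the estimate for the other. The NumPy sketch below is illustrative only: `low_pass`, `high_pass`, and `hybrid_noise_estimate` are hypothetical helpers, the Gaussian blur and `sigma=2.0` are arbitrary choices, and random arrays stand in for real model outputs.

```python
import numpy as np

def gaussian_kernel(sigma: float) -> np.ndarray:
    """1D Gaussian kernel truncated at 3 sigma and normalized to sum to 1."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def low_pass(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur over the height and width axes of an (H, W, C) array."""
    k = gaussian_kernel(sigma)

    def blur_row(row: np.ndarray) -> np.ndarray:
        # Zero-padded convolution that keeps the original length.
        return np.convolve(row, k, mode="same")

    out = np.apply_along_axis(blur_row, 0, img)
    return np.apply_along_axis(blur_row, 1, out)

def high_pass(img: np.ndarray, sigma: float) -> np.ndarray:
    """Residual after removing the low frequencies, so low + high == img."""
    return img - low_pass(img, sigma)

def hybrid_noise_estimate(eps_far: np.ndarray, eps_near: np.ndarray,
                          sigma: float = 2.0) -> np.ndarray:
    """Low frequencies from the 'seen from afar' prompt, high from the 'up close' one."""
    return low_pass(eps_far, sigma) + high_pass(eps_near, sigma)

rng = np.random.default_rng(0)
eps_far = rng.standard_normal((32, 32, 3))   # stand-in for the estimate under prompt 1
eps_near = rng.standard_normal((32, 32, 3))  # stand-in for the estimate under prompt 2
eps = hybrid_noise_estimate(eps_far, eps_near)
print(eps.shape)  # (32, 32, 3)
```

Because the high-pass part is defined as a residual, the two components always sum back to the input; if both prompts gave the same estimate, the combination would leave it unchanged.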

https://dangeng.github.io/factorized_diffusion/static/videos/teaser/teaser.hybrid.mp4



Our method can also make what we call color hybrids: images that change appearance when color is added or removed. Interestingly, because the human eye cannot perceive color in dim lighting, there is a physical mechanism for this illusion: these images change appearance when moved from a brightly lit environment to a dimly lit one. We generate these images by conditioning the grayscale and color components on different prompts. For more examples, please see our color hybrids gallery.
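A possible grayscale/color split, sketched under the assumption that "grayscale" means the per-pixel mean over channels (rather than, say, weighted luminance): the color component is whatever remains, so it has zero mean at every pixel. The helper names are hypothetical, and random arrays again stand in for per-prompt noise estimates.

```python
import numpy as np

def grayscale_component(img: np.ndarray) -> np.ndarray:
    """Per-pixel mean over the channel axis, copied back to every channel."""
    gray = img.mean(axis=-1, keepdims=True)
    return np.repeat(gray, img.shape[-1], axis=-1)

def color_component(img: np.ndarray) -> np.ndarray:
    """Residual after removing the grayscale part (zero channel mean per pixel)."""
    return img - grayscale_component(img)

def color_hybrid_noise_estimate(eps_gray: np.ndarray, eps_color: np.ndarray) -> np.ndarray:
    """Grayscale part from one prompt's estimate, color part from the other's."""
    return grayscale_component(eps_gray) + color_component(eps_color)

rng = np.random.default_rng(0)
eps_gray = rng.standard_normal((16, 16, 3))   # stand-in: estimate for the grayscale prompt
eps_color = rng.standard_normal((16, 16, 3))  # stand-in: estimate for the color prompt
eps = color_hybrid_noise_estimate(eps_gray, eps_color)
```

With this split, taking the grayscale version of the combined result recovers exactly the first prompt's grayscale part, which mirrors how the illusion behaves when color is removed.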

https://dangeng.github.io/factorized_diffusion/static/videos/teaser/teaser.color.mp4

We can also make images that change appearance when motion blurred, which we call motion hybrids. To make these, we condition a motion-blurred component on one prompt and the residual component on another. Note that in the visualizations below, the motion blur is added synthetically. For more examples, please see our motion hybrids gallery.

https://dangeng.github.io/factorized_diffusion/static/videos/teaser/teaser.motion.mp4

In addition, we can make hybrid images from real images. We do this by taking the high- or low-pass component of a real image and generating the missing component. Effectively, this is a method for solving inverse problems, which we discuss in more detail on our website. For more examples, please see our inverse hybrids gallery.

https://dangeng.github.io/factorized_diffusion/static/videos/teaser/teaser.inverse.mp4

Finally, we can make hybrid images with three different interpretations by conditioning three different levels of a Laplacian pyramid on different prompts. We found this fairly difficult to do, and it required hand-tuning the Laplacian pyramid parameters. If you have difficulty seeing the prompts, please try zooming in and out, or stepping a couple of meters away from the screen. For more examples, please see our triple hybrids gallery.

https://dangeng.github.io/factorized_diffusion/static/videos/teaser/teaser.triple.mp4
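The motion, inverse, and triple hybrids each use a different decomposition. The NumPy sketch below shows one possible version of each, under assumptions that are not from the source: the horizontal box-filter motion blur, the Gaussian low-pass, and all `sigma`/`length` values are hypothetical illustrative choices. The three-band split is a Laplacian stack (a pyramid without downsampling) built from differences of Gaussian blurs, swapped in for simplicity. `inverse_hybrid_step` shows only the final combination of a real image's low-pass band with a generated high-pass band; how the real component would be injected at each noise level during sampling is not modeled.

```python
import numpy as np

def blur_1d(img: np.ndarray, kernel: np.ndarray, axis: int) -> np.ndarray:
    """Convolve every 1D slice along `axis` with `kernel` (zero padding, same length)."""
    return np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), axis, img)

def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur over height and width, truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    k /= k.sum()
    return blur_1d(blur_1d(img, k, 0), k, 1)

def motion_components(img: np.ndarray, length: int = 9):
    """Split into a horizontally motion-blurred part and the residual."""
    blurred = blur_1d(img, np.ones(length) / length, axis=1)
    return blurred, img - blurred

def three_band_components(img: np.ndarray, sigma_fine: float = 1.0,
                          sigma_coarse: float = 4.0):
    """High / mid / low bands from differences of Gaussians; they sum to img."""
    fine = gaussian_blur(img, sigma_fine)
    coarse = gaussian_blur(img, sigma_coarse)
    return img - fine, fine - coarse, coarse

def inverse_hybrid_step(x_generated: np.ndarray, x_real: np.ndarray,
                        sigma: float = 2.0) -> np.ndarray:
    """Keep the real image's low-pass band; take the high band from the generated image."""
    return gaussian_blur(x_real, sigma) + (x_generated - gaussian_blur(x_generated, sigma))

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64, 3))  # stand-in for an image or a noise estimate
blurred, residual = motion_components(img)
high, mid, low = three_band_components(img)
```

Each decomposition is built so its components add back to the input, which is what allows every component to be assigned its own prompt (or, for inverse hybrids, to be fixed from a real photo).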


As a side note, we also show the design process behind this year's CVPR T-shirt, along with other samples that were "rejected". To make the hybrid image illusion on the T-shirt, we generated an inverse hybrid conditioned on a real photo of the Seattle skyline taken by Pavol Svantner (@palsoft), and then printed the letters 'CVPR' on top of the image.


Here we show a gallery of some of the illusions we tried using real images. The top-left image, which is the basis of this year's T-shirt design, was generated with the text prompts "a watercolor of the seattle skyline with mount rainier in the background" and "the text 'CVPR'".