Compositionality and Parts
Kenan Tang, 2024, Digital illustration using SPICE
I created this artwork in response to a blog post by AI critic Prof. Gary Marcus.
The blog post identified AI models’ weaknesses in understanding the compositionality and parts of common objects. AI models were believed to be unable to reliably create common objects with uncommon compositions and parts. The failure cases in the blog post included a four-eared rabbit, a violin without a bridge, a potato under a spoon, and a certain number of fish.
Although the blog post was written in 2024, the limitation remains relevant today. To address this limitation, I proposed SPICE, a state-of-the-art AI image generation and editing technique that faithfully follows user requirements at every generation step. With SPICE, I effortlessly made these elements co-occur in a single image, challenging the claimed limitation of AI image generation models.
SPICE is a training-free method that outperforms the brute-force solution of scaling up the model or the training data. As of May 2025, it is still impossible to create a similar image containing all of these elements using the most advanced commercial image generation models.
SPICE is an open-source workflow that has been successfully adapted to a wide range of models (FLUX.1 [dev], SDXL, SD 1.5, etc.) and art styles. SPICE is able to handle challenging cases where other strong methods (GPT-4o, Imagen 4, FLUX.1 Kontext [pro]) fail. Please see two more examples below. The first image features the character Melinoë from Hades II. The second image features the character Chirizuka Ubame from Touhou Kinjoukyou.

I would like to express my sincere gratitude to my collaborator, Yanhong Li, and my advisor, Prof. Yao Qin, for their valuable support and guidance on the SPICE technical report.