Title: Jiroft Illusions
Date: 2025
Jiroft Illusions is an exploration of machine hallucination and the artificial gaze’s interpretation of the ancient civilization of Jiroft. The project employs a custom multimodal workflow that intentionally amplifies machine hallucinations rather than minimizing them. Unlike conventional AI applications that prioritize accuracy and realism, this workflow leverages hallucination as a creative force. The iterative process, moving back and forth between text and image, generates increasingly abstract, unpredictable, and non-human interpretations of Jiroft’s artifacts and culture. The project presents a series of still images, profoundly shaped by machine hallucination, installed as a public artwork in Tehran, the modern capital of Iran. The experience combines multimodal techniques, augmented reality, and spatial audio hallucinations, and is activated through image tracking via a custom-built Android application (APK).
The Jiroft culture refers to an early Bronze Age civilization (c. 3000–2000 BCE) in present-day southeastern Iran, particularly around the Halil River near the modern city of Jiroft. First recognized in the early 2000s, this civilization gained significance following major archaeological discoveries, including elaborately carved chlorite artifacts and architectural remains.
The Jiroft writing system is an undeciphered script found on a few artifacts dating to around 3000–2000 BCE. It is considered one of the region’s earliest potential writing systems, possibly predating or contemporaneous with Proto-Elamite and Mesopotamian cuneiform. If deciphered, it could reshape the history of early writing by revealing an independent Iranian tradition alongside Mesopotamia and the Indus Valley. This remains an active area of research in early literacy and Bronze Age civilizations.
Given the significance of this writing system in understanding Jiroft’s beliefs, culture, and mythology, the only available method of analysis is archaeological interpretation, relying heavily on the visual characteristics of a limited number of discovered artifacts. This constraint creates an artistic opportunity to explore Jiroft’s cultural heritage through an artificial gaze—one that does not aim to humanize AI or eliminate the biases inherent in its datasets but instead acknowledges these biases as reflections of our collective memory.
Interestingly, while engineers strive to optimize AI for greater accuracy and realism—particularly in fields like finance, medical diagnostics, and legal analysis, where hallucinations are undesirable—creative disciplines can harness machine hallucinations as a tool for uncovering new perspectives within artificial perception. This project explores the fluid transformation between text and image in multimodal systems. Moving beyond simple translation, it highlights how AI-generated imagery and text co-construct meaning through recursive exchange, shaped by machine hallucinations and dataset biases. This shift from intersemiotic conversion to intermedial synthesis reflects how AI can be used not just for representation but for reimagining cultural narratives.

Technical Process
Image to Text (Florence-2):
The raw artifact image is analyzed with the microsoft/Florence-2-base model from the transformers library, which produces a descriptive caption [1].
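A minimal captioning sketch, following the usage pattern from the Florence-2 model card (the image path and generation settings below are illustrative, not the project's exact configuration):

    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    # Florence-2 ships custom modeling code, hence trust_remote_code=True.
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Florence-2-base", torch_dtype=torch.float32, trust_remote_code=True
    )
    processor = AutoProcessor.from_pretrained(
        "microsoft/Florence-2-base", trust_remote_code=True
    )

    image = Image.open("artifact.jpg")   # illustrative path to a raw artifact photo
    task = "<DETAILED_CAPTION>"          # Florence-2 task token for richer captions

    inputs = processor(text=task, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=256,
        num_beams=3,
    )
    decoded = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    caption = processor.post_process_generation(
        decoded, task=task, image_size=(image.width, image.height)
    )[task]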
Generating Hallucinated Text:
To produce hallucinated text from GPT-2, the logits computed at each decoding step are perturbed by a purpose-built random number generator. Tuning this perturbation keeps the output recognizably hallucinated rather than entirely random.
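The project specifies only a custom random number generator; the sketch below assumes one concrete realization, additive seeded Gaussian noise applied through a transformers LogitsProcessor, with strength as an illustrative knob:

    import torch
    from transformers import (GPT2LMHeadModel, GPT2TokenizerFast,
                              LogitsProcessor, LogitsProcessorList)

    class HallucinationNoise(LogitsProcessor):
        # Adds seeded Gaussian noise to each decoding step's logits.
        # strength = 0 reproduces ordinary sampling; larger values push the
        # distribution toward unlikely tokens without collapsing into pure
        # noise. (Assumed mechanism; the project describes only a custom RNG.)
        def __init__(self, strength: float = 2.0, seed: int = 42):
            self.strength = strength
            self.rng = torch.Generator().manual_seed(seed)

        def __call__(self, input_ids, scores):
            noise = torch.randn(scores.shape, generator=self.rng) * self.strength
            return scores + noise.to(scores.device)

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    inputs = tokenizer(caption, return_tensors="pt")  # caption from the Florence-2 step
    output_ids = model.generate(
        **inputs,
        do_sample=True,
        max_new_tokens=60,
        logits_processor=LogitsProcessorList([HallucinationNoise(strength=2.0)]),
        pad_token_id=tokenizer.eos_token_id,
    )
    hallucinated_prompt = tokenizer.decode(output_ids[0], skip_special_tokens=True)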
Initial Image Generation:
The hallucinated text prompt is then fed to a text-to-image model, producing an initial generated image.
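The text does not name the image generator; as an assumed stand-in, this sketch uses Stable Diffusion v1.5 through the diffusers library:

    import torch
    from diffusers import StableDiffusionPipeline

    # Stable Diffusion v1.5 is an assumed stand-in for the unnamed generator.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    initial_image = pipe(
        hallucinated_prompt,      # output of the GPT-2 step above
        num_inference_steps=50,
        guidance_scale=7.5,
    ).images[0]
    initial_image.save("initial_generation.png")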
Filter-Guided Diffusion (FGD):
Filter-guided diffusion [2] then reconciles the prompt-driven generation with the reference image, producing the final artistic output.
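Rather than guessing at the interface of the released FGD code, the sketch below illustrates only the core idea: during sampling, the denoiser's clean-image estimate is corrected so that its filtered, low-frequency content tracks the reference image while the hallucinated high-frequency detail stays free. The function name, kernel size, and weight are illustrative:

    import torch
    import torch.nn.functional as F

    def filter_guidance_step(
        x0_pred: torch.Tensor,    # denoiser's current clean-image estimate (B, C, H, W)
        reference: torch.Tensor,  # reference image mapped into the same space
        kernel_size: int = 9,
        weight: float = 0.5,
    ) -> torch.Tensor:
        # One guidance correction in the spirit of filter-guided diffusion [2]:
        # a box low-pass filter extracts coarse structure, and the estimate is
        # nudged so its low frequencies match the reference while its
        # high-frequency detail is left untouched.
        def lowpass(img: torch.Tensor) -> torch.Tensor:
            return F.avg_pool2d(img, kernel_size, stride=1, padding=kernel_size // 2)

        return x0_pred + weight * (lowpass(reference) - lowpass(x0_pred))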
In essence, this process moves iteratively between text and image, shaping the final artwork through machine perception, multimodal translation, and artistic re-interpretation.

Credits
Idea, Art & Design Lead, AR & Unity Development: Romina Rahnamoun
Technological Methodology, Programming: Rashin Rahnamoun
References
[1] Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, and Lu Yuan. Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[2] Zeqi Gu, Ethan Yang, Abe Davis. Filter-Guided Diffusion for Controllable Image Generation. SIGGRAPH ’24: ACM SIGGRAPH 2024 Conference Papers.
[3] Hugging Face. Image to Prompt. Available at: https://huggingface.co/spaces/ovi054/image-to-prompt
Video link: https://youtu.be/92N5JztxDuE?feature=shared