Latent handstyles
In this work I explore the use of the Contrastive Language-Image Pre-training (CLIP) model to drive the generation of traces that resemble graffiti tags. The CLIP model encodes images and their corresponding text captions in a shared latent space. Here I am “exploring” this space in conjunction with an adaptation of the Sigma-Lognormal model of handwriting movements, which is known to reproduce the kinematics (position, speed, acceleration, etc.) of human handwriting well. By integrating this model with a differentiable vector graphics (DiffVG) renderer, it is possible to optimize trajectory parameters with respect to neural costs expressed in image space.
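To give a feel for the Sigma-Lognormal idea, here is a rough sketch (my own illustration, not the actual code behind this work; parameter names follow Plamondon's standard formulation): each stroke has a lognormal speed profile and a direction that interpolates from a start to an end angle via the lognormal CDF, and summing the stroke velocity vectors and integrating yields the pen trajectory.

```python
import math

def stroke_speed(t, D, t0, mu, sigma):
    """Lognormal speed magnitude of one stroke (zero before onset t0)."""
    if t <= t0:
        return 0.0
    x = t - t0
    return (D / (sigma * math.sqrt(2.0 * math.pi) * x)
            * math.exp(-((math.log(x) - mu) ** 2) / (2.0 * sigma ** 2)))

def stroke_angle(t, t0, mu, sigma, theta_s, theta_e):
    """Direction interpolated from theta_s to theta_e via the lognormal CDF."""
    if t <= t0:
        return theta_s
    z = (math.log(t - t0) - mu) / (sigma * math.sqrt(2.0))
    cdf = 0.5 * (1.0 + math.erf(z))
    return theta_s + (theta_e - theta_s) * cdf

def trajectory(strokes, duration, dt=1e-3, x0=0.0, y0=0.0):
    """Euler-integrate the vector sum of stroke velocities into a pen path."""
    xs, ys = [x0], [y0]
    t = 0.0
    while t < duration:
        vx = vy = 0.0
        for s in strokes:
            v = stroke_speed(t, s["D"], s["t0"], s["mu"], s["sigma"])
            a = stroke_angle(t, s["t0"], s["mu"], s["sigma"],
                             s["theta_s"], s["theta_e"])
            vx += v * math.cos(a)
            vy += v * math.sin(a)
        xs.append(xs[-1] + vx * dt)
        ys.append(ys[-1] + vy * dt)
        t += dt
    return xs, ys
```

A single straight stroke of amplitude D covers an arc length of exactly D, since the lognormal profile integrates to one; tag-like traces emerge from chaining many time-overlapping strokes.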
This video shows the trajectory evolution of various results in which the optimization tries to write the word “CREAM”. The trajectories are concatenated to make a seamless loop. While the CLIP model has difficulty fully capturing the order and shape of letters, the overall structure appears to resemble the chosen word, and the trajectory-generation model produces traces and motions that resemble those of tags. Of course, this judgement is personal and based purely on my experience as a graffiti writer.
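The optimization pattern itself is easy to illustrate. In the actual piece, gradients flow from a CLIP cost back through DiffVG to the trajectory parameters; as a self-contained toy with the same shape (everything below is a hypothetical stand-in, not the real pipeline), one can soft-rasterize a few control points into a small image and push an image-space loss downhill, here with finite-difference gradients instead of autodiff:

```python
import math

SIZE = 16

def render(points, size=SIZE, radius=1.5):
    """Soft-rasterize 2-D points as Gaussian blobs (a stand-in for DiffVG)."""
    img = [[0.0] * size for _ in range(size)]
    for px, py in points:
        for y in range(size):
            for x in range(size):
                d2 = (x - px) ** 2 + (y - py) ** 2
                img[y][x] += math.exp(-d2 / (2.0 * radius ** 2))
    return img

def image_loss(img, target):
    """Squared-error cost in image space (a stand-in for the CLIP cost)."""
    return sum((a - b) ** 2
               for ra, rb in zip(img, target)
               for a, b in zip(ra, rb))

def optimize(params, target, steps=80, lr=0.1, eps=1e-3):
    """Finite-difference gradient descent on flat [x0, y0, x1, y1, ...]."""
    def loss_of(p):
        return image_loss(render(list(zip(p[0::2], p[1::2]))), target)
    for _ in range(steps):
        base = loss_of(params)
        grad = []
        for i in range(len(params)):
            bumped = params[:]
            bumped[i] += eps
            grad.append((loss_of(bumped) - base) / eps)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params, loss_of(params)
```

Swapping the blob rasterizer for a differentiable stroke renderer and the pixel loss for a CLIP similarity score recovers the structure of the actual setup, with autodiff replacing the finite differences.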