So here I was, for the third time in my life, with a new medium on my hands. There had been the arrival of 2D computer graphics and then 3D modelling and rendering. These had caused some disruption, the first of them to photographers and layout artists, who were either wiped out or, in the case of photographers, cut down from little gods to mere mortals. As always the superstars survived, but the rest went back to doing mundane stuff like weddings.
This new revolution was somehow different. Nobody was using it for anything. People were having fun playing with the tools but the images were glanced at for a second and then forgotten. I set about seeing what I could make. Surely, I thought, my years of experience and drawing skills would allow me to do something of note.
I decided to return to my teenage passion of sci-fi illustration. The medium seemed tailor-made for this. I decided I didn’t want photographs. Most of the AI users were either doing manga or seeking to make what they called ‘realistic’, which meant photographic. So I sought a painted feel. If you prompt for an artistic style or a famous artist, the AI model will attempt to emulate it. Here’s Rubens in space…
Vermeer reimagined. Here I hit the strangest problem. I didn’t feel I’d created these images. I had caused them to be made, but somehow they didn’t belong to me in the same way a drawing or painting did. In traditional work it had always been very hard to move out of a particular genre.
With AI you could flit between genres and even mix them together. Here’s Rackham and Japanese woodcut…
As a young illustrator I would have been delighted with any of these. Alas the images came with no sense of achievement.
As I learnt more I realised that the Photoshop plugin was very limited. It was a user interface for Stable Diffusion. I read a little and found there were other ways of using the same models where you had more control. I need to explain models here. To make an AI diffusion model you need millions of image/text pairs. Then the vectors between them all are calculated. So the vector angle between ‘man’ and ‘woman’ would be zero if calculated from the position of ‘animal’, but much wider if calculated from the position of ‘gender’. A diffusion model is the result of many billions of these calculations.
The end result is that if you type in ‘cat’ and feed in a field of random noise as well, then put it through your model, it will come out the other side a little more cat-like. Take that image and put it through again and it will be more cat-like still. Do the same thirty times and you have a realistic moggy. I won’t go into the theory any further, as there are many better explanations out there by people who understand the maths better than I do.
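For the curious, that loop can be written in a few lines of Python with the Hugging Face diffusers library. This is only a rough sketch of the idea, not the Photoshop plugin or ComfyUI setup I was actually using, and the model name and step count are purely illustrative.

```python
# A minimal text-to-image sketch: start from random noise and let the
# model denoise it towards the prompt over thirty steps.
# Assumes the Hugging Face diffusers library and a CUDA-capable GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # illustrative model choice
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Each of the thirty steps nudges the noise a little more 'cat-like'.
image = pipe("a cat", num_inference_steps=30).images[0]
image.save("cat.png")
```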
I settled on an interface called ComfyUI, which was node based. I was familiar with node-based programs, as I had used them to make shaders in 3D programs. Essentially you have a box of parts which you can plug together. The interface is quite forbidding and the documentation patchy… here’s an image of what I was faced with.
The rectangles are nodes, each of which performs a simple operation. The ‘noodles’ are the connections between them. This workflow is for image-to-image, where you start with an image and then type a prompt for how you would like it to be remade. You can set the level of change and mix different images together. My friends and the press appear to believe you type in any old nonsense and a brilliant image will pop out. This is sort of true, but the image will be pretty much random. If you want a specific image, things get much harder. I was soon producing fairly finished images.
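To give a flavour of what image-to-image means under the hood, here is a rough Python sketch of the same idea, again using the diffusers library rather than ComfyUI. The file names, prompt and numbers are only placeholders; the ‘strength’ parameter plays the role of the level of change I mentioned above.

```python
# A minimal image-to-image sketch (illustrative only, not my actual workflow).
# 'strength' controls how far the starting image is pushed towards the prompt.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # illustrative model choice
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("sketch.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="oil painting of a spaceship over a ruined city",  # placeholder prompt
    image=init_image,
    strength=0.6,          # towards 0 keeps the original, towards 1 ignores it
    guidance_scale=7.5,    # how strongly the prompt is followed
).images[0]
result.save("remade.png")
```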
The only trouble with those finished images was that they were the same as everyone else’s. There was still no ‘me’ in them. I put that to one side. I had only scratched the surface of the various methods for getting what you wanted. It was soon clear I was never going to be able to put in a sketch and a few sentences and have the right image pop out. To learn more I had to watch many YouTube videos and look at other people’s images. I came to an uncomfortable conclusion. Everyone was making every kind of image. Personal style was nonexistent in the medium. After a lot of study and experimentation I began to make some slightly more crafted images.
After a few months I could make pretty much anything I intended, in any style. The core problem remained: I felt almost zero attachment to the resulting images and would certainly never sign one. On the other hand, making them was great fun. As a child my favourite toy was a huge Meccano set. I spent many hours building complicated machines and vehicles. ComfyUI was like the biggest Meccano set ever; the ways of plugging the nodes together were endless. I had the thought that my attitude to the creations was similar. I never played with the cranes and cars I made as a child. I would finish making the model, look at it for a few moments and then take it apart again. I would never even show them to anyone.
There were so many people struggling with the medium that I began to make videos teaching the basics… I tried to smuggle some of the fundamentals of composition and colour theory into them, but people had zero interest in the nuts and bolts of image making. I would correct the distorted hands, but most users wouldn’t bother. There was never any compositional thought. The subject was always dead centre of frame.
So far on my quest I had come to many different conclusions and then had to abandon them. What were the millions of images being made for? Were they entirely disposable? I am still undecided.