I had a play with MidJourney to find out if it would steal my job as a visual practitioner or not, and the results were – well – comforting…
Technology has always pushed our craft forward
Ever since we humans first started using stones rather than our hands to cut and shape things, technology has been advancing our capabilities. History is often marked by huge jumps in progress, thanks to one innovation unlocking more innovations. New technologies have always changed our impact on the world and on each other, and the way we think about ourselves and our place in this world (and beyond).
Granted, some technologies have been pretty benign (I mean, bricks are a pretty awesome invention, if you think about it). Others have been ultra-disruptive (hello Gutenberg Press, the steam engine, nuclear fission, the Internet).
Think impact, not job
And technology has always shifted our thinking and assumptions about what any given job should be. If you have a fixed, job-oriented mindset, then you’re in for a world of anxiety, because it’s only a matter of time before some new technology challenges, disrupts, or even completely removes the need for that job.
But if you have a more open impact-oriented mindset, then new technologies will always present new opportunities for you to make your mark in the world in new and better ways.
How will AI-powered image generators affect the visual practice field?
In recent years, AI and smart programs have been advancing in leaps and bounds, and every new app seems to make the people in yet another industry really nervous about their livelihoods. The release of AI-powered image generators like DALL-E, DALL-E Mini, Imagen, and MidJourney is certainly making a lot of people in the fields of visualisation, illustration, art, design and photography pretty twitchy.
And no doubt about it, the images these services generate are pretty amazing. Here’s what you get when you prompt one of them with the word “happiness”:
There’s loads of commentary going on about what this means for the visual practice field (i.e. illustration, art, graphic recording, visual storytelling), but rather than just add more words to that, I thought I’d run a little experiment, to give us as a community some evidence of what to be twitchy (or not twitchy) about…
An AI-augmented visualisation experiment: illustrating company visions
I thought I’d get the AI-powered image generator MidJourney to illustrate some company visions, to see what it created, and to see what I could learn from that.
Why company visions? I help teams gain clarity and direction by visualising their complex, often ambiguous and esoteric information and ideas. So, if I were to go toe-to-toe against MidJourney, I thought I’d choose something that we can all reference as a starting point.
Using MidJourney is a bit tricky; it uses Discord as its interface, which means it’s easy to lose your creations (MidJourney’s creations?) amongst the hundreds of other images flowing through the same big stream of messages. There are tons of other people in the same channel, and most of what they’re prompting MidJourney for is pictures of cyberpunk cities, elves, hot medieval chicks and robots. Kids today, hey.
Basically, you type /imagine and then whatever text description you want as the prompt for the image generator. It then generates 4 ‘draft’ images. From there, you can ask for variations on any one image, or select one image for it to upscale into a final, larger, more detailed piece.
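To make that concrete, here’s roughly what the command looks like in the Discord channel (a sketch of the format at the time of writing, using Nike’s vision statement from below as the prompt text):

```
/imagine prompt: To bring inspiration and innovation to every athlete in the world
```

The MidJourney bot then replies in the channel with a grid of four draft images, along with buttons to upscale any of them (U1–U4) or to generate variations (V1–V4).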
Here’s what MidJourney generated for Nike’s vision: “To bring inspiration and innovation to every athlete in the world”. As you can see in these shots, the images start as broad shapes and colours, and gain detail and specificity over about a minute.
What about Amazon? “to be Earth’s most customer-centric company, where customers can find and discover anything they might want to buy online”:
You wondering what Tesla’s vision “to accelerate the world’s transition to sustainable energy” looks like? Here you go:
I dig the other-worldly-looking windmills in the lower-left version.
LinkedIn’s vision statement, “Create economic opportunity for every member of the global workforce”:
What can we make of these images?
Any image generation algorithm is written by at least one person, and draws upon (pun intended) a vast collection of existing (human-made) images of various kinds to create any new image. Looking at the images above, there seem to be some decisions about colour and composition that would have been informed by that collection, and perhaps the visual texture and form too.
The objects and shapes rendered in these images seem pretty generic… but then again, the prompts (i.e. the vision statements) use pretty generic language. Bland words yield bland images. That’s on the vision statements, not on the AI.
So, what does this mean? A few things jump out for me:
The images are novel, but not original
Every AI image is a re-sampling of images that already exist. Can original art come from existing art? Yes, absolutely. Music by DJs like Moby re-samples existing tracks, but is still fresh and original; it’s music we haven’t heard before.
AI steals from the original creators
Speaking of images that already exist, there is no attribution given to the artists who created the images that sit in the corpus that the AI has learned from.
Which is not cool.
What’s more, it reinforces the mindset that most people already have about images online: that if it’s on the internet, it must be free. Not so.
The fight against this behaviour is on. For example, Getty Images is suing Stability AI, the company behind the AI art generator Stable Diffusion, in the US for copyright infringement. This article on The Verge is a great exploration of the issues at stake.
The magic is in our interpretation, not in their generation
The introduction of the camera catalysed a wild explosion of new thought about art, and a range of modern art movements, like impressionism, expressionism, surrealism, and cubism. We (i.e. human artists) reacted to this new technology, and intentionally challenged existing ideas of aesthetics, technique and composition, and created whole new ways of expressing ourselves visually.
AI-generated images don’t intentionally challenge existing patterns and compositions, or re-interpret existing metaphors or existing visual treatments of subject matter. The algorithms are serving up a range of calculated renderings, and then we choose what ‘stands out’ and what does not. Any freshness, originality, or aesthetic value we ascribe to any of these images comes from us, not the AI.
We are still doing the synthesis
These AI algorithms ape some of what we intuitively do when thinking about how to visualise something, but not all of it. In Presto Sketching, I wrote about how we do that. Generally we:
1. Understand – We check that we ‘get’ the concepts that need to be visualised.
2. Synthesise – Then, our clever brains make a multitude of choices about what those concepts mean, how they are connected, what to include, what to leave out, what to emphasise, how to appeal to our intended audience, whether there’s a metaphor at play, and so on.
3. Translate – Then, we think about how to visualise and communicate that synthesis. This is steered by our own memory and experiences, our visual diet, and our confidence and competence in being able to render something (on paper or in pixels).
I think AI-powered image generators do an amazing job of #3 Translate, but they can’t do #1 Understand or #2 Synthesise. You can see this when you look at the visual renderings of the vision statements above; yes, the text is very generic, so all the AI can do is algorithmically reach for a ‘stock’ object that fits a word, or render a ‘stock’ abstract symbol that represents a concept (e.g. the ‘go faster’ lines for “innovation”).
Bottom line? It’s a (possibly illegal) type of brush
As long as we demonstrate understanding and synthesis, I think we still have a massive edge over AI-powered image generation. I see it as a new type of brush that we can paint with. What ‘colours’ do you put on the brush? How do you wield the brush? It depends on the prompts you use.
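To illustrate (a made-up contrast, not prompts from the experiment above), compare what you load the brush with in a bland prompt versus a more deliberate one:

```
/imagine prompt: innovation
/imagine prompt: innovation as a paper-cut collage: a lone figure building a bridge out of lightbulbs, warm dawn palette
```

The first leaves every choice of subject, metaphor, composition and style to the algorithm; the second keeps the synthesis, and the choices, in our hands.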
But this (to me) calls for three pursuits:
Broaden your visual diet. Whatever your role is, intentionally go after a richer variety of visual stimuli. Go out into nature, seek out different kinds of First Nations art, go through old art books… do whatever it takes to feed your eyeballs and your brain with a wider range of visual layouts, colours, textures, treatments, and subject matter.
Cultivate your creativity. Intentionally colour outside the lines. Create things for the sheer heck of it, not for money or likes or any reward. Be impulsive. Swap out a familiar tool with an unfamiliar tool. Set yourself a new constraint for each project.
Example: Here’s what MidJourney makes with my nonsense prompt: “yuopoyy at bhlkip asd”:
Interesting, hey?
Skill up in synthesis. Cultivate and improve your skills of listening and observing, questioning, empathising, critical thinking, (re)framing, (re)classifying, (re)grouping, (re)wording, and resampling.
And last thing: Check out the subtle art of prompt whispering. 😉
What are your thoughts? I’m keen to know!