For the past few weeks, I’ve been obsessively experimenting with the ‘text to image’ AI DALL-E. If you’ve never heard of this amazing piece of tech, I can recommend Two Minute Papers’ video on the subject.
Needless to say, being able to generate decent-looking images of basically anything from just a line of text is amazing on its own, but what really sets DALL-E apart from its competition is its inpainting ability. You upload an image, erase part of it and then have DALL-E fill in the blank space with whatever you want to be there.
The video below shows this mechanism in action; I took a DALL-E generated image of a town, sawed it in half (hence the title) and then let the AI fill in the blanks.
Because DALL-E always generates three variants for each inpainting attempt, you can see three of the possible ‘expanded’ towns in the image below; each of them started from the same image and was then expanded using one of the three image results.
As you can tell, DALL-E has a tendency to repeat existing patterns, but even the seemingly duplicated buildings aren’t exact copies. The most amazing thing, though, is that for every single image it generated, it got the perspective, lighting and art style exactly right.
Using this technique I created more than a few “townscapes” in a few different styles, which you can see in the image gallery below.
The way these were created is by starting from a single DALL-E generation with a prompt along the lines of “The town square of a typical French village. Isometric view. Watercolor art.”, then moving the ‘canvas’ 50% and letting the AI fill in the rest.
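To make the ‘move the canvas 50%’ step a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea, not how DALL-E's editor works internally: the function name `shift_canvas`, the `TRANSPARENT` constant and the representation of an image as rows of RGBA tuples are all assumptions made for this example. The half of the image that is kept slides over, and the freed-up half becomes fully transparent, which is the area the AI is then asked to fill in.

```python
# Hypothetical sketch: an image is a list of rows, each row a list of
# (R, G, B, A) tuples. Alpha 0 marks the blank area left for inpainting.
TRANSPARENT = (0, 0, 0, 0)


def shift_canvas(image, fraction=0.5):
    """Slide the canvas to the right by `fraction` of the image width.

    The right-hand part of the old image ends up on the left of the new
    canvas; the rest of the canvas is transparent and still needs filling.
    """
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    width = len(image[0])
    shift = int(width * fraction)
    # Drop the first `shift` pixels of each row and pad with blank pixels.
    return [row[shift:] + [TRANSPARENT] * shift for row in image]


# A tiny 4x2 'image' where the red channel encodes the column index.
image = [[(x, y, 0, 255) for x in range(4)] for y in range(2)]
canvas = shift_canvas(image)
print(canvas[0])
# [(2, 0, 0, 255), (3, 0, 0, 255), (0, 0, 0, 0), (0, 0, 0, 0)]
```

Repeating this step on each new result is what makes the townscape grow sideways, one half-image at a time.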
As astute viewers may have noticed, the further you get from the starting point, the more ‘watered down’ the style and quality become. The reason for this is that at each step the AI only has half an image to go on, so slight variations in style creep in over time. Still, once again DALL-E managed to keep lighting, perspective and overall style remarkably consistent. All of these images took maybe one or two hours to composite (with the compositing process being the bulk of the work), while creating any of them from scratch using only Photoshop would take me days if not weeks (and even then my art style isn’t even close to painterly).
If you’re still not convinced and think that maybe DALL-E just mashes together images from a massive image library, you might be won over by the following image in which it effortlessly blends together the starting images for three different townscapes with different art styles, architectural styles and lighting.
This blog post only touches on an extremely small portion of what’s possible with inpainting. It’s nothing short of revolutionary, and with new and better versions of DALL-E (and its competitors) undoubtedly arriving over the coming years, the world of art will never be the same again.
While this strays a little from the original goal of this development-oriented blog, I believe developments in Artificial Intelligence are exciting enough to warrant a few detours from time to time.