This project uses several AI tools to generate novel comic strips from a short user prompt. The strips can be several pages long and are viewable from our web application.
Check out our presentation here
How It Works
First, the pipeline sends the user's prompt to GPT-4 to get a fleshed-out story and a set of characters. These are developed into frame descriptions that are sent to a fine-tuned diffusion model to generate images. We then use CLIPSeg to locate each character in the frame based on its physical description and attach text bubbles with dialogue. Once all frames are compiled, they are sent to a React website where the user can view the result.
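At a high level, the pipeline is a chain of three model-backed stages. The sketch below is illustrative, not the project's actual code: the function names are ours, and the model-calling stages are passed in as callables so the skeleton stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Frame:
    description: str  # scene text sent to the diffusion model
    speaker: str      # which character is talking
    dialogue: str     # text placed in the speech bubble

def generate_comic(
    user_prompt: str,
    write_story: Callable[[str], List[Frame]],        # e.g. a GPT-4 call
    render_frame: Callable[[str], object],            # e.g. a diffusion model
    locate_speaker: Callable[[object, str], Tuple],   # e.g. CLIPSeg
) -> List[Tuple]:
    """Turn a short prompt into (image, bubble anchor, dialogue) triples."""
    pages = []
    for frame in write_story(user_prompt):
        image = render_frame(frame.description)
        anchor = locate_speaker(image, frame.speaker)
        pages.append((image, anchor, frame.dialogue))
    return pages
```

Keeping the stages as plain callables also makes the pipeline easy to test with stubs before wiring in the real models.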


Problems and Solutions
Generating well-formatted stories from GPT-4
At first, we had trouble getting GPT-4 to generate stories in a form that could be easily parsed and passed to the other AI models. We solved this by iterating on a custom prompt with plenty of examples and a clear output structure, which let us extract the data we needed consistently.
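One way to make the output machine-parseable is to ask for a rigid, delimiter-based layout and parse it defensively. The prompt and delimiters below are illustrative, not the project's actual format:

```python
import re

# Illustrative system prompt: a rigid format specification makes GPT-4
# far more likely to return output the parser below can handle.
STORY_PROMPT = """\
Write a short comic story for the user's idea.
Respond in EXACTLY this format, one block per frame:

FRAME 1
SCENE: <visual description of the panel>
SPEAKER: <character name>
DIALOGUE: <one line of dialogue>
"""

FRAME_RE = re.compile(
    r"FRAME \d+\s*\n"
    r"SCENE: (?P<scene>.+)\n"
    r"SPEAKER: (?P<speaker>.+)\n"
    r"DIALOGUE: (?P<dialogue>.+)"
)

def parse_story(raw: str):
    """Extract {scene, speaker, dialogue} dicts from the model's reply."""
    frames = [m.groupdict() for m in FRAME_RE.finditer(raw)]
    if not frames:
        raise ValueError("model reply did not match the expected format")
    return frames
```

Raising on a non-matching reply lets the caller retry the API call instead of silently rendering an empty comic.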
Character Consistency
Because each frame is rendered separately, the key characters in the story can drift visually from frame to frame, making it hard for the user to follow the story. To remedy this, we used heavy prompt engineering when formulating characters, ensuring that key details such as a character's skin color, gender, and clothing stayed consistent throughout the story.
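A simple mechanical version of this idea (the helper name is ours, not the project's) is to prepend every character's canonical physical description to every frame prompt, so the diffusion model never has to remember anything between renders:

```python
def frame_prompt(scene: str, characters: dict) -> str:
    """Build a diffusion prompt that repeats each character's fixed
    description, so skin color, gender, and clothing are restated
    identically in every frame."""
    # characters maps name -> canonical physical description
    tags = ", ".join(
        f"{name}: {desc}" for name, desc in sorted(characters.items())
    )
    return f"{scene}. Characters: {tags}. comic book style"
```

The same canonical descriptions can later be reused as CLIPSeg text prompts, which keeps rendering and localization in sync.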

Assigning Text Bubbles to Characters
Once we had images of our characters, we needed to add text bubbles, but had no obvious way of knowing which character was which or where each one was in the image. We solved this by using the CLIPSeg model to locate each character from its detailed physical description, then using that location data to attach the text bubbles.
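CLIPSeg returns a per-pixel heatmap for each text prompt (here, a character's physical description). Once you have that heatmap, bubble placement reduces to finding the hottest region. A minimal NumPy sketch of that last step, assuming the heatmap is already a 2-D array of scores:

```python
import numpy as np

def bubble_anchor(heatmap: np.ndarray) -> tuple:
    """Return the (row, col) of the strongest CLIPSeg response, i.e. the
    most likely location of the character described by the text prompt.
    A speech bubble's tail can then be pointed at this coordinate."""
    flat_index = int(heatmap.argmax())
    row, col = np.unravel_index(flat_index, heatmap.shape)
    return int(row), int(col)
```

In practice you would smooth or threshold the heatmap first so a single noisy pixel cannot pull the bubble to the wrong spot.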
