Creating Film with AI

My short film "The Open Window" was created entirely with AI on May 26th, 2025. This article is about how I did it.

From recreating Syndicate Wars with Sora to creating "The Open Window" with Google Veo: a complete short film with dialogue, music, and sound design. 10 hours of creation, $100 in costs, and many lessons about what happens when AI meets storytelling. This is documentation of a historic transition in AI filmmaking that I can't wait to look back on.

The Beginning

I have no business creating film. And I have probably less business writing about it.

A few months ago I could barely edit videos, but I've been learning as a matter of necessity. I've come to the conclusion that an Oscar winner made entirely with AI is likely already in production.

But this article is not about that.

I got my first real taste of using text-to-video AI tools when OpenAI released Sora five months ago in December 2024. It was pretty impressive at the time.

I re-made the opening scene of Syndicate Wars, an old DOS game.

Syndicate Wars: My AI Recreation

Syndicate Wars: Original Opening

I was surprised at how well it worked. And also how many challenges there were:

  • Preserving character continuity scene-to-scene was almost impossible.
  • You couldn't prompt voices, and the lack of audio made everything feel flat.
  • Generations were inconsistent (lots of odd "artifacting") and took a very long time.

I spent the better part of 24 hours reworking prompts to get 10-second scene increments to resemble the original. I had to take lots of shortcuts, and if you watch the two side-by-side you'll see just how much I compensated for. With satisfactory scenes (and an empty credit balance), I used Final Cut Pro to edit the final product. In my limited experience it doesn't hold a candle to CapCut in terms of ease of use, but I didn't know that then.

I then put AI film creation aside, having only minimally explored Kling and Runway.

But I've been thinking about it a lot ever since.

After the Syndicate Wars project, I was left with mixed feelings. While the visual quality impressed me, the lack of character continuity and audio made the result feel hollow. I wondered: could these tools handle actual storytelling with dialogue and emotional beats? When Google Veo 3 launched, I saw my chance to find out.

Lucky for us, Google Veo was just made available to anyone who purchases the Google AI Ultra subscription, launched at Google I/O on May 20th, 2025.

People are already using Veo to create crazy videos so I thought I would give it a shot.
Veo 3 AI-generated video examples showing impressive quality
Additional Veo 3 AI-generated video examples
More Veo 3 AI-generated video examples

Tools and Technology

Before I get into the details of the creative process, here's an overview of the tools I used.

My primary goal was to understand how far I could get with very limited filmmaking knowledge but tons of experience prompting AI models.

I used the following tools:

  • Google Flow with Veo 3 (and Veo 2) for video generation
  • Claude 4 Sonnet for brainstorming and the screenplay
  • ElevenLabs for voice, narration, and sound effects
  • Suno for music (with OpenAI's o3 crafting the style prompt)
  • CapCut for editing and lip syncing

There has been a lot of hype around Google's Veo 3 model and its native sound generation. I found it wasn't as good as I expected, and I ended up needing ElevenLabs to generate the precise audio I wanted. I'm also sure it is going to get MUCH better.

The Creative Process

When making anything with AI, I follow a pretty simple process:

  • I start with an idea.
  • I iterate on the idea with AI.
  • I get myself really excited about the idea.
  • And then I create a roadmap which I break down into smaller steps to execute.
The full process for film creation, however, resembles the following:

Ideate → Screenplay → Video Generation → Edit → Voice & Narration → Music & Sound → Edit & Post-Production

Let's dive into the details!

Ideation & Screenplay Development

In this case, I didn't have a great idea for the film I wanted to make, but I had 12,500 AI credits in Google Flow to play with and knew it was going to need to be short. Why? Generally speaking, you're working with 5-10s video clips generated by today's SOTA models, so anything longer than 90 seconds means stitching together a dozen or so generations - a real challenge.

So to help me, I used Claude 4 Sonnet to do some brainstorming. Claude said Saki's "The Open Window" was a masterclass. Who was I to be particular? I picked it.
Claude 4 Sonnet brainstorming session for short film ideas
Claude 4 Sonnet explanation of The Open Window story
So now I had a story (read it here). I needed to create a screenplay. Based on my Sora experience, I needed a screenplay that would be simple to convert into a video generation script, and this time I didn't have a prior video to "copy" as a guide. So back to Claude! I asked Claude to write a screenplay for the story.
Claude 4 Sonnet screenplay for The Open Window
I was impressed with the quality of the screenplay. It was a bit long, but I was able to trim it down and refine it as I went along.

The original story's twist relies entirely on dialogue. For film, I needed visual cues. I added:

  • Subtle elements like the "ghost strobe" transition effect in CapCut
  • The scenes growing darker and more ethereal as the story progresses until the end
  • A subtle sound design where "rustling wind" is always present

My hope was that these elements would let viewers feel the growing dread even with minimal audio.

Video Generation

You'll notice that I was specific in my screenplay. I wanted to make sure that video generation would be able to handle the scenes, so the screenplay produced both visual prompts and audio prompts.

Google Flow is a pretty powerful tool but it's also barebones and finicky. It has a number of different modes:

  • Text to Video
  • Frames to Video
  • Ingredients to Video

I used the Text to Video mode to start, then switched to "Ingredients to Video" to get continuity across generations. Ingredients allows you to add images and scenes as assets that can be used over and over again. I wanted to use the "Jump To" and "Extend" modes in Scene Builder, but they limit your ability to use Veo 3. I had to settle for Veo 2, which seems to be the highest-quality model available when using "Ingredients to Video".

The video generation process is the hardest (though not necessarily longest) part of the entire process because it's the most unpredictable. It's a lot of trial and error, and I had to go back and forth between the screenplay and video generation.


My first attempt at Mrs. Sappleton's reveal was a disaster. I prompted: "A middle-aged woman sweeps in cheerfully, her light-colored dress a stark contrast to the room's gloom. She immediately glances toward the windows with genuine warmth and love, then settles into her chair with practiced ease". And got this...

Failed attempt at generating Mrs. Sappleton character showing inconsistent results

After subtle prompt tweaks, and using "Ingredients to Video" with Vera and Framton as elements, I was able to achieve an effect that didn't make it seem like she was from a different century than Framton.

Google Flow workspace showing the Ingredients to Video interface with prompt and character elements

What follows is a visual representation from my Google Flow workspace. It's only part of it, but it gives you a good idea. One of the interesting things when creating video with Veo is that you can select 1 to 4 output variations at a time. I found the sweet spot to be 2. Each output uses the same amount of credits, so you can quickly chew through credits when selecting 4, and you don't have the benefit of modifying the prompt for each.

Video generation process showing trial and error iterations

There is an option to download videos as 1080p, but I found it a little confusing to use. The 1080p export seemed to create a duplicate in your workspace rather than a simple download; with 50+ video clips, that would have doubled my already cluttered project.

Since CapCut upscales 720p footage reasonably well, I made the pragmatic choice to work in 720p and save my sanity.

Once you start generating acceptable video, it's time to edit.

Editing

As mentioned, I have little experience editing videos professionally. I've been using CapCut for a few months now and it's a pretty good tool that I highly recommend (ByteDance is the company behind it - yes, the same folks who know a thing or two about making videos go viral). Here's a good primer, but it's really pretty intuitive. The features I use most are splitting, transitions, and speed controls, because getting audio to sync up with the video is the hardest part of editing.

My process for editing together scenes is pretty simple:

  • I start with a blank project.
  • I add the first scene.
  • I think about the next scene and other assets I might need.

Unfortunately I don't have screenshots of what my CapCut project looked like at the very beginning. This is roughly what it looked like when I added the first scene (sans all the assets in the media library).

The opening shot of Framton approaching the house was actually not scene one to begin with.

CapCut project showing the first scene added to the timeline

Voice & Narration

I realized pretty quickly that I was not going to get reliable voice generation from Google Flow. For the second and third scenes in the video I was able to get sound effects but not voices, and I decided those scenes didn't need them anyway.

I decided to use ElevenLabs to generate the voice, narration and sound effects for the film. I've been using it a lot for a variety of side projects and it's quite easy to use.

Based on what I would have asked Veo 3 to generate, I searched ElevenLabs for the voices I thought made the most sense and typed in the script. You can see all of my generations here. Luckily they're deterministic!

ElevenLabs interface showing voice generation projects and exports
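For those who prefer scripting to the web UI, here's a minimal sketch of the same step using the ElevenLabs Python SDK. The voice ID, model choice, and output file name are placeholders of mine, not the film's actual settings; the line of dialogue is from Saki's original story.

```python
# Minimal sketch: generating one line of dialogue with the ElevenLabs
# Python SDK. Voice ID, model, and file name are illustrative placeholders.
from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

audio = client.text_to_speech.convert(
    voice_id="YOUR_VOICE_ID",  # pick a voice from the ElevenLabs library
    model_id="eleven_multilingual_v2",
    text="My aunt will be down presently, Mr. Nuttel.",
)

# The SDK streams the audio back as byte chunks.
with open("vera_line_01.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
```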
Sound effects are also quite easy to generate with ElevenLabs. Here are some that appear in the film.
ElevenLabs sound effects generation interface showing various audio samples
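The sound effects step is scriptable the same way. A sketch, assuming the SDK's text-to-sound-effects endpoint; the prompt text and duration are my stand-ins rather than the film's actual assets:

```python
# Sketch: generating a sound effect with the ElevenLabs Python SDK.
from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

effect = client.text_to_sound_effects.convert(
    text="wind rustling through tall grass near an open window",
    duration_seconds=8,
)

with open("rustling_wind.mp3", "wb") as f:
    for chunk in effect:
        f.write(chunk)
```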

So really it ends up being a matter of deciding what you need and can get from one tool and compensating with the others.

There is a lot of back and forth refinement. And you can get really creative when editing!

For example, I have this audio clip I used when the window opens and Vera starts to speak. There is a rustling of wind in it. I ended up copying it and repeating it over and over again to give the entire film an atmospheric effect.

My biggest "happy accident" came from ElevenLabs. The wind sound effect I generated for the window opening in scene four contained this subtle whistling that sounded almost like distant voices. Instead of generating clean wind sounds, I:

  1. Copied this "haunted" wind clip 6 times
  2. Adjusted pitch slightly on each (-5%, -3%, 0%, +2%, +4%, +6%)
  3. Panned them across the stereo field

The result: an unsettling atmospheric bed that made most scenes feel ominous. As is the case with AI in my experience, the best creative decisions come from embracing the "mistakes."
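I did all of this layering by hand in CapCut, but the same trick can be scripted. Here's a rough sketch with pydub; the resample-based pitch shift is approximate and the file names are placeholders:

```python
# Sketch of the layered "haunted wind" bed: six copies of one clip,
# each pitch-shifted slightly and panned across the stereo field.
from pydub import AudioSegment

wind = AudioSegment.from_file("haunted_wind.wav")

def pitch_shift(seg: AudioSegment, percent: float) -> AudioSegment:
    """Crude pitch shift by resampling (also stretches duration slightly)."""
    new_rate = int(seg.frame_rate * (1 + percent / 100.0))
    shifted = seg._spawn(seg.raw_data, overrides={"frame_rate": new_rate})
    return shifted.set_frame_rate(seg.frame_rate)

shifts = [-5, -3, 0, 2, 4, 6]             # percent, as in the list above
pans = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]  # -1 = hard left, +1 = hard right

# Start from the lowest (longest) copy so the overlays aren't truncated.
bed = pitch_shift(wind, shifts[0]).pan(pans[0])
for pct, pan in zip(shifts[1:], pans[1:]):
    bed = bed.overlay(pitch_shift(wind, pct).pan(pan))

bed.export("atmospheric_bed.wav", format="wav")
```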

Music

One of the most important parts of the film is the music. I used Suno to generate it. I've been making tons of custom music for side projects and describe my simple trial-and-error generation process here.

In this case, I fed the entire script into OpenAI's o3 model and then used the output as the input styling in Suno. Make sure to turn on instrumental mode when doing this.

OpenAI o3 model output for music styling based on The Open Window script
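I did this in the chat UI, but a scripted version would look something like the sketch below; the prompt wording and file name are my assumptions, not what I actually typed.

```python
# Sketch: condensing the screenplay into a Suno style prompt with the
# OpenAI Python SDK. Prompt wording and file name are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

screenplay = open("the_open_window_screenplay.txt").read()

response = client.chat.completions.create(
    model="o3",
    messages=[{
        "role": "user",
        "content": (
            "Condense this screenplay into a short style prompt for an "
            "instrumental film score (genre, tempo, instrumentation, mood):\n\n"
            + screenplay
        ),
    }],
)

print(response.choices[0].message.content)  # paste into Suno's style field
```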
I've used this process a few times before. It only took me a few minutes to get the track that you hear in the final film. It was definitely the easiest part of the entire process. I did slow it down a little bit in Post Production.
Suno music generation interface showing The Open Window soundtrack variations

Post Production

Editing is of course where the real magic happens (as any real filmmaker will likely tell you). As you go, you build up a lot of assets and need to "do stuff with them".

My first cut of the video didn't have audio lip syncing. I also had to figure out how to work around the limitations of the video generation process a number of times where I couldn't get a great scene extension.

This is what the film looks like in CapCut.

CapCut final project showing The Open Window film with all assets and timeline
A few tricks I learned along the way:

  • I didn't know how to create text overlays, but it's very easy: you just place the text on the timeline and animate it in.
  • I didn't know how to create end credits, but they're just keyframes that are pretty straightforward to create.
  • I wasn't able to get audio from Veo 3, so I had to lip sync my audio. CapCut is not nearly the best at this, but I didn't want to add another tool to the stack, so I used it begrudgingly.

On Lip Syncing

At first I thought I was going to be able to do lip syncing with ElevenLabs but they only offer audio dubbing. Perhaps for the international release I'll use that ☺️.

Luckily, CapCut makes lip syncing relatively easy to do, but it's far from perfect. When you click on a video in the Basic menu, there's a button where you can add audio or text for synchronization.

The tool doesn't come with many advanced options, so I had to cut the audio precisely and trim the video carefully to avoid obvious mismatches. Even with this careful approach, the results still look somewhat artificial, but that's a limitation of current AI filmmaking tools in non-expert hands (read: me!).

CapCut's "AI Generate" watermark was my biggest post-production headache. This appears on any video you use lip syncing for. See where I'm pointing to it in the screenshot with the red arrow. My first attempt to remove it with a blur effect made characters look like they had facial injuries. I tried cropping, but lost important visual information. Ultimately, I ended up exporting the video and adding another mask over the entire video to hide it.

CapCut lip syncing interface showing the audio and video to be synced

The Final Result

Below are the first cut (no lip syncing) and the final cut. I added a few more flourishes and sound effects too... Do you notice them?

In terms of time and cost, I spent about 10 hours on the filmmaking directly. Breakdown by process stage is roughly:

  • Ideation & Screenplay Development: 1 hour
  • Video Generation: 2 hours
  • Editing: 6 hours
  • Voice & Narration: 1 hour

Total Project Cost: ~$100

  • Google Flow credits used (8,000 of 12,500): ~$80
  • ElevenLabs generations (3 unique voices): $5
  • Suno tracks (tried 4, used 1): $0.25
  • CapCut Pro (1 month for lip sync feature): $20
  • My time: 10 hours (but priceless learning experience) and a bunch more writing this post

Pretty impressive if I do say so myself!

The Open Window: First Cut (No Lip Syncing)

The Open Window: Final Cut

I've shared "The Open Window" on social media to get feedback from the community. Join the conversation:

What Didn't Make the Cut in my Adaptation

Not everything worked on the first (or fifth) try. Here are some of my favorite failures:

  • Framton's dialogue: I tried many times to get this right and be "truer" to the story but it just never sounded right.
  • The hunting party return: Veo kept generating modern hunters with rifles instead of Victorian-era ones (which is what I was going for).
  • Mrs. Sappleton's tea service: Every attempt looked like a beer commercial... so no tea in this version of The Open Window.
  • Original ending: I tried to show Framton running away in panic, but it looked like a Benny Hill sketch, so it's a little out of place.

Reflections on AI in Filmmaking

Creating "The Open Window" taught me that AI in filmmaking in 2025 is very much like what is happening with software creation and AI in general. It isn't going to replace traditional filmmaking. The "aha" moment is in the wide democratization of access to creative tools. What struck me most was how the process forced me to become a better storyteller. Every creative decision becomes intentional and subject to a bit of creative emergence.

The biggest lesson: AI tools for film and video are incredible accelerators, but they're terrible at consistency. Google Veo 3 could generate stunning individual shots, but maintaining character continuity across scenes required constant workarounds (Note: this is why many Veo 3 videos you see are single clips with different subjects in each strung together artistically). ElevenLabs delivered perfect voice generation, but lip syncing in CapCut still looked artificial. Each tool excelled in isolation but struggled in coordination.

What surprised me was how much traditional filmmaking knowledge became essential. Understanding pacing, shot composition, and audio design was critical for making AI-generated content feel cohesive - which made me acutely aware of how much I still have to learn. The tools gave me superpowers, but I still needed to learn how to use them.

Looking ahead, I'm convinced we're witnessing the birth of a new creative medium. AI filmmaking will create entirely new categories of content and storytellers. The Oscar winner made 100% with AI is inevitable.

For now, though, AI filmmaking remains beautifully imperfect (do follow the AI Video subreddit, because it's amazing). The artifacts, the slightly off lip sync, and the occasional uncanny valley moments are features that remind us we're watching something genuinely new. For me, that's part of what makes it so exciting.

About this article

We're living through a transition that will feel historic very soon. So I documented it - step by step, tool by tool, problem by problem. 10 hours creating "The Open Window" with AI. Many more hours writing about what actually happened. The documentation matters more than the film. This article explores the creative possibilities, technical challenges, and surprising lessons learned when combining Google Veo 3, ElevenLabs, Suno, and CapCut to produce a complete short film.