Veo AI Video Generator: Google’s Text-To-Video AI Model
14 May 2024 was a big day for Google: they hosted their I/O conference, the annual event where they unveil new products and innovations. During that conference, they showed off Project Astra, an advanced agent that can see, hear, and respond in real time. Google also announced their new text-to-video model, Veo.
I’ve already published a video on that, so I definitely recommend taking a look at it; it covers Google’s push to build the future of AI assistants.
What is Veo Video Generator?
Another major release Google announced was Veo, their most capable generative video model and a direct competitor to OpenAI’s Sora. It can create high-quality 1080p clips that run beyond 60 seconds.
Demo Clips Showcase
Just take a look at these demo clips. The prompt that was given to generate this video clip was “Many spotted jellyfish pulsating under water. Their bodies are transparent and glowing in deep ocean.”
For this clip, the prompt was “time lapse of water lily opening, dark background,” and the result is a stunning 1080p clip of the water lily opening up.
The prompt for this clip was “a lone cowboy rides his horse across an open plain at beautiful sunset, soft light, warm colors.” The result is quite remarkable to see.
Veo AI: Advancing Video Generation
Veo is an advanced video generation model developed by Google DeepMind, capable of producing high-quality, high-resolution videos in a wide range of cinematic styles.
It can generate clips beyond the typical one-minute limit and excels at understanding both natural language and visual semantics.
Understanding User Prompts
This allows it to accurately interpret user prompts and render detailed footage like the clips shown above, footage that can align closely with your own creative vision.
Revolutionizing Filmmaking with AI
Google’s launch video features filmmakers working with Veo. As one of them puts it: “I’ve been interested in AI for a couple of years now. We got in contact with some of the people at Google, and they had been working on something of their own. So, we’re all meeting here at Gilgal Farms to make a short film.”
Google DeepMind’s Generative Video Model
The core technology is Google DeepMind’s generative video model, which has been trained to convert input text into output video, and the results look good. As the team describes it, it lets them bring ideas to life that were otherwise not possible and visualize things on a timescale ten or a hundred times faster than before.
Enhanced Creativity and Iteration
When you’re shooting, you can iterate as much as you wish, and so we’ve been hearing feedback that it allows for more optionality, more iteration, more improvisation. What’s cool about it is you can make a mistake faster. That’s all you really want at the end of the day, at least in art, just to make mistakes fast.
By using Gemini’s multimodal capabilities to optimize the model training process, Veo is better able to capture the nuances of a prompt, including cinematic techniques and visual effects, giving you a great deal of creative control.
Empowering Everyone as Directors
Everybody’s going to become a director, and everybody should be a director because at the heart of all of this is just storytelling.
The closer we are to being able to tell each other our stories, the more we’ll understand each other. These models are really enabling us to be more creative and to share that creativity with each other.
How to Get Access to Veo?
Now, if you are interested in trying Veo, you can do so by signing up for VideoFX, the experimental tool built on it.
Accessing AI Test Kitchen
Signing up takes you to AI Test Kitchen, which is where you can join waitlists for the various experimental AI projects Google offers.
Join the Waitlist
Click “Join our waitlist.” You’ll then be taken to a short form where you fill in basic information: first name, last name, email, how you found out about it, country, age, and what you plan to do with the tool.
Await Confirmation
Once approved, you’ll receive an email back from Google DeepMind granting access. This can take anywhere from a few days to a week, or even a couple of months, depending on who they’re giving access to.
Veo’s Integration with YouTube Shorts
Something cool to note is that some of Veo’s capabilities will be coming to YouTube Shorts, which opens up interesting possibilities for what you’ll actually be able to do with it.
Veo’s AI Models
Simply put, Veo builds on Google’s years of generative video work, drawing on models such as the Generative Query Network (GQN), DVD-GAN, and its earlier image and video generation models, among others.
It also leverages Google’s Transformer architecture as well as Gemini.
Enhanced Prompt Understanding
To make Veo better at understanding prompts, Google enhanced the detail of the captions attached to each video it learns from.
Veo also uses high-quality, compressed representations of video (known as latents), which makes generation more efficient and improves the overall quality of the output.
At a high level, the basic flow passes a prompt through different encoders and Transformer blocks to produce these compressed representations, which are then decoded back into video frames.
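Google hasn’t published Veo’s full architecture, so purely as an illustration of that encoder → latent → decoder flow, here is a minimal conceptual sketch in Python. Every function here is a hypothetical stand-in (random projections in place of real neural networks), not Veo’s actual components or any Google API:

```python
import numpy as np

# Conceptual sketch only: simplified stand-ins for the stages described above,
# NOT Google's actual Veo architecture or API.

def encode_text(prompt: str, dim: int = 64) -> np.ndarray:
    """Stand-in text encoder: deterministically map a prompt to an embedding."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.standard_normal(dim)

def generate_latents(text_embedding: np.ndarray, frames: int = 16) -> np.ndarray:
    """Stand-in for the Transformer-based generator that produces a compressed
    (latent) video representation conditioned on the prompt embedding."""
    rng = np.random.default_rng(0)
    noise = rng.standard_normal((frames, text_embedding.shape[0]))
    # Conditioning is just an additive bias here; real models refine the
    # latents over many steps rather than in a single pass.
    return noise + text_embedding

def decode_video(latents: np.ndarray, height: int = 8, width: int = 8) -> np.ndarray:
    """Stand-in decoder: expand each compressed latent frame into pixel frames."""
    frames, dim = latents.shape
    rng = np.random.default_rng(1)
    projection = rng.standard_normal((dim, height * width * 3))
    video = latents @ projection
    return video.reshape(frames, height, width, 3)

prompt = "time lapse of a water lily opening, dark background"
video = decode_video(generate_latents(encode_text(prompt)))
print(video.shape)  # (16, 8, 8, 3): frames x height x width x RGB
```

The point is only the shape of the pipeline: a prompt becomes an embedding, that embedding conditions a compact latent sequence, and a decoder expands those latents into pixel frames.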
Veo vs. Sora:
| Feature | Veo | Sora |
|---|---|---|
| Organization | Google DeepMind | OpenAI |
| Architecture | Builds on prior Google models (e.g., GQN, DVD-GAN) plus Transformers and Gemini | Diffusion Transformer (few public details) |
| Integration | Coming to YouTube Shorts | Not announced |
| Enhanced prompt understanding | Yes, via more detailed training captions | Not specified |
| Availability | Waitlist via VideoFX / AI Test Kitchen | Limited preview (red teamers and select creators) |
| Testing | Broader public testing anticipated | Continuous development and testing |
| User engagement | Awaiting access to explore its potential | Potential for experimentation once access opens |
Conclusion:
I hope you enjoyed today’s article and got some value from learning about this new generative AI video model.