Mark Zuckerberg’s Meta isn’t the only company developing an AI-powered program that can generate video out of text inputs. Google has been working on one, too.
On Wednesday, researchers at the company’s AI lab, Google Brain, debuted Imagen Video, a program that can create realistic-looking video clips from a text input. The system expands Google’s original Imagen program by moving beyond still images to moving pictures, resulting in creative videos that remain largely consistent throughout each frame.
“We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge, including the ability to generate diverse videos and text animations in various artistic styles and with 3D object understanding,” Google researchers wrote in a paper.
Imagen Video can create 5.3-second, 1,280-by-768 resolution videos running at 24 frames per second. Google’s researchers developed the program by training its computer models to identify videos and still images, which were already labeled with a text description. Imagen Video then tries to replicate that imagery in the form of a video when given a text prompt.
“While training on natural video data only enables the model to learn dynamics in natural settings, the model can learn about different image styles (such as sketch, painting, etc.) by training on images,” the paper added. “As a result, this joint training enables the model to generate interesting video dynamics in different styles.”
In total, Imagen Video was trained on an “internal dataset” made up of 14 million videos and 60 million still images, along with another 400 million images in the LAION-400M open dataset. Researchers found the program was smart enough to understand three-dimensional objects and settings, “as it is capable of generating videos of objects rotating while roughly preserving structure.”
That said, the results can be far from perfect. Google researchers uploaded some of the videos the program has created, and they show it can struggle to accurately render complex movements, such as a panda bear eating some bamboo or naval ships moving at sea.
Still, it’s clear Imagen Video could unlock a whole new era of video creation. The program can also produce the video clips in less than a minute. For now, though, Google’s researchers are refraining from releasing the technology to the public. The team has already added safeguards to prevent Imagen Video from creating “fake, hateful, explicit or harmful content.” But the researchers are still worried about the technology promoting stereotypes, given that it was trained on a limited data set of videos and images.
“While our internal testing suggest much of explicit and violent content can be filtered out, there still exists social biases and stereotypes which are challenging to detect and filter. We have decided not to release the Imagen Video model or its source code until these concerns are mitigated,” the researchers wrote.
Meta, on the other hand, plans on eventually releasing its own text-to-video generator to the public once more testing is done. However, all videos created with the program will contain a watermark.