With Sora, the model recently introduced by OpenAI, users can create highly realistic videos under one minute long from just a few text commands.
After the “craze” surrounding ChatGPT, OpenAI continues to stir the global tech community with Sora, a model capable of generating short videos, under one minute long, with impressive realism from just a few commands.
On social media platform X, many users expressed their amazement at the video quality produced by this new AI model. Not only are the visuals realistic, but many short videos also exhibit physical simulations that closely resemble reality.
“This could be the moment that leaves everyone in awe of AI,” commented Tom Warren, editor at The Verge.
Of course, upon closer inspection, users can still spot errors in the videos. The clips released by OpenAI so far are very short, under 30 seconds, and longer videos will likely contain more mistakes. Still, even these few short samples have allowed many users to envision contexts where Sora could be applied, such as illustrative clips.
In addition to creating videos from user descriptions, Sora can produce videos inspired by any image, extend existing videos, or autonomously fill in missing frames.
Sora, in Japanese, means “sky.” The development team behind this technology, including researchers Tim Brooks and Bill Peebles, chose this name because it “evokes the idea of limitless creative potential.”
Unparalleled Realism
On its homepage, OpenAI states that Sora can create videos up to 60 seconds long with highly detailed scenes, complex camera movements, and multiple characters displaying vivid emotions.
The Sora model introduced by OpenAI has the ability to generate short videos under one minute with high realism using just a few commands. (Image: OpenAI).
Below that, the company illustrates with a sample command: “A bustling Tokyo city covered in snow. The camera moves through the busy streets, following a few people enjoying the beautiful snowfall and shopping at nearby stalls. Beautiful cherry blossom petals swirl in the wind alongside the falling snow.”
After processing, the AI model returns an astonishing video featuring the unmistakable landscape of Tokyo and the breathtaking moment when snowflakes and cherry blossoms appear together in the same frame.
Without close scrutiny, users would find it hard to tell this is an AI-generated video. The virtual camera, as if mounted on a drone, tracks a couple leisurely strolling through the street scene.
One of the passersby is wearing a mask. Cars rush by on the riverside road to the left, while shoppers on the right enter and exit a row of small shops.
Sam Altman, CEO of OpenAI, created a video for followers with the request “grandma’s cooking class in a Tuscan-style kitchen.” (Image: Sam Altman).
The video, generated from a simple command, quickly attracted over 30 million views on platform X. In the comments, many users expressed astonishment at its realism.
CNBC suggests that video could be the next frontier for generative AI after chatbots and image generators have successfully penetrated the consumer and business world.
Aside from exciting AI enthusiasts, this new technology also raises serious concerns about the spread of fake news, especially as major political elections approach globally.
According to data from the machine learning company Clarity, the number of AI-generated deepfakes has increased by 900% year over year.
A Wake-Up Call for the Film Industry
OpenAI, the company behind the ChatGPT chatbot and DALL-E image generation software, is just one of many big names in tech racing to perfect this instant video creation model.
In February 2023, a company named Runway introduced a completely new AI technology that transforms text into video.
Visual effects artist Evan Halleck from the Oscar-winning film Everything Everywhere All At Once even admitted that Runway’s AI tools have optimized his work.
The video generated by AI from the description “a fashionable woman walking down a street in Tokyo.” (Image: OpenAI).
“I can cut characters and arrange them neatly on a still image in minutes, compared to half a day of work,” the artist shared.
Experts believe that the application of AI in the entertainment industry is growing significantly. From the development of deepfake technology to AI being used for scriptwriting, artificial intelligence is gradually infiltrating the film production process.
“It’s faster and cheaper than manual labor. In my view, visual effects is a very time-consuming and labor-intensive process, so it’s great that so much of it is being automated,” he continued.
The New York Times suggests that AI could accelerate the work of seasoned filmmakers while completely replacing less experienced digital artists.
While Sora’s footage is undoubtedly impressive, that is not all the model has to offer. Wired notes that the most surprising aspect of Sora is its capabilities beyond what it was explicitly trained to do.
Specifically, Sora not only generates videos based on user descriptions, but it also seems to have a clear understanding of cinematic language.
The AI tools from Runway have optimized the work for the visual effects team of Everything Everywhere All At Once. (Image: A24).
Additionally, one Sora feature the OpenAI team has not yet publicly showcased is its ability to create videos from a single image or a series of frames.
“This will be a truly exciting way to enhance storytelling. You can accurately draw what you have in mind and then bring it to life,” said Tim Brooks, a research scientist on the project.
According to Bill Peebles, another researcher on the project, OpenAI is aware that this feature could also be misused to create misinformation.