Create amazing content with text-to-video.

A screenshot of a video generated by OpenAI’s Sora. (Source – OpenAI).

OpenAI introduces text-to-video

  • OpenAI unveils Sora, a text-to-video model.
  • Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions.
  • How are copyright issues dealt with in the new twist on the ever-evolving technology?

While text-to-image has been around for some time, generative AI for text-to-video is still being refined by tech companies around the world. With text-to-video, a user gives an AI model a text prompt and has a video generated within minutes.

In the past year, several tech companies have been developing text-to-video capabilities. Today, some applications can scan articles, documents, and similar inputs and generate videos from them. But most of the generated clips rely on stock footage that has been tagged to match certain keywords in the article.
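The keyword-matching approach can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how any particular product works: the clip library, its tags, and the `match_clips` function are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stock-footage library: clip file name -> descriptive tags.
STOCK_CLIPS = {
    "city_traffic.mp4": {"city", "traffic", "cars"},
    "ocean_waves.mp4": {"ocean", "waves", "beach"},
    "office_meeting.mp4": {"office", "meeting", "business"},
}

def match_clips(article_text, clips=STOCK_CLIPS, top_n=2):
    """Rank clips by how often their tags appear as words in the article."""
    # Count every lowercase word in the article.
    words = Counter(re.findall(r"[a-z]+", article_text.lower()))
    # A clip's score is the total number of occurrences of its tags.
    scores = {name: sum(words[tag] for tag in tags) for name, tags in clips.items()}
    # Highest score first; ties are broken alphabetically by file name.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    # Drop clips that share no keyword with the article.
    return [name for name, score in ranked[:top_n] if score > 0]

article = "Traffic in the city slowed as cars queued near the beach."
print(match_clips(article))
```

Because the match is purely on keywords, the selected footage reflects the article's vocabulary rather than its meaning, which is the gap that generative text-to-video models aim to close.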

Content developers have started using these tools to improve their output and to give themselves more options to work with. Yet one problem remained: some of the videos used had copyright issues that needed to be addressed.

In fact, AI-generated content continues to face copyright issues, because the underlying models ingest vast amounts of data and, as leading companies have said, need to take in copyrighted material in order to do the jobs we ask of them. AI-generated content is a new and evolving area of creativity that poses many challenges and opportunities for intellectual property (IP) law, so there is as yet no definitive solution, though the lawyers thank you for your ethical dilemmas.

Different jurisdictions may have different approaches and interpretations of existing laws when it comes to AI-generated content. Some of the main issues that arise are:

  • Authorship and ownership: who is the author and owner of AI-generated content? Is it the human who designed, trained, or used the AI system? Is it the AI system itself? Or is it no one?
  • Infringement and liability: does AI-generated content infringe on the rights of existing human or non-human creators? How can infringement be detected and prevented? Who is liable for any damages caused by AI-generated content? How can liability be allocated and enforced?
  • Ethics and fairness: does AI-generated content respect the moral and economic interests of human or non-human creators? How can AI systems be designed and used in a way that promotes ethical and fair practices in the creative industries? How can AI systems be made transparent and accountable for their actions and outputs?

More recently, tech companies working on AI-generated content have agreed to watermark their output. Unfortunately, digital watermarks alone may not be enough to protect users from deepfakes.

With AI-generated video, there is added concern that text-to-video could lead to the creation of more deepfake content. While tech companies have given assurances that their models are blocked from generating such content, some users still find ways around the restrictions.

OpenAI's Sora is a text-to-video model.

OpenAI’s text-to-video tool

OpenAI, the creator of ChatGPT and the image generator DALL-E, is now testing a text-to-video model called Sora that would allow users to create realistic videos with a simple prompt.

According to a blog post by OpenAI, the new platform is currently being tested. The company also released a few videos showing what it said was already possible, along with the prompts used to generate them.

“Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt. Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world,” OpenAI stated.

OpenAI explained that the model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions. Sora can also create multiple shots within a single generated video while keeping characters and visual style consistent.

OpenAI also acknowledged that the current model has weaknesses. It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark.

“The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory,” explained OpenAI.

In terms of availability, OpenAI CEO Sam Altman said on X that the company was “offering access to a limited number of creators” in a testing phase. He also invited users to suggest prompts on X and posted the convincing results on the platform a few moments later.

These included a short video of two golden retrievers podcasting on a mountain. Another showed a “half duck half dragon (that) flies through a beautiful sunset with a hamster dressed in adventure gear on its back.”

Safety and copyright concerns

Given the increasing concerns about privacy, safety, and copyright with AI-generated content, OpenAI also highlighted that it will be taking several important safety steps ahead of making Sora available in its products.

The company is currently working with red teamers, including domain experts in areas like misinformation, hateful content, and bias, who will be adversarially testing the model. OpenAI is also building tools to help detect misleading content, such as a detection classifier that can tell when a video was generated by Sora. The company plans to include C2PA metadata in the future if it deploys the text-to-video tool in an OpenAI product.

“In addition to us developing new techniques to prepare for deployment, we’re leveraging the existing safety methods that we built for our products that use DALL·E 3, which are applicable to Sora as well.”

“For example, once in an OpenAI product, our text classifier will check and reject text input prompts that are in violation of our usage policies, like those that request extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others. We’ve also developed robust image classifiers that are used to review the frames of every video generated to help ensure that it adheres to our usage policies before it’s shown to the user,” the company mentioned.
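The two checks described in that statement, one on the prompt before generation and one on every frame afterward, can be sketched as a simple gate. OpenAI has not published how its classifiers are implemented or wired together, so everything below is an assumption made for illustration: `moderated_generate`, the toy classifiers, and the blocked-term list are hypothetical stand-ins, not OpenAI's API.

```python
from typing import Callable, Iterable, List, Optional

def moderated_generate(
    prompt: str,
    generate: Callable[[str], Iterable[str]],
    prompt_is_allowed: Callable[[str], bool],
    frame_is_allowed: Callable[[str], bool],
) -> Optional[List[str]]:
    """Return the generated frames, or None if either safety check fails."""
    # Stage 1: reject a disallowed prompt before any video is generated.
    if not prompt_is_allowed(prompt):
        return None
    frames = list(generate(prompt))
    # Stage 2: review every frame before anything is shown to the user.
    if not all(frame_is_allowed(frame) for frame in frames):
        return None
    return frames

# Toy stand-ins (hypothetical); real classifiers would be trained models.
BLOCKED_TERMS = {"violence", "celebrity"}

def toy_prompt_check(prompt: str) -> bool:
    return not any(term in prompt.lower() for term in BLOCKED_TERMS)

def toy_generate(prompt: str) -> List[str]:
    # Frames are represented as strings to keep the sketch self-contained.
    return [f"frame {i} of '{prompt}'" for i in range(3)]

def toy_frame_check(frame: str) -> bool:
    return "unsafe" not in frame

result = moderated_generate(
    "two golden retrievers podcasting on a mountain",
    toy_generate, toy_prompt_check, toy_frame_check,
)
print(result is not None)
```

Checking the prompt first avoids spending compute on a request that would be rejected anyway, while the frame check catches problems that only appear in the generated output.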

OpenAI will also be engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for this new technology.

“Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it. That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time,” it concluded.