Can DALL-E 3 fix the problem with image generators?
- AI image generators simplify image generation.
- However, concerns about bias remain a problem for AI image-generating tools.
- OpenAI’s DALL-E 3 is expected to fix these problems.
Image generators have been around for more than a decade, although the images they produced early on had many issues.
As the technology improved, AI image generators found more use cases. From mobile phone apps to editing applications, they could do much more, especially in photo editing, such as producing higher-definition versions of camera photos by enhancing certain features in a picture.
Mobile phone companies went all out to highlight the AI image capabilities in their devices. Beautification features on cameras were popular among customers, while animated pictures also started getting more traction.
Despite this success, the AI was still not capable of creating an image from instructions alone. Most of the time, the generated image was based on an existing photo: the technology merely re-edited the picture and improved its features.
Generative AI changed all this. When OpenAI introduced DALL-E in January 2021, the tool aimed to generate images based on text. Simply put, DALL-E could create images in multiple styles by manipulating and rearranging objects in its images.
The generative AI model incorporates ideas from language and image processing. It is trained on a sizeable dataset of image-text pairs, from which it learns the link between visual information and its written description.
Today, many generative AI image generators, available for free or through a subscription, offer unique capabilities. Examples include Canva, Deep AI, WePik, and many more. DALL-E powers some of these, while others leverage text-to-image tools by Google, Meta, and AWS. For Google, there is Imagen on Vertex AI, while for AWS, there is AI Image Synthesis.
The problems of image generators
While OpenAI’s DALL-E seemed revolutionary, problems soon arose. Because the generative AI model was trained on a sizeable amount of data, some of the images it generated reflected biases in that data and caused concern among users.
Just as facial recognition tools faced racial bias issues when they were first launched, image generators ran into similar problems. Even when OpenAI released DALL-E 2, an improved version of its image generator, user reviews showed that the tool produced images that reflected societal stereotypes.
According to a report by NBC News, one sign of bias was how the tool tied genders to specific roles. For example, given the caption “a builder,” the generative AI model produced only images featuring men, while the caption “a flight attendant” produced only images of women. Aware of this, OpenAI published a Risks and Limitations document noting that “DALL-E 2 additionally inherits various biases from its training data, and its outputs sometimes reinforce societal stereotypes.”
The biggest concern surrounding image generators is deepfake content. Users will always find ways to use technology for the wrong reasons: right now, some users are using image generators to generate deepfake images of famous people.
Deepfakes of celebrities and politicians continue to be generated and used to spread misinformation and even scam users worldwide. There is also growing concern about deepfake porn images created with image generators. While OpenAI has taken steps to ensure its AI models do not generate such images, other image-generating tool providers still allow them.
It’s not all doom and gloom
However, none of this stopped DALL-E 2 from being used by businesses worldwide. OpenAI statistics show that over three million people are already using DALL-E to extend their creativity and speed up workflows. With over four million images generated daily, OpenAI has also enabled DALL-E to be integrated directly into apps and products through its API, which allows developers to start building with the same technology in minutes.
Microsoft, for example, is bringing DALL-E to its Designer application, a graphic design app that helps users create professional-quality social media posts, invitations, digital postcards, graphics, and more. Microsoft has also integrated DALL-E into Bing and Microsoft Edge with Image Creator.
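For developers curious about what integrating DALL-E through the API might look like, here is a minimal Python sketch using only the standard library. It assumes OpenAI's public image generation REST endpoint and its `model`, `prompt`, `n`, and `size` fields as commonly documented; the function names `build_image_request` and `generate_image_urls` are hypothetical, and the details may differ from the current API, so treat this as an illustration rather than OpenAI's reference code.

```python
import json
import os
import urllib.request

# Assumed public REST endpoint for OpenAI image generation.
IMAGES_URL = "https://api.openai.com/v1/images/generations"


def build_image_request(prompt, api_key, model="dall-e-2", size="1024x1024", n=1):
    """Build (but do not send) an HTTP request asking for generated images."""
    payload = {"model": model, "prompt": prompt, "n": n, "size": size}
    return urllib.request.Request(
        IMAGES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_image_urls(prompt, api_key):
    """Send the request and return the image URLs found in the JSON response."""
    request = build_image_request(prompt, api_key)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes the default response shape: {"data": [{"url": "..."}, ...]}
    return [item["url"] for item in body["data"]]


if __name__ == "__main__":
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        print(generate_image_urls("a watercolor painting of a lighthouse", key))
    else:
        print("Set OPENAI_API_KEY to send a real request.")
```

Keeping request construction separate from sending means an app can inspect or log the prompt and settings before any image is actually generated.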
In filmmaking, The Frost is the first film to be made entirely by AI. The filmmakers used DALL-E 2 to generate every single shot in the film. Then they used D-ID, an AI tool that can add movement to still images, to animate these shots, making eyes blink and lips move.
These are just some of the capabilities of DALL-E 2. And just as everyone thought they’d seen it all, OpenAI unveiled the next version of the technology, DALL-E 3.
Perfecting image generators
According to OpenAI, DALL-E 3 understands significantly more nuance and detail than previous systems, allowing users to translate ideas into images quickly. Like the earlier versions, DALL-E 3 generates images based on detailed prompts.
Improvements in this version allow users to ask ChatGPT to tweak an image with just a few words if they are unhappy with the result. For example, suppose a user asks the image generator for a picture of the sun. If the user then wants the image to look brighter and more majestic, a few follow-up prompts are all that is needed to generate the desired image.
The new tool will be available to ChatGPT Plus and Enterprise customers in early October. On copyright concerns, OpenAI has stated that images created with DALL-E 2 and 3 belong to the user, and they don’t need OpenAI’s permission to reprint, sell, or merchandise them.
OpenAI has also taken steps to limit DALL-E’s ability to generate violent, adult, or hateful content. “DALL-E 3 has mitigations to decline requests that ask for a public figure by name. We improved safety performance in risk areas like the generation of public figures and harmful biases related to visual over/under-representation in partnership with red teamers—domain experts who stress-test the model—to help inform our risk assessment and mitigation efforts in areas like propaganda and misinformation,” stated OpenAI.
At the same time, OpenAI is researching the best ways to help users identify when an image was created with AI. The company is experimenting with a provenance classifier, a new internal tool that can help determine whether or not an image was generated by DALL-E 3, and hopes to use this tool to better understand how generated images might be used.
“DALL-E 3 is designed to decline requests that ask for an image in the style of a living artist. Creators can now also opt their images out from training of our future image generation models,” added OpenAI.
Other tech companies have also launched similar tools to deal with copyright issues. These include digitally watermarking the content created by generative AI. Google’s SynthID, for example, generates an invisible digital watermark for AI-generated images.
Are dreams collapsing?
In the example demonstrated for DALL-E 3, the generative AI model creates images based on prompts. But here’s the thing: the technology produces images based on what it has been fed and what it believes is the best representation of the prompt.
While this does get the job done, one wonders what happens to the idea of creativity in art. For example, having a kid draw an image of what happiness is to them and getting an AI to generate an image of joy can be very different. While the technology will no doubt generate the correct type of image, is it the image that truly represents happiness to a kid?
Some philosophers believe that relying too much on AI to get work done slowly makes humans forget what it’s like to think for themselves. In the future, humans could eventually rely on technology to do everything for them, from picking what outfits to wear and planning their schedules to deciding what they should eat. This is already happening, as users can ask ChatGPT to do all of this.
For images, while OpenAI has reiterated that its tools can help artists enhance their image skills, it is still not the same as generating content that is genuinely based on the vision and message of an artist.
At the end of the day, while ChatGPT and DALL-E 3 can do wonders in improving the way we work and communicate, the reality is that the final decision should remain with us. We need to understand that not everything generated by the technology is absolute. Creativity in image generation comes from one’s vision and not from images suggested by technology.
As Michelangelo said, “I saw the angel in the marble and carved until I set him free.” This is something AI image generators may never be able to comprehend.