OpenAI is beginning to roll out new voice and image capabilities in ChatGPT


ChatGPT now capable of voice conversations

  • OpenAI to roll out voice and image features on ChatGPT.
  • Users can have real-time voice conversations with the chatbot. 
  • Service to be available to Plus and Enterprise customers. 

It was inevitable, wasn’t it? ChatGPT can now see, hear and speak. That’s right, OpenAI’s generative AI chatbot will not only be able to understand voice commands but also have its own voice to communicate with users.

ChatGPT, already one of the most popular AI tools on the planet, will now be capable of handling many more use cases with the new features announced. According to a blog post by OpenAI, the updates offer users a new, more intuitive type of interface.

Users can snap a picture of anything they want, from landmarks to scenic views, and have a live conversation with the chatbot on what’s interesting about the place.

Apart from voice, ChatGPT users can now also get suggestions based on pictures they take. For example, a user can take a picture of their fridge and pantry and ask ChatGPT to figure out what they should have for dinner.

“After dinner, help your child with a math problem by taking a photo, circling the problem set, and having it share hints with both of you,” reads one example shared by OpenAI.

Regarding the technology enabling these features, OpenAI explained that the new voice capability is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech.

“We collaborated with professional voice actors to create each of the voices. We also use Whisper, our open-source speech recognition system, to transcribe your spoken words into text,” stated the company.
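To make the flow OpenAI describes more concrete, here is a minimal Python sketch of a single voice turn: speech is transcribed to text, a language model produces a reply, and a text-to-speech step turns that reply into audio. This is an illustration only, not OpenAI's implementation. The names `run_voice_turn`, `VoiceTurn` and the three `fake_*` stand-ins are hypothetical, and the stand-ins simply pass text around as bytes instead of calling Whisper, GPT-4 or any real text-to-speech model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """One round of a voice conversation (hypothetical structure)."""
    transcript: str     # what the user said, as text
    reply_text: str     # what the chatbot answers, as text
    reply_audio: bytes  # the spoken version of the answer


def run_voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> VoiceTurn:
    """Chain the three stages: speech-to-text, reply, text-to-speech."""
    transcript = transcribe(audio)           # role played by a speech recognizer
    reply_text = generate_reply(transcript)  # role played by a language model
    reply_audio = synthesize(reply_text)     # role played by a text-to-speech model
    return VoiceTurn(transcript, reply_text, reply_audio)


# Stand-ins for illustration only: they treat the "audio" as UTF-8 text
# so the example runs without any model or network access.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate_reply(text: str) -> str:
    return f"You asked: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    turn = run_voice_turn(
        b"What should I cook tonight?",
        fake_transcribe,
        fake_generate_reply,
        fake_synthesize,
    )
    print(turn.transcript)
    print(turn.reply_text)
```

Keeping each stage as a separate function mirrors the description above, where speech recognition and speech generation are handled by different models; in a real system each stand-in would be replaced by a call to the corresponding model.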

Meanwhile, the image understanding is powered by multimodal GPT-3.5 and GPT-4. These models apply their language reasoning skills to a wide range of images, such as photographs, screenshots, and documents containing both text and images.

Can ChatGPT voice and image features be used for the wrong reasons?

“OpenAI’s goal is to build AGI that is safe and beneficial. We believe in making our tools available gradually, which allows us to make improvements and refine risk mitigations over time while also preparing everyone for more powerful systems in the future. This strategy becomes even more important with advanced models involving voice and vision,” stated OpenAI.

However, as with any new technology update, there will definitely be concerns regarding the voice and image features of ChatGPT. The first thing that comes to mind is privacy. If a user can just snap any picture and discuss it with the AI, what’s going to stop someone from snapping a picture of a person or location for nefarious reasons?

While the new voice technology opens doors to many creative and accessibility-focused applications, OpenAI has acknowledged that these capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud.

“This is why we are using this technology to power a specific use case—voice chat,” stated OpenAI.

The company also highlighted that before enabling broad deployment of both voice and image features, it tested the model with red teamers for risk in domains such as extremism and scientific proficiency.

“We’ve also taken technical measures to significantly limit ChatGPT’s ability to analyze and make direct statements about people since ChatGPT is not always accurate and these systems should respect individuals’ privacy,” added OpenAI.

Both voice and images in ChatGPT are expected to be rolled out to Plus and Enterprise users over the next two weeks. Voice features are also expected to be available on iOS and Android while images will be available on all platforms.

ChatGPT can now have a conversation with you.

It’s all about AI

The timing of OpenAI’s new voice and image features for ChatGPT couldn’t be better. The generative AI industry has seen a boom in investments over the last few days, after what many felt was a rather slow summer.

In fact, ChatGPT’s usage took a slight dip during the summer, as most users were believed to be on vacation – perhaps a trip the chatbot was used to plan.

At the same time, Amazon has just announced an investment of up to US$4 billion in OpenAI’s competitor Anthropic, while Google has just unveiled new features for its Bard chatbot.

Still, OpenAI’s new updates seem to be taking the world by storm again, just as the company did in November last year when it released ChatGPT. As the innovation continues to improve the technology and more use cases are developed, it shows no signs of slowing down.

AI usage in organizations is increasing, and businesses will no doubt find ways to incorporate the new features into their workflows as well. The only question now is how much the technology will dictate business decisions, and even lifestyles, in the future.