Robin Li, Co-founder, Chairman and CEO of Baidu, delivers a speech and presents demos at the ERNIE Bot press conference.

Baidu’s ERNIE Bot hoping to be a worthy competitor to Google and Microsoft

  • ERNIE Bot is a new-generation large language model and generative AI product developed by Baidu.
  • The AI product showcases proficiency in five generative use cases: literary creation, business writing, mathematical calculation, Chinese language understanding, and multi-modal generation.
  • ERNIE Bot is currently accessible only to invited users, and enterprise clients can apply for API access via Baidu AI Cloud.

China has taken a step forward in generative AI with the unveiling of Baidu’s ERNIE Bot. While Microsoft and Google continue to compete with each other on generative AI and to improve their products and services, both companies seem to be focusing mainly on the English-speaking market for now.

While there are claims that the tools can support more languages, Microsoft and Google still have one big hurdle ahead of them: China. Currently, neither Microsoft nor Google makes all of its products available in China. For example, Google Search and Microsoft Bing are not available in the country. However, this does not mean that the Chinese are not taking note of generative AI and its capabilities.

In fact, as Google and Microsoft enhance their products with generative AI capabilities, Chinese companies are also making sure they get a piece of the pie. Several large Chinese tech companies like Baidu, for example, are already working on implementing the technology in their products and services.

Baidu, a leading AI company in China, has just introduced ERNIE Bot, a generative AI product built on a new-generation large language model (LLM). ERNIE Bot excels in a range of areas, including understanding the Chinese language and culture, generating literary and business writing, performing complex mathematical calculations, and producing multi-modal content. The AI product can comprehend human intentions and deliver accurate, logical, and fluent responses approaching the human level.

According to Haifeng Wang, CTO of Baidu, ERNIE Bot is the culmination of years of research and industry practice. This new-generation, knowledge-enhanced LLM is built upon Baidu’s in-house models ERNIE (Enhanced Representation through Knowledge Integration) and PLATO (Pre-trained Dialogue Generation Model). Since its release in 2019, ERNIE has evolved from a natural language understanding model into a model platform with cross-language, cross-modal, cross-industry, and cross-task capabilities.

Baidu trained ERNIE Bot using Supervised Fine-Tuning, Reinforcement Learning from Human Feedback, Prompt Learning, Knowledge Enhancement, Retrieval Augmentation, and Dialogue Augmentation.

Currently, ERNIE Bot is only available to an initial group of users with invitation codes and will soon be made available to more users. Baidu is also offering access to the ERNIE Bot API via Baidu AI Cloud, allowing enterprise clients to apply for and harness the platform’s advanced language capabilities. Since February, over 650 enterprises have joined the ecosystem of ERNIE Bot.

ERNIE Bot demonstrates its multi-modal generation ability, producing text, images, audio, and video from a text prompt.

Robin Li, Baidu Co-founder, Chairman and CEO, commented, “Baidu envisions a future where we join forces with all to drive the evolution of AI, empowering every individual with access to state-of-the-art productivity tools and ensuring that the benefits of these advancements are shared by all.”

ERNIE Bot combines large language models and generative AI

Large language models and generative AI represent a new technological paradigm, presenting an opportunity that no global enterprise can afford to miss. For Baidu, ERNIE Bot is positioned as a foundational AI empowering platform, designed to facilitate intelligent transformations across various industries, such as finance, energy, media, and public affairs.

During ERNIE Bot’s unveiling in Beijing, Li demonstrated its performance in five scenarios. They include:

  • Literary creation: ERNIE Bot summarized the essential content of the popular Chinese science fiction novel, The Three-Body Problem. It proposed five angles for the potential expansion of the story based on dialogue queries, demonstrating its well-rounded expertise in dialogue, analysis, and content generation, as well as its factuality and reasoning bolstered by inherent knowledge graphs.
  • Business writing: Able to serve as a versatile business copywriter, ERNIE Bot demonstrated its ability to construct a brand from scratch, encompassing tasks such as devising a name for a company, crafting an engaging brand slogan, and drafting press releases. This high-level creative capacity is possible because ERNIE Bot is trained on trillions of web pages, tens of billions of search and image data, hundreds of billions of daily voice data, and a knowledge graph of 550 billion facts.
  • Mathematical calculation: ERNIE Bot also possesses a level of cognitive ability, enabling it to master relatively complex tasks, such as mathematical derivation and logical reasoning. When faced with classic puzzles like the “chicken and rabbit in the same cage” problem, which tests human logical reasoning, ERNIE Bot can understand the meaning of the question, develop a correct problem-solving approach, and follow the proper steps to come up with the correct answer.
  • Chinese language understanding: ERNIE Bot demonstrates unparalleled natural language processing (NLP) capabilities in Chinese, which is reflected in its understanding of the Chinese language and cultural nuances. In a demo, ERNIE Bot explained the meaning behind the idiom “Paper is expensive in Luoyang”, which alludes to the high demand for paper due to the popularity of poetry. ERNIE Bot expounded on the economic theory underpinning the idiom, the law of supply and demand, and created a poem using the four Chinese characters of the idiom as the first character of each line.
  • Multi-modal generation: ERNIE Bot can produce text, images, audio, and video given a text prompt and is even capable of delivering voice in several local dialects, such as the Sichuan dialect. The video generation features of ERNIE Bot are not yet available to all users due to their relatively high cost.
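
For readers unfamiliar with the “chicken and rabbit in the same cage” puzzle mentioned above, it reduces to two linear equations: chickens + rabbits = heads, and 2 × chickens + 4 × rabbits = legs. The Python sketch below works through the reasoning using the figures from the classic textbook version of the puzzle (35 heads, 94 legs). These numbers are an assumption for illustration; the article does not say which values were used in the demo, and the function name is hypothetical. It does not represent how ERNIE Bot itself solves the problem.

```python
def solve_chicken_rabbit(heads: int, legs: int) -> tuple[int, int]:
    """Return (chickens, rabbits) for the given head and leg counts.

    Solves:  chickens + rabbits = heads
             2 * chickens + 4 * rabbits = legs
    """
    # If every animal were a chicken, there would be 2 * heads legs.
    # Each rabbit contributes 2 legs beyond that baseline.
    extra_legs = legs - 2 * heads
    if extra_legs < 0 or extra_legs % 2 != 0:
        raise ValueError("no whole-number solution for these counts")

    rabbits = extra_legs // 2
    chickens = heads - rabbits
    if chickens < 0:
        raise ValueError("no whole-number solution for these counts")
    return chickens, rabbits


# Illustrative figures (assumed, not taken from the demo): 35 heads, 94 legs.
print(solve_chicken_rabbit(35, 94))  # (23, 12)
```

With these figures the answer is 23 chickens and 12 rabbits, which can be checked by hand: 23 + 12 = 35 heads and 46 + 48 = 94 legs.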

“Multi-modality is an undeniably future trend for generative AI. In the future, as we continue to refine Baidu’s unified multi-modal large model, ERNIE Bot’s multi-modal generation capabilities will advance,” added Li.

Li also admitted that while the product has some great capabilities, LLMs of this type are still far from perfect. This is why continuous improvement based on real-world user feedback remains essential.

Baidu is not the only company admitting this. Both Google and Microsoft have also acknowledged that the technology is still learning and needs time to reach its full potential.

According to Li, once ERNIE Bot is put into use, Baidu will establish a mechanism in which real-world user feedback, developer calls, and model iterations work in synergy to enhance the model more effectively and efficiently.

Li predicted that LLMs would pave the way for three emerging business opportunities: cloud computing firms offering Model-as-a-Service solutions, companies focusing on fine-tuning sector-specific models, and enterprises creating applications built upon LLMs. He also highlighted that Baidu AI Cloud would soon launch cloud services and application products based on ERNIE Bot, including public cloud and privatized deployment.

“We believe that artificial intelligence (AI) will revolutionize every industry we know today. The immense long-term value of AI and its transformative impact on all aspects of life are only in their infancy. The future holds numerous groundbreaking applications and products, as well as many more milestone events,” Li concluded.

While it is still early days, ERNIE Bot may prove to be a winner in this part of the world and a worthy competitor to Google and Microsoft.