The Microsoft Azure Maia AI Accelerator is the first chip designed by Microsoft for large language model training and inferencing in the Microsoft Cloud. (Image by Microsoft).

Microsoft finally builds its own chips

  • Microsoft unveils two new chips it designed to support AI infrastructure.
  • The Microsoft Azure Maia AI Accelerator will be optimized for AI tasks and generative AI.
  • The Microsoft Azure Cobalt CPU will be an Arm-based processor tailored to run general-purpose compute workloads.

The demand for AI workloads has seen Microsoft taking matters into its own hands. As companies look for better AI infrastructure to support the development of their AI use cases, delivering that infrastructure has become a challenge for tech companies around the world.

More AI use cases simply mean more demand for compute power, and more compute power means more data centers and more chips to process those workloads. The question now is whether there are enough chips capable of doing all this.

While the shortage of chips is usually seen as the reason progress in AI becomes difficult or stalls, there are also the rising costs of chips and the challenge of making everything work together with minimal complexity. This includes ensuring that cooling systems can cope with the heat generated in data centers – which, as chips grow more complex, is no longer a certainty.

For Microsoft, AI will be key for the company’s direction in the future, especially in the areas in which it plans to develop solutions for customers. As such, Microsoft has unveiled two of its own custom-designed chips and integrated systems at its Ignite event. The Microsoft Azure Maia AI Accelerator will be optimized for AI tasks and generative AI, while the Microsoft Azure Cobalt CPU will be an Arm-based processor tailored to run general-purpose compute workloads on the Microsoft Cloud.

The Microsoft Azure Maia AI Accelerator is the first chip designed by Microsoft for large language model training and inferencing in the Microsoft Cloud.

“Cobalt is the first CPU designed by us specifically for Microsoft Cloud, and this 64-bit 128-core ARM-based chip is the fastest of any cloud provider. It’s already powering parts of Microsoft Teams, and Azure communication services, as well as Azure SQL. And next year we will make this available to customers,” said Satya Nadella, Microsoft CEO in his keynote address at the event.

“Starting with the Maia 100 design running cloud AI workloads like LLM training and inference, this chip is manufactured on a five-nanometre process, and has 105 billion transistors, making it one of the largest chips that can be made with current technology. And it goes beyond the chip, as we have designed Maia 100 as an end-to-end rack for AI,” added Nadella.

Expected to be rolled out early next year to Microsoft data centers, the chips will initially power the company’s services, such as Microsoft Copilot or Azure OpenAI Service. They will then join an expanding range of products from industry partners to help meet the exploding demand for efficient, scalable and sustainable compute power, and the needs of customers eager to take advantage of the latest cloud and AI breakthroughs.

The Microsoft Azure Cobalt CPU is the first chip developed by Microsoft for the Microsoft Cloud. (Image by Microsoft)

The Microsoft Azure Maia AI Accelerator and Microsoft Azure Cobalt CPU

As the new Maia 100 AI Accelerator is expected to power some of the largest internal AI workloads running on Microsoft Azure, it only made sense for OpenAI to provide feedback on the development as well.

According to Sam Altman, CEO of OpenAI, the company has worked together with Microsoft in designing and testing the new chip with its models. For Altman, Azure’s end-to-end AI architecture, now optimized down to the silicon with Maia, paves the way for training more capable models and making those models cheaper for customers.

Looking at the hardware stack, Brian Harry, a Microsoft technical fellow leading the Azure Maia team, explained that vertical integration – aligning chip design with the larger AI infrastructure that was itself designed around Microsoft’s workloads – can yield huge gains in performance and efficiency.

Meanwhile, Wes McCullough, corporate vice president of hardware product development at Microsoft, pointed out that the Cobalt 100 CPU is built on Arm architecture, a type of energy-efficient chip design, and optimized to deliver greater efficiency and performance in cloud-native offerings. McCullough added that choosing Arm technology was a key element in Microsoft’s sustainability goal: the company aims to optimize performance per watt throughout its data centers, which essentially means getting more computing power for each unit of energy consumed.
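To make that metric concrete, a minimal sketch is shown below – the throughput and power figures are hypothetical, chosen only to illustrate the calculation, and are not numbers Microsoft has published.

```python
# Hypothetical illustration of the performance-per-watt metric Microsoft says it is
# optimizing for. The throughput and power figures below are made up for the example.

def perf_per_watt(throughput_tflops: float, power_watts: float) -> float:
    """Useful compute delivered per unit of power consumed (TFLOPS per watt)."""
    return throughput_tflops / power_watts

# Two hypothetical accelerators: the second delivers more compute for each watt drawn,
# which is the property being optimized across the data center.
print(perf_per_watt(1000.0, 700.0))  # ~1.43 TFLOPS per watt
print(perf_per_watt(1200.0, 500.0))  # 2.4 TFLOPS per watt
```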

A custom-built rack for the Maia 100 AI Accelerator and a “sidekick” that cools the chips at a Microsoft lab in Redmond, Washington (Image by Microsoft).

Partnership with Nvidia

Apart from the new chips, Microsoft is also continuing to build its AI infrastructure in close collaboration with other silicon providers and industry leaders, such as Nvidia and AMD. With Nvidia, Azure works closely on virtual machines based on the Nvidia H100 Tensor Core graphics processing unit (GPU) for mid- to large-scale AI workloads, including Azure Confidential VMs.

The NC H100 v5 virtual machine (VM) series, which is now available for public preview, is the latest addition to Microsoft’s portfolio of purpose-built infrastructure for high-performance computing (HPC) and AI workloads. The new Azure NC H100 v5 series is powered by Nvidia Hopper generation H100 NVL 94GB PCIe Tensor Core GPUs and 4th Gen AMD EPYC Genoa processors, delivering powerful performance and flexibility for a wide range of AI and HPC applications.

Chairman and CEO Satya Nadella and Nvidia founder, president and CEO Jensen Huang, at Microsoft Ignite 2023. (Image by Microsoft).

Azure NC H100 v5 VMs are designed to accelerate a broad range of AI and HPC workloads, including:

  • Mid-range AI model training and generative inferencing: unlike the massively scalable ND-series powered by the same Nvidia Hopper technology, the NC-series is optimized for training and inferencing AI models that work with smaller data sizes and a lower degree of GPU parallelism. This includes generative AI models such as DALL-E, which creates original images from text prompts, as well as traditional discriminative AI models such as image classification, object detection, and natural language processing models focused on the accuracy of prediction rather than the generation of new data.
  • Traditional HPC modeling and simulation workloads: Azure NC H100 v5 VMs are also an ideal platform for running various HPC workloads that require high compute, memory, and GPU offload acceleration. This includes scientific workloads such as computational fluid dynamics (CFD), molecular dynamics, quantum chemistry, weather forecasting and climate modeling, and financial analytics.
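For teams that want to try the new series, the sketch below shows one way to provision an NC H100 v5 VM with the Azure SDK for Python. The resource group, network interface, region, OS image and the exact VM size string are placeholders and assumptions for illustration, not details from the announcement; check Azure’s documentation for the sizes and regions in which the series is actually offered.

```python
# A minimal sketch of provisioning an Azure NC H100 v5 virtual machine with the
# Azure SDK for Python (azure-identity + azure-mgmt-compute). Names, the region,
# the OS image and the VM size string are assumptions for illustration only.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "<your-subscription-id>"  # placeholder
RESOURCE_GROUP = "ai-hpc-rg"                # assumed, pre-existing resource group
NIC_ID = (                                   # assumed, pre-created network interface
    "/subscriptions/<sub>/resourceGroups/ai-hpc-rg/providers/"
    "Microsoft.Network/networkInterfaces/nc-h100-nic"
)

compute_client = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

poller = compute_client.virtual_machines.begin_create_or_update(
    RESOURCE_GROUP,
    "nc-h100-v5-vm",
    {
        "location": "eastus",  # assumed region; availability varies
        "hardware_profile": {
            # Assumed size name for the NC H100 v5 series; verify against the
            # current list of Azure VM sizes before using.
            "vm_size": "Standard_NC40ads_H100_v5",
        },
        "storage_profile": {
            "image_reference": {
                "publisher": "Canonical",
                "offer": "0001-com-ubuntu-server-jammy",
                "sku": "22_04-lts-gen2",
                "version": "latest",
            }
        },
        "os_profile": {
            "computer_name": "nc-h100-v5-vm",
            "admin_username": "azureuser",
            "admin_password": "<strong-password-here>",  # placeholder; key-based auth is preferable
        },
        "network_profile": {
            "network_interfaces": [{"id": NIC_ID, "properties": {"primary": True}}]
        },
    },
)
vm = poller.result()
print(f"Provisioned {vm.name} with size {vm.hardware_profile.vm_size}")
```

The same request can be made through the Azure portal or CLI; the SDK route is shown here only because it is easy to fold into existing infrastructure automation.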

Nvidia also introduced an AI foundry service to supercharge the development and tuning of custom generative AI applications for enterprises and startups deploying on Microsoft Azure.

The Nvidia AI foundry service pulls together three elements — a collection of Nvidia AI foundation models, Nvidia NeMo framework and tools, and Nvidia DGX Cloud AI supercomputing services — that give enterprises an end-to-end solution for creating custom generative AI models. Businesses can then deploy their customized models with Nvidia AI Enterprise software to power generative AI applications, including intelligent search, summarization and content generation.

“Enterprises need custom models to perform specialized skills trained on the proprietary DNA of their company — their data,” said Jensen Huang, founder and CEO of Nvidia. “Nvidia’s AI foundry service combines our generative AI model technologies, LLM training expertise and giant-scale AI factory. We built this in Microsoft Azure so enterprises worldwide can connect their custom model with Microsoft’s world-leading cloud services.”