Microsoft is planning to launch a new AI infrastructure service called “Singularity.” In a research paper published yesterday titled “Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads,” the company explained that this new AI platform service should help to minimize the cost of artificial intelligence (via ZDNet).
With Microsoft’s Singularity distributed infrastructure service, data scientists will be able to build, scale, experiment with, and iterate on deep learning models without compromising performance. We’ll spare you the technical details, but the new service is designed to prioritize and manage different workloads across a global fleet of hundreds of thousands of GPUs and other AI accelerators.
“Singularity is a fully managed, globally distributed infrastructure service for AI workloads at Microsoft, with support for diverse hardware accelerators. Singularity is designed from the ground up to scale across a global fleet of hundreds of thousands of GPUs and other AI accelerators,” the Microsoft Azure and Research teams explained. “Singularity is built with one key goal: driving down the cost of AI by maximizing the aggregate useful throughput on a given fixed pool of capacity of accelerators at planet scale, while providing stringent SLAs for multiple pricing tiers.”
Singularity also eliminates the need to restart deep neural network (DNN) training from scratch after an unexpected system failure. Microsoft says that a job can be resumed in exactly the same state it was in when preempted. Overall, the Singularity service looks like a big step forward, and it should help to reduce the time and effort required to train machine learning models.
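For readers curious what "resuming in the same state" means in practice, here is a minimal Python sketch of the general checkpoint-and-resume idea. It is not Microsoft's implementation (the paper describes a transparent, system-level mechanism that is far more involved); the function name train, the preempt_at parameter, and the JSON checkpoint file are all hypothetical and chosen only for illustration.

```python
import json
import os
import tempfile


def train(total_steps, ckpt_path, preempt_at=None):
    """Toy training loop that checkpoints every step and resumes from the last one.

    All names here are hypothetical; this only illustrates the general idea.
    """
    # Restore saved state if a checkpoint exists; otherwise start from scratch.
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "weight": 0.0}

    while state["step"] < total_steps:
        if preempt_at is not None and state["step"] == preempt_at:
            # Simulated preemption: the job stops, the checkpoint stays on disk.
            return None
        state["weight"] += 0.5  # stand-in for a real gradient update
        state["step"] += 1
        # Write to a temporary file, then rename it over the old checkpoint,
        # so an interruption never leaves a half-written checkpoint behind.
        tmp_path = ckpt_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, ckpt_path)
    return state


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    train(10, path, preempt_at=4)  # first run is interrupted after 4 steps
    print(train(10, path))         # second run picks up at step 4 and finishes
```

The interrupted-then-resumed run ends in the same state as a run that was never interrupted, which is the property Microsoft describes, albeit achieved here in a much simpler way.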
Microsoft has significantly increased its investments in artificial intelligence and Azure in the past few years. In 2019, the company announced a $1 billion investment in OpenAI and expressed its intention to become its preferred partner for commercializing new AI technologies.