Last Update: Sep 04, 2024 | Published: Feb 23, 2018
In a recent post on the Google Cloud Platform blog, Google announced the beta availability of its new Cloud TPUs, accelerators designed to speed up machine learning workloads built with the open-source TensorFlow framework.
These cloud-based TPUs, or tensor processing units, deliver up to 180 teraflops of floating-point performance, backed by 64 GB of high-bandwidth memory. That level of performance lets those working with machine learning algorithms and frameworks train models and iterate on them considerably faster than they could before.
Because provisioning a Cloud TPU is far simpler than standing up a traditional high-performance machine or supercomputer, users can fine-tune their machine learning workloads much more quickly than they could on their own hardware.
And because Cloud TPUs spin up quickly, users can run several of them in parallel to train different variants of the same machine learning model, saving organizations time when training and then choosing the best variant. Google also lets users reach its network-attached Cloud TPUs from a custom Google Compute Engine VM, as sketched below.
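To illustrate, here is a minimal sketch of how a TensorFlow program running on a Compute Engine VM might attach to a Cloud TPU. It assumes the modern TensorFlow 2.x distribution APIs (the original launch predates these and used TF 1.x-era tooling) and a hypothetical TPU node named "my-tpu":

    import tensorflow as tf

    # "my-tpu" is a hypothetical node name chosen when the TPU was provisioned.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)

    # A distribution strategy that places computation on the TPU's cores.
    strategy = tf.distribute.TPUStrategy(resolver)
    print("TPU cores available:", strategy.num_replicas_in_sync)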
What’s more, Google will let users link multiple Cloud TPUs together into what the company calls a “TPU pod”. Depending on how many TPUs are joined, a TPU pod can deliver multiple petaflops of floating-point performance. While this feature is not yet available on the Google Cloud Platform, the amount of compute users will be able to tap by linking individual Cloud TPUs together is impressive, and it will likely still be cheaper than buying on-site hardware for the same purpose.
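The back-of-the-envelope math behind the "multiple petaflops" claim is straightforward. Assuming the 64-device pod size Google has described and perfectly linear scaling at each device's 180-teraflop peak (real workloads will fall short of both):

    # Hypothetical peak throughput for a 64-device TPU pod, assuming
    # perfectly linear scaling; real workloads will fall short of this.
    tflops_per_tpu = 180
    devices_in_pod = 64
    print(f"{tflops_per_tpu * devices_in_pod / 1000:.2f} petaflops peak")
    # -> 11.52 petaflops peak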
For users who would like to start using Cloud TPUs, Google has provided a set of open-source reference model implementations, including ResNet-50, Transformer, and RetinaNet, with more to come over time.
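Once a strategy like the one above is in place, training one of these architectures follows the usual Keras pattern. The sketch below uses the off-the-shelf ResNet-50 from tf.keras.applications as a stand-in for Google's reference implementation, so that the model's variables are created on the TPU:

    # Continuing from the `strategy` object created earlier.
    with strategy.scope():
        # Stock Keras ResNet-50, randomly initialized, as a stand-in
        # for the open-source reference implementation.
        model = tf.keras.applications.ResNet50(weights=None, classes=1000)
        model.compile(optimizer="sgd",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])

    # model.fit(...) would then stream training batches from a
    # tf.data input pipeline sized to keep the TPU cores busy.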
Cloud TPU pricing starts at $6.50 per TPU per hour, with usage billed by the second. However, Cloud TPUs are currently available only in limited quantities, and Google provides an online form for those interested in requesting a TPU quota.
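Per-second billing makes short experiments cheap to cost out. For example, at the quoted rate, a hypothetical 35-minute training run works out as follows:

    hourly_rate = 6.50       # USD per Cloud TPU per hour (quoted rate)
    run_seconds = 35 * 60    # a hypothetical 35-minute training job
    cost = hourly_rate / 3600 * run_seconds
    print(f"${cost:.2f}")    # -> $3.79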
The ability to train multiple ML model variants simultaneously, and to scale up performance by linking multiple Cloud TPUs together, can significantly increase the amount of work that machine learning researchers and other practitioners in the field can accomplish.
Work that would once have required expensive supercomputers and specialized hardware can now be done with cloud-based services at a fraction of the cost, and in a fraction of the time it takes to build and configure an on-site supercomputer. This, coupled with Google’s open-sourcing of several machine learning model implementations, can help lower the barrier to entry for machine learning projects for organizations, universities, and even individuals who are interested in the field but cannot afford such hardware investments to get started.
Machine learning used to be a field that demanded highly customized computers and applications, often at very high cost. With the advent of services like Google’s Cloud TPUs, machine learning projects can now be undertaken at a fraction of that cost, with users paying only for the resources and services they actually use. Advances in cloud technology like this are putting resource-intensive work such as machine learning within reach of far more people than ever had access before, which could lead to some great projects and even real advancements in the field.