The Cloud War

Celia Gubler

Ranking of Cloud Service Providers for Machine Learning Solutions


“It all happens in the cloud. So that you don’t have to worry about the infrastructure, but focus on the quality of your solution.”

The promise made by cloud vendors is as valid now as it was almost two decades ago. With an ever-increasing adoption rate, cloud computing has enabled both users and enterprises in the IT world to leverage powerful infrastructures in a cost-efficient, hassle-free manner. The desire to increase their market share motivates providers to continually improve their offerings. For cloud service consumers, this translates into safer, more reliable and better-performing cloud-based solutions.

In the AI sector, businesses of all sizes also benefit from such rapid development of cloud services. AI tools that used to be available only to a few large enterprises are now available to anybody, from the curious student to the experienced engineer. The only requirement is an internet connection!

But the question of which vendor to choose for a cloud-based Machine Learning solution is one that requires thorough research and a deep understanding of what the different ecosystems offer. This ranking guide gives the ML Solution Architect an overview of what we consider to be the most important criteria for reliable cloud service providers.

Since the market share is dominated by Amazon Web Services, Google Cloud Platform and Microsoft Azure, we are going to focus on them in our comparison.

1. Availability of specialized hardware

Let’s start with the obvious. Big Data requires big (if not huge) computational and storage capabilities. Two of the most sought-after cloud resources are Compute Instances and Storage Services. GPU-based Compute Instances are often required for Machine Learning, since GPUs allow massively parallel computation thanks to their many cores and high memory bandwidth. Storage systems play an equally crucial role: they must be optimized for speed in order to keep up with the processing capabilities of the GPUs.

Since different AI projects have different needs, it is essential for the ML Solution Architect to have freedom of choice when customizing infrastructure components. Additionally, these resources need to be made available in a dynamic, scale-on-demand manner, in order to minimize costs and maximize efficiency.

Since all three providers offer extensive customization at the infrastructure level, we score this category as a tie.

2. Availability of pre-configured environments

Being able to configure custom machine learning infrastructures is a basic need when building scalable data processing systems. However, when experimenting with new algorithms or building custom models, it is not uncommon for ML engineers to want to spin up a pre-configured environment in minutes. We consider this capability to be a truly important aspect. 

In this respect, all three providers offer pre-configured VM instances which use the latest releases of machine learning (and deep learning!) libraries and which run out of the box.

Google’s solution is based on Debian 9 "Stretch", while Microsoft Azure provides Data Science Virtual Machines based on Linux (Ubuntu 16.04 LTS and CentOS 7.4) and Windows Server 2016. The AWS Deep Learning AMIs are built for Amazon Linux 2018.03, Windows 2016 and Ubuntu 16.04.

The pre-configured VM instances allow rapid prototyping without worrying about software compatibility issues. Just check out which VM images support your favorite data analytics tools and spin up a fresh machine learning environment!
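As a rough illustration of that check, the base operating systems listed above can be put into a small lookup. This is only a sketch: the table is copied from this article rather than queried from any provider, and the names IMAGE_OS and providers_with_os are hypothetical helpers introduced here.

```python
# Base operating systems of the pre-configured ML images, as listed in this
# article. Assumption: this is a hand-written snapshot, not live provider data.
IMAGE_OS = {
    "Google Cloud Platform": ["Debian 9"],
    "Microsoft Azure": ["Ubuntu 16.04 LTS", "CentOS 7.4", "Windows Server 2016"],
    "Amazon Web Services": ["Amazon Linux 2018.03", "Windows 2016", "Ubuntu 16.04"],
}


def providers_with_os(keyword, image_os=IMAGE_OS):
    """Return the providers whose listed images mention `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return sorted(
        provider
        for provider, systems in image_os.items()
        if any(needle in name.lower() for name in systems)
    )


if __name__ == "__main__":
    for os_name in ("Ubuntu", "Windows", "Debian"):
        print(os_name, "->", providers_with_os(os_name))
```

The same pattern could be extended with a second table of installed tools per image, once you have looked those up in each provider's own image documentation.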

3. Learnability and Availability of Online Resources

The consumers of ML cloud services are data professionals responsible for designing, implementing and maintaining big data solutions. But even the most experienced Data Freak might be left scratching their head when faced with the multitude of cloud services available.

Confused? Choosing a cloud service for your next Big Data Solution should not be hard! Online resources can help.
Source: www.theawsblog.com

We consider the availability of online resources to be an important factor in the adoption of a specific provider for your business needs. In the end, this translates into how quickly you (or the new Data Solutions Architect in your team) will become productive.

How do the three top cloud service providers help data professionals quickly master their ML ecosystems?

Tutorials and Quickstart Guides

All three providers make it really simple for engineers to get started with machine learning projects. However, AWS scores a bit better, given its Use Cases section in which successful big data solutions are showcased, including a thorough overview and step-by-step analysis of their architecture. 

Community Forums

The AWS Developer Forums are an active platform for professionals to exchange knowledge and help one another. A forum category for each of the provided cloud services makes it really easy to find information.

Similarly well structured is the MSDN Forum, where issues related to Azure cloud services are also well categorized. However, the community seems less tight-knit, and many of the Azure-related topics are spread across multiple online platforms.

The Google Cloud Discuss page, on the other hand, is still based on the good old Google Groups platform. A bit more digging is required to reach a specific topic, since the cloud services are only roughly grouped into topics.

Developer Training Programmes

As of November 2019, AWS and GCP are the only cloud service providers that offer certifications with a focus on Machine Learning. This time Microsoft Azure falls behind with its limited certification offering.

4. MLOps Tooling

What to do when the DevOps Engineer calls in sick

Maybe your business is still iterating towards the optimal custom ML infrastructure. Or maybe you haven’t hired that certified DevOps Engineer yet. Or even better, you’re sick and tired of the “Undifferentiated Heavy Lifting”.

Does that mean that your Data Science Team has to take a break? No way!

Amazon SageMaker is a service that allows management of the full ML model life cycle, without worrying about infrastructure or software dependency conflicts!

All three vendors offer MLaaS (Machine Learning as a Service) support for continuous ML development: Amazon SageMaker, Microsoft Azure ML Services and Google Cloud AI Platform (previously “ML Engine”). This allows data scientists to tune, train and host models without having to set up their own virtual environments. These environments come pre-installed, usually with the latest versions of the most popular data science tools.

An exhaustive comparison of the MLaaS services can be found here. For the purpose of our ranking, however, AWS scores an extra point again: it is the only service that provides built-in ML algorithms.  

Knowing the differences between the three cloud vendors allows you to make better choices for your next ML endeavor. The following summarizes the above ranking and gives an overview of our criteria.

Our criteria for choosing ML cloud service providers.
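The explicit verdicts given in the sections above can also be tallied in a few lines of Python. This is a minimal sketch under stated assumptions: one point per criterion for each provider the text favors, a tie awards a point to everyone, and criteria without a stated verdict (pre-configured environments, community forums) are left out. The names FAVORED and tally are hypothetical and the weighting is ours for illustration, not an official scoring scheme.

```python
from collections import Counter

PROVIDERS = ("AWS", "GCP", "Azure")

# Assumption: one point per criterion for each provider the article favors.
# Only criteria with an explicit verdict in the text are encoded here.
FAVORED = {
    "Specialized hardware": PROVIDERS,               # scored as a tie
    "Tutorials and quickstart guides": ("AWS",),     # AWS "scores a bit better"
    "Developer training programmes": ("AWS", "GCP"), # ML-focused certifications
    "MLOps tooling": ("AWS",),                       # AWS "scores an extra point"
}


def tally(favored):
    """Count how many criteria favor each provider."""
    scores = Counter({provider: 0 for provider in PROVIDERS})
    for winners in favored.values():
        scores.update(winners)  # adds one point per listed provider
    return scores


if __name__ == "__main__":
    for provider, points in tally(FAVORED).most_common():
        print(f"{provider}: {points}")
```

Changing the weighting, or adding your own criteria to the dictionary, is a quick way to see how sensitive the ranking is to what matters most for your project.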


What do you usually consider when you set up your infrastructure? Let us know in the comments below and maybe we’ll add it to our list :) 

Written by: Paul Anton