Over the past 30 years, demand for machine learning and artificial intelligence solutions has grown rapidly. Now more than ever, a company's ability to quickly and accurately predict the needs and wants of its target audience is vital to being first to market and staying relevant in a fast-paced, unpredictable economy. Machine learning is one of the most effective ways to make the predictions that lead companies to success. Nearly every company can benefit from incorporating machine learning into its products: using tools like chatbots to help customers instantly, measuring click-through rates to optimize advertising efforts, and tracking customer purchases to make accurate recommendations.

The machine learning market's rise in popularity has not come without significant growing pains, including difficulty with scalability, data management, versioning, continuous integration, reliability, and deployment, to name a few. As businesses began to feel these growing pains, the search for a solution began.

The concept of MLOps was first explored in a 2015 paper titled “Hidden Technical Debt in Machine Learning Systems”. The paper described many of the problems that arise as machine learning projects grow and evolve, and their effects on the products and models being produced. Configuration management, data dependencies, and system-wide anti-patterns were identified as some of the main sources of technical debt for development teams. While these issues were longstanding in machine learning development, the lack of solutions became more glaring as DevOps gained traction during the same period. “Well, other projects are using DevOps to solve their problems, why can’t we?” became a common sentiment among machine learning teams.

While DevOps is the tried-and-true solution for the application development lifecycle, MLOps gained popularity once it became clear that DevOps concepts and principles do not cover the unique needs of machine learning projects. A vital part of the development lifecycle isn’t accounted for in DevOps: managing data and training models.

What is MLOps?

MLOps combines Machine Learning (ML) and DevOps. Just as DevOps delivers applications at high velocity and shortens the development lifecycle, MLOps layers machine learning concepts on top of existing DevOps principles, creating a streamlined process that plays the same role in the development, training, and release of machine learning applications and models.

MLOps also emphasizes collaboration between machine learning engineers, data scientists, and the operations teams responsible for the infrastructure used to train, test, and deploy the models. This collaboration makes it possible to standardize the lifecycle of machine learning models, with the aim of deploying and maintaining reliable, efficient models in production.

Key Phases of MLOps

Data gathering and analysis
Data transformation and preparation
Model training and development
Model validation
Model serving
Model monitoring
Model re-training

Source: Diego Gosmar, 2021
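
The phases above form a loop more than a straight line, which a toy example may make easier to see. The sketch below is only an illustration under stated assumptions: every function name is hypothetical, the "model" is a one-variable least squares fit on synthetic data, and the 0.5 error threshold is arbitrary. It is not how any particular MLOps platform implements these phases.

```python
from statistics import mean

# All names below are hypothetical; each function stands in for one phase.

def gather_data(slope):
    """Data gathering: synthetic (x, y) rows following y = slope * x + 1."""
    return [(float(x), slope * x + 1.0) for x in range(10)]

def prepare_data(rows):
    """Data preparation: hold out the last two rows for validation."""
    return rows[:-2], rows[-2:]

def train_model(rows):
    """Model training: least squares fit of y = a * x + b."""
    mean_x = mean(x for x, _ in rows)
    mean_y = mean(y for _, y in rows)
    a = sum((x - mean_x) * (y - mean_y) for x, y in rows) / sum(
        (x - mean_x) ** 2 for x, _ in rows
    )
    return a, mean_y - a * mean_x

def serve_model(model, x):
    """Model serving: return a prediction for one input."""
    a, b = model
    return a * x + b

def validate_model(model, rows):
    """Model validation: mean absolute error on the given rows."""
    return mean(abs(serve_model(model, x) - y) for x, y in rows)

def monitor_model(model, live_rows, max_error=0.5):
    """Model monitoring: True when the error on live data is too high."""
    return validate_model(model, live_rows) > max_error

def run_cycle(slope):
    """One pass through gathering, preparation, training, and validation."""
    train_rows, holdout = prepare_data(gather_data(slope))
    model = train_model(train_rows)
    return model, validate_model(model, holdout)

model, holdout_error = run_cycle(slope=2.0)
live_rows = gather_data(slope=3.0)  # assume the real world changed after deployment
if monitor_model(model, live_rows):
    model, holdout_error = run_cycle(slope=3.0)  # model re-training
```

The point of the sketch is the last four lines: monitoring feeds back into training, which is the part of the lifecycle that a conventional DevOps pipeline does not cover.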

What Will MLOps Do for Me?

  1. Provide a streamlined deployment process. In machine learning, “deployment” is not simply the product arriving at the end user. MLOps ensures that the model goes through not only continuous integration and continuous delivery, but also the additional steps of continuous training and continuous monitoring. All of these are vital to deploying a successful machine learning model, and MLOps creates an environment where they are streamlined.
  2. Improve collaboration. Research into the success or failure of machine learning projects has consistently pointed to a lack of collaboration as a major cause of failure. MLOps promotes a culture of collaboration, where data scientists, machine learning engineers, and operations engineers are all encouraged to contribute their expertise and rely on the expertise of others to eliminate waste and automate the lifecycle as much as possible.
  3. Increase reliability. MLOps uses automatic model re-training and automates the screening of training data for quality and drift. A repeatable process in which testing and validation are automated produces a more reliable and credible machine learning model.
  4. Easily enforce data traceability. MLOps emphasizes versioning both the training data and the models built from it, which provides a model lineage that shows how each model was trained and what data was used. Enforcing traceability creates machine learning models that are auditable and reproducible.
  5. Provide monitoring. The quality of monitoring and metrics can have a huge impact on a machine learning model. MLOps can use high-quality monitoring to track data drift, dependency changes, data invariants, and computational performance. Training machine learning models is inherently computationally heavy, which can get expensive in the age of cloud computing. Monitoring makes it possible to identify and eliminate unnecessary steps and computations.
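
Two of the ideas in the list above, drift screening and model lineage, are easier to picture with a small example. The following is a minimal sketch using only the Python standard library. The function names, the sample values, and the threshold of 2.0 are assumptions made for illustration, and the mean-shift score is a deliberately simple stand-in for the statistical tests a production system would more likely use.

```python
import hashlib
import json
from statistics import mean, stdev

# All names and thresholds here are hypothetical, chosen for illustration.

def lineage_record(training_rows, params):
    """Fingerprint the training data and parameters for traceability.

    The same data and parameters always produce the same SHA-256 hex
    digest, so a stored digest shows which inputs produced a model.
    """
    payload = json.dumps({"data": training_rows, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def drift_score(reference, live):
    """Shift of the live mean, measured in reference standard deviations."""
    return abs(mean(live) - mean(reference)) / stdev(reference)

def needs_retraining(reference, live, threshold=2.0):
    """Flag drift when the live mean moves more than `threshold` deviations."""
    return drift_score(reference, live) > threshold

reference = [10.0, 11.0, 9.0, 10.5, 9.5]   # feature values seen during training
live_stable = [10.2, 9.8, 10.1]            # recent values close to training data
live_shifted = [14.0, 15.0, 13.5]          # recent values that have drifted

digest = lineage_record(reference, {"learning_rate": 0.1})
retrain_on_stable = needs_retraining(reference, live_stable)
retrain_on_shifted = needs_retraining(reference, live_shifted)
```

In this sketch the stable sample stays well under the threshold and the shifted sample exceeds it, so only the second would trigger re-training. Storing the digest next to each trained model is one simple way to make that model auditable later.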

What to watch out for?

The harsh truth is that nearly 85% of machine learning projects fail to deliver (Gartner, 2018). These failures are often attributed to a lack of cooperation and collaboration with operations. By following the established standards of MLOps, you can improve the odds that your machine learning project is among the 15% that succeed.

Sources:

  1. Gartner Says Nearly Half of CIOs Are Planning to Deploy Artificial Intelligence. February 13, 2018. https://www.gartner.com/en/newsroom/press-releases/2018-02-13-gartner-says-nearly-half-of-cios-are-planning-to-deploy-artificial-intelligence
  2. MLOps Scalability. January 2, 2021. https://www.gosmar.eu/machinelearning/2021/01/02/mlops-scalability