How to optimize your ML development process based on the problem you’re trying to solve.

production machine learning

People often talk about integrating machine learning into their business. But this conversation is usually driven only as part of a discussion about the strategy a company wants to take, or the results stakeholders want for a given problem.

This means that ‘machine learning’ is often used as a blanket description for the means of solving a problem, without reflecting the nature of that problem. Systematically, and I would argue as a consequence, achieving production status for a machine learning solution is relegated to an afterthought.

This post fleshes out examples of how machine learning solutions can take on fundamentally different forms, and how these forms lead to practically different (and sometimes conflicting) production scenarios with radically different pain points.

Estimates as a service

Let us take simple linear regression as our reference model:

Ŷ = β X + α

In the estimates as a service framework, your core responsibility is a set of estimates Ŷ. A concrete example would be predicting the market close prices of every stock on the NASDAQ. In this use case, the domain of your model outputs is static or slowly changing. The model, which we can refer to simply as β for now, is essentially arbitrary. The core deliverable of your system is a constrained set of estimates that are as accurate as possible, and essentially nothing else.
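As a concrete sketch (using synthetic data, not any particular production system), fitting β and α by ordinary least squares and producing the fixed set of estimates Ŷ might look like:

```python
import numpy as np

# Synthetic data for illustration: X is the input, Y the observed target.
rng = np.random.default_rng(0)
X = rng.normal(size=100)
Y = 2.0 * X + 1.0 + rng.normal(scale=0.1, size=100)

# Fit Y = beta * X + alpha by ordinary least squares.
A = np.column_stack([X, np.ones_like(X)])
(beta, alpha), *_ = np.linalg.lstsq(A, Y, rcond=None)

# The deliverable is the set of estimates Y_hat, not the model itself.
Y_hat = beta * X + alpha
```

Here the fitted β and α are incidental: once Ŷ is computed and stored, the model that produced it can be swapped out freely between estimate versions.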

The main production questions you need to answer in this scenario are:

  • How often are my estimates updated?
  • How are my estimates versioned?
  • How stable is my model across estimate versions?

In this scenario you do not necessarily operate in real time, so estimates can be hosted in a database and vended via a simple micro-service. Performance (latency, throughput) should not be a major issue, since you are essentially just vending precomputed values. The pain point in this scenario is synchronous (deployment) and asynchronous (research) validation of estimates. Overall, though, this is a relatively manageable situation to be in, as there is less risk of fundamental failure.
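To make the versioning questions concrete, here is a minimal sketch of the vending side, with a hypothetical in-memory store standing in for the database (names and prices are illustrative only):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EstimateStore:
    """Hypothetical versioned estimate store; a real system would back
    this with a database behind a simple micro-service."""
    _versions: dict = field(default_factory=dict)
    latest: Optional[str] = None

    def publish(self, version: str, estimates: dict) -> None:
        # Publishing never mutates old versions, so clients can pin one
        # and you can compare model stability across estimate versions.
        self._versions[version] = dict(estimates)
        self.latest = version

    def get(self, key: str, version: Optional[str] = None) -> float:
        # Default to the latest version; callers may pin an older one.
        return self._versions[version or self.latest][key]

store = EstimateStore()
store.publish("2024-01-01", {"AAPL": 184.2})
store.publish("2024-01-02", {"AAPL": 185.1})
print(store.get("AAPL"))                # 185.1
print(store.get("AAPL", "2024-01-01"))  # 184.2
```

Keeping every published version immutable is what makes the "how stable is my model across estimate versions?" question answerable after the fact.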

Model as a service

In the model as a service framework, your core responsibility is to vend as robust a model β as possible. The input domain for your problem is known to be mostly unbounded. For example, if you are Facebook deploying automatic photo tagging, you understand that you cannot put concrete bounds on the domain of photos you have to make predictions for.

The unbounded nature of the domain is one of the major pain points of a model as a service platform. This is essentially the training data vs. test data problem played out in the real world.

Another pain point is performance. Surfacing a set of estimates on a bounded domain is relatively easy; surfacing your model directly requires careful resource management and attention to latency and throughput. These performance considerations interact, and conflict, with model choice.

The main production questions you need to answer in this scenario are:

  • Do I have sufficient computational resources to manage demand for my service?
  • Am I confident that my model is robust to expected user behaviour?

The model as a service problem is solved through heavy engineering and research investment. The tradeoffs between accuracy and throughput across model variations need to be thoroughly researched and understood.
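As an illustrative sketch (all names hypothetical, reusing the linear model β for simplicity), a model-serving wrapper typically loads its weights once at startup, guards the nominally unbounded input domain explicitly, and keeps the per-request path cheap:

```python
import numpy as np

class ModelService:
    """Hypothetical serving wrapper around a trained model; illustrative only."""

    def __init__(self, beta: float, alpha: float, input_range=(-1e6, 1e6)):
        # Load weights once at startup, not per request.
        self.beta = beta
        self.alpha = alpha
        self.lo, self.hi = input_range

    def predict(self, x: float) -> float:
        # Guard the (effectively unbounded) input domain explicitly,
        # rather than silently extrapolating.
        if not (self.lo <= x <= self.hi):
            raise ValueError(f"input {x} outside supported domain")
        return self.beta * x + self.alpha

    def predict_batch(self, xs) -> np.ndarray:
        # Batching amortizes per-request overhead, trading a little
        # latency for higher throughput.
        xs = np.asarray(xs, dtype=float)
        return self.beta * xs + self.alpha

service = ModelService(beta=2.0, alpha=1.0)
print(service.predict(3.0))  # 7.0
```

The explicit domain check and the single-item vs. batch split are small stand-ins for the two production questions above: robustness to user behaviour, and having the compute path to meet demand.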

Workflow as a service

In the workflow as a service framework, your core responsibility is to take ownership of, and service, a platform for model training and estimate generation. You are responsible for exploring the model choice β efficiently in order to produce a good set of estimates Ŷ for a new client's dataset X. In this use case, the skills of your machine learning researchers and data scientists are offered on demand to solve entirely new problems (although perhaps restricted to a given domain). The main pain point in this scenario is automating as much of your workflow as possible in order to achieve efficient turnaround. The workflow as a service platform solves the problem of consultancy and of building new client relationships.

Often, companies or teams make the mistake of solving the model as a service problem when their current level of maturity is better suited to a workflow as a service platform. The mistake lies in assuming that solving the model as a service problem automatically solves the workflow as a service problem. It does not: having one bespoke model and pipeline does not help you generalize to new domains or new clients.

Turnaround is the most essential driver of success in the workflow as a service framework. This translates into a simple ethos: if you are growing a business, or a team within a large company, make sure you have an ecosystem in place for generating models to solve problems before over-investing resources in a single deployment strategy.

As an additional requirement, you need to solve the problem of data processing and exploration on top of model generation.

The main production questions you need to answer in this scenario are:

  • Can I reliably extend or alter my current tooling?
  • How do I validate the performance of a new model?
  • Is my platform easy for new employees to use?

These can be quite difficult architectural problems to solve. The best approach is to focus on modularity and usability. Key investments that will pay dividends include:

  • Making resource provisioning easy
  • Modularizing data processing and machine learning tooling
  • Prioritizing API stability

The key takeaway from this post is that machine learning solutions come in different forms, often with opposing requirements for reaching production. My recommendation for team leaders and business owners is to always define your problem scope before sinking time into machine learning development. Invest resources based on the form of the problem you are actually trying to solve.