People often talk about integrating machine learning into their business, but that conversation is usually framed only around company strategy or the results that stakeholders want for a given problem.
This means that 'machine learning' is often used as a blanket description for the means of solving a problem, without reflecting the nature of the problem itself. Systematically, and I would argue as a consequence, getting a machine learning solution to production status is neglected as an afterthought.
This post fleshes out examples of how machine learning solutions can take fundamentally different forms, and how those forms lead to practically different (and sometimes cross-purpose) production scenarios with radically different pain points.
Let us take as our reference model the simple linear regression model
Ŷ = β X + α
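To make the reference model concrete, here is a minimal sketch of fitting β and α by ordinary least squares. The data is synthetic and purely illustrative; any numerical library would do, NumPy is just an assumption of this sketch.

```python
import numpy as np

# Minimal sketch: fit Y_hat = beta * X + alpha by ordinary least squares.
# The data below is synthetic, generated from beta = 2.0, alpha = 1.0.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=100)
Y = 2.0 * X + 1.0 + rng.normal(0, 0.5, size=100)

# Closed-form OLS for a single feature: stack a column of ones to absorb alpha.
A = np.column_stack([X, np.ones_like(X)])
beta, alpha = np.linalg.lstsq(A, Y, rcond=None)[0]

Y_hat = beta * X + alpha  # the estimates the rest of this post refers to
```

Everything that follows is about what you are actually responsible for shipping: the estimates Ŷ, the model β, or the workflow that produces them.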
In the estimates as a service framework, your core responsibility is a set of estimates Ŷ. A concrete example would be predicting the market close prices of every stock on the NASDAQ. In this use case, the domain for your model outputs is static or slowly changing. The model, which we can just refer to as β for now, is essentially arbitrary. The core delivery for your system is a constrained set of estimates that are meant to be as accurate as possible, and essentially nothing else.
The main production questions you need to answer in this scenario are: how do you host and vend your estimates, and how do you validate them?
In this scenario you do not necessarily operate in real time, so estimates can be hosted in a database and vended via a simple micro-service. Performance (latency, throughput) should not be a major issue, since you are essentially just vending precomputed values. The pain point in this scenario is validation of estimates, both synchronous (at deployment) and asynchronous (during research). Overall, though, this is a relatively manageable situation to be in, as there is less risk of fundamental failure.
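One way to picture the estimates-as-a-service setup is a batch job publishing predictions into a store, with the serving path reduced to a lookup. This is a hypothetical sketch; `EstimateStore`, its methods, and the ticker/price values are all placeholders, not a real API.

```python
from datetime import date

# Hypothetical sketch of "estimates as a service": predictions are computed
# offline in batch and published to a store, so serving is just a lookup.

class EstimateStore:
    """Stands in for a database table keyed by (ticker, as_of_date)."""
    def __init__(self):
        self._table = {}

    def publish(self, ticker, as_of, estimate):
        # A batch model run would call this for every (ticker, date) pair.
        self._table[(ticker, as_of)] = estimate

    def get(self, ticker, as_of):
        # The micro-service endpoint reduces to a constant-time lookup,
        # which is why latency and throughput are rarely the bottleneck here.
        return self._table.get((ticker, as_of))

store = EstimateStore()
store.publish("AAPL", date(2024, 1, 2), 185.64)  # illustrative value only
```

Note that the model never appears on the serving path at all; validation of what gets published is where the real work lives.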
In the model as a service framework, your core responsibility is to vend as robust a model β as possible. The input domain for your problem is known to be mostly unbounded. For example, if you are Facebook trying to deploy automatic photo tagging, you understand you can't really put concrete bounds on the domain of photos that you have to make predictions for.
The unbounded nature of the domain for a model as a service platform is one of the major pain-points. This is essentially the training data vs testing data problem played out in the real world.
Another pain-point is performance. Surfacing a set of estimates on a bounded domain is relatively easy. Directly surfacing your model requires consideration of resource management, latency, and throughput. These performance considerations interact and conflict with model choice.
The main production questions you need to answer in this scenario are: how robust is your model to inputs beyond its training data, and what accuracy are you willing to trade for latency and throughput?
The model as a service problem is solved through heavy engineering and research investment. The performance tradeoffs between accuracy and throughput across model variations need to be thoroughly researched and understood.
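The accuracy-versus-latency tradeoff can be made tangible with a toy harness. This is a sketch under stated assumptions: `SmallModel` and `LargeModel` are stand-ins for two candidate model sizes, and the sleep call simulates heavier inference cost rather than running a real network.

```python
import time

# Hypothetical sketch of "model as a service": the model itself runs on the
# request path, so every call pays the inference cost of the chosen model.

class SmallModel:
    def predict(self, x):
        return 2.0 * x + 1.0        # cheap to serve, less accurate

class LargeModel:
    def predict(self, x):
        time.sleep(0.01)            # simulate heavier inference work
        return 2.05 * x + 0.98      # costlier to serve, more accurate

def serve(model, request_x):
    """Run one request and measure the latency the caller would see."""
    start = time.perf_counter()
    estimate = model.predict(request_x)
    latency_ms = (time.perf_counter() - start) * 1000
    return estimate, latency_ms
```

Measuring this curve across real model variants, under realistic load, is exactly the research investment the paragraph above describes.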
In the workflow as a service framework, your core responsibility is to take ownership of, and service, a platform for model training and estimate generation. You are responsible for being able to explore the model choice β efficiently in order to produce a good set of estimates Ŷ for a new client's dataset X. In this use case, the skills of your machine learning researchers and data scientists are offered on demand to solve entirely new problems (although perhaps restricted to a given domain). The main pain point in this scenario is automating as much of your workflow as possible in order to achieve efficient turnaround. The workflow as a service platform solves the problem of consultancy and building new client relationships.
Often, companies or teams make the mistake of solving the model as a service problem when their current level of maturity is better suited to a workflow as a service platform. The mistake lies in assuming that solving the model as a service problem automatically solves the workflow as a service problem. It does not: having one bespoke model and pipeline does not help you generalize to new domains or new clients.
Turnaround is the most essential driver of success in the workflow as a service framework. In practice this means: if you are growing a business, or a team within a large company, make sure you have an ecosystem in place for generating models to solve problems before over-investing resources in a single deployment strategy.
As an additional requirement, you need to solve the problem of data processing and exploration on top of model generation.
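The workflow described above, exploring model choices β to produce estimates for a new client's dataset, can be sketched end to end. This is an illustration, not a prescription: the candidate models here are polynomial degrees, standing in for whatever model family your team actually searches over.

```python
import numpy as np

# Hypothetical sketch of "workflow as a service": given a new client's (X, Y),
# automatically search candidate models and return the best fitted estimates.

def run_workflow(X, Y, degrees=(1, 2, 3)):
    """Fit each candidate, score it in-sample, and keep the best."""
    best = None
    for d in degrees:
        coefs = np.polyfit(X, Y, d)               # candidate model beta
        Y_hat = np.polyval(coefs, X)              # estimates for this client
        mse = float(np.mean((Y - Y_hat) ** 2))
        if best is None or mse < best["mse"]:
            best = {"degree": d, "coefs": coefs, "mse": mse, "estimates": Y_hat}
    return best
```

A production version would add held-out validation, data cleaning, and reporting stages, the point is that the whole loop runs without a researcher hand-tuning each new dataset.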
The main production questions you need to answer in this scenario are: how do you automate data processing, model exploration, and estimate generation for each new client?
These can actually be quite difficult architectural problems to solve. The best approach is to focus on modularity and usability. Key investments that will pay dividends include workflow automation and tooling for data processing and exploration.
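One concrete reading of "modularity" (an assumption of this sketch, not a prescription from the post) is to define narrow interfaces for each pipeline stage, so components can be swapped per client without touching the rest of the workflow.

```python
from abc import ABC, abstractmethod

# Sketch: narrow interfaces per pipeline stage. Each client engagement swaps
# in concrete implementations without changing the surrounding workflow.

class Loader(ABC):
    @abstractmethod
    def load(self, source):
        """Return (X, Y) from a client data source."""

class Trainer(ABC):
    @abstractmethod
    def fit(self, X, Y):
        """Return a fitted model (a callable on new inputs)."""

class Validator(ABC):
    @abstractmethod
    def score(self, model, X, Y):
        """Return a quality metric for the fitted model."""

class ConstantTrainer(Trainer):
    """Toy concrete stage: predicts the mean of Y regardless of X."""
    def fit(self, X, Y):
        mean = sum(Y) / len(Y)
        return lambda x: mean
```

The usability payoff is that a new engagement becomes a matter of choosing implementations, not rebuilding the pipeline.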
The key takeaway from this post is that machine learning solutions come in different forms, often with opposing requirements for reaching production. My recommendation for team leaders and business owners is to always define your problem scope before sinking time into machine learning development, and to invest resources based on the form of the problem you are really trying to solve.