Extending from the first two articles in Credera’s machine learning (ML) series, Machine Learning Essentials and Introduction to Microsoft Machine Learning Tools, we now turn our attention to how ML can drive results for businesses via example use cases.
In a data-driven world, understanding how advanced technologies like ML apply to data helps frame how best to use it. Yet companies often try to force the latest technologies into their organization without truly understanding the scenarios those technologies are designed to address. With ML, it is therefore critical to focus on the quality, quantity, and granularity of the data that supports a given objective. These three elements, as shown below, drive the appropriate use of ML tooling. When it comes to ML, data is currency: the more data you have, the more return on your investment you can get. One of the most valuable forms of data is prior experience, and in this blog post we would like to give you an inside look at two of our ML experiences.
use case #1: utilization forecasting problem
One common use case for ML is forecasting, or in other words, predicting outcomes. While working on a utilization forecasting project for a financial services client, we needed a model to predict whether a set of computer servers would reach capacity within six months. This mattered because it took six months to procure new servers to supplement the maxed-out ones. Previously, the company had been using a simple linear regression model (example shown below) to assess the capacity of the servers.
Source: Stock Charts
With this technique, and monitoring results over time, the company was only able to predict with 50% accuracy if a server would be at capacity within the allotted time frame. In the world of forecasting, 50% is not adequate. A better algorithm had to be explored and utilized.
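To make the baseline concrete, here is a minimal Python sketch of how a straight-line trend can be extrapolated to estimate when a server reaches capacity. It is not the client's actual model: the function names, the 100% capacity threshold, and the utilization figures are hypothetical and chosen only for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def months_until_capacity(utilization, capacity=100.0):
    """Months from the last observation until the fitted line hits capacity.

    Returns None when the fitted trend is flat or falling.
    """
    months = list(range(len(utilization)))
    intercept, slope = fit_line(months, utilization)
    if slope <= 0:
        return None
    crossing = (capacity - intercept) / slope  # month index where line = capacity
    return max(0.0, crossing - months[-1])

# Hypothetical monthly utilization (%), for illustration only.
history = [60.0, 62.0, 64.0, 66.0, 68.0]
remaining = months_until_capacity(history)
# Flag the server if capacity is expected within the six-month procurement window.
at_risk = remaining is not None and remaining <= 6
```

A single straight line assumes utilization grows at a constant rate and says nothing about how uncertain the estimate is, which is one reason a baseline like this can struggle on noisy data.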
Several ML regression algorithms were explored with no successful outcome because the data source wasn’t of high enough quality or granular enough. This led us to explore an older statistical forecasting method designed to discover trends in time-series data: autoregressive integrated moving average (ARIMA). ARIMA uses historical data to forecast future values and returns a cone of uncertainty (think of the potential path of a hurricane, for example) with an upper limit, a lower limit, and an expected value. The following diagram shows an ARIMA model that uses utilization data from January through May to predict utilization for June, July, and August:
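The idea of an expected value with upper and lower limits can be sketched in a few lines of Python. The example below is a deliberately stripped-down ARIMA(1,1,0) with drift, written by hand so that each step is visible; it is an illustration under simplifying assumptions, not the model used on the project. The function name, the 1.96 multiplier (an approximate 95% band assuming normally distributed errors), and the utilization figures are all hypothetical. In practice, a library such as statsmodels provides a full ARIMA implementation.

```python
from math import sqrt
from statistics import mean

def arima_110_forecast(series, steps, z=1.96):
    """Hand-rolled ARIMA(1,1,0) with drift.

    Returns a list of (lower, expected, upper) tuples, one per step ahead.
    """
    # "I": difference once so the model works on period-over-period changes.
    diffs = [b - a for a, b in zip(series, series[1:])]
    mu = mean(diffs)
    centered = [d - mu for d in diffs]

    # "AR": least-squares estimate of phi in c[t] = phi * c[t-1] + e[t].
    num = sum(a * b for a, b in zip(centered, centered[1:]))
    den = sum(a * a for a in centered[:-1])
    phi = num / den if den else 0.0
    phi = max(-0.99, min(0.99, phi))  # assumption: keep the sketch stationary

    # Variance of the one-step-ahead errors.
    resid = [b - phi * a for a, b in zip(centered, centered[1:])]
    sigma2 = sum(e * e for e in resid) / max(len(resid) - 1, 1)

    level, change = series[-1], centered[-1]
    psi, var = 0.0, 0.0
    out = []
    for h in range(1, steps + 1):
        change *= phi             # expected centered change h steps ahead
        level += mu + change      # integrate back to the original scale
        psi += phi ** (h - 1)     # cumulative effect of one shock on the level
        var += sigma2 * psi ** 2  # forecast-error variance grows with horizon
        half = z * sqrt(var)
        out.append((level - half, level, level + half))
    return out

# Hypothetical utilization readings (%), for illustration only.
history = [52.0, 53.5, 54.1, 56.0, 56.8, 58.9,
           59.5, 61.2, 62.8, 63.1, 65.0, 66.2]
forecast = arima_110_forecast(history, steps=3)
for lower, expected, upper in forecast:
    print(f"{lower:.1f}  {expected:.1f}  {upper:.1f}")
```

The band widens with each step because forecast errors accumulate, which is what produces the cone shape.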
Server Problem Outcome
In this instance, the initial ML technique was not a good choice because the data source met only one of the three characteristics (quality, but not quantity or granularity) required for the ML model to be successful. However, the project was still successful because we found an alternative (ARIMA) that could predict with 90% accuracy whether a cluster would be at capacity within six months. This enabled the company to manage its server purchases accordingly and, as a result, saved significant IT expense.
use case #2: headcount prediction problem
While working for a professional services company, we were tasked with creating a predictive model to determine headcount (hiring) needs based on expected revenue. In this instance, there was high-quality, granular data available to drive the model. Yet, with a lack of quantity, the underlying algorithm had to be carefully selected to make the most of the small dataset available. Because the problem required us to estimate numeric levels, in this case headcount and revenue, we knew that a regression algorithm was required.
The key for this project was determining what type of regression algorithm should be used. In the early stages of the project, we tried multiple regression algorithms because there is no one-size-fits-all model. After trying nearly a dozen approaches, the team selected a boosted decision tree regression (BDTR) model because it performed well on the given dataset.
At a high level, BDTR operates much like a jury: there is a screening method (the boosted training process), individual jurors (the decision trees), and differences of opinion among those jurors (the skews), and together they are required to reach an impartial verdict (the prediction outcome). This means that the model could find the underlying facts based on the hundreds of decisions that were collectively made.
As a reminder, and as noted in the first ML entry, Machine Learning Essentials, there are essentially two types of ML algorithms: regression and classification. The boosted decision tree approach is very versatile because it can be used for either. The main difference between the two is how the model comes to a verdict: for classification it effectively uses majority rules, and for regression it combines the trees' numeric predictions (a boosted ensemble adds up each tree's contribution, whereas a bagged ensemble such as a random forest averages them).
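For readers who want to see the mechanics, below is a minimal Python sketch of boosted tree regression using one-split trees ("stumps") on a single feature. It illustrates the general technique only and is not the model built for the client: the function names, the learning rate, the number of trees, and the revenue and headcount figures are all hypothetical. Note that in this boosted variant the final value is a base prediction plus the scaled sum of every tree's correction.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Best single-split tree on one feature, chosen by squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        l_val, r_val = mean(left), mean(right)
        error = (sum((r - l_val) ** 2 for r in left)
                 + sum((r - r_val) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, l_val, r_val)
    return best[1:]

def fit_boosted_stumps(xs, ys, n_trees=50, learning_rate=0.1):
    """Each new stump is trained on the errors left by the trees before it."""
    base = mean(ys)
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, l_val, r_val = fit_stump(xs, residuals)
        trees.append((threshold, l_val, r_val))
        preds = [p + learning_rate * (l_val if x <= threshold else r_val)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, trees

def predict(model, x):
    """Base value plus the scaled sum of every tree's correction."""
    base, learning_rate, trees = model
    return base + learning_rate * sum(
        l_val if x <= threshold else r_val
        for threshold, l_val, r_val in trees)

# Hypothetical data: expected revenue ($M) and headcount, for illustration only.
revenue = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
headcount = [10.0, 14.0, 19.0, 25.0, 30.0, 36.0]
model = fit_boosted_stumps(revenue, headcount)

baseline_mse = mean((y - mean(headcount)) ** 2 for y in headcount)
boosted_mse = mean((y - predict(model, x)) ** 2
                   for x, y in zip(revenue, headcount))
```

Because every stump only nudges the prediction by a small, scaled step, many weak trees together can fit a small dataset without any single tree dominating the result.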
Hiring Problem Outcome
The flow of this ML model is not unlike that of any data process: it starts with data, then joins the data and isolates the fields (columns) that feed the underlying model. In the end, this model, while straightforward, had dynamic results because of the algorithm selected. For this scenario, the algorithm not only produced good results, but also had the effect of amplifying a relatively small dataset. As a result, the company was able to better determine hiring needs and expected revenue using the ML model that was created. Going forward, this model will be further tuned to achieve results even better than the initial 90%.
machine learning benefits
As shown in our examples, ML is a tool that can save you time and money. But it also carries the risk of over-reliance: applying ML without truly understanding the underlying problem, data, and algorithm. With continued advancement in computing power, data management, and ML understanding, ML will likely become commonplace for all businesses with a data strategy.
If your company would like help navigating the ML space, feel free to reach out to us at firstname.lastname@example.org. We’d love to help as you think through how your company can gain a competitive advantage through machine learning.