
Strategy | Jan 28, 2020

Predicting the Future: 3 Examples of Predictive Analytics Algorithms

Shawnasty Bankovich

Imagine if you could know every move your customers would make before they make it. For example, if you knew that a customer who bought marshmallows would also buy chocolate and graham crackers, you would likely increase marketing for chocolate and graham crackers to this customer. Alternatively, knowing that a customer would buy an item at a certain price point, but not above it, could help you set the item's price or choose a promotional offer. Powerful ways to influence purchases would be at your fingertips! Predictive analytics lets you discover this type of information and similar insights.

so what is predictive analytics?

If you think of analytics as a car, traditional business intelligence is looking at everything behind you and predictive analytics is utilizing that knowledge to predict what is in front of the car. Predictive analytics is the use of known data, statistics, and machine learning to predict future events. Many companies are using predictive analytics to forecast the future and make automated and manual decisions that drive business value. Predictive analytics does this by taking customer data and transforming it into knowledge about that customer. In our previous example, predictive analytics would have taken in data about the customer such as products purchased, location, and time of year and produced knowledge about the customer such as which products to market to the customer and which sales the customer will likely participate in.

how does predictive analytics work?

Predictive analytics can deliver great business value even when individual predictions are not always accurate. How is that possible? In our initial example, where purchasing marshmallows indicates that a customer will purchase chocolate and graham crackers, assume we know from historical data that, prior to any targeted advertising, only 5% of customers who purchased marshmallows also purchased chocolate and graham crackers. After conducting an experiment with targeted advertising, we find that 25% of customers who purchased marshmallows also purchased chocolate and graham crackers. While our initial prediction does not fully hold, since 75% of customers did not behave as predicted, there is still substantial benefit from the targeted advertising: the share of marshmallow buyers who also purchased chocolate and graham crackers grew fivefold. It is much more important to identify segments and predict the behavior of groups of customers than to be correct about each individual customer.
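The arithmetic behind this example can be checked in a few lines of Python. This is only a sketch: the 5% and 25% rates come from the example above, while the audience of 1,000 marshmallow buyers and the function name `purchase_lift` are assumptions added for illustration.

```python
def purchase_lift(baseline_rate: float, targeted_rate: float) -> float:
    """Return how many times the purchase rate grew after targeting."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return targeted_rate / baseline_rate


# Rates taken from the marshmallow example in the text.
baseline_rate = 0.05   # bought chocolate and graham crackers without targeting
targeted_rate = 0.25   # bought them after targeted advertising

# Hypothetical audience size, chosen only to make the numbers concrete.
marshmallow_buyers = 1_000

lift = purchase_lift(baseline_rate, targeted_rate)
extra_buyers = round(marshmallow_buyers * (targeted_rate - baseline_rate))
wrong_predictions = round(marshmallow_buyers * (1 - targeted_rate))

print(f"Lift from targeting: {lift:.1f}x")
print(f"Additional buyers: {extra_buyers}")
print(f"Customers who did not behave as predicted: {wrong_predictions}")
```

Even though the prediction is wrong for most individual customers in this hypothetical audience, the group-level result is several hundred additional sales.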

how does predictive analytics make a prediction?

A simple explanation is that it leverages past experiences captured in data to find patterns associated with the underlying problem and then makes an educated guess. A visual representation of this is shown below; as humans we can look at the image and quickly guess the color of the grayed out areas. A predictive analytics solution will try to mimic human learning behavior through the use of advanced statistics and machine learning.

what algorithms are used for predictive analytics?

There is no one-size-fits-all algorithm for predictive analytics, as different models have their own strengths and weaknesses. While the implementations of these algorithms are complex, the underlying idea can be very simple. There are two major types of prediction algorithms, classification and regression. Classification refers to predicting a discrete value such as a label, while regression refers to predicting a continuous number such as a price.

Examples of these algorithms include k nearest neighbor, linear regression, and random forest:

1. k nearest neighbor

K nearest neighbor (KNN) predicts a value for an element from the k closest elements to it, where closeness is measured on the elements' features. For regression, the prediction is the average of those neighbors' values; for classification, it is their most common label. KNN works for classification and regression applications and is quick to train and easy to implement, but it can be slow to make predictions on large sets of data because each prediction compares the new element against the stored training data. For this reason KNN would be a great model to use if you are building a proof of concept or have a smaller set of data. A good use of KNN is classifying whether a customer will purchase a new product, given characteristics about the customer such as frequency of purchase, number of items purchased, and geographic location, for a customer base of fewer than 10,000 people.
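A minimal sketch of KNN in plain Python may make the idea concrete. The customer rows, feature choices (purchases per month, items per order), and function names below are all invented for illustration; a production system would more likely use a library implementation.

```python
import math
from collections import Counter


def k_nearest(train, query, k):
    """Return the targets of the k training rows closest to the query."""
    # "Training" is just storing the data; all the work happens at prediction
    # time, which is why KNN slows down as the stored data grows.
    ranked = sorted(train, key=lambda row: math.dist(row[0], query))
    return [target for _, target in ranked[:k]]


def knn_classify(train, query, k=3):
    """Classification: the most common label among the k nearest neighbors."""
    return Counter(k_nearest(train, query, k)).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Regression: the average target value of the k nearest neighbors."""
    neighbors = k_nearest(train, query, k)
    return sum(neighbors) / len(neighbors)


# Hypothetical customers: (purchases per month, items per order) -> bought new product?
bought_new_product = [
    ((1.0, 2.0), "no"),
    ((1.5, 1.0), "no"),
    ((2.0, 3.0), "no"),
    ((6.0, 8.0), "yes"),
    ((7.0, 9.0), "yes"),
    ((8.0, 7.0), "yes"),
]

# Same hypothetical features, but with a continuous target (monthly spend).
monthly_spend = [
    ((1.0, 2.0), 10.0),
    ((1.5, 1.0), 20.0),
    ((2.0, 3.0), 30.0),
    ((6.0, 8.0), 90.0),
]

print(knn_classify(bought_new_product, (6.5, 7.5), k=3))
print(knn_regress(monthly_spend, (1.2, 1.8), k=2))
```

Because the distance calculation treats every feature equally, features measured on very different scales are usually normalized before being passed to KNN.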

2. linear regression

Linear regression fits a best-fit line through the data. The equation of this line has a coefficient for each feature in the data, and the equation is applied to testing data to make predictions. Linear regression is a simple model that is best used when the outcome is continuous and has a linear relationship to its features. This makes linear regression a good first model to explore for many regression applications. If the predictions on the testing data set are not accurate, one likely cause is that the outcome does not have a linear relationship to its features. In this case, try polynomial or another form of regression. Logistic regression is a good alternative to linear regression for classification applications. A good use of linear regression is predicting the amount of sales a specific product will generate given information about it such as price, category, seasonality, etc.
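For a single feature, the best-fit line can be computed directly with ordinary least squares. The sketch below assumes made-up price and sales figures and hypothetical helper names (`fit_line`, `predict`, `r_squared`); it also fits a deliberately curved data set to show how a poor fit reveals a non-linear relationship.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Apply the fitted line equation to a new feature value."""
    return slope * x + intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys explained by the line (1.0 is a perfect fit)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(slope, intercept, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: unit price vs. units sold, with a linear relationship.
prices = [2, 4, 6, 8, 10]
units_sold = [100, 90, 80, 70, 60]

slope, intercept = fit_line(prices, units_sold)
print(f"units = {slope:.1f} * price + {intercept:.1f}")
print(f"Predicted units at a price of 7: {predict(slope, intercept, 7):.1f}")
print(f"Fit quality (R^2): {r_squared(prices, units_sold, slope, intercept):.2f}")

# Hypothetical curved data: a straight line explains almost none of it.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [4, 1, 0, 1, 4]
curve_slope, curve_intercept = fit_line(curve_x, curve_y)
print(f"Curved data R^2: {r_squared(curve_x, curve_y, curve_slope, curve_intercept):.2f}")
```

A low fit score on held-out data, as with the curved set here, is the kind of signal that suggests trying polynomial or another form of regression.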

3. random forest

A random forest is a collection of decision trees. A decision tree is basically a set of questions about the features of the data that lead to a result. Individually these trees perform OK for predictive analytics, but grouping them together and having each tree vote for its prediction is much more powerful. Random forests are known for their versatility and low bias. However, random forests do not provide clear-cut reasoning behind the predicted outcome the way a linear regression equation does, and are more of a “black box.” Random forests are a great model to use for most applications that do not need extensive reasoning behind predictions. A good use of random forest is loan approval. The random forest classifier will classify the applicant into approval or denial based on collected features from the application such as household income, debt, employment, etc.
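The voting idea can be sketched in plain Python. This is a deliberately simplified illustration, not a full random forest: each "tree" here asks only one question (a one-split stump), whereas real implementations grow deeper trees. The loan data, feature order (income, debt, years employed), and function names are all hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature_indices):
    """Find the question 'feature <= threshold?' with the fewest training errors."""
    best = None
    for f in feature_indices:
        for threshold in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            left_label, right_label = majority(left), majority(right)
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no usable split: always predict the majority label
        label = majority(labels)
        return (feature_indices[0], float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    n_pick = max(1, int(n_features ** 0.5))
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # rows drawn with replacement
        features = rng.sample(range(n_features), k=n_pick)
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], features))
    return forest


def predict_forest(forest, row):
    """Every tree votes; the most common vote is the forest's prediction."""
    return majority([predict_stump(stump, row) for stump in forest])


# Hypothetical applications: (household income, debt, years employed), in thousands/years.
applications = [
    (85, 10, 6), (92, 15, 8), (78, 12, 5), (110, 20, 10),
    (30, 45, 1), (42, 50, 2), (25, 60, 0), (38, 40, 1),
]
decisions = ["approve"] * 4 + ["deny"] * 4

forest = fit_forest(applications, decisions)
print(predict_forest(forest, (100, 5, 12)))
print(predict_forest(forest, (20, 70, 0)))
```

Note that the final answer is only a tally of votes, which is one way to see why the reasoning behind a forest's prediction is harder to read than a single regression equation.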

what’s next?

In addition to using predictive analytics to make predictions, we can take it one step further and utilize predictive analytics as a foundation for prescriptive analytics. Prescriptive analytics allows us to recommend business decisions through optimizing data generated in predictive analytics. In some cases these decisions can even be automated.
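As a rough illustration of that step from prediction to recommendation, the sketch below takes purchase probabilities that a predictive model might produce for several price points and recommends the price with the highest expected revenue per customer. The probabilities, prices, and function name are invented for this example.

```python
def recommend_price(purchase_probability):
    """Pick the price with the highest expected revenue per customer.

    purchase_probability maps a candidate price to the predicted chance
    that a customer buys at that price.
    """
    return max(purchase_probability,
               key=lambda price: price * purchase_probability[price])


# Hypothetical output of a predictive model: price -> predicted purchase probability.
predicted = {8.0: 0.60, 10.0: 0.50, 12.0: 0.35, 14.0: 0.20}

for price, probability in predicted.items():
    print(f"price {price:5.2f}: expected revenue {price * probability:.2f}")

print(f"Recommended price: {recommend_price(predicted):.2f}")
```

The prediction supplies the probabilities; the prescriptive step is the optimization over them, which is simple enough here to automate.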

how can predictive analytics help your business?

Predictive analytics is growing in popularity because of its potential to be leveraged for significant business success. In general, these predictions do not need to be overwhelmingly accurate to see good success. You might not be able to know every move your customers will make before they make them, but there’s no doubt you can provide meaningful, timely insights that drive more informed decisions and, ultimately, happier customers.

Have additional questions? Interested in what Credera can do with predictive analytics for you? Reach out to us at findoutmore@credera.com.
