
March Machine Learning Mayhem

Machine Learning and the NCAA Men’s Basketball Tournament Methodology

 <<This article is the technical companion to the article above. Please read that article before continuing.>>

“The past may not be the best predictor of the future, but it is really the only tool we have”

Before we delve into the “how” of the methodology, it is important to understand “what” we were going for: a set of characteristics that would indicate that a lower seed would win. We use machine learning to search a large collection of characteristics for a result set that maximizes the number of lower seed wins while simultaneously minimizing lower seed losses. We then apply the result set as a filter to new games. The new games that make it through the filter are predicted as more likely to have the lower seed win. What we have achieved is a set of criteria that are most predictive of a lower seed winning.
This result set is fundamentally different from an approach that tries to determine the results of all new games, in which an attempt is made to find a result set that would apply to every game. A universal model carries a level of complexity and ambiguity that is another discussion entirely. By focusing on one result set (lower seed win), we can get a result that is more predictive than attempting to predict all games.
This type of predictive result set has great applications in business. What is the combination of characteristics that best predicts a repeat customer? What is the combination of characteristics that best predicts a more profitable customer? What is the combination of characteristics that best predicts an on-time delivery? This is different from just trying to forecast demand by combining a demand signal with additional data. Think of it as the difference between a stock picker that picks the stocks most likely to rise and a forecast of how far up or down a specific stock will go. The former is key for choosing stocks, the latter for rating stocks you already own.
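The filtering idea described above can be sketched in a few lines of Python. This is only an illustration: the `Criterion` class, the feature names (`def_eff_rank_diff`, `tempo_diff`), the bounds, and the four toy games are all hypothetical and are not taken from our actual result set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One filter rule: the named feature must fall inside [lower, upper]."""
    feature: str
    lower: float = float("-inf")
    upper: float = float("inf")

    def passes(self, game: dict) -> bool:
        return self.lower <= game[self.feature] <= self.upper


def apply_filter(games, criteria):
    """Keep only the games that satisfy every criterion in the result set."""
    return [g for g in games if all(c.passes(g) for c in criteria)]


def lower_seed_win_rate(games):
    """Fraction of games won by the lower seed (0.0 for an empty list)."""
    if not games:
        return 0.0
    return sum(g["lower_seed_won"] for g in games) / len(games)


# Hypothetical games; each feature is a difference between the two teams.
games = [
    {"def_eff_rank_diff": -5.0, "tempo_diff": 1.2, "lower_seed_won": True},
    {"def_eff_rank_diff": -40.0, "tempo_diff": 0.3, "lower_seed_won": False},
    {"def_eff_rank_diff": 10.0, "tempo_diff": -2.0, "lower_seed_won": True},
    {"def_eff_rank_diff": -3.0, "tempo_diff": 4.5, "lower_seed_won": False},
]

# Hypothetical result set: two criteria, each with a single active bound.
criteria = [
    Criterion("def_eff_rank_diff", lower=-20.0),
    Criterion("tempo_diff", upper=2.0),
]

selected = apply_filter(games, criteria)
print(len(selected), lower_seed_win_rate(selected), lower_seed_win_rate(games))
```

The point of the sketch is that the filter makes no prediction at all for games it rejects; it only picks out a sub-population in which lower seed wins are more common than in the full set.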
One of the reasons we chose “lower seed wins” is that almost every game played in the NCAA tournament provides a data point. The exceptions are the games in which identical seeds play: most notably, the First Four games involve identical seeds, and the Final Four can possibly have identical seeds. That still gives us roughly 60 or so games a year. The more data we have, the better predictions we get.
The second needed item is more characteristics. For our lower seed win we had more than 200 different characteristics for the years 2012-2015. We used the difference between the two teams' characteristics as the input to the selection. We could have used the absolute characteristics for both teams as well. As the analysis is executed, a characteristic that is not needed is simply ignored. What the ML creates is a combination of characteristics. We call our tool “Evolutionary Analysis”. It works by adjusting the combinations in an ever-improving manner to arrive at a result set. There is a little more in the logic that allows for other aspects of optimization, but the core of Evolutionary Analysis is finding a result set.
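To make the idea of "adjusting the combinations in an ever-improving manner" concrete, here is a minimal sketch of a generic evolutionary search over lower bounds. It is not the Evolutionary Analysis tool itself. The objective (wins minus losses among games that pass the filter), the mutation scheme, the parameter defaults, and the synthetic two-feature data set are all assumptions made for illustration.

```python
import random


def score(bounds, X, y):
    """Assumed objective: lower seed wins minus losses among passing games."""
    total = 0
    for row, won in zip(X, y):
        if all(v >= b for v, b in zip(row, bounds)):
            total += 1 if won else -1
    return total


def mutate(bounds, rng, sigma):
    """Copy the bounds and nudge one randomly chosen bound."""
    child = list(bounds)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0, sigma)
    return child


def evolve(X, y, generations=100, pop_size=20, sigma=0.5, seed=0):
    """Keep the best half each generation and refill with mutated copies."""
    rng = random.Random(seed)
    # A bound at the column minimum lets every game pass, so that
    # characteristic is effectively ignored until a mutation raises it.
    baseline = [min(col) for col in zip(*X)]
    population = [baseline] + [mutate(baseline, rng, sigma)
                               for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda b: score(b, X, y), reverse=True)
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng, sigma)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=lambda b: score(b, X, y))


def make_toy_games(n=200, seed=1):
    """Synthetic difference features: column 0 matters, column 1 is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        d0, d1 = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([d0, d1])
        y.append(rng.random() < (0.6 if d0 > 0.5 else 0.2))
    return X, y


X, y = make_toy_games()
best_bounds = evolve(X, y)
print(best_bounds, score(best_bounds, X, y))
```

Because the best candidates always survive to the next generation, the score of the returned bounds can never be worse than the score of the starting filter that lets every game through.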
The result set was then used as a filter on 2016 data to confirm that it is predictive, since a result set fitted to 2012-2015 might not actually predict 2016 results. Among the 2016 games that passed the filter, the lower seed won 47% of the time, versus a historic average of 26% lower seed wins across the overall population. A 47% underdog win rate would arise at random only about 3.4% of the time, so our current result set is very likely to be a genuinely predictive filter.
The last step in the process is to look at the filter criteria that have been chosen and check whether they are believable. For example, one of the criteria was Defensive Efficiency Rank. Evolutionary Analysis chose a lower limit of … well it set a lower limit, let’s just say that. This makes sense: if a lower seed has a defense that is ranked far below that of the higher seed, it is unlikely to prevail. A counterexample is the number of blocks per game, which was not chosen as a criterion. In fact, most of the more than 200 criteria were not used, but the handful of around ten that were used set the filter that chooses a population of games more likely to contain a lower seed winning.
And that is one of the powerful aspects of this type of analysis: you don’t get the one key driver, or even two metrics that have a correlation. You get a whole set of filters that points to a collection of results that deviates from the “normal”.
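A chance probability of this kind can be estimated with a one-sided binomial tail: if every filtered game were an independent coin flip with the historic 26% chance of a lower seed win, how often would we see at least the observed number of wins? The sketch below shows the calculation. The counts `n_filtered` and `n_wins` are hypothetical, because the number of 2016 games that passed the filter is not given here, so the printed value is not expected to reproduce the 3.4% figure.

```python
from math import comb


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


historic_rate = 0.26  # historic share of lower seed wins (from the article)
n_filtered = 17       # hypothetical number of games passing the filter
n_wins = 8            # hypothetical lower seed wins among them (about 47%)

p_value = binomial_tail(n_filtered, n_wins, historic_rate)
print(f"Chance of {n_wins}+ wins in {n_filtered} games: {p_value:.3f}")
```

The smaller this tail probability, the harder it is to explain the filtered win rate as luck alone.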
Please join us as we test our result set this year. We’ll see if we get around 47%. Should be interesting!
If you have questions on this type of analysis or machine learning in general, please don’t hesitate to contact Gordon Summers of Cabri Group or Nate Watson at CAN.
**Disclaimer: Any handicapping sports odds information contained herein is for entertainment purposes only. Neither CAN nor Cabri Group condones using this information to contravene any law or statute; it’s up to you to determine whether gambling is legal in your jurisdiction. This information is not associated with, nor is it endorsed by, any professional or collegiate league, association or team. Machine Learning can be done by anyone, but is done best with professional guidance.


Predictive Analytics: Why should you use it?

We get asked quite frequently: Why should my company invest in predictive analytics? Why even bother? What can it do for us?

Great questions. Predictive analytics, or predictive analysis, used to be a competitive advantage. All through the first part of the 2010s, companies used data science, predictive analytics, and machine learning to take their business intelligence (knowing what is happening inside the company right now) and turn it into a view of what is going to happen in the future, so they could plan for it before it happened. We call this moving up the data hierarchy. But somewhere in the middle of 2019, we saw a switch. As CAN took companies through our process to get them to data-driven decision making, we realized companies weren’t using it for competitive advantage anymore; they were using it to stay relevant.

Companies now are required to do more with less. They are required to stay relevant to their customers. They are required to know who their customers are and what they want, all before the customer does. Data intelligence is now so common in our lives that companies have to implement predictive analytics just to keep up with (not get ahead of) their customers.

Example: With technology developing so quickly, new ways to implement marketing strategies and more effectively reach consumers are popping up all the time. Predictive analytics is one such technique. Praised for its ability to inform companies of future trends and reveal important information, predictive analytics is growing in popularity, with 87 percent of B2B marketing leaders saying they had already implemented or were planning to implement predictive analytics in the coming 12 months. 

So what is it? What is predictive analytics, and how do you use it?

What Is Predictive Analytics?

Before fleshing out its benefits, it’s probably best to first explain what predictive analytics is. Predictive analytics is a process for collecting and analyzing current data using Business Intelligence, Machine Learning, and potentially AI in order to anticipate what is likely to happen next.

How Can Predictive Analytics Benefit Marketing and Sales?


  1. More Efficient Customer Acquisition

By providing your sales team with specific data, predictive analytics can allow them to acquire new customers and keep old ones more efficiently and at lower cost. What journey do your customers take to purchase a product? What advertising do they respond to? What is it about your product or service that they enjoy the most? All these questions can be answered by analyzing previous data and drawing conclusions about future activity. This information can then be used to determine which customers to reach out to and how best to appeal to them, saving time and money.

  2. Determine Up-sell Opportunities

Predictive analytics also assists in drawing conclusions about other aspects of your customers’ buying behavior. Through analytics, brands can better understand what their customers’ needs are and what exactly they’re looking for. This can then be used to tailor the sales and marketing strategy to specific customers.
For example, if you are a fashion brand and have customers who are not in need of shoes, it would be inefficient and wasteful to send them an advertisement for a new shoe promotion. Instead, it would be better to send it to customers who do need footwear, to maximize profit.

  3. Optimize Marketing Strategy

Not only can predictive analytics benefit brands by helping to find information on customers, it can also help with understanding the market environment. You can learn what time of the year spending peaks, how much people are spending, and what they’re spending their money on. This information can assist in the successful execution of marketing strategies by ensuring you are targeting the right people at the right time.
Or you can figure out where to score the most candy on Halloween as we did back in 2013 when we invented a dashboard to help trick-or-treaters.  See, predictive analytics can be fun too.
Predictive analytics is an increasingly popular method for brands to more effectively initiate sales and marketing strategies. By providing detailed information about market trends and buying behavior, it helps brands cut costs, boost profit and increase overall efficiency.
