Related

Good Visualization Example #5,345: Here’s how America uses its land

on October 1

Data Science: America’s Hottest Job

on May 23

Contemporary Analysis (CAN) and Cabri Group have teamed up again to use Machine Learning to predict the 2018 NCAA Men’s Basketball Tournament. This year differs from last year in that we are picking the entire 2018 bracket instead of just upsets.

Historically, only 26% of the tournament’s games end in an upset (this includes games from all rounds). That’s 17 out of 64 games. Last year we did really well, failing to predict only 3 upsets and getting 50% of our predictions right. We are going to need to improve a bunch to win that $1M/year for life from Berkshire Hathaway, including that wee bit about having to work for Berkshire Hathaway to be eligible. This year we added far more variables and used an ensemble model. Will we be perfect? Probably not. Here is the problem with using Machine Learning to try to predict a perfect bracket:

 

A). Error propagates itself through the bracket. This is why the odds of a perfect bracket are around 1:128 billion. If you pick San Diego State to upset Houston and then Houston actually wins, you will lose the entire region. Perfection may come down to a 6/11 game that no one would normally care about, except it’s the tournament, and everyone cares about every game.

Side note: The machine learning is, in fact, picking Houston by the slimmest of margins. However, if San Diego State wins, the machine learning is actually picking them to go on to beat Michigan, Providence, and then Ohio State to win the entire region.

B). Machine Learning and Predictive Analytics aren’t about being 100% accurate. You wouldn’t want to pay for that kind of accuracy even if it were possible. We are trying to be less wrong for companies. This is why predicting upsets made sense and why the whole 2018 NCAA Bracket is so hard. Figuring out who is most likely to be an outlier (churn) is something we do all the time. And we can err on the side of being wrong. We would just tell you to call both Houston and San Diego State (in this instance), because calling them to talk about staying with your company has no ill effect (i.e., there is very little cost to being wrong in this example). There is a huge cost to being wrong in the later rounds of the tournament, as you are predicting the next game based on the assumption that you correctly predicted the last one.
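To put a rough number on how error compounds through a bracket, here is a minimal sketch. It assumes a 64-team, 63-game bracket and the same chance of getting every game right, which is a simplification. The function names are ours for illustration and are not part of the CAN/Cabri model.

```python
GAMES_IN_BRACKET = 63  # assumption: 64-team single-elimination field


def perfect_bracket_probability(per_game_accuracy: float,
                                games: int = GAMES_IN_BRACKET) -> float:
    """Chance of getting every game right if each pick is right with the same probability."""
    return per_game_accuracy ** games


def implied_per_game_accuracy(one_in_n: float,
                              games: int = GAMES_IN_BRACKET) -> float:
    """Per-game accuracy that would produce odds of 1 in `one_in_n` for a perfect bracket."""
    return (1.0 / one_in_n) ** (1.0 / games)


coin_flip = perfect_bracket_probability(0.5)
implied = implied_per_game_accuracy(128e9)  # the 1:128 billion figure quoted above

print(f"Coin-flip picks: 1 in {1 / coin_flip:,.0f}")
print(f"1 in 128 billion implies roughly {implied:.1%} accuracy on every game")
```

Under these assumptions, even picking about two games out of three correctly leaves a perfect bracket at roughly 1 in 128 billion, because a single miss anywhere breaks the streak.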

 Without further ado, here is what the Machine Learning algorithm predicted as the bracket:

 

CAN Bracket-pdf

 

If you have questions about this type of analysis or machine learning in general (or if we are perfect and you would like to congratulate us), please don’t hesitate to contact:

Gordon Summers of Cabri Group (Gordon.Summers@CabriGroup.com), or

Nate Watson at CAN (nate@canworksmart.com).

 

Now for some disclaimers: 

Understand that the technique that finds a group of winners (or losers) in the 2018 NCAA bracket can be based on any metric. Our analysis isn’t meant to support gambling, but to open people’s minds to the possibilities of leveraging Machine Learning for their businesses. If we can predict things as seemingly complex as a basketball tournament (something that has never been correctly predicted), then imagine what we could do with the data that drives your decisions.

We will be keeping score using the very traditional 1,2,4,8,16 point process. 
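For readers who want to score their own bracket the same way, here is a minimal sketch of doubling-points scoring. The round values default to the 1, 2, 4, 8, 16 listed above. The team names and the `score_bracket` function are made up for illustration.

```python
ROUND_POINTS = (1, 2, 4, 8, 16)  # points per correct pick, by round, as listed in the post


def score_bracket(picks, results, round_points=ROUND_POINTS):
    """Score a bracket.

    `picks` and `results` are lists with one set of winning team names per round.
    Each correctly picked winner earns that round's point value.
    """
    if len(picks) != len(results):
        raise ValueError("picks and results must cover the same rounds")
    if len(picks) > len(round_points):
        raise ValueError("no point value defined for some rounds")
    total = 0
    for points, picked, actual in zip(round_points, picks, results):
        total += points * len(picked & actual)
    return total


# Hypothetical three-round example with placeholder team names.
my_picks = [{"A", "B", "C", "D"}, {"A", "C"}, {"A"}]
actual_winners = [{"A", "B", "E", "D"}, {"A", "D"}, {"D"}]
print(score_bracket(my_picks, actual_winners))
```

Because later rounds are worth more, one early miss on a team you had advancing deep costs far more than the single point for that first game.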

 

**Any handicapping sports odds information contained herein is for entertainment purposes only. Neither CAN nor Cabri Group condone using this information to contravene any law or statute; it’s up to you to determine whether gambling is legal in your jurisdiction. This information is not associated with nor is it endorsed by any professional or collegiate league, association or team. Machine Learning can be done by anyone, but is done best with professional guidance.



At the beginning of the project, we set out to show how the 2017 NCAA College Basketball Tournament could be a proving ground for Machine Learning analysis. There are very few places in the world where we can use the same model to predict multiple outcomes in a short period of time, have a ready-made scorecard (Vegas), have the general public understand what we are trying to do, and have a chance to “beat” the algorithm with their own knowledge.

You could say our findings have been a “Slam Dunk” (I couldn’t help myself).

Before diving into the results, I want the reader to understand what we were up against. It’s easy to pick chalk (always picking the better seed). In fact, that is how the games are supposed to work. The 8 seed is supposed to beat the 9. And for the most part, the NCAA does a decent job. Historically, only 26% of the tournament’s games end in an upset (this includes games from all rounds). That’s 17 out of 64 games. This was never going to be easy.

 

Project Recap

We predicted 20 upsets and got 10 right (50%). We only missed predicting 3 upsets.

Using Vegas as a scorecard and betting a simulated $100 on each predicted upset, we would have ended up +$2,605 on our simulated bets (a 30% ROI), with the majority of this coming from long-shot underdogs.
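For readers unfamiliar with how a simulated bet is settled, here is a minimal sketch. It assumes American moneyline odds, where an underdog quoted at +250 returns $250 profit on a $100 stake. The odds and outcomes below are invented for illustration and are not the lines used in this project.

```python
STAKE = 100  # simulated dollars per predicted upset, as in the post


def settle(moneyline: int, won: bool, stake: float = STAKE) -> float:
    """Net result of one bet on an underdog quoted at a positive American moneyline."""
    if moneyline <= 0:
        raise ValueError("this sketch only handles underdogs (positive moneylines)")
    return stake * moneyline / 100 if won else -stake


def roi(bets, stake: float = STAKE):
    """Return (net winnings, net winnings divided by total amount staked)."""
    net = sum(settle(line, won, stake) for line, won in bets)
    return net, net / (stake * len(bets))


# Hypothetical bets: (moneyline, did the underdog win?)
hypothetical_bets = [(250, True), (400, False), (150, True), (120, False)]
net, rate = roi(hypothetical_bets)
print(f"Net: ${net:,.0f}, ROI: {rate:.0%}")
```

The sketch shows why long shots matter: one winning pick at long odds can offset several losing $100 bets.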

Think about this. If we had bet all chalk on every game except the ones the algorithm predicted as upsets, then out of 61 games we would have missed only 13. That’s 79% accurate!

Let’s look at this another way. Our algorithm predicted 77% (10/13) of something that is only 26% likely to happen in the first place. Now think about what you would do if you could identify an unlikely event in your business with 77% accuracy.

  • What would you do if you knew 77% of the customers who were going to leave before they left?
  • What would you do if you knew 77% of failed batches before they happened?
  • What would you do if you knew 77% of your plant’s machine failures before they happened?
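The 50%, 77%, and 79% figures above measure three different things. A minimal sketch using only the counts stated in this post (20 predicted upsets, 10 correct, 3 upsets missed, 61 games) shows how each is derived. The labels precision, recall, and accuracy are standard terminology and are not terms the post itself uses.

```python
def upset_metrics(predicted_upsets: int, correct_upsets: int,
                  missed_upsets: int, games_played: int) -> dict:
    """Derive precision, recall, and accuracy from the counts reported in the recap."""
    tp = correct_upsets                     # predicted an upset, upset happened
    fp = predicted_upsets - correct_upsets  # predicted an upset, favorite won
    fn = missed_upsets                      # picked the favorite, upset happened
    tn = games_played - tp - fp - fn        # picked the favorite, favorite won
    return {
        "precision": tp / (tp + fp),        # share of our upset picks that hit
        "recall": tp / (tp + fn),           # share of actual upsets we caught
        "accuracy": (tp + tn) / games_played,
    }


metrics = upset_metrics(predicted_upsets=20, correct_upsets=10,
                        missed_upsets=3, games_played=61)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

The 77% figure is the recall: of the 13 upsets that happened, the algorithm flagged 10 in advance.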

Business Scenario

You have a theory that some of your clients would buy more “product” if they were called and offered an upgraded deal. However, you don’t want to call all of your clients because you have so many. What you do have is a dataset of past customers who successfully responded to this type of nudge. Using your data, our machine learning algorithm could predict a set of your clients that would be 77% likely to purchase more product if called.

 

Game changer, right?

 

Why this is huge

Our Machine Learning project set out to predict, as accurately as we could, wins by lower-seeded teams in the NCAA tournament. Our stated goal from the beginning was to get 47% of our picks correct and a mere 10% ROI. We beat both of those goals. Our Machine Learning algorithm, which uses a custom optimization engine called Evolutionary Analysis, compared 207 different metrics of college basketball teams with their results in prior tournaments. It selected ranges of those 207 measures that best matched up with historic wins by lower-seeded teams. We then confirmed that the ranges were predictive by testing them against a “clean” historic data set. This comparison is how we arrived at our goal percentage and ROI. We then published our forecasts before each round was played, and the results speak for themselves.
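To illustrate the general idea of selecting a range on training data and then checking it against held-out data, here is a toy sketch. It is not the Evolutionary Analysis engine, which this post does not describe. It swaps in a plain exhaustive search over one made-up metric, and every number in it is invented.

```python
from itertools import combinations


def hit_rate(games, low, high):
    """Return (upset rate, game count) for games whose metric falls in [low, high]."""
    selected = [upset for metric, upset in games if low <= metric <= high]
    if not selected:
        return 0.0, 0
    return sum(selected) / len(selected), len(selected)


def best_range(games, min_games=3):
    """Exhaustively try every (low, high) pair of observed metric values.

    Keeps the range with the highest upset rate, breaking ties by game count.
    """
    values = sorted({metric for metric, _ in games})
    best = None
    for low, high in combinations(values, 2):
        rate, count = hit_rate(games, low, high)
        if count >= min_games and (best is None or (rate, count) > best[:2]):
            best = (rate, count, low, high)
    return best


# Invented data: (metric value, 1 if the lower seed won else 0)
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 1), (0.45, 1),
         (0.5, 1), (0.6, 1), (0.7, 0), (0.8, 0), (0.9, 0)]
holdout = [(0.15, 0), (0.42, 1), (0.48, 1), (0.55, 0), (0.58, 1), (0.85, 0)]

train_rate, train_count, low, high = best_range(train)
holdout_rate, holdout_count = hit_rate(holdout, low, high)
print(f"Selected range [{low}, {high}]: train {train_rate:.0%}, holdout {holdout_rate:.0%}")
```

In this toy data the holdout rate comes out lower than the training rate, which is the usual pattern and the reason a goal should be set from the clean data rather than from the data used to pick the range.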

While we still have 3 games to go, our initial point that Machine Learning can help you be better at making decisions from your data has been proven. Implementing Machine Learning isn’t hard so long as your business has these three characteristics:

  • A data set with a large number of characteristics
  • A measure of success to optimize upon
  • A desire to learn from data to make changes in your organization

 

If this sounds like something that your business could use, please contact Nate Watson of CAN (Nate@CanWorkSmart.com) or Gordon Summers of Cabri Group (Gordon.Summers@CabriGroup.com) today.

 


Prediction Results

Here is a summary of our picks from the beginning of the project ($ indicates our successful pick where “money” was made):

East Tennessee St. over Florida
$ Xavier over Maryland
Vermont over Purdue
Florida Gulf Coast over Florida St.
Nevada over Iowa St.
$ Rhode Island over Creighton
$ Wichita St. over Dayton
$ USC over SMU
$ Wisconsin over Villanova
$ Xavier over Florida St.
Rhode Island over Oregon (tied with a minute to go)
Middle Tennessee over Butler
Wichita St. over Kentucky (tied with a minute to go)
Wisconsin over Florida (OT last second shot)
$ South Carolina over Baylor
$ Xavier over Arizona
Purdue over Kansas
Butler over North Carolina
$ South Carolina over Florida
$ Oregon over Kansas

And for those who are curious, our algorithm has detected one Final Four upset for this weekend:

Oregon over North Carolina

For more information about how we created the Machine Learning algorithm and how we kept score, please read our Machine Learning technical document. Additionally, you can find results for the whole tournament here.


The Tableau data visualization above, found at Tableau Public, shows the “Top 100 Songs of All Time Lyrics”. Click here to hover over each square and see which words were used in which lyrics. Tableau is software that converts data into graphs, charts, and images.

 

CAN’s data scientists love sorting through piles and piles of spreadsheets and numerical data, but it’s not for everyone. There are some amazing tools that convert raw data into visualizations. They help bring out the story of data, so everyone can understand it.

Here’s an old favorite from our blog about the importance of visualization. It’s a way for us at CAN to gear up for the next round of Tableau students at the Omaha Data Science Academy!

We are still accepting applicants for the third round of the Oma-DSA! You can apply here. We accept applications until three weeks before the start date, and start a waiting list after the spots are filled. 

 

Why Visualizing Data is Important


10 Questions to Ask Before Buying Sales Leads

Thinking about buying sales leads? Here are 10 questions that you should ask first.

1. What is the minimum purchase?

List brokers try to capture as much of your marketing budget as possible. They do this by setting minimum purchase amounts and charging for filtering: both encourage larger purchases. So while you might find a broker with low minimum purchases, there is a good chance they charge high fees to filter their lists.

The key is to find balance. Often, buying an extra thousand sales leads won’t cost as much as the first thousand. However, you might not want to use them all. You want to avoid using sales leads that don’t fit your target audience, because interrupting the wrong people is a good way to erode the credibility of your brand (and is a waste of your time and resources). Buying names and contact information is the cheapest part of marketing and selling. You should only use the leads that are the best fit for what you sell, even if that means not using every name. Read more…


Generating Sales Leads

Every sales organization requires three things: sales managers, salespeople, and sales leads. In principle, the formula is simple: the sales team will meet its quota if the sales manager focuses the salespeople on the right sales leads.

Most sales organizations know how to find salespeople and sales managers, leaving sales leads. There are 4 sources of sales leads: 1.) referrals, 2.) conferences and trade shows, 3.) inbound marketing and 4.) proactive sales. Each source has its pros and cons: the key is selecting the right sources for what you sell.

For example, there are businesses where referrals are often the best or the only way to grow. These “word-of-mouth” businesses tend to offer services that are intimate, offer solutions to frequent problems, and have limited marketing resources.

However, most businesses need more than one source of leads to maximize revenue. Not having the right combination of sources stagnates growth and increases your cost of client acquisition. Different lead sources vary in the amount of upfront investment, sophistication required, and payback period.

Read more…


What if you knew which prospects to focus on for the best results?

Working with a Top 10 Online University, CAN used predictive analytics and data science to find patterns in their admissions data to help them make better decisions and focus their efforts.

We developed a model showing which prospects were most likely to convert, which needed extra attention, and which were unlikely to enroll at all. Armed with these insights, they are able to put their most valuable resources, time and money, towards building relationships with the prospects that matter, instead of wasting effort trying to engage uninterested individuals. Read more…


What if you could determine — in advance — your most beneficial business relationships?

Our predictive models help you sort relationship opportunities to determine which are beneficial, and which are distractions. You will be able to focus your resources on the Requests for Proposals (RFPs) that will have the most impact on your organization.

To make business relationship decisions, companies and organizations often rely on solicited bids and RFPs. The problem is that responding takes time and money — often more than 20 hours for a basic RFP and weeks for a more complex RFP. This investment of resources makes it very important to select and respond to the RFPs you are most likely to win.

Using your existing data, and the knowledge and intuition of your team, we build a model that helps focus your efforts. You will be able to select the RFPs that you have the greatest chance of winning, allowing you to use your resources more effectively — and close more bids.  Read more…


What if you could increase loyalty — and revenue — by selling smarter to your existing customers?

By failing to recognize cross-, re-, and up-sell opportunities within your existing customer base, your organization can experience decreased share of wallet, decreased customer loyalty, and increased customer churn.

Using predictive analytics, we were able to increase the share of wallet and customer loyalty of a 12,000-member credit union. We identified which members were most likely to need a home or auto loan, and which were most likely to leave. These insights allowed them to create a proactive sales approach targeting their most valuable existing customers. Read more…



Finding that first data scientist is hard. Because your current staff already knows your processes and procedures, and they fit your culture, why not train someone you already employ?


