The Man Behind the Scenes: An Interview with Nate Watson

Last month, TechBus interviewed our very own President, Nate Watson. TechBus is an Omaha-based group that posts bi-weekly interviews about local businesses and new technology.
In this video, Nate covers everything you’ve ever wanted to know about CAN: who we are, what we do, how we’re related to the Data Science Academy, and how our staff augmentation model works. It’s a great way to get a glimpse into CAN for those who may be interested in contracting with us, working for us, or being taught by us at the Oma-DSA.

To schedule a phone call with our very own Nate Watson, send us an e-mail at nate@canworksmart.com.

Women in Tech: A Visualization from Tableau Public

Here at CAN, we spend our free time researching the latest trends and facts in data science. While skimming Tableau Public, we found this fascinating visualization about women in tech. Tableau Public is a platform for posting data visualizations made with Tableau. You don’t have to be a data expert to share a visualization; you just have to be excited about data.
With this particular visualization, you can see how many fewer women receive tech-related degrees than men. As women are quickly overtaking men in educational status, it’s more important than ever to attract their attention to the opportunities of the tech world. We at CAN believe in giving a real-life data science education to all who want to pursue it. That’s why we created the Omaha Data Science Academy with Interface Web School. Sound like something you’d like to know more about? Check out more information here.


Matt Hoover Reps the Startup Collaborative

Check out the video below of one of our data scientists, Director of Data Visualization Matt Hoover, giving a tour of The Startup Collaborative. Matt was interviewed by Omaha tech company Dynamo. Dynamo is a new kind of IT consulting and recruiting agency based on an understanding of who companies actually need: it values people and culture fit over transactions and placement fees.


CAN HQ @ The Exchange Building

In the video, you’ll watch Matt show off the Omaha Startup Collaborative’s coworking space at the Exchange Building, learn a little about the Omaha Data Science Academy, and see close-up footage of CAN’s headquarters. Matt also mentions his newest project involving March Madness, Creighton basketball, Tableau, and statistics. Sound intriguing? Find out more here.

Single Mom of Three Rocks Web Development World

We found this article on Interface’s blog and thought it was an awesome story about how Interface’s web school turned a busy woman’s career around. Despite the obstacles of daily life, Miranda Tharp jump-started a web development career.
To read the full article, click here.
Interface partnered with CAN at the end of last year to create the Omaha Data Science Academy. With a certificate from the Data Science Academy, skilled professionals can boost their resumes with additional real-life experience.

Machine Learning Upset Prediction Project Proves its Value

At the beginning of the project, we set out to show how the 2017 NCAA College Basketball Tournament could be a proving ground for Machine Learning analysis. There are very few places in the world where we can use the same model to predict multiple outcomes in a short period of time, have a ready-made scorecard (Vegas), have the general public understand what we are trying to do, and have a chance to “beat” the algorithm with their own knowledge.
You could say our findings have been a “Slam Dunk” (I couldn’t help myself).
Before diving into the results, I want the reader to understand what we were up against. It’s easy to pick chalk (always picking the better seed). In fact, that is how the games are supposed to work: the 8 seed is supposed to beat the 9. And for the most part, the NCAA does a decent job. Historically, only 26% of tournament games end in an upset (this includes games from all rounds). That’s about 17 out of 64 games. This was never going to be easy.
 

Project Recap

We predicted 20 upsets and got 10 right (50%). We missed only 3 upsets.
Using Vegas as a scorecard and betting $100 “dollars” on each predicted upset, we would have ended up with $2,605 on $2,000 of simulated bets (a 30% ROI), with most of the gains coming from long-shot underdogs.
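To make the scoring concrete, here is a minimal Python sketch of how $100 simulated bets can be scored against American money lines, where a positive line such as +250 pays that much profit on a $100 stake. The function names and the sample picks are hypothetical illustrations, not the actual 2017 Vegas lines or our scoring code.

```python
def moneyline_profit(line: int, stake: float = 100.0) -> float:
    """Profit on a winning bet at an American money line."""
    if line > 0:
        # Underdog line: +250 pays $250 profit on a $100 stake.
        return stake * line / 100
    # Favorite line: -150 pays $100 profit on a $150 stake.
    return stake * 100 / abs(line)


def score_bets(bets, stake: float = 100.0):
    """Return (total staked, total returned, ROI) for a list of (line, won) pairs."""
    staked = stake * len(bets)
    returned = 0.0
    for line, won in bets:
        if won:
            # A winning bet returns the original stake plus the profit.
            returned += stake + moneyline_profit(line, stake)
    roi = (returned - staked) / staked
    return staked, returned, roi


# Hypothetical picks: (underdog money line, did the underdog win?)
picks = [(+150, True), (+150, True), (+400, False), (+900, False)]
staked, returned, roi = score_bets(picks)
print(f"staked ${staked:,.0f}, returned ${returned:,.0f}, ROI {roi:.0%}")
# prints: staked $400, returned $500, ROI 25%
```

Under this scoring, a 30% ROI on twenty $100 bets ($2,000 laid) corresponds to about $2,600 returned, and a couple of long-shot winners can carry many losing picks.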
Think about this. If we had picked chalk in every game except the ones the algorithm predicted as upsets, then out of 61 games we would have missed only 13. That’s 79% accurate!
Let’s look at this another way. Our algorithm caught 77% (10 of 13) of the upsets, an event that happens in only 26% of games in the first place. Now think about what you would do if you could catch 77% of an unlikely event in your business before it happened.

  • What would you do if you knew 77% of the customers who were going to leave before they left?
  • What would you do if you knew 77% of failed batches before they happened?
  • What would you do if you knew 77% of your plant’s machine failures before they happened?
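The figures above map onto standard classification metrics: the share of upset picks that hit is precision (50%), the share of actual upsets caught is recall (77%), and the chalk-plus-flagged-upsets strategy is plain accuracy (79%). The short Python sketch below recomputes them from the counts in the recap (61 games, 20 predicted upsets, 10 correct, 13 actual upsets); the function name and the 26% base-rate parameter used for the lift figure are our illustrative choices.

```python
def upset_metrics(games: int, predicted: int, hits: int, actual: int,
                  base_rate: float = 0.26) -> dict:
    """Summarize upset picks as precision, recall, accuracy, and lift."""
    false_pos = predicted - hits             # predicted upsets that didn't happen
    false_neg = actual - hits                # real upsets the model missed
    correct = games - false_pos - false_neg  # everything else was called right
    precision = hits / predicted             # share of upset picks that hit
    return {
        "precision": precision,
        "recall": hits / actual,             # share of real upsets caught
        "accuracy": correct / games,         # "chalk except flagged games"
        "lift": precision / base_rate,       # vs. the historic 26% upset rate
        "misses": false_pos + false_neg,
    }


# Counts from the recap: 61 games, 20 upset picks, 10 correct, 13 actual upsets.
metrics = upset_metrics(games=61, predicted=20, hits=10, actual=13)
for name, value in metrics.items():
    print(name, round(value, 2) if isinstance(value, float) else value)
# precision 0.5, recall 0.77, accuracy 0.79, lift 1.92, misses 13
```

Keeping precision and recall separate shows why the 50% and 77% figures can both be true of the same set of picks.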

Business Scenario

You have a theory that some of your clients would buy more “product” if they were called and offered an upgraded deal. However, you don’t want to call all of your clients because you have so many. What you do have is a dataset of past customers who responded to this type of nudge. Using your data, our machine learning algorithm could flag a set of your clients that captures roughly 77% of those who would purchase more product if called.
 
Game changer, right?
 

Why this is huge

Our Machine Learning lower-seed-win project set out to predict, as accurately as we could, when a lower-seeded team would win in the NCAA tournament. Our stated goal from the beginning was to get 47% of our picks correct and a mere 10% ROI. We beat both of those goals. Our Machine Learning algorithm, which uses a custom optimization engine called Evolutionary Analysis, compared 207 different metrics of college basketball teams against their results in prior tournaments. It selected ranges of those 207 measures that best matched up with historic wins by lower-seeded teams. We then confirmed that the ranges were predictive by testing them against a “clean” historic data set. That test is where our target hit rate and ROI came from. We then published our forecasts before each round was played. The results speak for themselves.
While we still have 3 games to go, our initial point that Machine Learning can help you be better at making decisions from your data has been proven. Implementing Machine Learning isn’t hard so long as your business has these three characteristics:

  • A data set with a large number of characteristics
  • A measure of success to optimize upon
  • A desire to learn from data to make changes in your organization

 
If this sounds like something that your business could use, please contact Nate Watson of CAN (Nate@CanWorkSmart.com) or Gordon Summers of Cabri Group (Gordon.Summers@CabriGroup.com) today.
 


Prediction Results

Here is a summary of our picks from the beginning of the project ($ indicates our successful pick where “money” was made):

East Tennessee St. over Florida
$ Xavier over Maryland
Vermont over Purdue
Florida Gulf Coast over Florida St.
Nevada over Iowa St.
$ Rhode Island over Creighton
$ Wichita St. over Dayton
$ USC over SMU
$ Wisconsin over Villanova
$ Xavier over Florida St.
Rhode Island over Oregon (tied with a minute to go)
Middle Tennessee over Butler
Wichita St. over Kentucky (tied with a minute to go)
Wisconsin over Florida (OT last second shot)
$ South Carolina over Baylor
$ Xavier over Arizona
Purdue over Kansas
Butler over North Carolina
$ South Carolina over Florida
$ Oregon over Kansas

And for those who are curious, our algorithm has detected one Final Four upset for this weekend:

Oregon over North Carolina

For more information about how we created the Machine Learning algorithm and how we kept score, please read our Machine Learning technical document. Additionally, you can find results for the whole tournament here.

2017 NCAA Tournament Elite Eight Upset Picks

We are doing better than anticipated: even after last night’s heartbreaker between Wisconsin and Florida, we are still ahead $165.
 
Here are our Round 4 Potential Upsets:
 

South Carolina over Florida

Oregon over Kansas

 
We will have a weekend breakdown of all of our picks on Monday.
 
For more information about how we created the Machine Learning algorithm and how we are keeping score, you may read the Machine Learning article here:
 
http://can2013.wpengine.com/machine-learning-basketball-methodology
 

2017 NCAA Tournament Machine Learning Prediction Results

After the first weekend of basketball, our Machine Learning Prediction tool is showing good results.
We had two measures of success: we wanted to win at least 46% of our picks, and we wanted to “win” using virtual money bet on the money lines. By both measures, we succeeded: we correctly picked 6 upsets out of the 13 games we chose (46%), and those 6 winning picks returned $1,359 on the $1,300 laid ($100 per game), a profit of $59, or roughly a 5% ROI.
The details:
Overall, there were 10 instances where the lower seed won in the first two rounds. This year is on track for fewer lower-seed wins (22%) than the historic rate (26%). So even with “tough headwinds,” we still came close to our expectations.
“But CAN, multiple lower seeds won that you didn’t pick. Why didn’t the model see Middle Tennessee upsetting Minnesota?” The answer is simple: MT’s win was a result of variables that we weren’t measuring. Our picks were the games that matched our criteria, and those criteria were based on variables found in most (not all) of the games in which the lower seed won in past years. Lower seeds can and will still win outside our picks; our model was built to predict the highest number of upsets without over-picking. This is a perfect example of how no model, even a great one, will predict every outcome. In business, however, predicting most, or even some, of them can mean huge revenue increases or money saved.
Besides, we had some really, really close calls that would have put us way, way ahead. There were several games where we gave the lower seed a good chance of winning and they simply lost (both Wichita State and Rhode Island had their games tied with under a minute to go). We also picked multiple games where the money lines showed Vegas gave the upset no chance, yet the underdogs came very close.
Our goal was not to choose games in a vacuum (which is how you would normally bet), but to choose games that match the criteria and spread the risk over several probable winners. This wasn’t about picking the only upsets or all of the upsets; it was about picking a set of games that had the highest probability of the lower seed winning. And by our measures of success, we achieved our goal.
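Why spread the risk? The Python sketch below compares one bet with several, assuming each pick is an independent underdog bet with the same hit probability and the same positive money line. The 40% hit probability and the +200 line are hypothetical inputs chosen for illustration, not outputs of our model.

```python
from math import comb


def expected_profit(p_win: float, line: int, stake: float = 100.0) -> float:
    """Expected profit of one bet on an underdog at a positive money line (+line)."""
    return p_win * stake * line / 100 - (1 - p_win) * stake


def prob_finish_ahead(n_bets: int, p_win: float, line: int, stake: float = 100.0) -> float:
    """Chance that total profit is positive across n independent, identical bets."""
    total = 0.0
    for k in range(n_bets + 1):  # k = number of winning bets
        profit = k * stake * line / 100 - (n_bets - k) * stake
        if profit > 0:
            total += comb(n_bets, k) * p_win**k * (1 - p_win) ** (n_bets - k)
    return total


# Hypothetical inputs: 40% chance per pick, every pick at +200.
for n in (1, 5, 20):
    ev = n * expected_profit(0.40, 200)
    ahead = prob_finish_ahead(n, 0.40, 200)
    print(f"{n:>2} bets: expected profit ${ev:,.0f}, chance of finishing ahead {ahead:.0%}")
```

The expected profit per bet is the same in every row, but the chance of finishing ahead climbs as the same edge is spread over more picks; real games are not perfectly independent, so treat this as an idealized picture.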
We aren’t done quite yet either.
For the next round, we have 5 games that match our criteria:

Wisconsin over Florida
South Carolina over Baylor
Xavier over Arizona
Purdue over Kansas
Butler over North Carolina

**If any games match our predictive criteria in the next round, we’ll post them Saturday before tip-off.
The results of the first rounds:
The Machine Learning algorithm performed as advertised: it identified a set of characteristics from historic data that was predictive of future results. The implications for any business are clear: if you have historic data and you leverage this type of expertise, you can predict the future.
For more information about how we created the Machine Learning algorithm and how we are keeping score, you may read the Machine Learning article here:
http://can2013.wpengine.com/machine-learning-basketball-methodology
If you would like to see how Machine Learning could improve your business, please feel free to reach out to either of us: Gordon Summers of Cabri Group (Gordon.Summers@CabriGroup.com) or Nate Watson of CAN (nate@canworksmart.com).
 

2017 NCAA Tournament Round of 32 Upset Picks

The CAN/Cabri Group Machine Learning Lower Seed Win Prediction tool has made its second round forecast! Without further ado:

Wisconsin (8) over Villanova (1)
Xavier (11) over Florida State (3)
Middle Tennessee (12) over Butler (4)
Rhode Island (11) over Oregon (3)
Wichita St. (10) over Kentucky (2)

On Monday 3/20, we’ll post a review of the first and second rounds.
For more information about how we created the Machine Learning algorithm and how we are keeping score, you may read the Machine Learning article here:
http://can2013.wpengine.com/machine-learning-basketball-methodology

2017 NCAA Tournament Round of 64 Upset Predictions

The Cabri Group / CAN Machine Learning Lower Seed Win Prediction tool has made its first round forecast! Without further ado:

East Tennessee St. (13) over Florida (4)
Xavier (11) over Maryland (6)
Vermont (13) over Purdue (4)
Florida Gulf Coast (14) over Florida St. (3)
Nevada (12) over Iowa St. (5)
Rhode Island (11) over Creighton (6)
Wichita St. (10) over Dayton (7)

 
* If the remaining play-in games add another predicted upset, we’ll update this list before that game starts.

Update: USC (11) over SMU (6)

One of the obvious observations on the predictions is: “Wait, no 8/9 upsets????” Remember, these games share the most characteristics with the largest historic collection of upsets. This doesn’t mean that there will be no upsets in the 8/9 games, that all of the predictions above will hit (remember, we are going for 47% upsets), or that the favorites will win every game not listed. The games on the list are there because they share the most characteristics with historic games in which the lower seed won.
Also, one of the key team members on this project, Matt, is a big Creighton fan (and grad). He was not happy to see Creighton on the list, so I’ll speak to that one specifically. In the technical notes, I indicated that one of the many criteria being used was Defensive Efficiency (DE). Our Machine Learning algorithm (Evolutionary Analysis) disfavors games with a large DE gap between the lower seed and the higher seed. Creighton actually has a lower Defensive Efficiency than Rhode Island. Sorry, Matt. Again, it doesn’t mean Creighton won’t win; it only means that the Rhode Island v. Creighton game shares more criteria with the largest collection of historic upsets than the other games in the tournament.
As we indicated, we will use the odds as well as a count of upsets to determine how well we do as the tournament goes on. We’ll have a new set of predictions on Saturday for the next round of the tournament and a recap coming on Monday.
For more information about how we created the Machine Learning algorithm and how we are keeping score, you may read the Machine Learning article here:
http://can2013.wpengine.com/machine-learning-basketball-methodology


March Machine Learning Mayhem

Machine Learning and the NCAA Men’s Basketball Tournament Methodology

 <<This article is the technical companion to the article above. Please read that article before continuing.>>

“The past may not be the best predictor of the future, but it is really the only tool we have”

 
Before we delve into the “how” of the methodology, it is important to understand “what” we were going for: a set of characteristics that would indicate that a lower seed would win. We use machine learning to search a large collection of characteristics for a result set that maximizes the number of lower-seed wins while simultaneously minimizing lower-seed losses. We then apply the result set as a filter to new games. The new games that make it through the filter are predicted as more likely to have the lower seed win. What we have achieved is a set of criteria that are most predictive of a lower seed winning.
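As a rough illustration of “applying the result set as a filter,” the Python sketch below represents each criterion as a range on the difference between the two teams’ values for one statistic and flags a new game only if it falls inside every range. The statistic names, bounds, and matchups are hypothetical; our actual result set and its limits are not published here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Criterion:
    """One range on a lower-seed-minus-higher-seed stat difference (hypothetical)."""
    stat: str
    low: Optional[float] = None   # inclusive lower bound; None means unbounded
    high: Optional[float] = None  # inclusive upper bound; None means unbounded

    def passes(self, game: dict) -> bool:
        value = game[self.stat]
        if self.low is not None and value < self.low:
            return False
        if self.high is not None and value > self.high:
            return False
        return True


def flag_upsets(games: list, result_set: list) -> list:
    """Keep only the games that satisfy every criterion in the result set."""
    return [g for g in games if all(c.passes(g) for c in result_set)]


# Hypothetical result set: stat names and bounds are made up for illustration.
result_set = [
    # Cap how much worse the lower seed's defensive efficiency rank can be.
    Criterion("def_eff_rank_diff", high=40),
    Criterion("rebound_margin_diff", low=-3.0),
]

# Hypothetical matchups with precomputed stat differences.
games = [
    {"matchup": "Team A (11) vs Team B (6)", "def_eff_rank_diff": 12, "rebound_margin_diff": 1.5},
    {"matchup": "Team C (13) vs Team D (4)", "def_eff_rank_diff": 85, "rebound_margin_diff": 0.0},
    {"matchup": "Team E (12) vs Team F (5)", "def_eff_rank_diff": 20, "rebound_margin_diff": -6.0},
]

for game in flag_upsets(games, result_set):
    print("Predicted upset candidate:", game["matchup"])
# prints only: Predicted upset candidate: Team A (11) vs Team B (6)
```

A game has to clear every range to be flagged, which is why a single large gap (Team C’s defense, Team E’s rebounding) is enough to keep it off the list.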
 
This approach is fundamentally different from one that tries to predict the result of every new game by finding a single result set that applies to all of them. A universal model brings a level of complexity and ambiguity that is another discussion entirely. By focusing on one outcome (lower seed wins), we can get a result that is more predictive than attempting to predict all games.
 
This type of predictive result set has great applications in business. What combination of characteristics best predicts a repeat customer? What combination best predicts a more profitable customer? What combination best predicts an on-time delivery? This is different from forecasting demand by combining a demand signal with additional data. Think of it as the difference between a stock picker that picks the stocks most likely to rise and a forecast of how far up or down a specific stock will go. The former is key for choosing stocks; the latter is for rating stocks you already own.
 
One of the reasons we chose “lower seed wins” is that almost every game played in the NCAA tournament provides a data point. A few games pit identical seeds against each other: most notably, the First Four games involve identical seeds, and the Final Four can. However, that still gives us roughly 60 games a year. The more data we have, the better our predictions.
 
The second requirement is a large number of characteristics. For our lower-seed-win analysis, we had more than 200 different characteristics for the years 2012-2015. We used the difference between the two teams’ values for each characteristic as the inputs; we could have used the absolute characteristics for both teams as well. As the analysis runs, any characteristic that isn’t needed is ignored. What the ML creates is a combination of characteristics. We call our tool “Evolutionary Analysis.” It works by adjusting the combinations in an ever-improving manner until it arrives at a result. There is a little more in the logic that allows for other aspects of optimization, but the core of Evolutionary Analysis is finding a result set.
The result set was then used as a filter on 2016 data to confirm that it is predictive, since it is possible that a result set found in 2012-2015 doesn’t actually predict 2016 results. Applied to the 2016 data, our current result set selected games in which the underdog won 47% of the time, compared with the historic average of 26% lower-seed wins. By chance alone, a 47% underdog win rate would occur only about 3.4% of the time, so our result is very unlikely to be a fluke and appears to be a genuinely predictive filter.
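Evolutionary Analysis is our own engine and its internals are not described here, so the following is only a generic stand-in: a small elitist evolutionary search in Python that mutates per-feature (low, high) ranges and keeps whichever filters score best on “lower-seed wins kept minus lower-seed losses kept.” The features are synthetic lower-seed-minus-higher-seed differences, and every function name, the mutation step, and the population settings are assumptions for illustration. It trains on one synthetic set and checks the resulting filter on a held-out set, loosely mirroring the train-then-confirm split described in the text.

```python
import random


def passes(x, ranges):
    """True if every feature value falls inside its (low, high) range."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, ranges))


def fitness(ranges, games):
    """Lower-seed wins kept by the filter minus lower-seed losses kept."""
    kept = [upset for x, upset in games if passes(x, ranges)]
    wins = sum(kept)
    return wins - (len(kept) - wins)


def full_ranges(games):
    """The widest filter: each feature's observed min and max (keeps every game)."""
    columns = list(zip(*(x for x, _ in games)))
    return [(min(c), max(c)) for c in columns]


def mutate(ranges, rng, step=0.3):
    """Nudge one bound of one randomly chosen feature range."""
    new = list(ranges)
    i = rng.randrange(len(new))
    lo, hi = new[i]
    if rng.random() < 0.5:
        lo += rng.gauss(0, step)
    else:
        hi += rng.gauss(0, step)
    new[i] = (min(lo, hi), max(lo, hi))
    return new


def evolve(games, generations=100, pop_size=30, seed=0):
    """Elitist search: the top half survives each generation, mutants refill the rest."""
    rng = random.Random(seed)
    wide = full_ranges(games)
    population = [wide] + [mutate(wide, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda r: fitness(r, games), reverse=True)
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=lambda r: fitness(r, games))


def synthetic_games(n, rng):
    """Fake matchups: x = (stat_a_diff, stat_b_diff); only stat_a_diff matters."""
    games = []
    for _ in range(n):
        x = (rng.uniform(-3, 3), rng.uniform(-3, 3))
        upset = rng.random() < (0.6 if x[0] > 1.0 else 0.15)
        games.append((x, upset))
    return games


if __name__ == "__main__":
    rng = random.Random(42)
    train, holdout = synthetic_games(240, rng), synthetic_games(60, rng)
    best = evolve(train)
    kept = [upset for x, upset in holdout if passes(x, best)]
    rate = sum(kept) / len(kept) if kept else 0.0
    print("result set:", [(round(lo, 2), round(hi, 2)) for lo, hi in best])
    print(f"holdout: {len(kept)} games kept, lower seed won {rate:.0%}")
```

Because the top half of each generation survives unchanged, the best filter’s score never gets worse from one generation to the next. On small samples such a search can also overfit, which is why the held-out check matters.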
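A chance figure like 3.4% is the kind of number a binomial tail gives: if each game independently has a 26% chance of a lower-seed win, how often would a random set of n games show an upset rate of 47% or better? The Python sketch below assumes that independent binomial model, which may not be exactly how our figure was computed. The text doesn’t give the number of 2016 games that passed the filter, so the loop uses a few hypothetical sizes, and the printed values will not necessarily match 3.4%.

```python
from math import comb


def chance_of_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


BASE_RATE = 0.26  # historic share of games won by the lower seed (from the text)

# Hypothetical filter sizes: the number of 2016 games that passed is not stated.
for n in (15, 17, 19):
    k = round(0.47 * n)  # upsets needed to reach roughly a 47% hit rate
    p_chance = chance_of_at_least(k, n, BASE_RATE)
    print(f"{k}/{n} upsets ({k / n:.0%}) by chance: {p_chance:.1%}")
```

The larger the filtered set, the less likely a 47% upset rate is to arise by luck, so the sample size matters as much as the hit rate when judging a filter.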
 
The last step in the process is to look at the filter criteria that have been chosen and check whether they are believable. For example, one of the criteria chosen was Defensive Efficiency Rank. Evolutionary Analysis chose a lower limit of … well, it set a lower limit; let’s just say that. This makes sense: if a lower seed’s defense is ranked far below the higher seed’s, it is unlikely to prevail. By contrast, blocks per game was not chosen as a criterion. In fact, most of the 200+ criteria were not used; a handful of around ten criteria set the filter that selects a population of games more likely to contain a lower seed winning.
 
And that is one of the powerful aspects of this type of analysis: you don’t get the one key driver, or even two metrics that have a correlation. You get a whole set of filters that points to a collection of results that deviates from the “normal.”
 
Please join us as we test our result set this year. We’ll see if we get around 47%. Should be interesting!
 
If you have questions on this type of analysis or machine learning in general, please don’t hesitate to contact Gordon Summers of Cabri Group (Gordon.Summers@CabriGroup.com) or Nate Watson at CAN (nate@canworksmart.com).
**Disclaimer: Any handicapping sports odds information contained herein is for entertainment purposes only. Neither CAN nor Cabri Group condone using this information to contravene any law or statute; it’s up to you to determine whether gambling is legal in your jurisdiction. This information is not associated with nor is it endorsed by any professional or collegiate league, association or team. Machine Learning can be done by anyone, but is done best with professional guidance.
 
 
 
