Nate is well known throughout the region as a leader who has helped numerous companies bridge the gap between data overload and actionable intelligence. He has been CAN's President since 2015 and since then has worked ceaselessly to strengthen and expand its operations.
We are excited to announce a partnership with Bellevue University, beginning with data science leadership and data science learning seminars on their online platform. The first is an upcoming lunch and learn following the release of our recent book, “Leading a Data-Driven Organization.”
This online lunch and learn will be:
December 4th and 6th, 11am-1pm each day
The seminar will outline a practical guide you can use to transform yourself and your organization to Win the Data Science Revolution!
It is specifically built for those who are tasked, or are going to be tasked, with leading the philosophical change that is Data-Driven Decision Making. The seminar will include topics like:
what data science is
how data science works
how data science can drive better decisions
how to implement ideas, and
how to hire, train, and manage talent.
The seminar will also address some of the major barriers that leaders deal with when implementing data-driven decision making including:
the bias of inactivity
the bias of decision avoidance
ROI for “science” projects
choosing the right projects, and
management’s ability to understand the results.
You will come away with a much better understanding of the people you are managing, the people who need their predictions, and how to stand in between both camps.
Bellevue University is a private, non-profit, accredited university physically located in Omaha, NE that teaches students around the globe. It offers career-relevant bachelor’s, master’s, and Ph.D. degree programs, including a master’s in data science and one in business intelligence.
CAN and Bellevue have decided to partner together in an effort to provide seminars and lunch and learns about relevant topics in data science. Our goal is to produce a monthly talk about relevant topics such as Data Science Legal, Practical Machine Learning, Data Science Leadership, Data Science Sales, Data Science Implementation, and hopefully, eventually, the entire Data Science Academy for Credit.
Gordon Summers is Contemporary Analysis’s Chief Data Strategist. With 25 years of experience helping companies ranging from large Fortune 500 corporations to small not-for-profits, he is the region’s leading voice on the changes needed to build, and make permanent, the culture necessary to be data-driven. He also teaches data science and has helped many leaders understand and implement data science within their organizations. Feel free to reach out with questions about his book or any other management-related questions you might have.
Contemporary Analysis (CAN) and Cabri Group have teamed up again to use Machine Learning to predict the 2018 NCAA Men’s Basketball Tournament. This is different from last year, as we are picking the entire 2018 bracket instead of just upsets.
Historically, only 26% of the tournament’s games end in an upset (this includes games from all rounds). That’s 17 out of 64 games. Last year we did really well, failing to predict only 3 upsets and getting 50% of our predictions right. We are going to need to improve a bunch to win that $1M/year for life from Berkshire Hathaway (including that wee bit about having to work for Berkshire Hathaway to be eligible). This year we added far more variables and used an ensemble model. Will we be perfect? Probably not. Here is the problem with using Machine Learning to try to predict a perfect bracket:
A). Error propagates itself through the bracket. This is why the odds of a perfect bracket are around 1:128 billion. If you pick San Diego State to upset Houston and then Houston actually wins, you will lose the entire region. Perfection may come down to a 6/11 game that no one would normally care about, except it’s the tournament, and everyone cares about every game.
Side note: The machine learning is, in fact, picking Houston by the slimmest of margins. However, if San Diego State wins, the machine learning is actually picking them to go on to beat Michigan, Providence, and then Ohio State to win the entire region.
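To make the compounding concrete, here is a minimal sketch of how per-game accuracy multiplies across a bracket. It assumes a 64-team field (63 games) and treats every pick as independently correct with the same probability. Both are simplifications we are introducing for illustration, not how the 1:128 billion figure was derived.

```python
# Sketch: how per-game accuracy compounds across a whole bracket.
# Assumption: a 64-team single-elimination field, which has 63 games.
GAMES = 63


def perfect_bracket_probability(per_game_accuracy: float, games: int = GAMES) -> float:
    """Chance of getting every game right when each pick is independently
    correct with the same probability (a simplifying assumption)."""
    return per_game_accuracy ** games


def implied_per_game_accuracy(one_in: float, games: int = GAMES) -> float:
    """Per-game accuracy that would produce overall odds of 1 in `one_in`."""
    return (1.0 / one_in) ** (1.0 / games)


if __name__ == "__main__":
    for accuracy in (0.50, 0.70, 0.80, 0.90):
        p = perfect_bracket_probability(accuracy)
        print(f"{accuracy:.0%} per game -> about 1 in {1 / p:,.0f}")
    implied = implied_per_game_accuracy(128e9)
    print(f"1 in 128 billion implies roughly {implied:.1%} per game")
```

Under these assumptions, even a picker who is right most of the time on any single game ends up with vanishing odds of a perfect bracket, which is the point of item A.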
B). Machine Learning and Predictive Analytics aren’t about being 100% accurate. You wouldn’t want to pay for that kind of accuracy even if it were possible. We are trying to be less wrong for companies. This is why predicting upsets made sense and why predicting the whole 2018 NCAA bracket is so hard. Figuring out who is most likely to be an outlier (churn) is something we do all the time. And we can afford to be wrong. We would just tell you to call both Houston and San Diego State (in this instance), because calling them to talk about staying at your company has no ill effect (i.e., there is very little cost to being wrong in this example). There is a huge cost to being wrong in the later rounds of the tournament, as you are predicting the next game based on the assumption that you correctly predicted the last game.
Without further ado, here is what the Machine Learning algorithm predicted as the bracket:
If you have questions on this type of analysis or machine learning in general (or if we are perfect and you would like to congratulate us), please don’t hesitate to contact:
Gordon Summers of Cabri Group (Gordon.Summers@CabriGroup.com), or
Nate Watson at CAN (firstname.lastname@example.org).
Now for some disclaimers:
Understand that the technique that finds a group of winners (or losers) in the 2018 NCAA bracket can be based on any metric. Our analysis isn’t to support gambling, but to open up people’s minds to the possibilities of leveraging Machine Learning for their businesses. If we can predict things as seemingly complex as a basketball tournament (something that has never been correctly predicted), then imagine what we could do with the data that drives your decisions.
We will be keeping score using the very traditional 1,2,4,8,16 point process.
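For readers who want to keep score at home, the doubling point scheme can be sketched as below. The five point values come from the sentence above; the counts of correct picks are made-up numbers for illustration only.

```python
# Points per correct pick, by round, as listed in the text (five values).
ROUND_POINTS = [1, 2, 4, 8, 16]


def score_bracket(correct_picks_by_round: list) -> int:
    """Total score given the number of correct picks in each round.

    zip() stops at the shorter list, so any round without a listed
    point value is simply not scored here.
    """
    return sum(points * correct
               for points, correct in zip(ROUND_POINTS, correct_picks_by_round))


if __name__ == "__main__":
    # Hypothetical counts of correct picks per round, not real results.
    example_correct = [24, 10, 5, 2, 1]
    print(score_bracket(example_correct))
```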
**Any handicapping sports odds information contained herein is for entertainment purposes only. Neither CAN nor Cabri Group condone using this information to contravene any law or statute; it’s up to you to determine whether gambling is legal in your jurisdiction. This information is not associated with nor is it endorsed by any professional or collegiate league, association or team. Machine Learning can be done by anyone, but is done best with professional guidance.
Today is Contemporary Analysis (CAN)’s 10th birthday!!! Although we are not the company we started back in 2008 (different logo, different owners, new leaders, new data scientists, even a new way to serve up data science), we still have the best team in the region and now 10 years of wisdom on how to implement, build, and train data scientists as well. Let us help, even if it is just a phone call for advice. Our goal is to help every company in the region use data science to be competitive in their niche.
Also, stay tuned for our upcoming article on CAN in 2008, 2018, and 2028. The world is going to be an exciting place for data science in 10 years and we are here to help you get there.
Recently, Contemporary Analysis (CAN) was presented with the Greater Omaha Chamber’s Small Business of the Month award. It means a lot to be recognized for the hard work the team has done over the last year improving how companies start and scale data science internally.
On the consulting side, CAN has spent most of its 9+ years implementing Data Science in the traditional format: bidding via proposals and statements of work. While we still make bids and builds via proposal, we realized a lot of companies that need data science have a difficult time formulating their need into a written document. They don’t know where to start, what they need, or how to scope time and materials. Not having a statement of work presents a problem for traditional consulting. Even when they did understand how to build a needs document, an outside vendor wasn’t what companies wanted. Their desire was to own their own analytics team. We couldn’t agree more.
Once they had decided to make data science an internal strategy, companies hired a senior-level data scientist. This was because the person needed to be all things for the department for a long time, until it showed an ROI and was granted a budget to hire a team. This required the first data scientist to be a programmer, database manager, mathematician, data visualist, data science strategist, and implementation manager. This came with a whole new set of problems. A person experienced enough to do all things is expensive ($150k+ salary) and hard to find (time to hire is 6+ months). On top of that, implementation requires a philosophical change in problem-solving (reactive to proactive), and scale requires a new management process (Agile is ineffective). It is simply too much for one person to be successful.
We realized we had to change how companies implemented data science. They needed a fully functional team inside their company from day one, which for a time required an outside vendor, but they also needed to manage the process inside their company for buy-in and scalability moving forward. A new way of implementation had to be invented.
We came up with something different, a method with immediate results and little risk. Instead of hiring senior-level talent out of the gate, use a full team of consultants to help you stand up your group. Then find, hire, and train someone to run the team once it’s already up. This means you get multiple people (with no recruiting and no time to hire) and expertise (understanding how to implement and manage all aspects of the team) immediately on day one – all at a price similar to hiring one senior-level data scientist.
Additionally, there is a benefit when it’s time to find, hire, and train someone to run the team. Because some of the heavy lifting is being done by the vendor, a person skilled in data science implementation (a data science strategist) can now be hired to run and scale the department. This person is usually much less expensive than a senior data scientist.
We pioneered this thought process at a local bank in Treynor, IA. TS Bank is one of the fastest-growing banks in our area. They reached out to CAN in late 2015 asking how they could be better at predicting what is likely to happen not only in their portfolio but also in marketing, sales, operations, M&A – almost every function of their business. They already had business intelligence but didn’t know how to make the transition from reactive to proactive. That’s when CAN stepped in.
CAN became their data science team for 18 months, deploying 4 data scientists skilled in NoSQL, data visualization, coding, and computational modeling. We served as their team until they were able to stand by themselves. Now, just 2 years later, they have their own team of two data scientists, a data strategist, a business intelligence analyst, and a database engineer. TS Bank now has a better data science team than banks five times their size, and they have plans to hire more. With their team, augmented by ours only when needed, TS Bank can make decisions faster and less expensively than their peers. They know when to buy. They know when to sell. They have better risk analysis. Their business intelligence team, now coupled with their predictive analytics team, is the poster child for how to start, grow, and scale data science in an organization. This pilot allowed CAN to better understand how to implement the “Us then You” strategy.
Today CAN offers three approaches to improving outcomes with data science:
Data Science as a Service (to get you started)
Training (to make it yours), and
Staff Augmentation (to keep your need fulfilled, even if that need is temporary).
Data Science as a Service (DSaaS)
CAN begins the process of serving its clients by initially and temporarily serving as their data science team. Day one, we show up and provide our client with an established data science team that knows exactly what they’re doing, knows how to dig into their data, and knows how to cut through the red tape.
Unlike most consultants, we have a timer running from our first second on the job. We establish an agreed-upon milestone and, once that milestone is reached, CAN will give you everything you need to have your own data science capabilities: all the data, all the knowledge, no black box, and nothing secret.
About midway through DSaaS, CAN will identify, hire, train, and place a person to run everything CAN is building. While this person can be and often is from outside the organization, sometimes it is an internal person who just needs a few additional skills. When this happens, there are additional savings of time and money, as this person requires no hiring process and no internal training on tools, is already a culture fit, and needs no spin-up time figuring out internal politics.
To formalize this training process, CAN built a training curriculum designed to help individuals already in the workforce gain necessary and valuable skills in the four key parts of data science: coding, database, statistics and computational modeling, and data visualization. We call it the Omaha Data Science Academy. While it was initially only for individuals hired to manage the data science portfolio after CAN has reached the milestone, CAN has opened enrollment to the community so that companies with established data science teams can also find data scientists for their job openings. The Omaha Data Science Academy’s new goal is to train a data scientist for every company in Omaha.
Once established, teams like those at TS Bank aren’t finished building. It took TS Bank 24 months before they felt they had enough talent to cut CAN loose. And even then, CAN still helps out from time to time, providing talent for project work so the company can keep its data science team lean while retaining the talent and expertise to do the high-level or high-speed projects. Because CAN offers staff augmentation in addition to DSaaS, companies can hire senior-level talent much further down the road without fearing they will lack the senior-level thinking to tackle the hard projects as they arise. CAN also offers entry-level data scientists, providing extra staff for those projects that require hours of work rather than a high level of expertise. In this way, CAN closes the loop of need, making sure that at all points a company can run a data science team of any size and make noticeable gains from the insights gathered by having a team.
CAN’s new way of data science team implementation lets a company gain access to better decision making without the fear or risk of single-person dependence. It creates better data science much faster and with higher ROI than traditional implementation, with the helping hand of a company that has done data science for years.
CAN provides insight to all teams as they grow and develop. We have completed over 150 projects across 100+ companies, and have been the data science teams for 3 companies in the proof of concept stage, 2 in implementation in past years, and 5 more planned for 2018. We have the experience and wisdom necessary to help companies navigate the new kind of management necessary in data science.
Sometimes obsession breeds genius. Fans of Game of Thrones have dedicated much time to tracking the deaths, births, twists, and turns of the previous seasons. Now that season 7 has arrived, there are some amazing maps of the story out there. We found one we particularly liked on Tableau Public.
Check out “Games of Thrones Interactive Death Viz” by David Murphy. Select a character and see how they died, who killed them, and what the circumstances were. Turn it into a game and test your friends’ knowledge too. There may be a few more to add before the season is over . . .
Our staff augmentation model proves that CAN believes in building a data analytics team from within. Our goal is to get a data scientist in every company in Omaha. We want to add value to every business team by training a data scientist with the latest tools of the industry and on-the-ground field experience.
At the beginning of the project, we set out to show how the 2017 NCAA College Basketball Tournament could be a proving ground for Machine Learning analysis. There are very few places in the world where we can use the same model to predict multiple outcomes in a short period of time, have a ready-made scorecard (Vegas), have the general public understand what we are trying to do, and have a chance to “beat” the algorithm with their own knowledge.
You could say our findings have been a “Slam Dunk” (I couldn’t help myself).
Before diving into the results, I want the reader to understand what we were up against. It’s easy to pick chalk (always picking the better seed). In fact, that is how the games are supposed to work. The 8 seed is supposed to beat the 9. And for the most part, the NCAA does a decent job. Historically, only 26% of the tournament’s games end in an upset (this includes games from all rounds). That’s 17 out of 64 games. This was never going to be easy.
We predicted 20 upsets and got 10 right (50%). We only missed predicting 3 upsets.
Using Vegas as a scorecard and having bet $100 “dollars” on each predicted upset, we would have ended up +$2,605 off our simulated bets (a 30% ROI), with the majority of this coming from long-shot underdogs.
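For anyone checking the arithmetic, here is a small sketch of the ROI calculation. It assumes one reading that reconciles the two quoted figures: that $2,605 is the total paid back (stakes included) on 20 simulated bets of $100 each. That reading is our assumption for illustration; the individual payouts are not listed here.

```python
STAKE_PER_BET = 100      # from the text
PREDICTED_UPSETS = 20    # from the text: one simulated bet per predicted upset
TOTAL_RETURNED = 2_605   # assumption: total paid back, including returned stakes


def roi(total_staked: float, total_returned: float) -> float:
    """Return on investment: net profit divided by the amount staked."""
    return (total_returned - total_staked) / total_staked


total_staked = STAKE_PER_BET * PREDICTED_UPSETS
simulated_roi = roi(total_staked, TOTAL_RETURNED)

if __name__ == "__main__":
    print(f"Staked ${total_staked:,}, returned ${TOTAL_RETURNED:,}, ROI {simulated_roi:.0%}")
```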
Think about this. If we had bet all chalk on games except the ones the algorithm predicted as upsets, then out of 61 games we would have only missed 13. That’s 79% accurate!
Let’s look at this another way. Our algorithm predicted 77% (10/13) of something that is only 26% likely to happen in the first place. Now think about what you would do if you could identify an unlikely event in your business with 77% accuracy.
What would you do if you knew 77% of the customers who were going to leave before they left?
What would you do if you knew 77% of failed batches before they happened?
What would you do if you knew 77% of your plant’s machine failures before they happened?
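The 50%, 77%, and 79% figures above measure three different things, and a short sketch makes the distinction explicit. The counts come straight from the numbers quoted in this post; the labels (precision, recall, accuracy) are standard terms we are attaching to them.

```python
# Counts quoted in the post.
GAMES = 61
PREDICTED_UPSETS = 20
CORRECT_UPSET_PICKS = 10   # predicted an upset and it happened
MISSED_UPSETS = 3          # an upset happened that was not predicted

true_positives = CORRECT_UPSET_PICKS
false_positives = PREDICTED_UPSETS - CORRECT_UPSET_PICKS
false_negatives = MISSED_UPSETS
true_negatives = GAMES - true_positives - false_positives - false_negatives

# Of the upsets we called, how many happened? (the "50%" figure)
precision = true_positives / (true_positives + false_positives)
# Of the upsets that happened, how many did we call? (the "77%" figure)
recall = true_positives / (true_positives + false_negatives)
# Of all games, how many did chalk-plus-algorithm get right? (the "79%" figure)
accuracy = (true_positives + true_negatives) / GAMES

if __name__ == "__main__":
    print(f"precision {precision:.0%}, recall {recall:.0%}, accuracy {accuracy:.0%}")
```

In business terms, recall answers "what share of the customers who left did we flag?", while precision answers "what share of the customers we flagged actually left?".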
You have a theory that some of your clients would buy more “product” if they were called and offered an upgraded deal. However, you don’t want to call all of your clients because you have so many. What you do have is a dataset of past customers who successfully responded to this type of nudge. Using your data, our machine learning algorithm could predict the set of your clients most likely to purchase more product if called.
Game changer, right?
Why this is huge
Our Machine Learning project set out to predict, as accurately as we could, lower-seeded teams winning in the NCAA tournament. Our stated goal from the beginning was to get 47% of our picks correct and a mere 10% ROI. We beat both of those goals. Our Machine Learning algorithm, which uses a custom optimization engine called Evolutionary Analysis, compared 207 different metrics of college basketball teams against their results in prior tournaments. It selected ranges of those 207 measures that best matched up with historic wins by lower-seeded teams. We then confirmed that the ranges were predictive by testing them against a “clean” historic data set. This comparison is how we got our goal percent and ROI. We then published our forecasts before each round was played, and the results speak for themselves.
While we still have 3 games to go, our initial point that Machine Learning can help you be better at making decisions from your data has been proven. Implementing Machine Learning isn’t hard so long as your business has these three characteristics:
A data set with a large number of characteristics
A measure of success to optimize upon
A desire to learn from data to make changes in your organization
If this sounds like something that your business could use, please contact Nate Watson of CAN (Nate@CanWorkSmart.com) or Gordon Summers of Cabri Group (Gordon.Summers@CabriGroup.com) today.
Here is a summary of our picks from the beginning of the project ($ indicates our successful pick where “money” was made):
East Tennessee St. over Florida
$ Xavier over Maryland
Vermont over Purdue
Florida Gulf Coast over Florida St.
Nevada over Iowa St.
$ Rhode Island over Creighton
$ Wichita St. over Dayton
$ USC over SMU
$ Wisconsin over Villanova
$ Xavier over Florida St.
Rhode Island over Oregon (tied with a minute to go)
Middle Tennessee over Butler
Wichita St. over Kentucky (tied with a minute to go)
Wisconsin over Florida (OT last second shot)
$ South Carolina over Baylor
$ Xavier over Arizona
Purdue over Kansas
Butler over North Carolina
$ South Carolina over Florida
$ Oregon over Kansas
And for those who are curious, our algorithm has detected one Final Four upset for this weekend:
Machine Learning and the NCAA Men’s Basketball Tournament Methodology
<<This article is meant to be the technical document following the above article. Please read the following article before continuing.>>
“The past may not be the best predictor of the future, but it is really the only tool we have”
Before we delve into the “how” of the methodology, it is important to understand “what” we were going for: a set of characteristics that would indicate that a lower seed would win. We use machine learning to look through a large collection of characteristics and find a result set of characteristics that maximizes the number of lower seed wins while simultaneously minimizing lower seed losses. We then apply the result set as a filter to new games. The new games that make it through the filter are predicted as more likely to have the lower seed win. What we have achieved is a set of criteria that are most predictive of a lower seed winning.
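To picture what "applying the result set as a filter" means, here is a minimal sketch. It assumes the result set can be represented as an allowed range per characteristic. The characteristic names, ranges, and games below are hypothetical placeholders, not our actual result set.

```python
from typing import Dict, List, Tuple

# Hypothetical result set: each chosen characteristic maps to an allowed range.
RESULT_SET: Dict[str, Tuple[float, float]] = {
    "def_efficiency_rank": (-40.0, 25.0),
    "turnover_rate": (-5.0, 1.5),
}


def passes_filter(characteristics: Dict[str, float],
                  result_set: Dict[str, Tuple[float, float]]) -> bool:
    """A game passes only if every chosen characteristic is inside its range."""
    return all(low <= characteristics[name] <= high
               for name, (low, high) in result_set.items())


def predict_lower_seed_wins(games: List[Tuple[str, Dict[str, float]]],
                            result_set: Dict[str, Tuple[float, float]]) -> List[str]:
    """Labels of the games that make it through the filter."""
    return [label for label, characteristics in games
            if passes_filter(characteristics, result_set)]


# Hypothetical new games (label, characteristics).
NEW_GAMES = [
    ("Team A over Team B", {"def_efficiency_rank": 10.0, "turnover_rate": -1.0}),
    ("Team C over Team D", {"def_efficiency_rank": 60.0, "turnover_rate": 0.5}),
]

if __name__ == "__main__":
    print(predict_lower_seed_wins(NEW_GAMES, RESULT_SET))
```

Only the first hypothetical game survives, because the second falls outside the allowed range for one characteristic.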
This result set is fundamentally different from an approach that tries to determine the results of all new games, whereby an attempt is made to find a result set that would apply to every game. There is a level of complexity and ambiguity with a universal model, which is another discussion entirely. By focusing on one result set (lower seed win) we can get a result that is more predictive than attempting to predict all games.
This type of predictive result set has great applications in business. What is the combination of characteristics that best predicts a repeat customer? What is the combination of characteristics that best predicts a more profitable customer? What is the combination of characteristics that best predicts an on-time delivery? This is different from just trying to forecast demand by using a demand signal combined with additional data. Think of it as the difference between a stock picker that picks stocks most likely to rise vs. forecasting how far up or down a specific stock will go. The former is key for choosing stocks, the latter for rating stocks you already own.
One of the reasons we chose “lower seed wins” is that almost every game played in the NCAA tournament offers a data point. There are several games where identical seeds play. Most notably, the First Four games involve identical seeds and the Final Four can possibly have identical seeds. However, that still gives us roughly 60 or so games a year. The more data we have, the better predictions we get.
The second needed item is more characteristics. For our lower seed win we had >200 different characteristics for years 2012-2015. We used the difference between the characteristics of the two teams as the selection. We could have used the absolute characteristics for both teams as well. As the analysis is executed, if a characteristic is unneeded it is ignored. What the ML creates is a combination of characteristics. We call our tool “Evolutionary Analysis”. It works by adjusting the combinations in an ever-improving manner to get a result. There is a little more in the logic that allows for other aspects of optimization, but the core of Evolutionary Analysis is finding a result set.
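As a rough illustration of "adjusting the combinations in an ever-improving manner", here is a generic evolutionary search over range bounds. This is not the Evolutionary Analysis tool itself; it is a textbook-style mutate-and-keep-if-no-worse loop on synthetic data, and every name, number, and the scoring rule (wins minus losses among games that pass) is an assumption made for the sketch.

```python
import random
from typing import Dict, List, Tuple

Features = Dict[str, float]                # differences between the two teams' metrics
Bounds = Dict[str, Tuple[float, float]]    # allowed range per metric
Example = Tuple[Features, bool]            # (features, did the lower seed win?)


def passes(features: Features, bounds: Bounds) -> bool:
    return all(lo <= features[name] <= hi for name, (lo, hi) in bounds.items())


def fitness(bounds: Bounds, history: List[Example]) -> int:
    """Lower seed wins minus lower seed losses among games passing the filter."""
    return sum((1 if won else -1) for features, won in history if passes(features, bounds))


def mutate(bounds: Bounds, rng: random.Random, step: float = 0.5) -> Bounds:
    """Nudge one end of one randomly chosen range, keeping low <= high."""
    child = dict(bounds)
    name = rng.choice(sorted(child))
    lo, hi = child[name]
    if rng.random() < 0.5:
        lo = min(lo + rng.uniform(-step, step), hi)
    else:
        hi = max(hi + rng.uniform(-step, step), lo)
    child[name] = (lo, hi)
    return child


def evolve(history: List[Example], start: Bounds,
           generations: int = 2000, seed: int = 0) -> Tuple[Bounds, int]:
    """Keep a candidate whenever it scores at least as well as the current best."""
    rng = random.Random(seed)
    best, best_score = start, fitness(start, history)
    for _ in range(generations):
        candidate = mutate(best, rng)
        score = fitness(candidate, history)
        if score >= best_score:  # accepting ties lets the search drift sideways
            best, best_score = candidate, score
    return best, best_score


def make_synthetic_history(n: int = 200, seed: int = 1) -> List[Example]:
    """Fake games in which only metric_a_diff matters; metric_b_diff is noise."""
    rng = random.Random(seed)
    history = []
    for _ in range(n):
        features = {"metric_a_diff": rng.uniform(-3, 3),
                    "metric_b_diff": rng.uniform(-3, 3)}
        win_chance = 0.6 if features["metric_a_diff"] > 1 else 0.2
        history.append((features, rng.random() < win_chance))
    return history


if __name__ == "__main__":
    games = make_synthetic_history()
    wide_open = {"metric_a_diff": (-10.0, 10.0), "metric_b_diff": (-10.0, 10.0)}
    best_bounds, best_score = evolve(games, wide_open)
    print("start score:", fitness(wide_open, games))
    print("best score:", best_score, "bounds:", best_bounds)
```

A characteristic whose range stays wide enough to admit every game has no effect on the filter, which is one simple way an unneeded characteristic ends up being ignored.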
The result set was then used as a filter on 2016 to confirm that the result is predictive. It is possible that the result set from 2012-2015 doesn’t actually predict 2016 results. Applied as a filter to 2016 data, our current result set selected games of which 47% were underdog wins. The historic average is 26% lower seed wins, and a 47% underdog win rate would happen by chance only about 3.4% of the time. Our current result is therefore unlikely to be a fluke and likely to work as a predictive filter.
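The "how often would this happen randomly" check can be sketched with a binomial model, treating each filtered game as an independent 26% chance of a lower seed win. The number of 2016 games that passed the filter is not given in this article, so the 8-of-17 counts below are hypothetical values chosen only because they round to 47%.

```python
from math import comb

BASE_RATE = 0.26  # historic share of lower seed wins, from the text


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_tail(k: int, n: int, p: float) -> float:
    """Probability of k or more successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical counts: 8 lower seed wins among 17 filtered games (about 47%).
    wins, filtered_games = 8, 17
    print(f"exactly {wins}:  {binom_pmf(wins, filtered_games, BASE_RATE):.3f}")
    print(f"{wins} or more: {binom_tail(wins, filtered_games, BASE_RATE):.3f}")
```

With these hypothetical counts, the chance of exactly 8 wins is roughly 3.4%, and the chance of 8 or more is about 5%. Which of the two quantities, and which counts, sit behind the 3.4% quoted above is not stated, so treat this only as a way to reproduce the style of the check.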
The last step in the process is to look at the filter criteria that have been chosen and check whether they are believable. For example, one of the criteria chosen was Defensive Efficiency Rank. Evolutionary Analysis chose a lower limit of … well, it set a lower limit, let’s just say that. This makes sense: if a lower seed has a defense that is ranked far inferior to the higher seed’s, it is unlikely to prevail. A counterexample is that the number of blocks per game was not a criterion that was chosen. In fact, most of the >200 criteria were not used, but that handful of around ten criteria set the filter that chooses a population of games that is more likely to contain a lower seed winning.
And that is one of the powerful aspects of this type of analysis: you don’t get the one key driver, or even two metrics that have a correlation. You get a whole set of filters that points to a collection of results that deviates from the “normal”.
Please join us as we test our result set this year. We’ll see if we get around 47%. Should be interesting!
If you have questions on this type of analysis or machine learning in general, please don’t hesitate to contact Gordon Summers of Cabri Group (Gordon.Summers@CabriGroup.com) or Nate Watson at CAN (email@example.com).
**Disclaimer: Any handicapping sports odds information contained herein is for entertainment purposes only. Neither CAN nor Cabri Group condone using this information to contravene any law or statute; it’s up to you to determine whether gambling is legal in your jurisdiction. This information is not associated with nor is it endorsed by any professional or collegiate league, association or team. Machine Learning can be done by anyone, but is done best with professional guidance.