2017 NCAA Tournament Round of 32 Upset Picks

The CAN/Cabri Group Machine Learning Lower Seed Win Prediction tool has made its second round forecast! Without further ado:

Wisconsin (8) over Villanova (1)
Xavier (11) over Florida State (3)
Middle Tennessee (12) over Butler (4)
Rhode Island (11) over Oregon (3)
Wichita St. (10) over Kentucky (2)

We’ll do a review on Monday 3/20 of the first and second round.
For more information about how we created the Machine Learning algorithm and how we are keeping score, you may read the Machine Learning article here:
http://can2013.wpengine.com/machine-learning-basketball-methodology

2017 NCAA Tournament Round of 64 Upset Predictions

The Cabri Group / CAN Machine Learning Lower Seed Win Prediction tool has made its first round forecast! Without further ado:

East Tennessee St. (13) over Florida (4)
Xavier (11) over Maryland (6)
Vermont (13) over Purdue (4)
Florida Gulf Coast (14) over Florida St. (3)
Nevada (12) over Iowa St. (5)
Rhode Island (11) over Creighton (6)
Wichita St. (10) over Dayton (7)

 
* If the last play-in games add another predicted upset, we’ll update this list before that game starts.

Update: USC (11) over SMU (6)

One obvious reaction to the predictions is: “Wait, no 8/9 upsets????” Remember that these games are the ones whose characteristics most closely match the largest historic collection of upsets. That doesn’t mean there will be no 8/9 upsets. It also doesn’t mean that all of the predictions above will hit (remember, we are aiming for 47% upsets), or that the favorites will win every game not listed. The games on the list are there because they share the most characteristics with historic games in which the lower seed won.
Also, one of the key team members on this project, Matt, is a big Creighton fan (and grad). He was not happy to see Creighton on the list, so I’ll speak to that one specifically. In the technical notes, I indicated that one of the many criteria being used is Defensive Efficiency (DE). The Machine Learning algorithm (Evolutionary Analysis) doesn’t like it when there is a large DE gap between the lower seed and the higher seed. Creighton actually has a lower Defensive Efficiency than Rhode Island. Sorry, Matt. Again, this doesn’t mean Creighton won’t win. It only means that the Rhode Island v. Creighton game shares more criteria with the largest collection of historic upsets than the other games in the tournament do.
As we indicated, we will use the odds as well as a count of upsets to determine how well we do as the tournament goes on. We’ll have a new set of predictions on Saturday for the next round of the tournament and a recap coming on Monday.
For more information about how we created the Machine Learning algorithm and how we are keeping score, you may read the Machine Learning article here:
http://can2013.wpengine.com/machine-learning-basketball-methodology

March Machine Learning Mayhem

Machine Learning and the NCAA Men’s Basketball Tournament Methodology

 <<This article is the technical companion to the introductory article, “Predicting the upsets for the NCAA Men’s Basketball Tournament using machine learning.” Please read that article before continuing.>>

“The past may not be the best predictor of the future, but it is really the only tool we have”

 
Before we delve into the “how” of the methodology, it is important to understand “what” we were going for: a set of characteristics that would indicate that a lower seed would win. We use machine learning to search a large collection of characteristics. It finds the combination (the result set) that maximizes the number of lower seed wins while also minimizing lower seed losses. We then apply the result set as a filter to new games. The new games that make it through the filter are predicted to be more likely to end in a lower seed win. What we have achieved is the set of criteria that is most predictive of a lower seed winning.
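To make the idea of a “result set used as a filter” concrete, here is a minimal Python sketch. It assumes a result set is a list of simple lower/upper bounds on features expressed as lower seed minus higher seed. The `Criterion` and `apply_filter` helpers are hypothetical illustrations, not the actual criteria or code behind our tool. So are the feature names (`def_eff_rank_diff`, `seed_diff`) and the thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rule in a result set: a bound on a single feature (hypothetical format)."""
    feature: str
    low: float = float("-inf")
    high: float = float("inf")

    def passes(self, game: dict) -> bool:
        # A game passes when its value for this feature lies within [low, high].
        return self.low <= game[self.feature] <= self.high


def apply_filter(result_set: list, games: list) -> list:
    """Keep only the games that satisfy every criterion in the result set."""
    return [g for g in games if all(c.passes(g) for c in result_set)]


# Hypothetical result set and games; values are lower seed minus higher seed.
result_set = [Criterion("def_eff_rank_diff", high=40), Criterion("seed_diff", low=3)]
new_games = [
    {"id": "A", "def_eff_rank_diff": 12, "seed_diff": 5},
    {"id": "B", "def_eff_rank_diff": 85, "seed_diff": 7},
    {"id": "C", "def_eff_rank_diff": -4, "seed_diff": 1},
]
picks = apply_filter(result_set, new_games)
print([g["id"] for g in picks])  # ['A']
```

Only games that clear every bound are picked. That is why the result set acts as a filter rather than a score for every game.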
 
This result set is fundamentally different from an approach that tries to predict the outcome of every new game, which requires finding a single result set that applies to all games. A universal model brings a level of complexity and ambiguity that is another discussion entirely. By focusing on one outcome (a lower seed win), we can get a result that is more predictive than an attempt to predict all games.
 
This type of predictive result set has great applications in business. What combination of characteristics best predicts a repeat customer? A more profitable customer? An on-time delivery? This is different from simply forecasting demand by combining a demand signal with additional data. Think of it as the difference between a stock picker that selects the stocks most likely to rise and a forecast of how far up or down a specific stock will go. The former is key for choosing stocks; the latter is key for rating stocks you already own.
 
One of the reasons we chose “lower seed wins” is that almost every game played in the NCAA tournament can provide a data point. Only a few games feature identical seeds: the First Four games always do, and the Final Four games can. That still leaves roughly 60 games a year, and the more data we have, the better our predictions.
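Here is a sketch of how each tournament game might become one training row, with identical-seed games skipped. It assumes each game is a dict with hypothetical keys (`seed_a`, `seed_b`, `stats_a`, `stats_b`, `winner`). It orients every row as lower seed minus higher seed, which is one way to take the difference between the two teams’ characteristics. Our actual data layout is not described here.

```python
def make_training_rows(games):
    """Convert games into (feature_differences, lower_seed_won) rows.

    Each game is a dict with hypothetical keys:
      seed_a, seed_b   -- tournament seeds (a larger number means a lower seed)
      stats_a, stats_b -- dicts of team characteristics with the same keys
      winner           -- "a" or "b"
    """
    rows = []
    for g in games:
        if g["seed_a"] == g["seed_b"]:
            continue  # identical seeds (e.g. First Four) give no lower-seed data point
        # The lower seed is the team with the larger seed number.
        low, high = ("a", "b") if g["seed_a"] > g["seed_b"] else ("b", "a")
        stats_low, stats_high = g["stats_" + low], g["stats_" + high]
        diffs = {k: stats_low[k] - stats_high[k] for k in stats_low}
        rows.append((diffs, g["winner"] == low))
    return rows
```

The label is True only when the lower seed won, so every row answers the single question the model cares about.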
 
The second thing we need is a large number of characteristics. For lower seed wins, we had more than 200 different characteristics for the years 2012-2015. We used the difference between the two teams’ characteristics as the inputs; we could also have used the absolute characteristics of both teams. As the analysis runs, any characteristic that isn’t needed is ignored. What the ML produces is a combination of characteristics. We call our tool “Evolutionary Analysis.” It works by adjusting the combinations in a continually improving manner until it reaches a result. There is a little more to the logic that allows for other aspects of optimization, but the core of Evolutionary Analysis is finding a result set.
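Our Evolutionary Analysis tool itself is not published. What follows is only a generic genetic-algorithm sketch of the same idea, built on our own assumptions:

- A candidate result set is a dict mapping a feature to a (low, high) bound.
- Fitness is lower seed wins minus lower seed losses among the games that pass.
- The mutation and crossover rules, population size and generation count are arbitrary choices.

Rows are (feature_differences, lower_seed_won) pairs. When mutation drops a bound, that characteristic is ignored.

```python
import random


def passes(rule, diffs):
    """True when the game satisfies every (low, high) bound in the rule."""
    return all(lo <= diffs[f] <= hi for f, (lo, hi) in rule.items())


def fitness(rule, rows):
    """Lower-seed wins minus lower-seed losses among games that pass the rule."""
    return sum((1 if won else -1) for diffs, won in rows if passes(rule, diffs))


def mutate(rule, features, ranges, rng):
    """Drop one bound, or add/redraw a random bound on one feature."""
    child = dict(rule)
    f = rng.choice(features)
    if f in child and rng.random() < 0.3:
        del child[f]  # this characteristic is no longer used
    else:
        lo_min, hi_max = ranges[f]
        child[f] = tuple(sorted(rng.uniform(lo_min, hi_max) for _ in range(2)))
    return child


def crossover(p1, p2, rng):
    """Each feature's bound is inherited from one parent chosen at random."""
    child = {}
    for f in sorted(set(p1) | set(p2)):
        parent = p1 if rng.random() < 0.5 else p2
        if f in parent:
            child[f] = parent[f]
    return child


def evolve(rows, generations=200, pop_size=50, seed=0):
    """Evolve a result set (dict of bounds) that maximizes fitness on the rows."""
    rng = random.Random(seed)
    features = sorted(rows[0][0])
    ranges = {f: (min(d[f] for d, _ in rows), max(d[f] for d, _ in rows)) for f in features}
    population = [mutate({}, features, ranges, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism) and refill with mutated offspring.
        population.sort(key=lambda r: fitness(r, rows), reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), features, ranges, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=lambda r: fitness(r, rows))
```

`evolve(rows)` returns a dict of bounds on the features it kept. Because a search like this can fit the training years too closely, checking the result on a later season it never saw is an essential second step.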
The result set was then used as a filter on the 2016 tournament to confirm that it is predictive; it is possible that a result set built from 2012-2015 doesn’t actually predict 2016 results. Applied to the 2016 data, our current result set selected games in which the underdog won 47% of the time. The historic average is 26% lower seed wins, and a 47% underdog win rate would occur by chance only about 3.4% of the time. That makes it unlikely that the filter’s performance is just luck.
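One way to frame “how often could this happen by chance” is a one-sided binomial test. Suppose each filtered game were an upset with the historic 26% probability. How likely is a hit rate at least as high as the one observed? A minimal sketch follows. The text does not say how many 2016 games passed the filter or which test produced the 3.4% figure, so this method is an assumption and may not reproduce that number exactly.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    n: number of games that passed the filter
    k: number of those games the lower seed won
    p: baseline probability of a lower-seed win (0.26 historically)
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

Calling `binomial_tail(n, k, 0.26)` with the actual counts gives the chance of doing at least that well by luck alone. The smaller the value, the less plausible luck is as an explanation.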
 
The last step in the process is to look at the filter criteria that have been chosen and check whether they are believable. For example, one of the criteria chosen was Defensive Efficiency Rank. Evolutionary Analysis chose a lower limit of … well, it set a lower limit; let’s just say that. This makes sense: if a lower seed’s defense is ranked far below the higher seed’s, the lower seed is unlikely to prevail. By contrast, the number of blocks per game was not chosen as a criterion. In fact, most of the more than 200 criteria were not used. A handful of around ten criteria set the filter that selects a population of games more likely to contain a lower seed win.
 
And that is one of the powerful aspects of this type of analysis: you don’t get the one key driver, or even two metrics that have a correlation. You get a whole set of filters that points to a collection of results that deviates from the “normal.”
 
Please join us as we test our result set this year. We’ll see if we get around 47%. Should be interesting!
 
If you have questions on this type of analysis or machine learning in general, please don’t hesitate to contact Gordon Summers of Cabri Group (Gordon.Summers@CabriGroup.com) or Nate Watson at CAN (nate@canworksmart.com).
**Disclaimer: Any handicapping sports odds information contained herein is for entertainment purposes only. Neither CAN nor Cabri Group condones using this information to contravene any law or statute; it’s up to you to determine whether gambling is legal in your jurisdiction. This information is not associated with nor is it endorsed by any professional or collegiate league, association or team. Machine Learning can be done by anyone, but is done best with professional guidance.
 
 
 

Predicting the upsets for the NCAA Men’s Basketball Tournament using machine learning

Contemporary Analysis (CAN) and Cabri Group have teamed up to use Machine Learning to predict the upsets in the NCAA Men’s Basketball Tournament. By demonstrating the power of ML through our results, we hope to help more people give direction to their own ML projects.
 
Machine Learning (ML) is a powerful technology, and many companies rightly guess that they need to begin leveraging it. Because there are so few successful ML practitioners and projects to learn from, there is a gap between desire and direction.
 
We will be publishing a selection of games in the 2017 NCAA Men’s Basketball Tournament. Our prediction tool identifies games where the lower seed has a better than average chance of beating the higher seed. We will predict about 16 games across various rounds of the tournament. The historical baseline for lower seeds winning is 26%. Our current model predicted 16 upsets for the 2016 tournament. We were correct in 7 of them (47%). Because of the odds, that gave a simulated gambler an ROI of 10%. Our target for the 2017 tournament will be to get 48% right.
 
Remember, the point of our analysis isn’t to support gambling but to prove the ability of ML. However, we will be keeping score with virtual dollars, “betting” on the lower seed to win. We don’t take the odds into consideration when making our picks; we only use them to score our results.
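Scoring picks with virtual dollars comes down to the payout on each bet. Here is a minimal sketch. It assumes a flat $100 virtual stake on every pick and American moneyline odds: +250 returns $250 profit on a $100 winning bet, and -150 returns about $66.67. The post does not specify the stake size or the odds format, so both are assumptions.

```python
def bet_profit(stake: float, american_odds: int, won: bool) -> float:
    """Net profit of a single bet placed at American moneyline odds."""
    if not won:
        return -stake
    if american_odds > 0:
        return stake * american_odds / 100  # underdog: +300 wins $300 per $100 staked
    return stake * 100 / -american_odds  # favorite: -150 risks $150 to win $100


def roi(bets, stake: float = 100.0) -> float:
    """Return on investment for flat-stake bets given (american_odds, won) pairs."""
    total_profit = sum(bet_profit(stake, odds, won) for odds, won in bets)
    return total_profit / (stake * len(bets))


# Hypothetical picks: one underdog win at +300, two losses.
print(round(roi([(300, True), (200, False), (150, False)]), 3))  # 0.333
```

This shows why a hit rate below 50% can still be profitable: underdog wins pay more than the stake.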
 
We will publish our first games on Wednesday, March 15, after the First Four games are played. We won’t have any selections for the First Four, because those games are played between teams with identical seeds. Prior to each round, we will publish all of the games in which our tool thinks the lower seed has the best chance of winning. We’ll also publish weekly recaps with comments on how well our predictions are doing.
 
Understand that the technique that finds a group of winners (or losers) in NCAA data can be used on any metric. Our goal is to open people’s minds to the possibilities of leveraging Machine Learning in their businesses. If we can predict something as seemingly complex as a basketball tournament (something that has never been correctly predicted), imagine what we could do with the data that drives your decisions.
 
If you have questions on this type of analysis or machine learning in general, please don’t hesitate to contact Gordon Summers of Cabri Group (Gordon.Summers@CabriGroup.com) or Nate Watson at CAN (nate@canworksmart.com).
 
Those interested in the detailed description of our analysis methodology can read the technical version of the article found here.

Re-Blog: Why Visualizing Data is Important

The Tableau data visualization “Top 100 Songs of All Time Lyrics,” found on Tableau Public, lets you hover over each square to see which words were used in which lyrics. Tableau is software that converts data into graphs, charts, and images.

 
CAN’s data scientists love sorting through piles and piles of spreadsheets and numerical data, but it’s not for everyone. There are some amazing tools that convert raw data into visualizations. They help bring out the story of data, so everyone can understand it.
Here’s an old favorite from our blog about the importance of visualization. It’s a way for us at CAN to gear up for the next round of Tableau students at the Omaha Data Science Academy!
We are still accepting applicants for the third round of the Oma-DSA! You can apply here. We accept applications until three weeks before the start date, and start a waiting list after the spots are filled. 
 

Why Visualizing Data is Important

Now Accepting Applications for Oma-DSA

We are now accepting applications for the June 2017 cohort of the Omaha Data Science Academy!
Apply at Interface Web School’s website.
Are you interested in predictive analytics? Are you applying for jobs involving machine learning? Would you like to learn how to design and create algorithms? If so, the Oma-DSA may be a perfect fit. The Oma-DSA is designed for people who want to add to their data science knowledge and gain marketable skills. We use hands-on teaching from leading data scientists in the Omaha area to craft courses that will boost your knowledge exponentially. More details at can2013.wpengine.com

A Celebration of CAN's Best Ideas

Contemporary Analysis 2017 ebook coming soon!
In celebration of CAN’s forthcoming 10th birthday, we’ve decided to bring out the best ideas from our blog and combine them into an educational ebook. The posts were originally written by some of CAN’s most notable alumni, many of whom have gone on to start their own businesses in data science.
At CAN, we are all about education. We believe in educating our employees through hands-on experience and through courses offered at the Data Science Academy. We also believe in educating our clients about who we are and what we do. We want our clients to understand the systems we put in place. We’re proud of our work. What follows is a six-step model for implementing data science, taken from our new ebook to be released soon. We hope you enjoy it and learn something too.
 
CAN’s Best Practices for Implementing Data Science

  1. Define a company’s mission, vision, and values. We want to know how they do business: what values they have that are unique and permanent even when the strategy changes. This understanding sets the priorities and filters that guide future discussions.
  2. Define a company’s goals. Goals have clear beginnings and ends and typically are accomplished in less than a year. Goals should be in alignment with the company’s vision for the future, and should be accomplished in a way that adheres to the company’s values.
  3. Define the business question to be answered. The business question is about business process improvement, and should not involve technology or research questions. When answered, a business question should have a noticeable impact on at least one of the three parts of a business: sales, operations, and administrative support.
  4. Determine what resources are available. This includes political approval, availability of necessary data, and determining research methodology.
  5. Determine how the models will be implemented. Formal Reports help our clients understand the nuances and details of our research. Marketing Summaries provide our clients with colorful, easy-to-understand summaries of our research. Visual Dashboards help our clients quickly get the up-to-date information they need to run their operations. Workflow Integration gives our clients the ability to use our research to shape the activities and operations of large numbers of people through the systems they already use.
  6. Evaluate the model. Does the model answer the intended business question? Does the model produce results that reflect reality? Does the model produce the expected results?

 
Keep your eyes out for our new ebook. For more information and great ideas, contact Nate Watson (nate@canworksmart.com) or Bridget Lillethorup (bridget@canworksmart.com). 

Predictive Analytics: Why should you use it?

We get asked quite frequently: Why should my company invest in predictive analytics? Why even bother? What can it do for us?

Great questions. Predictive analytics, or predictive analysis, used to be a competitive advantage. All through the first part of the 2010s, companies used data science, predictive analytics, and machine learning to take their business intelligence (knowing what is happening inside the company right now) and turn it into knowledge of what is going to happen in the future, so they could plan for it before it happened. We call this moving up the data hierarchy. But somewhere in the middle of 2019, we saw a switch. As CAN took companies through our process for data-driven decision making, we realized they weren’t using it for competitive advantage anymore; they were using it to stay relevant.

Companies now are required to do more with less. They are required to stay relevant to their customers. They are required to know who their customers are and what they want, all before the customer does. Data intelligence is now so common in our lives that companies have to implement predictive analytics just to keep pace with (not get ahead of) their customers.

Example: With technology developing so quickly, new ways to implement marketing strategies and more effectively reach consumers are popping up all the time. Predictive analytics is one such technique. Praised for its ability to inform companies of future trends and reveal important information, predictive analytics is growing in popularity, with 87 percent of B2B marketing leaders saying they had already implemented or were planning to implement predictive analytics in the coming 12 months. 

So what is it? What is predictive analytics, and how do you use it?

What Is Predictive Analytics?

 
Before fleshing out its benefits, it’s probably best to first explain what predictive analytics is. Predictive analytics is a process for collecting and analyzing current and historical data, using Business Intelligence, Machine Learning, and potentially AI, to make predictions about future outcomes.

How Can Predictive Analytics Benefit Marketing and Sales?

 

  1. More Efficient Customer Acquisition

By providing your sales team with specific data, predictive analytics can help them acquire new customers and keep existing ones more efficiently and at lower cost. What journey do customers take to purchase a product? What advertising do they respond to? What do they enjoy most about your product or service? All of these questions can be answered by analyzing previous data and drawing conclusions about future activity. This information can then be used to determine which customers to reach out to and how best to appeal to them, saving time and money.

  2. Determine Up-sell Opportunities

Predictive analytics also assists in drawing conclusions about other aspects of your customers’ buying behavior. Through analytics, brands can better understand what their customers’ needs are and what exactly they’re looking for. This can then be used to tailor the sales and marketing strategy to specific customers.
For example, if you are a fashion brand and have customers who have just bought shoes, it would be inefficient and wasteful to send them an advertisement for a new shoe promotion. Instead, it would be better to send it to customers who are in need of footwear, to maximize profit.
 

  3. Optimize Marketing Strategy

Not only can predictive analytics benefit brands by helping them find information on customers, it can also help with regard to the market environment. You can learn what time of year spending peaks, how much people are spending, and what they’re spending their money on. This information can help you execute marketing strategies successfully by ensuring you are targeting the right people at the right time.
Or you can figure out where to score the most candy on Halloween as we did back in 2013 when we invented a dashboard to help trick-or-treaters.  See, predictive analytics can be fun too.
 
Predictive analytics is an increasingly popular method for brands to more effectively initiate sales and marketing strategies. By providing detailed information about market trends and buying behavior, brands can cut costs, boost profit and increase overall efficiency.
 

Why People Adopt Technology

CAN’s product roadmap is driven by:

  • Who adopts new technology
  • Why they adopt new technology
  • The hurdles they encounter

 
The adoption of new technology starts with play. Play is inquisitive and experimental. Try something, if you don’t like it: no worries, on to the next thing. The goal is to have a good time.
 
Work is about producing. Doing what you say you will, when you say. Work is about being dependable, known, dedicated. There is nothing inquisitive or experimental with work. Work is about doing what is known to produce value.
 
Value vs. Known
The most common fallacy is that value drives business adoption. It doesn’t, so don’t act like it does. Known is more important than value, especially if value requires change. Anyone who is currently comfortable will take a bird in the hand over venturing for two in the bush.
 
The Goal.
How can CAN build a product that allows play but, once familiar, transforms work? This is one of the reasons that Yammer, a service similar to Twitter, has been able to gain substantial traction among professionals for sharing knowledge.

  • Open Source: In the technology community, your credibility comes from what you have built. To stay current, developers have to build outside of work. They use free open source software that also offers enterprise support, and once they have gained familiarity, it often ends up in their work lives. Play becomes work.
    • Examples: Hadoop, R, Ruby on Rails, AngularJS, Backbone.js.

 

  • That Next-Level: Take technology that people love to use in their personal lives to the next level. No new technology, just technology that is work-ready. Yammer is a work-ready version of Twitter and Facebook. Microsoft Lync is a work-ready version of Microsoft Skype. Windows is a work-ready version of Mac OS X.
    • Examples: Linux, Apache, Personal Computer, Drones, 3D Printing, iPhone

 

  • Academic Bump: Professors advise students as teachers and professionals as consultants. When possible, providing software to professors (free or discounted) can spur adoption in businesses as students get jobs and consulting results are operationalized.
    • Examples: Qualtrics, SAS, SPSS

 
MVP to Maturity:
Technology tends to mature from general to specific applications.
 

  • Cutting: Flint Scrapers evolved into a specific application of using sharp edges to process materials, e.g. knives, spears, axes, and cleavers.
  • Digital Screen: A modern example is the splintering of digital screens from a lab tool into the variety of digital screens we have today: matrices of light bulbs, CRTs, plasma, LED, and liquid crystal.
  • Personal Computer: Even the PC has evolved into a spectrum of diversity. Starting with desktop PCs, and moving to servers, laptops, mobile phones, smart phones, tablets, netbooks, cloud computing, and Internet of Things.

It is impossible to fight the extropy (the trend toward order) of the technium (technology viewed as a biological kingdom). Technology will always fracture from general to specialized. CAN’s product roadmap leverages the nature of technology instead of fighting it.

The Advantage of Hiring an Hourly Data Scientist

Any person you hire for your team is an investment. You take careful steps to ensure their fit in the company. You go the extra mile to ensure their skills match the position you need to fill as closely as possible.
Most companies, unfortunately, do not understand the complications of hiring a data scientist. No two data scientists have the same skill set. And there’s not a specific “attitude” associated with data scientists. Personalities range quite drastically. Therefore, you can’t simply choose a “data scientist” and trust that he or she will fit with your company. It’s different than hiring a salesperson or HR representative.
 
So, before you invest in hiring a full time data scientist for your medium to large sized business, there are more than a few things you need to consider.
 
Contemporary Analysis (CAN) offers another option to fill your data analytics needs. We offer a full analysis of your company to determine the projects that will improve your areas of need. We lend you one of our data scientists to use hourly until the projects are over. Fewer strings attached, less money, higher ROI.
 
Let’s first explore the scenario of hiring a data scientist blindly, and see where it takes this hypothetical company.
 

Scenario 1: Hiring a data scientist full time

 
Perhaps you are the manager of a local bank. You’ve grown significantly in the past 10 years, and you know you have enough data to start analyzing trends with your clients. You’ve noticed that at least three customers drop their services every month, and you wonder if a data scientist could provide an answer to stop this trend.
 
The first step, you believe, is to hire a full-time data scientist for your team. You write up a short job description and send it out to the hiring sites.
 
Someone with a Master’s degree in data science typically won’t accept a position for less than $100,000/year. Along with $25,000 in benefits, that is quite the price tag. You may understand this and accept it begrudgingly as the only option.
 
What you may not realize, however, is that it takes about 6 months to find someone with a Master’s in data science and relevant work experience. Then it takes your bank about 3 months to train him. Finally, 3 months after he is trained, he builds software that is useful to your company.
 
That’s a year from the time you posted the job before you see any kind of ROI, and by then you’ve spent half of a year’s salary on this person.
 
The expenses for hiring and training a data scientist are immense, but there are more than just monetary issues with hiring a data scientist full time. Here is the short list of things you need to consider before you hire full time:
 

  • Not all data scientists will like your company. Just because someone is qualified on paper doesn’t mean they will fit well with your company’s community. At $125,000/year, with 6 months needed before ROI, do you want to take that risk? Personality clash is a real threat.
  • No two data scientists have the same set of skills. That’s right. “Data science” is a loose term meaning “an interdisciplinary field about scientific processes and systems to extract knowledge or insights from data in various forms.” Interdisciplinary is the key word here. Some data scientists may be trained in specific software packages; others may not have even heard of them. When you lock yourself into one person, you may be missing out on knowledge beyond that person’s education.
  • How do you know you need a data scientist? Without experts to examine your needs, are you sure the hire is worth the investment? Perhaps figuring out why your bank is dropping clients every month will only take about a month, but you’ve hired someone indefinitely. What other work will you find for him?

 
There are other options. Consider hiring an hourly data scientist through CAN.  Let’s explore the benefits below.
 

Scenario 2: Work with Contemporary Analysis

 

A Case Study from CAN

CAN has worked with several Fortune 500 companies. In one instance, one of these companies needed help creating software that could predict the failure of telecommunication huts. The loss of several huts slows service to customers, which becomes a nightmare for the finance team.
 
The scope of the project was large: 2,500 telecommunication huts over the Western United States. Over 500 of these locations were in fairly remote areas, making them hard to reach. The scope of this project may have been enough to convince the company that they needed to hire a full-time data scientist, but instead, they saw the benefits of working with CAN.
 
The result? CAN set up a weekly survey process for employees at each station, covering 12 potential problem areas.  
 
The data collected from these surveys was used to create a “survival model” for each roof. CAN set up a predictive analytics system with this Fortune 500 company over an established period of time, then made sure the system was self-sufficient and did not rely on CAN’s constant attention.
 
With work complete, the company paid CAN and CAN moved on to new projects. Always available for advice, CAN remains a tool for that company, but not at a cost of $125,000/year.

The benefits of using Contemporary Analysis to hire an hourly data scientist

 
To save time and money, and increase productivity, consider these benefits of using CAN for your data science needs.
 

  • With CAN, you can hire a data scientist that fits the skill level needed to attain your goals. If you need someone trained in Tableau, then you get someone with that training. With CAN, there is no risk of a learning curve with your hire.
  • Hiring a part time data scientist means you’re not locked into one set of skills. In a similar vein, when your needs change, instead of paying to train your full time hire, CAN simply assigns a new person to the job. This gives you a bigger skill set advantage.
  • No 6 month hiring process. No training. No wallet-bursting budget. No issues with HR. CAN can write up a proposal for your needs in as little as two weeks, and work starts immediately upon signing the contract. Work starts on Monday, not 6 months from Monday.
  • You will see an ROI in 30 days or less. We at CAN work fast and effectively. It’s our job. Once our systems are in place, you see immediate results. Before you would even have a job description for a full-time position posted on your site, CAN will have created analytic software.
  • You don’t need to worry about keeping your full time data scientist busy. Once the project is over, it’s over. You don’t need to worry about filling someone’s time or wasting money on little work.
  • Your data scientist won’t quit! A part-time data scientist will do his work and fulfill his duties. You won’t have any surprise two-week notices in your office.

 
Perhaps you believe your data science needs are great, and you still believe full time is the way to go. Before you plummet down this expensive road, give us a call. If anything, we can give you an analysis of how great your needs are, and you can make a decision from there.
 
For more success stories, see CAN’s website for more case studies at http://can2013.wpengine.com/case-study/.
