Steps in going from BI to Predictive to AI

Data Hierarchy

Machine Learning, Business Intelligence, and Artificial Intelligence are buzzwords that have been thrown around at planning sessions a lot these last few years. They have real meanings that most people don’t understand; instead, people use them to mean “more sophisticated at using data to make decisions.” While that is true as far as it goes, there is a right way and a very wrong way to lead your company down the path of data-driven decision making. After 10+ years of helping companies understand what that path is, we wanted to help you, the reader, understand the order of the steps and the real definitions of the buzzwords. This way, you can not only be educated, but also give your company the direction it needs to move up the Data Hierarchy.

Data Science is an integral part of everyday life at this point, and you may not even know it.  As a society, we’re generating more data than ever before. Smart businesses are tapping into that data to do things that were previously unheard of.

Take Facebook, for example.  20 years ago Facebook didn’t exist; now people are addicted to it and seemingly can’t live without it.  But even so, people are still wary of the dreaded “Facebook algorithm” that cuts 50% of the posts you might want to see.  That algorithm is data science at work.

That’s right, you’ve generated enough data that Facebook wrote some code to cut 50% of your friends out of your life.  You didn’t interact with them enough, they didn’t post enough; there are hundreds of reasons why that system decides your college roommate’s buddy from down the hall with the cat doesn’t need to be at the top of your feed.  It also looks at what you read on a regular basis and then tries to predict what you would want to read next.

So, to help people truly understand what we do as a company (and, let’s be honest, to help you hire us), we put together a series on the sophistication of data usage as businesses mature that we call the Business Data Hierarchy.  The goal of this series is to help people and companies understand where they are now, and where they could go with data-driven decision making.

We’ve written the series to be informative and insightful, with a splash of humor mixed in to keep you awake through the whole process.  If you like it or if you feel like someone needs to read this…we ask that you share the info or…better yet…get them in touch with us and we’ll bring the show to you!  The pyrotechnic guys tell us we’ll need a 25’ ceiling for the fire and lasers…Hey, it’s a good show.

…this will also be the longest post of the entire series, don’t worry!

When you look at data, and what it can do for you and your company, there are six different levels of the Data Hierarchy, plus a seventh that, for now, lives only in the movies.  It’s a hierarchy because each level depends on the ones below it.

These levels are important to understand because jumping from one to another without a long-term goal can be cost-prohibitive.  This is even more devastating when you finally get your executive team to believe in the power of data, and then the execution breaks the bank.

“Skipping” leads to “Skippers”

There are consultants with lovely summer and winter homes who paid for them by “skipping” to the end and then back-billing/building the solutions.

To insulate against catastrophic failure of a data-driven initiative, we at Contemporary Analysis (CAN) have created a Data Hierarchy to help companies understand where they are and, more importantly, where they are going. This understanding helps drive the strategy and vision needed to be successful.  These levels are:

  1. Reporting:  Tracking and “What happened?”
  2. Business Intelligence:  “What just happened?”
  3. Descriptive Data:  “Why did that happen?”
  4. Predictive Data:  “What is going to happen next?”
  5. Prescriptive Data:  “What should we do to make it happen?”
  6. Artificial Intelligence:  “Automated recommendations”
  7. Omnipotent AI (Skynet): “Automated Doing of its own recommendations” a.k.a. “Terminator Movies”
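To make the first few levels more concrete, here is a minimal Python sketch over made-up monthly sales for two hypothetical product lines. The data and function names are illustrative assumptions, not CAN’s tooling. Reporting answers “What happened?” with a total, a simplified descriptive step points to which product line drove the latest change, and a straight-line least-squares trend stands in for the predictive level. Real predictive work would use far richer models, and the prescriptive and AI levels are out of scope here.

```python
from statistics import mean

# Hypothetical monthly sales (in $1,000s) for two made-up product lines.
SALES = {
    "widgets": [100, 105, 110, 115],
    "gadgets": [50, 50, 48, 70],
}


def monthly_totals(sales):
    """Add up every product line, month by month."""
    return [sum(month) for month in zip(*sales.values())]


def report_latest_total(sales):
    """Level 1, Reporting: what happened last month?"""
    return monthly_totals(sales)[-1]


def biggest_driver(sales):
    """Level 3, Descriptive (simplified): which product line accounts for
    the largest month-over-month change?"""
    return max(sales, key=lambda name: abs(sales[name][-1] - sales[name][-2]))


def predict_next_total(sales):
    """Level 4, Predictive (simplified): fit a least-squares trend line to
    the monthly totals and extrapolate one month ahead."""
    totals = monthly_totals(sales)
    xs = list(range(len(totals)))
    x_bar, y_bar = mean(xs), mean(totals)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, totals)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * len(totals)


print(report_latest_total(SALES))           # 185
print(biggest_driver(SALES))                # gadgets
print(round(predict_next_total(SALES), 1))  # 189.0
```

Each function only works because the one before it has something to build on: you can’t explain or forecast numbers you never tracked, which is the whole point of the hierarchy.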

Every business is trying to move “forward.”  If you work for a company whose response is anything but “forward” or “more,” start polishing up your resume; you’ll need it sooner rather than later.

Most companies are so focused on today’s business they don’t know what the path to the future looks like.  

Imagine you tell a CEO you’re going to walk a mile to get another 1 million in sales.  Most CEOs would look at the distance and agree that a short walk is worth the time and effort to get the additional revenue.  

The sprint to 1 million

You and your team(s) work feverishly to get from point A to point B as quickly as possible.  You cross the finish line and there’s your 1 million. The CEO checks the box and there it is, project complete.

Now imagine you told a CEO you’re going to get 20 million in sales.  After the confused look and possible laughter subside, you tell them how.  Instead of a mile, you have to walk 15 miles. But you’re not going to do them all in 1 year.  Instead, you’re going to walk that distance over 5-6 years. You’ll measure success with each mile you pass, and each mile will result in ROI for the company.


You also let them know that you can cover the ground when and how you want to.  If one mile takes too much time and effort to fit in this year, you postpone it to the next.  If, as you’re walking, a business need changes and you need to walk in a completely different direction, you can.  The steps remain the same, but the road you use to get there is slightly different.

Understanding the long-term goal gives you and your team(s) the ability to work smarter, not harder.  You’re building toward the vision at every turn, so you have little to no wasted effort. And, because you’re building over time, you can staff accordingly for each mile and access the right talent at the right time.

Part of CAN’s role is being that “Data Visionary” that helps you see over the horizon with possibilities.  The hardest part of this whole process is getting the decision makers in an organization to embrace the culture of change.

“We’ve done it this way for __X__ years and it works just fine” is becoming the leading indicator of a dying business. If you’re 40 years old, the technology available today wasn’t even conceptualized when you were in grade school.  “We’ve done it this way for 50 years…” means you’re already behind the curve.

The posts that follow will walk you through each level of the Business Data Hierarchy concept.  We’ll be sure to include examples that are relatable. The subject matter can be a bit dry, so we’ll also make sure we include some humor along the way to keep things lively.  We’re a Data Science Consulting firm…not monsters, after all.

At any point, feel free to reach out and let us know how we can help you through these steps:

Reporting

Business Intelligence

Descriptive Analytics

Predictive Analytics

Prescriptive Analytics

Machine Learning

Artificial Intelligence


Python or R – CAN’s Advice on How to Choose

The age-old Python or R debate always rages here at CAN. While we have a pretty impressive staff of data scientists who all have their individual quirks (Some like to run in their spare time, some bird watch, some of them binge-watch obscure sci-fi), they have something in common. They work hard, around the clock if they have to, to accomplish projects, and put their best foot forward for clients.

But they do differ in one big way: some use Python and some use R.
So, today, we let them debate Python v. R. Which one is for you?

If you’re completely new to the computer programming discussion

Webopedia defines a computer programming language as “A vocabulary and set of grammatical rules for instructing a computer to perform specific tasks.” How does one talk to computers? In code. It gets tricky, however, because there are a lot of different languages that computers can understand. There aren’t just 10, 20, or 30 computer languages; there are hundreds and hundreds. You can browse a full list here. Python and R are just two of the most popular for data science.

For some additional help, we’ve compiled a list of terms that will help you understand the background of this topic (inspired by LinkedIn).

Programmatic thinking. It’s exactly what it sounds like: a way of thinking that you have to turn on when you learn computer programming. It means seeing a large problem as a series of smaller steps. It also requires being able to translate ideas into code that computers understand.
Compiled and interpreted languages. Compiled languages require the user to compile and build code before it can run. Interpreted languages are run directly by an interpreter, without a separate build step. Both Python and R are interpreted languages.
API. API stands for application programming interface. Basically, it’s the set of functions and rules that a piece of software publishes so that other programs can use its features.
Pseudocode. It’s like code, but not. It’s an informal, plain-language outline of a program that helps programmers plan before they dig into bigger coding tasks.
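To illustrate programmatic thinking and pseudocode together, here is a small, made-up task (averaging only the valid entries in a messy list of sales figures) broken into steps. The pseudocode outline lives in the comments, and each step is followed by the Python that carries it out. The function name and sample values are hypothetical.

```python
def average_valid_sales(raw_values):
    # Step 1: keep only entries that can be read as numbers
    cleaned = []
    for value in raw_values:
        try:
            cleaned.append(float(value))
        except (TypeError, ValueError):
            # Step 1a: skip blanks, text such as "n/a", and None
            continue
    # Step 2: if nothing valid is left, there is no average to report
    if not cleaned:
        return None
    # Step 3: average what remains
    return sum(cleaned) / len(cleaned)


print(average_valid_sales(["100", 250, "n/a", None, "50.5", ""]))  # 133.5
```

Because Python is interpreted, you can paste this into the Python interpreter and run it immediately; there is no separate compile-and-build step.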
Armed with a few definitions, let’s jump into the debate.

Python v. R: Where to Start

First, we’re going to hit you with the hard truth. In order to succeed in the data science world, you need to be familiar with both languages (or at least good at one and familiar with the other). Particularly in Omaha, where CAN is headquartered and data analyst jobs are highly competitive, knowledge of both languages gives you a leg up on the competition. In fact, we have training classes through the Omaha Data Science Academy that teach both.
But that’s not what you want to hear, we know that. So we’re still going to break the two down and tear them apart in comparison.

Both Python and R are good at . . .

Python and R are both free to download, and the learning curve is about the same once you’ve already mastered some basic programming skills. They’re both impressive to master, so in that way you can’t go wrong. No one will shame you for mastering one and not the other.

Python Positives

Python is known for data munging, data wrangling, web scraping, web app building, and data engineering.
Let’s say you’re tackling a project with a lot of disparate data. Maybe you’re collecting sales data from the past 5 years for a company to help them predict new trends. The problem is that the company has had several turns in management, and that data is stored in multiple locations. Python would be more helpful in this situation. It excels at gathering data from many databases and combining it into one dataset.
If you already know Java or C, Python is going to come more naturally to you, because many of the same concepts carry over.
It is an object-oriented programming language, so it’s easy to write large-scale, robust code. And some people say there is data showing that more employers are looking for people proficient in Python than in other languages.
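As a hedged illustration of the “data stored in multiple locations” scenario above, the sketch below uses only Python’s standard library. Two in-memory SQLite databases stand in for an older and a newer sales system, and a small CSV export stands in for a spreadsheet; every table name, column name, and figure is a hypothetical assumption. Each source stores dates and amounts differently, and the script normalizes them all to one shape, (ISO date, amount in dollars), before merging.

```python
import csv
import io
import sqlite3
from datetime import datetime


def make_legacy_db():
    """Hypothetical older system: amounts stored in cents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_date TEXT, amount_cents INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("2016-03-01", 125000), ("2017-07-15", 98050)],
    )
    return conn


def make_current_db():
    """Hypothetical newer system: different table and column names, dollars."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (sold_on TEXT, total_usd REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("2019-01-20", 1500.0), ("2020-11-02", 2210.25)],
    )
    return conn


# Hypothetical spreadsheet export with US-style dates and formatted currency.
SPREADSHEET_CSV = 'Date,Revenue\n05/30/2018,"$1,020.00"\n'


def combine_sources(legacy, current, csv_text):
    """Normalize every source to (ISO date, dollars) and merge, sorted by date."""
    rows = []
    for order_date, cents in legacy.execute("SELECT order_date, amount_cents FROM orders"):
        rows.append((order_date, cents / 100))
    for sold_on, usd in current.execute("SELECT sold_on, total_usd FROM sales"):
        rows.append((sold_on, usd))
    for record in csv.DictReader(io.StringIO(csv_text)):
        iso_date = datetime.strptime(record["Date"], "%m/%d/%Y").date().isoformat()
        amount = float(record["Revenue"].replace("$", "").replace(",", ""))
        rows.append((iso_date, amount))
    return sorted(rows)


combined = combine_sources(make_legacy_db(), make_current_db(), SPREADSHEET_CSV)
for row in combined:
    print(row)
```

In a real project the connections would point at actual databases and files, and a library such as pandas is a common choice for this kind of merge; the standard library is used here only to keep the sketch self-contained.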

Positives of R

R has better visualization tools than Python. It also has deep roots in the statistics community, which means there is a large ecosystem of online support and add-on packages. There are over 5,000 packages available on the internet that run alongside R to boost its capabilities.
R is known for being great at statistical modeling, graphing, and converting math to code.
Perhaps you’re working on a project for a company that has a nice, neat database. The problem is that it’s difficult for most people to look at a bunch of numbers and understand trends. R is the most helpful in these situations, as it can take data and turn it into graphs and pictures that others can understand.

Let’s talk to CAN

In an attempt to settle this debate, we’ve brought in some professional opinions.
Matt Hoover, Director of Data Visualization, Flywheel: Matt sees R used as a more efficient math language, emphasis on the word “math.” It can achieve in one line of code what Python needs several lines to accomplish. R’s specialty is research, statistics, and data analysis, so it’s more efficient on the stats side. He continues, “Python is way more flexible as a language overall and can be used to do a wider range of things.” Matt sees R used more in learning settings than in the field, and sees Python used for more high-level data science.
Essentially, R is easier to learn and better on the math/statistics side, but overall Python has more capabilities.
Gordon Summers, Senior Data Scientist, CAN: Gordon’s advice is a bit more far-reaching. He says, “The hardest thing about picking between Python and R isn’t choosing which one to start learning, it is in choosing when it is time to stop learning it”. Basically, Gordon’s advice is to not focus so much on which language to master, but instead realize that something new could come along at any time, so don’t invest too much time in one.

In summation

If you work consistently with clean data, and your goal is to dissect the data and create visualizations from it, go with R. If you have messy data that you need to “wrangle,” Python is more helpful.
Still stuck? Answer the following questions to help you navigate the Python v. R world.

  • What are your teammates using? Maybe you just got a job in data science and can’t decide which one to learn. Look around: what are your friends and fellow employees using? Are they successful in their work?
  • What are the data trends in your job market? It wouldn’t be inappropriate to call up a company that just posted a data science job and ask which language they prefer. Get a feel for the market, and decide from there.
  • What kind of data are you working with? Is the data messy and scattered, so it needs to be gathered? Python is your answer. Is your data clean and ready to be visualized? Go with R.

You can’t go wrong

Neither Python nor R is perfect. Both have drawbacks, but there are libraries that help alleviate those pains. Examples can be found at https://elitedatascience.com/r-vs-python-for-data-science.
To summarize more thoughts from Gordon Summers, the IT world is changing. He says, “To do development is to use the application and to use the application is to do development. There is no IT person and no business user. The person is both a developer and a business user. One of the reasons that larger organizations have struggled to embrace Python and R is that frequently there is an organizational barrier between IT and Business.” When you enter the programming, data science, or IT world, be ready to be flexible. Businesses are still struggling to figure out where IT fits in their company. The best advice is to be adaptable and to understand where you are going so you can understand the best way to get there.

Oh, and not to complicate the entire argument, but about the time we get the R v. Python debate settled, Scala might just come from the back of the pack to win the whole thing. After all, Twitter is written in part in Scala, and Apache Spark, the big data engine widely used alongside Hadoop, is written in Scala.  Social media speed and big data prowess? Perhaps this dark horse isn’t a long shot after all.
