## Tadd and Jefferson go Mining for Data in Wyoming

CAN is helping one of our clients improve their asset management strategy, by building predictive models to determine when heavy equipment is most likely to fail.
CAN’s asset management models will allow our client to save hundreds of thousands of dollars each year by converting emergency repairs into scheduled maintenance. Imagine the money and time saved when a repair can be made preemptively in several hours, instead of the weeks or months it takes to make repairs in the field.
While we could have developed the model from our offices in the Old Market, we needed to make sure that we understood the conditions on the ground. Jefferson and Tadd decided to take a trip to Wyoming and spend a week learning about the machines and interviewing the experts that use the equipment on a daily basis.
Their goal was to make sure that we had political support from the people that were going to use our models, and that we could build balanced models that combine data, theory and math.  The following are some of the photos from their trip.  I hope you enjoy.

## Using Mean Absolute Error for Forecast Accuracy

CAN uses mean absolute error to help clients who want to determine the accuracy of industry forecasts. They want to know whether they can trust these forecasts, and they want recommendations on how to apply them to improve their strategic planning process. This post is about how CAN assesses the accuracy of industry forecasts when we don’t have access to the original model used to produce the forecast.

First, without access to the original model, the only way we can evaluate an industry forecast’s accuracy is by comparing the forecast to the actual economic activity. This is a backward-looking evaluation, and unfortunately it does not provide insight into the accuracy of the forecast in the future, which there is no way to test. Thus it is important to understand that we have to assume that a forecast will be as accurate as it has been in the past, and that the future accuracy of a forecast cannot be guaranteed.

As consumers of industry forecasts, we can test their accuracy over time by comparing the forecasted value to the actual value by calculating three different measures. The simplest measure of forecast accuracy is called Mean Absolute Error (MAE). MAE is simply, as the name suggests, the mean of the absolute errors. The absolute error is the absolute value of the difference between the forecasted value and the actual value. MAE tells us how big of an error we can expect from the forecast on average.
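As a minimal sketch of the calculation described above, MAE can be computed in a few lines of Python. The function name and the sample values are made up for illustration and do not come from any CAN report.

```python
def mean_absolute_error(forecast, actual):
    """Return the mean of |forecast - actual| over paired observations."""
    if len(forecast) != len(actual) or not forecast:
        raise ValueError("need two non-empty series of equal length")
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(forecast)


# Hypothetical forecasted and actual values, for illustration only.
forecast = [102.0, 98.0, 105.0, 110.0]
actual = [100.0, 101.0, 103.0, 104.0]

# Absolute errors are 2, 3, 2 and 6, so the mean is 13 / 4.
mae = mean_absolute_error(forecast, actual)
print(mae)
```

Because MAE is expressed in the same units as the series itself, it is easy to explain to a non-technical audience.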

One problem with the MAE is that the relative size of the error is not always obvious. Sometimes it is hard to tell a big error from a small error. To deal with this problem, we can find the mean absolute error in percentage terms. Mean Absolute Percentage Error (MAPE) allows us to compare forecasts of different series in different scales. For example, we could compare the accuracy of a forecast of the DJIA with a forecast of the S&P 500, even though these indexes are at different levels.
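The sketch below illustrates why percentage terms help when two series sit at different levels. The numbers are hypothetical stand-ins for a high-level index and a low-level index, not real DJIA or S&P 500 data. Note that MAPE is undefined when an actual value is zero, so the function rejects that case.

```python
def mean_absolute_percentage_error(forecast, actual):
    """Return the mean of |forecast - actual| / |actual|, as a percentage."""
    if len(forecast) != len(actual) or not forecast:
        raise ValueError("need two non-empty series of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((f - a) / a) for f, a in zip(forecast, actual)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical index levels, for illustration only.
high_forecast = [12100.0, 11900.0, 12600.0]
high_actual = [12000.0, 12000.0, 12500.0]
low_forecast = [1290.0, 1270.0, 1340.0]
low_actual = [1280.0, 1280.0, 1330.0]

# Every absolute error is 100 points in the first series and 10 points in the
# second, yet in percentage terms the two forecasts are roughly comparable.
mape_high = mean_absolute_percentage_error(high_forecast, high_actual)
mape_low = mean_absolute_percentage_error(low_forecast, low_actual)
print(round(mape_high, 3), round(mape_low, 3))
```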

Since both of these methods are based on the mean error, they may understate the impact of big, but infrequent, errors. If we focus too much on the mean, we will be caught off guard by the infrequent big error. To adjust for large, rare errors, we calculate the Root Mean Square Error (RMSE). By squaring the errors before we calculate their mean and then taking the square root of the mean, we arrive at a measure of the size of the error that gives more weight to the large but infrequent errors than the mean does. We can also compare RMSE and MAE to determine whether the forecast contains large but infrequent errors. The larger the difference between RMSE and MAE, the more inconsistent the error size. The following is an example from a CAN report:

While these methods have their limitations, they are simple tools for evaluating forecast accuracy that can be used without knowing anything about the forecast except the past values of a forecast.
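To make the RMSE and MAE comparison concrete, here is a small Python sketch. The two error series are invented for illustration: one has steady errors, the other has a single large miss. Both have the same MAE, but only the second shows a gap between RMSE and MAE.

```python
import math


def mae(errors):
    """Mean of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Square the errors, average them, then take the square root."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical forecast errors (forecast minus actual), for illustration only.
steady_errors = [2.0, -2.0, 2.0, -2.0]  # consistently small misses
spiky_errors = [0.0, 0.0, 0.0, 8.0]     # usually exact, one big miss

for name, errors in [("steady", steady_errors), ("spiky", spiky_errors)]:
    gap = rmse(errors) - mae(errors)
    print(name, "MAE:", mae(errors), "RMSE:", rmse(errors), "gap:", gap)
```

When every error is the same size, RMSE equals MAE. The gap widens as the errors become more uneven, which is the inconsistency signal described above.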

Finally, even if you know the accuracy of the forecast you should be mindful of the assumption we discussed at the beginning of the post: just because a forecast has been accurate in the past does not mean it will be accurate in the future.  Professional forecasters update their methods to try to correct for past errors.  However, these corrections may make the forecast less accurate. Also, there is always the possibility of an event occurring that the model producing the forecast cannot anticipate, a black swan event. When this happens, you don’t know how big the error will be. Errors associated with these events are not typical errors, which is what RMSE, MAPE, and MAE try to measure. So, while forecast accuracy can tell us a lot about the past, remember these limitations when using forecasts to predict the future.

## Why I became a Data Scientist at Contemporary Analysis

My name is Branden Collingsworth. I interned at Contemporary Analysis this summer, and joined the team full-time on January 2nd, 2012 as a data scientist. As a data scientist I use tools from econometrics, statistics, operations research, and data mining to solve our clients’ business problems.
Why did I decide to work at Contemporary Analysis?
While working on my undergraduate degree in economics and my MBA, I really enjoyed learning about the tools available for the kind of analysis we do here at CAN. While data science is a growing field, there are relatively few data science companies. More importantly, many of CAN’s competitors don’t have the same emphasis on business that we have. CAN approaches clients to help improve their business, not just to show off fancy statistical techniques. That business-centered approach is really refreshing in an industry that is easily bogged down in esoteric discussions of complex methodology. I’m passionate about the focus on practical results that can be easily implemented to make a big impact.
What do I like most about working at CAN?
Because our clients represent a wide range of industries and sizes, every project is unique. I am really excited by the opportunity to learn something new about our clients. When we look at a data set and discover new knowledge, like which customers are most loyal, what causes employee turnover, how economic forces outside the business influence revenue, or where the best physical location for a business is, we are learning something new about our clients’ businesses, something that is valuable and has real business consequences. We can have a big impact on our clients’ businesses, which makes for very satisfying work.
What do I like most about being a data scientist?
Data science is a really interesting and challenging field. Because it’s such a new field, there is a lot to learn, and the methods are changing and evolving. New techniques are becoming more and more relevant all of the time. Working in this field requires knowledge of and passion for a wide range of topics: psychology and human behavior, economics, computer science, marketing, statistics, operations research, decision science, and even linguistics. I love the challenges that come with working on these kinds of problems. There is never a dull day.
I think as more businesses begin to understand what we can do for them, we’ll get more and more interesting projects. I like to see people make informed and well thought out decisions, and that is what CAN is all about. I think we are on the vanguard of a big change in the way people think about making business decisions. Now that the data is increasingly available people can either use it or ignore it. Those who put information to its best use will have a significant advantage over everyone else. It’s inevitable, and I want to be a part of it.
I am excited to join the Contemporary Analysis team and to help CAN’s clients work smart. You might also enjoy “Predictive Analytics and the Evolution of Business Intelligence” and “The Future Belongs to Data Scientists”.

## How to use FRED's Add-in to Quickly get Economic Data

Our data scientists download hundreds of datasets from the phenomenal database of economic research known as FRED.  FRED is maintained by the Federal Reserve Bank of St. Louis and provides open access to economic indicators, employment variables and business trends.  It’s also really useful for getting awesome external variables to use in all your econometric modeling projects.
In this brief tutorial, you’ll learn how to use the FRED add-in to download specific economic time series, adjust the frequency of aggregation and time range, and build a simple graph.
To start, let’s get two time series regarding foreign direct investment and domestic unemployment within the United States.  First, let’s go to the “Data Search” function of the FRED add-in and look for Foreign Direct Investment, then add the series ID.  We’ll also find and add unemployment rate to our sheet.  As long as the A1 cell is selected, the FRED add-in automatically spaces the time series.

Okay, now we have the series IDs for FDI and unemployment in row 1. Row 2 contains the type of data manipulation, row 3 shows the frequency of aggregation and row 4 shows the first date of the series.  Let’s make sure the A1 cell is selected and click “Get FRED Data” to populate the time series we selected.

The data has been populated, but now you’ll notice that the frequency of aggregation and start date do not match.  Before continuing, we’ll want to standardize these values, and the plug-in makes it very easy to adjust both frequency of aggregation and start date.
Because we cannot disaggregate data, we’ll have to change the UNRATE frequency to quarterly.  Click cell C3 and use the “Frequency Aggregation” to choose quarterly.  We’ll also need to set UNRATE to the same start date as BOPIPD, and we can do this manually by just changing the value in the cell to “1/1/1960” or any other start date you choose.
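For readers curious about what a monthly-to-quarterly aggregation does to the numbers, here is a rough Python sketch. It assumes the aggregation method is a simple average of the months in each quarter, which is only one possible choice. The function name and the monthly values are placeholders invented for illustration, not real UNRATE observations, and the add-in performs its own aggregation without any code like this.

```python
from collections import defaultdict
from datetime import date


def monthly_to_quarterly_average(observations):
    """Average monthly (date, value) pairs into quarters keyed by quarter start."""
    buckets = defaultdict(list)
    for day, value in observations:
        # Months 1-3 map to January, 4-6 to April, 7-9 to July, 10-12 to October.
        quarter_start = date(day.year, 3 * ((day.month - 1) // 3) + 1, 1)
        buckets[quarter_start].append(value)
    return {q: sum(vals) / len(vals) for q, vals in sorted(buckets.items())}


# Placeholder monthly values, for illustration only.
monthly = [
    (date(1960, 1, 1), 5.0), (date(1960, 2, 1), 5.2), (date(1960, 3, 1), 5.4),
    (date(1960, 4, 1), 5.1), (date(1960, 5, 1), 5.3), (date(1960, 6, 1), 5.5),
]

quarterly = monthly_to_quarterly_average(monthly)
for quarter_start, value in quarterly.items():
    print(quarter_start.isoformat(), round(value, 2))
```

Going the other direction, from quarterly to monthly, would require inventing values that were never observed, which is why the quarterly series sets the common frequency.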

Click “Update Data” and the series refreshes with the FRED database.  At this point, CAN typically exports the data for use in SPSS or Gretl. For those using Excel, these time series can be referenced by other sheets, and the FRED add-in is a great way to refresh your spreadsheets and analyses as new data is released.
The FRED add-in also has some built in tools for quickly graphing datasets.  Let’s try it out by going to “Build Graph” and selecting “Create Multiple Series Graph”.  Select the series IDs you would like to view, click secondary axis, and then “Build Graph”.

Whether you use it for business or pleasure, the FRED add-in is a great way to rapidly download, update and view economic time series data from the Federal Reserve.
Also, the next time you’re out and about, should you find yourself interested in the health insurance coverage rate or the privately owned housing starts in Illinois, FRED publishes an app for your iDevice so you can get some of the best economic data sources “to go”.

## Why I work at Contemporary Analysis

I get asked why in the prime of my career I went back to working for a startup company, run by young talent, in a field on the cutting edge of analytics.  It was because, for the first time, I felt like an owner had a vision I could get behind.  He wanted to be something better, do something different, and wanted me to help him create something magnificent.  I saw it as a unique opportunity because, for the first time, I found a true entrepreneur.
Most people define anyone who starts a business as an entrepreneur, which is not actually accurate. That is the definition of a business owner. Being an entrepreneur is a mindset, a way you do business and how you look at problems. I knew Grant, the CEO of Contemporary Analysis, was different when he told me he was going to turn down being bought by 2 different companies. That alone puts you in a different class. Most owners would sell if they ever got the chance. Grant has no intention of selling off or IPO’ing his juggernaut of a company. In fact, he wants to be a privately owned Fortune 500 company headquartered in Omaha, NE. Grant not only is an entrepreneur, but he has vision. And big vision at that.
People have told us that we can’t do it, and yet, we keep doing it. We will easily have over \$3 million in revenue for 2012, double that of last year and 10 times that of 3 years ago, and we already have contracts with 2 of the Fortune 500 companies in our hometown. We are hiring, building out, releasing new products, and thinking about how to do business better. I have worked at a couple of startups in my life, but this is the first that does that kind of reflection and planning. Our goal is to not only grow, but grow in a way that is sustainable and scalable by taking the time and energy to do things right the first time. We want to build our products, people, systems and processes so they last, instead of being obsolete the next year. While this requires extra time to research, tinker and think about what the future will look like, this philosophy allows CAN to grow without having to look back. I wanted to be part of a company that has that kind of philosophy.
This philosophy has appealed to me. I used to think I needed all the answers before I could recommend change. Through the books Grant has given me to read (he wants us to grow as humans and executives), I realized that I didn’t need all the answers before tackling a problem. In fact, our whole company is based on the idea that the answers already out there are no longer the best way. We have to invent new ways to stay ahead of the competition or risk being a follower. That understanding changed how I define an entrepreneur. No longer did I see it as someone who likes risk, who lives on the stress created by it, and who loves the idea that while he or she may fail, the reward for winning is enormous. Instead, I began to see an entrepreneur as someone who isn’t willing to accept things as they are as the best way. In the hands of a true entrepreneur, business is the best platform to change the world.
My philosophy also changed how I viewed risk and how I found that entrepreneurs viewed risk.
In the book Breakthrough Entrepreneurship, Harvard Business Professor Howard Stevenson defines entrepreneurship as “the pursuit of opportunity without regard to resources currently controlled.” From working with Grant, I know that this is true. He has the unique ability to take actions that require using resources that he doesn’t have and that sometimes don’t even exist. For example, he founded Contemporary Analysis in 2008, well before you could even Google “big data”, “data science” or “predictive analytics”.
Also in the same book, Jon Burgstone summarizes a true entrepreneur’s ideology:

Every time you want to make any important decision, there are two possible courses of action. You can look at the array of choices that present themselves, pick the best available option and try to make it fit. Or, you can do what the true entrepreneur does: Figure out the best conceivable option and then make it available.

This is what makes Contemporary Analysis great: our leader does not look to see what choices are available. Instead, he looks for the option that would be best for the business, and then goes and finds out how to make that option available. This takes leadership that is empowered, and that empowers everyone it works with to question why things have always been done the way they have.
For example, we go directly to buyers and talk only to people who have the need, willingness, and resources to buy what we sell. Also, we research every decision we make, from chairs and desks to computers and phone systems. We find the system that makes sense to us and then find the vendor who sells it. We certainly don’t wait for cold calls, don’t put up with bad customer service, and don’t buy from poorly trained sales people. We do things differently.
I am excited to work here.  I have no pedigree of how things have been done for years to try and get out from under.  I have the freedom to help my clients answer the questions that they have been struggling to answer for years, and help them make better decisions on how to make their businesses succeed.  At CAN I am not limited by the technology or resources available, but empowered by the mandate to help every client work smart.
One more quote from Professor Stevenson:

When you don’t have the cash to boss people around, like in a corporation, you have to create a more horizontal organization. “You hire people who want what you have and not what you don’t have.”  In other words, entrepreneurs offer their team members a larger share of a vision for a future payoff, rather than a smaller share of the meager resources at hand. Opportunity is the only real resource you have.

And this place is one of the best opportunities to make a difference in the world I have ever seen.
That’s why I work here.

## Why Customer Segmentation Will Improve Your Marketing

I have been married for almost 10 years. I have gotten good at buying gifts, even clothes. I can match blouses and jewelry, dresses and belts, shoes and jeans. My secret? I look at the mannequins. Mannequins were designed to attract your attention in a store window and to lure you into the store. Then inside, they are designed to show you some of the combinations you could make with their clothes. Essentially they are designed to get you to buy more than one thing. This is perfect for guys. All we have to do is point to the mannequin and ask a store salesperson where we can find those certain pieces of clothing. We can buy the mannequin lock, stock, and barrel and end up with a complete outfit for our brides.
This is actually what I do when I do marketing for companies. Marketing comes in two forms: marketing to your current clients and marketing to people who aren’t your current clients. I view each as a completely different problem, while most marketing companies do not. They have only one mannequin with the same clothes for every store. Children’s clothes, women’s clothes, men’s clothes, all the same mannequin.
Unfortunately most of the marketing plans I see are all the same. The commercials are different but the plan is the same. They do direct mail, email, print campaigns, radio, and if they have the budget, TV ads. Their view is that all people are potential clients, which, as those of you who follow the blog know, is not true. The problem is that they have no idea who their target audience is or how to market to them. They found a campaign that reaches a lot of people and they sell you that one. It may have different clothes, but it is the same mannequin. It is essential to use customer segmentation to improve marketing.
This is critical to understand. Gone are the days when people walk down a street with stores on it, see a mannequin, and make the decision to go inside. Now, people browse their favorite stores online and are more loyal to store brands like Gap, Old Navy, JCPenney, Younkers, and Von Maur. We have to get a new mannequin.
Mannequins, models, are going through a huge makeover right now.  In fact, if you do a job search for entry-level marketing positions, they are looking for things like statistics, modeling, and analytics in your background. What they are struggling to come to understand is the idea of marketing to one person at a time. Marketing now needs individualized messaging.
I don’t have to go too far back to find when this was still science fiction. Minority Report (2002) shows a scene where the character John Anderton is walking through the mall and cameras recognize him, search a database, and present relevant ads based on his buying history. To see a version of this now, just go to iTunes. Any song I buy will create 5 suggestions of other songs I might like, based on my buying history, songs with similar tempos or themes, or songs by the same artist. With a predictive analytics company like ours, we can do the same thing for your product.
First you have to understand something about how we view marketing. This is the key philosophy that makes us different. Marketing is using a different medium to get in front of your target audience for the sole purpose of selling to them. Selling to them. The sole purpose.  EVERYTHING else is branding. Branding is fine. We do a lot of branding. We just don’t call it marketing. That key aspect alone will forever change the way you do marketing. When you use it as a sales tool, you will no longer accept marketing with no measurement of who looks at it, how it is crafted, and where it is put. It will focus your marketing on only the people who have an actual chance to buy from you in the marketing cycle. This does not include: people who might buy, people you think need to be introduced to your product, or someone who might have a need someday. What marketing with the intent to sell does is only spend your time and money on the people who are ready to buy now.
How do we do this? We use math and econometrics to understand the buying process. What causes people to think of buying a product like yours? What series of events leads to needing a product like yours? Who are the people in a company that make the decision to buy a product like yours? These are important things to understand in the process. Don’t market to someone who doesn’t need what you are selling, isn’t high enough up in a company to make any kind of decision, or hasn’t experienced any kind of problem that your product would fix. It wastes time. Why would you ever market a phone system to a salesperson? They can’t buy it. Why would you ever market a copy machine to a company of 4 people? They can’t afford it.
I have heard the argument that you need to be in front of those people now so they think of you when the problem arises. Valid argument. However, given the view of marketing I just described, in which marketing is a sales tool for finding people who are ready to buy now, you can see that this is branding. Branding is necessary, and we do branding; however, if you have a company like us who markets you correctly, i.e. to the right people, at the right company, at the right time, you don’t even need to do that.
Example.  Name someone who makes shingles.  Not someone who installs them, someone who makes them. Why don’t you know? Shingles protect everything in your home and are the first thing damaged in a storm. If you own a house, shouldn’t you know who the best, worst, and middle of the road companies are in the shingle business? The reason you don’t need to know, is that you don’t need shingles. When the time comes and you need shingles, you will do your research and find a company that installs the type of shingle you want on your home.
It’s the same for business. People don’t really need to know about your product until they start doing research about your product. Key point: Up until now, you had to guess when companies were going to need you and you had to brand so that people remembered you when that time came. Now, with analytics, we can predict when to contact companies because we can predict when they should be beginning research on products similar to yours.

## Branden Collingsworth Joins CAN as a Data Scientist

Branden Collingsworth has officially joined CAN’s team as a data scientist. Branden interned at Contemporary Analysis this summer as a data scientist. In December, he graduated with an MBA and law degree from the University of Nebraska at Lincoln. This is on top of a bachelor’s degree in economics from UNL.
Branden is a true data geek, as you can see. He loves scraping data from the internet, designing surveys, building databases, testing models, and creating data visualizations. He will be working on models to power our clients and the next generation of CAN’s products. He is also looking forward to demonstrating his technical chops on CAN’s blog. We are very excited to have Branden on board.

## Contemporary Analysis Job Board: Data Scientist

Contemporary Analysis (CAN) is a global data science company based in Omaha, NE that provides predictive analytics to multiple Fortune 500 companies and small businesses in the United States, Europe and Asia. CAN is focused on making analytics accessible to companies of all sizes and industries, and offers standard products and professional services.
The purpose of this position is to help expand our professional services team. CAN’s professional services team is responsible for developing solutions for CAN’s largest and most unique clients, including Fortune 500 and Global Fortune 50 companies. The by-products from the team’s professional services are used to create new CAN products and enhance existing ones.
Each Data Scientist is responsible for working with a CAN Sales Executive to understand each client’s business, define projects to help clients achieve their business objectives, use data science to develop solutions, and present results as a written report and presentation.  Data Scientists must be familiar enough with statistics and computer science to develop creative solutions, and have the written and verbal skills to develop compelling reports and presentations.
The Data Scientist will be responsible for:

1. Working with the Sales Executive, the Data Scientist will work at all executive levels to help design solutions that will meet the needs of the client.  To be able to design creative solutions that go beyond simple client feature requests will require Data Scientists to have an advanced familiarity with modeling, mathematics and statistics.  Also, during the discovery phase the Data Scientist will coordinate with the COO and Sales Executive to develop project budgets.
2. During the implementation phase, the Data Scientist will work with other CAN Data Scientists and vendors to implement the Analytical Blueprint, monitor client results, and adjust the Analytical Blueprint to optimize the client results and experience. Since CAN offers data science solutions as a service, implementation can last from a month to several years. This creates a unique project management scenario that requires continuous monitoring to ensure that the project does not fall behind.
3. The Data Scientist working with the Sales Executive will maintain a positive relationship with the client, ensure ongoing deliverables are met, and assess any future need for CAN’s services.  In some cases the Data Scientist will need to record best practices from the project, or write specific business issue case studies.

Qualifications:

1. Minimum Education:  Bachelor’s Degree from an accredited institution.
2. Able to maintain focus in highly-charged environments and manage competing priorities.  This includes experience managing multiple projects simultaneously against tight deadlines
3. Experience solving business issues with the consultative application of advanced analytics and/or information technology
4. Strong presentation and client management skills – up to the highest executive level.  This includes being able to explain highly detailed and technical subject matter to non-technical audience, and being able to present and sell analytical concepts to clients
5. Experience delivering insight to internal or external clients by building on a technical foundation that includes a conceptual understanding of modeling techniques and a basic grasp of statistics.   Ability to use analytical applications to solve a practical problem, in an on the spot high-pressure situation
6. Experience in project management and managing a team to meet a deadline, manage client expectations, and maximize client satisfaction relative to solution profitability
7. Functional experience in one or more of the following areas: selling analytic services, project management, data science product development, pre-sales, technology implementation, and/or account management
8. Technical foundation including one or more of the following areas: Bayesian statistics, multiple regression analysis, and/or econometric modeling is preferred but not required.

If you are interested in learning more and applying, contact Grant Stanley by phone at 866-963-6941 #801 or connect with him on LinkedIn. Please have your LinkedIn profile up to date before applying.

## Dashboard Design: Bullet Graph vs. Bar Chart

We invest a lot of time and energy communicating our research, because unless we can effectively communicate our findings they are useless. The goal is to communicate the most valuable information with the least amount of ink, in a form that can be understood with the least amount of effort. For your reference, our major influences are Deirdre McCloskey on writing, Stephen Few on dashboard design, and Edward Tufte on data visualization.

Recently, CAN conducted a customer satisfaction survey for the Georgia Regional Transportation Authority (GRTA). In addition to developing, deploying and analyzing the customer survey, CAN went above and beyond to improve how GRTA reported the results of their annual survey. In this post, I will explain why we used a modified bullet graph instead of a bar chart to answer the business question.

The purpose of the graph is to help answer the business question: how does GRTA compare to two competitors across 17 different metrics? While GRTA needs to continually improve, the exact score was not important for answering the business question. What mattered was the difference between competitors and how GRTA scored relative to the others. Comparing the companies metric by metric was the main influence behind the design of CAN’s graph.

The Original Graph

The CAN Graph

– In the original graph, the bold vertical lines focus the viewer on how each metric scored, by encouraging the eyes to go up and down. In the CAN graph, the light gray horizontal lines encourage the eyes to travel left and right to compare each company’s performance. Also, we used light gray lines so that we did not dominate the graph with supporting data.
– In the original graph, there is no simple way to show the spread between the different competitors, besides comparing each line together. However, it is important to know how competitive each metric is when answering the business question. When designing the CAN Graph, we darkened a length of the light gray horizontal lines to show the minimum and maximum score on the service quality index. This
– In the original graph, using four different colors made it difficult to make a memorable distinction between each company, took up an unnecessary amount of space, and made it impossible for colorblind viewers (roughly 8% of males) to make distinctions. By using different shades of gray, CAN made it easy for everyone, including the colorblind, to distinguish between different companies. Using different shapes added another way to differentiate between companies and allowed for better distinction when multiple companies score close to each other.
– In the original graph, the overall low graphical quality, such as broken vertical lines, faded colors and pixelated font, created an unnecessary distraction and reduced the credibility of the results. While this might seem petty, producing graphs that are crisp and well designed helps develop trust with the audience. In the CAN Graph, we produced the entire graph in black and white, so that the report can easily be reproduced on either a color or black and white printer.

## Presenting Predictive Analytics

The nature of forecasting the future makes presenting predictive analytics unique and challenging.  There is no flashy server or dashboard that will make presenting analytics any easier.  There is only a model that tells a story about the future of users’ business, customers, non-customers and competitors.  While models are very valuable they are not your typical business intelligence artifacts.  To produce a meaningful return on investment you need to translate the details of the story into results that can be applied to a specific business question.

While you cannot replace sound scientific and statistical methodology, CAN has found that users either don’t care about a model’s Durbin-Watson, standard error, or R², or are not familiar enough with them to properly understand the statistical nuances. The key to proving that a model works, and to getting the political support required for implementation, is to ask the experts if the story the model tells reflects reality. It is also valuable to prove that a model works by letting the model prove itself over time.

It is important to note that predictive models typically do not provide solutions to business questions, but instead often offer incomplete answers and important insights. When presenting predictive analytics, your audience’s expectations should be set on becoming less wrong, instead of finding the perfect solution. CAN finds that users appreciate our philosophy of Less Wrong. While it seems counterintuitive, our lack of hubris builds confidence in our models and sets realistic expectations. The basic principle behind Less Wrong is that in business winners are not right, they are simply less wrong. There are no perfect answers in complex sciences, such as data science and predictive analytics, only less wrong answers. The goal is to reduce the uncertainty of making the wrong decisions, not to think uncertainty can be eliminated.

In conclusion, when presenting predictive analytics don’t be afraid to kill your darlings. If you cannot justify an element of your presentation, get rid of it. This will help you focus your presentation, your audience will listen, and the results of your hard work building predictive models will get implemented.