Last Thursday, Data Science on the Plains hosted their first meetup event. The event started with peer-to-peer networking over pizza and beer, followed by a presentation from Union Pacific's Jason Hochwender, who spoke about the multiple data analytics projects currently happening at UP and his own pursuit of creating a data-driven culture. Twenty-five attendees from a range of companies across Omaha brainstormed strategies for leading data science initiatives in their own organizations and navigating the politics that may arise in the process. Come join us at our next meeting in early November and help us continue to build the data science community in Omaha.
Live Well Omaha is a nonprofit that leads a coalition of organizations to collectively prioritize health in Omaha. Live Well is paving new ground in the nonprofit sector with their data-driven mindset. They approached Contemporary Analysis with an idea: to visualize their health data with interactive storytelling maps, tracking the progress of different health metrics in Omaha such as child healthy weight.
Join them this Thursday, June 21 to discuss possible solutions to health issues in Omaha. Discover how the right visuals and story can help people digest information that may be hard to glean from printed statistics alone. Look for our case study about Live Well in the coming weeks to learn more. Register for the event: Map Chats with Live Well Omaha.
Here at Contemporary Analysis we believe good visualization is the key to understanding data and making data-driven decisions from it. We have worked with multiple companies (including nonprofits) over the years to provide valuable visualizations of their data, both at the macro and micro levels, to help them use their data more strategically. While technologically agnostic, we do recommend Tableau for those users who are either new or non-technical. We offer classes on how to use data visualization through our school, the Omaha Data Science Academy.
Data science has been named America's hottest job in a Bloomberg article.
In recent years, there has been an explosion in the amount of available data and an advancement in tools that can tame and harness it. Companies are counting on data scientists to make discoveries within the data, yet there is a major shortage of people who are skilled in this area. The article recounts how this scarcity is causing companies to pay incredibly high wages to attract these sought-after professionals.
Programs for aspiring data scientists are difficult to find within traditional institutions because data science has only sprung up in recent years. Nontraditional educational routes such as the Omaha Data Science Academy have tried to fill this gap. Interested in joining the next cohort? Apply now.
The Omaha Data Science Academy now accepts the GI Bill®. In partnership with the Interface Web School, veterans can now use their GI Bill® benefits to receive relevant tech training at the DSA.
We want to help veterans jumpstart their career transition by preparing them with the necessary skills needed for a successful and profitable job in data science. Learn from practicing data scientists and get a leg up on college grads.
Cohort 1 starts with Introduction to Data Science classes on July 11. Apply now.
The General Data Protection Regulation (GDPR) is the EU's new data regulation, and it applies to anyone with customers who are residents of the EU. That means it applies to almost any internet business.
These new regulations may completely change how your business is required to handle user data and sometimes even how you operate.
Your organization could be fined up to €20 million (roughly $24M US) or 4% of global turnover (whichever is greater), so pay close attention!
Seven New GDPR Requirements
Here’s a quick summary of seven new regulatory requirements and how they might affect you. Before we get started, here are two important terms you need to understand:
Data Controller: Any entity that “controls” the data by deciding the purpose or manner that the data is or will be used.
Data Processor: Any person or group that “processes” (obtains, records, adapts, or holds) the data on behalf of a controller.
1. Consent
When asking users to consent to your terms, you cannot use indecipherable terms-and-conditions documents filled with legalese. As a user, I'm a big fan of this; from a company's perspective, it can be a gray area. Read the official documentation (linked at the end of this post) for details.
On top of clarity, you also need to ensure that withdrawing consent is just as easy as giving it, and not only at the moment you first present the request.
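As a minimal sketch of what revocable consent might look like in practice, here is a simple in-memory record where withdrawal takes exactly one call, just like granting. The `ConsentRecord` class and its field names are illustrative, not GDPR-mandated structures:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ConsentRecord:
    """Hypothetical record of one user's consent to one purpose."""
    user_id: str
    purpose: str                          # plain-language description, no legalese
    granted_at: Optional[datetime] = None
    withdrawn_at: Optional[datetime] = None

    def grant(self) -> None:
        self.granted_at = datetime.now(timezone.utc)
        self.withdrawn_at = None

    def withdraw(self) -> None:
        # Withdrawal must be as easy as granting: a single call, no extra steps.
        self.withdrawn_at = datetime.now(timezone.utc)

    @property
    def active(self) -> bool:
        return self.granted_at is not None and self.withdrawn_at is None

consent = ConsentRecord("user-42", "Send a monthly product newsletter")
consent.grant()
assert consent.active
consent.withdraw()
assert not consent.active
```

The key design point is symmetry: if granting consent is one click, withdrawing it should be too.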
2. Breach Notification
In the event of a data breach, processors must notify their controllers without undue delay, and controllers must notify the supervisory authority within 72 hours. If a data controller determines that the breach “is likely to result in a high risk to the rights and freedoms of individuals,” then they also have to notify each individual user that was affected.
These notifications must contain at least:
The nature of the personal data breach (the categories and approximate number of data subjects and records affected).
The Data Protection Officer’s contact information.
The likely consequences of the personal data breach.
The measures you are taking to address the breach.
Thankfully you are allowed to provide this information in phases if it isn’t available all at once.
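As a rough illustration, the minimum contents listed above could be captured in a simple structure. All the names here are hypothetical, not official GDPR terminology, and the optional field reflects that information may arrive in phases:

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class BreachNotification:
    """Sketch of the minimum contents of a breach notification
    (field names are illustrative, not official GDPR terms)."""
    nature_of_breach: str          # categories and approx. counts of subjects/records
    dpo_contact: str               # Data Protection Officer's contact information
    likely_consequences: str
    remediation_measures: str
    # Details may be provided in phases, so counts can start unknown:
    approx_subjects_affected: Optional[int] = None

notice = BreachNotification(
    nature_of_breach="Email addresses exposed via misconfigured storage bucket",
    dpo_contact="dpo@example.com",
    likely_consequences="Increased phishing risk for affected users",
    remediation_measures="Bucket access revoked; credentials rotated",
)
print(asdict(notice))
```

Keeping the notification as structured data rather than free text makes it easier to send the same facts to the authority and to affected users.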
3. Right to Access
Your users (or “data subjects”) have the right to obtain a free copy of their personal data. In addition, they have a right to receive a confirmation of their personal data being used or processed.
If you’re wondering what providing “a free copy of their personal data” looks like, check out how Google does it1.
4. Right to Be Forgotten
Users (data subjects) have the right to have their data erased by the data controller “without undue delay” if:
The controller doesn’t need the data anymore.
The subject uses their “right to object” to the data processing, withdraws their previous consent, or was a child when the data was collected.
There is a legal requirement for the erasure.
The controller or processor is processing the data unlawfully.
As always, there are a lot of exceptions here; be sure to read the detailed resources below if this applies to you.
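The conditions above can be restated as a simple boolean check. This is only a sketch of the decision logic: it ignores the exceptions just mentioned, and the parameter names are illustrative:

```python
def erasure_required(
    data_still_needed: bool,
    subject_objected: bool,
    consent_withdrawn: bool,
    was_child_at_collection: bool,
    legal_requirement: bool,
    processing_unlawful: bool,
) -> bool:
    """Simplified restatement of the erasure conditions listed above.
    Real GDPR analysis involves exceptions this sketch ignores."""
    return (
        not data_still_needed
        or subject_objected
        or consent_withdrawn
        or was_child_at_collection
        or legal_requirement
        or processing_unlawful
    )

# Example: the subject withdraws consent, so erasure is required
# even though the controller still wants the data.
assert erasure_required(
    data_still_needed=True,
    subject_objected=False,
    consent_withdrawn=True,
    was_child_at_collection=False,
    legal_requirement=False,
    processing_unlawful=False,
)
```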
5. Data Portability
Not only do users need access to download their data, you should also offer different tools for portability, such as APIs alongside a direct download. Direct downloads should be offered in multiple formats; again, Google is a great example here.
This could mean that you need to allow a competitor to be able to directly import your data if the user requests it.
Thankfully, you’re not responsible for protecting the data copy that has been received by the user.
6. Privacy by Design
This means you need to be thinking about data protection all the way down to the design of your internal systems.
Privacy by design calls for data protection in infrastructure too, meaning there may even be non-technical changes you need to make to your company structure. Now is a great time to look for vulnerabilities in your internal practices and even consider getting a security audit.
7. Data Protection Officers
Qualified officers must be appointed in any public authority or large organization (over 250 employees) that monitors or processes personal data.
If your company qualifies, you should dive into the qualifications and start looking for an officer right away. These regulations go into effect May 2018.
If you’re doing business with EU citizens it’s in your best interest to get on top of these new regulations as quickly as possible. Hopefully, this article provided you with enough detail to know where to start and what to expect.
Contemporary Analysis (CAN) and Cabri Group have teamed up again to use Machine Learning to predict the 2018 NCAA Men's Basketball Tournament. This is different from last year: we are picking the entire 2018 bracket instead of just upsets.
Historically, only 26% of tournament games end in an upset (this includes games from all rounds), or 17 out of 64 games. Last year we did quite well, failing to predict only 3 upsets and getting 50% of our predictions right. We will need to improve considerably to win that $1M/year for life from Berkshire Hathaway (including that wee bit about having to work for Berkshire Hathaway to be eligible). This year we added far more variables and used an ensemble model. Will we be perfect? Probably not. Here is the problem with using Machine Learning to try to predict a perfect bracket:
A). Error propagates through the bracket. This is why the odds of a perfect bracket are around 1 in 128 billion. If you pick San Diego State to upset Houston, and Houston actually wins, you will lose the entire region. (Side note: the machine learning model is, in fact, picking Houston by the slimmest of margins. However, if San Diego State wins, the model actually picks them to go on to beat Michigan, Providence, and then Ohio State to win the entire region.) Perfection may hinge on a 6/11 game that no one would normally care about, except it's the tournament, and everyone cares about every game.
B). Machine Learning and Predictive Analytics aren't about being 100% accurate. You wouldn't want to pay for that kind of accuracy even if it were possible; we are trying to be less wrong for companies. This is why predicting upsets made sense and why picking the whole 2018 NCAA bracket is so hard. Figuring out who is most likely to be an outlier (churn) is something we do all the time, and there we can err on the side of being wrong. We would just tell you to call both Houston and San Diego State (in this instance), because calling them to talk about staying at your company has no ill effect (i.e., there is very little cost to being wrong in this example). There is a huge cost to being wrong in the tournament's later rounds, because you are predicting the next game based on the assumption that you correctly predicted the last one.
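The compounding effect in (A) is easy to quantify: if each game is predicted independently with some fixed accuracy, a perfect bracket requires every one of the 63 main-bracket predictions to be right. A quick back-of-the-envelope sketch (the 70% figure is illustrative, not our model's actual accuracy):

```python
# Even a model that picks each game correctly 70% of the time has a
# vanishingly small chance of a perfect bracket, because every
# later-round prediction depends on earlier ones being right.
per_game_accuracy = 0.70   # illustrative figure, not our model's actual rate
games = 63                 # games in the main bracket

p_perfect = per_game_accuracy ** games
print(f"P(perfect bracket) = {p_perfect:.2e}")      # about 1.7e-10
print(f"That's about 1 in {1 / p_perfect:,.0f} brackets")
```

Independence is a simplification (the text above explains why errors cascade), but even this optimistic model shows why "probably not" is the honest answer.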
Without further ado, here is what the Machine Learning algorithm predicted as the bracket:
If you have questions on this type of analysis or machine learning in general, (or if we are perfect and you would like to congratulate us), please don’t hesitate to contact:
Gordon Summers of Cabri Group (Gordon.Summers@CabriGroup.com), or
Nate Watson at CAN (firstname.lastname@example.org).
Now for some disclaimers:
Understand that the technique that finds a group of winners (or losers) in the 2018 NCAA bracket can be based on any metric. Our analysis isn't meant to support gambling, but to open people's minds to the possibilities of leveraging Machine Learning for their businesses. If we can predict something as seemingly complex as a basketball tournament (something that has never been correctly predicted), then imagine what we could do with the data that drives your decisions.
We will be keeping score using the very traditional 1, 2, 4, 8, 16 point system.
**Any handicapping sports odds information contained herein is for entertainment purposes only. Neither CAN nor Cabri Group condones using this information to contravene any law or statute; it's up to you to determine whether gambling is legal in your jurisdiction. This information is not associated with nor endorsed by any professional or collegiate league, association, or team. Machine Learning can be done by anyone, but is done best with professional guidance.
Today is Contemporary Analysis (CAN)'s 10th birthday! Although we are not the company we started back in 2008 (different logo, different owners, new leaders, new data scientists, even a new way to serve up data science), we still have the best team in the region, plus 10 years of wisdom about how to implement and build data science and train data scientists. Let us help, even if it is just a phone call for advice. Our goal is to help every company in the region use data science to be competitive in their niche.
Also, stay tuned for our upcoming article on CAN in 2008, 2018, and 2028. The world is going to be an exciting place for data science in 10 years, and we are here to help you get there.
Recently, Contemporary Analysis (CAN) was presented with the Greater Omaha Chamber’s Small Business of the Month award. It means a lot to be recognized for the hard work the team has done over the last year improving how companies start and scale data science internally.
On the consulting side, CAN has spent most of its 9+ years implementing data science in the traditional format: bidding via proposals and statements of work. While we still bid and build via proposal, we realized that many companies that need data science have a difficult time formulating their need into a written document. They don't know where to start, what they need, or how to scope time and materials, and not having a statement of work is a problem for traditional consulting. Even when they did understand how to build a needs document, an outside vendor often wasn't what companies wanted: they wanted to own their own analytics team. We couldn't agree more.
Once they decided to make data science an internal strategy, companies hired a senior-level data scientist, because that person needed to be all things for the department until it showed an ROI and could be granted a budget to hire a team. The first data scientist had to be a programmer, database manager, mathematician, data visualization specialist, data science strategist, and implementation manager. This came with a whole new set of problems: a person experienced enough to do all of this is expensive ($150k+ salary) and hard to find (time to hire is 6+ months), implementation requires a philosophical change in problem-solving (reactive to proactive), and scaling requires a new management process (Agile is ineffective). It is simply too much for one person to be successful.
We realized we had to change how companies implemented data science. They needed a fully functional team inside their company from day one, supplied for a time by an outside vendor, but they also needed to manage the process internally for company buy-in and scalability moving forward. A new way of implementation had to be invented.
We came up with something different: a method with immediate results and little risk. Instead of hiring senior-level talent out of the gate, use a full team of consultants to help you stand up your group, then find, hire, and train someone to run the team once it's already up. This means you get multiple people (with no recruiting and no time to hire) and expertise (understanding how to implement and manage all aspects of the team) immediately on day one, all at a price similar to hiring one senior-level data scientist.
Additionally, there is a benefit when it's time to find, hire, and train someone to run the team. Because some of the heavy lifting is being done by the vendor, a person skilled in data science implementation (a data science strategist) can be hired to run and scale the department. This person is usually much less expensive than a senior data scientist.
We pioneered this thought process at a local bank in Treynor, IA. TS Bank is one of the fastest-growing banks in our area. They reached out to CAN in late 2015 asking how they could be better at predicting what is likely to happen not only in their portfolio but also in marketing, sales, operations, M&A – almost every function of their business. They already had business intelligence but didn’t know how to make the transition from reactive to proactive. That’s when CAN stepped in.
CAN became their data science team for 18 months, deploying 4 data scientists skilled in NoSQL, data visualization, coding, and computational modeling. We served as their team until they were able to stand on their own. Now, just 2 years later, they have their own team of two data scientists, a data strategist, a business intelligence analyst, and a database engineer. TS Bank now has a better data science team than banks five times their size, and they have plans to hire more. With their team, augmented by ours only when needed, TS Bank can make decisions faster and at a lower cost than their peers. They know when to buy. They know when to sell. They have better risk analysis. Their business intelligence team, now coupled with their predictive analytics team, is the poster child for how to start, grow, and scale data science in an organization. This pilot allowed CAN to better understand how to implement the “Us then You” strategy.
Today CAN offers three approaches to improving outcomes with data science:
Data Science as a Service (to get you started)
Training (to make it yours), and
Staff Augmentation (to keep your need fulfilled, even if that need is temporary).
Data Science as a Service (DSaaS)
CAN begins the process of serving its clients by initially and temporarily serving as their data science team. Day one, we show up and provide our client with an established data science team that knows exactly what they’re doing, knows how to dig into their data, and knows how to cut through the red tape.
Unlike most consultants, from our first second on the job there is a timer running. We establish an agreed-upon milestone, and once that milestone is reached, CAN gives you everything you need to have your own data science capabilities: all the data, all the knowledge, no black box, and nothing secret.
About midway through DSaaS, CAN will identify, hire, train, and place a person to run everything CAN is building. While this person can be, and often is, from outside the organization, sometimes it is an internal person who just needs a few additional skills. When this happens, there is an additional savings of time and money, as this person requires no hiring process or internal tool training, is already a culture fit, and needs no spin-up time figuring out internal politics.
To formalize this training process, CAN built a training curriculum designed to help individuals already in the workforce gain necessary and valuable skills in the four key parts of data science: coding, databases, statistics and computational modeling, and data visualization. We call it the Omaha Data Science Academy. While initially only for individuals hired to manage the data science portfolio after CAN has reached the milestone, CAN has opened enrollment to the community so it can train data scientists to fill job openings at companies with established data science teams. The Omaha Data Science Academy's new goal is to train a data scientist for every company in Omaha.
Once established, teams like those at TS Bank aren't finished building. It took TS Bank 24 months before they felt they had enough talent to cut CAN loose, and even then, CAN still helps out from time to time, providing talent for project work so the company can keep its data science team lean while retaining the expertise to handle high-level or high-speed projects. Because CAN offers staff augmentation in addition to DSaaS, companies can hire senior-level talent much further down the road without fearing they lack the senior-level thinking to tackle hard projects as they arise. CAN also offers entry-level data scientists, providing extra staff for projects that require hours of work rather than deep expertise. In this way, CAN closes the loop, making sure that at every point a company can run a data science team of any size and make noticeable gains from the insights that team gathers.
CAN’s new way of data science team implementation lets a company gain access to the decision making of their ability without the fear or risk of single person dependence. It creates better data science much faster with higher ROI than traditional implementation with the helping hand of a company who has done data science for years.
CAN provides insight to teams as they grow and develop. We have completed over 150 projects across 100+ companies, served as the data science team for 3 companies in the proof-of-concept stage and 2 in implementation in past years, and have 5 more planned for 2018. We have the experience and wisdom necessary to help companies navigate the new kind of management data science requires.
CAN recently got back from Tableau's 2017 customer conference, affectionately shortened to “TC17”. I brought back several things from the week in Las Vegas: a couple of Tableau tips and tricks, several new connections within the Tableau community, and of course, some swag.
In no particular order, here are the top 5 pieces of swag (swags?) I picked up at Tableau Conference 2017.
① Interworks Viz Socks
In a world of suit and ties, socks can be a great way to show off some pizazz while remaining professional. I wish I had grabbed 6 more pairs because I can see myself wearing these every day of the week!
② Pluralsight Fidget Cube
That didn’t take long. Within a year of invention, fidget cubes are now officially branded swag. Pluralsight’s fidget cubes are perfect for mindlessly fidgeting around while thinking through your next great visualization.
③ Data Sleep Mask
This was a surprise from the Tableau Partner Summit on Monday. Tableau was kind enough to provide its partners with a TC17 Rally Pack that contained cool, yet practical swag like this to help make sure they were able to fully recover each night of the conference.
④ VizItPhilly Koozie
Shoutout to the VizItPhilly crew (specifically Corey Jones) for supplying me with this neat koozie! Without it, my hands surely would have frozen from holding craft beer cans during Data Night Out.
⑤ “We Are Data People” Pennants
These small felt pennants are perfect for selfies and are a great way of reminding your coworkers that you are in fact a real data person.
*Bonus – Mini Speech Bubble Whiteboard
Another item in the Tableau Partner swag bag – a miniature whiteboard version of the large speech bubbles with fun data phrases that are seen in about 80% of Tableau users’ Twitter avatars.
Did we miss any top swag from the conference? Let us know in the comments below.