How Much Data Do I Need For Predictive Analytics?
Before beginning any predictive analytics project, it’s essential to investigate the breadth and depth of the data available. But at what point is it acceptable to say you have enough data to start?
The politically correct answer to this question is that it depends. Depends on what, though?
Well, for starters, certain types of data science and predictive analytics projects have more specific data requirements. In an extreme case, predicting survival rates of people or machines may require data spanning their entire lifespan. In most cases, however, data requirements are less stringent.
In most cases, a snapshot of 3 to 5 years’ worth of data can reveal a broad range of patterns in consumer and business behavior. Why?
Consumer and business behavior is influenced by short-term and long-term economic outlooks. The short-term economic outlook of customers and employees is shaped by what happened to them yesterday and what they think will happen tomorrow. Their long-term economic outlook is what they think will happen to them in the future, in most cases more than 3 years out. In the short term, behavior is volatile from one time period to the next. In the long term, however, there are cyclical patterns in how consumers and businesses behave relative to economic cycles. How long is an economic cycle? From peak to peak, the last 11 cycles since 1945 have lasted an average of 66 months, or 5.5 years. The last cycle lasted 81 months, or 6.75 years. Therefore, as a general rule of thumb, we like there to be at least 3 years’ worth of data, and preferably 5, before we begin any predictive analytics project.
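The arithmetic behind this rule of thumb can be sketched in a few lines of Python. The names `months_to_years` and `cycle_coverage` are illustrative only and not part of any library; the cycle lengths are simply the figures quoted above.

```python
# Cycle lengths quoted in the text, in months.
AVERAGE_CYCLE_MONTHS = 66
LAST_CYCLE_MONTHS = 81


def months_to_years(months):
    """Convert a duration in months to years."""
    return months / 12


def cycle_coverage(years_of_data, cycle_months=AVERAGE_CYCLE_MONTHS):
    """Fraction of one economic cycle covered by a data snapshot."""
    return years_of_data / months_to_years(cycle_months)


print(months_to_years(AVERAGE_CYCLE_MONTHS))  # 5.5
print(months_to_years(LAST_CYCLE_MONTHS))     # 6.75
for years in (3, 5):
    # Share of an average cycle covered by 3 and 5 years of data.
    print(years, round(cycle_coverage(years), 2))
```

Under these figures, 3 years of data covers a little over half of an average cycle and 5 years covers roughly 90 percent of one, which is why 5 years is the preferred target.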
So, in a nutshell, by analyzing a snapshot of data spanning an economic cycle, we can develop a more comprehensive understanding of how consumers and businesses behave and better predict what they are likely to do next.
Does this mean you needed the foresight to plan this analysis 5 years ago? Not at all. In fact, you’ve been gathering customer data since your business started. Although all data is vulnerable to errors, your typical CRM, accounting, and other customer transaction systems are built with a substantial amount of consistency, structure, and validation to protect the integrity of your data. Most of these systems also have native functions to export data fairly easily for further analysis.
If you’re still not convinced, or are unsure about the integrity of your data, we pride ourselves on our ability to partner with clients and work through the data discovery process step by step. Would you like to learn more?
What is the least amount of data, in GB, that we should have to start performing analytics?
It doesn’t work like that. You need at least 100 instances of what you are trying to predict, and probably at least 100 instances where it didn’t happen, to train your model. Size in GB is irrelevant. We tell companies that are new to predictive analytics that we would like them to have 3 years of data. It can be done with 18 months, but it won’t be as accurate. But if you view the purpose of V1 of a project the way we do, which is to complete it and get buy-in so we can build a much more robust V2, then 18 months’ worth of data is perfect for the proof of concept.
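As a rough illustration, the thresholds above could be turned into a quick readiness check. This is only a sketch under stated assumptions: the function names, the `(event_date, outcome)` record layout, and the verdict labels are all hypothetical, and the sample records are synthetic.

```python
from datetime import date

MIN_PER_CLASS = 100     # examples needed for each outcome
POC_MONTHS = 18         # workable for a proof of concept
PREFERRED_MONTHS = 36   # the preferred 3 years


def months_between(start, end):
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


def assess_readiness(records):
    """records: iterable of (event_date, outcome) pairs, outcome True/False."""
    records = list(records)
    positives = sum(1 for _, outcome in records if outcome)
    negatives = len(records) - positives
    dates = [d for d, _ in records]
    span = months_between(min(dates), max(dates)) if dates else 0
    enough = positives >= MIN_PER_CLASS and negatives >= MIN_PER_CLASS
    if not enough or span < POC_MONTHS:
        verdict = "not ready"
    elif span < PREFERRED_MONTHS:
        verdict = "proof of concept"
    else:
        verdict = "ready"
    return {"positives": positives, "negatives": negatives,
            "span_months": span, "verdict": verdict}


# Synthetic records: one per month over two years, alternating outcomes.
sample = [(date(2021 + (i // 12) % 2, 1 + i % 12, 1), i % 2 == 0)
          for i in range(240)]
print(assess_readiness(sample))
```

Note that the check counts examples and months of history, not gigabytes, which is the point of the answer above.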
I hope this helps.