HOW MUCH DATA IS ENOUGH IN PREDICTIVE ANALYTICS?

Source: demand-planning.com

There is, however, a balance between having too little data and having too much. What’s the right amount of data to work with as a demand planner or data scientist?

There is a debate about how much data is enough and how much is too much. According to some, the rule of thumb is to think smaller and focus on quality over quantity. On the other hand, Viktor Mayer-Schönberger and Kenneth Cukier explained in their book Big Data: A Revolution That Will Transform How We Live, Work, and Think that “When data was sparse, every data point was critical, and thus great care was taken to avoid letting any point bias the analysis. However, in many new situations that are cropping up today, allowing for imprecision—for messiness—may be a positive feature, not a shortcoming.”

Of course, larger datasets are more likely to have errors, and analysts don’t always have time to carefully clean each and every data point. Mayer-Schönberger and Cukier have an intriguing response to this problem, saying that “moving into a world of big data will require us to change our thinking about the merits of exactitude. The obsession with exactness is an artifact of the information-deprived analog era.”

Supporting this idea, some studies in data science have found that even massive, error-prone datasets can be more reliable than smaller, cleaner samples. The question, therefore, is whether we are willing to sacrifice some accuracy in return for learning more.
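One way to make this trade-off concrete is the standard decomposition of estimation error: when estimating an average, the root-mean-square error combines a systematic bias with random noise that shrinks as the sample grows, RMSE = sqrt(bias² + σ²/n). The Python sketch below compares a small, clean sample with a large, noisy one. The sample sizes, noise levels, and bias value are hypothetical illustrations, not figures from the studies mentioned above.

```python
import math


def rmse_of_mean(sigma: float, n: int, bias: float = 0.0) -> float:
    """Root-mean-square error of a sample mean.

    Assumes independent random errors with standard deviation `sigma`
    plus a constant systematic `bias` that averaging cannot remove.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(bias ** 2 + sigma ** 2 / n)


# Hypothetical scenarios: a small, carefully cleaned sample versus a
# large sample whose individual points are four times noisier.
small_clean = rmse_of_mean(sigma=10.0, n=25)
large_noisy = rmse_of_mean(sigma=40.0, n=10_000)
large_biased = rmse_of_mean(sigma=40.0, n=10_000, bias=3.0)

print(f"small, clean sample:  {small_clean:.2f}")   # 2.00
print(f"large, noisy sample:  {large_noisy:.2f}")   # 0.40
print(f"large, biased sample: {large_biased:.2f}")  # 3.03
```

In this sketch the large, messy sample wins only while its errors are random. Once a systematic bias creeps in, more data cannot average it away, which is why messiness can be tolerable but is never free.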

Like so many things in demand planning and predictive analytics, one size does not always fit all. You need to understand your business problem, understand your resources, and understand the trade-offs. There is no rule about how much data you need for your predictive modeling problem.

The amount of data you need ultimately depends on a variety of factors:

The Complexity Of The Business Problem You’re Solving

This is not necessarily about computational complexity (although that is an important consideration). How important is precision versus information? Define the business problem first, and then select the data closest to that goal. For example, if you want to forecast future sales of a particular item, that item’s historical sales are likely the closest data to the goal. Next come other drivers that may contribute to future sales or help explain past sales. Attributes that have no correlation with the problem are not needed.
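As a rough illustration of that last point, the Python sketch below screens candidate drivers by their Pearson correlation with an item’s historical sales and keeps only those above a cutoff. The driver names, the data, and the 0.5 threshold are all hypothetical. Pearson correlation only detects linear relationships, so a low score is a prompt to look closer, not proof that a driver is irrelevant.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / math.sqrt(vx * vy)


def screen_drivers(target, candidates, threshold=0.5):
    """Keep candidate drivers whose |correlation| with the target meets the threshold."""
    scores = {name: pearson(values, target) for name, values in candidates.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}


# Hypothetical weekly history for one item and three candidate drivers.
sales = [10, 12, 14, 16, 18]
candidates = {
    "promo_spend": [1, 2, 3, 4, 5],         # moves with sales
    "weather_index": [3, 1, 4, 1, 5],       # only weakly related
    "price_list_version": [2, 2, 2, 2, 2],  # never changes
}

kept = screen_drivers(sales, candidates)
print(kept)  # {'promo_spend': 1.0}
```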

The Complexity Of The Algorithm

How many samples are needed to train the model or to demonstrate its performance? For some linear algorithms, you may achieve good performance with a few dozen to a hundred examples per class. Other machine learning algorithms may need hundreds or even thousands of examples per class; this is especially true of nonlinear algorithms such as random forests or artificial neural networks. In fact, some algorithms, such as deep learning methods, can continue to improve in skill as you give them more data.
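A practical way to check whether more data is likely to help a particular algorithm is a learning curve: train on progressively larger subsets and track the error on held-out data. The Python sketch below does this with an ordinary least-squares line on synthetic data. The data-generating formula, sample sizes, and noise level are hypothetical; in practice you would substitute your own model and demand history, and for a classifier you would track accuracy per class instead of mean squared error.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared error of a fitted line on the given points."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def learning_curve(train, test, sizes):
    """Held-out MSE after training on the first n points, for each n in sizes."""
    test_x, test_y = zip(*test)
    curve = []
    for n in sizes:
        xs, ys = zip(*train[:n])
        curve.append((n, mse(fit_line(xs, ys), test_x, test_y)))
    return curve


# Hypothetical history: y = 2 + 3x plus Gaussian noise (sd = 1), seeded for repeatability.
rng = random.Random(42)


def sample(k):
    points = []
    for _ in range(k):
        x = rng.uniform(0, 10)
        points.append((x, 2 + 3 * x + rng.gauss(0, 1)))
    return points


train, test = sample(500), sample(1000)
curve = learning_curve(train, test, sizes=[5, 20, 100, 500])
for n, err in curve:
    print(f"n={n:>3}  held-out MSE={err:.3f}")
```

If the curve is still falling at the largest training size, more data is likely to help; once it flattens near the noise level (an MSE close to 1 here), extra data mostly adds cost. A simple linear model on simple data tends to flatten early, while more flexible models often keep improving over a wider range.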

How Much Data Is Available

Are the data’s volume, velocity, or variety beyond your company’s ability to store, process, or use it? A great starting point is working with what is available and manageable. What kind of data do you already have? In business-to-business (B2B) settings, most companies already hold customer records and sales transactions, which usually come from CRM and ERP systems. Many companies are also collecting, or beginning to collect, third-party data in the form of point-of-sale (POS) data. From there, consider other sources, both internal and external, that can add value or insight.
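To judge whether volume and velocity are within reach, a back-of-the-envelope estimate is often enough. The Python sketch below sizes one year of daily POS records at store-by-SKU grain. The store count, SKU count, and 50-byte record size are hypothetical, and the estimate ignores compression, indexes, and replication.

```python
def pos_storage_bytes(stores: int, skus: int, days: int, bytes_per_record: int) -> int:
    """Raw size of daily POS records at store x SKU x day grain (no compression or overhead)."""
    records = stores * skus * days
    return records * bytes_per_record


# Hypothetical retailer: 500 stores, 20,000 SKUs, one year of daily records.
size = pos_storage_bytes(stores=500, skus=20_000, days=365, bytes_per_record=50)
print(f"{size / 1e9:.1f} GB per year")  # 182.5 GB per year

# Velocity: records per second if one day's 10 million records arrived evenly.
records_per_day = 500 * 20_000
print(f"{records_per_day / 86_400:.0f} records per second")  # 116 records per second
```

Aggregating to weekly buckets would cut the record count roughly sevenfold, one of the simplest levers for keeping data manageable while you decide what finer-grained detail is worth keeping.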

Summary

None of this settles the debate, and the right amount of data is still unknowable. Your goal should be to keep thinking big while working with what you have, gathering the data you need for your problem and your algorithm.

When it comes to gathering data, the old proverb applies: the best time to plant a tree was ten years ago; the second-best time is now. Focus on the data available and the insights you have today while building the roadmap and capabilities you want in the future. Even if you won’t use it now, don’t wait until tomorrow to start collecting what you may need tomorrow.
