
Is ‘Big Data’ About What We Do With Our Data, Not How Much Of It We Have?

Source: forbes.com

What is it about “big data” that resists definition? Today we have myriad competing definitions that each attempt to circumscribe just what it is we mean when we talk about the idea of using “big data” to understand the world around us. The notion that the size, speed or modality of data warrants such a label falls apart when we recognize that every Google search involves analyzing a 100-petabyte archive using hundreds of query terms. Instead of referring to the size of our datasets, could “big data” refer to the way in which we utilize our data, regardless of its size?

The question of just what constitutes “big data” has become a perennial point of debate in the digital world. Typically, most definitions relate to the characteristics of the data being analyzed, but such definitions become increasingly strained when we recognize that the most mundane of internet tasks, from conducting a Google search to querying Twitter, all involve processing enormous volumes of rapidly growing multimodal material.

Using the example of a Google search, it seems absurd to label every Web search a “big data analysis” merely because it examines 100 petabytes using hundreds of parameters.

Yet what differentiates a keyword Google search from an SQL query of a data warehouse of the sort that is routinely described as precisely such a big data analysis? Does a keyword search written as an SQL query count as big data where a keyword typed into a Web page does not? Does an SQL-computed histogram count, or does it take at least a linear regression?

Does using an SQL query to count how many records there are in a ten petabyte database count as a big data analysis? What about a summation or field extraction?
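To make the borderline cases above concrete, here is a minimal sketch using Python’s built-in sqlite3 module; the table name, columns, and data are hypothetical stand-ins for a petabyte-scale warehouse:

```python
import sqlite3

# Hypothetical example: a tiny in-memory table standing in for a warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, bytes INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, 100), (1, 250), (2, 75), (3, 300)])

# A row count -- the query the article asks about.
total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

# A summation, and a histogram of rows grouped per user.
total_bytes = conn.execute("SELECT SUM(bytes) FROM events").fetchone()[0]
histogram = conn.execute(
    "SELECT user_id, COUNT(*) FROM events GROUP BY user_id ORDER BY user_id"
).fetchall()

print(total)        # 4
print(total_bytes)  # 725
print(histogram)    # [(1, 2), (2, 1), (3, 1)]
```

Run against a ten-petabyte table, these are the exact queries in question: each is a one-line aggregate that a database executes mechanically, which is the article’s point about how thin the “big data” label becomes here.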

Where do we draw the line between ordinary data and “big data”? Does that boundary depend on the industry in which we work? To the Googles of the world, petabytes are passé. In the arts, humanities and social sciences, datasets of hundreds of megabytes are still often described in the literature as “big data,” and in some fields they genuinely are far larger than the datasets ordinarily used.

Does it matter whether we are the ones storing or analyzing that data or whether it is outsourced? If an enterprise manages tens of petabytes of desktop backups in its own data centers, there are very real complexities to the management of large datasets. At the same time, today there are plenty of vendors that sell turnkey petascale storage systems complete with onsite representatives to manage and service the units. Does that count as big data management? What if a company simply ships their petabytes to the cloud and accesses them using a giant cloud-provided fileserver? Does that count as “big data” if they themselves are not actually doing any of the management?

Similarly, does it count as big data analytics if we use a point-and-click analysis tool that relieves us of the need to write a single line of code? What about a tool like Google’s AutoML that leverages transfer learning and sophisticated model generation and tuning algorithms to quite literally allow the creation of state-of-the-art deep learning models with a few mouse clicks – no coding or AI experience necessary? Does using or deploying an AutoML model count as big data even if we didn’t have to write any code ourselves?

Perhaps the answer to what counts as “big data” lies in how we use all of that data.

Using Google to conduct a keyword search implies a human-directed task. A human being has a question, translates that question into a query, enters that query into a search box and peruses the results. Such a workflow hardly seems to justify the big data label.

What if instead, Google’s algorithms monitored the world’s information on our behalf, searching out insights and new developments it believes are of greatest relevance to us and providing us a real-time summarized digest of the top highlights most relevant to our needs at the moment?

The latter sounds far more like a “big data” application than the former, yet both involve searching the exact same dataset.

In fact, in many ways the functional tasks each performs are the same. The difference lies in who performs that analysis – the machine or the human. When a human manually queries a dataset is it “big data” or does an analysis require some degree of creative or advanced machine assistance to be worthy of that moniker?

Using the example of an SQL query, if a human manually interrogates a dataset using simplistic queries like counting rows that match different criteria, it strains credibility to call such tasks, which differ little from a keyword search under a different name, “big data.” Alternatively, if a human interrogates that same dataset using more complex queries, like applying machine learning algorithms or complex analytic models, the label would seem to apply more readily.
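The contrast this paragraph draws can be sketched in a few lines of Python: the same records interrogated first with a trivial count (a keyword search under a different name), then with an ordinary least-squares fit standing in for an analytic model. The data and the threshold are made up for illustration:

```python
# Illustrative data: hypothetical (ad_spend, revenue) pairs.
records = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

# The "simplistic query": count rows matching a criterion.
matching = sum(1 for spend, revenue in records if revenue > 4.0)

# The "analytic model": fit y = slope*x + intercept by least squares.
n = len(records)
sx = sum(x for x, _ in records)
sy = sum(y for _, y in records)
sxx = sum(x * x for x, _ in records)
sxy = sum(x * y for x, y in records)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

print(matching)                               # 2
print(round(slope, 2), round(intercept, 2))   # 1.94 0.15
```

Both operations read every record, yet only the second extracts a relationship the data holder did not already know, which is the distinction the article suggests the “big data” label should track.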

Putting this all together, perhaps instead of focusing on petabytes or exabytes or trillions of rows, the answer to what constitutes “big data” lies in what we do with all of that data. Simple keyword searches or SQL queries might interrogate exabytes, but it seems unreasonable to classify every Google search as a “big data analysis.” Instead, if we focus on how that data is used, in particular the use of machine creativity to analyze data proactively on our behalf, to surface patterns and trends we were not expecting, or to carry out complex queries for us, perhaps that might yield a more satisfactory definition.

In the end, shifting our gaze from how much data we hoard to what we actually do with all of that data would go a long way toward moving the field from a meaningless marketing buzzword toward genuine business insight.
