Pepperdata Releases Inaugural “Big Data Performance Report” 2020

Source: aithority.com

Pepperdata, the leader in Analytics Stack Performance (ASP), announced the release of its inaugural “Big Data Performance Report” for 2020. The report was compiled after reviewing comprehensive data on the applications contained in the company’s largest enterprise customer clusters, representing nearly 400 petabytes of data on 5,000 nodes and 4.5 million applications run over a 30-day period. The report provides insights into the enormous compute waste that occurs with big data applications in the cloud.

Pepperdata research shows how IT operations teams are dealing with this challenge. The new “Big Data Performance Report” reveals that, within enterprise data applications that are not optimized by solutions that allow for observability and continuous tuning, there exists enormous waste—and tremendous potential to optimize and reduce that waste.

The shift to cloud computing is solidly underway. As Statista reports, “in 2020, the public cloud services market is expected to reach around $266.4 billion U.S. dollars in size, and by 2022 market revenue is forecast to exceed $350 billion U.S. dollars.” However, as the cloud expands, so does cloud wastage. As more complex big data applications migrate, the likelihood of resource misallocation rises. This is why, as Gartner reports, “through 2024, nearly all legacy applications migrated to public cloud infrastructure as a service (IaaS) will require optimization to become more cost-effective.” Without this optimization, the data indicates, companies will overspend.

“When we analyzed the data, we were amazed to see how much underutilization and other wasted resources there were—unnecessarily driving costs up,” said Joel Stewart, VP, Customer Success, Pepperdata. “The failure to optimize means companies are leaving a tremendous amount of money on the table—funds that could be reinvested in the business or drop straight to the bottom line. Unfortunately, many companies just don’t have the visibility they need to recapture the waste and increase utilization.”

The research from Pepperdata sheds further light on the nature of cloud wastage. For instance:

  • Spark clusters and jobs are dominating spend across clusters. This is where the highest amount of net wastage was found.
  • When it comes to wastage, failures are important. Job failures cause serious performance degradation and consume significant computational resources. In an unoptimized dataset, Pepperdata sees a wide range of failure rates across clusters. Some clusters have failure rates above 10%, and Spark applications tend to fail more often than MapReduce applications.
  • Prior to implementing Spark optimization: Across clusters, within a typical week, the median rate of maximum memory utilization is a mere 42.3%. The underutilization here represents one of two states: either not enough jobs are running to fully utilize the cluster resources, or the jobs that are running waste resources.
  • Prior to implementing cloud optimization: Comparing the resources that jobs used with those they wasted, the average wastage across 40 large clusters is more than 60%. This wastage takes an interesting form: typically, with 95% of jobs, there is little wastage. Major wastage is usually found in 5% to 10% of total jobs.
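To make the shape of that wastage concrete, here is a minimal Python sketch of how a team might measure it from per-job data. It is an illustration only, not Pepperdata's methodology: the `Job` fields, the definition of waste as allocated minus used memory, and the sample numbers are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical per-job metrics; field names are assumptions for this sketch.
    name: str
    allocated_gb_hours: float  # memory reserved for the job
    used_gb_hours: float       # memory the job actually consumed


def wasted(job: Job) -> float:
    """Waste is assumed here to be allocated minus used, never negative."""
    return max(job.allocated_gb_hours - job.used_gb_hours, 0.0)


def cluster_waste_fraction(jobs: list[Job]) -> float:
    """Fraction of all allocated memory that went unused across the cluster."""
    total_allocated = sum(j.allocated_gb_hours for j in jobs)
    if total_allocated == 0:
        return 0.0
    return sum(wasted(j) for j in jobs) / total_allocated


def top_wasters(jobs: list[Job], share: float = 0.10) -> tuple[list[Job], float]:
    """Return the most wasteful `share` of jobs and their portion of total waste."""
    ranked = sorted(jobs, key=wasted, reverse=True)
    count = max(1, round(len(ranked) * share))
    top = ranked[:count]
    total_waste = sum(wasted(j) for j in jobs)
    portion = sum(wasted(j) for j in top) / total_waste if total_waste else 0.0
    return top, portion


# Made-up sample: nine well-sized jobs and one heavily over-provisioned job.
jobs = [Job(f"etl-{i}", 10.0, 9.0) for i in range(9)]
jobs.append(Job("spark-big", 400.0, 100.0))

fraction = cluster_waste_fraction(jobs)
top, portion = top_wasters(jobs, share=0.10)
print(f"cluster waste: {fraction:.1%}")
print(f"top 10% of jobs ({[j.name for j in top]}) hold {portion:.1%} of the waste")
```

In this made-up sample a single over-provisioned job accounts for almost all of the waste, which mirrors the pattern the report describes: most jobs are fine, and a small minority drives the overspend.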

This is why optimization is inherently such a needle-in-a-haystack challenge, and why machine learning can be such a help. Studies show that ML-powered statistical models predict task failures with precision of up to 97.4% and recall of up to 96.2%. Applied to Hadoop, such models reduce the percentage of failed jobs by up to 45%, with an overhead of less than five minutes.
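For readers less familiar with the two metrics: precision is the share of predicted failures that really were failures, and recall is the share of real failures that the model caught. The short Python sketch below shows the arithmetic. The counts are hypothetical, chosen only so that the results round to the figures quoted above; they do not come from the studies themselves.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Of all tasks predicted to fail, the fraction that actually failed."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Of all tasks that actually failed, the fraction the model predicted."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts for illustration (not taken from any study):
# 500 tasks really failed; the model flagged 494 tasks, 481 of them correctly.
true_pos = 481   # failures correctly predicted
false_pos = 13   # healthy tasks wrongly flagged as failures
false_neg = 19   # failures the model missed

p = precision(true_pos, false_pos)
r = recall(true_pos, false_neg)
print(f"precision: {p:.1%}, recall: {r:.1%}")
```

High precision matters here because every false alarm costs operator attention or a needless restart, while high recall matters because every missed failure still burns cluster resources.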

Cloud optimization delivers big savings. According to Google, even low-effort cloud optimization can net a business as much as 10% savings per service within two weeks. Cloud services that are fully optimized and running for extended periods (over six weeks) can save more than 20%.

The research showed:

  • With the visibility afforded by real cloud optimization, three-quarters of customer clusters immediately win back task hours.
  • Most enterprises are able to increase task hours by a minimum of 14%. Some enterprises are able to increase task hours by as much as 52%.
  • 25% of users are able to save a minimum of $400,000 per year. At the higher end, the most successful users are able to save a projected $7.9 million for the year.

To cut the waste out of IT operations processes and achieve true cloud optimization, enterprises need visibility and continuous tuning. This requires machine learning and a unified analytics stack performance platform. Such a setup equips IT operations teams with the cloud tools they need to keep their infrastructure running optimally, while minimizing spend.
