Source – lightreading.com
Data science techniques have been around for many years and have been applied successfully in areas such as fraud detection and personalized recommendations. More recently, these techniques have been put to work in service provider (SP) and telco network operations. The combination of SDN/NFV and data science is becoming a powerful new approach for making networks more reliable and secure.
What is data science?
Data science involves using automated methods to analyze massive amounts of data and to extract knowledge from them. Data science is a broad discipline that includes statistics, computer science, applied mathematics, machine learning/AI and visualization.
One of the common use cases of machine learning is in email spam filters. The algorithms are trained by processing millions of emails that have been pre-categorized as either spam or not. The result is an application that can automatically identify the vast majority of junk email and can also continuously improve and adapt as more examples become available.
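To make the training idea concrete, here is a minimal sketch of a spam filter built on naive Bayes, one common technique for this task (the article does not say which algorithm production filters use). The class name, the labels "spam" and "ham", and the sample messages are all illustrative assumptions; a real filter would train on millions of messages with far richer features.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Toy naive Bayes text classifier (hypothetical, for illustration only)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        # Each pre-categorized message updates the word statistics for its label.
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def classify(self, text):
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        scores = {}
        for label in self.LABELS:
            # Smoothed log prior, so an unseen label never produces log(0).
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                # Add-one (Laplace) smoothing handles words never seen in training.
                likelihood = (self.word_counts[label][word] + 1) / (
                    total_words + len(vocabulary) + 1
                )
                score += math.log(likelihood)
            scores[label] = score
        return max(scores, key=scores.get)


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win free money now", "spam")
spam_filter.train("free prize claim now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("please review the network report", "ham")

print(spam_filter.classify("claim your free money"))
print(spam_filter.classify("network meeting on monday"))
```

Calling `train` again with newly labeled messages is what lets a filter of this kind keep adapting as more examples become available.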
Relevance to SP network operations
As SPs adopt SDN/NFV, the underlying network infrastructure is becoming more complex and distributed. SP operations teams have to deal with a dynamic network of unprecedented change, scale and complexity. In this environment, it is challenging to predefine what will or could go wrong. Relying on the human correlation processes and manual methods that have been in place for decades is no longer effective.
Data science has the potential to transform the way SP network operations are done, including reducing the manual effort involved in network monitoring, troubleshooting and optimization. However, the trick is to do it in a way that provides clear business value, is embedded into the SP operations workflow, and combines expert knowledge with the data.
Below is a non-exhaustive list of emerging use cases of data science within the context of SP network operations. As SPs adopt modern technologies and operations practices, more applications will surface.
Reducing alert fatigue
In the new world of SDN/NFV, the number of components that need to be monitored and managed has increased exponentially compared to legacy networks. One of the most significant problems facing SP operations teams today is the overwhelming amount of information from distributed network components that generate logs and alerts.
With minimal prioritization and a high false-positive rate, it is impossible for operations teams to focus on what matters. With data science techniques, it is possible to understand the context of the alerts and suppress the ones that are not relevant, resulting in a prioritized list of alerts for the SP operations team to review and act on.
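The shape of the output, a short prioritized list in place of a raw alert stream, can be sketched as follows. This is a deliberately simple stand-in: it uses fixed rules (a suppression set and a severity ranking) rather than a learned model of alert context, and the field names, severity levels and device names are hypothetical.

```python
from collections import Counter

# Hypothetical severity scale; lower rank means higher priority.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "info": 3}


def prioritize_alerts(alerts, suppressed_sources=frozenset()):
    """Drop alerts from suppressed sources, collapse duplicates, rank the rest."""
    counts = Counter()
    for alert in alerts:
        # Context-based suppression, e.g. devices known to be in maintenance.
        if alert["source"] in suppressed_sources:
            continue
        counts[(alert["source"], alert["type"], alert["severity"])] += 1

    # Most severe first; within a severity level, most frequent first.
    ranked = sorted(
        counts.items(),
        key=lambda item: (SEVERITY_RANK.get(item[0][2], len(SEVERITY_RANK)), -item[1]),
    )
    return [
        {"source": source, "type": kind, "severity": severity, "count": count}
        for (source, kind, severity), count in ranked
    ]


alerts = [
    {"source": "vnf-2", "type": "cpu_high", "severity": "minor"},
    {"source": "vnf-1", "type": "link_down", "severity": "critical"},
    {"source": "vnf-2", "type": "cpu_high", "severity": "minor"},
    {"source": "vnf-3", "type": "link_down", "severity": "critical"},
]

for entry in prioritize_alerts(alerts, suppressed_sources={"vnf-3"}):
    print(entry)
```

In a data-driven system, the suppression set and the ranking would be inferred from historical alerts and operator feedback instead of being hard-coded.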
Proactive network optimization
Good performance and high availability are the primary goals of SP operations teams. They need to proactively detect, identify and resolve performance crises in their network.
Data science provides a methodology for quickly processing the large quantities of monitoring data generated by network devices, finding repeating patterns in their behavior and building accurate models of their performance. Anomaly detection methods can be used to automatically spot deviations from normal system behavior that could correspond to network failures. A simple example: if the number of link errors on a particular network interface in the last ten minutes is more than three standard deviations above the average for the other links in the same network, this could indicate a problem.
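The three-standard-deviation check described above can be sketched in a few lines. The function name, the interface names and the error counts are made up for illustration, and the counts are assumed to cover the same ten-minute window for every link.

```python
from statistics import mean, stdev


def flag_anomalous_links(error_counts, threshold=3.0):
    """Return links whose error count exceeds the mean of the other links
    by more than `threshold` standard deviations."""
    flagged = []
    for link, count in error_counts.items():
        peers = [c for name, c in error_counts.items() if name != link]
        if len(peers) < 2:
            continue  # a standard deviation needs at least two peer values
        sigma = stdev(peers)
        if sigma == 0:
            continue  # no variation among peers, so no z-score can be computed
        z_score = (count - mean(peers)) / sigma
        if z_score > threshold:
            flagged.append(link)
    return flagged


# Hypothetical link error counts for the last ten minutes.
link_errors = {"eth0": 4, "eth1": 6, "eth2": 5, "eth3": 5, "eth4": 90}
print(flag_anomalous_links(link_errors))  # ['eth4']
```

Comparing each link against its peers, rather than against a fixed limit, means the baseline adjusts automatically as overall network conditions change.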
Traditional security technologies rely on rules and signatures that only use stale information to find threats. The tactics of adversaries are evolving rapidly, and the number of advanced and unknown threats targeting SP networks continues to increase.
Algorithms can be trained to learn the SP environment and adapt to the threat landscape, making decisions about whether something is malicious, and then providing context for the expert to assist with rapid investigation.
Future of SP operations
Self-driving cars provide important insight into the path that data-driven automation is likely to follow. The general principles used in self-driving cars can be extended into the SP network operations domain: collecting massive amounts of data, allowing algorithms to navigate their way through routine tasks, and implementing self-learning systems that can adapt to unpredictable situations. The result is likely to be smart network management software that can perform many SP operations tasks with a high degree of reliability.
Some of the hyper-scale operators (Facebook, LinkedIn, Netflix, etc.) are already using self-healing for some basic operational tasks. In the future, SP operations will need to move towards “management by exception,” wherein most common errors and performance degradations are addressed via automated self-healing.
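As a rough sketch of what "management by exception" could look like in software, the function below runs an automated fix for event types it recognizes and hands everything else to a human. The event fields, the playbook mapping and the remediation action are hypothetical and do not describe any operator's actual tooling.

```python
def handle_event(event, playbooks, escalate):
    """Run the automated fix for a known event type; escalate anything else."""
    remediation = playbooks.get(event["type"])
    if remediation is None:
        escalate(event)  # the exception: route to the operations team
        return "escalated"
    remediation(event)  # the common case: self-heal without human involvement
    return "auto-remediated"


restarted = []   # stands in for a real restart action
escalated = []   # stands in for a ticketing or paging system

# Hypothetical playbook: restart the component when its process crashes.
playbooks = {"process_crash": lambda event: restarted.append(event["source"])}

print(handle_event({"type": "process_crash", "source": "vnf-7"}, playbooks, escalated.append))
print(handle_event({"type": "unknown_latency_spike", "source": "vnf-9"}, playbooks, escalated.append))
```

As more failure modes become well understood, they can be moved into the playbook, shrinking the set of events that still need a person.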