4 Machine Learning Challenges for Threat Detection
The growth of machine learning and its ability to provide deep insights using big data continues to be a hot topic. Many C-level executives are developing deliberate ML initiatives to see how their companies can benefit, and cybersecurity is no exception. Most information security vendors have adopted some form of ML; however, it’s clear that it isn’t the silver bullet some have made it out to be.
While ML solutions for cybersecurity can and will provide a significant return on investment, they do face some challenges today. Organizations should be aware of a few potential setbacks and set realistic goals to realize ML’s full potential.
False positives and alert fatigue
The greatest criticism of ML-based detection software is the “impossible” number of alerts it generates: think millions of alerts per day, which effectively amounts to a denial-of-service attack against analysts. This is particularly true of “static analysis” approaches that rely heavily on how threats look.
Even an ML-based detection solution that is 97% accurate may not help because, simply put, the math is not favorable.
Let’s say an organization has one threat among 10,000 users on its network, and take “97% accurate” to mean that the detector flags 97% of real threats and wrongly flags 3% of benign users. Bayes’ rule gives the probability that an alert is a real attack as the share of true alerts among all alerts: (0.97 × 1/10,000) / (0.97 × 1/10,000 + 0.03 × 9,999/10,000), or about 0.3%. In other words, even with 97% accuracy, roughly 300 false alerts arrive for every real one!
Since improving beyond 97% may not be feasible, the best way to address this is to limit the population under evaluation by whitelisting or prior filtering with domain expertise. This could mean focusing on highly credentialed, privileged users or a specific vital part of the business unit.
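The arithmetic behind this advice can be sketched in a few lines of Python. This is a minimal illustration, not a vendor formula: it assumes “97% accurate” means a 97% true-positive rate and a 3% false-positive rate, and it assumes the single threat sits inside a hypothetical group of 100 privileged users. The function name `alert_precision` is made up for this example.

```python
def alert_precision(base_rate: float, tpr: float = 0.97, fpr: float = 0.03) -> float:
    """Return P(real threat | alert) using Bayes' rule.

    base_rate: fraction of the monitored population that is actually a threat
    tpr:       assumed true-positive rate (real threats that get flagged)
    fpr:       assumed false-positive rate (benign users that get flagged)
    """
    true_alerts = tpr * base_rate
    false_alerts = fpr * (1.0 - base_rate)
    return true_alerts / (true_alerts + false_alerts)


# One threat among all 10,000 users on the network.
whole_network = alert_precision(base_rate=1 / 10_000)

# Same single threat, but only 100 privileged users are evaluated
# (assumption: the threat is among them).
privileged_only = alert_precision(base_rate=1 / 100)

print(f"all 10,000 users:     {whole_network:.2%}")
print(f"100 privileged users: {privileged_only:.2%}")
```

Under these assumptions, the chance that an alert is real rises from roughly 0.3% to roughly 25% without changing the detector at all; only the population it evaluates has changed.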
ML algorithms work by learning the environment and establishing baseline norms before they monitor for anomalous events that can indicate a compromise. However, if the IT enterprise is constantly reinventing itself to meet business agility needs and the dynamic environment doesn’t have a steady baseline, the algorithm cannot effectively determine what is normal and will issue alerts on completely benign events.
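A toy example makes the baseline problem concrete. The sketch below is not how any particular product works; it uses a simple z-score detector as a stand-in for a learned baseline, and the hourly login counts are invented for illustration.

```python
from statistics import mean, stdev


def fit_baseline(samples):
    """Summarize 'normal' as the mean and standard deviation of past samples."""
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value that lies more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    return abs(value - mu) / sigma > threshold


# Hypothetical hourly login counts during a stable period.
stable_period = [100, 104, 98, 101, 97, 103, 99, 102]
stale_baseline = fit_baseline(stable_period)

# A benign change (say, a new service is onboarded) shifts normal activity.
after_change = [150, 155, 148, 152]
false_alerts = [v for v in after_change if is_anomalous(v, stale_baseline)]
print(f"{len(false_alerts)} of {len(after_change)} benign readings flagged")

# Refitting once the change is known restores a usable baseline.
fresh_baseline = fit_baseline(after_change)
print("151 flagged after refit:", is_anomalous(151, fresh_baseline))
```

With the stale baseline, every benign reading after the change is flagged; after refitting, a typical reading passes. In an environment that changes every week, the detector never gets a stable period to learn from.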
To help minimize this impact, security teams must work within DevOps environments to know what changes are being made and update their tooling accordingly. The DevSecOps (development, security, and operations) acronym is beginning to gain traction since each of these elements should be synchronized and work within a shared consciousness.
ML’s power comes from its ability to conduct massive multi-variable correlation to develop its predictions. However, when a real alert makes its way to a security analyst’s queue, this powerful correlation takes the appearance of a black box and leaves little more than a ticket that says, “Alert.” From there, an analyst must comb through logs and events to figure out why it triggered the action.
The best way to minimize this challenge is to enable a security operations center with tools that can quickly filter through log data on the triggering entity. This is an area where artificial intelligence can help automate and speed data contextualization. Data visualization tools can help as well by providing a fast timeline of events coupled with an understanding of a specific environment. A security analyst can then determine rapidly why the ML software sent the alert and whether it is valid.
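As a rough sketch of what “filtering log data on the triggering entity” might look like, the Python below pulls every event for one entity within a window around the alert and orders it into a timeline. The event fields (`entity`, `time`, `action`) and the sample records are hypothetical; real log schemas will differ.

```python
from datetime import datetime, timedelta


def entity_timeline(events, entity, alert_time, window=timedelta(minutes=30)):
    """Return the entity's events within `window` of the alert, oldest first."""
    start, end = alert_time - window, alert_time + window
    hits = [e for e in events
            if e["entity"] == entity and start <= e["time"] <= end]
    return sorted(hits, key=lambda e: e["time"])


# Hypothetical log records; field names are assumptions for this example.
events = [
    {"entity": "alice", "time": datetime(2024, 1, 5, 9, 58), "action": "vpn_login"},
    {"entity": "bob",   "time": datetime(2024, 1, 5, 10, 1), "action": "file_read"},
    {"entity": "alice", "time": datetime(2024, 1, 5, 10, 7), "action": "bulk_download"},
    {"entity": "alice", "time": datetime(2024, 1, 5, 10, 3), "action": "privilege_change"},
    {"entity": "alice", "time": datetime(2024, 1, 5, 14, 0), "action": "logout"},
]

timeline = entity_timeline(events, "alice", alert_time=datetime(2024, 1, 5, 10, 5))
for event in timeline:
    print(event["time"].strftime("%H:%M"), event["action"])
```

The analyst sees three ordered events for the flagged user instead of the full log, which is usually enough to start judging whether the alert is valid.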
The final challenge for ML is attackers who quickly adapt to bypass detection. When a model is fooled, the effects can be catastrophic, as researchers demonstrated when a small alteration to a 35 mph speed-limit sign caused a Tesla to accelerate toward 85 mph.
ML in security is no different. A perfect example is an ML network-detection algorithm that uses byte analysis to very effectively determine whether traffic is benign or shellcode. Hackers adapted quickly by using polymorphic blending attacks, padding their shellcode attacks with additional bytes to alter the byte frequency and fully bypass detection algorithms. It’s further proof that no single tool is bulletproof and that security teams need to constantly assess their security posture and stay educated on the latest attack trends.
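Why padding works can be shown with a deliberately simple byte-frequency detector. The sketch below is a toy under stated assumptions, not a real detection algorithm: the “normal” profile is built from one repeated HTTP request, the payload is a harmless placeholder blob of high-valued bytes rather than actual shellcode, and the 0.5 threshold is arbitrary.

```python
from collections import Counter


def byte_freq(data: bytes) -> list[float]:
    """Relative frequency of each of the 256 possible byte values."""
    counts = Counter(data)
    return [counts.get(b, 0) / len(data) for b in range(256)]


def l1_distance(p: list[float], q: list[float]) -> float:
    """Sum of absolute differences between two frequency profiles (0 to 2)."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Profile of "normal" traffic, learned from benign samples (assumed data).
normal_traffic = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 20
profile = byte_freq(normal_traffic)
THRESHOLD = 0.5  # arbitrary cut-off for this illustration

# Placeholder for an unusual payload: 64 high-valued bytes, not real shellcode.
payload = bytes(range(0x80, 0xC0))
raw_score = l1_distance(byte_freq(payload), profile)

# The same payload padded with normal-looking bytes blends into the profile.
padded_score = l1_distance(byte_freq(payload + normal_traffic), profile)

print(f"raw payload:    {raw_score:.3f} flagged={raw_score > THRESHOLD}")
print(f"padded payload: {padded_score:.3f} flagged={padded_score > THRESHOLD}")
```

The payload bytes are identical in both cases; only the surrounding padding changes, yet the padded version falls well under the threshold. A detector that looks only at aggregate byte statistics has no way to tell the difference.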
ML can be extremely effective in enabling and advancing security teams. The ability to automate detection and correlate data can save a significant amount of time for security practitioners.
However, the key to an improved security posture is human-machine teaming, where a symbiotic relationship exists between machine (an evolving library of indicators of compromise) and human (penetration testers and a cadre of mainframe white-hat hackers). ML brings the speed and agility needed to stay ahead of the curve, and humans bring qualities that it can’t (yet) replicate: logic, emotional reasoning, and decision-making skills based on experiential knowledge.