Hacking AI: Exposing Vulnerabilities in Machine Learning
A military drone misidentifies enemy tanks as friendlies. A self-driving car swerves into oncoming traffic. An NLP bot gives an erroneous summary of an intercepted wire. These are examples of how AI systems can be hacked, which is an area of increased focus for government and industrial leaders alike.
As AI technology matures, it’s being adopted widely, which is great. That is what is supposed to happen, after all. However, greater reliance on automated decision-making in the real world brings a greater threat that bad actors will employ techniques like adversarial machine learning and data poisoning to hack our AI systems.
What’s concerning is how easy it can be to hack AI. According to Arash Rahnama, PhD, the head of applied AI research at Modzy and a senior lead data scientist at Booz Allen Hamilton, AI models can be hacked by inserting a few tactically placed pixels (for a computer vision algorithm) or some innocuous-looking typos (for a natural language processing model) into the training set. Any algorithm, including neural networks and more traditional approaches like regression algorithms, is susceptible, he says.
“Let’s say you have a model you’ve trained on data sets. It’s classifying pictures of cats and dogs,” Rahnama says. “People have figured out ways of changing a couple of pixels in the input image, so now the network is misled into classifying an image of a cat into the dog category.”
Unfortunately, these attacks are not detectable through traditional methods, he says. “The image still looks the same to our eyes,” Rahnama tells Datanami. “But somehow it looks vastly different to the AI model itself.”
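The pixel-level attack Rahnama describes can be illustrated with the classic fast gradient sign method (FGSM), a widely published technique that is one plausible instance of what he is describing (the article does not name a specific method). This is a minimal sketch on a toy linear "image" classifier, using hypothetical "cat"/"dog" labels; a tiny, uniform nudge to every pixel flips the prediction while leaving each pixel within a barely visible budget:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image" classifier: logistic regression on 64-pixel vectors.
# Class 0 = "cat", class 1 = "dog" (labels are illustrative).
w = rng.normal(size=64)
b = 0.0

def predict_proba(x):
    """Probability the model assigns to class 'dog'."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# A "cat" input the model classifies correctly (P(dog) well below 0.5).
x = -0.1 * np.sign(w) + rng.normal(scale=0.01, size=64)

# FGSM: step each pixel by epsilon in the direction that increases the
# loss for the true label. For a linear model and true label 0, that
# direction is simply sign(w).
epsilon = 0.25
x_adv = x + epsilon * np.sign(w)

print(predict_proba(x) < 0.5)       # original image: classified "cat"
print(predict_proba(x_adv) > 0.5)   # perturbed image: classified "dog"
print(np.max(np.abs(x_adv - x)))    # no pixel moved by more than epsilon
```

Because each pixel shifts by at most epsilon, the perturbed image looks essentially unchanged to a human, which is exactly why such attacks evade traditional inspection.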
The ramifications of mistaking a dog for a cat are small. But the same technique has been shown to work in other areas, such as using surreptitiously placed stickers to trick the Autopilot feature of a Tesla Model S into driving into oncoming traffic, or tricking a self-driving car into mistaking a stop sign for a 45 mile-per-hour speed limit sign.
“It’s a big problem,” UC Berkeley professor Dawn Song, an expert on adversarial AI who has worked with Google to bolster its Auto-Complete function, said last year at an MIT Technology Review event. “We need to come together to fix it.”
That is starting to happen. In 2019, DARPA launched its Guaranteeing AI Robustness against Deception (GARD) program, which seeks to build the technological underpinnings to identify vulnerabilities, bolster AI robustness, and build defense mechanisms that are resilient to AI hacks.
There is a critical need for ML defense, says Hava Siegelmann, the program manager in DARPA’s Information Innovation Office (I2O).
“The GARD program seeks to prevent the chaos that could ensue in the near future when attack methodologies, now in their infancy, have matured to a more destructive level,” she stated in 2019. “We must ensure ML is safe and incapable of being deceived.”
There are various open source approaches to making AI models more resilient to attacks. One method is to create your own adversarial data sets and train your model on them, which teaches the model to correctly classify adversarial data it encounters in the real world.
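That defense, usually called adversarial training, can be sketched in a few lines. This is a minimal, assumption-laden toy (a 2-D logistic regression, FGSM perturbations generated on the fly each step), not any particular vendor's implementation; the idea is simply that the model is fit on perturbed copies of its own data so the adversarial region is covered at training time:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary task: two well-separated Gaussian blobs in 2-D.
X = np.vstack([rng.normal(-1, 0.3, size=(200, 2)),
               rng.normal(+1, 0.3, size=(200, 2))])
y = np.array([0] * 200 + [1] * 200)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Adversarial training loop: at each step, perturb the batch with an
# FGSM-style step (epsilon along the sign of the input gradient of the
# loss) and take the gradient step on the perturbed copies instead.
w, b, lr, eps = np.zeros(2), 0.0, 0.1, 0.2
for _ in range(500):
    p = sigmoid(X @ w + b)
    grad_x = np.outer(p - y, w)          # d(loss)/d(input)
    X_adv = X + eps * np.sign(grad_x)    # worst-case nudged batch
    p_adv = sigmoid(X_adv @ w + b)
    w -= lr * X_adv.T @ (p_adv - y) / len(y)
    b -= lr * np.mean(p_adv - y)

# The hardened model still separates clean data, and it also resists
# the same epsilon-sized FGSM perturbation applied at test time.
acc_clean = np.mean((sigmoid(X @ w + b) > 0.5) == y)
grad_x = np.outer(sigmoid(X @ w + b) - y, w)
X_test_adv = X + eps * np.sign(grad_x)
acc_adv = np.mean((sigmoid(X_test_adv @ w + b) > 0.5) == y)
print(acc_clean, acc_adv)
```

In practice the same recipe is applied to deep networks with stronger attacks than single-step FGSM, which is where the performance trade-off discussed later in this article comes from.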
Rahnama is spearheading Modzy’s offering in adversarial AI and explainable AI, which are two sides of the same coin. His efforts so far have yielded two proprietary offerings.
The first approach makes the model more resilient to adversarial AI during inference by making it function more like a human does.
“The model learns to look at that image in the same way that our eyes would look at that image,” Rahnama says. “Once you do this, then you can show that it’s not easy for an adversary to come in and change the pixels and hack your system, because now it’s more complicated for them to attack your model and your model is more robust against these attacks.”
The second approach at Modzy, which is a subsidiary of Booz Allen Hamilton, is to detect efforts to poison data before it gets into the training set.
“Instead of classifying images, we’re classifying attacks, we’re learning from attacks,” Rahnama says. “We try to have an AI model that can predict the behavior of an adversary for specific use cases and then use that to reverse engineer and detect poisoned data inputs.”
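Modzy's detection method is proprietary, but the general idea of screening training inputs before they reach the model can be illustrated with a generic stand-in: an outlier detector that profiles clean data and flags inputs that fall far off that profile. This sketch uses a Mahalanobis-distance check, which is one common, publicly known approach and should not be read as Modzy's technique:

```python
import numpy as np

rng = np.random.default_rng(2)

# Clean reference data: feature vectors from the trusted distribution.
clean = rng.normal(0.0, 1.0, size=(1000, 8))

# Profile "normal" inputs with their mean and covariance.
mu = clean.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(clean, rowvar=False))

def mahalanobis(x):
    """Distance of a candidate input from the clean-data profile."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

# Calibrate a flagging threshold on the clean set itself, e.g. the
# 99th percentile of clean-data distances.
dists = np.array([mahalanobis(x) for x in clean])
threshold = np.percentile(dists, 99)

ordinary = rng.normal(0.0, 1.0, size=8)  # resembles clean data
poisoned = np.full(8, 6.0)               # shifted far off-profile
print(mahalanobis(ordinary) < threshold * 1.5)  # passes screening
print(mahalanobis(poisoned) > threshold)        # flagged for review
```

Real poisoning attacks are often crafted to look statistically ordinary, which is why learned, attack-aware detectors of the kind Rahnama describes go beyond simple distance checks like this one.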
Modzy is working with customers in the government and private sectors to bolster their AI systems. The machine learning models can be used by themselves or used in conjunction with open source AI defenses, Rahnama says.
Right now, there’s a trade-off between performance of the machine learning model and robustness to attack. That is, the models will not perform as well when these defensive mechanisms are enabled. But eventually, customers won’t have to make that sacrifice, Rahnama says.
“We’re not there yet in the field,” he says. “But I think in the future there won’t be a trade-off between performance and adversarial robustness.”