Artificial intelligence (AI) applications are increasingly being integrated into our everyday lives. Smart thermostats use AI to learn our schedules and adjust temperatures, self-driving cars use it to navigate and respond to road conditions, and manufacturing robots in smart factories use it for everything from predictive maintenance to dynamic production.
How an AI system works
The typical AI system has a machine learning (ML) component and an inference engine (IE) component. The ML component is “trained” on a specialized data set called training data. The quality and accuracy of that training data is one of the most crucial elements of the AI system as a whole: without accurate data to learn from, the system cannot carry out its intended task.
When the ML component has consumed a sufficiently large and accurate data set, it produces a set of values called weights. These weights drive the AI inference engine, enabling it to make decisions. Some AI systems run the ML component once on a large training set, produce a set of weights, and keep those weights static throughout the life of the system. Others include an ML component in the deployed AI system and incorporate ongoing training on real-world data to create an adaptable, continuously improving autonomous system.
Once the system is deployed, sensors feed real-world data into it. The inference engine applies that data to its internal model, which is tuned by the weights created during the machine learning phase. For example, sensors on an autonomous vehicle gather lidar and radar readings; the inference engine processes those readings and produces results that manage the cruise control system and maintain a safe following distance.
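The relationship between learned weights and inference can be sketched in a few lines of code. This is a deliberately simplified, hypothetical model (the weights, readings, and threshold are invented for illustration), not how any production inference engine works:

```python
# Hypothetical sketch: learned weights tuning an inference step.
# The "model" is a weighted sum over sensor readings that decides
# a cruise-control action.

def infer(sensor_readings, weights, bias, threshold=0.5):
    """Apply learned weights to sensor readings and return an action."""
    score = sum(r * w for r, w in zip(sensor_readings, weights)) + bias
    return "brake" if score > threshold else "maintain_speed"

# Normalized lidar and radar proximity readings; weights and bias are
# the values "learned" during training.
action = infer([0.9, 0.8], weights=[0.6, 0.5], bias=-0.4)
```

A real inference engine applies millions of learned weights through a neural network rather than a single weighted sum, but the division of labor is the same: training fixes the weights, and inference applies them to live sensor data to produce a decision.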
In smart factories, one application of AI is the autonomous forklift. Driverless forklifts have existed in industrial settings for over three decades, but they relied on guide wires or magnetic tracks to navigate the factory floor. With the adoption of AI, those forklifts can now self-navigate by collecting and analyzing sensor data. Like most autonomous vehicles, they typically collect lidar readings, though some newer AI-powered forklifts navigate using cameras that capture images every 1-3 seconds. Once the sensors collect the data, they transmit it to the AI system for processing and decision-making. The AI system then sends the appropriate commands to the targeted controller to change direction, speed up, or slow down, depending on the data it has processed.
Authenticity of the data is a major problem for AI
To protect the sensor data being sent to an AI system, the sensor must first sign it so that its authenticity can be proven. On receipt, the data undergoes digital signature authentication, which proves it came from a trusted source (a sensor) and can safely be used by the AI inference engine.
In the case of the autonomous forklift, once the sensors collect data, either as lidar readings or images, each sensor signs the data before sending it to the AI system, which authenticates the digital signature before processing.
The AI system, in turn, signs each command it sends to the forklift's controller. The controller's SoC performs the same signature authentication on the incoming data before allowing the instruction to be carried out.
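The sign-then-verify flow described above can be sketched as follows. For simplicity this sketch uses an HMAC with a shared key as a stand-in for a true digital signature; real deployments would typically use an asymmetric scheme such as ECDSA, with keys provisioned in hardware. The key and reading format here are invented for illustration:

```python
import hashlib
import hmac

# Placeholder shared secret; a real sensor would hold a provisioned
# private key and the verifier would hold the matching public key.
SENSOR_KEY = b"shared-secret-provisioned-at-manufacture"

def sign(data: bytes, key: bytes) -> bytes:
    """Produce an authentication tag for the data (HMAC-SHA256)."""
    return hmac.new(key, data, hashlib.sha256).digest()

def verify(data: bytes, signature: bytes, key: bytes) -> bool:
    """Check the tag in constant time before the data is trusted."""
    return hmac.compare_digest(sign(data, key), signature)

# Sensor side: sign a lidar reading before transmitting it.
reading = b"lidar:distance_m=4.37"
sig = sign(reading, SENSOR_KEY)

# AI-system side: authenticate the signature before processing.
assert verify(reading, sig, SENSOR_KEY)

# A reading altered in transit fails verification.
assert not verify(b"lidar:distance_m=40.7", sig, SENSOR_KEY)
```

The same pattern repeats on the outbound leg: the AI system signs each command, and the controller verifies the signature before acting on it.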
It is during the signing and signature verification process that the data feeding AI systems is most vulnerable. An attacker can gain access to a system that incorporates AI by finding and exploiting a software vulnerability, like a buffer overflow. Once inside, they can execute a code injection attack that lets them intercept the data going to and from the AI system by monitoring a specific memory location or function in the data transmission path. The attacker can then alter or replace the data being fed into the AI system, making the AI-powered device do something it was never intended to do. Given the heavy-duty machinery that uses AI in manufacturing settings, the consequences of such an attack could be extremely dangerous and severe.
Take, for example, the autonomous forklift in a smart factory: an attacker could manipulate its distance readings and cause it to crash. In a situation like this, the destruction of an expensive piece of machinery is the best-case scenario. In the worst case, the people working in the factory side-by-side with that forklift are at risk of serious injury or even death.
Attackers can poison machine learning
Manipulating the data sent to or from an AI system is not the only type of attack threatening AI today. A particularly AI-savvy attacker could corrupt the machine learning process itself, an attack known as poisoning, by feeding incorrect training data into the system over a long period so that it learns from false data rather than legitimate sensor readings.
This type of attack is particularly relevant where predictive maintenance is used. Predictive maintenance is a common AI application that predicts when a machine will need maintenance, in order to prevent a breakdown. It is used across industries to predict everything from mechanical issues on aircraft that would cause flight delays, to when an industrial refrigerator at a food production facility will need service to keep functioning properly.
For predictive maintenance to work, it relies on high-quality, accurate data readings. If an attacker can manipulate those readings, the AI system no longer has a clear or accurate picture from which to make predictions. Aircraft requiring maintenance could go unnoticed and malfunction mid-flight, or thousands of dollars' worth of inventory could spoil because the refrigerator storing it did not maintain a safe temperature.
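A toy example shows how gradual poisoning can blind a predictive-maintenance model. Here the “model” is just a baseline learned as a running average; the readings, rate, and tolerance are invented for illustration:

```python
# Hypothetical sketch: slow data poisoning against a predictive-
# maintenance model that learns "normal" as a running average.

def update_baseline(baseline, reading, rate=0.1):
    """Fold each new reading into the learned baseline."""
    return baseline + rate * (reading - baseline)

def needs_maintenance(baseline, reading, tolerance=5.0):
    """Flag readings that deviate too far from the learned normal."""
    return abs(reading - baseline) > tolerance

baseline = 20.0  # learned normal vibration level for a healthy machine

# Attacker feeds gradually inflated readings; each step stays inside
# the tolerance, so no single reading looks anomalous.
for poisoned in [22, 24, 26, 28, 30] * 10:
    baseline = update_baseline(baseline, poisoned)

# A reading of 30.0 would have been flagged against the original
# baseline, but the poisoned model now treats it as "normal".
flagged_before = needs_maintenance(20.0, 30.0)
flagged_after = needs_maintenance(baseline, 30.0)
```

Because each poisoned reading shifts the baseline only slightly, the drift is invisible to per-reading checks, which is exactly what makes poisoning attacks hard to detect after the fact.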
Stopping attackers from gaining access to your AI system
AI systems run complex pieces of software, and we all know that complex software is inherently buggy. In fact, studies have shown that there are anywhere from 15 to 50 bugs per 1,000 lines of code. While that number might not seem too concerning at first glance, it is estimated that the first fully autonomous vehicle will contain about 1 billion lines of code, which means that there could be as many as 50 million bugs in an autonomous vehicle. In other words, there are potentially millions of opportunities for an attacker to turn those bugs into exploits and execute a cyberattack—in just one vehicle.
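The arithmetic behind that estimate is straightforward:

```python
# Back-of-the-envelope bug estimate for a fully autonomous vehicle,
# using the upper end of the cited 15-50 bugs per 1,000 lines of code.
bugs_per_kloc = 50
lines_of_code = 1_000_000_000  # projected code base of one vehicle

potential_bugs = bugs_per_kloc * lines_of_code // 1_000
print(potential_bugs)  # 50000000, i.e. up to 50 million potential bugs
```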
Dover’s CoreGuard technology is specifically designed to protect against the exploitation of software vulnerabilities. In fact, out of the box, CoreGuard comes with a base set of micropolicies that protect against the most common and severe types of software vulnerabilities, including 100% of buffer overflows, buffer overreads, and code injection attacks. This minimum level of protection will shut the door on the most common entrance paths for attackers to gain access to your AI system.
How CoreGuard can ensure data authenticity
For AI systems, where the stakes of a compromise are so high, shutting down the most common weak points is not enough. A sophisticated and determined attacker could find another way in, so a defense-in-depth approach is the only way to truly protect your AI system.
As mentioned earlier, the data that powers our AI systems is most vulnerable just before the signing and just after the signature verification process. If an attacker can intercept and alter that data, your AI system could be severely compromised.
CoreGuard can ensure data authenticity with our AI Data Integrity micropolicy. This micropolicy prevents the modification of data between digital signature authentication and the AI system, ensuring only trusted and secure data is fed into the system.
So, how does the AI Data Integrity micropolicy work?
First, the micropolicy taints all sensor data as “trusted.” Then, all data is tracked through computations. If the tainted data is modified in any way, the “trusted” label is removed. This ability to trace the flow of specific words throughout their lifecycle and to know each and every instruction that operates on those words is a unique and extremely powerful capability of CoreGuard called Information Flow Control.
It means that if an attacker tries to add or subtract some value from any data word in the stream, CoreGuard detects it and instantly strips the “trusted” taint from that data. CoreGuard then enforces that only data still carrying the “trusted” label is fed into the AI system. A system with CoreGuard installed will therefore prevent an attacker from writing incorrect or manipulated data to the AI system, because that data will no longer carry the “trusted” label.
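The taint-tracking idea can be illustrated with a small software analogy. CoreGuard enforces this rule in hardware, on metadata attached to every word; the class and function names below are invented purely to illustrate the principle that any modification strips the “trusted” label:

```python
# Hypothetical software analogy for Information Flow Control: a data
# word carries a "trusted" label that any modification removes.

class TaintedWord:
    def __init__(self, value, trusted=True):
        self.value = value
        self.trusted = trusted

    def modified(self, new_value):
        """Any computation on the word yields an untrusted result."""
        return TaintedWord(new_value, trusted=False)

def feed_ai_system(word):
    """Only words still labeled 'trusted' may reach the AI system."""
    if not word.trusted:
        raise PermissionError("untrusted data blocked before the AI system")
    return word.value

reading = TaintedWord(4.37)        # signed sensor reading, labeled trusted
value = feed_ai_system(reading)    # passes: label intact

tampered = reading.modified(40.7)  # attacker alters the word in transit
# feed_ai_system(tampered) now raises PermissionError: label stripped
```

In the real system the label lives in metadata maintained alongside the processor, so the check happens on every instruction rather than at a single software gate, and application code cannot forge or restore the label.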
AI poses a serious risk without better security
The increasing adoption of AI is changing lives, but it is also creating a very serious, potentially dangerous security problem. The only way to address it is with a cybersecurity solution that protects the critical data powering AI from manipulation and corruption, end to end.
CoreGuard is that solution.
To learn more about CoreGuard and how it will address the unique cybersecurity risks created by AI, read our Securing AI & ML white paper.