A brief history of machine learning in cybersecurity
How to connect all the dots in a complex threat landscape
As the volume of cyberattacks grows, security analysts have become overwhelmed. To address this issue, developers are showing more interest in using Machine Learning (ML) to automate threat-hunting. In fact, researchers have tried to implement ML in cybersecurity solutions since the late 1980s, but progress has been slow. Today, ML is showing increasing promise with the advent of Big Data because the quality of information from which ML can learn is improving. However, there is much more to be done.
Anomaly Detection – The Early Days
When we talk about security, we want a system that can separate good from bad, normal from abnormal. Therefore, it is quite natural to apply anomaly detection to security. We can trace the beginning of anomaly detection back to 1987 [1], when researchers started building intrusion detection systems (IDS). Around 1998-1999, DARPA (the U.S. government agency that created ARPANET, the forerunner of the Internet) created benchmark sets and called for research on ML methods in security [2]. Unfortunately, few of the results were practical enough, and even fewer products got to the operational stage.
Anomaly detection is based on unsupervised learning, which is a type of self-organized learning that helps find previously unknown patterns in a data set without the use of pre-existing labels. In essence, a system based on unsupervised learning learns what is normal and identifies anything that deviates from it as an anomaly. For example, an IDS might learn what ‘normal’ traffic looks like and then alert on any traffic that doesn’t match that baseline, such as the traffic produced by a vulnerability scanner. In short, anomaly detection systems based on unsupervised learning make a binary decision (normal/abnormal) and don’t make sophisticated evaluations. Some refer to unsupervised learning applications as ‘one-class problems.’
As you might imagine, systems based on unsupervised learning can generate a lot of false positives, because a situation deemed abnormal can be perfectly innocuous (think vulnerability scanner again). This is a problem that security analysts still struggle with today.
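To make the normal/abnormal decision concrete, here is a minimal sketch of one simple way to do it: learn a baseline from unlabeled history and flag anything far from it. The z-score rule, the threshold of 3.0, the function names, and the sample numbers are all assumptions chosen for illustration; they do not describe any particular IDS.

```python
from statistics import mean, stdev

def fit_baseline(normal_values):
    """Learn what 'normal' looks like from unlabeled historical observations."""
    return mean(normal_values), stdev(normal_values)

def is_anomaly(value, baseline, threshold=3.0):
    """Binary decision: True if value is more than `threshold` standard
    deviations away from the learned mean (threshold is an assumed default)."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical data: connections per minute from one host on ordinary days.
history = [48, 52, 50, 47, 53, 49, 51, 50, 46, 54]
baseline = fit_baseline(history)

print(is_anomaly(51, baseline))   # ordinary traffic, close to the baseline
print(is_anomaly(400, baseline))  # burst of the kind a scanner would produce
```

Note that the detector flags the burst whether it comes from an attacker or from an authorized vulnerability scan. It only knows that the value is unusual, which is exactly how false positives arise.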
The Rise of Big Data
After 2000, developers and researchers began creating spam, phishing, and URL filtering systems based on supervised learning. In supervised learning, a model is trained on examples that have already been labeled (for instance, ‘spam’ or ‘not spam’) and learns to assign those labels to new data. The simplest form of label-driven filtering is a URL blacklist, where incoming e-mail is checked against a list of URLs labeled undesirable and rejected if it contains one. A supervised learning algorithm goes a step further: it analyzes the labeled data and produces an inferred function (i.e., traffic with these characteristics resembles the examples labeled bad, therefore it is probably bad), which can be used to classify new examples.
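The idea of an inferred function can be sketched with a tiny classifier trained on labeled URLs. This example assumes a naive Bayes model over URL tokens with Laplace smoothing, which is one common choice among many and not a description of any specific product. The training URLs, labels, and function names are hypothetical, and the hosts use reserved example domains.

```python
import math
import re
from collections import Counter

def tokens(url):
    """Split a URL into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]

def train(labeled_urls):
    """Count token frequencies per label from (url, label) pairs."""
    token_counts = {"bad": Counter(), "good": Counter()}
    label_counts = Counter()
    for url, label in labeled_urls:
        label_counts[label] += 1
        token_counts[label].update(tokens(url))
    return token_counts, label_counts

def classify(url, model):
    """Return the label with the higher naive Bayes log-probability."""
    token_counts, label_counts = model
    vocab = set(token_counts["bad"]) | set(token_counts["good"])
    total_examples = sum(label_counts.values())
    scores = {}
    for label in ("bad", "good"):
        total = sum(token_counts[label].values())
        score = math.log(label_counts[label] / total_examples)
        for t in tokens(url):
            # Laplace smoothing so unseen tokens do not zero out the score.
            score += math.log((token_counts[label][t] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labeled training set.
training = [
    ("http://secure-login.example-bank.verify-account.test/update", "bad"),
    ("http://free-prize.test/claim/verify", "bad"),
    ("http://account-verify.test/login/confirm", "bad"),
    ("https://www.example.org/docs/index.html", "good"),
    ("https://news.example.com/articles/security", "good"),
    ("https://example.edu/library/catalog", "good"),
]
model = train(training)

# Neither URL appears in the training set; the model generalizes from tokens.
print(classify("http://verify-login.test/account", model))
print(classify("https://www.example.org/library", model))
```

Unlike a blacklist, the classifier can judge a URL it has never seen, but only as well as its labeled examples allow. That dependence on labeled data is why dataset size matters so much.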
Early filtering systems using supervised learning were based on relatively small datasets, but datasets have grown in size and sophistication with the advent of Big Data. For example, Gmail offers an Internet-scale database of known good addresses, and it’s easier to train its ML engine with sophisticated models of what is acceptable.
Big models (in terms of the number of parameters) trained on Big Data, such as deep learning models, have gradually become more popular. For example, supervised ML has been successfully used in anti-virus signature generation for years, and in 2012, Cylance began offering next-generation anti-virus systems based on datasets other than signatures, such as anomalous traffic behavior.
Combining Supervised and Unsupervised Learning
Supervised learning has shown more success in security applications, but it requires easy access to large sets of labeled data, which are very difficult to generate for cyberattacks such as advanced persistent threats (APTs) and zero-day attacks targeted at enterprises. Therefore, we cannot easily apply supervised ML to detect every kind of cyberattack.
This is where unsupervised learning comes back into the picture. We need to develop more advanced AI/ML that can be unsupervised or semi-supervised (e.g., through adaptive learning) to solve the additional cybersecurity challenges. Adaptive learning (human-guided analysis) coupled with supervised and unsupervised learning improves your ability to detect those APTs and zero-day exploits.
A New Direction: Connecting the Dots
One of the big problems with simple anomaly detection is the volume of false positives. One way to address this issue is by correlating multiple events (dots) and then evaluating whether or not the correlation indicates a strong signal for a cyberattack. For example, one ‘dot’ might be an executive logging into the network at 2 a.m. On its own, this is a weak signal: alerting on it would most likely produce a false positive, so it shouldn’t be enough to trigger an alert. However, if the executive is seen logging in at 2 a.m. from an IP address in Russia or China, the combination would trigger an alert.
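The 2 a.m. login example can be sketched as a simple scoring rule: each signal carries a weight, and an alert fires only when the combined weight of the distinct signals for a user crosses a threshold. The signal names, weights, threshold, and user names are hypothetical values for illustration. A real system would also need time windows and far richer context.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    user: str
    signal: str  # e.g. "off_hours_login"

# Hypothetical weights: no single signal reaches the threshold alone.
SIGNAL_WEIGHTS = {
    "off_hours_login": 3,
    "unusual_country": 5,
    "large_download": 4,
}
ALERT_THRESHOLD = 7

def correlate(events):
    """Sum the weights of the distinct signals observed for each user."""
    signals_by_user = {}
    for event in events:
        signals_by_user.setdefault(event.user, set()).add(event.signal)
    return {
        user: sum(SIGNAL_WEIGHTS.get(signal, 0) for signal in signals)
        for user, signals in signals_by_user.items()
    }

def alerts(events):
    """Return the users whose correlated score reaches the threshold."""
    scores = correlate(events)
    return sorted(user for user, score in scores.items() if score >= ALERT_THRESHOLD)

events = [
    Event("exec_a", "off_hours_login"),   # one dot: weak signal, no alert
    Event("exec_b", "off_hours_login"),   # two dots for the same user...
    Event("exec_b", "unusual_country"),   # ...which together cross the threshold
]
print(alerts(events))
```

Here `exec_a` scores 3 and stays below the threshold, while `exec_b` scores 8 and is flagged. This is the false-positive reduction that correlation is meant to deliver.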
Developers and researchers are just now combining supervised and unsupervised learning into cybersecurity products. For example, Stellar Cyber’s Starlight product correlates multiple events and evaluates whether, when looked at together, they constitute a threat. This approach significantly reduces false positives and helps analysts identify APTs or zero-day attacks more quickly.
The next frontier will be to make ML self-teaching, so that past experiences in detecting and responding to threats will be factored into new evaluations of potential threats. The system would thus grow more accurate over time. Some security systems are beginning to implement self-teaching technology today, but the history of ML in cybersecurity is relatively short, and large-scale improvements will emerge in the future. While security analysts will always be needed to make the ultimate decision about whether to kill a threat, ML can make their jobs much easier if applied properly.
[1] D. E. Denning, “An Intrusion-Detection Model,” IEEE Transactions on Software Engineering, vol. 13, no. 2, pp. 222–232, 1987.
[2] R. Lippmann, R. K. Cunningham, D. J. Fried, I. Graf, K. R. Kendall, S. E. Webster, and M. A. Zissman, “Results of the 1998 DARPA Off-line Intrusion Detection Evaluation,” in Proc. Recent Advances in Intrusion Detection, 1999.