A Conversation about the New Wave of Cybersecurity
It's time, again, to change the conversation in cybersecurity.
It's neither data-driven nor AI-driven cybersecurity, terms you may have heard before – it's both, and much more.
It is correlation-driven cybersecurity: correlating many detections – from basic ones like NGFW alerts to advanced ones like AI-based EDR findings – from various data sources in a single cohesive platform.
We hear about many security challenges from prospects, customers and partners—why? Because sharing pain is part of what humans do! As you may know, attackers have access to the same tools we all do, including Big Data and AI technologies for more advanced attacks.
With complex threats on the rise, it's no wonder we hear such consistent themes:
- I don't have enough data for effective detection
- On the contrary, I have too much data and I'm swamped
- I get too much noise in the data, or too many false alarms
- I have tried a few advanced tools that use AI/ML to reduce noise and false positives, but that intelligence is specific to each tool
- I have lots of independent tools that don't talk to each other, leading to siloed answers and high costs
What can you do about a complex attack that uses these challenges against you? Here is a simple example:
- Your CEO receives an email with an embedded URL
- Your CEO downloads a file to his laptop by going to the URL
- Your CEO accesses a file server at 2am on a weekday
- Your CEO’s laptop sends out lots of DNS traffic
By themselves, each of these individual events may look normal. If you happen to have the right security tools deployed, with some being advanced with machine learning like EDR and UBA, you may find out that:
- Your CEO receives a PHISHING email with an embedded MALICIOUS URL.
- Your CEO downloads a MALWARE file to his laptop by going to the URL
- Your CEO accesses a file server at 2am on a weekday, an ABNORMAL BEHAVIOR in UBA terms
- Your CEO's laptop sends out lots of DNS traffic via DNS TUNNELING
That is a lot of independent analysis by four different tools. How quickly and easily can you correlate these events to trace the breach, and how many people do you need, looking at many different screens, to piece it all together?
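The correlation step itself can be illustrated with a small sketch. Everything below is hypothetical – the event records, tool names, field names and two-day window are invented for illustration, not any vendor's actual schema – but it shows the core idea: detections from different tools that share an entity within a time window are grouped into a single incident.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical detections, one per siloed tool, as a correlation engine might ingest them.
@dataclass
class Detection:
    tool: str          # which tool raised the detection
    entity: str        # user or host the detection is attached to
    time: datetime
    verdict: str

detections = [
    Detection("email-security", "ceo-laptop", datetime(2024, 1, 8, 9, 0),  "phishing email, malicious URL"),
    Detection("edr",            "ceo-laptop", datetime(2024, 1, 8, 9, 5),  "malware file downloaded"),
    Detection("ueba",           "ceo-laptop", datetime(2024, 1, 9, 2, 0),  "abnormal file-server access"),
    Detection("nta",            "ceo-laptop", datetime(2024, 1, 9, 2, 30), "DNS tunneling traffic"),
]

def correlate(events, window=timedelta(days=2)):
    """Group detections sharing an entity within a time window into one incident."""
    incidents = {}
    for e in sorted(events, key=lambda d: d.time):
        bucket = incidents.setdefault(e.entity, [])
        if bucket and e.time - bucket[0].time > window:
            continue  # outside the window; a real engine would open a new incident
        bucket.append(e)
    return incidents

incident = correlate(detections)["ceo-laptop"]
print(f"{len(incident)} detections correlated into one incident")  # 4 detections correlated into one incident
```

With the four detections stitched together on the shared entity, the kill chain reads as one story instead of four unrelated alerts on four different screens.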
Let’s take a step back and ask ourselves how we got here. Clearly there are three waves of cybersecurity, which are built on top of each other: the rise of data, the rise of AI, and the rise of correlations.
1. The Rise of Data—Increasing the amount of data to achieve comprehensive visibility.
Data-driven security was the main theme of the Big Data era, where data is the new "gold." It started with logs and raw network packets, separately. The main purpose of SIEMs was to collect and aggregate logs from different tools and applications for compliance, incident investigation and log management; ArcSight, released in 2000, was a typical example of a legacy SIEM and log management system. Raw packets were collected and stored as-is for forensics, even though they require lots of storage space and it is very hard to sift through huge numbers of packets for any indication of a breach. In 2006, NetWitness introduced a solution for analyzing raw packets.
We quickly realized that neither raw logs nor raw packets alone are enough to detect breaches effectively, and raw packets are too heavy and have limited usage beyond forensics. Information extracted from traffic such as Netflow/IPFix, traditionally used for network visibility and performance monitoring, started to be used for security, and SIEMs started to ingest and store it as well. However, due to both technical scalability concerns and cost concerns, SIEMs never became the mainstream tool for traffic analysis.
As time went by, more data was collected: files, user information, threat intelligence, etc. The goal of collecting more data was valid – pervasive visibility – but the real challenge, responding to critical attacks, became like finding needles in a haystack, especially via manual searches or manually defined rules. It's labor-intensive and time-inefficient.
There are two technical challenges facing data-driven security: how to store large volumes of data at scale while allowing for efficient searches and analysis, and how to deal with the variety of data – especially unstructured data – as data can be of any format. Traditional relational databases based on SQL ran into both of these problems. Earlier vendors scrambled to solve them with many home-grown solutions. Unfortunately, most were not as efficient as what we use today: NoSQL databases for Big Data lakes.
There is one more challenge facing data-driven security: a software architecture that cost-effectively scales for enterprise customers. The typical 3-tier architecture with front-end, business logic and database tiers became a big hurdle. Today's cloud-native architectures, built on microservices and containers, provide much more scalable and cost-effective solutions.
2. The Rise of AI—Use machine learning with big data analysis to help find and automate detections
Once you have lots of data, what do you do with it? As mentioned previously, with a large volume of data, sifting through it looking for meaningful patterns is tedious and time-consuming. If your IT infrastructure gets hacked, it might take days to find out – too late, because the damage is already done or sensitive data already stolen. In this case, too much data becomes a problem. Fortunately, we have seen the rise of machine learning, thanks to advances in algorithms as well as computing power.
Machines are very good at doing repetitive and tedious work quickly, efficiently and tirelessly, 24×7. When machines are equipped with intelligence like learning capabilities, they help humans scale. Lots of researchers and vendors in security started leveraging AI to solve the problem – to help them find those needles, or to see trends hidden inside large datasets. Thus, the rise of AI-driven security. There is a lot of innovation in this space: Endpoint Detection and Response (EDR) companies using AI to address endpoint security problems, User and Entity Behavior Analysis (UEBA) companies using AI to address insider threats, and Network Traffic Analysis (NTA) companies using AI to find abnormal network traffic patterns.
If data is the new gold, breaches detected via AI are like jewelry made out of gold. It requires lots of time, patience and hard work in order to make beautiful jewelry by hand out of plain gold. With the help of machines, especially advanced machinery, commercial production of great jewelry becomes possible.
On the surface, with AI-driven security, lots of data becomes less of a problem, as ML usually requires lots of data to train a model and learn patterns. Conversely, too little data is an obvious problem: the less data, the less accurate and thus the less useful the ML model becomes. However, as time went by, researchers gradually realized that the right data is far more important. Too much data without the right information is a waste of both computing power and storage. Lots of early UEBA vendors with solutions based on logs from SIEM tools learned this hard lesson: the SIEM might have collected lots of logs, but only a few of them contained the right information about user behaviors. So, although data-driven security builds a great foundation for AI-driven security, the right data matters far more for building scalable and accurate AI-driven security.
Using AI definitely helps alleviate the pains of Big Data, but it has its own challenges. For example, both UEBA and NTA leverage unsupervised machine learning for behavior analysis. However, an abnormal behavior observed for a user or in network traffic does not necessarily mean a security incident. These tools can generate lots of noise, causing alert fatigue. Furthermore, sophisticated attacks usually go through several stages of the kill chain before they can be caught. How can you recover the trace of a breach and fix the root cause?
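As a toy illustration of why "anomalous" is not the same as "malicious," here is a minimal sketch – the login data, baseline and threshold are all made up – of the kind of statistical outlier detection behavior-analysis tools build on. It correctly flags a 2am login as an outlier, but the statistics alone cannot say whether it is an attacker or just an executive on a red-eye flight, which is exactly how alert fatigue is born.

```python
import statistics

# Hypothetical login hours (24h clock) for one user; the last entry is a 2am login.
login_hours = [9, 9, 10, 8, 9, 10, 9, 8, 9, 10, 9, 9, 10, 2]

# Build a baseline from the historical (first 13) observations.
mean = statistics.mean(login_hours[:-1])
stdev = statistics.pstdev(login_hours[:-1])

def is_anomalous(hour, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the baseline."""
    return abs(hour - mean) / stdev > threshold

print(is_anomalous(9))  # False: a normal business-hours login
print(is_anomalous(2))  # True: statistically abnormal, yet possibly entirely benign
```

The model can only say "this is unusual"; deciding whether unusual means dangerous requires corroborating detections from other tools, which is the point of correlation.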
There is another big challenge facing AI-driven security collectively: cost – the capital cost of the tools themselves, the cost of the infrastructure of compute and storage used by these tools, and the cost of operations of so many different tools in their silos with different screens.
So, even if each tool has the ability to distill gigabytes or terabytes of data down to a short list of a few critical detections, the question still remains, “What are you missing by not consolidating these tools into a single platform and correlating the detections across all tools and feeds?”
3. The Rise of Correlations—Correlate detections and automate response across the entire attack surface in a single platform
With this new wave, the conversation shifts from data and AI to correlations. Obviously, this wave is built upon the two previous waves, but it is all about getting above the data as well as the tools, and wrapping everything together in a single platform. Following our earlier gold-to-jewelry analogy, this is about matching a set of the right jewelry and putting it together on a person so the whole looks good.
Analysts in security from ESG, Gartner, Forrester, IDC and Omdia all agree this change in thinking from siloed tools to a consolidated platform is key to help us see and respond to critical breaches. Specifically, the platform needs to take a holistic approach and look at correlating detections across network, cloud, endpoints and applications – the entire attack surface.
The key objectives of correlations of detections across tools, feeds and environments are to improve detection accuracy, to detect complex attacks by combining weaker signals from multiple tools to spot attacks that might otherwise be ignored, and to improve operational efficiency and productivity. No longer does comprehensive visibility mean finding the right data—rather, it means finding the complex attacks.
To do so, you should consider Open XDR. XDR is a cohesive security operations solution with tight integration of many security applications in a single platform with a single pane of glass. It automatically collects and correlates data from multiple tools, improves detections, and provides automated responses. A platform tying together tools and applications innately drives the cost down, in both tools cost and infrastructure cost, while it improves operational efficiency with an easy-to-use single pane of glass.
We think there are five primary foundational requirements of XDR:
- Centralization of normalized and enriched data from a variety of data sources including logs, network traffic, applications, cloud, Threat Intelligence, etc.
- Automatic detection of security events from the data collected through advanced analytics such as NTA, UBA and EBA.
- Correlation of individual security events into a high-level view.
- Centralized response capability that interacts with individual security products.
- Cloud-native micro-services architecture for deployment flexibility, scalability and high availability.
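The first two requirements – centralized, normalized and enriched data feeding automatic detection – can be sketched in a few lines. The raw field names, the common schema and the threat-intelligence check below are all hypothetical, invented for illustration; a real platform's schema would be far richer.

```python
# Normalize heterogeneous raw events into one common schema, then enrich them.

def normalize_firewall_log(raw: dict) -> dict:
    """Map a hypothetical firewall log record onto the common schema."""
    return {"source": "firewall", "src_ip": raw["src"], "dst_ip": raw["dst"], "action": raw["action"]}

def normalize_netflow(raw: dict) -> dict:
    """Map a hypothetical Netflow record onto the same common schema."""
    return {"source": "netflow", "src_ip": raw["srcaddr"], "dst_ip": raw["dstaddr"], "action": "flow"}

def enrich(event: dict, threat_intel: set) -> dict:
    # Tag events whose destination appears in a threat-intelligence feed.
    event["known_bad_dst"] = event["dst_ip"] in threat_intel
    return event

ti_feed = {"203.0.113.7"}  # example bad IP from the documentation address range
events = [
    enrich(normalize_firewall_log({"src": "10.0.0.5", "dst": "203.0.113.7", "action": "allow"}), ti_feed),
    enrich(normalize_netflow({"srcaddr": "10.0.0.5", "dstaddr": "198.51.100.2"}), ti_feed),
]
print([e["known_bad_dst"] for e in events])  # [True, False]
```

Once every source speaks the same schema, downstream analytics and correlation rules can be written once instead of once per tool – which is where the cost and efficiency gains of a single platform come from.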
In conclusion, Stellar Cyber is the only purpose-built Open XDR platform that ingests and curates all cybersecurity data to detect, correlate and respond across the entire kill chain. The wave of correlations has started, and you are welcome to ride it with us and enjoy the journey together!
SIEMs – EMPTY PROMISES?
SIEMs have been the foundation of security operations for decades, and that should be acknowledged. However, SIEMs have made a lot of great promises, and to this day, have not fulfilled many of them…