What Are DGAs and How to Detect Them?
Domain Generation Algorithms (DGAs) are a class of algorithms that periodically and dynamically generate large numbers of domain names. Typically, malware and botnets use these domains as rendezvous points for calling back to the malicious actor’s Command & Control servers. DGAs allow malware to generate tens of thousands of domains per day, the vast majority of them unregistered. The enormous number of unregistered domains hides the few registered ones, allowing infected systems to evade detection and blocking by signature-based or IP-reputation-based detection systems.
The first known malware family to use a DGA was Kraken in 2008. Later that year, the Conficker worm pushed the DGA tactic into notoriety. Even after 10 years, it is still possible to find Conficker or one of its variants on some of today’s networks.
In tandem with the increasing proliferation of malware, the usage of DGAs has become more pervasive.
The Objectives of DGA Detection
Because DGA activity is a strong indicator of compromise, it is critical to detect any such activity on your network. There are three levels of DGA detection, with each subsequent level corresponding to a rise in severity. Detection at later levels is more difficult, but more critical.
If a DGA is detected, it means that one or more of your systems have been infected by DGA-based malware and have become bots in a botnet. The first objective is to identify the affected systems and clean or quarantine them to prevent escalation.
The next objective is to determine whether a given DGA domain name is registered. A registered DGA domain most likely points to an active Command & Control server, which presents a great risk to your network. Infected systems, now bots, may use these servers to call home and receive commands from the attacker. Therefore, the second component of an effective DGA detection system is the ability to differentiate registered domains from unregistered ones.
For example, a DGA may generate 1,000 domains, from xyzwer1 and xyzwer2 through xyzwer1000. The attacker needs to register only one of them, e.g., xyzwer500, and not the other 999. If the registered domain and its associated IP address can be identified, the information can be used to block the communication channel between the targeted system and the Command & Control server. Additionally, this intelligence should be propagated to all other prevention or detection systems in place, to obstruct callback to that server from any system in the network.
The last but most critical objective of a DGA detection system is to determine whether callback was successful with the registered domains and contact was made between the infected system(s) and the Command & Control server. If such activity is detected, some damage may have already been done. Perhaps the malware in your network was updated, or new malware was installed. Sensitive data may have been exfiltrated.
How Does DGA Detection Work?
DGA activity is detected by capturing and analyzing network packets, usually in five general steps, optionally followed by a sixth step that blocks the traffic.
Step 1 – Detect DNS Application
Detection begins with DNS request and/or response messages. DNS is a fundamental Internet protocol, and most firewalls have a policy to allow outgoing DNS traffic on its reserved port 53. However, an attacker may take advantage of port 53 to send traffic that does not adhere to the standard DNS message format, a technique often grouped with DNS tunneling. A Deep Packet Inspection (DPI) engine is recommended to identify genuine DNS traffic more precisely.
Step 2 – Extract Domain Names
Once a network application is identified as DNS, the domain names in the DNS query and response messages need to be extracted. In order to extract the right domain name, the DNS message’s content needs to be parsed carefully and a DPI engine is required to perform this task.
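To make the parsing step concrete, here is a minimal sketch of pulling the queried name out of a raw DNS message, using only the Python standard library. It is not how any particular DPI engine works. The function names (build_query, read_name, extract_qname) are hypothetical, and build_query exists only to produce sample input. The sketch handles name compression pointers and rejects truncated or looping names, which is the kind of care the parsing needs.

```python
import struct
from typing import Tuple


def build_query(name: str, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS A-record query (hypothetical helper, sample input only)."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def read_name(msg: bytes, offset: int) -> Tuple[str, int]:
    """Decode a (possibly compressed) DNS name; return (name, offset after it)."""
    labels = []
    resume = None  # where parsing continues after the first compression pointer
    jumps = 0
    while True:
        if offset >= len(msg):
            raise ValueError("truncated name")
        length = msg[offset]
        if length == 0:  # root label terminates the name
            offset += 1
            break
        if length & 0xC0 == 0xC0:  # two-byte compression pointer
            if offset + 1 >= len(msg):
                raise ValueError("truncated pointer")
            if resume is None:
                resume = offset + 2
            offset = ((length & 0x3F) << 8) | msg[offset + 1]
            jumps += 1
            if jumps > 16:  # guard against pointer loops
                raise ValueError("too many compression pointers")
            continue
        if length & 0xC0:
            raise ValueError("unsupported label type")
        offset += 1
        if offset + length > len(msg):
            raise ValueError("truncated label")
        labels.append(msg[offset:offset + length].decode("ascii", errors="replace"))
        offset += length
    return ".".join(labels).lower(), (resume if resume is not None else offset)


def extract_qname(msg: bytes) -> str:
    """Return the first question name of a DNS message."""
    if len(msg) < 12:
        raise ValueError("shorter than the 12-byte DNS header")
    qdcount = struct.unpack("!H", msg[4:6])[0]
    if qdcount < 1:
        raise ValueError("message carries no question")
    name, _ = read_name(msg, 12)
    return name


if __name__ == "__main__":
    print(extract_qname(build_query("xyzwer500.example.com")))
```

Payloads on port 53 that fail these checks are themselves worth flagging, since they may not be DNS at all.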
Step 3 – Detect any DGA
Analysis needs to be performed on the domains extracted from DNS messages to determine whether they were generated by a DGA. This is perhaps the most complicated step. The challenge is to reduce both false positives and false negatives. Detection mechanisms have evolved dramatically over the last 10+ years.
Some mechanisms are based on the relatively simple Shannon Entropy.
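As a rough illustration of the entropy approach, the sketch below computes the Shannon entropy of a domain label, H = -sum(p_i * log2(p_i)), where p_i is the relative frequency of each character. The function name and the sample labels are assumptions chosen for illustration. Any alerting threshold would have to be tuned on real traffic, since short or dictionary-based DGA names can score low.

```python
import math
from collections import Counter


def shannon_entropy(label: str) -> float:
    """Shannon entropy, in bits per character, of the characters in a label."""
    if not label:
        return 0.0
    counts = Counter(label)
    total = len(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Sample labels are illustrative only.
    for label in ("google", "xkq7vz2pdl"):
        print(label, round(shannon_entropy(label), 3))
```

Random-looking labels tend to use many distinct characters and therefore score higher than natural-language names of similar length.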
Some mechanisms are based on more sophisticated n-grams, as presented by Fyodor at the HITB conference.
Lately, as machine learning has become popular, its methodologies have also been applied to DGA detection. Machine learning can combine n-gram features, Shannon Entropy, and the length of the domain name to influence decisions. Several machine learning models have been tried. A very good 2014 blog post by Jay Jacobs describes the process.
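One simple way to use n-grams, sketched below, is a character bigram model: count which character pairs occur in known-benign names, then score a new label by the average log-probability of its pairs. This is a generic illustration, not the method from the talk mentioned above. The BigramModel class, the tiny BENIGN list, and the add-one smoothing are all assumptions, and a usable model would be trained on a large corpus of legitimate domains.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"

# Tiny illustrative training set (assumption); real systems use large corpora.
BENIGN = [
    "google", "facebook", "youtube", "wikipedia", "amazon", "twitter",
    "microsoft", "linkedin", "netflix", "github", "stackoverflow",
    "weather", "mail", "news", "bank", "cloud",
]


def bigrams(label: str) -> list:
    """Character pairs of a label, with ^ and $ marking start and end."""
    padded = "^" + label + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class BigramModel:
    def __init__(self, benign_labels):
        self.pair_counts = Counter()   # count of each character pair
        self.first_counts = Counter()  # count of each pair's first character
        for label in benign_labels:
            for pair in bigrams(label.lower()):
                self.pair_counts[pair] += 1
                self.first_counts[pair[0]] += 1
        # Possible next symbols: the alphabet plus the end marker.
        self.vocab = len(ALPHABET) + 1

    def score(self, label: str) -> float:
        """Average log-probability per bigram; lower means less name-like."""
        pairs = bigrams(label.lower())
        total = 0.0
        for pair in pairs:
            # Add-one smoothing so unseen pairs get a small nonzero probability.
            p = (self.pair_counts[pair] + 1) / (self.first_counts[pair[0]] + self.vocab)
            total += math.log(p)
        return total / len(pairs)


if __name__ == "__main__":
    model = BigramModel(BENIGN)
    for label in ("facebook", "xqzjkvwpt"):
        print(label, round(model.score(label), 3))
```

Because each character is scored given the one before it, this is also a first-order Markov chain over characters.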
Here is another open source DGA detector based on Machine Learning with Markov Chain:
Step 4 – Detect Registered DGA Domains
In order to detect whether a DGA domain name is registered, DNS responses need to be checked. Merely tracking DNS requests is not sufficient; the detection system should track the entire transaction so that the query, the response code, and any returned IP addresses can be correlated.
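The sketch below shows one way to track a DNS transaction and read the outcome from the response header. It assumes queries and responses are matched on client address plus transaction ID, and it treats NXDOMAIN (RCODE 3) as "does not exist" and NOERROR with at least one answer as "resolves". The class name, function name, and status strings are hypothetical. Resolving is only a proxy for registration, since sinkholes and wildcard DNS can also return answers.

```python
import struct
from typing import Dict, Optional, Tuple

NOERROR, NXDOMAIN = 0, 3  # standard DNS response codes


def response_status(msg: bytes) -> str:
    """Classify a DNS response from its 12-byte header."""
    if len(msg) < 12:
        raise ValueError("shorter than the 12-byte DNS header")
    _txid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", msg[:12])
    if not flags & 0x8000:  # QR bit must be set in a response
        raise ValueError("message is a query, not a response")
    rcode = flags & 0x000F
    if rcode == NXDOMAIN:
        return "nxdomain"
    if rcode == NOERROR and ancount > 0:
        return "resolves"
    return "inconclusive"


class TransactionTracker:
    """Pairs each DNS response with the query that triggered it."""

    def __init__(self) -> None:
        self.pending: Dict[Tuple[str, int], str] = {}

    def on_query(self, client: str, txid: int, qname: str) -> None:
        self.pending[(client, txid)] = qname

    def on_response(self, client: str, msg: bytes) -> Optional[Tuple[str, str]]:
        status = response_status(msg)
        txid = struct.unpack("!H", msg[:2])[0]
        qname = self.pending.pop((client, txid), None)
        if qname is None:
            return None  # unsolicited or already-answered response
        return qname, status


if __name__ == "__main__":
    tracker = TransactionTracker()
    tracker.on_query("10.0.0.5", 7, "xyzwer500.example.com")
    reply = struct.pack("!HHHHHH", 7, 0x8180, 1, 1, 0, 0)  # NOERROR, one answer
    print(tracker.on_response("10.0.0.5", reply))
```

A burst of "nxdomain" results from one client, followed by a single "resolves", matches the pattern described in the xyzwer example earlier.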
Step 5 – Detect Traffic to Registered DGA Domains
While most existing DGA detection systems focus on detecting whether a domain name is a DGA domain, they often forget the last and most important question: has any traffic been sent to the registered DGA domains? In order to detect this in a timely fashion, DGA domain detection must be tightly coupled with network traffic inspection. The results need to be fed back to the traffic inspection engine immediately, before any damage is done.
Step 6 – Block Traffic to Registered DGA Domains
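A minimal sketch of that coupling follows. It assumes the DGA detector hands over each suspect domain together with the IP addresses returned for it, and that the traffic inspection side reports flows as source and destination addresses. The CallbackCorrelator class and its method names are hypothetical, and the addresses are documentation examples.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class CallbackCorrelator:
    # Resolved IP address -> suspect DGA domain that returned it.
    suspect_ips: Dict[str, str] = field(default_factory=dict)
    # (infected host, destination IP, DGA domain) for each observed callback.
    alerts: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_resolution(self, domain: str, ips: Iterable[str]) -> None:
        """Record the addresses a registered DGA domain resolved to."""
        for ip in ips:
            self.suspect_ips[ip] = domain

    def on_flow(self, src_ip: str, dst_ip: str) -> bool:
        """Check one observed flow; return True if it targets a suspect IP."""
        domain = self.suspect_ips.get(dst_ip)
        if domain is None:
            return False
        self.alerts.append((src_ip, dst_ip, domain))
        return True


if __name__ == "__main__":
    correlator = CallbackCorrelator()
    correlator.add_resolution("xyzwer500.example.com", ["203.0.113.5"])
    print(correlator.on_flow("10.0.0.5", "203.0.113.5"))  # callback attempt
    print(correlator.on_flow("10.0.0.5", "198.51.100.7"))  # unrelated traffic
    print(correlator.alerts)
```

Keying on the destination IP means a shared hosting address can produce false alerts, so a production system would likely also check the TLS SNI or HTTP Host header where available.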
While not technically a part of detection, if there is an integration with a prevention system such as a Firewall or IPS, a rule should be inserted right away to block all the traffic to the registered domains.
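As one possible illustration, the sketch below turns a confirmed Command & Control address into a Linux iptables (or ip6tables) command that drops forwarded traffic to it. The choice of iptables, the FORWARD chain, and the drop_rule function name are assumptions. A commercial firewall or IPS would expose its own API, and the command is only built here, not executed.

```python
import ipaddress
from typing import List


def drop_rule(ip: str, chain: str = "FORWARD") -> List[str]:
    """Build an argv list that inserts a DROP rule for one destination address."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    tool = "ip6tables" if addr.version == 6 else "iptables"
    # -I inserts at the top of the chain so the rule takes effect before others.
    return [tool, "-I", chain, "-d", str(addr), "-j", "DROP"]


if __name__ == "__main__":
    # Print the command instead of running it; applying it requires root.
    print(" ".join(drop_rule("203.0.113.5")))
```

Validating the address before building the command keeps untrusted DNS answer data from being passed to the firewall tool unchecked.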
A great DGA detection system should perform all 5 steps. An excellent DGA detection system should also include Step 6. Unfortunately, most DGA detection systems today stop at either step 3 or step 4.
Because DGAs are difficult to detect with signature-based or reputation-based detection and prevention systems, they have become quite popular with malware developers.
An intelligent detection system is required to perform the detection. An excellent DGA detection system must extract domain name information from DNS transactions, perform thorough analytics to detect DGA status, check the registration status of suspected domains, correlate with network traffic inspection to assess the level of compromise, and ideally integrate with prevention systems to avoid further compromise. In order to reduce both false positives and false negatives, machine learning should be seriously considered. Only with comprehensive and pervasive intelligence at every stage can the threat be truly mitigated.
The GitHub repository by Andrey Abakumove contains algorithms for generating domain names, as well as dictionaries of malicious domain names.