Cyber security vendors and researchers have reported for years how PowerShell is being used by cyber threat actors to install backdoors, execute malicious code, and otherwise achieve their objectives within enterprises. Security is a cat-and-mouse game between adversaries, researchers, and blue teams. The flexibility and capability of PowerShell has made conventional detection both challenging and critical. This blog post will illustrate how FireEye is leveraging artificial intelligence and machine learning to raise the bar for adversaries that use PowerShell.
In this post you will learn:
PowerShell is one of the most popular tools used to carry out attacks. Data gathered from FireEye Dynamic Threat Intelligence (DTI) Cloud shows malicious PowerShell attacks rising throughout 2017 (Figure 1).
Figure 1: PowerShell attack statistics observed by FireEye DTI Cloud in 2017 – blue bars for the number of attacks detected, with the red curve for exponentially smoothed time series
FireEye has been tracking the malicious use of PowerShell for years. In 2014, Mandiant incident response investigators published a Black Hat paper that covers the tactics, techniques and procedures (TTPs) used in PowerShell attacks, as well as forensic artifacts on disk, in logs, and in memory produced from malicious use of PowerShell. In 2016, we published a blog post on how to improve PowerShell logging, which gives greater visibility into potential attacker activity. More recently, our in-depth report on APT32 highlighted this threat actor’s use of PowerShell for reconnaissance and lateral movement procedures, as illustrated in Figure 2.
Figure 2: APT32 attack lifecycle, showing PowerShell attacks found in the kill chain
Let’s take a deep dive into an example of a malicious PowerShell command (Figure 3).
Figure 3: Example of a malicious PowerShell command
The following is a quick explanation of the arguments:
What is hidden inside the Base64 decoded portion? Figure 4 shows the decoded command.
Figure 4: The decoded command for the aforementioned example
Interestingly, the decoded command unveils a stealthy fileless network access and remote content execution!
It’s worth mentioning that a similar malicious PowerShell tactic was used in a recent cryptojacking attack exploiting CVE-2017-10271 to deliver a cryptocurrency miner. This attack involved the exploit being leveraged to deliver a PowerShell script, instead of downloading the executable directly. This PowerShell command is particularly stealthy because it leaves practically zero file artifacts on the host, making it hard for traditional antivirus to detect.
There are several reasons why adversaries prefer PowerShell:
Additionally, from an economics perspective:
Next, we would like to share how we at FireEye are combining our PowerShell threat research with data science to combat this threat, thus raising the bar for adversaries.
Can we use machine learning to predict if a PowerShell command is malicious?
One advantage FireEye has is our repository of high quality PowerShell examples that we harvest from our global deployments of FireEye solutions and services. Working closely with our in-house PowerShell experts, we curated a large training set that was comprised of malicious commands, as well as benign commands found in enterprise networks.
After we reviewed the PowerShell corpus, we quickly realized this fit nicely into the NLP problem space. We have built an NLP model that interprets PowerShell command text, similar to how Amazon Alexa interprets your voice commands.
One of the technical challenges we tackled was** **synonym, a problem studied in linguistics. For instance, “NOL”, “NOLO”, and “NOLOGO” have identical semantics in PowerShell syntax. In NLP, a stemming algorithm will reduce the word to its original form, such as “Innovating” being stemmed to “Innovate”.
We created a prefix-tree based stemmer for the PowerShell command syntax using an efficient data structure known as trie, as shown in Figure 5. Even in a complex scripting language such as PowerShell, a trie can stem command tokens in nanoseconds.
Figure 5: Synonyms in the PowerShell syntax (left) and the trie stemmer capturing these equivalences (right)
The overall NLP pipeline we developed is captured in the following table:
NLP Key Modules
|
Functionality
—|—
Decoder
|
Detect and decode any encoded text
Named Entity Recognition (NER)
|
Detect and recognize any entities such as IP, URL, Email, Registry key, etc.
Tokenizer
|
Tokenize the PowerShell command into a list of tokens
Stemmer
|
Stem tokens into semantically identical token, uses trie
Vocabulary Vectorizer
|
Vectorize the list of tokens into machine learning friendly format
Supervised classifier
|
Binary classification algorithms:
Reasoning
|
The explanation of why the prediction was made. Enables analysts to validate predications.
The following are the key steps when streaming the aforementioned example through the NLP pipeline:
Figure 6: NLP pipeline that predicts the malicious probability of a PowerShell command
More importantly, we established a production end-to-end machine learning pipeline (Figure 7) so that we can constantly evolve with adversaries through re-labeling and re-training, and the release of the machine learning model into our products.
Figure 7: End-to-end machine learning production pipeline for PowerShell machine learning
We successfully implemented and optimized this machine learning model to a minimal footprint that fits into our research endpoint agent, which is able to make predictions in milliseconds on the host. Throughout 2018, we have deployed this PowerShell machine learning detection engine on incident response engagements. Early field validation has confirmed detections of malicious PowerShell attacks, including:
The unique values brought by the PowerShell machine learning detection engine include:
The ultimate value of this innovation is to evolve with the broader threat landscape, and to create a competitive edge over adversaries.
We would like to acknowledge: