Emulating Ransomware Behavior in Purple Teaming

Published in Fraktal, Mar 25, 2021

Ransomware is the most prevalent cyber threat facing most organizations today. With purple team testing, organizations can verify their detection and response capabilities against this growing threat.

We built a tool for our purple teaming assignments that models ransomware attacks and can help companies improve their defenses. In this blog post we describe the tool, compare detection capabilities of popular endpoint detection and response (EDR) tools and explain what readers should take away from our findings.

Ransomware is big business for cyber criminals

In business environments, ransomware detections increased 365% and ransomware was related to more than a quarter (28%) of security incidents in 2019. The problem is not going away anytime soon, as criminals are now exploiting the fear and uncertainty caused by the COVID-19 pandemic. Organizations in industries like healthcare and critical infrastructure cannot risk losing access to their systems and are motivated to pay the ransom.

When organizations pay the ransom, they are making matters worse. With an average ransom payment of $154,108, ransomware is undoubtedly a lucrative business for cyber criminals. Such rewards motivate criminals to constantly develop new tactics and tools, making it challenging for standard security tools to keep up.

Estimated global damage from ransomware added up to 20 billion dollars in 2020. Losing sensitive data, customer information and money, and facing days of downtime, are consequences companies want to avoid. Then there is the harm to reputation and customer trust, which can take years to restore. To stay prepared for evolving ransomware attacks, organizations need a continuous process for testing and improving their capabilities.

Better Detection and Response Capabilities with Purple Teaming

Purple teaming is a collaborative effort of offensive red teams and defending blue teams, and its goal is continuous improvement. In our purple teaming service, we emulate threat actors in our clients’ environments and work together with their operational infosec teams (the blue teams) to improve their detection and response capabilities.

Attack scenarios conducted in purple team testing, combined with constant communication between the teams, help organizations recognize real-world attacks. The cooperative approach increases understanding of attacker Tactics, Techniques and Procedures (TTPs) within the organization. In the case of ransomware, companies often rely on EDR and other defensive tools as their only line of protection. However, static detection tools can miss abnormal behavior, and much of the risk comes from the human threat actors operating the attack rather than from the ransomware itself.

To emulate these behaviors in attack scenarios, we started to collect ransomware TTPs from recent reports. We quickly realized that ransomware attacks have repeating patterns that should be monitored by blue teams. To support companies in protecting themselves, we built a ransomware emulator tool that can be used to model ransomware attacks that originate from compromised endpoints.

Mapping ransomware behavior patterns to the MITRE ATT&CK framework

To understand what kind of commonalities ransomware attacks have in terms of TTPs, we read through a bunch of recent reports describing ransomware attacks in technical detail.

We were mostly interested in activities after initial access has been achieved. Modeling how attackers gain access to the environment is not our focus in purple teaming, as we can safely assume that attackers will always find a way in. Instead, we want to concentrate on improving the blue team’s capability to detect and respond to active breaches.

We focused on ransomware behavioral patterns instead of individual tools, commands or techniques. Detection and response that relies on those lower-level indicators is brittle, as attackers can easily switch to a different tool or apply obfuscation that bypasses detection. However, they always need to carry out certain behaviors in the environment to achieve their goals. This is also reflected in the classic Pyramid of Pain, where attacker TTPs sit at the apex. Operating at this level means responding to attacker behavior, which has the added benefit of forcing attackers to learn new behavior instead of simply switching tools or IP addresses.

In recent years, the MITRE ATT&CK framework has emerged as the go-to approach for modeling real-world attacker TTPs across the lifecycle of a cyber attack. It has evolved into a language that security practitioners use when describing attacks, as well as a benchmark used for EDR evaluations. We collected the identified behavioral patterns into a table and mapped each activity to ATT&CK.
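To illustrate the kind of mapping involved, the sketch below keeps behaviors and their ATT&CK tactic and technique IDs in a plain Python dictionary and groups them by tactic. It is a hypothetical subset built only from behaviors mentioned in this post, not a reproduction of our table. The technique IDs are standard ATT&CK identifiers, while the names BEHAVIOR_TO_ATTACK and group_by_tactic are made up for the example.

```python
from collections import defaultdict

# Hypothetical subset of behaviors mentioned in this post, mapped to
# (ATT&CK tactic, technique ID). Not a copy of the article's table.
BEHAVIOR_TO_ATTACK = {
    "Enumerate domain accounts": ("Discovery", "T1087.002"),
    "Enumerate domain trusts": ("Discovery", "T1482"),
    "Add registry Run key": ("Persistence", "T1547.001"),
    "Dump LSASS memory": ("Credential Access", "T1003.001"),
    # Process Injection is also listed under Privilege Escalation in ATT&CK.
    "APC process injection": ("Defense Evasion", "T1055.004"),
    "Delete volume shadow copies": ("Impact", "T1490"),
    "Encrypt user files": ("Impact", "T1486"),
}


def group_by_tactic(mapping):
    """Return {tactic: [technique IDs]}, keeping first-seen tactic order."""
    grouped = defaultdict(list)
    for tactic, technique in mapping.values():
        grouped[tactic].append(technique)
    return dict(grouped)


if __name__ == "__main__":
    for tactic, techniques in group_by_tactic(BEHAVIOR_TO_ATTACK).items():
        print(f"{tactic}: {', '.join(techniques)}")
```

Grouping by tactic this way gives the blue team a checklist of phases to verify coverage for, one tactic at a time.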

Ransomware behaviors identified for the purpose of building a tool.

Next, we implemented a tool that would emulate these behaviors. The tool can be used to walk through steps in a typical ransomware attack lifecycle while observing which activities are detected by security controls in the environment.

Bypassing antivirus products to emulate ransomware behavior

We decided to use C# for implementing the ransomware emulator tool, mainly because it has good native support for most of the behavior we want to model, and there are good open source projects we can adapt to implement the rest. Also, executing .NET assemblies in memory with tools such as Cobalt Strike is common both in purple teaming and in ransomware attacks.

The immediate problem with .NET assemblies is that they are easily decompiled back to source code, which makes it easy for antivirus products to detect and block them. Our focus is on emulating behavior patterns, so we want to avoid AV products detecting our tool based on static analysis.

We ran into this problem early on, when implementing the LSASS dumping functionality. A common solution is to execute the assemblies in memory, bypassing file-based AV detections. However, AV products also detect and block in-memory execution performed from C#, so we implemented a separate stager that downloads and executes the emulator assembly in memory.

From our research we learned about C++/CLI, a mechanism in Visual C++ for mixing native and managed code. It is a perfect fit for us, as it allows loading and executing managed code from an unmanaged context. TheWover has published excellent research and proof-of-concept (PoC) code on this technique.

We wrote a version of TheWover’s tool that downloads, decrypts and executes our emulator assembly in memory. Currently this stager can be used to execute our emulator without being detected by any of the EDR tools we evaluated.

Fransom: The open source ransomware emulator

We named our tool Fransom, and you can find it on our GitHub.

The tool collects the behaviors listed above into a single executable that can easily be used to emulate active ransomware on an endpoint. It can be driven either with command line options or through an interactive shell. Command line options are useful when running the tool through Cobalt Strike’s execute-assembly. The interactive shell can be used with our stager, so that we can execute multiple tests without downloading the assembly each time. It is also convenient when we run a ransomware emulation exercise and want to emulate the full lifecycle of a ransomware outbreak on an endpoint.

The following table shows the ransomware behaviors supported by the tool at this time.

Ransomware behaviors currently supported by the Fransom tool. More will be added.

Demos

Here are brief demonstrations of how to use the tool.

User profile encryption using Cobalt Strike execute-assembly.

Starting a Cobalt Strike beacon.

EDR tool comparison

To find out how well EDR tools detect ransomware behavior, we compared leading EDR products using Fransom. We had access to fully functional production environments utilizing each of these tools. We executed Fransom on a workstation monitored by the EDR and observed whether the tool detected or blocked the behavior. The results are summarized in the following table.

The comparison was conducted in our clients’ production environments and using their custom configurations. It is therefore possible that the products offer features and configuration options to block these behaviors that were not in use at the time of testing. These results should be considered indicative rather than a fair comparison between different products.

Detections made by popular EDR products in production use. F-Secure RDS is a managed service.
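Outcomes like these are easier to act on when recorded in a structured form. The following minimal sketch assumes a hypothetical format where each product maps attack phases to "blocked", "alert" or None, and lists the phases that no product detected. The product names and outcomes are placeholders and do not reproduce our results.

```python
# Hypothetical outcomes for illustration only; these are NOT the results
# reported in this post. Product names are placeholders.
# Each outcome is "blocked", "alert", or None (no detection).
RESULTS = {
    "EDR-A": {"Discovery": None, "Credential Access": None, "Impact": "alert"},
    "EDR-B": {"Discovery": None, "Credential Access": "blocked", "Impact": None},
    "EDR-C": {"Discovery": None, "Credential Access": None, "Impact": None},
}


def detected_phases(results):
    """Map each product to the sorted list of phases where it alerted or blocked."""
    return {
        product: sorted(phase for phase, outcome in phases.items() if outcome)
        for product, phases in results.items()
    }


def blind_spots(results):
    """Return phases that no product detected: candidates for custom detections."""
    all_phases = {phase for phases in results.values() for phase in phases}
    return sorted(
        phase
        for phase in all_phases
        if not any(phases.get(phase) for phases in results.values())
    )


if __name__ == "__main__":
    print(detected_phases(RESULTS))
    print(blind_spots(RESULTS))
```

Phases that show up as blind spots are natural starting points for the custom detection work discussed in the conclusions.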

Observations

As shown, none of the EDRs detected anything in the discovery, persistence and credential access phases of the attack. SentinelOne was the only one to detect process injection techniques, with a detection on the APC injection technique.

One of the most interesting aspects of this comparison was that the tools were unable to detect the encryption activities emulated by Fransom in the impact phase of the attack. Cortex and SentinelOne only detected manipulation of shadow copies. Defender was the only one to react to file encryption on the endpoint. For some reason, this triggered a generic Cobalt Strike alert, and the Fransom process was killed. We counted this as a detection, even though the alert type didn’t describe the correct behavior and might mislead the blue team responding to the incident.
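When an alert names the wrong behavior, responders usually fall back on raw telemetry to work out what actually ran. A basic triage step is reconstructing the process ancestry of a suspicious process. The sketch below assumes a simplified, hypothetical event schema with pid, ppid and image fields, and the example events are made up; production telemetry should key on unique process identifiers instead, because Windows reuses PIDs.

```python
def ancestry(events, pid):
    """Return process image names from the oldest known ancestor down to `pid`.

    `events` uses a simplified, hypothetical schema: dicts with 'pid',
    'ppid' and 'image'. Real telemetry should key on unique process IDs
    (for example Sysmon's ProcessGuid) because Windows reuses PIDs.
    """
    by_pid = {e["pid"]: e for e in events}
    chain, seen = [], set()
    while pid in by_pid and pid not in seen:  # `seen` guards against cycles
        seen.add(pid)
        chain.append(by_pid[pid]["image"])
        pid = by_pid[pid]["ppid"]
    return list(reversed(chain))


if __name__ == "__main__":
    # Hypothetical process-creation events.
    events = [
        {"pid": 100, "ppid": 4, "image": "explorer.exe"},
        {"pid": 200, "ppid": 100, "image": "outlook.exe"},
        {"pid": 300, "ppid": 200, "image": "rundll32.exe"},
        {"pid": 400, "ppid": 300, "image": "vssadmin.exe"},
    ]
    # Prints: explorer.exe -> outlook.exe -> rundll32.exe -> vssadmin.exe
    print(" -> ".join(ancestry(events, 400)))
```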

LSASS dumping was not successful on the Defender endpoint, and the endpoint itself became unresponsive and needed a reboot. Defender didn’t produce any alerts though, so we did not count this as a detection.
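Even without an EDR alert, an LSASS dumping attempt can leave traces that a custom detection could pick up. As a sketch, assuming Sysmon is deployed with ProcessAccess (Event ID 10) logging enabled for lsass.exe and events arrive as flat dictionaries, a rule could flag any non-allowlisted process requesting PROCESS_VM_READ on LSASS. The allowlist entry is hypothetical, and real environments have several legitimate LSASS readers (security products in particular) that would need tuning.

```python
PROCESS_VM_READ = 0x0010  # Windows access right needed to read another process's memory

# Hypothetical allowlist of legitimate LSASS readers; tune per environment.
ALLOWED_SOURCES = {r"c:\program files\exampleav\scanner.exe"}


def is_suspicious_lsass_access(event):
    """Flag a Sysmon Event ID 10 (ProcessAccess) record that requests read access to LSASS."""
    if event.get("EventID") != 10:
        return False
    target = event.get("TargetImage", "").lower()
    source = event.get("SourceImage", "").lower()
    if not target.endswith("\\lsass.exe") or source in ALLOWED_SOURCES:
        return False
    # Sysmon logs GrantedAccess as a hex string such as "0x1010".
    granted = int(event.get("GrantedAccess", "0x0"), 16)
    return bool(granted & PROCESS_VM_READ)
```

Checking the access mask rather than the name of a known dumping tool keeps the rule tied to the behavior, since any tool reading LSASS memory needs this right.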

F-Secure’s offering was the only managed service in the comparison. This means that the agent on the endpoint does not block malicious behavior or alert the user. Instead, it streams the events to a backend environment where analysts monitor and correlate the events and escalate incidents to be investigated by their clients’ security teams. In our test, we received one such escalation regarding shadow copy manipulation using the vssadmin.exe utility. It should be noted that this test environment was not joined to a Windows domain, so we were unable to execute the domain enumeration activities with F-Secure’s EDR tool installed.
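Shadow copy deletion is a good example of a behavior worth detecting regardless of which utility performs it. The following minimal sketch checks process-creation events for a few well-known command-line patterns using vssadmin.exe, wmic.exe and PowerShell. The rule list is illustrative rather than complete, and matching on the image name alone can be evaded by renaming the binary, which is exactly the brittleness of tool-level indicators discussed earlier.

```python
import re

# Illustrative rules: (process image name, pattern over the lower-cased
# command line). Not exhaustive; attackers have other ways to do this.
SHADOW_COPY_RULES = [
    ("vssadmin.exe", re.compile(r"delete\s+shadows")),
    ("wmic.exe", re.compile(r"shadowcopy.*delete")),
    ("powershell.exe", re.compile(r"win32_shadowcopy.*delete")),
]


def detects_shadow_copy_deletion(image_path, command_line):
    """Return True if a process-creation event looks like shadow copy deletion."""
    # Keep only the file name from a Windows path, case-insensitively.
    image = image_path.lower().replace("/", "\\").rsplit("\\", 1)[-1]
    cmd = command_line.lower()
    return any(
        image == name and pattern.search(cmd) is not None
        for name, pattern in SHADOW_COPY_RULES
    )


if __name__ == "__main__":
    print(detects_shadow_copy_deletion(
        r"C:\Windows\System32\vssadmin.exe", "vssadmin.exe Delete Shadows /All /Quiet"))
```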

Conclusions

We expected the EDR tools to perform better against this kind of non-stealthy emulation. Our emulated activities in the persistence, credential access and defense evasion phases of the attack replicate techniques that have been well-known for years. Activities related to discovery and impact phases are noisy and should stand out compared to any kind of normal baseline behavior.
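One way to make noisy impact-phase activity stand out is a rate-based check: ransomware typically rewrites many files in a short time. The sketch below counts distinct files modified per process within a sliding time window. The threshold and window values are hypothetical and would have to be tuned against the normal baseline of the environment; it also assumes events for a given process arrive in timestamp order.

```python
from collections import defaultdict, deque


class WriteBurstDetector:
    """Flag a process that modifies more than `threshold` distinct files
    within `window` seconds. Defaults are hypothetical and need tuning
    against the environment's normal baseline."""

    def __init__(self, threshold=100, window=10.0):
        self.threshold = threshold
        self.window = window
        # pid -> deque of (timestamp, path), oldest first
        self._events = defaultdict(deque)

    def observe(self, pid, timestamp, path):
        """Record one file modification; return True if the process is over the limit.

        Assumes events for a given pid arrive in non-decreasing timestamp order.
        """
        recent = self._events[pid]
        recent.append((timestamp, path))
        # Drop events that have fallen out of the sliding window.
        while recent and timestamp - recent[0][0] > self.window:
            recent.popleft()
        distinct_files = {p for _, p in recent}
        return len(distinct_files) > self.threshold


if __name__ == "__main__":
    detector = WriteBurstDetector(threshold=3, window=5.0)
    for t in range(5):
        print(t, detector.observe(pid=1234, timestamp=float(t), path=f"doc{t}.docx"))
```

Counting distinct files rather than raw write events keeps a single busy file, such as a log or database, from triggering the alert.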

To be fair, our emulation bypassed the initial access phase, where the objective of the attacker is to gain a foothold on the endpoint before executing the ransomware payload. In our experience, all of these tools are excellent at detecting and alerting on common techniques for achieving this goal. This in turn gives the blue team several chances of detecting and disrupting the attack before it even gets properly started.

Depending solely on the detection and blocking capability of an EDR tool running on the endpoint is currently not enough to disrupt a ransomware attack before it causes significant harm.

Our results are not usable as a benchmark for comparing EDR tools, as they all fared essentially equally well. Instead, we recommend evaluating the following capabilities of candidate EDR tools to make an informed decision:

Look into what kind of logs the EDR tool collects and how you can access them. Detecting some of the behaviors we emulated requires developing custom detection use cases, and you might need to forward the EDR logs to a SIEM to correlate them with other sources. There are good open source projects, such as Sigma, that you can leverage so that you don’t need to start from scratch.
Determine whether the EDR offers ways to customize and enhance the default detection capabilities to detect the behaviors you most care about. As an example, Defender offers attack surface reduction rules that can be utilized to enhance Defender’s capabilities (here’s an excellent blog post from Palantir on ASR rule recommendations).
The R stands for Response, so it is worth investigating what kind of capabilities the tools offer for collecting information for analysis. You want to be able to pull information for investigation when an alert goes off to determine what’s going on. As shown here, the information produced by EDR tools is not always correct, so you need to do your own triage before hitting the panic button.
It might be worth doing some detection testing with each tool before committing to a purchase. This is especially useful with managed offerings, where you also want to ensure that the escalations from the provider are timely and contain sufficient and correct information to take the incident forward.
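As a minimal sketch of the custom detection use cases mentioned in the first point, the code below evaluates a Sigma-style rule against a single event. It supports only a small subset of the Sigma specification (plain, contains, startswith and endswith matching, lists as OR, the all modifier, a single selection condition, and no wildcards) and is not a replacement for the actual Sigma tooling, which converts rules into queries for a SIEM. The example rule, which looks for Run key persistence in Sysmon Event ID 13 (registry value set) events, is illustrative.

```python
# Illustrative Sigma-style rule: registry Run key persistence seen in
# Sysmon Event ID 13 (RegistryEvent, value set).
RULE = {
    "title": "Registry Run key persistence (illustrative)",
    "detection": {
        "selection": {
            "EventID": 13,
            "TargetObject|contains": r"\Software\Microsoft\Windows\CurrentVersion\Run",
        },
        "condition": "selection",
    },
}


def _match_value(actual, expected, modifier):
    """Case-insensitive comparison for one value; no wildcard support."""
    a, e = str(actual).lower(), str(expected).lower()
    if modifier == "contains":
        return e in a
    if modifier == "startswith":
        return a.startswith(e)
    if modifier == "endswith":
        return a.endswith(e)
    if modifier is None:
        return a == e
    raise ValueError(f"unsupported modifier: {modifier}")


def match_selection(event, selection):
    """All fields must match; a value list matches if any value does (all, with |all)."""
    for key, expected in selection.items():
        field, *modifiers = key.split("|")
        if field not in event:
            return False
        require_all = "all" in modifiers
        modifier = next((m for m in modifiers if m != "all"), None)
        values = expected if isinstance(expected, list) else [expected]
        hits = [_match_value(event[field], v, modifier) for v in values]
        if not (all(hits) if require_all else any(hits)):
            return False
    return True


def matches(event, rule):
    detection = rule["detection"]
    if detection["condition"] != "selection":
        raise NotImplementedError("this sketch only supports 'condition: selection'")
    return match_selection(event, detection["selection"])


if __name__ == "__main__":
    event = {
        "EventID": 13,
        "Image": r"C:\Users\bob\AppData\Local\Temp\updater.exe",
        "TargetObject": r"HKU\S-1-5-21-1000\Software\Microsoft\Windows\CurrentVersion\Run\Updater",
    }
    print(matches(event, RULE))
```

Writing detections in a rule format like Sigma keeps them portable between SIEMs, which matters if the EDR logs end up being correlated outside the EDR console.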

Fraktal is an unbiased testing provider of security tools and processes, and a pathfinder in the purple team services approach. We will be happy to assist your company in this area.

About the authors

The Fransom tool was developed and the testing conducted by the Fraktal cyber security team. Found this post useful? Let us know what you think on @FraktalCyber Twitter.