Introducing Threatest, a CLI and Go framework for end-to-end testing of threat detection rules

August 13, 2022

Reliably detecting threats in an environment is critical for securing applications and infrastructure. But the increasing complexity of modern data pipelines makes it difficult to verify that detection rules are consistently able to spot the threats they are designed to look for.

Today, we are happy to announce the release of a new open source project: Threatest, a CLI and Go framework for end-to-end testing of threat detection rules. Threatest lets you define test scenarios in which you detonate an attack technique and then expect an alert to be created on an external platform.

The lifecycle of detection engineering

Broadly speaking, detection engineering is the discipline of identifying threats relevant to an organization, understanding them in depth, and coming up with reliable strategies to detect them.

Although there is no standardized process, detection engineering generally follows several phases:

The detection engineering lifecycle at Datadog

  1. Ideation: What attack techniques are relevant to our organization?

  2. Research: How does the attack technique work? What logs or telemetry does it generate?

  3. Gathering requirements: What are the logs required to implement a detection? Do we need additional visibility or a broader scope to implement the detection?

  4. Development: Defining a concrete detection strategy to craft a detection rule.

  5. Testing and deployment: Testing the rule, ideally against real-world data, to ensure it works as expected and does not generate too many false positives or false negatives.

  6. Maintenance: Continuously gathering metrics on the alerts generated by the detection rule and taking corrective actions as needed.

The challenges of detection engineering

It’s important to remember that the threat detection process involves multiple complex, moving pieces. Before creating and testing rules even comes into play, you need to understand the overall anatomy of a logging and detection pipeline, which includes log collection and centralization, parsing, indexing, and aggregation.

To make it more concrete, let’s take the example of using CloudTrail as a data source for writing threat detection rules. Below is a typical architecture to collect and process logs from AWS CloudTrail in a multi-account environment. It uses an organization trail that forwards logs to an S3 bucket, which a Lambda function then picks up.

A typical logging and detection pipeline for AWS CloudTrail logs

A number of things can go wrong here, such as:

  • The CloudTrail trail may be misconfigured and unable to deliver logs to the S3 bucket
  • The Lambda function may not have enough permissions to read from the S3 bucket
  • The Lambda function may incorrectly forward the logs to our SIEM
  • The parsing pipeline may incorrectly parse CloudTrail logs
  • Detection rules may be unexpectedly disabled
  • A recent modification in a detection rule may have changed its behavior in an unexpected way

As a first step toward ensuring that rules behave as expected, you can unit test them by feeding them sample logs. However, this does little to account for the complexity of our end-to-end cloud infrastructure and the associated threat detection pipeline, because it exercises only a single step of the overall detection logic: the rule itself.
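To illustrate what rule-level unit testing looks like, here is a minimal Go sketch that evaluates a rule against a hand-written sample log. The `CloudTrailEvent` struct, `matchesSnapshotShare`, and `evaluateSample` are hypothetical names invented for this example, and the rule is deliberately simplified; it is not one of Datadog's detection rules.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CloudTrailEvent holds the two CloudTrail fields this simplified rule inspects.
type CloudTrailEvent struct {
	EventSource string `json:"eventSource"`
	EventName   string `json:"eventName"`
}

// matchesSnapshotShare is a hypothetical, deliberately simplified rule:
// it flags any EC2 ModifySnapshotAttribute call.
func matchesSnapshotShare(e CloudTrailEvent) bool {
	return e.EventSource == "ec2.amazonaws.com" && e.EventName == "ModifySnapshotAttribute"
}

// evaluateSample parses a raw JSON log and applies the rule to it.
func evaluateSample(raw string) (bool, error) {
	var e CloudTrailEvent
	if err := json.Unmarshal([]byte(raw), &e); err != nil {
		return false, err
	}
	return matchesSnapshotShare(e), nil
}

func main() {
	// A hand-written sample log, not real telemetry.
	sample := `{"eventSource":"ec2.amazonaws.com","eventName":"ModifySnapshotAttribute"}`
	matched, err := evaluateSample(sample)
	fmt.Println(matched, err)
}
```

A test like this passes even if the trail is misconfigured, the forwarder is broken, or the rule is disabled in production, which is why it cannot replace end-to-end testing.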

The need for end-to-end threat detection testing

The only way to gain full confidence in our ability to detect threats is to perform end-to-end testing of our detections. In other words, we treat all our logging and processing pipelines as a black box: we reproduce the attacks we expect to detect and verify on the other end that the expected alert is produced.

Blackbox testing detection rules

The process is generally manual. First, we reproduce an attack manually or using an automated tool such as Stratus Red Team (for cloud environments) or Atomic Red Team (for Windows/Linux machines). Then, we head over to our SIEM or log management platform and verify that the alert we expect was produced. Finally, we close the alert to ensure it doesn’t pollute our daily operations.

Unfortunately, this approach is time-consuming, challenging to scale, and hard to automate.

Introducing Threatest

Threatest is a CLI and Go framework that helps you write end-to-end tests for threat detection rules. It lets you define test scenarios in which you detonate an attack technique and then expect an alert to be created on an external platform.

Threatest supports several ways of detonating attacks: Stratus Red Team or the AWS SDK for cloud attacks, and a local or remote (over SSH) bash command for OS-level attacks. It then uses the Datadog API to verify that the expected alert was created.

Let’s have a look at how to use Threatest in practice.

Using Threatest as a CLI

First, we define a scenario as YAML. Here, we detonate the Stratus Red Team attack technique "Exfiltration of an EBS snapshot" and verify that the corresponding alert is created in Datadog.

scenarios:
  - name: Exfiltrating an EBS snapshot
    detonate:
      stratusRedTeamDetonator:
        attackTechnique: aws.exfiltration.ec2-share-ebs-snapshot
    expectations:
      - timeout: 15m
        datadogSecuritySignal:
          name: "AWS EBS Snapshot possible exfiltration"

Then, we run it using the CLI:

threatest run scenario.threatest.yaml

Threatest will invoke Stratus Red Team programmatically to detonate the attack technique and then poll the Datadog API until the expected alert shows up. Finally, it will close the alert in the Datadog platform.

INFO[0000] Running 1 scenarios with a parallelism of 1
Detonating 'aws.exfiltration.ec2-share-ebs-snapshot' with Stratus Red Team
Execution ID:c59242de-a4de-4c8f-a5ae-145e5c484820
INFO[0104] Scenario 'Exfiltrating an EBS snapshot' passed in 104.16 seconds
Threatest verified that the expected alert was created on Datadog and closed it with a comment.
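Conceptually, the flow described above (detonate, poll until the alert appears or a timeout expires, then close the alert) can be sketched in Go as follows. This is not Threatest's actual implementation: the `Detonator` and `AlertMatcher` interfaces, `runScenario`, and the in-memory fakes are hypothetical names made up for this illustration, and no real Datadog or Stratus Red Team API is called.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Detonator is a hypothetical interface for anything that can launch an attack.
type Detonator interface {
	Detonate() (executionID string, err error)
}

// AlertMatcher is a hypothetical interface for the alerting platform.
type AlertMatcher interface {
	// HasExpectedAlert reports whether an alert tied to executionID exists.
	HasExpectedAlert(executionID string) (bool, error)
	// Cleanup closes the alert so it does not pollute daily operations.
	Cleanup(executionID string) error
}

var errTimeout = errors.New("expected alert not found before timeout")

// Short durations used by the demo below; real scenarios would use minutes.
const (
	demoTimeout  = 50 * time.Millisecond
	demoInterval = time.Millisecond
)

// runScenario detonates once, then polls until the alert shows up or the
// timeout expires. On success it closes the alert.
func runScenario(d Detonator, m AlertMatcher, timeout, interval time.Duration) error {
	id, err := d.Detonate()
	if err != nil {
		return fmt.Errorf("detonation failed: %w", err)
	}
	deadline := time.Now().Add(timeout)
	for {
		found, err := m.HasExpectedAlert(id)
		if err != nil {
			return err
		}
		if found {
			return m.Cleanup(id)
		}
		if time.Now().After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

// fakeDetonator pretends to launch an attack and returns a fixed ID.
type fakeDetonator struct{}

func (fakeDetonator) Detonate() (string, error) { return "exec-1", nil }

// fakeMatcher reports the alert as present from the Nth poll onward.
type fakeMatcher struct {
	pollsUntilFound int
	polls           int
	closed          bool
}

func (f *fakeMatcher) HasExpectedAlert(string) (bool, error) {
	f.polls++
	return f.polls >= f.pollsUntilFound, nil
}

func (f *fakeMatcher) Cleanup(string) error {
	f.closed = true
	return nil
}

func main() {
	m := &fakeMatcher{pollsUntilFound: 3}
	err := runScenario(fakeDetonator{}, m, demoTimeout, demoInterval)
	fmt.Println("error:", err, "alert closed:", m.closed)
}
```

Treating the pipeline as a black box means the loop only needs two touchpoints: something that triggers the attack and something that can query for, and close, the resulting alert.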

Using Threatest as a Go framework

You can also use Threatest programmatically. First, we import it and instantiate it:

import (
   . ""
)

func TestDetection(t *testing.T) {
   threatest := Threatest()

Next, we create a scenario, similar to the first example above:

ttp := StratusRedTeamTechnique("aws.exfiltration.ec2-share-ebs-snapshot")

threatest.Scenario("Exfiltrating an EBS snapshot").
  WhenDetonating(ttp).
  Expect(DatadogSecuritySignal("AWS EBS Snapshot possible exfiltration")).
  WithTimeout(15 * time.Minute)

Then, we run the scenario:

require.NoError(t, threatest.Run())

Finally, we run our test.

$ go test detection_test.go -v
=== RUN   TestDetection
Detonating 'aws.exfiltration.ec2-share-ebs-snapshot' with Stratus Red Team
Execution ID:5b24cd0e-d4d5-4984-8d67-a2ab3073216c
--- PASS: TestDetection (92.34s)
ok      command-line-arguments    93.296s

If the expected alert hasn’t triggered by the time the specified timeout expires, Threatest returns an error:

  Error Trace:    detection_test.go:21
  Error:          Received unexpected error:
                  At least one scenario failed:

  Exfiltrating an EBS snapshot: 1 assertions did not pass
  => Did not find Datadog security signal 'AWS EBS Snapshot possible exfiltration'

--- FAIL: TestDetection (948.09s)

Head over to the examples section of our repository to see additional ways of using Threatest, including how to detonate commands over SSH or use it along with Terratest.

How we use Threatest at Datadog

We built Threatest to meet our needs for testing out-of-the-box detection rules that ship with our Datadog Cloud SIEM and Cloud Workload Security products, as well as rules used internally by our threat detection team.

What’s next

Open source is at the core of engineering culture at Datadog. We wanted to ensure the community could benefit from Threatest early on and participate in its development.

Moving forward, we are eager to add support for platforms other than Datadog in Threatest, and we’d love to collaborate. Feel free to open an issue on the repository and we’ll be happy to chat. We also plan to add more detonators, such as running commands inside a container or Kubernetes pod and detonating commands locally or remotely using Atomic Red Team.

We hope Threatest will advance the state of end-to-end threat detection testing, and we can’t wait to hear from you!

Updates made to this entry

November 28, 2022: Updated the post to reflect that Threatest v1.1.1 now has a standalone CLI.

June 28, 2023: Updated the post title to reflect that Threatest v1.1.1 now has a standalone CLI.
