Introducing GuardDog 2.0: YARA scanning, user-supplied rules, and Golang support

GuardDog is an open source project at Datadog for identifying malicious PyPI and npm packages. Using GuardDog’s one-two punch of package metadata scanning and Semgrep-powered code behavior analysis, you can make sure your Python and JavaScript code remains free of malicious dependencies.

Over a year ago, we released GuardDog v1.0, which added support for the npm package ecosystem and GitHub CI integrations. Today, we are proud to announce the release of GuardDog v2.0, which brings with it several exciting new features:

Support for YARA rules
Support for custom source code (Semgrep and YARA) rules
Initial support for the Golang ecosystem

In this blog post, we’ll take a tour of what’s new, examine some improvements to our current capabilities, and discuss how Datadog Software Composition Analysis (SCA) uses GuardDog. For more information on v2.0, be sure to check out the release notes and the project GitHub repository.

Scan packages with YARA rules

YARA is a popular tool among security researchers for finding complex textual or binary indicators in files or in memory. Like Semgrep, YARA’s powerful pattern-matching capabilities make it a great fit as a backend engine for GuardDog, enabling it to comb through source code in search of malicious signatures. By adding YARA support to GuardDog, we’re taking advantage of the rich body of publicly available YARA rules to detect more malware associated with specific campaigns and threat actors.

YARA support gives us an easy way to add language-agnostic detection rules to GuardDog that can be used across all of our supported language ecosystems—no modifications or parsing overhead necessary. It also gives GuardDog a new ability to scan for malicious indicators in binary files.

Here is a simplified version of GuardDog’s shady-links rule for identifying suspicious URLs, implemented with YARA:

rule yara_shady_links {
    strings:
        $shady1 = /(http[s]?:\/\/bit\.ly.*)$/
        $shady2 = /(http[s]?:\/\/.*\.(link|xyz|tk|ml|ga|cf|gq|pw|email|stream))$/
    condition:
        any of ($shady*)
}

GuardDog treats this rule just like any of our Semgrep source code rules, and we can run it in both PyPI and npm ecosystems:

$ guarddog pypi scan requests --rules yara_shady_links
Found 0 potentially malicious indicators scanning requests

$ guarddog npm scan react --rules yara_shady_links
Found 0 potentially malicious indicators scanning react
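
To get a feel for what the two patterns in the simplified rule are looking for, here is a minimal Python sketch that applies equivalent regular expressions line by line with the standard re module. It is an illustration only: the function name find_shady_links is hypothetical, the line-by-line matching is an assumption about the rule's intent, and none of this reflects how YARA's engine or GuardDog evaluates the rule internally.

```python
import re

# Python translations of the two patterns from the simplified rule above.
SHADY_PATTERNS = [
    re.compile(r"(http[s]?://bit\.ly.*)$"),
    re.compile(r"(http[s]?://.*\.(link|xyz|tk|ml|ga|cf|gq|pw|email|stream))$"),
]


def find_shady_links(text):
    """Return (line_number, matched_text) pairs for lines matching any pattern."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for pattern in SHADY_PATTERNS:
            match = pattern.search(line)
            if match:
                findings.append((line_number, match.group(1)))
                break  # mirrors "any of ($shady*)": one hit per line is enough
    return findings
```

Calling find_shady_links on a file's contents returns an empty list when no line ends in a URL that matches either pattern.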

Use custom source code rules

Though each GuardDog distribution includes a diverse and ever-growing collection of source code rules, users often wish to deploy their own tried-and-true rules or test out community-sourced rules to see how they perform in their own environments. Moreover, particular ecosystems or use cases can have nuances that are not captured by the standard rule set. In order to meet our users where they are, GuardDog now officially supports Bring Your Own (BYO) Semgrep and YARA rules.

For example, consider the following YARA rule, written in response to a JavaScript cryptocurrency theft campaign discovered in early 2024:

rule clinksink {
    strings:
        $crypto1 = "solanaWeb3.Connection"
        $crypto2 = "solanaWeb3.LAMPORTS_PER_SOL"
        $crypto3 = "solanaWeb3.PublicKey.findProgramAddress"
        $crypto4 = "solanaWeb3.SystemProgram.transfer"
        $crypto5 = "solanaWeb3.Transaction"
        $func1 = "async function info("
        $func2 = "async function updateConnectText("
        $func3 = "async function updateMintText("
        $func4 = "async function start("
        $func5 = "async function connect("
        $func6 = "async function waitForWalletConnection("
        $func7 = "async function connectSolana("
        $func8 = "async function getTokenBalance("
        $func9 = "async function createTxs("
        $func10 = "async function createPrizeTxs("
        $func11 = "async function claim("
        $func12 = "async function createTokenTxs("
        $func13 = "async function claimSolana("
        $phantom1 = ".phantom"
        $phantom2 = ".isphantom" nocase
        $phantom3 = "phantom.app"
        $transaction1 = ".AccountLayout."
        $transaction2 = ".TOKEN_PROGRAM_ID."
        $transaction3 = ".Token.createAssociatedTokenAccountInstruction("
        $transaction4 = ".Token.createTransferInstruction("
    condition:
        5 of ($func*) or (3 of ($crypto*) and any of ($phantom*) and 3 of ($transaction*))
}
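
The condition at the end of this rule combines per-group match counts. As a rough aid to reading it, the Python sketch below reproduces the same logic with plain substring checks: "5 of ($func*)" is treated as "at least five distinct function strings are present", and the nocase modifier on $phantom2 is handled by lowercasing. The helper names count_hits and clinksink_matches are hypothetical, and matching on Python strings is only an approximation of YARA's byte-level scanning.

```python
FUNC = [
    "async function info(", "async function updateConnectText(",
    "async function updateMintText(", "async function start(",
    "async function connect(", "async function waitForWalletConnection(",
    "async function connectSolana(", "async function getTokenBalance(",
    "async function createTxs(", "async function createPrizeTxs(",
    "async function claim(", "async function createTokenTxs(",
    "async function claimSolana(",
]
CRYPTO = [
    "solanaWeb3.Connection", "solanaWeb3.LAMPORTS_PER_SOL",
    "solanaWeb3.PublicKey.findProgramAddress",
    "solanaWeb3.SystemProgram.transfer", "solanaWeb3.Transaction",
]
PHANTOM = [".phantom", ".isphantom", "phantom.app"]
PHANTOM_NOCASE = {".isphantom"}  # only $phantom2 carries the nocase modifier
TRANSACTION = [
    ".AccountLayout.", ".TOKEN_PROGRAM_ID.",
    ".Token.createAssociatedTokenAccountInstruction(",
    ".Token.createTransferInstruction(",
]


def count_hits(text, strings, nocase=frozenset()):
    """Count how many distinct strings occur at least once in text."""
    lowered = text.lower()
    hits = 0
    for s in strings:
        found = s.lower() in lowered if s in nocase else s in text
        if found:
            hits += 1
    return hits


def clinksink_matches(text):
    """Evaluate the rule's condition against a source file's contents."""
    return count_hits(text, FUNC) >= 5 or (
        count_hits(text, CRYPTO) >= 3
        and count_hits(text, PHANTOM, PHANTOM_NOCASE) >= 1
        and count_hits(text, TRANSACTION) >= 3
    )
```

Read this way, the rule fires either on a cluster of characteristic function names or on a combination of Solana API usage, Phantom wallet references, and token transfer calls.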

None of the standard rules targets cryptocurrency stealers, so the standard rule set would not alert on the glaring indicators specific to this application setting and malware campaign. Thanks to GuardDog’s custom rules feature, however, we can now use this rule as-is:

$ guarddog npm scan samplePackage --rules clinksink
Found 1 potentially malicious indicators scanning samplePackage

Just drop your custom .yml and .yar rule files alongside GuardDog’s in the guarddog/analyzers/sourcecode directory and you’re good to go.
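
If you keep custom rules in their own folder, the copy step can be scripted. The sketch below is one way to do it, assuming a local checkout of GuardDog whose guarddog/analyzers/sourcecode directory you pass in yourself. The helper name install_custom_rules is hypothetical and is not part of GuardDog.

```python
import shutil
from pathlib import Path

# File extensions mentioned above: Semgrep rules (.yml) and YARA rules (.yar).
RULE_SUFFIXES = {".yml", ".yar"}


def install_custom_rules(rule_files, sourcecode_dir):
    """Copy custom rule files into the given rules directory.

    Files with other extensions are skipped, and existing rules are never
    overwritten. Returns the list of destination paths that were created.
    """
    destination = Path(sourcecode_dir)
    if not destination.is_dir():
        raise FileNotFoundError(f"rules directory not found: {destination}")

    installed = []
    for rule_file in map(Path, rule_files):
        if rule_file.suffix not in RULE_SUFFIXES:
            continue  # not a Semgrep or YARA rule file
        target = destination / rule_file.name
        if target.exists():
            raise FileExistsError(f"refusing to overwrite existing rule: {target}")
        shutil.copy2(rule_file, target)
        installed.append(target)
    return installed
```

For example, install_custom_rules(["clinksink.yar"], "guarddog/analyzers/sourcecode") would copy the rule above into place, after which it can be selected with --rules clinksink as shown earlier.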

Early support for Golang

Last but not least, GuardDog can now scan Golang modules:

$ guarddog go scan github.com/aws/aws-sdk-go-v2
Found 0 potentially malicious indicators scanning github.com/aws/aws-sdk-go-v2

Golang continues to see increased adoption across the industry and is already a key language for our customers (and for us). We are very excited to see what GuardDog will dig up in this new ecosystem.

For now, Golang scanning supports a single rule that identifies suspicious URLs in source code, but we hope to add Golang versions of other standard source code rules in the coming months. Make sure to keep an eye on the GitHub repository for updates (contributions welcome!).

Improved data exfiltration and DLL hijacking rules

Along with these new features, GuardDog v2.0 also includes substantial improvements to our PyPI and npm source code rules for data exfiltration and DLL hijacking.

The exfiltrate-sensitive-data rule for Python packages now covers data exfiltration from sensitive SQLite databases via the sqlite3 library. Attackers frequently target such databases because browsers use them to store credentials and user profile information.
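
As a loose illustration of the kind of indicator involved, the Python sketch below uses the standard ast module to list every direct sqlite3.connect(...) call in a piece of Python source. This is not GuardDog's rule, which is written for Semgrep and is not reproduced here. The function name find_sqlite_connections is hypothetical, and the sketch only flags where a database is opened, without following the data to any network call.

```python
import ast


def find_sqlite_connections(source):
    """Return (line_number, database_argument) for each sqlite3.connect(...) call.

    database_argument is the literal first argument when it is a constant,
    or None when the path is computed at runtime.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_connect = (
            isinstance(func, ast.Attribute)
            and func.attr == "connect"
            and isinstance(func.value, ast.Name)
            and func.value.id == "sqlite3"
        )
        if not is_connect:
            continue
        first_arg = node.args[0] if node.args else None
        path = first_arg.value if isinstance(first_arg, ast.Constant) else None
        findings.append((node.lineno, path))
    return sorted(findings, key=lambda finding: finding[0])
```

A reviewer could use output like this as a starting point for checking whether a package opens a browser profile database it has no reason to touch.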

Meanwhile, the dll-hijacking rules for both PyPI and npm now cover a MITRE ATT&CK-sourced family of related TTPs for executing malicious DLLs, typically via rundll32.exe.

Both of these rule enhancements were inspired by malware samples found in the wild.

How Datadog Software Composition Analysis (SCA) leverages GuardDog

Our Security Research team continuously scans PyPI and npm for malicious packages using GuardDog and manually analyzes and triages the results. These results are used to inform Datadog SCA, which automatically discovers dependencies in your applications at runtime or by scanning your code and raises an alert when a malicious dependency is found:

Figure: Datadog SCA identifying a malicious dependency

Check out GuardDog v2.0 today

GuardDog v2.0 is now available for download via PyPI (pip install guarddog) and GitHub.

As always, we love to hear your thoughts on all things GuardDog and software supply chain security. Write to us at securitylabs@datadoghq.com or get involved on GitHub.
