GuardDog is an open source project at Datadog for identifying malicious PyPI and npm packages. Using GuardDog’s one-two punch of package metadata scanning and Semgrep-powered code behavior analysis, you can make sure your Python and JavaScript code remains free of malicious dependencies.
Over a year ago, we released GuardDog v1.0, which added support for the npm package ecosystem and GitHub CI integrations. Today, we are proud to announce the release of GuardDog v2.0, which brings with it several exciting new features:
- Support for YARA rules
- Support for custom source code (Semgrep and YARA) rules
- Initial support for the Golang ecosystem
In this blog post, we’ll take a tour of what’s new, examine some improvements to our current capabilities, and discuss how Datadog Software Composition Analysis (SCA) uses GuardDog. For more information on v2.0, be sure to check out the release notes and the project GitHub repository.
Scan packages with YARA rules
YARA is a popular tool among security researchers for finding complex textual or binary indicators in files or in memory. Like Semgrep, YARA’s powerful pattern-matching capabilities make it a great fit as a backend engine for GuardDog, enabling it to comb through source code in search of malicious signatures. By adding YARA support to GuardDog, we’re taking advantage of the rich body of publicly available YARA rules to detect more malware associated with specific campaigns and threat actors.
YARA support gives us an easy way to add language-agnostic detection rules to GuardDog that can be used across all of our supported language ecosystems—no modifications or parsing overhead necessary. It also gives GuardDog a new ability to scan for malicious indicators in binary files.
Here is a simplified version of GuardDog’s shady-links rule for identifying suspicious URLs, implemented with YARA:
rule yara_shady_links {
    strings:
        $shady1 = /(http[s]?:\/\/bit\.ly.*)$/
        $shady2 = /(http[s]?:\/\/.*\.(link|xyz|tk|ml|ga|cf|gq|pw|email|stream))$/
    condition:
        any of ($shady*)
}
GuardDog treats this rule just like any of our Semgrep source code rules, and we can run it in both PyPI and npm ecosystems:
$ guarddog pypi scan requests --rules yara_shady_links
Found 0 potentially malicious indicators scanning requests
$ guarddog npm scan react --rules yara_shady_links
Found 0 potentially malicious indicators scanning react
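To get a feel for what the two patterns in yara_shady_links are aimed at, the following minimal sketch ports them to Python's re module. It is only an approximation: the helper name find_shady_links and the sample text are made up for illustration, and it assumes `$` should be read as "end of line" (via re.MULTILINE), which may not match how YARA's own regex engine anchors the pattern.

```python
import re

# Python approximations of the two patterns in the simplified rule above.
# Assumption: `$` is treated as "end of line" via re.MULTILINE; YARA's own
# regex engine may anchor differently.
SHADY_PATTERNS = [
    re.compile(r"(http[s]?://bit\.ly.*)$", re.MULTILINE),
    re.compile(
        r"(http[s]?://.*\.(link|xyz|tk|ml|ga|cf|gq|pw|email|stream))$",
        re.MULTILINE,
    ),
]


def find_shady_links(text: str) -> list[str]:
    """Return every URL matched by either pattern (hypothetical helper)."""
    hits = []
    for pattern in SHADY_PATTERNS:
        hits.extend(match.group(1) for match in pattern.finditer(text))
    return hits


# Made-up sample: one ordinary URL, one URL shortener, one unusual TLD.
sample = "\n".join([
    "docs = https://example.com/docs",
    "fetch http://bit.ly/abc123",
    "c2 = https://files.example.xyz",
])

print(find_shady_links(sample))
```

On this sample, the sketch reports the bit.ly link and the .xyz link and skips the example.com URL.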
Use custom source code rules
Though each GuardDog distribution includes a diverse and ever-growing collection of source code rules, users often wish to deploy their own tried-and-true rules or test out community-sourced rules to see how they perform in their own environments. Moreover, particular ecosystems or use cases can have nuances that are not captured by the standard rule set. In order to meet our users where they are, GuardDog now officially supports Bring Your Own (BYO) Semgrep and YARA rules.
For example, consider the following YARA rule, written in response to a JavaScript cryptocurrency theft campaign discovered in early 2024:
rule clinksink {
    strings:
        $crypto1 = "solanaWeb3.Connection"
        $crypto2 = "solanaWeb3.LAMPORTS_PER_SOL"
        $crypto3 = "solanaWeb3.PublicKey.findProgramAddress"
        $crypto4 = "solanaWeb3.SystemProgram.transfer"
        $crypto5 = "solanaWeb3.Transaction"
        $func1 = "async function info("
        $func2 = "async function updateConnectText("
        $func3 = "async function updateMintText("
        $func4 = "async function start("
        $func5 = "async function connect("
        $func6 = "async function waitForWalletConnection("
        $func7 = "async function connectSolana("
        $func8 = "async function getTokenBalance("
        $func9 = "async function createTxs("
        $func10 = "async function createPrizeTxs("
        $func11 = "async function claim("
        $func12 = "async function createTokenTxs("
        $func13 = "async function claimSolana("
        $phantom1 = ".phantom"
        $phantom2 = ".isphantom" nocase
        $phantom3 = "phantom.app"
        $transaction1 = ".AccountLayout."
        $transaction2 = ".TOKEN_PROGRAM_ID."
        $transaction3 = ".Token.createAssociatedTokenAccountInstruction("
        $transaction4 = ".Token.createTransferInstruction("
    condition:
        5 of ($func*) or (3 of ($crypto*) and any of ($phantom*) and 3 of ($transaction*))
}
None of the standard rules targets cryptocurrency stealers, so none of them would alert on the glaring indicators specific to this application setting and malware campaign. Thanks to GuardDog’s custom rules feature, however, we can now use this rule as-is:
$ guarddog npm scan samplePackage --rules clinksink
Found 1 potentially malicious indicators scanning samplePackage
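The condition of the clinksink rule reads as a threshold over groups of strings: at least five of the $func strings, or at least three $crypto strings together with any $phantom string and at least three $transaction strings. The sketch below restates that logic in plain Python to make the thresholds explicit. It is an illustration only, not how YARA evaluates rules; the helper names, the shortened string list, and the sample JavaScript are hypothetical.

```python
def count_present(text: str, needles: list[str], nocase: bool = False) -> int:
    """Count how many distinct strings from `needles` occur in `text`."""
    haystack = text.lower() if nocase else text
    return sum(
        1 for needle in needles
        if (needle.lower() if nocase else needle) in haystack
    )


def clinksink_condition(func: int, crypto: int, phantom: int, transaction: int) -> bool:
    """Restate the rule's condition over per-group counts of matched strings."""
    return func >= 5 or (crypto >= 3 and phantom >= 1 and transaction >= 3)


# A subset of the rule's $func* strings, shortened for brevity.
FUNC_STRINGS = [
    "async function info(",
    "async function start(",
    "async function connect(",
    "async function claim(",
    "async function createTxs(",
    "async function connectSolana(",
]

# Made-up JavaScript that defines five of the six functions listed above.
sample_js = """
async function info() {}
async function start() {}
async function connect() {}
async function claim() {}
async function createTxs() {}
"""

func_hits = count_present(sample_js, FUNC_STRINGS)
print(func_hits, clinksink_condition(func_hits, crypto=0, phantom=0, transaction=0))
```

Here the five matching function signatures satisfy the first branch on their own, so the condition holds even though no $crypto, $phantom, or $transaction strings are present.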
Just drop your custom .yml and .yar rule files alongside GuardDog’s in the guarddog/analyzers/sourcecode directory and you’re good to go.
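If you keep your rules in a separate folder, a small script can handle the copy step. The following is a minimal sketch, assuming you work from a local GuardDog source checkout; the function name install_custom_rules and both path arguments are hypothetical, and the only detail taken from above is the guarddog/analyzers/sourcecode directory.

```python
import shutil
from pathlib import Path

# Rule directory named above, relative to a GuardDog source checkout.
RULES_SUBDIR = Path("guarddog") / "analyzers" / "sourcecode"


def install_custom_rules(custom_dir: Path, guarddog_root: Path) -> list[str]:
    """Copy .yml (Semgrep) and .yar (YARA) files into GuardDog's rule directory.

    Returns the names of the copied files. Refuses to overwrite existing rules.
    Both arguments are assumptions about your local layout.
    """
    target = guarddog_root / RULES_SUBDIR
    if not target.is_dir():
        raise FileNotFoundError(f"rule directory not found: {target}")

    copied = []
    for rule_file in sorted(custom_dir.iterdir()):
        if rule_file.suffix not in {".yml", ".yar"}:
            continue  # ignore anything that is not a rule file
        destination = target / rule_file.name
        if destination.exists():
            raise FileExistsError(f"would overwrite existing rule: {destination}")
        shutil.copy2(rule_file, destination)
        copied.append(rule_file.name)
    return copied
```

Refusing to overwrite is a deliberate choice in this sketch: it keeps a custom rule from silently replacing a standard rule that happens to share its file name.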
Early support for Golang
Last but not least, GuardDog can now scan Golang modules:
$ guarddog go scan github.com/aws/aws-sdk-go-v2
Found 0 potentially malicious indicators scanning github.com/aws/aws-sdk-go-v2
Golang continues to see increased adoption across the industry and is already a key language for our customers (and for us). We are very excited to see what GuardDog will dig up in this new ecosystem.
For now, Golang scanning supports a single rule that identifies suspicious URLs in source code, but we hope to add Golang versions of other standard source code rules in the coming months. Make sure to keep an eye on the GitHub repository for updates (contributions welcome!).
Improved data exfiltration and DLL hijacking rules
Along with these new features, GuardDog v2.0 also includes substantial improvements in our PyPI and npm source code rules for data exfiltration and DLL sideloading.
The exfiltrate-sensitive-data rule for Python packages now covers data exfiltration from sensitive SQL databases via the sqlite3 library. Attackers frequently target such databases because browsers use them to store credentials and user profile information.
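To illustrate the kind of signal involved, here is a rough sketch that flags Python source calling sqlite3.connect on a path that looks like a browser data store. It is not GuardDog's Semgrep rule and covers only the "read a sensitive database" half of the behavior, not the exfiltration itself. The function name and the list of file-name hints are assumptions chosen for illustration.

```python
import ast

# Hypothetical file-name hints for browser SQLite stores; not taken from GuardDog.
SENSITIVE_DB_HINTS = ("Login Data", "cookies.sqlite")


def flags_sensitive_sqlite(source: str) -> bool:
    """Return True if `source` calls sqlite3.connect on a sensitive-looking path literal."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_connect = (
            isinstance(func, ast.Attribute)
            and func.attr == "connect"
            and isinstance(func.value, ast.Name)
            and func.value.id == "sqlite3"
        )
        if not is_connect:
            continue
        # Only string literals passed positionally are inspected in this sketch.
        for arg in node.args:
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                if any(hint in arg.value for hint in SENSITIVE_DB_HINTS):
                    return True
    return False
```

A literal-only check like this is easy to evade (for example, by building the path at runtime), which is one reason a dedicated source code rule engine is a better fit for the real detection.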
Meanwhile, the dll-hijacking rules for both PyPI and npm now cover a MITRE ATT&CK-sourced family of related TTPs for executing malicious DLLs, typically via rundll32.exe.
Both of these rule enhancements were inspired by malware samples found in the wild.
How Datadog Software Composition Analysis (SCA) leverages GuardDog
Our Security Research team continuously scans PyPI and npm for malicious packages using GuardDog and manually analyzes and triages the results. These results are used to inform Datadog SCA, which automatically discovers dependencies in your applications at runtime or by scanning your code and raises an alert when a malicious dependency is found.
Check out GuardDog v2.0 today
GuardDog v2.0 is now available for download via PyPI (pip install guarddog) and GitHub.
As always, we love to hear your thoughts on all things GuardDog and software supply chain security. Write to us at securitylabs@datadoghq.com or get involved on GitHub.