In November 2022, we released GuardDog, an open source project that helps identify malicious Python packages using Semgrep and package metadata analysis.
Today, we're excited to release a new major version of GuardDog, v1.0, with a number of new features that we describe below. Head over to the GitHub repository, and refer to the release notes if you're an existing user.
For this occasion, GuardDog is also getting a logo!
In this post, we’ll cover what's new in this version of GuardDog:
- scanning npm packages
- integration in CI pipelines with GitHub Actions and GitHub Code Scanning
- verifying PyPI package integrity
- a few other updates, events, and what’s coming next for GuardDog
Scanning npm packages
This new version of GuardDog introduces support for scanning not only PyPI, but also npm packages. For example, to scan the react package:
guarddog npm scan react
You can also use guarddog npm verify to scan all the dependencies listed in a package.json file:
guarddog npm verify package.json
We wrote several new heuristics to scan npm packages:
- npm-serialize-environment identifies when a package serializes process.env to exfiltrate environment variables.
- npm-silent-process-execution identifies when a package silently executes an executable file.
- npm-exec-base64 identifies when a package dynamically executes code through the eval function.
- npm-install-script identifies packages that, when installed, would trigger a pre-install or post-install script to run automatically.
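To make the npm-install-script idea concrete, here is a minimal Python sketch of the underlying check. It is not GuardDog's implementation (GuardDog's own rules live in its repository); the function name and the sample manifest are made up for illustration. It simply looks for lifecycle scripts that npm runs automatically at install time.

```python
import json

# Lifecycle hooks that npm runs automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_scripts(package_json_text: str) -> dict[str, str]:
    """Return the install-time lifecycle scripts declared in a package.json document.

    Illustrative helper only; this is not part of GuardDog's API.
    """
    manifest = json.loads(package_json_text)
    if not isinstance(manifest, dict):
        return {}
    scripts = manifest.get("scripts", {})
    if not isinstance(scripts, dict):
        return {}
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


if __name__ == "__main__":
    # Hypothetical manifest, invented for this example.
    sample = json.dumps(
        {
            "name": "example-package",
            "version": "1.0.0",
            "scripts": {"postinstall": "node setup.js", "test": "jest"},
        }
    )
    for hook, command in find_install_scripts(sample).items():
        print(f"{hook}: {command}")
```

An install script is not malicious in itself, since many legitimate packages use one to build native code. That is why a heuristic like this is best read as a signal for review rather than a verdict.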
In addition to these new npm-specific heuristics, GuardDog supports all existing package metadata heuristics, such as typosquatting detection. Have a look at the README, or run guarddog npm list-rules for more information.
Easy integration in CI with GitHub code scanning
One of GuardDog's strengths is its ease of use, which makes it a great choice for running in a continuous integration (CI) pipeline. For instance, you may want to scan new dependencies introduced by a pull request to make sure they are not malicious. Alternatively, you could consider automatically scanning your dependencies on a monthly basis to ensure that none of them have been compromised.
GuardDog now supports writing scan results to a SARIF file, making it straightforward to integrate with GitHub code scanning. GitHub code scanning lets third-party tools such as GuardDog surface their findings in the GitHub UI, for instance by automatically posting pull request comments and allowing you to mark scan results as false positives.
You can start benefiting from the integration with two simple steps:
- Install and run GuardDog inside your GitHub action.
- Upload the resulting SARIF file to GitHub using the upload-sarif action.
The GitHub Actions workflow below scans your dependencies on every push to the main branch and every time a pull request is opened against it:
name: GuardDog
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
permissions:
  contents: read
jobs:
  guarddog:
    permissions:
      contents: read # for actions/checkout to fetch code
      security-events: write # allow to upload scan results
    name: Scan all dependencies
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v3
        with:
          python-version: "3.10"
      - name: Install GuardDog
        run: pip install guarddog
      - run: guarddog pypi verify requirements.txt --output-format sarif --exclude-rules repository_integrity_mismatch > guarddog.sarif
      - name: Upload SARIF file for GitHub code scanning
        uses: github/codeql-action/upload-sarif@v2
        with:
          category: guarddog-builtin
          sarif_file: guarddog.sarif
When a malicious or compromised package is added to your dependencies, GuardDog will let you know about it by commenting on your pull request.
If the reported issue is a false positive, you can triage it directly from the GitHub UI using the "Dismiss alert" dropdown. Then, GuardDog will not notify you again for this dependency.
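Because SARIF is a JSON format, the same guarddog.sarif file can also be inspected outside of GitHub, for example in a local script. The sketch below is an illustration rather than an official integration: it relies only on the standard SARIF layout (a top-level runs array, each with a results array carrying a ruleId and a message), and the sample log in it is invented.

```python
import json


def summarize_sarif(sarif_text: str) -> list[tuple[str, str]]:
    """Return (rule ID, message text) pairs for every result in a SARIF log."""
    log = json.loads(sarif_text)
    findings = []
    for run in log.get("runs", []):
        for result in run.get("results", []):
            rule_id = result.get("ruleId", "<unknown rule>")
            message = result.get("message", {}).get("text", "")
            findings.append((rule_id, message))
    return findings


if __name__ == "__main__":
    # Hypothetical SARIF log, invented for this example. In practice you
    # would read the guarddog.sarif file produced by the workflow instead.
    sample = json.dumps(
        {
            "version": "2.1.0",
            "runs": [
                {
                    "tool": {"driver": {"name": "GuardDog"}},
                    "results": [
                        {
                            "ruleId": "typosquatting",
                            "message": {"text": "example finding"},
                        }
                    ],
                }
            ],
        }
    )
    for rule_id, message in summarize_sarif(sample):
        print(f"{rule_id}: {message}")
```

A script like this could, for instance, fail a build when the list of findings is not empty, for pipelines that do not run on GitHub.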
Verifying that PyPI package contents match the source code on GitHub
In addition to the four new npm-specific heuristics mentioned above, we added a more advanced PyPI metadata heuristic, repository_integrity_mismatch. This heuristic compares the contents of the PyPI package with the source code available on GitHub and flags any file that differs or has been added without being committed to GitHub.
This heuristic can help teams identify an attacker who compromises a maintainer's PyPI account and publishes a malicious version of the package directly to PyPI.
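The comparison at the heart of this heuristic can be sketched in a few lines of Python. This is a simplified illustration and not GuardDog's actual implementation: it assumes the package has already been unpacked and the repository already checked out into two local directories, and the function names are ours.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_mismatches(package_dir: Path, repo_dir: Path) -> dict[str, list[str]]:
    """List files that exist only in the package, or differ from the repository."""
    package_files = hash_tree(package_dir)
    repo_files = hash_tree(repo_dir)
    added = sorted(name for name in package_files if name not in repo_files)
    modified = sorted(
        name
        for name, digest in package_files.items()
        if name in repo_files and repo_files[name] != digest
    )
    return {"added": added, "modified": modified}


if __name__ == "__main__":
    # Build two throwaway directories standing in for the unpacked package
    # and the repository checkout.
    with tempfile.TemporaryDirectory() as tmp:
        package_dir = Path(tmp) / "package"
        repo_dir = Path(tmp) / "repo"
        package_dir.mkdir()
        repo_dir.mkdir()
        (repo_dir / "setup.py").write_text("print('install')\n")
        (package_dir / "setup.py").write_text("print('install and more')\n")
        (package_dir / "extra.py").write_text("import os\n")
        print(find_mismatches(package_dir, repo_dir))
```

A real check needs more care than this sketch: published packages commonly include generated files, such as packaging metadata, that never exist in the repository, so a naive comparison would report them as added.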
Heuristics list automatically generated from code
Documentation is notoriously challenging to keep up to date. The list and documentation of heuristics supported by GuardDog is now automatically generated and injected into the README, so the latest information is always available.
This same mechanism applies to the new guarddog pypi list-rules and guarddog npm list-rules commands, so you can be sure the output of the list-rules commands is always up-to-date.
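One common way to build this kind of mechanism is to regenerate a marked section of the README from the rule definitions. The Python sketch below illustrates the general technique only. The marker strings, function names, and table layout are assumptions made for this example and do not describe how GuardDog's generator actually works.

```python
# Hypothetical markers delimiting the generated section of the README.
START_MARKER = "<!-- BEGIN_RULE_LIST -->"
END_MARKER = "<!-- END_RULE_LIST -->"


def render_rules(rules: dict[str, str]) -> str:
    """Render rule names and descriptions as a Markdown table, sorted by name."""
    lines = ["| Heuristic | Description |", "|---|---|"]
    lines += [f"| {name} | {description} |" for name, description in sorted(rules.items())]
    return "\n".join(lines)


def inject_rules(readme: str, rules: dict[str, str]) -> str:
    """Replace everything between the two markers with a freshly rendered table."""
    start = readme.index(START_MARKER) + len(START_MARKER)
    end = readme.index(END_MARKER, start)
    return readme[:start] + "\n" + render_rules(rules) + "\n" + readme[end:]


if __name__ == "__main__":
    readme = f"# Example\n\n{START_MARKER}\nstale content\n{END_MARKER}\n"
    rules = {
        "npm-install-script": "Flags pre-install and post-install scripts",
        "npm-exec-base64": "Flags dynamic code execution through eval",
    }
    print(inject_rules(readme, rules))
```

Because the section between the markers is fully replaced on each run, running the generator twice yields the same README, which makes it safe to execute in CI or a pre-commit hook.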
GuardDog at Insomni'Hack 2023
We'll be presenting GuardDog along with real-world analyses of malicious packages in the wild at Insomni'Hack 2023 in Lausanne, Switzerland. Insomni'Hack is one of the largest security conferences in Switzerland, welcoming a vibrant community of blue teamers, offensive security professionals, and security researchers every year. If you're in the area, we hope to see you there!
What's next
In the future, we'll classify GuardDog heuristics into several confidence buckets. This will make it easier to run only low-false-positive heuristics (e.g., in CI) or a more complete set (e.g., for ad hoc assessments), depending on your specific needs.
We'll also add more heuristics so GuardDog keeps catching the most recent supply-chain malware families. Interested in real-world malware samples? Have a look at our GitHub repository, where we share with the community over 270 malicious packages that GuardDog has already identified in the wild.
We're also eager to hear what you think and how you're using GuardDog. Feel free to open an issue on GitHub, or write to us at securitylabs@datadoghq.com.