In November 2022, we released GuardDog, an open source project that helps identify malicious Python packages using Semgrep and package metadata analysis.
Today, we're excited to release a new major version of GuardDog, v1.0, with a number of new features that we describe below. Head over to the GitHub repository to get started, and if you're an existing user, refer to the release notes.
For this occasion, GuardDog is also getting a logo!
In this post, we’ll discuss how this new version of GuardDog adds support for:
- scanning npm packages
- integration in CI pipelines with GitHub Actions and GitHub Code Scanning
- verifying PyPI package integrity
- a few other updates, events, and what’s coming next for GuardDog
This new version of GuardDog introduces support for scanning npm packages in addition to PyPI packages. For instance, to scan the react package:
guarddog npm scan react
You can also use guarddog npm verify to scan all the dependencies listed in a package.json file:
guarddog npm verify package.json
We wrote several new heuristics to scan npm packages:
- npm-serialize-environment: identifies when a package serializes process.env to exfiltrate environment variables.
- npm-silent-process-execution: identifies when a package silently executes an executable file.
- npm-exec-base64: identifies when a package dynamically executes code.
- npm-install-script: identifies packages that, when installed, would trigger a pre-install or post-install script to run automatically.
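To make the npm-install-script idea concrete, here is a minimal Python sketch that reads a package.json manifest and reports the lifecycle scripts npm runs automatically at install time. It is not GuardDog's actual rule; the function name find_install_scripts is hypothetical, and treating preinstall, install, and postinstall as the relevant hooks is an assumption based on npm's lifecycle script behavior.

```python
import json

# Assumption: these are the lifecycle hooks npm runs automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_scripts(package_json: str) -> dict:
    """Return the install-time lifecycle scripts declared in a package.json string."""
    manifest = json.loads(package_json)
    scripts = manifest.get("scripts", {})
    # Keep only the hooks that would execute on install, ignoring e.g. "test" or "build".
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


if __name__ == "__main__":
    example = json.dumps({
        "name": "example-package",
        "version": "1.0.0",
        "scripts": {
            "test": "jest",
            "postinstall": "node ./setup.js",
        },
    })
    print(find_install_scripts(example))  # {'postinstall': 'node ./setup.js'}
```

A non-empty result is not proof of malice (many legitimate packages compile native code in an install script), which is why a check like this is better treated as a signal to review than as a verdict.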
In addition to these new npm-specific heuristics, GuardDog supports all existing package metadata heuristics, such as typosquatting detection. Have a look at the README, or run guarddog npm list-rules, for more information.
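Typosquatting detection generally means flagging a package whose name is deceptively close to a popular one. One common way to express that, shown here as a hypothetical sketch rather than GuardDog's exact logic, is to flag names within a small edit distance of a well-known package. The POPULAR list and the max_distance=1 threshold are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        previous = current
    return previous[-1]


def possible_typosquats(name, popular_packages, max_distance=1):
    """Return popular package names that `name` is suspiciously close to, excluding exact matches."""
    return [
        popular for popular in popular_packages
        if popular != name and levenshtein(name, popular) <= max_distance
    ]


if __name__ == "__main__":
    POPULAR = ["react", "express", "lodash"]  # hypothetical, tiny list of popular packages
    print(possible_typosquats("reacct", POPULAR))  # ['react']
    print(possible_typosquats("lodash", POPULAR))  # []
```

In practice, a real check would compare against a much larger list of popular package names and might also consider tricks that plain edit distance handles poorly, such as swapped separators.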
One of GuardDog's strengths is its ease of use, which makes it a great choice for running in a continuous integration (CI) pipeline. For instance, you may want to scan new dependencies introduced by a pull request to make sure they are not malicious. Alternatively, you could consider automatically scanning your dependencies on a monthly basis to ensure that none of them have been compromised.
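For the pull request use case, one way to keep the scan focused is to compute which packages the pull request adds. The Python sketch below does this for two versions of a requirements.txt file. The function names are hypothetical, and the parser is intentionally simplified (it skips option lines such as -r and -e as well as URL-based requirements) rather than a full implementation of pip's requirement syntax. Each resulting name could then be passed to guarddog pypi scan.

```python
import re

# Matches the project name at the start of a requirement line, e.g. "requests>=2.0".
NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def parse_requirement_names(text):
    """Return the normalized package names found in a requirements.txt-style string."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        # Skip blank lines, options such as -r / -e, and URL-based requirements.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = NAME_RE.match(line)
        if match:
            # PEP 503 normalization: case-insensitive, runs of "-", "_", "." are equivalent.
            names.add(re.sub(r"[-_.]+", "-", match.group(1)).lower())
    return names


def new_dependencies(base_requirements, pr_requirements):
    """Return packages listed in the pull request's requirements but not on the base branch."""
    added = parse_requirement_names(pr_requirements) - parse_requirement_names(base_requirements)
    return sorted(added)


if __name__ == "__main__":
    base = "requests==2.28.1\nflask>=2.0\n"
    pr = "requests==2.28.1\nflask>=2.0\nreqeusts==1.0  # new\n"
    print(new_dependencies(base, pr))  # ['reqeusts']
```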
GuardDog now supports writing scan results to a SARIF file, making it straightforward to integrate with GitHub code scanning. GitHub code scanning allows third-party tools such as GuardDog to integrate seamlessly into the GitHub UI, for instance by automatically posting pull request comments and allowing you to mark scan results as false positives.
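SARIF (Static Analysis Results Interchange Format) is an OASIS JSON standard for reporting static analysis results. To give a sense of what an uploaded file contains, here is a hypothetical Python sketch that wraps findings in a minimal SARIF 2.1.0 log. It is not GuardDog's actual output; the tool name, the fixed "warning" level, and the placeholder startLine of 1 are simplifying assumptions.

```python
import json


def to_sarif(findings, tool_name="guarddog-sketch"):
    """Wrap (rule_id, message, file_path) findings in a minimal SARIF 2.1.0 log."""
    rule_ids = sorted({rule_id for rule_id, _, _ in findings})
    return {
        "version": "2.1.0",
        "runs": [{
            # The "driver" describes the tool and the rules it can report.
            "tool": {"driver": {
                "name": tool_name,
                "rules": [{"id": rule_id} for rule_id in rule_ids],
            }},
            # One result per finding, each pointing at the file it concerns.
            "results": [
                {
                    "ruleId": rule_id,
                    "level": "warning",
                    "message": {"text": message},
                    "locations": [{
                        "physicalLocation": {
                            "artifactLocation": {"uri": path},
                            "region": {"startLine": 1},
                        }
                    }],
                }
                for rule_id, message, path in findings
            ],
        }],
    }


if __name__ == "__main__":
    log = to_sarif([("npm-install-script", "postinstall script runs node ./setup.js", "package.json")])
    print(json.dumps(log, indent=2))
```

Because SARIF is a shared format, GitHub code scanning can display results from any tool that emits it, which is what makes the upload step in the workflow below tool-agnostic.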
You can start benefiting from the integration in two simple steps:
- Install and run GuardDog in your GitHub Actions workflow.
- Upload the resulting SARIF file to GitHub using the upload-sarif action.
The GitHub Actions workflow below scans your dependencies whenever a pull request targeting the main branch is opened or updated, as well as on every push to main:
name: GuardDog
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
permissions:
  contents: read
jobs:
  guarddog:
    permissions:
      contents: read # for actions/checkout to fetch code
      security-events: write # allow to upload scan results
    name: Scan all dependencies
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v3
        with:
          python-version: "3.10"
      - name: Install GuardDog
        run: pip install guarddog
      - run: guarddog pypi verify requirements.txt --output-format sarif --exclude-rules repository_integrity_mismatch > guarddog.sarif
      - name: Upload SARIF file for GitHub code scanning
        uses: github/codeql-action/upload-sarif@v2
        with:
          category: guarddog-builtin
          sarif_file: guarddog.sarif
When a malicious or compromised package is added to your dependencies, GuardDog will let you know about it with a comment on your pull request.
If the reported issue is a false positive, you can triage it directly from the GitHub UI using the "Dismiss alert" dropdown. Then, GuardDog will not notify you again for this dependency.
In addition to the four new npm-specific heuristics mentioned above, we added a more advanced PyPI metadata heuristic, repository_integrity_mismatch. This heuristic compares the contents of a PyPI package with the files in its GitHub repository and flags any file that differs, or that is present in the package without ever having been committed to GitHub.
This heuristic can help teams identify an attacker who compromises a maintainer's PyPI account and publishes a malicious version of the package directly to PyPI.
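The core comparison can be sketched in a few lines of Python: hash every file in an unpacked package and in a checkout of its repository, then report files whose contents differ and files that exist only in the package. This is a hypothetical illustration, not GuardDog's implementation. Real packages also contain generated files (such as PKG-INFO) and may lay files out differently from the repository, which a practical check must account for and this sketch ignores.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digests(root):
    """Map each file's path (relative to root) to the SHA-256 digest of its contents."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in root.rglob("*")
        if path.is_file()
    }


def integrity_mismatches(package_dir, repo_dir):
    """Return (modified, added): package files that differ from, or are absent in, the repo."""
    package_files = file_digests(package_dir)
    repo_files = file_digests(repo_dir)
    modified = sorted(
        path for path, digest in package_files.items()
        if path in repo_files and repo_files[path] != digest
    )
    added = sorted(path for path in package_files if path not in repo_files)
    return modified, added


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as pkg, tempfile.TemporaryDirectory() as repo:
        Path(pkg, "__init__.py").write_text("VERSION = '1.0'\n")
        Path(repo, "__init__.py").write_text("VERSION = '1.0'\n")
        Path(pkg, "utils.py").write_text("import os\nexfiltrate()\n")  # tampered copy
        Path(repo, "utils.py").write_text("import os\n")
        Path(pkg, "hook.py").write_text("# only in the published package\n")
        print(integrity_mismatches(pkg, repo))  # (['utils.py'], ['hook.py'])
```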
Documentation is notoriously challenging to keep up to date. The list and documentation of the heuristics supported by GuardDog are now automatically generated and injected into the README, so the latest information is always available.
This same mechanism applies to the new guarddog pypi list-rules and guarddog npm list-rules commands, so you can be sure their output is always up to date.
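The post does not describe how this generation works internally. As a hypothetical illustration of the general pattern, the Python sketch below renders a rule list as a Markdown table and splices it into a README between two marker comments. The marker strings, the rules mapping, and the function names are assumptions, not GuardDog's actual tooling.

```python
import re

# Hypothetical marker comments delimiting the generated section of the README.
START = "<!-- BEGIN_RULE_LIST -->"
END = "<!-- END_RULE_LIST -->"


def render_rules_table(rules):
    """Render a {rule_name: description} mapping as a Markdown table, sorted by rule name."""
    lines = ["| Rule | Description |", "|------|-------------|"]
    lines += [f"| {name} | {description} |" for name, description in sorted(rules.items())]
    return "\n".join(lines)


def inject_between_markers(readme, generated):
    """Replace whatever sits between START and END with freshly generated content."""
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    if not pattern.search(readme):
        raise ValueError("README is missing the generated-section markers")
    # A function replacement avoids re-interpreting backslashes in the generated text.
    return pattern.sub(lambda _: f"{START}\n{generated}\n{END}", readme, count=1)


if __name__ == "__main__":
    readme = f"# Project\n\n{START}\nstale content\n{END}\n"
    rules = {
        "npm-install-script": "Identifies packages that run a script automatically on install",
        "npm-serialize-environment": "Identifies packages that serialize process.env",
    }
    print(inject_between_markers(readme, render_rules_table(rules)))
```

Driving both the README and a list-rules style command from one rules mapping is what keeps the two consistent: there is a single source of truth to update.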
We'll be presenting GuardDog along with real-world analyses of malicious packages in the wild at Insomni'Hack 2023 in Lausanne, Switzerland. Insomni'Hack is one of the largest security conferences in Switzerland, welcoming a vibrant community of blue teamers, offensive security professionals, and security researchers every year. If you're in the area, we hope to see you there!
In the future, we'll classify GuardDog heuristics into several confidence buckets. This will make it easier to run only low-false-positive heuristics (e.g., in CI) or a more complete set (e.g., for ad hoc assessments), depending on your specific needs.
We'll also add more heuristics so GuardDog keeps catching the most recent supply-chain malware families. Interested in real-world malware samples? Have a look at our GitHub repository, where we share with the community over 270 malicious packages that GuardDog has already identified in the wild.