Announcing GuardDog 1.0, with npm support, new heuristics, and easier CI integration

February 14, 2023

In November 2022, we released GuardDog, an open source project that helps identify malicious Python packages using Semgrep and package metadata analysis.

Today, we're excited to release a new major version of GuardDog, v1.0, with a number of new features that we describe below. Head over to the GitHub repository, and refer to the release notes if you're an existing user.

For this occasion, GuardDog is also getting a logo!

The new GuardDog logo

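In this post, we'll discuss how this new version of GuardDog adds support for scanning npm packages, integrating with GitHub code scanning in CI, verifying that PyPI package contents match the source code on GitHub, and generating the heuristics list automatically from code.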

Scanning npm packages

This new version of GuardDog can scan not only PyPI packages but also npm packages. For example, to scan the react package:

guarddog npm scan react

You can also use guarddog verify to scan all the dependencies listed in a package.json file:

guarddog npm verify package.json

We wrote several new heuristics to scan npm packages:

  • npm-serialize-environment identifies when a package serializes process.env to exfiltrate environment variables.
  • npm-silent-process-execution identifies when a package silently executes an executable file.
  • npm-exec-base64 identifies when a package dynamically executes code through the eval function.
  • npm-install-script identifies packages that, when installed, would trigger a pre-install or post-install script to run automatically.
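
To make the last heuristic more concrete, here is a minimal Python sketch of the idea behind npm-install-script. It is an illustration only, not GuardDog's actual implementation: the function name find_install_scripts and the sample manifest are hypothetical. It also checks the install hook, which npm runs automatically alongside preinstall and postinstall.

```python
import json

# Lifecycle hooks that npm runs automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_scripts(package_json_text: str) -> dict:
    """Return the install-time lifecycle scripts declared in a package.json document."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts") or {}
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


# Hypothetical manifest used only for demonstration.
example = json.dumps({
    "name": "hypothetical-package",
    "version": "1.0.0",
    "scripts": {
        "test": "jest",
        "postinstall": "node collect.js",
    },
})

print(find_install_scripts(example))
# prints {'postinstall': 'node collect.js'}
```

A declared install script is not malicious in itself, since many legitimate packages use one to compile native code. A match is therefore a signal to review what the script does, not a verdict.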

In addition to these new npm-specific heuristics, GuardDog supports all existing package metadata heuristics, such as typosquatting detection. Have a look at the README, or run guarddog npm list-rules for more information.

Easy integration in CI with GitHub code scanning

One of GuardDog's strengths is its ease of use, which makes it a great choice for running in a continuous integration (CI) pipeline. For instance, you may want to scan new dependencies introduced by a pull request to make sure they are not malicious. Alternatively, you could consider automatically scanning your dependencies on a monthly basis to ensure that none of them have been compromised.

GuardDog now supports writing scan results to a SARIF file, making it straightforward to integrate with GitHub code scanning. GitHub code scanning lets third-party tools such as GuardDog surface their findings in the GitHub UI, for instance by automatically posting pull request comments and allowing you to mark scan results as false positives.

You can start benefiting from the integration with two simple steps:

  1. Install and run GuardDog inside your GitHub action.
  2. Upload the resulting SARIF file to GitHub using the upload-sarif action.

The GitHub Actions workflow below scans your dependencies on every push to the main branch and every time a pull request is opened against it:

name: GuardDog

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  contents: read

jobs:
  guarddog:
    permissions:
      contents: read # for actions/checkout to fetch code
      security-events: write # allow uploading scan results
    name: Scan all dependencies
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v3
        with:
          python-version: "3.10"

      - name: Install GuardDog
        run: pip install guarddog

      - run: guarddog pypi verify requirements.txt --output-format sarif --exclude-rules repository_integrity_mismatch > guarddog.sarif

      - name: Upload SARIF file for GitHub code scanning
        uses: github/codeql-action/upload-sarif@v2
        with:
          category: guarddog-builtin
          sarif_file: guarddog.sarif

When a malicious or compromised package is added to your dependencies, GuardDog will let you know about it by commenting on your pull request:

GuardDog commenting on a GitHub pull request through the GitHub code scanning integration

If the reported issue is a false positive, you can triage it directly from the GitHub UI using the "Dismiss alert" dropdown. Then, GuardDog will not notify you again for this dependency.
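
If you want to look at the findings outside of the GitHub UI, a SARIF file is plain JSON and can be summarized in a few lines. The sketch below is a rough illustration that relies only on the standard SARIF 2.1.0 layout (runs, results, ruleId, message.text). The function name summarize_sarif and the sample content are hypothetical, and real GuardDog output will contain different findings and more fields.

```python
import json


def summarize_sarif(sarif_text: str) -> list:
    """Return (rule_id, message) pairs for every result in a SARIF log."""
    log = json.loads(sarif_text)
    findings = []
    for run in log.get("runs", []):
        for result in run.get("results") or []:
            rule_id = result.get("ruleId", "<unknown rule>")
            message = result.get("message", {}).get("text", "")
            findings.append((rule_id, message))
    return findings


# Hypothetical SARIF content for illustration; real GuardDog output will differ.
sample = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "GuardDog"}},
        "results": [{
            "ruleId": "typosquatting",
            "message": {"text": "Hypothetical finding for demonstration"},
        }],
    }],
})

for rule_id, message in summarize_sarif(sample):
    print(f"{rule_id}: {message}")
```

To apply this to the file produced by the workflow above, pass the contents of guarddog.sarif to summarize_sarif instead of the sample string.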

Verifying that PyPI package contents match the source code on GitHub

In addition to the four new npm-specific heuristics mentioned above, we added a more advanced PyPI metadata heuristic, repository_integrity_mismatch. This heuristic compares the contents of the PyPI package with the source code available on GitHub and flags any file that differs or has been added without being committed to GitHub.

This heuristic can help teams identify an attacker who compromises a maintainer's PyPI account and publishes a malicious version of the package directly to PyPI.
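
The comparison described above can be sketched as a digest-by-digest diff of two directory trees. This is a simplified illustration under our own assumptions, not GuardDog's actual code: it assumes the package and the repository have already been downloaded and extracted locally, and the names file_digests and integrity_mismatches are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digests(root: Path) -> dict:
    """Map each file's path (relative to root) to the SHA-256 digest of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in root.rglob("*")
        if path.is_file()
    }


def integrity_mismatches(package_dir: Path, repo_dir: Path) -> dict:
    """Flag files in the package that differ from, or are absent in, the repository."""
    package, repo = file_digests(package_dir), file_digests(repo_dir)
    return {
        "modified": sorted(p for p in package if p in repo and package[p] != repo[p]),
        "added": sorted(p for p in package if p not in repo),
    }


# Build two throwaway directory trees to demonstrate the comparison.
with tempfile.TemporaryDirectory() as tmp:
    package_dir, repo_dir = Path(tmp, "package"), Path(tmp, "repo")
    package_dir.mkdir()
    repo_dir.mkdir()
    (repo_dir / "setup.py").write_text("print('build')\n")
    (package_dir / "setup.py").write_text("print('build')\nimport os\n")
    (repo_dir / "core.py").write_text("VALUE = 1\n")
    (package_dir / "core.py").write_text("VALUE = 1\n")
    (package_dir / "extra.py").write_text("print('not in the repository')\n")
    report = integrity_mismatches(package_dir, repo_dir)

print(report)
# prints {'modified': ['setup.py'], 'added': ['extra.py']}
```

Files that exist only in the repository are deliberately ignored here, because the concern is code that ships in the package without ever having been committed.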

Heuristics list automatically generated from code

Documentation is notoriously challenging to keep up to date. The list and documentation of heuristics supported by GuardDog are now automatically generated and injected into the README, so the latest information is always available.

This same mechanism applies to the new guarddog pypi list-rules and guarddog npm list-rules commands, so you can be sure the output of the list-rules commands is always up to date.
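
One common way to build this kind of mechanism is to keep rule descriptions next to the code and regenerate a marked region of the README from them. The Python sketch below illustrates that general pattern only. The RULES registry, the marker comments, and the function names are all hypothetical and do not reflect how GuardDog stores or injects its rule metadata.

```python
# Hypothetical rule registry; GuardDog's real metadata source may look different.
RULES = {
    "npm-install-script": "Identifies packages that run a script automatically on install.",
    "npm-serialize-environment": "Identifies packages that serialize process.env.",
}

# Hypothetical markers delimiting the generated region of the README.
START, END = "<!-- BEGIN_RULE_LIST -->", "<!-- END_RULE_LIST -->"


def render_rules(rules: dict) -> str:
    """Render one line per rule, sorted by name so the output is stable."""
    return "\n".join(f"- {name}: {text}" for name, text in sorted(rules.items()))


def inject_rules(readme: str, rules: dict) -> str:
    """Replace whatever sits between the START and END markers with a fresh rule list."""
    if START not in readme or END not in readme:
        raise ValueError("README is missing the rule list markers")
    before, _, rest = readme.partition(START)
    _, _, after = rest.partition(END)
    return f"{before}{START}\n{render_rules(rules)}\n{END}{after}"


readme = f"# Example\n\n{START}\nstale content\n{END}\n\nMore text.\n"
updated = inject_rules(readme, RULES)
print(updated)
```

Because the generated region is fully replaced on each run, running the script twice produces the same README, which makes it safe to run automatically in CI.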

GuardDog at Insomni'Hack 2023

We'll be presenting GuardDog along with real-world analyses of malicious packages in the wild at Insomni'Hack 2023 in Lausanne, Switzerland. Insomni'Hack is one of the largest security conferences in Switzerland, welcoming a vibrant community of blue teamers, offensive security professionals, and security researchers every year. If you're in the area, we hope to see you there!

What's next

In the future, we'll classify GuardDog heuristics into several confidence buckets. This will make it easier to run only low-false-positive heuristics (e.g., in CI) or more complete ones (e.g., for ad hoc assessments), depending on your specific needs.

We'll also add more heuristics so GuardDog keeps catching the most recent supply-chain malware families. Interested in real-world malware samples? Have a look at our GitHub repository, where we share with the community over 270 malicious packages that GuardDog has already identified in the wild.

We're also eager to hear what you think and how you're using GuardDog. Feel free to open an issue on GitHub, or write to us at securitylabs@datadoghq.com.
