Semgrep Static Analysis: Custom Rules for Your Codebase

Semgrep is a fast, open-source static analysis tool that matches code patterns using rules written in a syntax that closely resembles the code being analyzed. Unlike traditional SAST tools that need a build step or a dedicated server, Semgrep runs directly against source files and usually produces results quickly, with scan time depending on codebase size and rule count. Its main advantage is the ability to write custom rules that encode your team's specific security requirements and coding standards. This guide covers rule writing, CI integration, triage workflows, and how Semgrep compares to SonarQube.

Installation

# pip
pip install semgrep

# Homebrew
brew install semgrep

# Docker
docker pull semgrep/semgrep

Verify:

semgrep --version

Running Semgrep with the Rule Registry

Semgrep maintains a registry of thousands of community and professionally maintained rules. Running a registry ruleset against your codebase takes one command per ruleset:

semgrep --config=p/security-audit .
semgrep --config=p/owasp-top-ten .
semgrep --config=p/nodejs-security .
semgrep --config=p/python-security .

For a quick audit of a JavaScript/TypeScript project:

semgrep --config=p/javascript --config=p/typescript --config=p/react .

Output includes the file path, line number, rule ID, severity, and a snippet of the matching code. The --json flag produces structured output for pipeline integration.
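To show what consuming that structured output can look like, here is a minimal Python sketch that summarizes a --json report. It assumes the report has a top-level results list whose entries carry check_id, path, start.line, and extra.severity; the function name summarize and the sample data are illustrative, not part of Semgrep.

```python
import json
from collections import Counter

def summarize(report_text):
    """Return (per-severity counts, one-line summaries) for a Semgrep JSON report."""
    report = json.loads(report_text)
    counts = Counter()
    lines = []
    for finding in report.get("results", []):
        severity = finding.get("extra", {}).get("severity", "UNKNOWN")
        counts[severity] += 1
        lines.append(
            f'{finding["path"]}:{finding["start"]["line"]} '
            f'[{severity}] {finding["check_id"]}'
        )
    return dict(counts), lines

# Hand-written sample in the assumed shape, not real scan output.
SAMPLE = json.dumps({
    "results": [
        {"check_id": "eval-user-input", "path": "src/app.js",
         "start": {"line": 12}, "extra": {"severity": "ERROR"}},
        {"check_id": "express-route-missing-auth", "path": "src/routes.js",
         "start": {"line": 40}, "extra": {"severity": "WARNING"}},
    ]
})

counts, lines = summarize(SAMPLE)
print(counts)
for line in lines:
    print(line)
```

In a pipeline the same function could be fed the contents of a file written with --json and --output.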

Writing Custom Rules

This is where Semgrep's real value lies. A Semgrep rule is a YAML file describing a pattern to match, a message, and metadata.

Anatomy of a rule:

rules:
  - id: hardcoded-api-key
    patterns:
      - pattern: |
          const $VAR = "..."
      - metavariable-regex:
          metavariable: $VAR
          regex: ".*(key|token|secret|password|api_key).*"
    message: >
      Potential hardcoded credential in variable '$VAR'. Use environment variables
      or a secrets manager instead.
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      category: security
      cwe: CWE-798

Key concepts:

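Metavariables ($VAR, $EXPR) match a code element such as an expression or identifier and bind it, so it can be referenced elsewhere in the rule; reusing the same metavariable requires the same code at each position.
patterns (list) requires all sub-patterns to match (AND logic).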
pattern-either matches if any sub-pattern matches (OR logic).
pattern-not excludes matches.
pattern-inside constrains a match to within a larger pattern.
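The way these operators combine is ordinary boolean logic. The following Python sketch is a toy model only: it uses regular expressions over a single line of text as stand-ins for patterns, whereas Semgrep matches against the parsed syntax tree. All function names here are hypothetical.

```python
import re

def pattern(regex):
    # Stand-in for a Semgrep pattern: a regex search over one line of code.
    return lambda code: re.search(regex, code) is not None

def all_of(*checks):
    # Models "patterns": every sub-pattern must match (AND).
    return lambda code: all(check(code) for check in checks)

def any_of(*checks):
    # Models "pattern-either": at least one sub-pattern must match (OR).
    return lambda code: any(check(code) for check in checks)

def not_matching(check):
    # Models "pattern-not": the sub-pattern must not match.
    return lambda code: not check(code)

# eval(...) with any argument, except a plain double-quoted string literal.
eval_rule = all_of(
    pattern(r"\beval\(.+\)"),
    not_matching(pattern(r"\beval\(\"[^\"]*\"\)")),
)

# Either an app.* call or a router.* call.
route_call = any_of(pattern(r"\bapp\.\w+\("), pattern(r"\brouter\.\w+\("))

print(eval_rule("eval(userInput);"))
print(eval_rule('eval("console.log(1)");'))
print(route_call('router.get("/api/users", handler)'))
```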

Example: Detecting eval calls with non-literal arguments, which become code injection when the argument is user-controlled:

rules:
  - id: eval-user-input
    patterns:
      - pattern: eval($X)
      - pattern-not: eval("...")
    message: "eval() called with non-literal argument — potential code injection"
    severity: ERROR
    languages: [javascript, typescript, python]
    metadata:
      cwe: CWE-95

Example: Enforcing parameterized queries in Node.js:

rules:
  - id: raw-sql-concatenation
    patterns:
      - pattern: |
          $DB.query("..." + $INPUT)
      - pattern-either:
          - pattern-inside: |
              app.$METHOD($PATH, function($REQ, $RES) { ... })
          - pattern-inside: |
              router.$METHOD($PATH, async ($REQ, $RES) => { ... })
    message: >
      SQL query built with string concatenation inside a route handler.
      Use parameterized queries to prevent SQL injection.
    severity: ERROR
    languages: [javascript]
    metadata:
      cwe: CWE-89

Example: Detecting missing authorization checks in Express routes:

rules:
  - id: express-route-missing-auth
    patterns:
      - pattern: |
          app.$METHOD($PATH, async ($REQ, $RES) => { ... })
      - pattern-not: |
          app.$METHOD($PATH, $AUTH, async ($REQ, $RES) => { ... })
      - metavariable-regex:
          metavariable: $PATH
          # accept double-quoted, single-quoted, and backtick-quoted paths
          regex: '^["''`]/api/.*'
    message: "API route '$PATH' may be missing an authentication middleware"
    severity: WARNING
    languages: [javascript, typescript]

Testing Your Rules

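Semgrep has a built-in test framework. Put a test file next to each rule file with the same base name (for example, eval-user-input.yaml and eval-user-input.js), and annotate the line directly above each expected match or non-match with a comment: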

// ruleid: eval-user-input
eval(userInput);          // should match

// ok: eval-user-input
eval("console.log(1)");   // should NOT match

Run tests:

semgrep --test rules/ --test-ignore-todo

This is critical for avoiding false positives and false negatives before deploying rules to CI.
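To make the annotation convention concrete, the Python sketch below reads a test file and lists which lines are expected to match. It is an illustration of the convention (each annotation applies to the line after it), not Semgrep's own test runner, and the function name expected_results is hypothetical.

```python
import re

# Matches "// ruleid: some-id" or "# ok: some-id" style annotation comments.
ANNOTATION = re.compile(r"^\s*(?://|#)\s*(ruleid|ok):\s*([\w.-]+)")

def expected_results(source):
    """Return (rule_id, line_number, should_match) for each annotation.

    The line number is the line after the annotation, where the rule
    is expected to fire (ruleid) or stay silent (ok).
    """
    expectations = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = ANNOTATION.match(line)
        if match:
            kind, rule_id = match.groups()
            expectations.append((rule_id, number + 1, kind == "ruleid"))
    return expectations

TEST_FILE = """\
// ruleid: eval-user-input
eval(userInput);

// ok: eval-user-input
eval("console.log(1)");
"""

for rule_id, line, should_match in expected_results(TEST_FILE):
    print(rule_id, line, "match" if should_match else "no match")
```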

Running Semgrep in CI

GitHub Actions

name: Semgrep SAST

on:
  pull_request: {}
  push:
    branches: [main]

jobs:
  semgrep:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write   # needed by the SARIF upload step
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4

      - name: Run Semgrep
        # semgrep scan takes its rules from --config. semgrep ci with a
        # SEMGREP_APP_TOKEN pulls rules from the platform and rejects --config.
        run: |
          semgrep scan \
            --config=p/security-audit \
            --config=./semgrep-rules/ \
            --sarif \
            --output=semgrep.sarif \
            --error

      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: semgrep.sarif

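The --error flag makes Semgrep exit non-zero when findings are present, which fails the job and blocks the PR. Remove it to report findings without blocking, or add --severity ERROR so that only ERROR-severity findings are reported and can fail the build; note that this filters warnings out of the results rather than merely letting them pass.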
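If warnings should stay visible in the report while only errors block the build, one option is to run without --error and decide the exit status in a small script over the JSON output. The Python sketch below shows the idea; it assumes a --json report with extra.severity on each result, and the names BLOCKING and exit_code are hypothetical.

```python
import json

# Assumption of this sketch: only ERROR-severity findings fail the build.
BLOCKING = frozenset({"ERROR"})

def exit_code(report_text, blocking=BLOCKING):
    """Return 1 if the report contains a blocking finding, else 0."""
    report = json.loads(report_text)
    blockers = [
        finding for finding in report.get("results", [])
        if finding.get("extra", {}).get("severity") in blocking
    ]
    return 1 if blockers else 0

# Hand-written samples in the assumed report shape.
WARN_ONLY = json.dumps({"results": [
    {"check_id": "express-route-missing-auth", "extra": {"severity": "WARNING"}},
]})
WITH_ERROR = json.dumps({"results": [
    {"check_id": "express-route-missing-auth", "extra": {"severity": "WARNING"}},
    {"check_id": "raw-sql-concatenation", "extra": {"severity": "ERROR"}},
]})

print(exit_code(WARN_ONLY))
print(exit_code(WITH_ERROR))
```

A CI step would pass the returned value to sys.exit after reading the report file.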

For teams using Semgrep AppSec Platform (the managed SaaS version), run semgrep ci with SEMGREP_APP_TOKEN set. Rules are then configured in the platform rather than passed with --config, and findings are tracked across runs with deduplication, assignment, and metrics.

Pre-commit Hook

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/returntocorp/semgrep
    rev: v1.60.0
    hooks:
      - id: semgrep
        args: ['--config=p/security-audit', '--error']

Pre-commit hooks catch issues before they reach CI, giving developers faster feedback.

Triage Strategy

Raw Semgrep output on a large codebase can be overwhelming. A practical triage approach:

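1. Baseline suppression. On first run, save the current findings so you have a record of the existing backlog. On later runs, pass --baseline-commit so that Semgrep reports only findings introduced since that Git commit. The comparison is made against the commit itself, not against the saved JSON file, and HEAD~1 below is a placeholder; in CI, the merge base with the main branch is usually a more useful baseline:

semgrep --config=p/security-audit --json . > baseline.json
semgrep --config=p/security-audit --json --baseline-commit=HEAD~1 .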

2. In-line suppression for accepted risk. When a finding is a known false positive or accepted risk, suppress it with a comment:

// nosemgrep: eval-user-input
eval(safeTemplate);

3. Rule severity tuning. Demote noisy rules from ERROR to WARNING in your custom configuration rather than suppressing findings wholesale.

4. Triage by rule, not by finding. If a rule generates 50 findings across the codebase, evaluate the rule's accuracy before triaging individual hits. A 30% true positive rate means the rule needs refinement.
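Triage by rule comes down to simple arithmetic over review results. The Python sketch below computes a true positive rate per rule from manually triaged findings and lists the rules at or below a threshold. The triage data and the 30% cutoff are hypothetical, chosen to mirror the example in the text.

```python
from collections import defaultdict

def true_positive_rates(triaged):
    """triaged: iterable of (rule_id, is_true_positive) pairs from manual review."""
    totals = defaultdict(lambda: [0, 0])  # rule_id -> [true positives, total]
    for rule_id, is_true_positive in triaged:
        totals[rule_id][0] += int(is_true_positive)
        totals[rule_id][1] += 1
    return {rule: tp / total for rule, (tp, total) in totals.items()}

# Hypothetical review results: 15 of 50 hits confirmed for one rule,
# 9 of 10 for another.
triaged = (
    [("raw-sql-concatenation", True)] * 15
    + [("raw-sql-concatenation", False)] * 35
    + [("eval-user-input", True)] * 9
    + [("eval-user-input", False)] * 1
)

rates = true_positive_rates(triaged)
needs_refinement = sorted(rule for rule, rate in rates.items() if rate <= 0.30)

for rule, rate in rates.items():
    print(f"{rule}: {rate:.0%}")
print("refine first:", needs_refinement)
```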

Semgrep vs. SonarQube

Dimension	Semgrep	SonarQube
Setup complexity	Low (single binary, no DB)	High (server, DB, scanner)
Rule customization	Excellent — rules look like code	Complex Java/XML rule writing
Languages	30+ with pattern-level support	30+ with deeper semantic analysis
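Semantic analysis	Pattern matching on the syntax tree, plus optional taint mode	Deep (data flow, taint tracking)
Taint analysis	Within a single function in the open-source engine; cross-function and cross-file in Pro	Available in Developer+ editions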
Open source	Yes (core engine)	Community Edition only
CI integration	Native, lightweight	Requires SonarScanner agent
Pricing	Free (OSS), $40+/dev/month (Pro)	Free (Community); paid editions priced by lines of code analyzed
False positive rate	Low for pattern rules; varies for taint	Moderate — known for noisy findings

When to choose Semgrep: You want fast code-pattern matching with custom rules that your team writes and owns. You need lightweight CI integration without a persistent server. You are primarily focused on security rather than code quality metrics.

When to choose SonarQube: You need deep semantic analysis with data-flow tracking. You want code quality metrics (duplication, complexity, test coverage) alongside security. You are in an enterprise environment that already standardizes on SonarQube.

The two tools can also be run in parallel — Semgrep for fast, targeted security checks in the PR diff, and SonarQube for comprehensive quality gates on the full codebase.

Building a Rule Library for Your Codebase

For organizations building internal rule libraries, a recommended structure:

semgrep-rules/
  security/
    authentication.yaml
    authorization.yaml
    injection.yaml
    secrets.yaml
  standards/
    error-handling.yaml
    logging.yaml
    api-design.yaml
  third-party/
    custom-sdk-misuse.yaml

Version-control these rules in a dedicated repository, reference them in CI configs across all repos, and require a security review for rule changes. This creates a living, codebase-specific security specification that improves with every new vulnerability class your team encounters.
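A review gate for the rule repository can include a small consistency check. The Python sketch below flags rules that lack the basic fields used throughout this guide and rule IDs defined more than once across files. It operates on already-parsed YAML (for instance loaded with a YAML library of your choice, which is an assumption of this sketch), and the function name and sample data are hypothetical.

```python
# Fields every rule in this guide carries, alongside its pattern.
REQUIRED_KEYS = {"id", "message", "severity", "languages"}

def validate_rule_files(rule_files):
    """rule_files: mapping of file path -> parsed YAML document with a 'rules' list."""
    problems = []
    seen_ids = {}  # rule ID -> first file that defined it
    for path, document in rule_files.items():
        for rule in document.get("rules", []):
            rule_id = rule.get("id", "<missing id>")
            missing = REQUIRED_KEYS - rule.keys()
            if missing:
                problems.append(f"{path}: {rule_id} is missing {sorted(missing)}")
            if rule_id in seen_ids:
                problems.append(f"{path}: {rule_id} already defined in {seen_ids[rule_id]}")
            else:
                seen_ids[rule_id] = path
    return problems

# Hand-written stand-ins for parsed rule files.
parsed = {
    "semgrep-rules/security/injection.yaml": {"rules": [
        {"id": "raw-sql-concatenation", "message": "m",
         "severity": "ERROR", "languages": ["javascript"]},
    ]},
    "semgrep-rules/security/secrets.yaml": {"rules": [
        {"id": "raw-sql-concatenation", "message": "m",
         "severity": "ERROR", "languages": ["javascript"]},
        {"id": "hardcoded-api-key", "severity": "ERROR",
         "languages": ["javascript"]},
    ]},
}

problems = validate_rule_files(parsed)
for problem in problems:
    print(problem)
```

This complements rather than replaces Semgrep's own checks, such as running the rule tests in CI for the rules repository.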