Semgrep Static Analysis: Custom Rules for Your Codebase
How to write Semgrep rules, run static analysis in CI, triage findings effectively, and how Semgrep compares to SonarQube for developer security.
Semgrep is a fast, open-source static analysis tool that matches code patterns using rules written in a syntax that closely resembles the code being analyzed. Unlike traditional SAST tools that require deep compilation or complex setup, Semgrep runs directly against source files and produces results in seconds. Its real advantage is the ability to write custom rules that encode your team's specific security requirements and coding standards. This guide covers rule writing, CI integration, triage workflows, and how Semgrep compares to SonarQube.
Installation
# pip
pip install semgrep
# Homebrew
brew install semgrep
# Docker
docker pull returntocorp/semgrep
Verify:
semgrep --version
Running Semgrep with the Rule Registry
Semgrep maintains a registry of thousands of community and professionally-maintained rules. Running the security-focused registry rules against your codebase takes one command:
semgrep --config=p/security-audit .
semgrep --config=p/owasp-top-ten .
semgrep --config=p/nodejs-security .
semgrep --config=p/python-security .
For a quick audit of a JavaScript/TypeScript project:
semgrep --config=p/javascript --config=p/typescript --config=p/react .
Output includes the file path, line number, rule ID, severity, and a snippet of the matching code. The --json flag produces structured output for pipeline integration.
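As a rough illustration of consuming the --json output in a pipeline, the sketch below turns a report into one line per finding. It assumes the report has a top-level results array whose entries carry check_id, path, start.line, and extra.severity; the sample object is invented for the example, and formatFindings is a hypothetical helper name.

```javascript
// Hypothetical sample shaped like `semgrep --json` output; all values are invented.
const sampleReport = {
  results: [
    {
      check_id: "eval-user-input",
      path: "src/render.js",
      start: { line: 12 },
      extra: { severity: "ERROR", message: "eval() called with non-literal argument" },
    },
    {
      check_id: "express-route-missing-auth",
      path: "src/routes.js",
      start: { line: 40 },
      extra: { severity: "WARNING", message: "API route may be missing auth" },
    },
  ],
};

// Turn each finding into "path:line  severity  rule-id".
function formatFindings(report) {
  return (report.results || []).map(
    (r) => `${r.path}:${r.start.line}  ${r.extra.severity}  ${r.check_id}`
  );
}

console.log(formatFindings(sampleReport).join("\n"));
```

In a real pipeline the report would come from a file written by the scan, for example via JSON.parse(fs.readFileSync("semgrep.json", "utf8")), rather than from an inline object.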
Writing Custom Rules
This is where Semgrep's real value lies. A Semgrep rule is a YAML file describing a pattern to match, a message, and metadata.
Anatomy of a rule:
rules:
  - id: hardcoded-api-key
    patterns:
      - pattern: |
          const $VAR = "..."
      - metavariable-regex:
          metavariable: $VAR
          regex: ".*(key|token|secret|password|api_key).*"
    message: >
      Potential hardcoded credential in variable '$VAR'. Use environment variables
      or a secrets manager instead.
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      category: security
      cwe: CWE-798
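Before shipping a name-based rule like this, it can help to sanity-check the regex against realistic variable names. The sketch below does that in plain JavaScript rather than in Semgrep itself, so it only approximates the rule: the leading ^ models the left-anchored matching that Semgrep documents for metavariable-regex, and the list of names is made up.

```javascript
// The name filter from the rule, with ^ standing in for Semgrep's left anchoring.
const asWritten = /^.*(key|token|secret|password|api_key).*/;

// Same expression with case-insensitive matching switched on.
const caseInsensitive = new RegExp(asWritten.source, "i");

// Hypothetical variable names a codebase might contain.
const names = ["api_key", "API_KEY", "apiKey", "authToken", "monkey"];

// Return the names a given regex accepts.
const matching = (regex) => names.filter((name) => regex.test(name));

console.log(matching(asWritten));       // api_key and monkey only
console.log(matching(caseInsensitive)); // all five names
```

Two things stand out under these assumptions: the case-sensitive form misses common spellings such as API_KEY and apiKey, and both forms accept monkey. An inline (?i) prefix is one common way to request case-insensitive matching in PCRE-style regexes, assuming the Semgrep version in use accepts it; tightening the alternation is a separate tuning step.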
Key concepts:
- Metavariables ($VAR, $EXPR) match any code fragment and can be referenced elsewhere in the rule.
- patterns (list) requires all sub-patterns to match.
- pattern-either matches if any sub-pattern matches (OR logic).
- pattern-not excludes matches.
- pattern-inside constrains a match to within a larger pattern.
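The way these operators combine can be pictured with a toy model. The sketch below is not how Semgrep works internally (Semgrep matches syntax trees, not substrings); it only mirrors the boolean logic of patterns, pattern-either, and pattern-not, and every name in it is hypothetical.

```javascript
// Toy model: a "pattern" is a predicate over a snippet of source text.
const pattern = (needle) => (code) => code.includes(needle);

// patterns: every sub-pattern must hold (AND).
const all = (...ps) => (code) => ps.every((p) => p(code));

// pattern-either: at least one sub-pattern must hold (OR).
const either = (...ps) => (code) => ps.some((p) => p(code));

// pattern-not: the sub-pattern must not hold.
const not = (p) => (code) => !p(code);

// "A dynamic-code sink is called, but not eval with a string literal."
const dynamicCodeRule = all(
  either(pattern("eval("), pattern("new Function(")),
  not(pattern('eval("'))
);

console.log(dynamicCodeRule("eval(userInput)"));      // true
console.log(dynamicCodeRule('eval("1 + 1")'));        // false
console.log(dynamicCodeRule("new Function(body)"));   // true
console.log(dynamicCodeRule("JSON.parse(userInput)")); // false
```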
Example: Detecting unsafe eval calls that use user input:
rules:
  - id: eval-user-input
    patterns:
      - pattern: eval($X)
      - pattern-not: eval("...")
    message: "eval() called with non-literal argument: potential code injection"
    severity: ERROR
    languages: [javascript, typescript, python]
    metadata:
      cwe: CWE-95
Example: Enforcing parameterized queries in Node.js:
rules:
  - id: raw-sql-concatenation
    patterns:
      - pattern: |
          $DB.query("..." + $INPUT)
      - pattern-either:
          - pattern-inside: |
              app.$METHOD($PATH, function($REQ, $RES) { ... })
          - pattern-inside: |
              router.$METHOD($PATH, async ($REQ, $RES) => { ... })
    message: >
      SQL query built with string concatenation inside a route handler.
      Use parameterized queries to prevent SQL injection.
    severity: ERROR
    languages: [javascript]
    metadata:
      cwe: CWE-89
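To make the target of this rule concrete, here is a small runnable sketch of the code shape it is written to flag, next to the parameterized alternative. The db and app objects are stand-in stubs (not Express or a real driver) so the file runs on its own, and the ? placeholder is an assumption; some drivers use $1, $2, and so on.

```javascript
// Stand-ins so the file runs without Express or a database driver (hypothetical stubs).
const recorded = [];
const db = {
  query(sql, params = []) {
    recorded.push({ sql, params });
    return [];
  },
};
const handlers = {};
const app = {
  get(path, handler) {
    handlers[path] = handler;
  },
};

// Concatenation inside a route handler: the shape raw-sql-concatenation is written to flag.
app.get("/users/unsafe", function (req, res) {
  res.rows = db.query("SELECT * FROM users WHERE id = " + req.query.id);
});

// Parameterized form: user input travels separately from the SQL text.
app.get("/users/safe", function (req, res) {
  res.rows = db.query("SELECT * FROM users WHERE id = ?", [req.query.id]);
});

// Drive both handlers with the same hostile input and compare what reaches the database.
const req = { query: { id: "1 OR 1=1" } };
handlers["/users/unsafe"](req, {});
handlers["/users/safe"](req, {});
console.log(recorded);
```

In the first call the attacker-controlled text becomes part of the SQL statement; in the second it stays in the parameter list, which is the behavior the rule's message recommends.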
Example: Detecting missing authorization checks in Express routes:
rules:
  - id: express-route-missing-auth
    patterns:
      - pattern: |
          app.$METHOD($PATH, async ($REQ, $RES) => { ... })
      - pattern-not: |
          app.$METHOD($PATH, $AUTH, async ($REQ, $RES) => { ... })
      - metavariable-regex:
          metavariable: $PATH
          regex: "^\"/api/.*\""
    message: "API route '$PATH' may be missing an authentication middleware"
    severity: WARNING
    languages: [javascript, typescript]
Testing Your Rules
Semgrep has a built-in test framework. Annotate your test files with comments:
// ruleid: eval-user-input
eval(userInput); // should match
// ok: eval-user-input
eval("console.log(1)"); // should NOT match
Run tests:
semgrep --test rules/ --test-ignore-todo
This is critical for avoiding false positives and false negatives before deploying rules to CI.
Running Semgrep in CI
GitHub Actions
name: Semgrep SAST
on:
  pull_request: {}
  push:
    branches: [main]
jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: returntocorp/semgrep
    steps:
      - uses: actions/checkout@v4
      - name: Run Semgrep
        run: |
          semgrep scan \
            --config=p/security-audit \
            --config=./semgrep-rules/ \
            --sarif \
            --output=semgrep.sarif \
            --error
      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: semgrep.sarif
The --error flag makes semgrep scan exit non-zero when findings are present, which fails the job and blocks the PR. To block only on ERROR-severity findings, add --severity ERROR; note that this filters the report itself, so WARNING findings are no longer listed. Dropping --error makes the scan advisory only.
For teams using Semgrep AppSec Platform (the managed SaaS version), run semgrep ci instead and set SEMGREP_APP_TOKEN as a repository secret to get findings tracked across runs with deduplication, assignment, and metrics. In that mode the rules come from the policies configured in the platform, so the --config flags are dropped.
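If the goal is to fail the build only on ERROR findings while still keeping warnings in the report, one option is to run the scan without a blocking flag, write JSON, and gate on it in a small script. The sketch below shows that idea under the assumption that each result carries extra.severity; blockingFindings and exitCodeFor are hypothetical helper names, and the sample report is invented.

```javascript
// Keep only findings whose severity is in the blocking list.
function blockingFindings(report, blocking = ["ERROR"]) {
  return (report.results || []).filter((r) => blocking.includes(r.extra.severity));
}

// 1 when at least one blocking finding exists, otherwise 0.
function exitCodeFor(report, blocking = ["ERROR"]) {
  return blockingFindings(report, blocking).length > 0 ? 1 : 0;
}

// Hypothetical report with one warning and one error.
const gateSample = {
  results: [
    { check_id: "express-route-missing-auth", path: "src/routes.js", start: { line: 40 }, extra: { severity: "WARNING" } },
    { check_id: "raw-sql-concatenation", path: "src/users.js", start: { line: 8 }, extra: { severity: "ERROR" } },
  ],
};

console.log(exitCodeFor(gateSample));                                  // 1
console.log(exitCodeFor({ results: [gateSample.results[0]] }));        // 0
console.log(exitCodeFor({ results: [gateSample.results[0]] }, ["ERROR", "WARNING"])); // 1
```

In a CI step the result would typically be assigned to process.exitCode after parsing the JSON file the scan produced.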
Pre-commit Hook
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/returntocorp/semgrep
    rev: v1.60.0
    hooks:
      - id: semgrep
        args: ['--config=p/security-audit', '--error']
Pre-commit hooks catch issues before they reach CI, which is faster feedback for developers.
Triage Strategy
Raw Semgrep output on a large codebase can be overwhelming. A practical triage approach:
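1. Baseline suppression. On first run, save a snapshot of all current findings as a record of the existing backlog. Semgrep's built-in baseline works from Git history rather than from that file: with --baseline-commit, a run reports only findings that are not already present at the given commit. HEAD~1 is used here for illustration; in CI the baseline is typically the commit the branch diverged from:
semgrep --config=p/security-audit --json . > baseline.json
semgrep --config=p/security-audit --json --baseline-commit=HEAD~1 .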
2. In-line suppression for accepted risk. When a finding is a known false positive or accepted risk, suppress it with a comment:
// nosemgrep: eval-user-input
eval(safeTemplate);
3. Rule severity tuning. Demote noisy rules from ERROR to WARNING in your custom configuration rather than suppressing findings wholesale.
4. Triage by rule, not by finding. If a rule generates 50 findings across the codebase, evaluate the rule's accuracy before triaging individual hits. A 30% true positive rate means the rule needs refinement.
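The per-rule view and the new-versus-known split described in the list above can be sketched against saved JSON reports. This is an illustration only: Semgrep's --baseline-commit performs the comparison natively, the identity key used here (rule ID, path, start line) is a simplifying assumption that breaks when lines shift, and the helper names and sample data are hypothetical.

```javascript
// Count findings per rule, noisiest rule first.
function countByRule(report) {
  const counts = new Map();
  for (const r of report.results || []) {
    counts.set(r.check_id, (counts.get(r.check_id) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Naive identity for a finding; real tooling would use something more stable.
const findingKey = (r) => `${r.check_id}|${r.path}|${r.start.line}`;

// Findings present in `current` but absent from `baseline`.
function newFindings(baseline, current) {
  const known = new Set((baseline.results || []).map(findingKey));
  return (current.results || []).filter((r) => !known.has(findingKey(r)));
}

// Invented sample reports.
const finding = (check_id, path, line) => ({ check_id, path, start: { line } });
const baselineReport = {
  results: [finding("eval-user-input", "a.js", 3), finding("eval-user-input", "b.js", 9)],
};
const currentReport = {
  results: [
    finding("eval-user-input", "a.js", 3),
    finding("eval-user-input", "b.js", 9),
    finding("raw-sql-concatenation", "c.js", 21),
  ],
};

console.log(countByRule(currentReport));
console.log(newFindings(baselineReport, currentReport));
```

Sorting by count puts the rules that most deserve an accuracy review at the top, which fits the advice to evaluate a noisy rule before working through its individual hits.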
Semgrep vs. SonarQube
| Dimension | Semgrep | SonarQube |
|---|---|---|
| Setup complexity | Low (single binary, no DB) | High (server, DB, scanner) |
| Rule customization | Excellent — rules look like code | Complex Java/XML rule writing |
| Languages | 30+ with pattern-level support | 30+ with deeper semantic analysis |
| Semantic analysis | Limited (pattern-based) | Deep (data flow, taint tracking) |
| Taint analysis | Intraprocedural in OSS; cross-function and cross-file in Pro tier | Available in Developer+ editions |
| Open source | Yes (core engine) | Community Edition only |
| CI integration | Native, lightweight | Requires SonarScanner agent |
| Pricing | Free (OSS), $40+/dev/month (Pro) | Free (Community), $150+/dev/year |
| False positive rate | Low for pattern rules; varies for taint | Moderate — known for noisy findings |
When to choose Semgrep: You want fast, code-pattern matching with custom rules that your team writes and owns. You need lightweight CI integration without a persistent server. You are primarily focused on security rather than code quality metrics.
When to choose SonarQube: You need deep semantic analysis with data-flow tracking. You want code quality metrics (duplication, complexity, test coverage) alongside security. You are in an enterprise environment that already standardizes on SonarQube.
The two tools can also be run in parallel — Semgrep for fast, targeted security checks in the PR diff, and SonarQube for comprehensive quality gates on the full codebase.
Building a Rule Library for Your Codebase
For organizations building internal rule libraries, a recommended structure:
semgrep-rules/
  security/
    authentication.yaml
    authorization.yaml
    injection.yaml
    secrets.yaml
  standards/
    error-handling.yaml
    logging.yaml
    api-design.yaml
  third-party/
    custom-sdk-misuse.yaml
Version-control these rules in a dedicated repository, reference them in CI configs across all repos, and require a security review for rule changes. This creates a living, codebase-specific security specification that improves with every new vulnerability class your team encounters.