Architecture & Design

Security Logging Best Practices: What to Log and How to Alert

A comprehensive guide to security logging—which authentication events, access failures, and data changes to capture, what sensitive data must never appear in logs, structured JSON logging patterns, and building effective anomaly-based alerting.

October 15, 2025 · 9 min read · ShipSafer Team

Logs as Security Infrastructure

Security logs serve three distinct purposes: detection (spotting an attack in progress), investigation (reconstructing what happened after an incident), and compliance (demonstrating that controls are operating as required). A logging strategy that serves all three purposes requires deliberate choices about what to capture, how to structure it, how long to retain it, and what patterns should trigger alerts.

The cost of poor logging becomes clear in incident response. When a breach is discovered, the first questions are: when did this start? What did the attacker access? How did they get in? Without comprehensive logs, these questions often go unanswered, leaving organizations unable to scope their breach notification obligations, unable to close the attack vector with confidence, and unable to demonstrate to regulators that they attempted to detect the intrusion.

What to Log

Authentication Events

Authentication is the entry point to your system. Every event here should be logged without exception.

Log these:

  • Successful login: user identifier, source IP, user agent, authentication method used (password, SSO, API key), timestamp
  • Failed login: user identifier (or the username that was attempted), source IP, user agent, failure reason (invalid password, account locked, MFA failure, account not found—log the reason in your system, but do not return the specific reason to the client to prevent user enumeration)
  • MFA challenge issued: user identifier, challenge type (TOTP, SMS, push)
  • MFA success and failure
  • Password change: user identifier, source IP, whether the change was self-initiated or admin-initiated
  • Password reset initiated and completed: same fields
  • Account lockout triggered and unlocked
  • Session created: session token ID (not the token value), user identifier, source IP, user agent
  • Session invalidated: session ID, reason (logout, expiry, admin revocation, suspicious activity)
  • API key created, rotated, and revoked: key ID (not the key value), user/service identifier

A single authentication log entry in structured JSON:

{
  "event_type": "auth.login.success",
  "timestamp": "2025-10-15T14:23:11.432Z",
  "user_id": "usr_01HXYZ123",
  "source_ip": "203.0.113.45",
  "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...",
  "auth_method": "password+totp",
  "session_id": "sess_01HABC456",
  "request_id": "req_01HDEF789",
  "geo": {
    "country": "US",
    "city": "San Francisco",
    "asn": 15169
  }
}

Authorization and Access Control Events

Authentication tells you who entered the system. Authorization logs tell you what they tried to do.

Log these:

  • Authorization check failure: who, what resource, what action, why denied (wrong role, wrong owner, resource not found vs. access denied—distinguish between 403 and 404 in server-side logs even if your API returns 404 for both)
  • Privilege escalation: any time a user assumes a higher-privilege role or context (sudo, assume-role, admin mode)
  • Access to sensitive resources: reads or writes of high-value data (PII, payment data, health records), logged regardless of success or failure
  • Admin actions: any action taken by a user in an administrative role—user creation, deletion, permission changes, configuration changes
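As a sketch, a server-side authorization check can record the true denial reason while still returning a uniform 404 to the client. The `logger` object and the resource shape here are illustrative stand-ins, not from any specific framework:

```javascript
// Illustrative authorization check: log the real denial reason server-side,
// but return a uniform 404 to the client to avoid leaking resource existence.
// `logger` is a stand-in for a structured logger (e.g. Pino).
function checkAccess(logger, user, resource, action) {
  if (!resource) {
    logger.warn({ event_type: 'authz.access_denied', user_id: user.id,
                  action, reason: 'resource_not_found' });
    return { status: 404 };
  }
  if (resource.ownerId !== user.id && !user.roles.includes('admin')) {
    logger.warn({ event_type: 'authz.access_denied', user_id: user.id,
                  resource_id: resource.id, action, reason: 'not_owner' });
    return { status: 404 }; // client sees 404; the log records the 403-equivalent reason
  }
  return { status: 200 };
}
```

The log entry distinguishes `resource_not_found` from `not_owner` even though the client cannot.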

Data Change Events

Data integrity depends on knowing when data changed, who changed it, and what the change was.

Log these:

  • Create, update, delete operations on sensitive models (user records, payment information, configuration)
  • Bulk operations (bulk export, bulk delete) with record counts
  • Schema or configuration changes
  • Data export operations: who exported what, how many records, to what destination

The standard pattern is an audit log table adjacent to the primary data model:

CREATE TABLE audit_log (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  timestamp   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  actor_id    TEXT NOT NULL,          -- user or service identifier
  actor_type  TEXT NOT NULL,          -- 'user', 'service', 'cron'
  action      TEXT NOT NULL,          -- 'create', 'update', 'delete'
  model       TEXT NOT NULL,          -- 'User', 'PaymentMethod'
  record_id   TEXT NOT NULL,          -- the affected record's ID
  changes     JSONB,                  -- {field: [old_value, new_value]}
  source_ip   INET,
  request_id  TEXT
);

The changes field should contain field-level diffs for updates. Be careful: changes should not contain the new value of sensitive fields like passwords or payment card numbers—only metadata indicating they changed.
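One way to build that `changes` payload is a field-level diff that masks sensitive fields, recording only the fact that they changed. The field names in `MASKED_FIELDS` are illustrative:

```javascript
// Build field-level diffs in the {field: [old, new]} shape used by audit_log.changes.
// Sensitive fields are recorded as changed, but their values are masked.
const MASKED_FIELDS = new Set(['password_hash', 'card_number']); // illustrative list

function buildChanges(before, after) {
  const changes = {};
  for (const key of new Set([...Object.keys(before), ...Object.keys(after)])) {
    if (before[key] === after[key]) continue;
    changes[key] = MASKED_FIELDS.has(key)
      ? ['[MASKED]', '[MASKED]'] // record that it changed, not the values
      : [before[key], after[key]];
  }
  return changes;
}
```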

Infrastructure and System Events

Log these:

  • Service startup and shutdown (abnormal shutdowns are particularly important)
  • Configuration file loads and changes
  • Certificate loads (and expirations approaching)
  • Dependency health check failures
  • Cron job start, completion, and failure

Network and API Events

For APIs, request-level logging provides visibility into usage patterns and potential abuse.

Log these:

  • All requests to authenticated endpoints: path, method, status code, latency, authenticated user/service, source IP
  • All 4xx responses (client errors), which are potential probing or abuse signals
  • All 5xx responses (server errors), for operational visibility
  • Rate limit hits
  • Large response bodies (potential data exfiltration signal if a single request returns 100MB)
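The list above can be sketched as an Express-style logging middleware that emits one structured entry per request and flags oversized responses. The size threshold and the injected `logger` are assumptions for illustration:

```javascript
// Express-style request logging middleware (sketch).
// Emits one structured log entry per request after the response finishes.
const LARGE_RESPONSE_BYTES = 100 * 1024 * 1024; // flag responses over 100MB

function requestLogger(logger) {
  return (req, res, next) => {
    const start = Date.now();
    res.on('finish', () => {
      const bytes = Number(res.getHeader('content-length')) || 0;
      logger.info({
        event_type: 'http.request',
        method: req.method,
        path: req.path,
        status: res.statusCode,
        latency_ms: Date.now() - start,
        user_id: req.user ? req.user.id : null,
        source_ip: req.ip,
        large_response: bytes > LARGE_RESPONSE_BYTES, // potential exfiltration signal
      });
    });
    next();
  };
}
```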

What NOT to Log

Logging the wrong data creates its own security problems. Logs are often more accessible than primary databases (shipped to log aggregators, accessible to engineers, less access-controlled). Sensitive data in logs can itself become a breach.

Never log:

  • Passwords (plaintext or hashed)
  • Session tokens (log session IDs, never the token value—a logged token can be replayed)
  • API keys (log key IDs/prefixes, never the full key)
  • Credit card numbers, CVVs, or any raw payment card data
  • Social security numbers, passport numbers, or government identifiers
  • Biometric data
  • Authentication secrets (TOTP seeds, private keys)
  • Encryption keys

Be careful with:

  • Email addresses (PII under GDPR—log where necessary but be aware it is PII)
  • Full request bodies (may contain passwords in login requests, PII in registration flows)
  • Query strings (may contain access tokens passed as URL parameters)
  • Headers (Authorization header contains bearer tokens or basic auth credentials)

Implement log scrubbing at the logging middleware layer, before logs are emitted:

const SENSITIVE_FIELDS = ['password', 'token', 'secret', 'authorization', 'credit_card', 'cvv'];

function scrubSensitiveFields(obj, depth = 0) {
  // Depth cap guards against cycles and pathologically nested payloads
  if (depth > 5 || obj === null || typeof obj !== 'object') return obj;

  // Recurse into arrays without converting them to plain objects
  if (Array.isArray(obj)) {
    return obj.map(item => scrubSensitiveFields(item, depth + 1));
  }

  return Object.fromEntries(
    Object.entries(obj).map(([key, value]) => {
      if (SENSITIVE_FIELDS.some(f => key.toLowerCase().includes(f))) {
        return [key, '[REDACTED]'];
      }
      return [key, scrubSensitiveFields(value, depth + 1)];
    })
  );
}

Structured JSON Logging

Plain text log lines are difficult to query and impossible to reliably parse. Structured JSON logs are machine-readable from the start, enabling powerful search and alerting.

Standard fields every log entry should have:

{
  "timestamp": "2025-10-15T14:23:11.432Z",    // ISO 8601 with milliseconds
  "level": "info",                             // debug|info|warn|error|critical
  "service": "api-server",                     // service/component name
  "version": "1.4.2",                          // service version
  "environment": "production",
  "request_id": "req_01HDEF789",               // trace ID for correlation
  "event_type": "auth.login.success",          // structured event taxonomy
  "message": "User login successful",          // human-readable summary
  // ... event-specific fields
}

The event_type field is critical for alerting. Use a consistent taxonomy: auth.login.success, auth.login.failure, authz.access_denied, data.export, admin.user_deleted. This makes alert rules simple regular expressions or exact matches rather than fragile text parsing.
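With a consistent taxonomy, an alert rule can match a whole event family with a simple prefix check instead of parsing free text. The rule shape here is illustrative:

```javascript
// Match structured events against alert rules by event_type prefix (sketch).
const rules = [
  { prefix: 'admin.', severity: 'medium' },          // any admin action
  { prefix: 'auth.login.failure', severity: 'high' }, // failed logins
];

function matchRules(event) {
  return rules.filter(r => event.event_type.startsWith(r.prefix));
}
```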

Node.js with Pino:

import pino from 'pino';

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  base: {
    service: 'api-server',
    version: process.env.APP_VERSION,
    environment: process.env.NODE_ENV,
  },
  timestamp: pino.stdTimeFunctions.isoTime,
  redact: {
    paths: ['req.headers.authorization', 'body.password', 'body.token'],
    censor: '[REDACTED]'
  }
});

// Usage
logger.info({
  event_type: 'auth.login.success',
  user_id: user.userId,
  session_id: session.id,
  source_ip: request.ip,
}, 'User login successful');

Alerting on Anomalies

Logs without alerting are a forensics tool, not a detection tool. Effective alerting requires defining what "normal" looks like so that deviations are detectable.

Threshold Alerts (Static)

Simple count-based rules:

  • Brute force attempt: >10 auth.login.failure for the same user in 5 min (High)
  • Credential stuffing: >100 auth.login.failure across any users in 1 min (Critical)
  • Privilege escalation: any admin.role_assumed event outside business hours (High)
  • Bulk data export: data.export with record_count > 10,000 (High)
  • Admin action: any admin.* event from a new IP (Medium)
  • Account lockout spike: >20 auth.account_locked in 10 min (High)
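The brute-force rule can be sketched as a sliding-window counter keyed by user. The threshold and window come from the rule above; the in-memory map is an illustrative stand-in for real storage (e.g. Redis):

```javascript
// Sliding-window threshold: alert on >10 login failures per user in 5 minutes.
const WINDOW_MS = 5 * 60 * 1000;
const THRESHOLD = 10;
const failures = new Map(); // user_id -> array of failure timestamps

function recordFailure(userId, now = Date.now()) {
  // Keep only failures inside the rolling window, then add the new one
  const recent = (failures.get(userId) || []).filter(t => now - t < WINDOW_MS);
  recent.push(now);
  failures.set(userId, recent);
  return recent.length > THRESHOLD; // true => fire the brute-force alert
}
```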

Anomaly-Based Alerts (Dynamic)

Static thresholds miss slow attacks and drift out of date as traffic patterns change. Dynamic alerts compare current behavior to a rolling baseline:

  • Login failure rate: alert if current 5-minute rate is >3 standard deviations above the 7-day rolling average
  • API request rate from new IP: alert if a new IP (never seen before for this service) makes >N requests per minute
  • Unusual access time: alert if an admin account performs actions between midnight and 5am when it has never done so historically
  • New country login: alert if a user's authenticated source country has never been seen for their account
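The login-failure-rate rule above can be sketched as a z-score test against a rolling baseline. The 3-sigma cutoff comes from the text; the baseline samples are assumed to be prior 5-minute failure counts:

```javascript
// Flag the current 5-minute count if it is more than `sigmas` standard
// deviations above the mean of the rolling baseline counts.
function isAnomalous(baselineCounts, current, sigmas = 3) {
  const n = baselineCounts.length;
  const mean = baselineCounts.reduce((a, b) => a + b, 0) / n;
  const variance = baselineCounts.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
  const std = Math.sqrt(variance);
  return current > mean + sigmas * std;
}
```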

Building anomaly detection from scratch requires a streaming analytics system (Apache Flink, Kafka Streams, or simpler tools like Elasticsearch's ML features). Commercial SIEMs (Splunk, Microsoft Sentinel, Elastic SIEM) have built-in anomaly detection.

Alert Quality: Reducing False Positives

Alert fatigue kills security programs. If analysts are bombarded with low-fidelity alerts, they stop investigating. Invest in alert tuning:

  • Start with high-confidence, high-severity rules
  • Track false positive rates per rule
  • Tune thresholds based on operational experience
  • Suppress known-benign patterns (scheduled jobs that trigger rate limit alerts, penetration testing IPs during authorized assessments)
  • Implement alert deduplication: one P1 alert for a credential stuffing campaign, not one alert per failed login attempt
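Deduplication can be sketched as suppressing repeat alerts that share a key within a cooldown window; the key scheme and window length here are assumptions:

```javascript
// Deduplicate alerts: at most one alert per (rule, target) key per cooldown window,
// so a credential stuffing campaign produces one P1 alert, not thousands.
const COOLDOWN_MS = 15 * 60 * 1000;
const lastFired = new Map(); // dedup key -> timestamp of last fired alert

function shouldFire(ruleId, targetId, now = Date.now()) {
  const key = `${ruleId}:${targetId}`;
  const last = lastFired.get(key);
  if (last !== undefined && now - last < COOLDOWN_MS) return false; // suppressed
  lastFired.set(key, now);
  return true;
}
```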

Log Management Infrastructure

Centralized log aggregation: Ship logs from all services to a central system (ELK stack, Splunk, Datadog, CloudWatch Logs, Loki). Never rely on logs sitting on application servers—servers get terminated, disks fill up, and logs become inaccessible exactly when you need them most.

Log integrity: Logs are only useful for forensics if they cannot be tampered with. Implement:

  • Write-once storage (S3 Object Lock, WORM storage)
  • Cryptographic log signing (hash chaining or Merkle trees)
  • Separation of duties: application services can write logs but cannot delete them

Retention: See the retention requirements in the Data Retention article. Security logs are typically tiered: 90 days hot (immediately queryable), 12 months warm (queryable within minutes), and up to 7 years in cold archive for regulated industries.

Access control: Access to security logs should be restricted to security and engineering leads. Logs containing user PII should comply with your data handling policies.

The investment in structured logging and alerting pays off not just during incidents but in daily operations—understanding your system's normal behavior, debugging production issues faster, and demonstrating operational security controls to auditors and customers.

logging
security-monitoring
SIEM
alerting
observability
audit-trail
