Security Logging Best Practices: What to Log and How to Alert
A comprehensive guide to security logging—which authentication events, access failures, and data changes to capture, what sensitive data must never appear in logs, structured JSON logging patterns, and building effective anomaly-based alerting.
Logs as Security Infrastructure
Security logs serve three distinct purposes: detection (spotting an attack in progress), investigation (reconstructing what happened after an incident), and compliance (demonstrating that controls are operating as required). A logging strategy that serves all three purposes requires deliberate choices about what to capture, how to structure it, how long to retain it, and what patterns should trigger alerts.
The cost of poor logging becomes clear in incident response. When a breach is discovered, the first questions are: when did this start? What did the attacker access? How did they get in? Without comprehensive logs, these questions often go unanswered, leaving organizations unable to scope their breach notification obligations, unable to close the attack vector with confidence, and unable to demonstrate to regulators that they attempted to detect the intrusion.
What to Log
Authentication Events
Authentication is the entry point to your system. Every event here should be logged without exception.
Log these:
- Successful login: user identifier, source IP, user agent, authentication method used (password, SSO, API key), timestamp
- Failed login: user identifier (or the username that was attempted), source IP, user agent, failure reason (invalid password, account locked, MFA failure, account not found—log the reason in your system, but do not return the specific reason to the client to prevent user enumeration)
- MFA challenge issued: user identifier, challenge type (TOTP, SMS, push)
- MFA success and failure
- Password change: user identifier, source IP, whether the change was self-initiated or admin-initiated
- Password reset initiated and completed: same fields
- Account lockout triggered and unlocked
- Session created: session token ID (not the token value), user identifier, source IP, user agent
- Session invalidated: session ID, reason (logout, expiry, admin revocation, suspicious activity)
- API key created, rotated, and revoked: key ID (not the key value), user/service identifier
A single authentication log entry in structured JSON:
{
  "event_type": "auth.login.success",
  "timestamp": "2025-10-15T14:23:11.432Z",
  "user_id": "usr_01HXYZ123",
  "source_ip": "203.0.113.45",
  "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...",
  "auth_method": "password+totp",
  "session_id": "sess_01HABC456",
  "request_id": "req_01HDEF789",
  "geo": {
    "country": "US",
    "city": "San Francisco",
    "asn": 15169
  }
}
Authorization and Access Control Events
Authentication tells you who entered the system. Authorization logs tell you what they tried to do.
Log these:
- Authorization check failure: who, what resource, what action, why denied (wrong role, wrong owner, resource not found vs. access denied—distinguish between 403 and 404 in server-side logs even if your API returns 404 for both)
- Privilege escalation: any time a user assumes a higher-privilege role or context (sudo, assume-role, admin mode)
- Access to sensitive resources: high-value data (PII, payment data, health records) access should be logged regardless of success or failure
- Admin actions: any action taken by a user in an administrative role—user creation, deletion, permission changes, configuration changes
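The 403-versus-404 distinction above can be recorded server-side even when the API returns 404 for both. A minimal sketch (the field names and the `logger` callback are illustrative assumptions, not a specific library):

```javascript
// Sketch: log the true denial reason server-side while the client sees 404
// for both "not found" and "forbidden" (prevents resource enumeration).
// `logger` is a stand-in for your structured logger.
function logAccessDenied(logger, { userId, resource, action, reason }) {
  // reason: 'not_found' | 'wrong_role' | 'wrong_owner'
  const entry = {
    event_type: 'authz.access_denied',
    timestamp: new Date().toISOString(),
    user_id: userId,
    resource,
    action,
    denial_reason: reason, // kept in server-side logs only
    client_status: 404,    // what the client actually receives
  };
  logger(entry);
  return entry;
}
```

The client-facing response stays ambiguous; the log entry keeps the precise reason for investigators.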
Data Change Events
Data integrity depends on knowing when data changed, who changed it, and what the change was.
Log these:
- Create, update, delete operations on sensitive models (user records, payment information, configuration)
- Bulk operations (bulk export, bulk delete) with record counts
- Schema or configuration changes
- Data export operations: who exported what, how many records, to what destination
The standard pattern is an audit log table adjacent to the primary data model:
CREATE TABLE audit_log (
    id         UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    timestamp  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    actor_id   TEXT        NOT NULL,  -- user or service identifier
    actor_type TEXT        NOT NULL,  -- 'user', 'service', 'cron'
    action     TEXT        NOT NULL,  -- 'create', 'update', 'delete'
    model      TEXT        NOT NULL,  -- 'User', 'PaymentMethod'
    record_id  TEXT        NOT NULL,  -- the affected record's ID
    changes    JSONB,                 -- {field: [old_value, new_value]}
    source_ip  INET,
    request_id TEXT
);
The changes field should contain field-level diffs for updates. Be careful: changes should not contain the new value of sensitive fields like passwords or payment card numbers—only metadata indicating they changed.
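One way to build that changes payload is a field-level diff that redacts the values of sensitive fields while still recording that they changed. A sketch, with an illustrative sensitive-field list:

```javascript
// Fields whose values must never appear in the diff (illustrative names).
const SENSITIVE = new Set(['password_hash', 'card_number', 'cvv']);

// Produce {field: [oldValue, newValue]} for changed fields only.
// Sensitive fields record that a change happened, never the values.
function buildChanges(before, after) {
  const changes = {};
  for (const key of new Set([...Object.keys(before), ...Object.keys(after)])) {
    if (before[key] === after[key]) continue;
    changes[key] = SENSITIVE.has(key)
      ? ['[REDACTED]', '[REDACTED]']
      : [before[key] ?? null, after[key] ?? null];
  }
  return changes;
}
```

Unchanged fields are omitted, so the audit row stays small even for wide records.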
Infrastructure and System Events
Log these:
- Service startup and shutdown (abnormal shutdowns are particularly important)
- Configuration file loads and changes
- Certificate loads (and approaching expirations)
- Dependency health check failures
- Cron job start, completion, and failure
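Cron job start, completion, and failure can all be captured with a thin wrapper around the job function. A sketch, with assumed event names and a `logger` stand-in:

```javascript
// Wrap a job function so every run emits start/success/failure events.
async function runLogged(logger, jobName, jobFn) {
  const startedAt = Date.now();
  logger({ event_type: 'cron.start', job: jobName });
  try {
    const result = await jobFn();
    logger({
      event_type: 'cron.success',
      job: jobName,
      duration_ms: Date.now() - startedAt,
    });
    return result;
  } catch (err) {
    logger({
      event_type: 'cron.failure',
      job: jobName,
      duration_ms: Date.now() - startedAt,
      error: String(err),
    });
    throw err; // rethrow so the scheduler still sees the failure
  }
}
```

A job that silently stops running never emits `cron.start`, so an absence-of-event alert catches it.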
Network and API Events
For APIs, request-level logging provides visibility into usage patterns and potential abuse.
Log these:
- All requests to authenticated endpoints: path, method, status code, latency, authenticated user/service, source IP
- All 4xx responses (client errors): potential probing or abuse signals
- All 5xx responses (server errors): operational visibility
- Rate limit hits
- Large response bodies (potential data exfiltration signal if a single request returns 100MB)
What NOT to Log
Logging the wrong data creates its own security problems. Logs are often more accessible than primary databases (shipped to log aggregators, accessible to engineers, less access-controlled). Sensitive data in logs can itself become a breach.
Never log:
- Passwords (plaintext or hashed)
- Session tokens (log session IDs, never the token value—a logged token can be replayed)
- API keys (log key IDs/prefixes, never the full key)
- Credit card numbers, CVVs, or any raw payment card data
- Social security numbers, passport numbers, or government identifiers
- Biometric data
- Authentication secrets (TOTP seeds, private keys)
- Encryption keys
Be careful with:
- Email addresses (PII under GDPR—log where necessary but be aware it is PII)
- Full request bodies (may contain passwords in login requests, PII in registration flows)
- Query strings (may contain access tokens passed as URL parameters)
- Headers (Authorization header contains bearer tokens or basic auth credentials)
Implement log scrubbing at the logging middleware layer, before logs are emitted:
const SENSITIVE_FIELDS = ['password', 'token', 'secret', 'authorization', 'credit_card', 'cvv'];

function scrubSensitiveFields(obj, depth = 0) {
  // Depth cap guards against cycles and pathological nesting.
  if (depth > 5 || obj === null || typeof obj !== 'object') return obj;
  // Recurse into arrays without converting them to objects.
  if (Array.isArray(obj)) {
    return obj.map(item => scrubSensitiveFields(item, depth + 1));
  }
  return Object.fromEntries(
    Object.entries(obj).map(([key, value]) => {
      if (SENSITIVE_FIELDS.some(f => key.toLowerCase().includes(f))) {
        return [key, '[REDACTED]'];
      }
      return [key, scrubSensitiveFields(value, depth + 1)];
    })
  );
}
Structured JSON Logging
Plain-text log lines are difficult to query and fragile to parse. Structured JSON logs are machine-readable from the start, enabling powerful search and alerting.
Standard fields every log entry should have:
{
  "timestamp": "2025-10-15T14:23:11.432Z",  // ISO 8601 with milliseconds
  "level": "info",                          // debug|info|warn|error|critical
  "service": "api-server",                  // service/component name
  "version": "1.4.2",                       // service version
  "environment": "production",
  "request_id": "req_01HDEF789",            // trace ID for correlation
  "event_type": "auth.login.success",       // structured event taxonomy
  "message": "User login successful",       // human-readable summary
  // ... event-specific fields
}
The event_type field is critical for alerting. Use a consistent taxonomy: auth.login.success, auth.login.failure, authz.access_denied, data.export, admin.user_deleted. This makes alert rules simple regular expressions or exact matches rather than fragile text parsing.
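With a dotted taxonomy, a rule engine needs only exact matches and prefix wildcards. A minimal matcher sketch (the `.*` wildcard convention is an assumption):

```javascript
// Match an event_type against a rule pattern: exact ('data.export')
// or prefix wildcard ('admin.*').
function matchesRule(eventType, pattern) {
  if (pattern.endsWith('.*')) {
    // Keep the trailing dot so 'admin.*' does not match 'adminx.y'.
    return eventType.startsWith(pattern.slice(0, -1));
  }
  return eventType === pattern;
}
```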
Node.js with Pino:
import pino from 'pino';

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  base: {
    service: 'api-server',
    version: process.env.APP_VERSION,
    environment: process.env.NODE_ENV,
  },
  timestamp: pino.stdTimeFunctions.isoTime,
  redact: {
    paths: ['req.headers.authorization', 'body.password', 'body.token'],
    censor: '[REDACTED]'
  }
});

// Usage
logger.info({
  event_type: 'auth.login.success',
  user_id: user.userId,
  session_id: session.id,
  source_ip: request.ip,
}, 'User login successful');
Alerting on Anomalies
Logs without alerting are a forensics tool, not a detection tool. Effective alerting requires defining what "normal" looks like so that deviations are detectable.
Threshold Alerts (Static)
Simple count-based rules:
| Alert | Condition | Severity |
|---|---|---|
| Brute force attempt | >10 auth.login.failure for same user in 5 min | High |
| Credential stuffing | >100 auth.login.failure across any users in 1 min | Critical |
| Privilege escalation | Any admin.role_assumed event outside business hours | High |
| Bulk data export | data.export with record_count > 10,000 | High |
| Admin action | Any admin.* event from new IP | Medium |
| Account lockout spike | >20 auth.account_locked in 10 min | High |
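The brute-force rule in the table (more than 10 `auth.login.failure` events for the same user in 5 minutes) can be implemented as a per-key sliding window. A sketch; the in-memory Map is an assumption, and production systems would use a shared store:

```javascript
// Sliding-window counter: fires when more than `limit` events for the
// same key arrive within `windowMs`.
class ThresholdRule {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.events = new Map(); // key -> array of event timestamps
  }

  // Record one event; returns true if the threshold is now exceeded.
  record(key, now = Date.now()) {
    const times = (this.events.get(key) ?? []).filter(
      t => now - t < this.windowMs
    );
    times.push(now);
    this.events.set(key, times);
    return times.length > this.limit;
  }
}
```

The same class covers the credential-stuffing row by keying on a global bucket instead of a user ID.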
Anomaly-Based Alerts (Dynamic)
Static thresholds miss slow attacks and drift out of date as baselines shift. Dynamic alerts compare current behavior to a rolling baseline:
- Login failure rate: alert if current 5-minute rate is >3 standard deviations above the 7-day rolling average
- API request rate from new IP: alert if a new IP (never seen before for this service) makes >N requests per minute
- Unusual access time: alert if an admin account performs actions between midnight and 5am when it has never done so historically
- New country login: alert if a user's authenticated source country has never been seen for their account
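The standard-deviation rule above can be sketched with a running mean and variance (Welford's algorithm) over the baseline samples; how you window the 7 days of baseline data is left out here:

```javascript
// Running mean/variance via Welford's algorithm; flags a value more
// than `k` standard deviations above the baseline mean.
class ZScoreDetector {
  constructor(k = 3) {
    this.k = k;
    this.n = 0;
    this.mean = 0;
    this.m2 = 0; // sum of squared deviations
  }

  observe(x) {
    this.n += 1;
    const delta = x - this.mean;
    this.mean += delta / this.n;
    this.m2 += delta * (x - this.mean);
  }

  isAnomalous(x) {
    if (this.n < 2) return false; // not enough baseline yet
    const stddev = Math.sqrt(this.m2 / (this.n - 1));
    return x > this.mean + this.k * stddev;
  }
}
```

Feed it the 5-minute failure counts as they close; values inside the baseline band pass silently, spikes alert.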
Building anomaly detection from scratch requires a streaming analytics system (Apache Flink, Kafka Streams, or simpler tools like Elasticsearch's ML features). Commercial SIEMs (Splunk, Microsoft Sentinel, Elastic SIEM) have built-in anomaly detection.
Alert Quality: Reducing False Positives
Alert fatigue kills security programs. If analysts are bombarded with low-fidelity alerts, they stop investigating. Invest in alert tuning:
- Start with high-confidence, high-severity rules
- Track false positive rates per rule
- Tune thresholds based on operational experience
- Suppress known-benign patterns (scheduled jobs that trigger rate limit alerts, penetration testing IPs during authorized assessments)
- Implement alert deduplication: one P1 alert for a credential stuffing campaign, not one alert per failed login attempt
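Deduplication can key alerts by rule plus affected entity and suppress repeats inside a window. A sketch with an assumed in-memory store:

```javascript
// Suppress duplicate alerts for the same (rule, entity) pair within a
// window: one alert per credential-stuffing campaign, not one per
// failed login attempt.
class AlertDeduper {
  constructor(windowMs) {
    this.windowMs = windowMs;
    this.lastFired = new Map(); // "rule|entity" -> last fire timestamp
  }

  // Returns true if the alert should fire, false if it is a duplicate.
  shouldFire(rule, entity, now = Date.now()) {
    const key = `${rule}|${entity}`;
    const last = this.lastFired.get(key);
    if (last !== undefined && now - last < this.windowMs) return false;
    this.lastFired.set(key, now);
    return true;
  }
}
```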
Log Management Infrastructure
Centralized log aggregation: Ship logs from all services to a central system (ELK stack, Splunk, Datadog, CloudWatch Logs, Loki). Never rely on logs sitting on application servers—servers get terminated, disks fill up, and logs become inaccessible exactly when you need them most.
Log integrity: Logs are only useful for forensics if they cannot be tampered with. Implement:
- Write-once storage (S3 Object Lock, WORM storage)
- Cryptographic log signing (hash chaining or Merkle trees)
- Separation of duties: application services can write logs but cannot delete them
Retention: See the retention requirements in the Data Retention article. A typical tiering for security logs: 90 days hot (immediately queryable), 12 months warm (queryable within minutes), and up to 7 years in cold archive for regulated industries.
Access control: Access to security logs should be restricted to security and engineering leads. Logs containing user PII should comply with your data handling policies.
The investment in structured logging and alerting pays off not just during incidents but in daily operations—understanding your system's normal behavior, debugging production issues faster, and demonstrating operational security controls to auditors and customers.