Using data formats like XML and JSON is pretty standard practice on the web. They make it simple and transparent to manage and deliver data that is both human-readable and machine-readable, so you'll find them on most websites and platforms. However, it's essential to be aware of the potential risks and vulnerabilities you might introduce to your platform by misusing technologies like XML.
One such vulnerability is XML External Entities (XXE). In this article, we'll explain what they are, show you how to spot the vulnerability, and demonstrate how to protect your NodeJS applications against it.
We want to remind you that if you have no experience working with NodeJS, you might have trouble getting value from this article. Therefore, we strongly suggest you explore this introduction to NodeJS. We'll be discussing some elements that might not be immediately clear unless you have some background in JavaScript and NodeJS.
Now, with that out of the way, let's jump in.
Defining XML External Entities
So, what are XML External Entities?
XML, or Extensible Markup Language, is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
How does the XML file structure present a vulnerability? XML lets a document declare entities in its document type definition (DTD). An external entity is one whose content lives outside the document and is referenced by a system identifier, usually a URI. By default, many XML processing tools retrieve and process that resource while parsing the file: the parser requests the content at the specified URI and includes it inside the XML document.
This behavior opens the system up to exploitation. A malicious actor can use it to retrieve any resource the server process is able to read. An XML External Entities attack, or XXE injection, takes advantage of exactly this parsing behavior: it targets applications that parse user-supplied XML, allowing an attacker to access files and resources on the server.
Typical attacks disclose local files that contain sensitive data, such as passwords or private user data, by using file: schemes or relative paths in the system identifier. A determined attacker with enough understanding of your server's structure and some information about your technology stack could potentially go further and take over your server.
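One way to spot these payloads, for example when reviewing suspicious uploads, is to list the external entities a document declares and see where each system identifier points. Here's a minimal sketch, assuming you have the raw XML as a string and know the base URI it would be resolved against; `findExternalEntities` and its `baseUri` parameter are hypothetical names. The regex is a simplification, not a full XML parser, and it won't see entities declared inside an external DTD.

```javascript
// Hypothetical audit helper: list external entity declarations in a raw XML
// string and show which resource each system identifier would resolve to.
// Regex-based sketch; it does not see entities declared in an external DTD.
function findExternalEntities(xml, baseUri = 'file:///') {
  const declaration =
    /<!ENTITY\s+(%\s+)?([^\s%"'<>]+)\s+(?:SYSTEM|PUBLIC\s+(?:"[^"]*"|'[^']*'))\s+(?:"([^"]*)"|'([^']*)')/g;
  const found = [];
  for (const m of xml.matchAll(declaration)) {
    const systemId = m[3] ?? m[4];
    // Relative identifiers are resolved against a base URI (assumed to be
    // known here), which is how "../../etc/passwd"-style paths escape a folder.
    const resolved = new URL(systemId, baseUri).href;
    found.push({
      name: m[2],
      parameterEntity: Boolean(m[1]), // declared as <!ENTITY % name ...>
      systemId,
      resolved,
      localFile: resolved.startsWith('file:'),
    });
  }
  return found;
}
```

Any entry with `localFile: true` in user-supplied XML is a strong red flag, but remote identifiers deserve the same scrutiny, since the server fetches them on the attacker's behalf.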
Examples of XML External Entities
Now that you have a basic understanding of XXE injection, let's go over an example. Here's a sample XML document containing a username XML element:
<?xml version="1.0" encoding="ISO-8859-1"?>
<username>John</username>
Pretty harmless and straightforward, right? Where's the external entity?
Well, an external XML entity is declared with a system identifier inside a DOCTYPE declaration. This declaration holds the document type definition (DTD), which adds entity declarations and other rules to the XML document's structure.
For example, the document below declares an external entity that would fetch the contents of /etc/passwd and, because the entity is referenced inside the username element, display those contents to the user wherever the username is rendered:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE username [
<!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<username>&xxe;</username>
Yikes!
These entities can pull in local files from your server, or content from other hosts your server can reach. That's terrible if you keep sensitive files on the server, and it could give an attacker a path to take control of your website.
Other XXE payloads point at local resources that never stop returning data, such as /dev/random on Linux. Instead of leaking data, these tie up the parser, which can hurt application availability and lead to denial of service.
Mitigating XML External Entities Vulnerabilities
Mitigating XML External Entities vulnerabilities is, thankfully, relatively straightforward. As long as you're not intentionally opening a vulnerability window, and you think carefully about whether you really need to process user-provided XML files, you don't have to worry too much.
As already mentioned, if an application has an endpoint that parses XML files, an attacker could send a specially crafted payload to the server and obtain sensitive files. Which files the attacker can reach depends heavily on how you set up your system and how you implement user permissions. So, to prevent this, first, avoid parsers that perform entity substitution, or make sure the feature is turned off in libraries that support it, such as libxml2 (LibXML).
Sadly, NodeJS doesn't have a built-in XML parsing engine, so you might already be using libxmljs, the NodeJS bindings for libxml2, in your project. But don't fret: entity substitution is disabled by default. Nevertheless, we recommend that you disable it explicitly. In libxmljs, the noent option controls this feature, and setting it to true turns substitution on, so pass false wherever you parse untrusted XML:
const libxmljs = require('libxmljs');
const doc = libxmljs.parseXml(xml, { noent: false }); // noent: true would substitute entities
It's also crucial to keep in mind that disabling entity substitution doesn't necessarily protect you from denial-of-service attacks, such as entity-expansion ("billion laughs") payloads.
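To see why entity expansion is a denial-of-service risk, consider the classic "billion laughs" payload: an internal entity lol holds the string "lol", lol1 holds ten references to lol, lol2 holds ten references to lol1, and so on up to lol9. Each level multiplies the size by ten, so a document under a kilobyte expands to 3,000,000,000 characters. The sketch below computes that expanded length without expanding anything, assuming only simple internal entities of the form `<!ENTITY name "value">`; `expandedLength` is a hypothetical helper, not a library function.

```javascript
// Sketch: compute how many characters an internal entity expands to,
// without performing the expansion. Handles only internal entities declared
// as <!ENTITY name "value"> (or single-quoted) in the raw XML string.
function expandedLength(xml, entityName) {
  const values = new Map();
  const decl = /<!ENTITY\s+([^\s%"'<>]+)\s+(?:"([^"]*)"|'([^']*)')\s*>/g;
  for (const m of xml.matchAll(decl)) {
    // XML keeps the first declaration of a name, so ignore later ones.
    if (!values.has(m[1])) values.set(m[1], m[2] ?? m[3]);
  }
  const memo = new Map();
  const inProgress = new Set();

  function lengthOf(name) {
    // Predefined (&amp;), character (&#60;), or undeclared references count as 1.
    if (!values.has(name)) return 1;
    if (memo.has(name)) return memo.get(name);
    if (inProgress.has(name)) throw new Error(`recursive entity: ${name}`);
    inProgress.add(name);
    const value = values.get(name);
    // Literal text, plus the expanded length of every reference inside it.
    let total = value.replace(/&[^;\s&]+;/g, '').length;
    for (const ref of value.matchAll(/&([^;\s&]+);/g)) total += lengthOf(ref[1]);
    inProgress.delete(name);
    memo.set(name, total);
    return total;
  }
  return lengthOf(entityName);
}
```

A check like this could reject a document before it reaches the parser once the total crosses a limit you choose. Memoizing each entity's length is what keeps the calculation itself cheap, even though the expansion it measures is huge.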
Now, suppose your application actually relies on external entities for some critical functionality. In that case, one way to minimize the potential for exploits is to safelist the external entities you know about. You can do this by checking the raw XML document, before parsing it with your library, for any entity declaration that isn't on the list. The route below shows the strictest version of this check, an empty safelist: it rejects any upload that declares an entity.
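const express = require('express');
const multer = require('multer');
const libxmljs = require('libxmljs');

const app = express();
const upload = multer({ storage: multer.memoryStorage() }); // keeps uploads in req.file.buffer

app.post('/load_xml', upload.single('xml'), function (req, res) {
  if (!req.file) {
    res.sendStatus(400); // no file uploaded: a client error, not a server error
    return;
  }
  try {
    const xml = req.file.buffer.toString('utf8');
    // Check the raw input *before* parsing: once parsed, DOCTYPE and entity
    // declarations no longer appear in doc.text(), so a post-parse check can't see them.
    if (/<!(DOCTYPE|ENTITY)/.test(xml)) {
      throw new Error('INVALID XML FILE');
    }
    // Parse the same string that was checked, with entity substitution off.
    const doc = libxmljs.parseXml(xml, { noent: false });
    res.send(doc.text());
  } catch (err) {
    // Send status and body together; calling res.send() and then
    // res.sendStatus() would try to send two responses.
    res.status(400).send(err.toString());
  }
});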
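The route above rejects every entity declaration. If your feature genuinely depends on a few trusted external entities, the safelist could instead map each allowed entity name to the one system identifier it may point at. Here's a minimal sketch of such a check, assuming plain general entities declared in the document's internal DTD subset; `checkEntitySafelist`, `ALLOWED_EXTERNAL_ENTITIES`, and the example URL are hypothetical. It is deliberately fail-closed: parameter entities, internal entities, NDATA entities, external DTD subsets, and anything the pattern doesn't recognize are all rejected.

```javascript
// Hypothetical safelist: entity name -> the only system identifier it may use.
const ALLOWED_EXTERNAL_ENTITIES = new Map([
  ['footer', 'https://assets.example.com/xml/footer.xml'],
]);

// Fail-closed check of a raw XML string against the safelist, run BEFORE parsing.
// Regex-based sketch: it only accepts declarations of the exact form
//   <!ENTITY name SYSTEM "uri">   (or single-quoted)
// and rejects everything else it finds.
function checkEntitySafelist(xml, allowed = ALLOWED_EXTERNAL_ENTITIES) {
  // An external DTD subset can declare entities we can't see here, so refuse it.
  if (/<!DOCTYPE\s+[^\s\[>]+\s+(?:SYSTEM|PUBLIC)\b/.test(xml)) {
    return { ok: false, reason: 'external DTD subset is not allowed' };
  }
  const declarations = [...xml.matchAll(/<!ENTITY([^>]*)>/g)];
  const occurrences = (xml.match(/<!ENTITY/g) || []).length;
  if (declarations.length !== occurrences) {
    // Some "<!ENTITY" was not matched as its own declaration: refuse to guess.
    return { ok: false, reason: 'malformed entity declaration' };
  }
  for (const [whole, body] of declarations) {
    const m = /^\s+([^\s%"'<>&;]+)\s+SYSTEM\s+(?:"([^"]*)"|'([^']*)')\s*$/.exec(body);
    if (!m) {
      // Parameter entities (%), internal entities, NDATA, PUBLIC ids, etc.
      return { ok: false, reason: `unsupported declaration: ${whole}` };
    }
    const [, name, doubleQuoted, singleQuoted] = m;
    const systemId = doubleQuoted ?? singleQuoted;
    if (allowed.get(name) !== systemId) {
      return { ok: false, reason: `entity "${name}" -> ${systemId} is not safelisted` };
    }
  }
  return { ok: true };
}
```

Run it on the raw string before parsing, and hand that same string to the parser only if `ok` is true. Matching on the name and the system identifier together matters: safelisting names alone would let an attacker redeclare `footer` to point at file:///etc/passwd.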
Final Approach
Finally, and I want to emphasize this, do not parse XML unless your application actually requires it. I know it can be convenient and lets the platform offer handy features to users. Even so, there are many ways to offer similar functionality without using these libraries.
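For instance, if all an endpoint needs is a username, like the sample document earlier, it could accept JSON instead. JSON has no DTDs or entities, so `JSON.parse` never resolves references to files or URLs, and a string such as "&xxe;" stays a literal string. The sketch below assumes clients can send a body like `{"username": "John"}`; `parseUsernamePayload` is a hypothetical helper.

```javascript
// Hypothetical replacement for the XML username upload: accept JSON instead.
// JSON has no DTDs or entities, so JSON.parse never fetches files or URLs.
function parseUsernamePayload(body) {
  let data;
  try {
    data = JSON.parse(body);
  } catch {
    throw new Error('INVALID JSON');
  }
  // Require a plain object with a string "username" field.
  if (typeof data !== 'object' || data === null || typeof data.username !== 'string') {
    throw new Error('INVALID PAYLOAD');
  }
  return data.username;
}
```

You would still validate the username itself (length, allowed characters) as usual, but the parsing step no longer gives attackers a way to reach the filesystem.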
In the end, the best mitigation strategy is not to expose the vulnerable functionality at all. It's vital to keep in mind that providing robust and secure platforms to your users is becoming increasingly complex. Such work requires a considerable investment of time and expertise that your organization might not be able to afford.
If you are in charge of a team of brilliant and productive members focused on delivering speedy and innovative solutions, we recommend our security test suite StackHawk. Our product ensures that you get the best insight and tools to provide reliable decision-making information so that you can go back to focusing on the work you do best.
You can check it out and create a free account here.
Conclusion
Protecting your platforms against the most sophisticated exploits on the web requires an extensive understanding of the technology and a solid grasp of your platform's infrastructure. Thankfully, most of the tools and libraries used to build infrastructure are pretty robust and secure. Regardless, the risk of an engineer unintentionally introducing a vulnerability and compromising the work of their team is always real.
This post was written by Juan Reyes. Juan is an engineer by profession and a dreamer by heart who crossed the seas to reach Japan following the promise of opportunity and challenge. While trying to find himself and build a meaningful life in the east, Juan borrows wisdom from his experiences as an entrepreneur, artist, hustler, father figure, husband, and friend to start writing about passion, meaning, self-development, leadership, relationships, and mental health. His many years of struggle and self-discovery have inspired him and driven him to embark on a journey for wisdom.