XML Path Language (XPath) is a query language for Extensible Markup Language (XML) data. It lets us construct queries over XML documents. If user input is inserted into an XPath query without proper sanitization, an XPath Injection vulnerability arises, much as SQL Injection arises from unsanitized input in SQL queries.
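To make the risk concrete, here is a minimal sketch of how unsanitized input changes the meaning of a query. The `/users/user` document layout, the `build_login_query` helper, and the credentials are all hypothetical and only serve as illustration; the code builds query strings and does not evaluate them.

```python
def build_login_query(username: str, password: str) -> str:
    """Naively splice user input into an XPath query (vulnerable on purpose)."""
    return f"/users/user[username='{username}' and password='{password}']"


# Expected use: both values are treated as plain data.
benign = build_login_query("admin", "s3cret")

# Hostile use: the quote in the password closes the string literal early,
# so the rest of the input is parsed as XPath syntax instead of data.
malicious = build_login_query("admin", "' or '1'='1")

print(benign)
print(malicious)
```

In XPath 1.0, `and` binds more tightly than `or`, so the second query's filter reads as `(username='admin' and password='') or '1'='1'`, which holds for every `user` node.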
XPath Foundations
Consider the following XML document:
<?xml version="1.0" encoding="UTF-8"?>
<academy_modules>
  <module>
    <title>Web Attacks</title>
    <author>21y4d</author>
    <tier difficulty="medium">2</tier>
    <category>offensive</category>
  </module>
  <!-- this is a comment -->
  <module>
    <title>Attacking Enterprise Networks</title>
    <author co-author="LTNB0B">mrb3n</author>
    <tier difficulty="medium">2</tier>
    <category>offensive</category>
  </module>
</academy_modules>
The XML declaration specifies version and encoding (defaults: 1.0 and UTF-8 if omitted).
XML forms a tree of nodes. Root element: academy_modules. Node types: element (e.g., module, title), attribute (e.g., co-author, difficulty), comment, and text nodes (e.g., Web Attacks).
Each element/attribute node has exactly one parent. Elements can have many children. Nodes with the same parent are siblings. You can traverse ancestors and descendants.
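The tree relationships described above can be explored with a short sketch using Python's standard `xml.etree.ElementTree` module. The names `XML_DOC`, `parent_of`, and `describe` are illustrative choices, not part of the original material. Note that ElementTree's default parser discards comment nodes, and its elements keep no pointer to their parent, so the sketch builds a child-to-parent map by hand.

```python
import xml.etree.ElementTree as ET

XML_DOC = """<?xml version="1.0" encoding="UTF-8"?>
<academy_modules>
  <module>
    <title>Web Attacks</title>
    <author>21y4d</author>
    <tier difficulty="medium">2</tier>
    <category>offensive</category>
  </module>
  <!-- this is a comment -->
  <module>
    <title>Attacking Enterprise Networks</title>
    <author co-author="LTNB0B">mrb3n</author>
    <tier difficulty="medium">2</tier>
    <category>offensive</category>
  </module>
</academy_modules>"""

root = ET.fromstring(XML_DOC)  # the root element, academy_modules

# Map every child element to its parent (each element has exactly one).
parent_of = {child: parent for parent in root.iter() for child in parent}


def describe(element, depth=0):
    """Return one indented line per element: tag, attributes, and text."""
    text = (element.text or "").strip()
    lines = ["  " * depth + f"{element.tag} {element.attrib} {text!r}"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


print("\n".join(describe(root)))

# Siblings are the other children of the same parent.
first_module = root[0]
siblings = [e.tag for e in parent_of[first_module] if e is not first_module]
print(siblings)
```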
Selecting Nodes
Each XPath query selects a set of nodes from a context node (starting point). The same query can yield different results depending on the context. Base selections:
module - all module child nodes of the context node
/ - the document root node
// - all descendant nodes of the context node
. - the context node
.. - the parent of the context node
@difficulty - the difficulty attribute of the context node
text() - all text node children of the context node
To avoid ambiguity, start at the document root:
/academy_modules/module - module children of academy_modules
//module - all module elements
/academy_modules//title - all title descendants of academy_modules
/academy_modules/module/tier/@difficulty - the difficulty attribute of every tier element that is a child of a module
//@difficulty - all difficulty attributes in the document
Note: A query that starts with // is evaluated from the document root, regardless of the context node.
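The effect of the context node can be tried out with a small sketch based on Python's standard `xml.etree.ElementTree` module. This is an approximation under stated assumptions: ElementTree implements only a subset of XPath, so the sketch uses relative paths starting at the root element and reads attribute values and text with `.get()` and `.text` instead of selecting them with `@difficulty` or `text()`. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XML_DOC = """<?xml version="1.0" encoding="UTF-8"?>
<academy_modules>
  <module>
    <title>Web Attacks</title>
    <author>21y4d</author>
    <tier difficulty="medium">2</tier>
    <category>offensive</category>
  </module>
  <module>
    <title>Attacking Enterprise Networks</title>
    <author co-author="LTNB0B">mrb3n</author>
    <tier difficulty="medium">2</tier>
    <category>offensive</category>
  </module>
</academy_modules>"""

root = ET.fromstring(XML_DOC)  # context node: academy_modules

# "module" selects the module children of the context node.
modules = root.findall("module")

# The same query from a different context node yields a different result:
# a module element has no module children of its own.
nested_modules = modules[0].findall("module")

# ".//title" selects title descendants of the context node at any depth.
all_titles = [t.text for t in root.findall(".//title")]

# ".." steps up to the parent: here, the parents of all tier elements.
tier_parents = [p.tag for p in root.findall(".//tier/..")]

# Attribute values are read from the selected tier elements.
difficulties = [t.get("difficulty") for t in root.findall("./module/tier")]

print(len(modules), nested_modules)
print(all_titles)
print(tier_parents)
print(difficulties)
```

A full XPath 1.0 engine would accept the absolute queries listed above directly; the relative forms here are a workaround for the subset that the standard library supports.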
Predicates
Predicates filter results (similar to SQL WHERE) and are enclosed in []: