XPath Injection
XML Path Language (XPath) is a query language for Extensible Markup Language (XML) data. We can use XPath to construct queries over XML documents. If user input is inserted into XPath queries without proper sanitization, XPath Injection vulnerabilities arise, similar to SQL Injection.
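As a rough illustration of how unsanitized input changes a query, the sketch below builds an XPath string by concatenation. The function name build_query and the query template are hypothetical, and the element names module, author, and title are borrowed from the sample document in the next section. The sketch only constructs strings; it does not evaluate them.

```python
def build_query(author: str) -> str:
    """Hypothetical, deliberately unsafe query builder.

    The user-supplied value is pasted between single quotes with no
    escaping or validation, which is the pattern that enables injection.
    """
    return f"//module[author='{author}']/title"


# Expected use: the input stays inside the string literal.
benign = build_query("mrb3n")

# Malicious use: the single quote closes the literal early, and the rest
# of the input becomes part of the predicate logic.
injected = build_query("x' or '1'='1")

print(benign)    # //module[author='mrb3n']/title
print(injected)  # //module[author='x' or '1'='1']/title
```

In the second query the predicate is true for every module because '1'='1' always holds, so the filter on the author no longer restricts the result.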
XPath Foundations
Consider the following XML document:
<?xml version="1.0" encoding="UTF-8"?>
<academy_modules>
  <module>
    <title>Web Attacks</title>
    <author>21y4d</author>
    <tier difficulty="medium">2</tier>
    <category>offensive</category>
  </module>
  <!-- this is a comment -->
  <module>
    <title>Attacking Enterprise Networks</title>
    <author co-author="LTNB0B">mrb3n</author>
    <tier difficulty="medium">2</tier>
    <category>offensive</category>
  </module>
</academy_modules>
The XML declaration specifies version and encoding (defaults: 1.0 and UTF-8 if omitted).
XML forms a tree of nodes. Root element: academy_modules. Node types: element (e.g., module, title), attribute (e.g., co-author, difficulty), comment, and text nodes (e.g., Web Attacks). Each element/attribute node has exactly one parent. Elements can have many children. Nodes with the same parent are siblings. You can traverse ancestors and descendants.
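The tree structure described above can be explored with a short Python sketch. It assumes the standard-library xml.etree.ElementTree module and embeds the sample document without its XML declaration; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Sample document from above, minus the XML declaration.
XML_DOC = """<academy_modules>
  <module>
    <title>Web Attacks</title>
    <author>21y4d</author>
    <tier difficulty="medium">2</tier>
    <category>offensive</category>
  </module>
  <!-- this is a comment -->
  <module>
    <title>Attacking Enterprise Networks</title>
    <author co-author="LTNB0B">mrb3n</author>
    <tier difficulty="medium">2</tier>
    <category>offensive</category>
  </module>
</academy_modules>"""

root = ET.fromstring(XML_DOC)
print(root.tag)  # name of the root element

# Element children of the root. ElementTree's default parser discards
# comment nodes, so only the module elements appear here.
modules = list(root)

for module in modules:
    # All children of one module share the same parent: they are siblings.
    for child in module:
        # child.text is the element's text content,
        # child.attrib is a dict of its attributes.
        print(module.tag, "->", child.tag, repr(child.text), child.attrib)
```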
Selecting Nodes
Each XPath query selects a set of nodes from a context node (starting point). The same query can yield different results depending on the context. Base selections:
module - all module child nodes of the context node
/ - the document root node
// - all descendant nodes of the context node
. - the context node
.. - the parent of the context node
@difficulty - the difficulty attribute of the context node
text() - all text node children of the context node
To avoid ambiguity, start at the document root:
/academy_modules/module - module children of academy_modules
//module - all module elements
/academy_modules//title - all title descendants of academy_modules
/academy_modules/module/tier/@difficulty - difficulty attributes of tier elements under the path
//@difficulty - all difficulty attributes in the document
Note: If a query starts with //, it is evaluated from the document root.
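To make the role of the context node concrete, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. This module implements only a limited subset of XPath and rejects absolute paths when searching from an element, so the queries are written relative to the element they are called on. It also cannot select attribute nodes with a path step, so attributes are read from the selected elements instead.

```python
import xml.etree.ElementTree as ET

# Sample document from above, minus the XML declaration.
XML_DOC = """<academy_modules>
  <module>
    <title>Web Attacks</title>
    <author>21y4d</author>
    <tier difficulty="medium">2</tier>
    <category>offensive</category>
  </module>
  <!-- this is a comment -->
  <module>
    <title>Attacking Enterprise Networks</title>
    <author co-author="LTNB0B">mrb3n</author>
    <tier difficulty="medium">2</tier>
    <category>offensive</category>
  </module>
</academy_modules>"""

root = ET.fromstring(XML_DOC)        # context node: academy_modules
first_module = root.find("module")   # context node: the first module

# The same query gives different results for different context nodes.
from_root = root.findall("module")            # two module children
from_module = first_module.findall("module")  # a module has no module children

# ".//" reaches all descendants of the context node.
titles = [t.text for t in root.findall(".//title")]

# Stand-in for /academy_modules/module/tier/@difficulty:
# select the tier elements, then read the attribute from each.
difficulties = [t.get("difficulty") for t in root.findall("./module/tier")]

print(len(from_root), len(from_module))
print(titles)
print(difficulties)
```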
Predicates
Predicates filter results (similar to SQL WHERE) and are enclosed in []:
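/academy_modules/module[1] - the first module child of academy_modules
/academy_modules/module[position()=1] - the same node, using position() explicitly
/academy_modules/module[last()] - the last module child of academy_modules
/academy_modules/module[position()<3] - the first two module children of academy_modules
//module[tier=2]/title - titles of all modules whose tier is 2
//module/author[@co-author]/../title - titles of all modules whose author has a co-author attribute
//module/tier[@difficulty="medium"]/.. - all modules whose tier has the difficulty attribute set to medium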
Supported operators: +, -, *, div, =, !=, <, <=, >, >=, or, and, mod.
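Several of the predicates above can be tried with Python's standard-library xml.etree.ElementTree, as in the sketch below. That module supports only part of XPath: queries must be relative to an element, comparisons such as tier=2 need a quoted value, and expressions like position()<3 or the or/and operators are not available, so they are left out here.

```python
import xml.etree.ElementTree as ET

# Sample document from above, minus the XML declaration.
XML_DOC = """<academy_modules>
  <module>
    <title>Web Attacks</title>
    <author>21y4d</author>
    <tier difficulty="medium">2</tier>
    <category>offensive</category>
  </module>
  <!-- this is a comment -->
  <module>
    <title>Attacking Enterprise Networks</title>
    <author co-author="LTNB0B">mrb3n</author>
    <tier difficulty="medium">2</tier>
    <category>offensive</category>
  </module>
</academy_modules>"""

root = ET.fromstring(XML_DOC)

# Positional predicates: first and last module child.
first = root.findall("./module[1]")
last = root.findall("./module[last()]")

# Child-value predicate: modules whose tier text equals "2".
tier2_titles = [t.text for t in root.findall(".//module[tier='2']/title")]

# Attribute-existence predicate, then step back up to the parent module.
coauthored = [
    t.text for t in root.findall(".//module/author[@co-author]/../title")
]

# Attribute-value predicate, returning the parent module elements.
medium = root.findall(".//module/tier[@difficulty='medium']/..")

print(first[0].find("title").text, "|", last[0].find("title").text)
print(tier2_titles)
print(coauthored)
print(len(medium))
```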
Wildcards & Union
Wildcards:
node() - any node
* - any element node
@* - any attribute node
Examples:
//* - all element nodes
//module/author[@*]/.. - modules where author has at least one attribute
/*/*/title - all title nodes exactly two levels below the root element
Note: * matches a single level, not descendants like //.
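The difference between a single-level wildcard and a descendant search can be checked with a small sketch, again assuming Python's standard-library xml.etree.ElementTree. Its XPath subset covers * but not node() or @*, and its queries are relative to the element they are called on, so ./* and .//* stand in for the children and descendants of the root element.

```python
import xml.etree.ElementTree as ET

# Sample document from above, minus the XML declaration.
XML_DOC = """<academy_modules>
  <module>
    <title>Web Attacks</title>
    <author>21y4d</author>
    <tier difficulty="medium">2</tier>
    <category>offensive</category>
  </module>
  <!-- this is a comment -->
  <module>
    <title>Attacking Enterprise Networks</title>
    <author co-author="LTNB0B">mrb3n</author>
    <tier difficulty="medium">2</tier>
    <category>offensive</category>
  </module>
</academy_modules>"""

root = ET.fromstring(XML_DOC)

# "*" matches one level only: the element children of the root element.
children = root.findall("./*")

# ".//*" matches every element below the root element, at any depth.
descendants = root.findall(".//*")

# Any child of the root, then its title child: two levels below the root element.
titles = [t.text for t in root.findall("./*/title")]

print([e.tag for e in children])
print(len(descendants))
print(titles)
```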
Union operator combines results:
//module[tier=2]/title/text() | //module[tier=3]/title/text() - titles of modules in tiers 2 and 3