What is Xpath and what is it for?

Asked

Viewed 2,317 times

17

Opening Developer Tools in Chrome to copy HTML from an element <div id="teste">, saw that there is an option "copy Xpath" as shown below:

inserir a descrição da imagem aqui

When choosing this option, it sends the following to the clipboard:

//*[@id="teste"]

What I’d like to know is what that information would be and how I could use it.

  • https://www.guru99.com/xpath-selenium.html I think you might be interested in this right here, too, from W3C https://www.w3.org/TR/1999/REC-xpath-19991116/

2 answers

7


According to the Wikipedia (in free translation):

Xpath (XML Path Language) is a query language for selecting nodes from an XML document. In addition, Xpath can be used to calculate values (e.g., strings, numbers or boolean values ) of the contents of an XML document. The Xpath was defined by the World Wide Web Consortium (W3C)

Xpath uses path expressions to select a "node" or a set of "nodes" in an XML or HTML document.

What you received when performing the function Copy XPath was an "expression of the way" which represents a "path" according to the Xpath syntax for the "node" in the document with the attribute id equal to "test" the two "bars" being the indication that the part/start query at the document root and the asterisk (wildcard) to find everything, the options (parameters) between the brackets are used to filter the query.

In javascript your main interface uses Document.evaluate() that receives five parameters:

document.evaluate( xpathExpression, contextNode, namespaceResolver, resultType, result )
  • xpathExpression: xpath expression (path expressions)
  • contextNode: a "node" in the document to be evaluated
  • namespaceResolver: function for namespace prefixes (HTML does not have namespaces)
  • resultType: a constant that specifies the type of desired result
  • result: xpathresult object

The "path expressions" used by Xpath can be absolute represented by the "bar" / or relative (nodename, //, . ,.. , @, among others).

The contextNode can be a "node" in the document or itself document if a "global consultation is desired".

As mentioned HTML does not have "namespaces" so when parsing HTML documents the third parameter should receive a value null.

The three main types resultType are:

iterators

  • UNORDERED_NODE_ITERATOR_TYPE
  • ORDERED_NODE_ITERATOR_TYPE

A set of "nodes" that can be iterated individually using iterateNext() on the Object (Xpathresult). If the document is modified during an iteration the execution NS_ERROR_DOM_INVALID_STATE_ERR will be launched.

snapshots

  • UNORDERED_NODE_SNAPSHOT_TYPE
  • ORDERED_NODE_SNAPSHOT_TYPE

A set of static "nodes", snapshots do not change with document mutations, so they do not release exceptions if the "node" has changed in the document or even if it has been removed during iteration.

first knots

  • ANY_UNORDERED_NODE_TYPE
  • FIRST_ORDERED_NODE_TYPE

Returns the first node found that corresponded to the Xpath expression.

The constant ANY_TYPE will return-any type that naturally results from the evaluation of the expression. It can be any of the simple types (NUMBER_TYPE, STRING_TYPE, BOOLEAN_TYPE), but if the returned result type is a set of "nodes", it will be only one UNORDERED_NODE_ITERATOR_TYPE.

These constants are defined by the object XPathResult and the complete list of types can be found in the appendix of this documentation: Mozila Developer - Xpath in Javascript

The result (result) can be an object XPathResult or null(which will create a new Xpathresult object)

Xpath has a set of functions that can be observed here: Mozila Developer

A basic example to search and count all elements <p> in the document:

let btn = document.getElementById('count')

btn.addEventListener('click', function(e) {
    let paragraphCount = document.evaluate( 'count(//p)', document, null, XPathResult.ANY_TYPE, null )
    alert( 'Este documento contém ' + paragraphCount.numberValue + ' elementos de paragrafo')
}, false)
<p>Paragraph one</p>
<p>Paragraph two</p>
<p>Paragraph tree</p>
<p>...</p>

<br>

<button id="count" type="button">contar</button>

Path expressions can use "eixos" that complement the query query, these "axes" should always be wrapped in brackets, example:

  • //aside[@lang='en']: selects all the elements <aside> which have a "lang" attribute with an "en" value"

  • /footer/div[last()]: selects the last element <div> who is the child of the element <footer>

  • /nav/button[1]: selects the first element <button> who is the child of the element <nav>

These expressions also accept "operadores" as well as "jokers" * in order to make the consultation more constructive.


Opinion (case of use)

I used this syntax some time ago to search with XMLHttpRequest() HTML fragments that would be programmatically bad DOM accommodated, I confess that it is difficult to maintain the organization. I switched focus and moved to an approach using fetch() and .text() response and manipulate fragments using a specific function I created.

In HTML since it returns "nodes" can be useful to build analyzers or object handlers, its potential range of use depends on the proposal (need) and the creativity of the developer.

For me (my criticism) is just another way to cross document.

Already in XML it is easy to observe its potential, example:

fetch('https://subversivo58.github.io/lab/assets/XPath-example.xml').then(r => {
    if ( r.ok ) {
        return r.text()
    }
    throw new Error('Failed request XML Document! Status code: ' + r.status)
}).then(text => {
    showResult((new window.DOMParser()).parseFromString(text, "text/xml"))
}).catch(e => {
    console.error(e)
})


function showResult(xml) {
    var txt = "";
    path = "/bookstore/book/title"
    if ( xml.evaluate ) {
        var nodes = xml.evaluate(path, xml, null, XPathResult.ANY_TYPE, null);
        var result = nodes.iterateNext();
        while (result) {
            txt += result.childNodes[0].nodeValue + "<br>";
            result = nodes.iterateNext();
        }
    }
    document.getElementById("demo").innerHTML = txt;
}
<p id="demo"></p>

<br>

<p>Este código foi baseado nesta <a href="https://www.w3schools.com/xml/tryit.asp?filename=try_xpath_select_cdnodes" target="_blank" rel="nofollow">fonte</a></p>


Links and documentation:

Support the document.evaluate and XPath caniuse with.

xpath


PS:

Sources like Mozila Deleveloper, W3schools or MSDN need to present in their documentation all references to functions, operators or even the syntax used in Xpath.

  • Is there any advantage in using the document.evaluate instead of document.querySelectorAll or document.querySelector?

  • Proper querySelector() or querySelectorAll() are selectors of CSS already XPath is suitable for DOM (selector from us) ... I myself no longer use anymore and never made a point to test but after your comment I did a brief search and found this test on jsperf.com. It seems that XPath is faster than querySelectorAll()

  • Just one thing, front-end testing is never accurate... it should take into account that it will generate different results for each browser as well as the very basis of who is "testing" (system architecture, processor, memory, etc...)

5

Briefly Xpath is a language of searching for us in documents XML (but also used in HTML), you execute an Xpath expression on a certain XML/HTML/ document any supported ML and the result will be 0 or more nodes that match that expression.

It is very similar to seletores CSS. For example we can write the two equivalent searches in CSS and Xpath: div > .className p in css and //div/[contains(@class, 'className')//p in xpath, both represent "all elements p direct or indirect descendants of items with class className, being in turn the direct children of div".

Although almost identical in purpose to CSS selectors, Xpath selectors are more powerful. There are a multitude of searches that are possible in Xpath but not in CSS, some simple demo examples:

  • fetch us ancenstral knots //div/.. (all parents of Divs);
  • seek us according to rule > in attribute //section[valor>35.00] (sections with attribute value greater than 35.00);
  • search for elemets whose internal text is equivalent //p[text()=../blockquote/text()] (all the p whose Father has a child blockquote with text equal to p).

Xpath logo is very useful in cases where:

  1. be the only thing available (Pr);
  2. you need to do complex searches that cannot be expressed in another available search language (css for example).

A very common use case is in the Selenium, an action automation tool in browsers, basically you write code that mimics actions of users in the browser to test interfaces in an automated way. It is possible to use CSS and Xpath (the API will vary depending on the language), CSS is usually preferred for readability and simplicity, but sometimes you want to express something like "attribute value of input#id has the same value as that contained in textarea#id" and there are not many options outside Xpath.

Note: recommend this post by John Resig (creator of jQuery) about a comparison of Xpath vs CSS.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.