Wednesday, February 18, 2026

XPATH

XPath (XML Path Language) is a query language used to navigate XML documents and select nodes from the document based on their element name, attribute values, and other properties. It is commonly used in web scraping and automation tasks to extract specific data from HTML and XML documents.

XPath uses a path notation similar to that of a file system. XPath expressions can select elements, attributes, text, and other data nodes in an XML document.

An XPath expression is evaluated to yield an object of one of the following four basic types:

  • node-set: Represents an unordered collection of nodes without duplicates.

  • boolean: Has the value true or false.

  • number: Specifies a floating-point number.

  • string: Specifies a sequence of UCS characters.

Nodes in XPath:

In XPath, there are seven kinds of nodes:
Element, attribute, text, namespace, processing-instruction, comment, and document nodes.

Here are some common examples of XPath expressions:

Selecting Nodes by Element Name:

Selects all <book> elements that are children of the root <bookstore> element: /bookstore/book

Selects all <book> elements at any level of the document: //book

Selecting Nodes by Attribute Value:

Selects <book> elements where the attribute category is equal to "fiction": /bookstore/book[@category='fiction']

Selecting Nodes by Position:

Selects the first <book> element: /bookstore/book[1]

Selects the last <book> element: /bookstore/book[last()]
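The selections by name, attribute, and position can be tried out with a short Python sketch. It assumes Python's standard xml.etree.ElementTree module, which implements only a subset of XPath and does not accept absolute paths on an element, so each path is written relative to the <bookstore> root. The three-book document is illustrative data.

```python
import xml.etree.ElementTree as ET

# Illustrative, trimmed-down bookstore document.
XML = """
<bookstore>
  <book category="cooking"><title>Everyday Italian</title></book>
  <book category="children"><title>Harry Potter</title></book>
  <book category="web"><title>Learning XML</title></book>
</bookstore>
"""

root = ET.fromstring(XML)  # root is the <bookstore> element


def titles(books):
    """Return the <title> text of each <book> element."""
    return [book.findtext("title") for book in books]


by_name = titles(root.findall("book"))                   # like /bookstore/book
anywhere = titles(root.findall(".//book"))               # like //book
by_attr = titles(root.findall("book[@category='web']"))  # attribute predicate
first = titles(root.findall("book[1]"))                  # position predicate
last = titles(root.findall("book[last()]"))              # last() predicate

print(by_name, anywhere, by_attr, first, last, sep="\n")
```

A full XPath engine would take the absolute forms (/bookstore/book and so on) directly; the relative forms here are a concession to ElementTree.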

Selecting Nodes by Partial Attribute Value:

Selects <book> elements where the category attribute contains the substring "fiction" (so "science-fiction" also matches): /bookstore/book[contains(@category, 'fiction')]
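Python's standard xml.etree.ElementTree does not implement contains(), so the sketch below emulates the predicate with an ordinary substring test. The category values and one-letter titles are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative data: two categories contain "fiction", one does not.
root = ET.fromstring(
    "<bookstore>"
    '<book category="science-fiction"><title>A</title></book>'
    '<book category="fiction"><title>B</title></book>'
    '<book category="cooking"><title>C</title></book>'
    "</bookstore>"
)

# Emulates /bookstore/book[contains(@category, 'fiction')].
matches = [
    book.findtext("title")
    for book in root.findall("book")
    if "fiction" in book.get("category", "")
]
print(matches)
```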

Selecting Nodes by Text Content:

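Selects <book> elements that have a text node child equal to "Introduction to XPath": /bookstore/book[text()='Introduction to XPath']. To match on the text of a child element instead, use a predicate such as /bookstore/book[title='Introduction to XPath'].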

Selecting All Child Nodes:

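Selects all child elements of the root node (that is, the document element): /*. Relative to the current node, * selects all child elements and node() selects all child nodes of any kind.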

Wildcard    Description

*           Matches any element node

@*          Matches any attribute node

node()      Matches any node of any kind

Path Expression    Result

/bookstore/*       Selects all the child element nodes of the bookstore element

//*                Selects all elements in the document

//title[@*]        Selects all title elements which have at least one attribute of any kind
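The wildcard rows can be approximated in Python as follows. This is a sketch using the standard xml.etree.ElementTree module, which supports * but not @*, so the "has at least one attribute" test is done in Python. The sample document is illustrative.

```python
import xml.etree.ElementTree as ET

# Illustrative document: one <title> has an attribute, the other does not.
root = ET.fromstring(
    "<bookstore>"
    "<book><title lang='en'>Learning XML</title><price>39.95</price></book>"
    "<book><title>Untitled Draft</title></book>"
    "</bookstore>"
)

# Like /bookstore/* : the child elements of <bookstore>.
children = [el.tag for el in root.findall("*")]

# Like //* : every element in the document, the root element included.
all_elements = [el.tag for el in root.iter()]

# Like //title[@*] : <title> elements with at least one attribute.
titles_with_attr = [t.text for t in root.iter("title") if t.attrib]

print(children, all_elements, titles_with_attr, sep="\n")
```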

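// selects matching nodes anywhere below the starting point, no matter how deeply they are nested; at the start of a path, the search begins at the root of the source document.

@ is the abbreviation for the attribute axis: @lang is short for attribute::lang.

/ at the start of a path selects the root node of the source document; between steps it separates one location step from the next.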

Example:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>

  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>

  <book category="web">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>

  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
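As a rough illustration of reading element text and attribute values out of such a document, here is a Python sketch using the standard xml.etree.ElementTree module. It embeds a shortened copy of the example (two books, fewer authors) so that it runs on its own. The records list is a name chosen for this sketch.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bookstore example, embedded to keep the sketch self-contained.
XML = """<bookstore>
  <book category="web">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <price>49.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)

records = []
for book in root.findall("book"):
    title = book.find("title")
    records.append({
        "title": title.text,                 # text node of <title>
        "lang": title.get("lang"),           # like title/@lang
        "category": book.get("category"),    # like @category
        "authors": [a.text for a in book.findall("author")],
    })

for record in records:
    print(record)
```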

Regular expression:

\d    Any digit 0-9

\D    The complement of \d: any character that is not a digit

\s    Any whitespace character (space, tab, newline)

\S    The complement of \s: any non-whitespace character

\w    A word character: an upper- or lowercase letter, a digit, or an underscore

\W    The complement of \w: any non-word character

.     Any single character (in most engines, any character except a newline)

?     Zero or one occurrence

*     Zero or more occurrences

+     One or more occurrences

{}    Indicates the number of times to match the preceding pattern

                Ex: \d{3} matches any three digits in a row

[]    Matches any single character listed inside the brackets

                Ex:

[AB]  matches a single A or B character (case sensitive) and nothing else.

[1-5]  matches a single digit from 1 to 5

[1-5A-E]   matches a single digit from 1 to 5 or a single letter A, B, C, D, or E

[^0]  matches any single character other than 0
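The patterns in this list can be checked with Python's built-in re module. The sample strings below are made up for illustration; other regex engines may differ in small details.

```python
import re

# \d{3}: exactly three digits in a row ("45" is too short to match).
three_digits = re.findall(r"\d{3}", "abc 123 de 45 f 678")

# ? * +: zero or one, zero or more, one or more of the preceding item.
optional_u = re.findall(r"colou?r", "color colour")
zero_or_more = re.findall(r"ab*", "a ab abbb")
one_or_more = re.findall(r"ab+", "a ab abbb")

# Character classes match one character from the set.
a_or_b = re.findall(r"[AB]", "ABCab")
digit_or_letter = re.findall(r"[1-5A-E]", "0 3 7 C F")
not_zero = re.findall(r"[^0]", "1001")

# \w+ collects word characters; \s+ splits on runs of whitespace.
words = re.findall(r"\w+", "user_1 = ok!")
fields = re.split(r"\s+", "a b\tc")

print(three_digits, optional_u, zero_or_more, one_or_more)
print(a_or_b, digit_or_letter, not_zero, words, fields)
```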

XSL: Extensible Stylesheet Language

XSL consists of three languages:
-          XSLT
-          XPath
-          XSL-FO

XPath: used to navigate through elements and attributes in XML documents.

XSL-FO: Extensible Stylesheet Language Formatting Objects, used to format XML documents.

XSLT: Extensible Stylesheet Language Transformations

XSLT (eXtensible Stylesheet Language Transformations) is a language used for transforming XML documents into different formats or structures. XPath is a fundamental part of XSLT, as it is used within XSLT expressions to navigate and manipulate XML data.

XSLT transformations do not work directly on the text of the XML source document, but on its in-memory tree representation.
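To make the idea of an in-memory tree concrete, the following Python sketch parses a small document with the standard xml.etree.ElementTree module and prints the element tree as an indented outline. This is not what an XSLT processor does internally, only an illustration of the tree shape. The outline function is a helper written for this sketch.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


# Parsing turns the text form into a tree of element objects.
root = ET.fromstring(
    "<library><book><title>XML Basics</title><price>15</price></book></library>"
)

tree_lines = outline(root)
print("\n".join(tree_lines))
```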

XPath in XSLT:

Selecting Nodes:

XPath is used to specify which nodes in the input XML should be transformed. For example:

<xsl:template match="book">
   <!-- XSLT template for matching book elements -->
</xsl:template>

Accessing Node Values:

XPath can be used to access the values of nodes. For instance:

<xsl:value-of select="title"/>

This will extract the value of the <title> element under the current context.

Predicates:

XPath predicates can be used in XSLT to filter nodes based on specific conditions. For example:

<xsl:apply-templates select="book[price > 20]"/>

This selects <book> elements whose <price> child has a value greater than 20.
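For comparison, the same filter can be sketched in Python. The standard xml.etree.ElementTree module does not support numeric comparisons in predicates, so the price test is written as ordinary Python. The two-book sample is illustrative.

```python
import xml.etree.ElementTree as ET

library = ET.fromstring(
    "<library>"
    "<book><title>Introduction to XSLT</title><price>25</price></book>"
    "<book><title>XML Basics</title><price>15</price></book>"
    "</library>"
)

# Emulates the predicate book[price > 20]: convert <price> to a number, then compare.
expensive = [
    book.findtext("title")
    for book in library.findall("book")
    if float(book.findtext("price")) > 20
]
print(expensive)
```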

Iterating Through Nodes:

XSLT allows you to apply templates to nodes selected by an XPath expression. For example:

<xsl:apply-templates select="book"/>

This applies templates to all <book> elements in the input XML.

XPath Functions:

XSLT supports the XPath functions for string manipulation, mathematical operations, and so on; date and time functions are available from XPath 2.0 onward. For example:

<xsl:value-of select="concat('Title: ', title)"/>

This concatenates the string "Title: " with the value of the <title> element.

Example XSLT Transformation using XPath:

Consider the following XML input:

<library>
  <book>
    <title>Introduction to XSLT</title>
    <author>John Doe</author>
    <price>25</price>
  </book>
  <book>
    <title>XML Basics</title>
    <author>Jane Smith</author>
    <price>15</price>
  </book>
</library>

An XSLT template to transform this XML into an HTML list might look like this:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <ul>
      <xsl:for-each select="library/book">
        <li>
          <xsl:value-of select="concat('Title: ', title, ', Author: ', author, ', Price: $', price)"/>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>

In this example, XPath is used to iterate through <book> elements and access their child elements (<title>, <author>, and <price>), transforming the data into an HTML list.
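Python's standard library has no XSLT processor, so the sketch below does not run the stylesheet. It is a hand-written Python equivalent that uses xml.etree.ElementTree to show what the transformation produces for the sample input.

```python
import xml.etree.ElementTree as ET

SOURCE = """<library>
  <book>
    <title>Introduction to XSLT</title>
    <author>John Doe</author>
    <price>25</price>
  </book>
  <book>
    <title>XML Basics</title>
    <author>Jane Smith</author>
    <price>15</price>
  </book>
</library>"""

library = ET.fromstring(SOURCE)

ul = ET.Element("ul")
for book in library.findall("book"):      # plays the role of xsl:for-each
    li = ET.SubElement(ul, "li")
    # Plays the role of the concat(...) call in xsl:value-of.
    li.text = "Title: {}, Author: {}, Price: ${}".format(
        book.findtext("title"),
        book.findtext("author"),
        book.findtext("price"),
    )

html = ET.tostring(ul, encoding="unicode")
print(html)
```

An actual XSLT processor may add whitespace or an XML declaration to its output, depending on the output settings.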


The xsl:output element supports four output methods (xhtml was added in XSLT 2.0):
xml
html
xhtml
text
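The difference between output methods can be seen by analogy in Python: the standard xml.etree.ElementTree serializer accepts a method argument of "xml", "html", or "text". This is an analogy only, not an XSLT processor, and the one-line sample element is illustrative.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<p>Hello<br/>world</p>")

# xml: well-formed XML, empty elements are written in self-closing form.
as_xml = ET.tostring(p, encoding="unicode", method="xml")

# html: HTML conventions, so the empty <br> element gets no closing tag.
as_html = ET.tostring(p, encoding="unicode", method="html")

# text: only the text content, with all markup dropped.
as_text = ET.tostring(p, encoding="unicode", method="text")

print(as_xml, as_html, as_text, sep="\n")
```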

There are three instructions you can use to retrieve information from the source tree:
xsl:value-of,
xsl:copy, and
xsl:copy-of
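The difference between the three instructions can be sketched by analogy in Python with the standard xml.etree.ElementTree and copy modules. The snippet mirrors their effect on an element node; it does not run XSLT, and the sample element is illustrative.

```python
import copy
import xml.etree.ElementTree as ET

book = ET.fromstring('<book id="b1"><title>XML <b>Basics</b></title></book>')

# Like xsl:value-of: the string value, i.e. all descendant text joined together.
value_of = "".join(book.itertext())

# Like xsl:copy: the current node only, without its attributes or children.
shallow = ET.Element(book.tag)

# Like xsl:copy-of: the node together with its attributes and whole subtree.
deep = copy.deepcopy(book)

print(value_of)
print(ET.tostring(shallow, encoding="unicode"))
print(ET.tostring(deep, encoding="unicode"))
```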

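XPath has many functions available. They are grouped into categories such as the following; several of these (for example the BPEL, BPM, DVM, and XREF functions) are vendor extensions rather than part of core XPath:

  • Advanced Functions

  • BPEL XPath Extension Functions

  • BPM Functions

  • Conversion Functions

  • DVM Functions

  • Database Functions

  • Date Functions

  • Logical Functions

  • Mathematical Functions

  • Node Set Functions

  • String Functions

  • XREF Functions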

