Tuesday, August 18, 2020

XML Overview

 

XML: Extensible Markup Language (well-formed documents)

-> XML stands for Extensible Markup Language.

-> XML is designed to store and transport data.

-> XML is a W3C Recommendation.

-> A document may start with a prolog (the XML declaration, plus optional comments, processing instructions and a DOCTYPE declaration).

-> A document must have exactly one root element.

-> Every opening tag must have a matching closing tag.

-> Element names are case-sensitive, so opening and closing tags must use the same case.

-> Attribute values must be enclosed in either single quotes or double quotes.

-> Markup is written as tags: <tag></tag>

-> Tags are user-defined.

-> Tags describe the data.

-> XML is used to transfer data from one application to another.

-> Designed to be both human- and machine-readable.

-> Software- and hardware-independent.

An XML document contains:

-> Elements

-> Attributes

-> Entity references

The text starting with a "<" character and ending with a ">" character is an XML tag.

A well-formed XML document is a document that conforms to the XML syntax rules, like:

Proper Structure: The document must have a single root element that encloses all other elements. This root element serves as the entry point to the XML data.

Nesting: Elements must be properly nested within each other, meaning an element opened inside another element must also be closed inside it. For example, if you open <b> inside <a>, you must close it with </b> before closing </a>: <a><b></b></a> is correct, while <a><b></a></b> is not.

Matching Tags: All opening tags must have corresponding closing tags, and they must match exactly. For example, <tag> must be closed with </tag>, not with </other_tag>.

Attribute Quoting: Attribute values must be enclosed in either single or double quotes. For example, <element attribute="value"> is well-formed, while <element attribute=value> is not.

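Special Character Escaping: The characters < and & must be escaped whenever they appear as data, because the parser would otherwise read them as the start of markup. Inside an attribute value, a quote character that matches the delimiting quotes (" or ') must also be escaped. Escaping > is only required in the sequence ]]>, but it is common practice. For instance, &lt; represents <, and &amp; represents &.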

Reserved Character Handling: Certain characters, like the ampersand (&) and angle brackets (< and >), have special meanings in XML. They should only be used for their intended purpose (e.g., < for starting tags) or properly escaped when used as data.

CDATA Sections: If you want to include character data that contains characters that would otherwise be treated as markup (e.g., < and >), you can use CDATA sections (<![CDATA[...]]>) to enclose that data.

Self-Closing Tags: If an element has no content and is self-contained, you can use a self-closing tag format, like <empty />, instead of opening and closing tags.

XML Declaration: While not required, an XML declaration at the beginning of the document (<?xml version="1.0" encoding="UTF-8"?>) is often included to specify the version of XML being used and the character encoding.

In Short:

It may begin with an XML declaration (optional, but if present it must come first)

It must have one unique root element

Start-tags must have matching end-tags

Elements are case sensitive

All elements must be closed

All elements must be properly nested

All attribute values must be quoted

Entities must be used for special characters
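These rules can be checked mechanically, because a conforming parser must reject any document that is not well-formed. The sketch below uses Python's standard-library xml.etree.ElementTree (our choice of parser, not something the post prescribes); the sample documents and the is_well_formed helper are illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "ok": '<?xml version="1.0" encoding="UTF-8"?><note><to>Tove</to></note>',
    "no declaration": "<note><to>Tove</to></note>",  # the declaration is optional
    "two roots": "<a/><b/>",                          # only one root element allowed
    "case mismatch": "<Message>Hi</message>",         # tag names are case sensitive
    "bad nesting": "<b><i>text</b></i>",              # <i> must be closed before </b>
    "unquoted attr": "<note date=12/11/2007/>",       # attribute values must be quoted
    "raw ampersand": "<p>Fish & Chips</p>",           # & must be written as &amp;
}

for label, doc in samples.items():
    print(f"{label:15} -> {is_well_formed(doc)}")
```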

XML Standards: 

XML DOM

XML Ajax

XML XPath

XML DTD

XML XSLT

XML Schema

XML Services

XML XQuery 

Syntax: &EntityReferenceName;

1) &lt;    less than        <

2) &gt;    greater than     >

3) &amp;   ampersand        &

4) &apos;  apostrophe       '

5) &quot;  quotation mark   "

Ex:
<person>
    <name>diksha</name>
    <age>The person's age is &lt; 18</age>
</person>
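A short sketch, assuming Python's xml.sax.saxutils.escape and xml.etree.ElementTree, of how the entity references in the example behave: escaping turns < into &lt; before the text is placed in the document, and the parser turns it back into < when reading it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "The person's age is < 18"
escaped = escape(raw)  # replaces &, < and > with &amp;, &lt; and &gt;
print(escaped)         # The person's age is &lt; 18

doc = f"<person><name>diksha</name><age>{escaped}</age></person>"
person = ET.fromstring(doc)
age_text = person.findtext("age")  # the parser expands &lt; back into <
print(age_text)                    # The person's age is < 18

# &apos; and &quot; are expanded in the same way
quoted = ET.fromstring("<q>&apos;single&apos; and &quot;double&quot;</q>").text
print(quoted)  # 'single' and "double"
```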

The Difference between XML and HTML

XML (Extensible Markup Language) and HTML (Hypertext Markup Language) are both markup languages used for structuring and describing content on the web and in data interchange, but they have distinct purposes and characteristics.

Purpose:

XML (Extensible Markup Language): XML is designed as a generic markup language for defining and structuring data. It is not concerned with how data should be presented or displayed but focuses on describing the structure and meaning of data.

HTML (Hypertext Markup Language): HTML is primarily used for creating structured documents that are meant to be displayed in web browsers. It defines the structure of web pages, including text, images, links, and multimedia elements.

Tags and Elements:

XML: In XML, you define your own custom tags and elements based on your specific data needs. XML tags do not have predefined meanings, and you can create any element names that make sense for your data.

HTML: HTML uses a predefined set of tags and elements that have specific meanings and are used for creating web pages. For example, HTML has tags like <html>, <head>, <body>, <p>, and <a> with predefined semantics.

Semantics:

XML: XML does not provide inherent semantics for data. The meaning of XML elements and attributes is typically defined by the application or system using the XML data. XML is used for describing the structure of data and relies on external documentation or schemas to provide context and meaning.

HTML: HTML tags come with built-in semantics. For example, the <h1> tag is semantically understood as a top-level heading, while the <a> tag represents a hyperlink. Browsers and web agents interpret HTML tags and apply default styling and behavior based on their semantics.

Presentation vs. Data:

XML: XML focuses on data and does not dictate how data should be presented. It is often used for data interchange between systems, configuration files, or as the basis for defining custom document types.

HTML: HTML is concerned with how content is presented in web browsers as well as how it is structured. Browsers render each element with default styling, and detailed formatting, layout, and styling are usually added with CSS.

Extensibility:

XML: XML is highly extensible and allows you to create custom document structures and data formats tailored to specific needs. You can define your own elements, attributes, and document types.

HTML: While HTML has some extensibility features, extending it often involves using custom data attributes or other workarounds. HTML5 introduced support for custom data attributes (data-*) to add metadata to elements.

Validation:

XML: XML can be validated against a Document Type Definition (DTD) or an XML Schema to ensure that it adheres to a specific structure and set of rules.

HTML: HTML is often less strict in terms of validation, and web browsers are designed to handle a wide range of variations and errors in HTML markup.

How Can XML be Used?

XML is platform- and language-independent, so any time one computer program needs to communicate with another program, XML is a potential fit for the exchange format.

XML (Extensible Markup Language) is a versatile technology that can be used in a wide range of applications and industries. Its flexibility and human-readable format make it a popular choice for representing structured data.

Here are some common use cases for XML:

Data Interchange: XML is commonly used for exchanging data between different systems and platforms. It provides a structured and standardized way to represent data, making it suitable for data exchange formats such as SOAP web services, REST APIs that return XML, and data feeds.

Configuration Files: Many software applications use XML files for configuration settings. These files define how an application should behave and can be easily modified without altering the application's source code.

Document Markup: XML is used to markup and structure documents in a way that makes them machine-readable. For example, it's used in various document formats like DocBook and DITA for technical documentation.

Web Development: XML can be used in web development for various purposes, including sitemaps, RSS feeds, and custom data formats. For example, RSS (Really Simple Syndication) uses XML to syndicate content on websites.

Database Integration: XML can be used to import/export data from and to databases. It allows for structured data representation that can be easily processed by database systems.

Middleware: XML is used in middleware and messaging systems for communication between different components or services in a service-oriented architecture (SOA). XML messages can be exchanged between applications and services.

Data Storage: Native XML databases such as MarkLogic and BaseX store data as XML and can query it directly, for example with XQuery.

Transportation and Serialization: XML is used to serialize data structures for transmission or storage. It can represent complex data structures and is suitable for data serialization in various programming languages.

Industry Standards: Many industries have adopted XML-based standards for data exchange and representation. Examples include HL7 (version 3 and CDA) for healthcare, FIXML (the XML encoding of the FIX protocol) for financial transactions, and XBRL for financial reporting.

Semantic Web: XML is used as a foundational technology in the Semantic Web to represent and link structured data in a way that enables intelligent data processing by machines.

Custom Data Formats: Organizations can define their own XML-based data formats for specific use cases, allowing them to structure data in a way that suits their needs.

Human-Readable Data Storage: XML is often used for storing structured data in a human-readable format, making it easy for developers and administrators to understand and work with data.

Data Transformation: XML can be used as an intermediate format for data transformation and integration processes, facilitating the conversion of data between different systems.

Testing and Mock Data: XML can be used to generate test data or mock data for software testing and development purposes.

In summary, XML is a versatile technology that plays a crucial role in data interchange, configuration, documentation, and structured data representation across various domains and industries. Its use cases extend to virtually any scenario where structured data needs to be represented, exchanged, or stored.
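To illustrate the "Transportation and Serialization" use case, here is a minimal sketch that turns a flat Python dictionary into XML and back with xml.etree.ElementTree. The one-child-element-per-key mapping and the dict_to_xml/xml_to_dict helper names are assumptions for illustration; real formats usually define this mapping in a schema.

```python
import xml.etree.ElementTree as ET


def dict_to_xml(tag: str, data: dict) -> ET.Element:
    """Build <tag><key>value</key>...</tag> from a flat dict (illustrative mapping)."""
    root = ET.Element(tag)
    for key, value in data.items():
        ET.SubElement(root, key).text = str(value)
    return root


def xml_to_dict(root: ET.Element) -> dict:
    """Reverse of dict_to_xml: read each child's tag and text."""
    return {child.tag: child.text for child in root}


book = {"title": "Learning XML", "author": "Erik T. Ray", "year": 2003}
xml_text = ET.tostring(dict_to_xml("book", book), encoding="unicode")
print(xml_text)
# <book><title>Learning XML</title><author>Erik T. Ray</author><year>2003</year></book>

restored = xml_to_dict(ET.fromstring(xml_text))
print(restored)  # {'title': 'Learning XML', 'author': 'Erik T. Ray', 'year': '2003'}
```

Every value comes back as a string: plain XML carries no data types, which is one reason schemas (XSD) are used alongside it.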

Self-Describing Syntax

<?xml version="1.0" encoding="UTF-8"?>

<?xml: This marks the beginning of the XML declaration. The declaration is the first part of the document's prolog, which can also contain comments, processing instructions, and a DOCTYPE declaration.

version="1.0": This attribute specifies the version of the XML specification being used. In this case, it indicates that the document adheres to XML 1.0, which is the most commonly used version.

encoding="UTF-8": This attribute specifies the character encoding used in the XML document. UTF-8 is a widely used character encoding that can represent a wide range of characters from various languages and character sets.

The XML declaration, including its attributes, provides crucial information to software and systems processing the XML document:

Version: It tells software which version of the XML specification to expect. Different versions of XML may have different rules and features.

Character Encoding: It specifies the character encoding used in the document. This information is essential for correctly interpreting and displaying characters in the document, especially when dealing with non-ASCII characters or characters from different languages.
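The effect of the encoding attribute can be seen by parsing raw bytes. In this sketch (Python's xml.etree.ElementTree, chosen for illustration), the word "café" is stored in ISO-8859-1 and in UTF-8; each document declares its own encoding, while a document with no declaration is read as UTF-8.

```python
import xml.etree.ElementTree as ET

# The same word stored as bytes in two different encodings, each declared
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><word>café</word>'.encode("iso-8859-1")
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><word>café</word>'.encode("utf-8")

# Given bytes, the parser uses the declared encoding to decode them
print(ET.fromstring(latin1_doc).text)  # café
print(ET.fromstring(utf8_doc).text)    # café

# Without a declaration the bytes are read as UTF-8, and the lone
# ISO-8859-1 byte for "é" (0xE9) is not valid UTF-8
undeclared = "<word>café</word>".encode("iso-8859-1")
try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)  # False
```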

XML Tree

XML documents form a tree structure that starts at "the root" and branches to "the leaves".

XML Document Example

<?xml version="1.0"?>
<note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
</note>

The first line is the XML declaration. It defines the XML version (1.0). Because it has no encoding attribute, the parser assumes the default encoding: UTF-8, or UTF-16 if the document starts with a byte-order mark.

The next line describes the root element of the document (like saying: "this document is a note"):

<note>

XML documents must contain a root element.

The next 4 lines describe 4 child elements of the root (to, from, heading, and body):

<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>

And finally the last line defines the end of the root element:

</note>
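The same note can be walked as a tree in code. A minimal sketch, assuming Python's xml.etree.ElementTree, that reads the root element and then its four children in document order:

```python
import xml.etree.ElementTree as ET

note_xml = """<?xml version="1.0"?>
<note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(note_xml)
print(root.tag)     # note (the root element)
for child in root:  # direct children of the root, in document order
    print(f"{child.tag}: {child.text}")
```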


 The image above represents one book in the XML below:

<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

The root element in the example is <bookstore>. All <book> elements in the document are contained within <bookstore>.

The <book> element has 4 children: <title>, <author>, <year>, and <price>.
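Because the document is a tree, parts of it can be selected by path. The sketch below uses the limited XPath subset supported by Python's xml.etree.ElementTree (full XPath needs another library); the bookstore is trimmed to the title, year and price fields used in the queries.

```python
import xml.etree.ElementTree as ET

# The bookstore from above, trimmed to the fields queried below
bookstore_xml = """<bookstore>
  <book category="COOKING"><title lang="en">Everyday Italian</title><year>2005</year><price>30.00</price></book>
  <book category="CHILDREN"><title lang="en">Harry Potter</title><year>2005</year><price>29.99</price></book>
  <book category="WEB"><title lang="en">Learning XML</title><year>2003</year><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(bookstore_xml)

# All <title> children of <book> children of the root, in document order
titles = [t.text for t in root.findall("./book/title")]
# Attribute predicate: books whose category attribute is WEB
web_titles = [b.findtext("title") for b in root.findall("./book[@category='WEB']")]
# Child-text predicate: books whose <year> child is exactly 2005
titles_2005 = [b.findtext("title") for b in root.findall("./book[year='2005']")]
# Element text is always a string, so convert it before doing arithmetic
total = round(sum(float(p.text) for p in root.iter("price")), 2)

print(titles)       # ['Everyday Italian', 'Harry Potter', 'Learning XML']
print(web_titles)   # ['Learning XML']
print(titles_2005)  # ['Everyday Italian', 'Harry Potter']
print(total)        # 99.94
```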

XML Syntax Rules
All XML Elements Must Have a Closing Tag

The tags are paired together, so that every opening tag also has a closing tag. These are called start-tags and end-tags: an end-tag is the same as its start-tag, except that it has a "/" right after the opening < character. An element with no content may instead be written as a single self-closing tag, such as <br /> below.

<p>This is a paragraph.</p>
<br />

XML Tags are Case Sensitive

XML tags are case sensitive. The tag <Letter> is different from the tag <letter>.

Opening and closing tags must be written with the same case:

<Message>This is incorrect</message>
<message>This is correct</message>

XML Elements Must be Properly Nested

In HTML, you might see improperly nested elements:

<b><i>This text is bold and italic</b></i>

In XML, all elements must be properly nested within each other:

<b><i>This text is bold and italic</i></b>

In the example above, "properly nested" simply means that since the <i> element is opened inside the <b> element, it must be closed inside the <b> element.

XML Documents Must Have a Root Element

XML documents must contain one element that is the parent of all other elements. This element is called the root element.

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
 

XML Attribute Values Must be Quoted

XML elements can have attributes in name/value pairs just like in HTML.

In XML, the attribute values must always be quoted.

Study the two XML documents below. The first one is incorrect, the second is correct:

<note date=12/11/2007>
  <to>Tove</to>
  <from>Jani</from>
</note>
 

<note date="12/11/2007">
  <to>Tove</to>
  <from>Jani</from>
</note>

The error in the first document is that the date attribute in the note element is not quoted.

Element Names

There are some rules that element names must follow:

  • Names can start with letters (including non-Latin characters) or the "_" character, but not numbers or other punctuation characters.

  • After the first character, numbers are allowed, as are the characters "-" and ".".

  • Names can't contain spaces.

  • Names starting with the letters "xml", in uppercase, lowercase, or mixed case ("xml", "XML", "XmL", and so on), are reserved by the XML specification, so you should not use them.

  • There can't be a space after the opening "<" character; the name of the element must come immediately after it. However, there can be a space before the closing ">" character, if desired.

  • Here are some examples of valid names:

    <first.name> <résumé>

    And here are some examples of invalid names:

    <xml-tag>

    which starts with the reserved letters xml,

    <123>

    which starts with a number,

    <fun=xml>

    because the "=" sign is illegal, and:

    <my tag>

    which contains a space.
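A rough way to test a candidate name is to let a parser decide whether <name/> is well-formed. This Python sketch (the is_valid_element_name helper is hypothetical) catches the structural errors listed above, but it does not enforce the "xml" reservation, and it rejects prefixed names such as ns:item because no namespace is declared for them.

```python
import xml.etree.ElementTree as ET


def is_valid_element_name(name: str) -> bool:
    """Rough check: is <name/> a well-formed element whose tag is exactly `name`?

    Simplification: names containing ':' fail here because the prefix is not
    bound to a namespace, even though prefixed names are legal once declared.
    """
    try:
        root = ET.fromstring(f"<{name}/>")
    except ET.ParseError:
        return False
    # Guards against inputs like 'a b="c"', which parse as a name plus an attribute
    return root.tag == name


for candidate in ["first.name", "résumé", "_id", "123", "fun=xml", "my tag"]:
    print(f"{candidate!r:14} -> {is_valid_element_name(candidate)}")
```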

There are 5 predefined entity references in XML:

&lt;      <     less than
&gt;      >     greater than
&amp;     &     ampersand
&apos;    '     apostrophe
&quot;    "     quotation mark

Note: Only the characters "<" and "&" are strictly illegal in XML text content. The greater-than character is legal, but it is a good habit to replace it with &gt;.

Comments in XML

The syntax for writing comments in XML is similar to that of HTML.

<!-- This is a comment -->

White-space is Preserved in XML

HTML truncates multiple white-space characters to one single white-space:

HTML:

Hello           Tove

Output:

Hello Tove

With XML, the white-space in a document is not truncated; it is passed to the application unchanged:

XML:

Hello           Tove

Output:

Hello           Tove
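A quick check with Python's xml.etree.ElementTree (an illustrative choice of parser) shows that the spaces survive parsing, and that even the indentation between elements is reported as text:

```python
import xml.etree.ElementTree as ET

text = "Hello" + " " * 11 + "Tove"  # "Hello", eleven spaces, "Tove"
elem = ET.fromstring(f"<greeting>{text}</greeting>")
print(repr(elem.text))    # all eleven spaces are kept
print(elem.text == text)  # True

# Indentation between elements is character data too
indented = ET.fromstring("<a>\n  <b/>\n</a>")
print(repr(indented.text))  # '\n  '
```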


XML Elements

An XML document contains XML Elements.

What is an XML Element?

An XML element is everything from (including) the element's start tag to (including) the element's end tag.

An element can contain:

  • other elements
  • text
  • attributes
  • or a mix of all of the above...

Empty XML Elements

<element></element>

<element />          <!-- self-closing tag -->

Sometimes an element has no PCDATA. In the following example, the <middle> element contains no name:

<name nickname='Shiny John'>
  <first>John</first>
  <!--John lost his middle name in a fire-->
  <middle></middle>
  <last>Doe</last>
</name>

In this case, you also have the option of writing this element using the special empty element syntax:

<middle/>
<middle />

but not like these:

<middle/ > <middle / >
Keep in mind, however, that as far as XML is concerned, <middle></middle> is exactly the same as <middle/>.
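The equivalence can be confirmed in Python with xml.etree.ElementTree (used here only for illustration): both forms parse to the same empty element, and a space between "/" and ">" is rejected.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<middle></middle>")
short_form = ET.fromstring("<middle/>")

# Both parse to an element with no text, no attributes and no children
print(long_form.text, short_form.text)  # None None
print(len(long_form), len(short_form))  # 0 0

# ElementTree serializes any empty element with the short form
print(ET.tostring(long_form, encoding="unicode"))  # <middle />

try:
    ET.fromstring("<middle/ >")  # a space between "/" and ">" is not allowed
    bad_form_ok = True
except ET.ParseError:
    bad_form_ok = False
print(bad_form_ok)  # False
```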

XML Naming Rules

XML elements must follow these naming rules:

  • Names can contain letters, digits, hyphens, underscores, and periods
  • Names cannot start with a digit or a punctuation character (the underscore is allowed)
  • Names starting with the letters xml (or XML, or Xml, etc.) are reserved and should not be used
  • Names cannot contain spaces

Any name can be used, no words are reserved.

Best Naming Practices

Create descriptive names, like this: <person>, <firstname>, <lastname>.

Create short and simple names, like this: <book_title> not like this: <the_title_of_the_book>.

Avoid "-". If you name something "first-name", some software may think you want to subtract "name" from "first".

Avoid ".". If you name something "first.name", some software may think that "name" is a property of the object "first".

Avoid ":". Colons are reserved for namespaces (more later).

Non-English letters like éòá are perfectly legal in XML, but watch out for problems if your software doesn't support them.

One of the beauties of XML is that it can be extended without breaking applications.

XML Attributes

XML elements can have attributes, just like HTML.

Attributes are simple name/value pairs associated with an element.

Attributes must have values–even if that value is just an empty string (like "")–and those values must be in quotes. 

Attributes provide additional information about an element.

XML Attributes Must be Quoted

Attribute values must always be quoted. Either single or double quotes can be used. For a person's sex, the person element can be written like this:

<person sex="female">

or like this:

<person sex='female'>

If the attribute value itself contains double quotes you can use single quotes, like in this example:

<gangster name='George "Shotgun" Ziegler'>

or you can use character entities:

<gangster name="George &quot;Shotgun&quot; Ziegler">

The same rules apply to naming attributes as apply to naming elements: names are case sensitive, shouldn't start with "xml", and so on. Also, you can't have more than one attribute with the same name on an element. So if we create an XML document like this:

<bad att="1" att="2"></bad>

the parser will reject it, because the duplicate attribute makes the document not well-formed.
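The quoting rules and the duplicate-attribute rule can be tried out with Python's xml.etree.ElementTree (an assumption; any conforming parser must behave the same way on these inputs):

```python
import xml.etree.ElementTree as ET

# Single quotes around a value that contains double quotes
single = ET.fromstring("""<gangster name='George "Shotgun" Ziegler'/>""")
# Double quotes around the value, with &quot; for the inner quotes
entity = ET.fromstring('<gangster name="George &quot;Shotgun&quot; Ziegler"/>')

print(single.get("name"))                        # George "Shotgun" Ziegler
print(single.get("name") == entity.get("name"))  # True

try:
    ET.fromstring('<bad att="1" att="2"></bad>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True  # duplicate attributes are a well-formedness error
print(duplicate_rejected)  # True
```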

XML Elements vs. Attributes

Take a look at these examples:

<person sex="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>
 

<person>
  <sex>female</sex>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

In the first example sex is an attribute. In the last, sex is an element. Both examples provide the same information.
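One practical difference is how a program reads the value. In this Python sketch (xml.etree.ElementTree, chosen for illustration), the attribute is read with get() and the child element with findtext():

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring(
    '<person sex="female"><firstname>Anna</firstname><lastname>Smith</lastname></person>'
)
as_element = ET.fromstring(
    "<person><sex>female</sex><firstname>Anna</firstname><lastname>Smith</lastname></person>"
)

print(as_attribute.get("sex"))     # female (attribute lookup)
print(as_element.findtext("sex"))  # female (text of the <sex> child element)
```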

There are no rules about when to use attributes or when to use elements. Attributes are handy in HTML. In XML my advice is to avoid them. Use elements instead.

Avoid XML Attributes?

Some of the problems with using attributes are:

  • attributes cannot contain multiple values (elements can)
  • attributes cannot contain tree structures (elements can)
  • attributes are not easily expandable (for future changes)

Attributes are difficult to read and maintain. Use elements for data. Use attributes for metadata, that is, information about the data rather than the data itself (for example, an ID).

In XSD, an attribute is always declared with a simple type.

Syntax: <xs:attribute name="xxx" type="yyy"/>

Ex: <xs:attribute name="lang" type="xs:string"/>

XML Namespaces

Namespaces are a mechanism used to avoid naming conflicts when elements or attributes from different sources or vocabularies are combined in a single XML document. They are essential when an XML document includes elements or attributes that have the same names but belong to different domains or purposes.

To use namespaces in an XML document, you need to declare them. This is typically done in the root element of the XML document using the xmlns attribute.

Ex: 

<root xmlns:ns="http://example.com">

    <ns:element1>Content1</ns:element1>

    <ns:element2>Content2</ns:element2>

</root>

In this example xmlns:ns attribute declares a namespace prefix ns associated with the namespace URI http://example.com.

Namespace Prefix: The prefix, in this case, ns, is used to qualify element and attribute names within the specified namespace. It helps differentiate elements and attributes with the same local name but different namespaces.

The xmlns attribute (and any attribute starting with xmlns:) is reserved for namespace declarations.

By using the namespace prefix, you explicitly indicate which namespace an element or attribute belongs to. In the example above, ns:element1 and ns:element2 are in the http://example.com namespace.

Default Namespace: You can also declare a default namespace using xmlns without a prefix. Elements without a namespace prefix are assumed to belong to the default namespace.

<root xmlns="http://example.com">
    <element1>Content1</element1>
    <element2>Content2</element2>
</root>

In this example, both element1 and element2 belong to the http://example.com namespace.
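Parsers resolve prefixes to the namespace URI, so the prefixed and default-namespace versions above name the same child elements. Python's xml.etree.ElementTree (used here as an example) reports each name as {URI}local-name. In the prefixed version the <root> element itself is in no namespace, because only its children carry the ns: prefix.

```python
import xml.etree.ElementTree as ET

prefixed = ET.fromstring(
    '<root xmlns:ns="http://example.com">'
    "<ns:element1>Content1</ns:element1><ns:element2>Content2</ns:element2></root>"
)
default = ET.fromstring(
    '<root xmlns="http://example.com">'
    "<element1>Content1</element1><element2>Content2</element2></root>"
)

# Names are reported as {namespace-URI}local-name
print([child.tag for child in prefixed])  # ['{http://example.com}element1', '{http://example.com}element2']
print([child.tag for child in default])   # same list: the prefix was only a shorthand
print(prefixed.tag, default.tag)          # root {http://example.com}root
```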

Default Namespace: a namespace declared without a prefix.

Target Namespace:

In XML, a target namespace is a way to uniquely identify and categorize XML elements and attributes within a document. Target namespaces are often associated with XML schema definitions (XSD) to define the structure and validation rules for XML documents.

Defining a Target Namespace:

To define a target namespace in an XML schema, you use the targetNamespace attribute on the schema's root element (<xsd:schema>). Its value is a Uniform Resource Identifier (URI), which serves as a unique identifier for that namespace. The xmlns attributes on the same element only declare prefixes, such as xsd for the XML Schema namespace itself.

Ex:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.com/mynamespace">
   <!-- Define elements and types within this namespace -->
</xsd:schema>

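In this example, the targetNamespace attribute associates the "http://example.com/mynamespace" URI with the XML schema. The global elements, attributes, and types defined in this schema belong to this namespace; whether locally declared elements and attributes are also namespace-qualified is controlled by the elementFormDefault and attributeFormDefault attributes.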

Using Elements from a Target Namespace:

When you define XML documents based on this schema, you reference elements from the target namespace by using the namespace prefix or declaring a default namespace for the entire document. You do this using the xmlns attribute in the root element of the XML document.

<myNamespaceRoot xmlns="http://example.com/mynamespace">
   <element1>Value 1</element1>
   <element2>Value 2</element2>
</myNamespaceRoot>

In this XML document, the xmlns attribute without a prefix specifies that elements within this document belong to the "http://example.com/mynamespace" namespace.

Avoiding Name Conflicts:

Target namespaces are essential for avoiding naming conflicts, especially when combining XML documents or elements from different sources. Different namespaces allow you to use the same element or attribute names without ambiguity, as long as they belong to different namespaces.

For example, two XML documents can both have an "element1" element, but if they are in different namespaces, they are treated as distinct:

<!-- Document 1 -->

<myNamespaceRoot xmlns="http://example.com/mynamespace">
   <element1>Value 1</element1>
</myNamespaceRoot>

<!-- Document 2 -->

<anotherNamespaceRoot xmlns="http://example.com/anothernamespace">
   <element1>Value 2</element1>
</anotherNamespaceRoot>

In this case, "element1" in Document 1 is not the same as "element1" in Document 2 because they belong to different namespaces.
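When querying namespaced documents, you match on the URI rather than on whatever prefix a document happened to use. A sketch with Python's xml.etree.ElementTree; the my and other prefixes in the lookup map are arbitrary local names chosen for illustration.

```python
import xml.etree.ElementTree as ET

doc1 = ET.fromstring(
    '<myNamespaceRoot xmlns="http://example.com/mynamespace">'
    "<element1>Value 1</element1></myNamespaceRoot>"
)
doc2 = ET.fromstring(
    '<anotherNamespaceRoot xmlns="http://example.com/anothernamespace">'
    "<element1>Value 2</element1></anotherNamespaceRoot>"
)

tag1 = doc1[0].tag  # '{http://example.com/mynamespace}element1'
tag2 = doc2[0].tag  # '{http://example.com/anothernamespace}element1'
print(tag1 == tag2)  # False: same local name, different namespaces

# Only the URIs must match the documents; the prefixes are local choices
ns = {"my": "http://example.com/mynamespace", "other": "http://example.com/anothernamespace"}
print(doc1.findtext("my:element1", namespaces=ns))     # Value 1
print(doc2.findtext("my:element1", namespaces=ns))     # None: not in that namespace
print(doc2.findtext("other:element1", namespaces=ns))  # Value 2
```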

Interoperability: Defining target namespaces and using namespaces in XML messages is essential for ensuring interoperability between different services in a SOA. It helps services understand the structure and meaning of XML data exchanged between them, even if they have different XML schemas.

By using target namespaces and XML namespaces effectively in SOA, you can achieve a high degree of flexibility and compatibility when integrating diverse services within your architecture.

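SOAP:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">   <!-- root element (SOAP 1.1 namespace) -->
    <soap:Header></soap:Header>   <!-- optional -->
    <soap:Body></soap:Body>       <!-- mandatory -->
</soap:Envelope>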
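A minimal sketch of building and checking a SOAP envelope with Python's xml.etree.ElementTree. It assumes the SOAP 1.1 envelope namespace; getPrice is a hypothetical payload element, and has_body is an illustrative helper, not part of any SOAP toolkit.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ET.register_namespace("soap", SOAP_NS)  # serialize this namespace with the "soap" prefix

envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")        # root element
ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")        # optional
body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")   # mandatory
ET.SubElement(body, "getPrice").text = "Learning XML"  # hypothetical payload

message = ET.tostring(envelope, encoding="unicode")
print(message)


def has_body(xml_text: str) -> bool:
    """Check that the root is a SOAP Envelope with a Body child."""
    root = ET.fromstring(xml_text)
    return (root.tag == f"{{{SOAP_NS}}}Envelope"
            and root.find(f"{{{SOAP_NS}}}Body") is not None)


print(has_body(message))  # True
```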

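PCDATA means parsed character data, i.e. text that will be parsed by the XML parser: entity references in it are expanded, and any markup in it is recognized.

CDATA means character data. Text inside a CDATA section (<![CDATA[ ... ]]>) is not parsed for markup, so characters such as < and & can appear in it literally.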
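The difference is easy to see with a parser. In this Python sketch (xml.etree.ElementTree, chosen for illustration), the CDATA section and the escaped PCDATA version produce the same text:

```python
import xml.etree.ElementTree as ET

# Inside CDATA, < > and & are taken literally
cdata_doc = "<code><![CDATA[if (a < b && c > d) { swap(a, b); }]]></code>"
elem = ET.fromstring(cdata_doc)
print(elem.text)  # if (a < b && c > d) { swap(a, b); }

# As ordinary PCDATA, the same characters must be written as entity references
escaped_doc = "<code>if (a &lt; b &amp;&amp; c &gt; d) { swap(a, b); }</code>"
print(ET.fromstring(escaped_doc).text == elem.text)  # True
```

ElementTree does not remember that the text came from a CDATA section: if the element is written back out, < and & are escaped as entity references instead.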

Whitespace in PCDATA

There is a special category of characters, called whitespace. This includes things like the space character, new lines (what you get when you hit the Enter key), and tabs. Whitespace is used to separate words, as well as to make text more readable.

In XML, however, no whitespace stripping takes place for PCDATA. This means that for the following XML element:
<tag>This is a paragraph.    It has a whole    bunch of    space.</tag>

the PCDATA is:

This is a paragraph.    It has a whole    bunch of    space.

Q) How do you identify a client partner link and an invoke partner link?

-> If you see myRole inside a partnerLink in the .bpel file, then it's a client partner link.

-> If you see partnerRole inside a partnerLink in the .bpel file, then it's an invoke partner link.

Client partner link: this initiates the BPEL process (myRole).

Invoke partner link: this is invoked by the BPEL process (partnerRole).
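The rule above can be applied mechanically to a .bpel file. This Python sketch assumes the WS-BPEL 2.0 executable-process namespace; the process, partner link names and roles in the sample are hypothetical. An asynchronous client partner link carries both myRole and partnerRole, so myRole is checked first.

```python
import xml.etree.ElementTree as ET

BPEL_NS = "http://docs.oasis-open.org/wsbpel/2.0/process/executable"

# Hypothetical fragment of a .bpel file
bpel_xml = f"""<process name="OrderProcess" xmlns="{BPEL_NS}">
  <partnerLinks>
    <partnerLink name="client" partnerLinkType="OrderPLT" myRole="OrderProvider"/>
    <partnerLink name="CreditService" partnerLinkType="CreditPLT" partnerRole="CreditProvider"/>
  </partnerLinks>
</process>"""


def classify_partner_links(xml_text: str) -> dict:
    """Map each partnerLink name to 'client' (has myRole) or 'invoke' (partnerRole only)."""
    root = ET.fromstring(xml_text)
    result = {}
    for link in root.iter(f"{{{BPEL_NS}}}partnerLink"):
        if link.get("myRole") is not None:
            result[link.get("name")] = "client"
        elif link.get("partnerRole") is not None:
            result[link.get("name")] = "invoke"
    return result


print(classify_partner_links(bpel_xml))  # {'client': 'client', 'CreditService': 'invoke'}
```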

 

 
