XML

The logo of XML, a markup language created by the W3C for encoding data

Extensible Markup Language (XML) is a special way to write information so computers and people can read and use it easily. It has rules for how to store and share data, making sure everyone understands it the same way. These rules were created in 1998 by the World Wide Web Consortium, and anyone can use them for free.

XML was made to be simple, useful, and work well on the Internet. It can handle many different languages thanks to support for Unicode. Even though it was made for writing documents, XML is also used to represent all sorts of information that computers need to work with, like in web services.

There are many tools and ways for programmers to work with XML data, and several systems to help define new languages based on XML. This makes XML a flexible and powerful tool for handling information in many different situations.

Overview

XML is a special way to share information between computers and programs. It works like a common language everyone can understand.

XML uses tags to organize and label information. These tags make it easy for both people and computers to read. When an XML file follows the basic rules, it is called "well-formed". Following extra rules in a guide called a schema makes it even more reliable.

Applications

XML is used to share information on the Internet. Many document formats, like RSS, Atom, Office Open XML, and XHTML, use XML. It helps in communication methods such as SOAP and XMPP, and is part of the Asynchronous JavaScript and XML (AJAX) technique.

XML is the basis for many industry standards, including Health Level 7 and OpenTravel Alliance. It is important in publishing and is used in scientific fields, like weather information through IWXXM standards.

Key terminology

This section explains some important ideas about XML based on its Specification. It covers the main parts you will see often.

Character

An XML document is made up of characters. In XML 1.1, almost any Unicode character can be used, except for the Null character. XML 1.0 is stricter and also leaves out most control characters.

Processor and application

A processor looks at the markup in an XML document and sends organized information to an application. The rules for what a processor must do are set by the specification, but the application itself is not covered by these rules. People often call the processor an XML parser.

Markup and content

The characters in an XML document are split into markup and content. You can tell them apart using simple syntactic rules. Markup usually starts with `<` and ends with `>`, or starts with `&` and ends with `;`. Anything that is not markup is content. Inside a CDATA section, the delimiters `<![CDATA[` and `]]>` are markup, but the text between them is content. Whitespace before or after the outermost element is also considered markup.
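As a small illustration of the split between markup and content, the sketch below uses the `xml.etree.ElementTree` module from Python's standard library to read a CDATA section. The `note` element and its text are made up for this example.

```python
import xml.etree.ElementTree as ET

# A CDATA section lets "<" and "&" appear literally in the content.
# The <note> element is a hypothetical example, not part of any standard.
document = "<note><![CDATA[if a < b && b > 0]]></note>"

root = ET.fromstring(document)

# The CDATA delimiters are markup, so the parser drops them and
# passes only the text between them on to the application.
content = root.text
print(content)
```

The application never sees the CDATA delimiters, only the text they wrap.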

Tag

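A tag is a type of markup that starts with `<` and ends with `>`. There are three kinds of tags:

  • start-tag, such as `<section>`
  • end-tag, such as `</section>`
  • empty-element tag, such as `<line-break />`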

Element

An element is a part of the document that either starts with a start-tag and ends with a matching end-tag, or it can be an empty-element tag. The text between the start-tag and end-tag, if there is any, is the element's content. This content can include more markup, even other elements, which are called child elements. For example, `<greeting>Hello, world!</greeting>` or `<line-break />`.
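Here is a minimal sketch of how a program might look at elements and their children, using Python's standard `xml.etree.ElementTree` module. The wrapping `message` element is an assumption added so that the document has a single outer element.

```python
import xml.etree.ElementTree as ET

# <message> is a hypothetical wrapper holding two child elements.
document = "<message><greeting>Hello, world!</greeting><line-break /></message>"
root = ET.fromstring(document)

# Iterating over an element yields its child elements in document order.
children = list(root)
child_names = [child.tag for child in children]

greeting, line_break = children
greeting_text = greeting.text      # the content between the start-tag and end-tag
line_break_text = line_break.text  # an empty-element tag has no content

print(child_names, greeting_text, line_break_text)
```

In this module an element with no content reports its text as `None`.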

Attribute

An attribute is a small piece of information inside a start-tag or empty-element tag. It has a name and a value. For example, in `<img src="madonna.jpg" alt="Madonna" />`, "src" and "alt" are attribute names, and "madonna.jpg" and "Madonna" are their values. Another example is `<step number="3">Connect A to B.</step>`, where "number" is the attribute name and "3" is its value. Each attribute can only have one value and can appear only once for each element. If you need to list many values, you have to put them together in a single value, for example separated by commas or spaces. For example, `<div class="inner greeting-box">Welcome!</div>` shows the attribute "class" with the value "inner greeting-box", which also tells us about the two CSS classes "inner" and "greeting-box".
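The attribute rules can be tried out with a short sketch using Python's standard `xml.etree.ElementTree` module. The `step` element name is an assumption chosen for this example; splitting the "class" value on spaces is something the application does, not the XML parser.

```python
import xml.etree.ElementTree as ET

# Reading a single attribute value by name.
step = ET.fromstring('<step number="3">Connect A to B.</step>')
number = step.get("number")

# One attribute, one value: the application splits the list itself.
div = ET.fromstring('<div class="inner greeting-box">Welcome!</div>')
classes = div.get("class").split()

# Repeating an attribute on the same element breaks the rules,
# so the parser refuses the document.
try:
    ET.fromstring('<step number="3" number="4" />')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False

print(number, classes, duplicate_accepted)
```

Attribute values always arrive as text, so `number` here is the string "3", not a number.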

XML declaration

XML documents can start with an XML declaration that gives some details about the document, such as the XML version and the character encoding. An example is `<?xml version="1.0" encoding="UTF-8"?>`.
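As a rough sketch, Python's standard `xml.etree.ElementTree` module can write a document that begins with an XML declaration. The `greeting` element is made up for this example, and the `xml_declaration` argument assumes Python 3.8 or newer.

```python
import xml.etree.ElementTree as ET

# Build a tiny document in memory (the element name is hypothetical).
root = ET.Element("greeting")
root.text = "Hello, world!"

# Ask the serializer to put an XML declaration at the top of the output.
data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
first_line = data.decode("utf-8").splitlines()[0]
print(first_line)

# A parser reads the declaration and then the document itself.
parsed_text = ET.fromstring(data).text
```

The exact quoting and letter case in the declaration can differ between tools, which is fine because parsers accept either quote style.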

Characters and escaping

XML documents use characters from the Unicode set. Most characters can be used, but a few special ones need special handling.

XML has ways to show characters that can't be used directly. For example, `&amp;` stands for the "&" symbol, and `&lt;` and `&gt;` stand for "<" and ">". You can also use a numeric code, like `&#20013;`, which stands for the Chinese character 中. This helps when a keyboard can't type a needed character directly.
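Escaping can be sketched with the `escape` and `unescape` helpers from Python's standard `xml.sax.saxutils` module. The sample text and the `c` element are invented for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Text containing characters that would otherwise look like markup.
raw = "Fish & chips < 5"

escaped = escape(raw)          # replaces "&" and "<" with escaped forms
round_trip = unescape(escaped) # turns the escaped forms back into characters

# A numeric character reference is resolved by the parser itself.
character = ET.fromstring("<c>&#20013;</c>").text

print(escaped)
print(round_trip)
print(character)
```

Escaping before writing and letting the parser undo it when reading keeps the original text unchanged.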

Syntactical correctness and error-handling

Main article: Well-formed document

XML documents need to follow certain rules to work properly. These rules include using the right characters and having one main element that contains everything else. If a document does not follow these rules, the special software that reads XML will stop and tell you there is a problem.

Unlike some other formats, XML does not try to guess what you meant if you make a mistake. Instead, it stops and shows the error immediately. This strict rule helps keep documents clear and easy to use together.
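The strict error handling described above can be seen in a short sketch with Python's standard `xml.etree.ElementTree` module. The helper name `is_well_formed` and the sample documents are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The parser stops at the first problem instead of guessing.
        return False
    return True


good = "<list><item>one</item><item>two</item></list>"
unclosed = "<list><item>one</list>"  # <item> is never closed
two_roots = "<a/><b/>"               # more than one main element

print(is_well_formed(good))
print(is_well_formed(unclosed))
print(is_well_formed(two_roots))
```

Only the first document passes; the other two break the rules about matching tags and having one main element.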

Schemas and validation

An XML document can be checked to see if it follows certain rules. This check is called validation, and a document that passes it is called valid.

The document can refer to something called a Document Type Definition, or DTD. The DTD lists the rules for what elements and attributes the document can have.

There are different ways to create these rules, called schemas. The oldest way is the DTD. It is simple but has some limits. A newer way is XML Schema, or XSD. XSD can describe documents more powerfully and uses a format that is easy for computers to process. There is also RELAX NG, which offers a simpler set of rules. Another method is Schematron. Schematron lets you make specific checks on the document using special commands.

Related specifications

Many tools and rules work together with XML. When people talk about XML, they often include some of these tools too.

  • XML namespaces help keep different parts of a document separate.
  • XML Base helps decide where to find files mentioned in a document.
  • XML Information Set describes what pieces make up an XML document.
  • XSL (Extensible Stylesheet Language) has three parts:
    • XSLT changes XML documents into other formats.
    • XSL-FO helps make XML documents look nice, like in PDFs.
    • XPath finds parts inside XML documents.
  • XQuery is a way to search and work with XML, especially in XML databases.
  • XML Signature creates secure marks to show XML content hasn’t been changed.
  • XML Encryption keeps XML content private.
  • XML model links XML documents to rules about how they should be set up.

Some other tools meant to work with XML didn’t become very popular, like XInclude, XLink, and XPointer.
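Two of the specifications in the list above, XML namespaces and XPath, can be tried with Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath. The namespace address `http://example.com/books`, the `bk` prefix, and the element names are all invented for this sketch.

```python
import xml.etree.ElementTree as ET

# A hypothetical document whose book elements live in their own namespace.
document = """
<library xmlns:bk="http://example.com/books">
  <bk:book year="1998"><bk:title>XML Basics</bk:title></bk:book>
  <bk:book year="2004"><bk:title>More XML</bk:title></bk:book>
</library>
"""

root = ET.fromstring(document)

# Map the prefix used in the search paths to the namespace name.
namespaces = {"bk": "http://example.com/books"}

# A path expression finds every title inside every book.
titles = [title.text for title in root.findall("bk:book/bk:title", namespaces)]

# A predicate in square brackets filters by attribute value.
newer_books = root.findall("bk:book[@year='2004']", namespaces)

print(titles)
print(len(newer_books))
```

The prefix is only a shorthand; what matters to the parser is the namespace name it stands for.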

Programming interfaces

XML was created to help computers work with documents more easily. While the rules of XML don’t say how to do this, many tools have been made to help.

These tools are in a few main groups:

  • Stream-oriented tools like SAX and StAX read the document step by step, using less memory.
  • Tree-traversal tools like DOM load the whole document into memory. This uses more space but can be simpler to use.
  • XML data binding links XML documents to programming objects automatically.
  • Declarative tools such as XSLT and XQuery help change and search XML documents.
  • Extensions to languages like LINQ and Scala add XML support right into the code.

Some tools, like SAX, work quickly but can be harder to use for finding random pieces of information. Others, like DOM, are slower because they need more memory but can make it easier to find specific parts of the document.
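The difference between the stream-oriented and tree-based styles can be sketched with Python's standard `xml.sax` and `xml.dom.minidom` modules. The `ItemCounter` class and the sample document are assumptions made for this example.

```python
import xml.sax
from xml.dom import minidom

DOCUMENT = "<list><item>one</item><item>two</item><item>three</item></list>"


class ItemCounter(xml.sax.ContentHandler):
    """Counts <item> start-tags as the parser streams past them."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1


# SAX: the document is read step by step and never held in memory as a whole.
counter = ItemCounter()
xml.sax.parseString(DOCUMENT.encode("utf-8"), counter)

# DOM: the whole document is loaded, so any part can be reached directly.
dom = minidom.parseString(DOCUMENT)
items = dom.getElementsByTagName("item")
second_item_text = items[1].firstChild.data

print(counter.count, second_item_text)
```

With SAX the program has to keep track of what it has seen so far, while with DOM it can jump straight to the second item.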

History

XML is a way to organize information. It is based on a bigger system called SGML. In the late 1980s, people who worked with digital media saw that SGML could be useful. When the Internet started to grow, some of these people thought SGML could help solve problems on the Web.

In 1995, Dan Connolly added SGML to the work of the World Wide Web Consortium (W3C). Work on XML began in mid-1996 when Jon Bosak, an engineer at Sun Microsystems, created a plan and brought together others to help. A small group of eleven people did most of the work, with support from about 150 others in an Interest Group. They talked and made decisions mostly through emails and phone meetings.

The main ideas for XML were decided between August and November 1996, and the first draft of the XML specification was published then. More work continued through 1997, and XML 1.0 became an official W3C standard on February 10, 1998.

This article is a child-friendly adaptation of the Wikipedia article on XML, available under CC BY-SA 4.0.