HTML
Adapted from Wikipedia
Hypertext Markup Language (HTML) is the standard markup language for documents designed to be shown in a web browser. It tells the browser what the content is and how it should be organized. HTML works together with other technologies like Cascading Style Sheets (CSS) for design and JavaScript for adding interactivity.
When you open a website, your web browser reads an HTML document. This document might come from a web server or be saved on your device. The browser then turns the HTML into a page you can see, with text, pictures, and more.
HTML uses special words called tags, written with angle brackets, to describe the structure of a page. For example, tags can make text into headings, paragraphs, or lists. They can also add images and links to other pages. You won’t see these tags when you look at the page, but they help the browser understand what to show.
With HTML, you can also add programs written in JavaScript to make pages do things like play games or update information automatically. A newer version called HTML5 lets browsers show video and audio directly on pages.
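As a minimal sketch of these two ideas, the page below embeds a short script and an HTML5 video element. The file name clip.mp4 and the id value "clock" are placeholders made up for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Script and video sketch</title>
</head>
<body>
  <!-- The video element plays media directly in the page; clip.mp4 is a hypothetical file. -->
  <video src="clip.mp4" controls width="320">
    Your browser does not support the video element.
  </video>

  <p id="clock">The time will appear here.</p>

  <script>
    // Update the paragraph once per second with the current time.
    const clock = document.getElementById("clock");
    setInterval(() => {
      clock.textContent = new Date().toLocaleTimeString();
    }, 1000);
  </script>
</body>
</html>
```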
History
Development
In 1980, physicist Tim Berners-Lee, working at CERN, created a system called ENQUIRE to help researchers share documents. In 1989, he wrote a memo suggesting a way to connect documents over the Internet using hypertext. He developed HTML and created the first web browser and server in 1990. That same year, he and engineer Robert Cailliau asked for funding for the project, but it wasn’t officially supported by CERN.
The first public description of HTML was a document called “HTML Tags,” first mentioned on the Internet by Berners-Lee in late 1991. It described 18 basic elements of HTML. Except for the hyperlink tag, these were strongly influenced by SGMLguid, an in-house SGML-based documentation format used at CERN. Eleven of these elements still exist in HTML 4.
HTML is a markup language that web browsers use to show text, pictures, and other content on web pages. Browsers have default settings for HTML elements, but designers can change how things look using CSS. Early text formatting ideas came from old commands used to arrange documents for printing. But HTML focuses on the structure of content, with CSS handling the style.
Berners-Lee thought of HTML as a type of SGML, a standard for document formats. In 1993, the Internet Engineering Task Force published the first HTML proposal. It included ideas from a browser called NCSA Mosaic, like showing pictures directly in web pages. Another proposal by Dave Raggett in 1993 suggested adding features such as tables and forms.
After these early proposals expired in early 1994, the Internet Engineering Task Force formed an HTML Working Group. In 1995, the group finished “HTML 2.0,” the first official HTML standard.
Development under the IETF slowed after that, and the World Wide Web Consortium (W3C) took over in 1996. HTML 4.01 came out in late 1999, with corrections published through 2001, and in 2000 HTML became an international standard (ISO/IEC 15445:2000). In 2004, work began on HTML5, which was finished and officially released on 28 October 2014.
HTML version timeline
HTML 2
24 November 1995
HTML 2.0 was released as an official document. Additional updates added features like uploading files, tables, image maps, and international characters.
HTML 3
14 January 1997
HTML 3.2 was released. This was the first version developed entirely by the World Wide Web Consortium. It adopted many features that popular web browsers had already added, but left out some browser-specific effects, such as the blink and marquee elements, and dropped math formulas. A separate markup for math formulas, MathML, was standardized about a year later.
HTML 4
18 December 1997
HTML 4.0 was released. It came in three versions: Strict, Transitional, and Frameset. It included many features that web browsers already used but tried to encourage using stylesheets instead of built-in designs. HTML 4.01 was released in 1999 with small updates, and became an international standard in 2000.
After HTML 4.01, no new HTML versions were released for many years because work shifted to XHTML.
HTML 5
28 October 2014
HTML5 was released as an official standard.
1 November 2016
HTML 5.1 was released as an official standard.
14 December 2017
HTML 5.2 was released as an official standard.
HTML draft version timeline
October 1991
The document “HTML Tags” was first mentioned publicly.
June 1992
The first informal draft of HTML was created.
November 1992
HTML DTD 1.1 was released, the first to include a version number.
June 1993
The Hypertext Markup Language was published as an early proposal.
November 1993
A competing proposal called HTML+ was published.
November 1994
Work began on what would become HTML 2.0.
April 1995
HTML 3.0 was proposed but was not finished.
January 2008
HTML5 was introduced as a work in progress.
2011 HTML5 – Last Call
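In February 2011, the W3C set clear milestones for HTML5. In May 2011, HTML5 advanced to Last Call, an invitation for people inside and outside the W3C to confirm that the specification was technically sound.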
2012 HTML5 – Candidate Recommendation
In December 2012, the W3C designated HTML5 a Candidate Recommendation, the stage at which a specification is tested in real implementations.
2014 HTML5 – Proposed Recommendation and Recommendation
In September 2014, the W3C moved HTML5 to Proposed Recommendation, the last stage before becoming a standard.
On 28 October 2014, HTML5 was released as an official standard.
XHTML versions
Main article: XHTML
XHTML is a version of HTML using XML. It is now called the XML syntax for HTML and is no longer a separate standard.
- XHTML 1.0 was released on 26 January 2000 and updated on 1 August 2002. Like HTML 4.0, it came in three variations, but written in XML.
- XHTML 1.1 was released on 31 May 2001. It was based on XHTML 1.0 Strict and could be customized.
- XHTML 2.0 was being developed but work stopped in 2009 in favor of HTML5 and XHTML5. XHTML 2.0 was not compatible with earlier XHTML versions.
Transition of HTML publication to WHATWG
See also: HTML5 § W3C and WHATWG conflict
In May 2019, the World Wide Web Consortium announced that the WHATWG would be the only group publishing HTML and DOM standards. The two groups had been publishing different versions since 2012. The WHATWG’s “Living Standard” was widely used, while the W3C released updates to match it.
Markup
HTML uses special words called tags to give structure to web pages. These tags often come in pairs, like <h1> and </h1>, where the first tag starts something and the second tag ends it. Some tags, like <img>, don’t need a matching end tag.
Here’s a simple example:
<title>This is a title</title>
<div>
Hello world!
</div>
The text between <title> and </title> shows what appears at the top of a browser tab. The text between <div> and </div> is part of the page that you can see.
HTML documents use tags to organize content. For example, headings use tags from <h1> to <h6>, with <h1> being the most important heading. Paragraphs are made with <p> tags, and lines can be broken with the <br> tag.
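A short sketch of heading, paragraph, and line-break tags working together. The headings and sentences are made up for illustration.

```html
<!-- Headings run from h1 (most important) to h6 (least important). -->
<h1>My pets</h1>
<h2>Cats</h2>
<!-- A p element holds one paragraph; br forces a line break inside it. -->
<p>Cats sleep a lot.<br>
They also like to climb.</p>
<h2>Dogs</h2>
<p>Dogs enjoy long walks.</p>
```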
Links are created using the <a> tag, with the href attribute pointing to the link’s address, like this:
<a href="https://www.wikipedia.org/">A link to Wikipedia!</a>
Semantic HTML
Main article: Semantic HTML
Semantic HTML is a way to write web pages that focuses on what the information means, rather than just how it looks. Since the late 1990s, people have been encouraged to use HTML that shows the meaning of the content instead of just its appearance. This helps computers understand web pages better.
When search engines look through the web, they need to understand what the pages are about. Good use of semantic HTML helps these search tools work better. It also makes web pages easier to use for everyone, including people who rely on assistive tools such as screen readers.
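The difference can be sketched by writing the same content twice. The class names, the text, and the date below are invented for the example.

```html
<!-- Presentation-only markup: the tags say nothing about what each part is. -->
<div class="big-bold">Weather today</div>
<div class="small">Posted on 1 May</div>
<div>Sunny with light wind.</div>

<!-- Semantic markup: each element names the role of its content. -->
<article>
  <h1>Weather today</h1>
  <p><time datetime="2024-05-01">Posted on 1 May</time></p>
  <p>Sunny with light wind.</p>
</article>
```

In the second version, a search engine or screen reader can tell which text is the heading and which is the date without guessing from the styling.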
Delivery
HTML documents can be sent the same way as any other computer file. Most often, they are sent using HTTP from a web server or through email.
The World Wide Web is made up of HTML documents that travel from web servers to web browsers using HTTP. HTTP is also used for sending images, sounds, and other content. Along with each document, extra information (metadata) is sent to help the browser understand how to handle it. This usually includes the MIME type (for example, text/html) and the character encoding.
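This type and character information normally travels in the HTTP Content-Type header, outside the document itself. As a sketch, a page can also declare its character encoding on its own with a meta element. The header shown in the comment is a typical value, not one taken from a particular server.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- In-document declaration of the character encoding. -->
  <meta charset="utf-8">
  <!-- A server would typically send this header alongside the file:
       Content-Type: text/html; charset=utf-8 -->
  <title>Encoding sketch</title>
</head>
<body>
  <p>Café, naïve, 日本語</p>
</body>
</html>
```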
Some web browsers may interpret an HTML document differently depending on the MIME type sent with it. For example, a document sent with the XHTML MIME type is expected to be well-formed XML, and syntax errors may stop the browser from displaying it.
Most email programs let you use a subset of HTML to make emails look better, like adding colors or pictures. However, HTML in emails can cause problems for people who are blind or have low vision, and it also makes messages larger than plain text.
The most common filename extension for HTML files is .html. The shorter .htm is also used.
HTML4 variations
HTML gained acceptance quickly after it was introduced, but no clear standards existed in its early years. Although its creators saw HTML as a language for describing meaning rather than appearance, practical use pushed many presentational elements and attributes into the language, driven largely by the different browser makers. Later standards aim to bring order to HTML’s sometimes chaotic development and to give a sound basis for documents that are both meaningful and well presented. To return HTML to its role as a semantic language, the W3C developed style languages such as CSS and XSL to handle presentation. Alongside this, the HTML specification has gradually cut back its presentational elements.
Variations of HTML differ along two axes: SGML-based HTML versus XML-based HTML (called XHTML) on one axis, and strict versus transitional (loose) versus frameset on the other.
SGML-based versus XML-based HTML
One key difference between HTML specifications is whether they are based on SGML or on XML. The XML-based version is usually called XHTML to set it apart. The W3C meant XHTML 1.0 to be identical to HTML 4.01, except where the limits of XML, compared with the more complex SGML, require workarounds. Because XHTML and HTML are so closely related, they are sometimes written together as (X)HTML or X(HTML).
Like HTML 4.01, XHTML 1.0 has three types: strict, transitional, and frameset.
Apart from the different opening declarations of a document, HTML 4.01 and XHTML 1.0 differ mainly in syntax. HTML allows shortcuts that XHTML does not, such as elements with optional opening or closing tags, and empty elements that must not have an end tag. XHTML requires every element to have both an opening and a closing tag. XHTML also adds a shortcut of its own: an element can be opened and closed in a single tag by putting a slash before the end of the tag, like <br/>. This shorthand might confuse older software. The fix is to put a space before the slash, like <br />.
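The two syntaxes can be compared side by side. This is a sketch with invented list items and text; both fragments describe the same content.

```html
<!-- HTML 4.01 style: the closing </li> and </p> tags may be omitted,
     and the empty br element has no end tag. -->
<ul>
  <li>First item
  <li>Second item
</ul>
<p>Line one<br>Line two

<!-- XHTML 1.0 style: every element is closed explicitly,
     and an empty element opens and closes in a single tag. -->
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
<p>Line one<br />Line two</p>
```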
To change a valid XHTML 1.0 document into a valid HTML 4.01 document, follow these steps:
- Give the language of an element with a lang attribute instead of XHTML’s xml:lang attribute.
- Remove the XML namespace (xmlns=URI). HTML has no support for namespaces.
- Change the document type declaration from XHTML 1.0 to HTML 4.01.
- If present, remove the XML declaration. It usually looks like <?xml version="1.0" encoding="utf-8"?>.
- Make sure the document’s MIME type is set to text/html. This comes from the Content-Type header sent by the server.
- Change the XML empty-element syntax to the HTML style (<br/> to <br>).
These are the main changes needed to turn an XHTML 1.0 document into HTML 4.01. Going the other way, from HTML to XHTML, also requires adding any opening or closing tags that were left out. In either language, it may be simplest to always include the optional tags rather than remember which ones can be omitted.
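The conversion can be sketched on a tiny document. The page title and text are invented; the two document type declarations are the standard ones for XHTML 1.0 Strict and HTML 4.01 Strict. Before conversion:

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
  <title>Example page</title>
</head>
<body>
  <p>First line<br />second line</p>
</body>
</html>
```

After conversion, the XML declaration and the namespace are gone, xml:lang has become lang, the document type names HTML 4.01, and the empty br element uses the HTML style:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
  <title>Example page</title>
</head>
<body>
  <p>First line<br>second line</p>
</body>
</html>
```

The MIME type step does not appear in either file, because it is set by the server rather than in the markup.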
A well-formed XHTML document follows all of XML’s syntax rules. A valid document also follows XHTML’s content specification, which describes the document’s structure.
The W3C recommends several conventions that make it easier to move between HTML and XHTML. These steps apply to XHTML 1.0 documents only:
- Include both xml:lang and lang attributes on any element that assigns a language.
- Use the empty-element syntax only for elements that are defined as empty in HTML.
- Include a space before the closing slash in empty-element tags: <br /> instead of <br/>.
- Include full closing tags for elements that can hold content but are left empty: <div></div>, not <div />.
- Leave out the XML declaration.
By carefully following the W3C’s compatibility guidelines, a document can be read by a web browser in the same way whether it is treated as HTML or XHTML. The W3C allows XHTML 1.0 documents made this way to be served either as HTML (with the text/html MIME type) or as XHTML (with the application/xhtml+xml or application/xml MIME type). When served as XHTML, browsers should use an XML parser, which follows the XML rules strictly.
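A sketch of an XHTML 1.0 document written to the W3C compatibility guidelines, so that it can be served with either kind of MIME type. The title and text are invented for the example.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- No XML declaration at the top of the file. -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>Compatible page</title>
</head>
<body>
  <!-- Empty element written with a space before the slash. -->
  <p>First line<br />second line</p>
  <!-- An element that may hold content gets a full closing tag, even when empty. -->
  <div></div>
</body>
</html>
```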
Transitional versus strict
HTML 4 defined three versions of the language: Strict, Transitional (once called Loose), and Frameset. The Strict version is meant for new documents and is considered best practice. The Transitional and Frameset versions were made to ease the move of older documents to HTML 4. They allow presentational markup, which the Strict version leaves out; Strict documents are encouraged to use cascading style sheets for presentation instead. Because XHTML 1 only defines an XML syntax for the language defined by HTML 4, the same differences apply to XHTML 1.
The Transitional version allows these parts, which are not in the Strict version:
- A looser content model
  - Inline elements and plain text can go directly in body, blockquote, form, noscript and noframes
- Presentational elements
  - underline (u) (deprecated; can be confused with a link)
  - strike-through (s)
  - center (deprecated; use CSS instead)
  - font (deprecated; use CSS instead)
  - basefont (deprecated; use CSS instead)
- Presentational attributes
  - background and bgcolor (both deprecated; use CSS instead) on the body element (a required element according to the W3C)
  - align (deprecated; use CSS instead) on div, form, paragraph (p) and heading (h1...h6) elements
  - align, noshade, size and width (all deprecated; use CSS instead) on the hr element
  - align (deprecated; use CSS instead), border, vspace and hspace on img and object elements (note: among the major browsers, the object element is only supported in Internet Explorer)
  - align (deprecated; use CSS instead) on legend and caption elements
  - align and bgcolor (both deprecated; use CSS instead) on the table element
  - nowrap (obsolete), bgcolor (deprecated; use CSS instead), width and height on td and th elements
  - bgcolor (deprecated; use CSS instead) on the tr element
  - clear (obsolete) on the br element
  - compact on dl, dir and menu elements
  - type, compact and start (all deprecated; use CSS instead) on ol and ul elements
  - type and value on the li element
  - width on the pre element
- Additional elements in the Transitional specification
  - menu list (deprecated; no substitute, but an unordered list is recommended)
  - dir list (deprecated; no substitute, but an unordered list is recommended)
  - isindex (deprecated; it needs server-side support and is usually added on the server, and form and input can be used instead)
  - applet (deprecated; use object instead)
- The language attribute (obsolete) on the script element (redundant with the type attribute)
- Frame-related parts
  - iframe
  - noframes
  - target (deprecated on map, link and form elements) on a, image-map (map), link, form and base elements
The Frameset version includes everything in the Transitional version, plus the frameset element (used instead of body) and the frame element.
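The contrast between Transitional and Strict markup can be sketched with one small page written both ways. The colors, text, and the class name "note" are invented for the example. The Transitional version writes the look into the HTML itself:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head><title>Transitional sketch</title></head>
<body bgcolor="#ffffcc">
  <center><font color="red" size="5">Welcome</font></center>
  <p align="right">Right-aligned text</p>
</body>
</html>
```

A Strict version with roughly the same look keeps only structure in the HTML and moves the styling into CSS:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Strict sketch</title>
  <style type="text/css">
    body { background-color: #ffffcc; }
    h1 { color: red; text-align: center; }
    p.note { text-align: right; }
  </style>
</head>
<body>
  <h1>Welcome</h1>
  <p class="note">Right-aligned text</p>
</body>
</html>
```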
Frameset versus transitional
Besides the Transitional differences above, the frameset specifications (whether XHTML 1.0 or HTML 4.01) define a different content model: frameset replaces body and contains either frame elements or, optionally, a noframes element with a body.
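A minimal sketch of an HTML 4.01 Frameset document. The file names menu.html and content.html are hypothetical pages assumed to exist next to this one.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
  "http://www.w3.org/TR/html4/frameset.dtd">
<html>
<head><title>Frameset sketch</title></head>
<!-- frameset takes the place of body and splits the window into two columns. -->
<frameset cols="25%,75%">
  <frame src="menu.html" name="menu">
  <frame src="content.html" name="content">
  <!-- Shown only by browsers that cannot display frames. -->
  <noframes>
    <body>
      <p>This page uses frames. <a href="content.html">Open the content directly.</a></p>
    </body>
  </noframes>
</frameset>
</html>
```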
Summary of specification versions
As this list shows, the loose versions of the specification are kept for legacy support. Contrary to a common belief, however, moving to XHTML does not mean losing this support. The X in XML stands for extensible, and the W3C modularized the whole specification and opened it to independent extensions. This modularization is the main achievement of the move from XHTML 1.0 to XHTML 1.1. The Strict version of HTML is provided in XHTML 1.1 through a set of modular extensions to the base XHTML 1.1 specification. Likewise, anyone looking for the loose (transitional) or frameset specifications will find similar extended support in XHTML 1.1, much of it in the legacy and frame modules. Modularization also lets separate features develop on their own timetables. For example, XHTML 1.1 allows a quicker move to XML-based standards such as MathML (a presentational and semantic math language based on XML) and XForms, an advanced web-form technology intended to replace HTML forms.
In short, the HTML 4 specification mainly gathered the various HTML implementations into a single, clearly written specification based on SGML. XHTML 1.0 ported this specification, as is, to XML. XHTML 1.1 then used the extensible nature of XML to modularize the whole specification. XHTML 2.0 was meant to be the first step in adding new features to the specification through a standards-body-based approach.
WHATWG HTML versus HTML5
The HTML Living Standard, created by WHATWG, is the main version used today. W3C HTML5 is no longer separate from WHATWG.
WYSIWYG editors
Some editors, called WYSIWYG (what you see is what you get), let users create web pages using a graphical user interface, much like word processors. These tools show how the page will look instead of showing the code, so you don’t need to know much about HTML.
However, these editors can sometimes create messy or unnecessary code. Some developers prefer a different approach called WYSIWYM (what you see is what you mean), which focuses more on the meaning of the content rather than just how it looks.
This article is a child-friendly adaptation of the Wikipedia article on HTML, available under CC BY-SA 4.0.