This version: http://www.w3.org/TR/2002/REC-xhtml1-20020801
This specification defines the Second Edition of XHTML 1.0, a
reformulation of HTML 4 as an XML 1.0 application, and three
DTDs (Document Type Definitions) corresponding to the ones defined by HTML 4.
This document has been produced as part of the W3C HTML Activity.
This section is informative.
XHTML is a family of current and future document types and modules that reproduce, subset, and extend HTML 4 [HTML4].
The XHTML family is the next step in the evolution of the Internet. By migrating to XHTML today, content developers can enter the XML world with all of its attendant benefits, while still remaining confident in their content's backward and future compatibility.
HTML 4 is an SGML (Standard Generalized Markup Language) application conforming to International Standard ISO 8879, and is widely regarded as the standard publishing language of the World Wide Web.
SGML is a language for describing markup languages, particularly those
used in electronic document exchange. HTML is an example of a language
defined in SGML.
SGML has been around since the mid-1980s and has remained quite stable.
HTML, as originally conceived, was to be a language for the exchange of
scientific and other technical documents, suitable for use by non-document
specialists.
In a remarkably short space of time, HTML became wildly popular and
rapidly outgrew its original purpose. Since HTML's inception, there has
been rapid invention of new elements for use within HTML (as a standard) and
for adapting HTML to vertical, highly specialized, markets. This plethora of
new elements has led to interoperability problems for documents across
different platforms.
XML is the shorthand name for Extensible Markup Language.
XML was conceived as a means of regaining the power and flexibility of SGML without most of its complexity. Although a restricted form of SGML, XML nonetheless preserves most of SGML's power and richness, and yet still retains all of SGML's commonly used features.
While retaining these beneficial features, XML removes many of the more complex features of SGML that make the authoring and design of suitable software both difficult and costly.
Some of the benefits of migrating to XHTML in general are:
Attribute
An attribute is a parameter to an element declared in the DTD. An attribute's type and value range, including a possible default value, are defined in the DTD.
DTD
A DTD (Document Type Definition) is a collection of XML markup declarations that, as a collection, defines the legal structure, elements, and attributes that are available for use in a document that complies with the DTD.
Document
A document is a stream of data that, after being combined with any other streams it references, is structured such that it holds information contained within elements that are organized as defined in the associated DTD. See Document Conformance for more information.
Element
An element is a document structuring unit declared in the DTD. The element's content model is defined in the DTD, and additional semantics may be defined in the prose description of the element.
Facilities are elements, attributes, and the semantics associated with those elements and attributes.
A document is well-formed when it is structured according to the rules defined in Section 2.1 of the XML 1.0 Recommendation [XML].
A Strictly Conforming XHTML Document is an XML document that requires only the facilities described as mandatory in this specification. Such a document must meet all of the following criteria:
The root element of the document must be html.
The root element of the document must contain an xmlns declaration for
the XHTML namespace [XMLNS].
The namespace for XHTML is defined to be
http://www.w3.org/1999/xhtml. An example root element
might look like:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
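As an informal illustration, two of the criteria above (the root element name and the XHTML namespace) can be checked with Python's standard `xml.etree.ElementTree` module. This is a sketch only: it does not validate against any DTD, and the names `SAMPLE` and `root_is_xhtml` are hypothetical, introduced here for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample document, written for this illustration.
SAMPLE = """<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head><title>Example</title></head>
  <body><p>Hello.</p></body>
</html>"""


def root_is_xhtml(source: str) -> bool:
    """Return True if the root element is html in the XHTML namespace.

    The parser does not fetch the external DTD, so this checks
    well-formedness plus the root element only, not DTD conformance.
    """
    root = ET.fromstring(source)
    # ElementTree reports namespaced tags as "{namespace}localname".
    return root.tag == "{" + XHTML_NS + "}html"


print(root_is_xhtml(SAMPLE))                  # True
print(root_is_xhtml("<html><head/></html>"))  # False: no xmlns declaration
```

A full conformance check would additionally need a validating parser and the DTD itself, which this sketch deliberately leaves out.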
This section is informative.
Because XHTML is an XML application, certain practices that were perfectly legal in SGML-based HTML 4 [HTML4] must be changed.
Well-formedness is a new concept introduced by [XML]. Essentially this means that all elements must either have closing tags or be written in a special form (as described below), and that all the elements must nest properly.
Although overlapping is illegal in SGML, it is widely tolerated in existing browsers.
CORRECT: nested elements
<p>here is an emphasized <em>paragraph</em>.</p>
INCORRECT: overlapping elements
<p>here is an emphasized <em>paragraph.</p></em>
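The two fragments above can be fed to any XML parser to see the difference. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


nested = "<p>here is an emphasized <em>paragraph</em>.</p>"
overlapping = "<p>here is an emphasized <em>paragraph.</p></em>"

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False: </p> arrives while <em> is open
```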
XHTML documents must use lower case for all HTML element and attribute names. This difference is necessary because XML is case-sensitive; for example, <li> and <LI> are different tags.
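The case sensitivity of XML can be observed directly in a parser. This sketch, which assumes Python's standard `xml.etree.ElementTree`, parses a hypothetical list containing both spellings and shows that they are treated as different element types.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment mixing lower-case and upper-case element names.
root = ET.fromstring("<ul><li>one</li><LI>two</LI></ul>")

tags = [child.tag for child in root]
print(tags)  # ['li', 'LI']

# A lookup for "li" matches only the lower-case element.
lower_only = root.findall("li")
print(len(lower_only))  # 1
```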
In SGML-based HTML 4 certain elements were permitted to omit the end tag,
with the elements that followed implying closure. XML does not allow end
tags to be omitted. All elements other than those declared in the DTD as
EMPTY must have an end tag. Elements that are declared in the DTD as
EMPTY can have an end tag or can use empty element shorthand (see
Empty Elements).
CORRECT: terminated elements
<p>here is a paragraph.</p><p>here is another paragraph.</p>
INCORRECT: unterminated elements
<p>here is a paragraph.<p>here is another paragraph.
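An XML parser can also report where a missing end tag is first detected. The sketch below assumes Python's standard `xml.etree.ElementTree` and wraps each example in a `div` element, because a well-formed XML document needs a single root element; the wrapper and the helper name `first_error` are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def first_error(fragment: str):
    """Return the (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError as err:
        return err.position
    return None


terminated = (
    "<div><p>here is a paragraph.</p>"
    "<p>here is another paragraph.</p></div>"
)
unterminated = (
    "<div><p>here is a paragraph."
    "<p>here is another paragraph.</div>"
)

print(first_error(terminated))    # None
print(first_error(unterminated))  # a (line, column) tuple on line 1
```

The error is reported at the point where the parser meets `</div>` while the `p` elements are still open, not at the place where the end tag was left out.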
All attribute values must be quoted, even those which appear to be numeric.
CORRECT: quoted attribute values
<td rowspan="3">
INCORRECT: unquoted attribute values
<td rowspan=3>
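When markup is generated by a program, quoting can be applied uniformly rather than case by case. This is a minimal sketch using `quoteattr` from Python's standard `xml.sax.saxutils` module; the helper `start_tag` is hypothetical and does no validation of element or attribute names.

```python
from xml.sax.saxutils import quoteattr


def start_tag(name: str, attrs: dict) -> str:
    """Build a start tag in which every attribute value is quoted."""
    parts = [name]
    for key, value in attrs.items():
        # quoteattr escapes special characters and adds the quotes,
        # so numeric values are quoted like any other value.
        parts.append(f"{key}={quoteattr(str(value))}")
    return "<" + " ".join(parts) + ">"


print(start_tag("td", {"rowspan": 3}))  # <td rowspan="3">
```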
XML does not support attribute minimization. Attribute-value pairs must
be written in full. Attribute names such as compact and checked cannot
occur in elements without their value being specified.
CORRECT: unminimized attributes
<dl compact="compact">
INCORRECT: minimized attributes
<dl compact>
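A tool that converts HTML 4 markup to XHTML would need to expand minimized attributes into full attribute-value pairs. The sketch below shows one way to do so, under the assumption that a minimized attribute is represented by a value of `None`; the function name `expand_minimized` and that representation are hypothetical.

```python
from xml.sax.saxutils import quoteattr


def expand_minimized(attrs: dict) -> str:
    """Render attributes, writing a minimized attribute as name="name"."""
    rendered = []
    for name, value in attrs.items():
        if value is None:
            # A minimized attribute takes its own name as its value.
            value = name
        rendered.append(f"{name}={quoteattr(value)}")
    return " ".join(rendered)


print("<dl " + expand_minimized({"compact": None}) + ">")
# <dl compact="compact">
print("<input " + expand_minimized({"type": "checkbox", "checked": None}) + " />")
# <input type="checkbox" checked="checked" />
```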
Empty elements must either have an end tag or the start tag must end with
/>. For instance, <br/> or <hr></hr>. See HTML Compatibility Guidelines
for information on ways to ensure this is backward compatible with HTML 4
user agents.
CORRECT: terminated empty elements
<br/><hr/>
INCORRECT: unterminated empty elements
<br><hr>
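Both permitted forms of an empty element can be produced by an XML serializer. As an illustration, Python's standard `xml.etree.ElementTree` exposes a `short_empty_elements` option on `tostring`; the variable names below are chosen for this example only.

```python
import xml.etree.ElementTree as ET

br = ET.Element("br")

# Empty-element shorthand (the serializer's default).
short_form = ET.tostring(br, encoding="unicode")
# Explicit start tag followed by an end tag.
long_form = ET.tostring(br, encoding="unicode", short_empty_elements=False)

print(short_form)  # <br />
print(long_form)   # <br></br>

# Both forms parse back to the same empty element.
print(ET.fromstring(short_form).tag == ET.fromstring(long_form).tag)  # True
```

Note that this serializer writes a space before the trailing `/>` in the shorthand form.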