HTML - Hypertext Markup Language


Introduction

HTML is an application of SGML, the Standard Generalized Markup Language. An HTML document is plain text enriched with extra markup. Two kinds of markup are possible:

tags
A tag is a name enclosed in angle brackets (< and >). Tags are normally used in pairs: a start tag (just the name in brackets) and an end tag (the name preceded by a slash, /).
For example <I>italics</I>.
Start tags can also carry attributes, which are given after the name but before the closing bracket. For example, in <A HREF="myfile.html"> the attribute HREF is given the value myfile.html.
Tags that are not recognised are skipped.
entities
To allow access to characters which are not always available in the standard character set of a computer, and to reserved characters (like < and >, see above), entities are also allowed. An entity denotes a single character and is given by name: it starts with an ampersand (&) and ends with a semicolon (e.g. &copy; for ©).
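A short sketch of entities in use, with invented sentence text. The first paragraph writes the reserved characters < and > (and the ampersand itself) as entities, so they are displayed rather than read as markup; it would typically render as "In HTML, the tag <I> starts italics, and &amp; displays an ampersand." The second uses &copy;, the entity mentioned above.

```html
<P>In HTML, the tag &lt;I&gt; starts italics, and &amp;amp; displays an ampersand.
<P>The &copy; sign marks a copyright notice.
```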

In HTML documents, whitespace is mostly ignored. Line breaks in the source document are translated to whitespace, and any run of whitespace is rendered as a single word space. This lets you lay out the source document neatly without affecting the final rendering. The PRE element is the exception: it displays preformatted text and keeps the layout of the source (see section Preformatted Text: PRE).
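To illustrate the rule, the two paragraphs in this sketch are written very differently in the source, but would typically be rendered identically: one line of text with a single space between the words. Only inside a PRE element would the extra spaces and line breaks be kept.

```html
<P>Whitespace   is    collapsed.

<P>Whitespace
   is
      collapsed.
```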

Document Structure

An HTML document is a tree of elements, including a head and body, headings, paragraphs, lists, etc.

Document Element: HTML

The HTML document element consists of a head and a body, much like a memo or a mail message. The head contains the title and optional elements. The body is a text flow consisting of paragraphs, lists, and other elements.
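The outline here sketches how the head and body fit together in one complete document. It combines the HEAD and BODY examples given in the following sections; the heading and paragraph text are placeholders.

```html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML>
<HEAD>
<TITLE>Introduction to HTML</TITLE>
</HEAD>
<BODY>
<H1>Introduction to HTML</H1>
<P>The text flow of the document starts here.
</BODY>
</HTML>
```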

Head: HEAD

The head of an HTML document is an unordered collection of information about the document. For example:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HEAD>
<TITLE>Introduction to HTML</TITLE>
</HEAD>
...

Title: TITLE

Every HTML document must contain a TITLE element.

The title should identify the contents of the document in a global context. A short title, such as "Introduction", may be meaningless out of context; a title such as "Introduction to HTML Elements" is more appropriate. (Although the length of titles is not limited, long titles may be truncated in some applications. To minimize this possibility, titles should be kept to fewer than 64 characters. The ProWesS reader has a maximum title length of 80 characters.)

The ProWesS reader uses the title of a document both in the history list and as a label for the window displaying the document. This differs from headings (section Headings: H1 ... H6), which are typically displayed within the body text flow.

Body: BODY

The BODY element contains the text flow of the document, including headings, paragraphs, lists, etc.

For example:

<BODY>
<h1>Important Stuff</h1>
<p>Explanation about important stuff...
</BODY>

Headings: H1 ... H6

The six heading elements, H1 through H6, denote section headings. Although the order and occurrence of headings is not constrained by HTML, it is advised not to skip levels (for example, from H1 to H3), as converting such documents to other representations is often problematic.

Example of use:

<H1>This is a heading</H1>
Here is some text
<H2>Second level heading</H2>
Here is some more text.

Typical renderings are:

H1
Bold, very-large font, centered. One or two blank lines above and below.
H2
Bold, large font, flush-left. One or two blank lines above and below.
H3
Italic, large font, slightly indented from the left margin. One or two blank lines above and below.
H4
Bold, normal font, indented more than H3. One blank line above and below.
H5
Italic, normal font, indented as H4. One blank line above.
H6
Bold, indented same as normal text, more than H5. One blank line above.

Block Structuring Elements

Block structuring elements include paragraphs, lists, and block quotes. They must not contain heading elements, but they may contain phrase markup, and in some cases, they may be nested.

Paragraph: P

The P element indicates a paragraph. The exact indentation, leading space, etc. of a paragraph is not specified and may be a function of other tags, style sheets, etc.

Typically, paragraphs are surrounded by a vertical space of one line or half a line. The first line in a paragraph is indented in some cases.

Example of use:

<H1>This Heading Precedes the Paragraph</H1>
<P>This is the text of the first paragraph.
<P>This is the text of the second paragraph. Although you do not 
need to start paragraphs on new lines, maintaining this 
convention facilitates document maintenance.</P>
<P>This is the text of a third paragraph.</P>

Preformatted Text: PRE

The PRE element represents a character cell block of text and is suitable for text that has been formatted for a monospaced font.

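Within preformatted text, line breaks in the source are rendered as a move to the beginning of the next line. Anchor elements and phrase markup may be used, but elements that define paragraph formatting (headings, ADDRESS, etc.) must not be used. A horizontal tab character is interpreted as the smallest positive number of spaces that leaves the number of characters so far on the line a multiple of 8; because tabs are not supported consistently, documents should avoid them.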

Example of use:

<PRE>
Line 1.
       Line 2 is to the right of line 1.     <a href="abc">abc</a>
       Line 3 aligns with line 2.            <a href="def">def</a>
</PRE>

Address: ADDRESS

The ADDRESS element contains such information as address, signature and authorship, often at the beginning or end of the body of a document.

Typically, the ADDRESS element is rendered in an italic typeface and may be indented.

Example of use:

<ADDRESS>
Newsletter editor<BR>
J.R. Brown<BR>
JimquickPost News, Jimquick, CT 01234<BR>
Tel (123) 456 7890
</ADDRESS>

Block Quote: BLOCKQUOTE or BQ

The BLOCKQUOTE element contains text quoted from another source.

A typical rendering might be a slight extra left and right indent, and/or italic font. The BLOCKQUOTE typically provides space above and below the quote.

Single-font rendition may reflect the quotation style of Internet mail by putting a vertical line of graphic characters, such as the greater than symbol (>), in the left margin.

The ProWesS reader allows you (in line with HTML3) to shorten the BLOCKQUOTE tag to BQ. In the ProWesS reader, a block quote is rendered exactly like an ADDRESS element.

Example of use:

I think the play ends
<BLOCKQUOTE>
<P>Soft you now, the fair Ophelia. Nymph, in thy orisons, be all 
my sins remembered.
</BLOCKQUOTE>
but I am not sure.
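Because the ProWesS reader accepts BQ as a shorter name for BLOCKQUOTE, the quotation above could equally be written as in this sketch. Other user agents may not recognise BQ; since unrecognised tags are skipped, the quoted text would then simply run on as ordinary text.

```html
I think the play ends
<BQ>
<P>Soft you now, the fair Ophelia. Nymph, in thy orisons, be all
my sins remembered.
</BQ>
but I am not sure.
```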

List Elements

HTML includes a number of list elements. They may be used in combination; for example, an OL may be nested in an LI element of a UL.

In compliance with HTML3, lists can be given a title, which is rendered just before the actual list, typically in a somewhat larger font. A list should have at most one title, which should be given before the list items.

The list header uses the LH element. For example:

<UL>
<LH>List header</LH>
<LI>List item
<LI>Another list item
</UL>

Unordered List: UL, LI

The UL represents a list of items -- typically rendered as a bulleted list.

The content of a UL element is a sequence of LI elements. For example:

<UL>
<LI>First list item
<LI>Second list item
 <p>second paragraph of second item
<LI>Third list item
</UL>

Ordered List: OL

The OL element represents an ordered list of items, sorted by sequence or order of importance. It is typically rendered as a numbered list.

The content of an OL element is a sequence of LI elements. For example:

<OL>
<LI>Click the Web button to open URI window.
<LI>Enter the URI number in the text field of the Open URI
window. The Web document you specified is displayed.
  <ol>
   <li>substep 1
   <li>substep 2
  </ol>
<LI>Click highlighted text to move from one link to another.
</OL>

Directory List: DIR

The DIR element is similar to the UL element. It represents a list of short items, typically up to 20 characters each. Items in a directory list may be arranged in columns, typically 24 characters wide.

The content of a DIR element is a sequence of LI elements. Nested block elements are not allowed in the content of DIR elements. For example:

<DIR>
<LI>A-H<LI>I-M
<LI>M-R<LI>S-Z
</DIR>

Menu List: MENU

The MENU element is a list of items with typically one line per item. The menu list style is typically more compact than the style of an unordered list.

The content of a MENU element is a sequence of LI elements. Nested block elements are not allowed in the content of MENU elements. For example:

<MENU>
<LI>First item in the list.
<LI>Second item in the list.
<LI>Third item in the list.
</MENU>

Definition List: DL, DT, DD

A definition list is a list of terms and corresponding definitions. Definition lists are typically formatted with the term flush-left and the definition, formatted paragraph style, indented after the term.

The content of a DL element is a sequence of DT elements and/or DD elements, usually in pairs. Multiple DT may be paired with a single DD element. Documents should not contain multiple consecutive DD elements.

Example of use:

<DL>
<DT>Term<DD>This is the definition of the first term.
<DT>Term<DD>This is the definition of the second term.
</DL>
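As stated above, several DT elements may share a single DD. A sketch of that case, with invented terms: two spellings of the same word are followed by one definition, and a third term keeps its own definition.

```html
<DL>
<DT>Colour
<DT>Color
<DD>The British and American spellings of the same word.
<DT>Term<DD>A term with a definition of its own.
</DL>
```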

If the DT term does not fit in the DT column (typically one third of the display area), it may be extended across the page with the DD section moved to the next line, or it may be wrapped onto successive lines of the left hand column.

Phrase Markup

Phrases may be marked up according to idiomatic usage, typographic appearance, or for use as hyperlink anchors.

User agents must render highlighted phrases distinctly from plain text. Additionally, EM content must be rendered as distinct from STRONG content, and B content must be rendered as distinct from I content.

Phrase elements may be nested within the content of other phrase elements; however, HTML user agents may render nested phrase elements indistinctly from non-nested elements:

plain <B>bold <I>italic</I></B> may be rendered
the same as plain <B>bold </B><I>italic</I>

Idiomatic Elements

Phrases may be marked up to indicate certain idioms.

Citation: CITE

The CITE element is used to indicate the title of a book or other citation. It is typically rendered as italics. For example:

He just couldn't get enough of <cite>The Grapes of Wrath</cite>.

Code: CODE

The CODE element indicates an example of code, typically rendered in a mono-spaced font. The CODE element is intended for short words or phrases of code; the PRE block structuring element (section Preformatted Text: PRE) is more appropriate for multiple-line listings. For example:

The expression <code>x += 1</code>
is short for <code>x = x + 1</code>.

Emphasis: EM

The EM element indicates an emphasized phrase, typically rendered as italics. For example:

A singular subject <em>always</em> takes a singular verb.

Keyboard: KBD

The KBD element indicates text typed by a user, typically rendered in a mono-spaced font. This is commonly used in instruction manuals. For example:

Enter <kbd>FIND IT</kbd> to search the database.

Sample: SAMP

The SAMP element indicates a sequence of literal characters, typically rendered in a mono-spaced font. For example:

The only word containing the letters <samp>mt</samp> is dreamt.

Strong Emphasis: STRONG

The STRONG element indicates strong emphasis, typically rendered in bold. For example:

<strong>STOP</strong>, or I'll say "<strong>STOP</strong>" again!

Variable: VAR

The VAR element indicates a placeholder variable, typically rendered as italic. For example:

Type <SAMP>html-check <VAR>file</VAR> | more</SAMP>
to check <VAR>file</VAR> for markup errors.

Typographic Elements

Typographic elements are used to specify the format of marked text.

Typical renderings for idiomatic elements may vary between user agents. If a specific rendering is necessary -- for example, when referring to a specific text attribute as in "The italic parts are mandatory" -- a typographic element can be used to ensure that the intended typography is used where possible.

Bold: B

The B element indicates bold text. Where bold typography is unavailable, an alternative representation may be used.

Italic: I

The I element indicates italic text. Where italic typography is unavailable, an alternative representation may be used.

Teletype: TT

The TT element indicates teletype (monospaced) text. Where a teletype font is unavailable, an alternative representation may be used.
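The three typographic elements above have no example of their own; this fragment, with invented text, shows them in use. The first paragraph is the "italic parts are mandatory" case mentioned earlier, where the typeface itself carries the meaning; the file name in the second is a placeholder.

```html
<P>The <I>italic</I> parts are mandatory, the <B>bold</B> parts are optional.
<P>The output is written to the file <TT>report_txt</TT>.
```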

Anchor: A

The A element indicates a hyperlink anchor. At least one of the NAME and HREF attributes should be present. Attributes of the A element:

HREF
gives the URI of the head anchor of a hyperlink. The ProWesS reader can only access local documents, so only a limited form of URI is supported. You can reference a file by giving its filename as the value; the file is searched for in the same directory as the current file. Optionally, a position in the file can also be given: this position is given by name, just after the filename, separated by a hash (#). For links within the same document, the filename should be omitted. For example:
<A HREF="myfile.html">external link</A>
<A HREF="#somewhere">local link</A>
<A HREF="myfile.html#somewhere">position in file</A>
All dots and (back)slashes in filenames are translated to underscores by the ProWesS reader. This allows HTML documents written for other systems, which use dots and slashes in their filenames, to be accessed.
NAME
gives the name of the anchor, and makes it available as a head of a hyperlink.
REL
The REL attribute gives the relationship(s) described by the hyperlink. The value is a whitespace separated list of relationship names. The semantics of link relationships are not specified in this document.
The ProWesS reader will include referenced objects in a printout if the REL=SUBDOCUMENT attribute/value pair is found (the value is compared case-insensitively).
REV
same as the REL attribute, but the semantics of the relationship are in the reverse direction. A link from A to B with REL="X" expresses the same relationship as a link from B to A with REV="X". An anchor may have both REL and REV attributes.
The ProWesS reader will include referenced objects in a printout if the REV="toC" attribute/value pair is found (the value is compared case-insensitively).
PRINT
The ProWesS reader also supports an extra PRINT attribute value, which indicates that the referenced document should also be printed when the user requests a hardcopy of the document.
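A sketch combining the attributes above. The NAME attribute marks a target position, which a local link then reaches with #; the REL=SUBDOCUMENT link would make the ProWesS reader include the referenced document in a printout. The file names intro.html and chapter1.html and the anchor name install are hypothetical. Because dots are translated to underscores, the reader would look for intro_html and chapter1_html in the same directory as the current document.

```html
<!-- Target position: links can jump here with #install -->
<H2><A NAME="install">Installation</A></H2>

<!-- Local link to the named position in this document -->
<P>See the <A HREF="#install">installation notes</A>.

<!-- Link to another (hypothetical) file; the reader looks for intro_html -->
<P>Read the <A HREF="intro.html">introduction</A> first.

<!-- Referenced document is included when this one is printed -->
<P><A HREF="chapter1.html" REL=SUBDOCUMENT>Chapter 1</A>
```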

Line Break: BR

The BR element specifies a line break between words. For example:

<P> Pease porridge hot<BR>
Pease porridge cold<BR>
Pease porridge in the pot<BR>
Nine days old.

Horizontal Rule: HR

The HR element is a divider between sections of text; typically a full width horizontal rule or equivalent graphic. For example:

<HR>

<ADDRESS>February 8, 1995, CERN</ADDRESS>
</BODY>

Image: IMG

The IMG element refers to an image or icon via a hyperlink.

HTML user agents may process the value of the ALT attribute as an alternative to processing the image resource indicated by the SRC attribute.

Attributes of the IMG element:

ALT
text to use in place of the referenced image resource, for example due to processing constraints or user preference.
SRC
specifies the URI of the image resource.
UNITS
Gives the unit used in the values of the WIDTH and HEIGHT attributes. The possible values are PIXELS and EN; the default is PIXELS. An en is half the point size in use. The ProWesS reader also accepts CW as a unit: one cw equals the current width of the column.
WIDTH
Specifies the width of the image.
HEIGHT
Specifies the height of the image.

In the ProWesS reader, images are always displayed flush left on a separate line. To display a picture, a PROforma picture type must be available which can recognize and display that picture format.

When no display size is given, the picture gets half the width of the area in which the document is displayed (as if you had included WIDTH=.5 UNITS=CW). If you specify only the width or only the height, the aspect ratio of the picture is retained. When UNITS=EN is given, one en is half the current font size. Because the ProWesS reader also has to be able to print HTML files properly, a size given in pixels is approximated using points (as if you were running at a resolution of 720 by 540).

Examples of use:

<img src="win2_tmp_car1_com">
<img src="win2_tmp_car2_com" height=10 units=en>
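Two further sketches of the sizing rules, reusing the picture files from the examples above. In the first, only the width is given, as a quarter of the column width, so the ProWesS reader derives the height from the picture's aspect ratio. In the second, the width is given in ens (each half the current font size), and ALT supplies text for user agents that do not display the image; the ALT text is invented for illustration.

```html
<img src="win2_tmp_car1_com" width=.25 units=cw>
<img src="win2_tmp_car2_com" width=20 units=en alt="Car picture">
```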

This document is mostly based on a part of the HTML 2.0 specification
PROGS, Professional & Graphical Software
last edited September 27, 1996