HTML is an application of SGML, the Standard Generalized Markup Language. An HTML document is plain text enriched with markup. Two kinds of markup are possible:
In HTML documents, whitespace is mostly ignored. Line breaks in the source document are treated as whitespace, and any run of whitespace is rendered as a single word space. This lets you lay out the source document for readability without affecting the final rendering. The exception is the PRE element, which displays preformatted text and preserves the layout of the source (see section Preformatted Text: PRE).
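The whitespace rule can be illustrated with a minimal Python sketch. The function name is hypothetical, and the sketch only covers text outside PRE elements; it is an approximation of the rule, not the ProWesS reader's actual code.

```python
import re

def collapse_whitespace(text):
    """Approximate how text outside PRE is rendered: every run of
    spaces, tabs and line breaks becomes a single space."""
    return re.sub(r"[ \t\r\n]+", " ", text)

source = "This source line\n    is indented   and broken\tfreely."
print(collapse_whitespace(source))
```

The source may therefore be indented and wrapped freely, because all of that layout collapses to single word spaces when rendered.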
An HTML document is a tree of elements, including a head and body, headings, paragraphs, lists, etc.
The HTML document element consists of a head and a body, much like a memo or a mail message. The head contains the title and optional elements. The body is a text flow consisting of paragraphs, lists, and other elements.
The head of an HTML document is an unordered collection of information about the document. For example:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> <HEAD> <TITLE>Introduction to HTML</TITLE> </HEAD> ...
Every HTML document must contain a TITLE element.
The title should identify the contents of the document in a global context. A short title, such as "Introduction", may be meaningless out of context. A title such as "Introduction to HTML Elements" is more appropriate. (Although the length of titles is not limited, long titles may be truncated in some applications. To minimize this possibility, titles should be limited to less than 64 characters. The ProWesS reader has a maximum title length of 80 characters.)
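The two length limits can be checked with a small Python sketch. The function and constant names are hypothetical, and the text does not say what the ProWesS reader does with a title longer than 80 characters, so the sketch only reports a warning.

```python
RECOMMENDED_LIMIT = 64  # titles should stay below this many characters
PROWESS_MAXIMUM = 80    # maximum title length stated for the ProWesS reader

def check_title(title):
    """Return a list of warnings for a document title (empty if none)."""
    warnings = []
    length = len(title)
    if length >= RECOMMENDED_LIMIT:
        warnings.append(
            "title has %d characters; some applications may truncate it" % length)
    if length > PROWESS_MAXIMUM:
        warnings.append(
            "title exceeds the ProWesS maximum of %d characters" % PROWESS_MAXIMUM)
    return warnings

print(check_title("Introduction to HTML Elements"))
```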
The ProWesS reader uses the title of a document both in the history list and as a label for the window displaying the document. This differs from headings (section Headings: H1 ... H6), which are typically displayed within the body text flow.
The BODY element contains the text flow of the document, including headings, paragraphs, lists, etc.
For example:
<BODY> <h1>Important Stuff</h1> <p>Explanation about important stuff... </BODY>
The six heading elements, H1 through H6, denote section headings. Although the order and occurrence of headings is not constrained by HTML, it is advised not to skip levels (for example, from H1 to H3), as converting such documents to other representations is often problematic.
Example of use:
<H1>This is a heading</H1> Here is some text <H2>Second level heading</H2> Here is some more text.
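The advice not to skip heading levels can be checked mechanically. The following Python sketch uses the standard html.parser module; the class and function names are hypothetical. It treats a document whose first heading is not H1 as skipping from level 0, which is an assumption of this sketch rather than a rule from the text.

```python
import re
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Record the level of every H1 ... H6 start tag in document order."""
    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        match = re.fullmatch(r"h([1-6])", tag)
        if match:
            self.levels.append(int(match.group(1)))

def skipped_levels(html):
    """Return (previous, current) pairs where a heading jumps more than one level down."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    skips = []
    previous = 0  # assumption: no heading seen yet counts as level 0
    for level in collector.levels:
        if level > previous + 1:
            skips.append((previous, level))
        previous = level
    return skips

print(skipped_levels("<H1>Title</H1><H3>Too deep</H3>"))
```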
Typical renderings are:
Block structuring elements include paragraphs, lists, and block quotes. They must not contain heading elements, but they may contain phrase markup, and in some cases, they may be nested.
The P element indicates a paragraph. The exact indentation, leading space, etc. of a paragraph is not specified and may be a function of other tags, style sheets, etc.
Typically, paragraphs are surrounded by a vertical space of one line or half a line. The first line in a paragraph is indented in some cases.
Example of use:
<H1>This Heading Precedes the Paragraph</H1> <P>This is the text of the first paragraph. <P>This is the text of the second paragraph. Although you do not need to start paragraphs on new lines, maintaining this convention facilitates document maintenance.</P> <P>This is the text of a third paragraph.</P>
The PRE element represents a character cell block of text and is suitable for text that has been formatted for a monospaced font.
Within preformatted text:
Example of use:
<PRE>
Line 1.
       Line 2 is to the right of line 1.     <a href="abc">abc</a>
       Line 3 aligns with line 2.            <a href="def">def</a>
</PRE>
The ADDRESS element contains such information as address, signature and authorship, often at the beginning or end of the body of a document.
Typically, the ADDRESS element is rendered in an italic typeface and may be indented.
Example of use:
<ADDRESS> Newsletter editor<BR> J.R. Brown<BR> JimquickPost News, Jimquick, CT 01234<BR> Tel (123) 456 7890 </ADDRESS>
The BLOCKQUOTE element contains text quoted from another source.
A typical rendering might be a slight extra left and right indent, and/or italic font. The BLOCKQUOTE typically provides space above and below the quote.
Single-font rendition may reflect the quotation style of Internet mail by putting a vertical line of graphic characters, such as the greater than symbol (>), in the left margin.
The ProWesS reader allows you (conforming with HTML3) to shorten the BLOCKQUOTE tag to BQ. The rendition is exactly the same as that of the ADDRESS element.
Example of use:
I think the play ends <BLOCKQUOTE> <P>Soft you now, the fair Ophelia. Nymph, in thy orisons, be all my sins remembered. </BLOCKQUOTE> but I am not sure.
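The single-font, Internet-mail style of rendering a block quote can be sketched in Python with the standard textwrap module. The function name and the default width are hypothetical; this illustrates the general idea and is not how the ProWesS reader renders BLOCKQUOTE.

```python
import textwrap

def quote_mail_style(text, width=40):
    """Wrap the quoted text and put '> ' in the left margin of every line."""
    # Reserve two columns for the '> ' prefix.
    lines = textwrap.wrap(text, width=width - 2)
    return "\n".join("> " + line for line in lines)

print(quote_mail_style(
    "Soft you now, the fair Ophelia. Nymph, in thy orisons, "
    "be all my sins remembered."))
```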
HTML includes a number of list elements. They may be used in combination; for example, an OL may be nested in an LI element of a UL.
In compliance with HTML3, lists can be provided with a title which is rendered just before the actual list, typically in a somewhat larger font. A list should have at most one title, which should be given before the list items.
The list header uses the LH element. For example:
<UL> <LH>List header</LH> <LI>List item <LI>Another list item </UL>
The UL represents a list of items -- typically rendered as a bulleted list.
The content of a UL element is a sequence of LI elements. For example:
<UL> <LI>First list item <LI>Second list item <p>second paragraph of second item <LI>Third list item </UL>
The OL element represents an ordered list of items, sorted by sequence or order of importance. It is typically rendered as a numbered list.
The content of an OL element is a sequence of LI elements. For example:
<OL> <LI>Click the Web button to open URI window. <LI>Enter the URI number in the text field of the Open URI window. The Web document you specified is displayed. <ol> <li>substep 1 <li>substep 2 </ol> <LI>Click highlighted text to move from one link to another. </OL>
The DIR element is similar to the UL element. It represents a list of short items, typically up to 20 characters each. Items in a directory list may be arranged in columns, typically 24 characters wide.
The content of a DIR element is a sequence of LI elements. Nested block elements are not allowed in the content of DIR elements. For example:
<DIR> <LI>A-H<LI>I-M <LI>M-R<LI>S-Z </DIR>
The MENU element is a list of items with typically one line per item. The menu list style is typically more compact than the style of an unordered list.
The content of a MENU element is a sequence of LI elements. Nested block elements are not allowed in the content of MENU elements. For example:
<MENU> <LI>First item in the list. <LI>Second item in the list. <LI>Third item in the list. </MENU>
A definition list is a list of terms and corresponding definitions. Definition lists are typically formatted with the term flush-left and the definition, formatted paragraph style, indented after the term.
The content of a DL element is a sequence of DT elements and/or DD elements, usually in pairs. Multiple DT may be paired with a single DD element. Documents should not contain multiple consecutive DD elements.
Example of use:
<DL> <DT>Term<DD>This is the definition of the first term. <DT>Term<DD>This is the definition of the second term. </DL>
If the DT term does not fit in the DT column (typically one third of the display area), it may be extended across the page with the DD section moved to the next line, or it may be wrapped onto successive lines of the left hand column.
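The recommendation against multiple consecutive DD elements can be checked with a short Python sketch based on the standard html.parser module. The class and function names are hypothetical, and the sketch assumes flat (non-nested) definition lists.

```python
from html.parser import HTMLParser

class DefinitionListChecker(HTMLParser):
    """Count DD elements that directly follow another DD in the same list."""
    def __init__(self):
        super().__init__()
        self.previous = None      # last DT/DD tag seen in the current list
        self.consecutive_dd = 0

    def handle_starttag(self, tag, attrs):
        if tag == "dl":
            self.previous = None  # a new list starts with a clean state
        elif tag in ("dt", "dd"):
            if tag == "dd" and self.previous == "dd":
                self.consecutive_dd += 1
            self.previous = tag

def count_consecutive_dd(html):
    checker = DefinitionListChecker()
    checker.feed(html)
    checker.close()
    return checker.consecutive_dd

print(count_consecutive_dd("<DL><DT>Term<DD>First<DD>Second</DL>"))
```

Several DT elements followed by one DD are not reported, since the text allows that pairing.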
Phrases may be marked up according to idiomatic usage, typographic appearance, or for use as hyperlink anchors.
User agents must render highlighted phrases distinctly from plain text. Additionally, EM content must be rendered as distinct from STRONG content, and B content must be rendered as distinct from I content.
Phrase elements may be nested within the content of other phrase elements; however, HTML user agents may render nested phrase elements indistinctly from non-nested elements:
plain <B>bold <I>italic</I></B> may be rendered the same as plain <B>bold </B><I>italic</I>
Phrases may be marked up to indicate certain idioms.
The CITE element is used to indicate the title of a book or other citation. It is typically rendered as italics. For example:
He just couldn't get enough of <cite>The Grapes of Wrath</cite>.
The CODE element indicates an example of code, typically rendered in a mono-spaced font. The CODE element is intended for short words or phrases of code; the PRE block structuring element (section Preformatted Text: PRE) is more appropriate for multiple-line listings. For example:
The expression <code>x += 1</code> is short for <code>x = x + 1</code>.
The EM element indicates an emphasized phrase, typically rendered as italics. For example:
A singular subject <em>always</em> takes a singular verb.
The KBD element indicates text typed by a user, typically rendered in a mono-spaced font. This is commonly used in instruction manuals. For example:
Enter <kbd>FIND IT</kbd> to search the database.
The SAMP element indicates a sequence of literal characters, typically rendered in a mono-spaced font. For example:
The only word containing the letters <samp>mt</samp> is dreamt.
The STRONG element indicates strong emphasis, typically rendered in bold. For example:
<strong>STOP</strong>, or I'll say "<strong>STOP</strong>" again!
The VAR element indicates a placeholder variable, typically rendered as italic. For example:
Type <SAMP>html-check <VAR>file</VAR> | more</SAMP> to check <VAR>file</VAR> for markup errors.
Typographic elements are used to specify the format of marked text.
Typical renderings for idiomatic elements may vary between user agents. If a specific rendering is necessary -- for example, when referring to a specific text attribute as in "The italic parts are mandatory" -- a typographic element can be used to ensure that the intended typography is used where possible.
The B element indicates bold text. Where bold typography is unavailable, an alternative representation may be used.
The I element indicates italic text. Where italic typography is unavailable, an alternative representation may be used.
The TT element indicates teletype (monospaced) text. Where a teletype font is unavailable, an alternative representation may be used.
The A element indicates a hyperlink anchor. At least one of the NAME and HREF attributes should be present. Attributes of the A element:
<A HREF="myfile.html">external link</A> <A HREF="#somewhere">local link</A> <A HREF="myfile.html#somewhere">position in file</A>
All dots and (back)slashes in the filenames are translated to underscores by the ProWesS reader. This allows access of external HTML documents.
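The filename translation can be sketched in Python as follows. The function name is hypothetical, and leaving the part after "#" untouched is an assumption, since the text only mentions filenames.

```python
def translate_filename(href):
    """Replace dots, slashes and backslashes in the filename part by underscores."""
    # Assumption: a '#position' suffix is not part of the filename and is kept as-is.
    filename, separator, position = href.partition("#")
    for char in "./\\":
        filename = filename.replace(char, "_")
    return filename + separator + position

print(translate_filename("myfile.html"))
print(translate_filename("myfile.html#somewhere"))
```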
The BR element specifies a line break between words. For example:
<P> Pease porridge hot<BR> Pease porridge cold<BR> Pease porridge in the pot<BR> Nine days old.
The HR element is a divider between sections of text; typically a full width horizontal rule or equivalent graphic. For example:
<HR> <ADDRESS>February 8, 1995, CERN</ADDRESS> </BODY>
The IMG element refers to an image or icon via a hyperlink.
HTML user agents may process the value of the ALT attribute as an alternative to processing the image resource indicated by the SRC attribute.
Attributes of the IMG element :
In the ProWesS reader, all images are always displayed flush left on a separate line. To be able to display a picture, there has to be a PROforma picture type which can recognize and display that picture. When no display size is given, the picture gets a width of half the width of the area in which the document is displayed (as if you included WIDTH=.5 UNITS=CW). If you specify only the width or only the height of the picture, the aspect ratio of the picture is retained. When the UNITS=EN attribute is given, one en is half the current font size. Because the ProWesS reader also has to be able to print HTML files properly, a picture size given in pixels is approximated by using points (as if you are running at a resolution of 720 by 540).
Examples of use:
car1_com <img src="win2_tmp_car1_com"> car2_com <img src="win2_tmp_car2_com" height=10 units=en>
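The sizing rules described for the ProWesS reader can be sketched in Python. This is an illustration only: the function names are hypothetical, all lengths are assumed to be already in one common unit, and the font size is assumed to be expressed in points.

```python
def display_size(natural_w, natural_h, area_w, width=None, height=None):
    """Work out the displayed size of a picture.

    natural_w, natural_h: the picture's own dimensions
    area_w: width of the area in which the document is displayed
    width, height: requested size, either or both may be omitted
    """
    aspect = natural_h / natural_w
    if width is None and height is None:
        # No size given: use half the width of the display area.
        width = 0.5 * area_w
    if width is None:
        # Only the height was given: keep the aspect ratio.
        width = height / aspect
    elif height is None:
        # Only the width is known: keep the aspect ratio.
        height = width * aspect
    return width, height

def en_to_points(count, font_size):
    """One en is half the current font size."""
    return count * font_size / 2

print(display_size(200, 100, 600))
print(display_size(200, 100, 600, height=en_to_points(10, 12)))
```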