_____________________________
WRITING HTML
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
A "quick and dirty" guide
to creating a HTML document
by: Mårten Lindström
HTML really IS very simple, which may not be immediately evident when
looking at the HTML specification documents. In this presentation I will
skip over most of the theory and furthermore limit it to (most of the)
HTML 2.0 features.
If there is interest, I could perhaps write a further, more in-depth
article with full coverage of both HTML 2.0 and 3.2.
HOW TO TURN PLAIN TEXT INTO HTML
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
Just take an existing plain ("ascii") text and do the following:
1) Replace all occurrences of & with &amp;
including the semicolon.
2) Now, in the same way, replace every < with &lt;
and perhaps, to be sure, also > with &gt;
(the second replacement is not strictly necessary, so it can be left out).
3) Insert a "TITLE" (to be used for window caption by browser) at the
start of the text:
<TITLE>Some text for caption of document window</TITLE>
This is also what a browser will use to determine that the text IS a
HTML document at all.
4) Insert <P> before each and every PARAGRAPH of your text.
Remember that the browser will IGNORE ALL NEWLINES in your source
(instead formatting the text according to the current window width) and
will split your text into paragraphs based solely on these <P>
tags.
5) If you have used any Atari-specific characters (the ones in the second
half of the character set - including British pound sign and
"non-English" letters) then you must also convert these into the "ANSI"
(aka ISO 8859-1 aka Latin-1) character set, for instance with my ANSIFIER
program in Ictari 39.
Done!
(Now inspect your text with CAB or HTML-Browser!)
Any newlines and extra spaces (above one between each word) will be ignored
by the browser, so you are free to insert as many as you like, to improve
the readability of the plain source text.
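
As an illustration, the five steps above might turn a short text into something like the following. This is only a sketch: the title and the sentences are made up for the example, and the source layout is arbitrary since the browser ignores it.

```html
<TITLE>Fish &amp; chips prices</TITLE>

<P>Fish &amp; chips now cost less than before: 3 &lt; 4,
so the new price is the lower one.

<P>This is a second paragraph. The line breaks
and    extra spaces in this source text are ignored
by the browser.
```

Note how the & and < of the original text appear as &amp; and &lt; while each paragraph is simply preceded by a paragraph tag.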
Note on start tags and end tags: Most types of elements, like the title,
need BOTH start and end tag (<TITLE> and </TITLE>) while a few, like
paragraphs, don't. (It is enough to start each paragraph with <P> though
you optionally also COULD end it with </P>.)
There are even some elements that NEVER have an end tag, simply because
they don't contain any document text - see <HR> and <IMG> below.
Furthermore, in clean HTML, elements can be contained within each other but
should never overlap. For instance, in order to use both bold AND italics
style on some text you could write:
   <B><I>some text</I></B>      An i element cleanly within b
or <I><B>some text</B></I>      A b element cleanly within i
but the following versions, on the other hand
    <B><I>some text</B></I>
and <I><B>some text</I></B>
are not clean HTML, although most browsers might understand them anyway.
REFINEMENTS
¯¯¯¯¯¯¯¯¯¯¯
Pre-formatted text
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
Instead of the <P> tags, preceding every paragraph, you could have merely
preceded the whole text - after the title element - with a <PRE> tag and
succeeded it with </PRE>. This would suppress automatic word-wrapping,
causing the browser to preserve all spaces and newlines literally and use a
monospaced font for the text. I.e. behaving essentially like the familiar
old ascii text viewer.
More typically, you would use <PRE> and </PRE> tags only around selected
parts of the text, such as program listings.
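
A minimal sketch of pre-formatted text used around a program listing only. The two-line BASIC listing is invented for the example. Note that & and < still have to be written as &amp; and &lt; inside the pre-formatted part.

```html
<P>The following routine counts to ten:

<PRE>
10 A = A + 1
20 IF A &lt; 10 THEN GOTO 10
</PRE>

<P>After the listing, normal word-wrapped text continues.
```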
Headings
¯¯¯¯¯¯¯¯
To turn a paragraph into a heading, just remove the <P> before it and
instead enclose it in <H1> and </H1>, thus:
<H1>Some heading in your document</H1>
With H2 instead of H1 you will get a smaller heading, H3 results in an even
smaller heading, down to H6 for the smallest possible heading.
(A recommendation is to not skip heading levels, i.e. after a H1 don't go
down to H3 before you have used H2.)
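
As a sketch of heading levels used without skipping, the structure of a document could look like this (the heading and paragraph texts are placeholders only):

```html
<H1>Main heading of the document</H1>
<P>Introductory paragraph.

<H2>First section</H2>
<P>Text of the first section.

<H3>A sub-section of the first section</H3>
<P>Text of the sub-section.

<H2>Second section</H2>
<P>Text of the second section.
```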
Lists
¯¯¯¯¯
A list, bulleted or numbered, can be written thus:

                                 In browser this will appear as
<UL>
  <LI>Text for first item        • Text for first item
  <LI>second item                • second item
  <LI>third ... etc.             • third ... etc.
</UL>

or
                                 In browser this will appear as
<OL>
  <LI>Text for first item        1. Text for first item
  <LI>second item                2. second item
  <LI>third ... etc.             3. third ... etc.
</OL>

UL stands for Unordered (i.e. bulleted) List,
OL stands for Ordered (i.e. numbered) List.
Each LI (List Item) element could also contain multiple paragraphs or even
sub-lists (but not headings).
The indentation I have used on the <LI> elements is of course purely for
readability of the source text and won't affect how the browser displays
them (they will typically be displayed indented anyway).
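
A list item holding an extra paragraph and a sub-list could be sketched like this (the item texts are placeholders, and the indentation is again only for the benefit of the human reader):

```html
<UL>
  <LI>First item
      <P>A second paragraph belonging to the first item
  <LI>Second item, containing a numbered sub-list
      <OL>
        <LI>First sub-item
        <LI>Second sub-item
      </OL>
  <LI>Third item
</UL>
```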
Horizontal Rules
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
Just insert <HR> where you want a horizontal division line in the text. In
monochrome it will simply be a thin black line, while in colour most
browsers make it appear as a three-dimensional groove (achieved by using
two colours for it: top = dark gray or black, bottom = white; the text
background being not white but light gray).
Images
¯¯¯¯¯¯
An image can be inserted anywhere in the text flow with:
<IMG SRC="FILENAME.GIF">
For really GOOD HTML, you should also add, within the IMG tag, an extra
attribute: ALT="Text that is displayed if image not shown".
Note: ALT="" is entirely appropriate to use with pure adornment images. The
ALT text should _NOT_ be a picture DESCRIPTION but an ALTERNATIVE.
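
A small sketch of the two ALT cases. The file names LOGO.GIF and FLOURISH.GIF are hypothetical. In the first line the ALT text takes the place of the picture in the sentence rather than describing it; in the second the image is pure decoration and gets an empty ALT.

```html
<P><IMG SRC="LOGO.GIF" ALT="Ictari"> presents a quick guide to HTML.

<P><IMG SRC="FLOURISH.GIF" ALT=""> End of the first chapter.
```

With images switched off, the first paragraph would still read as "Ictari presents a quick guide to HTML."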
GIF is the most widely recognized picture file format, while JPEG is
understood by most newer browsers (this SHOULD include CAB (?)). Only now,
in October this year (1996), was PNG formally adopted by W3C (the World Wide
Web Consortium), but it will probably replace GIF eventually.
Hyperlinks
¯¯¯¯¯¯¯¯¯¯
Any image or piece of text could also be made into a hyperlink by enclosing
it in <A HREF="..."> and </A>.
For instance:
<A HREF="FILENAME.HTM">This is a clickable link</A>
A link doesn't necessarily have to lead to another HTML file. You could
make links to ANY kind of file, though the browser may not be able to
display it, of course. Plain ("ascii") text files as well as (GIF) images
normally ARE displayed directly by the browser, others may be passed by the
browser to some other program (if a protocol for this has been
established).
Note: When CAB displays a plain text file it treats characters 160-255 as
ANSI (like in a HTML file) rather than Atari. Not the ideal behaviour for
an Atari browser I would say.
-----
More generally, an A element ("Anchor") can be jumped both TO and FROM,
making it possible to jump not only between different files but within a
HTML file. For an anchor to serve as a starting point, a HREF attribute
must be present, as above; an anchor serving as a destination must have a
NAME attribute, for instance:
<A NAME="conclusions">Conclusions</A>
To enable a link to this anchor, some other anchor, in the same document,
could be written as:
<A HREF="#conclusions">See conclusions</A>
Note the '#' character. Links could also be made from other documents by
preceding the '#' with a relative pathname, e.g.:
<A HREF="FILENAME.HTM#conclusions">See conclusions</A>
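
Put together in one file, a jump within a document might be sketched as follows. The anchor name "conclusions" and the surrounding texts are assumptions made for the example.

```html
<TITLE>Report</TITLE>

<H1>Report</H1>
<P>In a hurry? <A HREF="#conclusions">See conclusions</A>

<P>The main body of the report goes here.

<H2><A NAME="conclusions">Conclusions</A></H2>
<P>This is where the link at the top leads.
```

The destination anchor here sits inside a heading, so that the browser jumps to the start of the section.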
Text Styles in HTML
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
Enclosing text with <B> and </B> will render it in BOLD type; similarly
<I> and </I> for italics and <TT> and </TT> for a monospaced (TeleType) font.
However, this TYPOGRAPHIC markup is slightly out of line with the rest of
HTML (at least until anomalies like the FONT element of HTML 3 appeared -
that should become obsolete with the expected addition of STYLE SHEETS).
HTML mainly tries to concentrate on the LOGICAL purpose of the text. And so
there is an alternative logical or "idiomatic" markup system:
<EM> ... </EM>          for emphasized                          italics
<STRONG> ... </STRONG>  for strongly emphasized                 bold
<CITE> ... </CITE>      for book titles etc.                    italics
<VAR> ... </VAR>        for variables (in syntax descriptions)  italics
<CODE> ... </CODE>      for some code element                   monospaced
<KBD> ... </KBD>        for text typed by user (in eg manuals)  monospaced
<SAMP> ... </SAMP>      for some sample of literal characters   monospaced
In the last column I have listed how browsers typically display these
elements, and, of course, the styles overlap both with each other and with
the typographic markup. So what's the point of all this?
Answers:
1) Logical markup allows the browsing software (and human if the software
allows it) to CHOOSE how each element type is to be displayed - for
instance using different colours.
2) It may simplify automatic processing of the text, e.g. by indexers or
text analysers.
Still, it should be said that many or most people will probably never use
anything but the typographic markup, because it reminds them of the secure
and old-fashioned word-processor they are used to (and because the
typographic tags admittedly are a little shorter than the idiomatic ones).
Typographic markup is of course also what programs automatically converting
from word-processor file formats will always have to use.
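
A short sketch of logical markup in running text, as it might appear in a manual. The command shown is only an example, and how each element is displayed is up to the browser.

```html
<P><STRONG>Warning:</STRONG> this command <EM>erases</EM> the whole disk.

<P>The syntax is <CODE>FORMAT <VAR>drive</VAR></CODE>, so to format
the disk in drive A you would type <KBD>FORMAT A:</KBD> at the prompt.
```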
Comments
¯¯¯¯¯¯¯¯
COMMENTS (ignored by the browser) can be inserted anywhere in a HTML
document enclosed in <!-- and -->. For example:
<!-- This text is ignored by the browser -->
PATHS FOR IMAGES AND LINKS
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
Just about any familiar old relative DOS path and filename is acceptable in
the <IMG> and <A> tags, EXCEPT that FORWARD slash characters (/) should be
used instead of the DOS backslashes (\) for separator. Browsers on Atari
and PC will probably understand backslashes too, but e.g. a Unix browser
may be more pleased to see forward slashes.
You should probably also try to use UPPERCASE letters only, for your path
and file names, since this is how names on files and folders are stored by
(GEM)DOS. Even though TOS/DOS/Windows are case insensitive, Unix isn't.
The above remarks apply in case you transfer your files to Unix or
something similar; besides, you might as well learn proper HTML (actually
URLs = "Web paths") from the start.
It is quite OK to use even the familiar old DOS double-dot ".." for moving
up one folder level. For instance ../INDEX.HTML
Paths are counted from where the current HTML document is located.
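
The path rules above could be sketched like this. The folder name PICS and the file names are hypothetical; only ../INDEX.HTML is taken from the text.

```html
<!-- Image in a sub-folder of the folder holding this document -->
<P><IMG SRC="PICS/LOGO.GIF" ALT="Ictari">

<!-- Link to a document one folder level up -->
<P><A HREF="../INDEX.HTML">Back to the index</A>

<!-- Backslash version: may work on Atari and PC, but is better avoided -->
<P><A HREF="..\INDEX.HTML">Back to the index</A>
```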
PROPOSITION FOR ICTARI ARTICLES
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
May I here make the suggestion that pictures and sub-documents of any
Ictari article in HTML format be normally placed in a folder with the name
of the HTML article but with the extension .SUB (or .PIX) instead of .HTM.
For instance a document
ARTICLE.HTM
would have its sub-documents (and pictures (?)) in the folder
ARTICLE.SUB
This would tidy the disk directories so that the main .HTM file would
always easily be found. And, regardless of how many pictures and sub-
documents referred to by it, there would in most cases only be two items -
the HTM file and a SUB folder - to deal with during disk operations such as
move or copy.
In order to convert an existing HTML document into this format you will
need to search it for occurrences of the SRC attribute (in IMG tags) and
HREF attribute (in A tags) and change the given paths appropriately.
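
As a sketch of such a conversion, assume ARTICLE.HTM refers to a picture FIGURE1.GIF and a sub-document PART2.HTM (both names hypothetical) that so far have been kept beside it in the same folder:

```html
<!-- Before: picture and sub-document in the same folder as ARTICLE.HTM -->
<IMG SRC="FIGURE1.GIF" ALT="Figure 1">
<A HREF="PART2.HTM">Continue with part 2</A>

<!-- After: both files moved into the folder ARTICLE.SUB -->
<IMG SRC="ARTICLE.SUB/FIGURE1.GIF" ALT="Figure 1">
<A HREF="ARTICLE.SUB/PART2.HTM">Continue with part 2</A>
```

Since paths are counted from the document in which they occur, a link inside PART2.HTM leading back to the main article would then presumably have to read ../ARTICLE.HTM.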