
DocBook XML – an introduction

A very general XML primer
A DocBook XML primer

The chosen format for the documentation in the Firebird manual module is DocBook XML. For those of you who are not familiar with XML and/or DocBook, short introductions to XML in general and DocBook XML in particular follow. Be aware that these introductions give a grossly oversimplified picture. But that's just fine: you don't have to be a DocBook XML expert in order to write Firebird docs. You only need some basic knowledge – which you can pick up in half an hour from the paragraphs below – and a little experience in applying DocBook XML tags to your texts (which you will gain soon enough once you start writing).

Skip the general XML primer if you know all about XML elements, tags, attributes, rendering, and multichannel publishing.

Skip both primers if you're also an experienced DocBook author.

Note

While we strongly ask that you at least try to deliver your work in DocBook format, we also realise that some people just won't have the time to master it (or to convert their existing docs to DocBook). If this applies to you, please talk about it on the firebird-docs list. We surely don't want to refuse useful documentation just because it's not in the right format.

A very general XML primer

XML stands for Extensible Markup Language, which is, simply put, plain text with markup tags. A typical XML text fragment may look like this:

<paragraph>
<loud>'No!'</loud> she screamed. <scary>But the bloody hand
<italics>kept on creeping</italics> towards her.</scary>
<picture file="bloody_hand.png"/>
</paragraph>

Tags and attributes

In the example given above, the words and phrases enclosed in angle brackets are the markup tags. <italics> is a start tag, </italics> is an end tag, and <picture file="bloody_hand.png"/> is a standalone tag, officially termed an empty-element tag. XML tags are always formatted like this:

Table 1. Format of XML tags

Tag type            Starts with   Ends with
Start tag           <             >
End tag             </            >
Empty-element tag   <             />


Still referring to our example, the words paragraph, loud, scary, italics and picture are tag names. In the <picture.../> tag, file="bloody_hand.png" is called an attribute, with file the attribute name and bloody_hand.png the attribute value. Attribute values must always be quoted; both single and double quotes are allowed.

XML allows you to define any tags you like, as long as you build them correctly. So <thistag>, <thattag>, and <this_is_not_a_tag/> are all well-formed XML tags. (XML that follows the standard is called well-formed; the term valid is only used for specifically defined implementations – DocBook XML, for instance.)
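To make "well-formed" a bit more tangible, here is a minimal sketch in Python that simply asks a standard XML parser whether it accepts a piece of text. The helper name is_well_formed is made up for this illustration; the parser is the one from Python's standard library. Note that this only checks well-formedness, not validity against DocBook or any other definition.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the standard-library parser accepts the text as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Self-invented tags are fine as long as they are built correctly.
print(is_well_formed("<thistag><thattag/></thistag>"))    # True
# A start tag without a matching end tag is not well-formed.
print(is_well_formed("<thistag><thattag></thistag>"))     # False
# Attribute values must be quoted.
print(is_well_formed("<picture file=bloody_hand.png/>"))  # False
```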

Clearly the tags themselves are not meant to appear in the final document (that is, the document as it is presented to the readers). Rather, they contain instructions that affect its appearance. XML, when used for writing documentation, is a typical source format, intended to be processed by software to produce nicely formatted output documents. This processing is usually called rendering.

Some tags are unmistakably makeup instructions:

<italics>kept on creeping</italics>

means of course that the words kept on creeping must be displayed or printed in italics. However,

<loud>'No!'</loud>

is a little less obvious. Should the word No! appear in boldface? Or underlined? Or again in italics? Or maybe this text is going to be read out aloud by a speech synthesizer, and the <loud> tag instructs it to raise its voice? All these things are possible, and what's more: often a single XML source document is converted into several different output formats – say, a PDF document, an HTML web page, and a sound file. This is called multichannel publishing. With multichannel publishing, <loud> may be translated to boldface for the PDF document; to a bold, red-colored font for the web page; and to a 50% volume increase for the synthesizer.

Looking at the other tags, <picture.../> is obviously an instruction to insert the image bloody_hand.png into the document, and <scary>, well... this is even less clear than <loud>. Maybe the phrase between the <scary> tags has to drip with blood. Maybe frightening music must be played here. It all depends on the people who defined the tags, and the software they use to do the rendering.

The <paragraph> tag, finally, is a structural tag. It tells us something about the place that the lines have within the document's internal hierarchy. In the final document, paragraphs may or may not be separated by empty lines. Again, that depends on the rendering software and possibly also on user-configurable options. Other structural tags one might think of are e.g. <chapter>, <section>, and <subdocument>.
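The idea of rendering one source for several channels can be sketched in a few lines of Python. This is only a toy: the two rule tables (HTML_RULES and TEXT_RULES) and the render function are invented for this example, and real rendering software is far more elaborate. It does show the principle, though: the source stays the same, and only the rules differ per output format.

```python
import xml.etree.ElementTree as ET

SOURCE = "<paragraph><loud>'No!'</loud> she screamed.</paragraph>"

# Hypothetical rendering rules, one table per output channel:
# tag name -> (text emitted before the element, text emitted after it).
HTML_RULES = {"loud": ("<b>", "</b>"), "paragraph": ("<p>", "</p>")}
TEXT_RULES = {"loud": ("*", "*"), "paragraph": ("", "\n")}

def render(element, rules):
    """Recursively turn an element into output text using the given rules."""
    before, after = rules.get(element.tag, ("", ""))
    parts = [before, element.text or ""]
    for child in element:
        parts.append(render(child, rules))
        parts.append(child.tail or "")  # text that follows the child's end tag
    parts.append(after)
    return "".join(parts)

root = ET.fromstring(SOURCE)
print(render(root, HTML_RULES))  # <p><b>'No!'</b> she screamed.</p>
print(render(root, TEXT_RULES))  # *'No!'* she screamed.
```

Tags that have no rule in a table are rendered without any decoration, which is one possible answer to the question of what to do with something like <scary>.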

Special characters and Entities

Because the character “<” has a special meaning as the start of a tag, you can't include it directly as a literal value. Instead, if you want your readers to see an opening angle bracket, you type this:

&lt;

That's an ampersand, followed by the letters l and t (for less than), followed by a semicolon. You can also use &gt; (greater than) for the closing angle bracket “>”, but you don't have to.

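XML has lots of codes like this; they are called entities. Some represent characters, like &lt; and &auml; (lowercase a with umlaut), and some serve totally different purposes. But they all start with an ampersand and end with a semicolon. (Strictly speaking, XML itself predefines only five of them: &lt;, &gt;, &amp;, &apos; and &quot;. Others, such as &auml;, have to be declared first, typically in the DTD.)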

But wait a minute... if an ampersand marks the start of an entity, how do you include a literal ampersand in your text? Well, there's an entity for that too:

&amp;

So this line of XML:

Kernighan &amp; Ritchie chose '&lt;' as the less-than operator for C.

will wind up in the final documents as:

Kernighan & Ritchie chose '<' as the less-than operator for C.

And here's some good news: if you use a dedicated XML editor to author your document, you can probably just type “<” and “&” anywhere you want to use them as literals. The editor will make sure that they end up as &lt; and &amp; in the XML as it is saved to disk. You'll find pointers to some XML/DocBook editors later in this guide.
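If you ever need to do this replacement yourself, for instance in a script that generates XML, most languages have a ready-made helper for it. The sketch below uses Python's standard library; the variable names are just for illustration. It escapes a line of text, and then shows that an XML parser turns the entities back into the literal characters.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

line = "Kernighan & Ritchie chose '<' as the less-than operator for C."

# escape() replaces &, < and > with their entities.
escaped = escape(line)
print(escaped)
# Kernighan &amp; Ritchie chose '&lt;' as the less-than operator for C.

# When the XML is parsed again, the entities become literal characters.
para = ET.fromstring(f"<para>{escaped}</para>")
print(para.text == line)  # True
```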

Elements

There's one more important XML concept you need to know about: the element. An element is the combination of a start tag, a matching end tag, and everything in between. This “everything in between” is called the element's content, and it may include other elements. Elements are named after their tags, so we can talk about paragraph elements, italics elements etc.

Note

Actually, elements are a more basic concept than tags: tags just happen to be the things that identify the elements. So it would be better to say that tags are named after their elements. But because tags are easier to recognize than entire elements, I thought I'd introduce you to them first.

This is an element:

<loud>'No!'</loud>

This is also an element:

<paragraph>This is an element containing <bold>another</bold> 
  element!</paragraph>

Empty-element tags constitute an element all by themselves. These elements can have no content of course, because they don't have a pair of tags:

<picture file="bloody_hand.png"/>

Important

Don't confuse content with attributes. Content lives between tags, attributes within tags. The empty element in the last example has an attribute, but no content.
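The difference between content and attributes also shows up when software reads the XML. The following Python sketch, using the standard-library parser, walks over a fragment assembled from the examples above (placing the picture inside the paragraph is my own addition) and reports for each element whether it has attributes and whether it has content.

```python
import xml.etree.ElementTree as ET

fragment = (
    "<paragraph>This is an element containing <bold>another</bold> "
    'element!<picture file="bloody_hand.png"/></paragraph>'
)
paragraph = ET.fromstring(fragment)

# iter() visits the element itself and everything nested inside it.
for element in paragraph.iter():
    # Content is either text or child elements; attributes do not count.
    has_content = bool(element.text) or len(element) > 0
    print(element.tag, element.attrib, "has content" if has_content else "empty")

# paragraph {} has content
# bold {} has content
# picture {'file': 'bloody_hand.png'} empty
```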

I'm stressing the element concept here because most documentation tends to speak of “chapter elements”, “title elements” etc. rather than “chapter tags” and “title tags”. The terms are often used interchangeably, but there are cases where it's important to know the difference.

XML Conclusion

Good – that's about all you need to know about XML. By now you should have a general idea of what an XML text looks like, what tags and elements are, and what they are for. As said earlier, the picture is oversimplified but it's good enough for our purposes.

It should also be understood that just writing away in plain, self-invented XML is pretty pointless unless you have processing software that understands your tags. How else are you going to turn your XML source into a nicely formatted, presentable document?

Fortunately, we don't have to worry about developing our own element definitions and conversion software. There are a number of formalized XML types available, each defining a set of tags and, equally important, a set of rules on how to use them. DocBook XML is one of those types.

A DocBook XML primer

DocBook was designed to facilitate the writing of structured documents using SGML or XML (but don't worry about SGML – we use the XML strain). It is particularly fit for writing technical books and articles, especially on computer-related subjects. DocBook XML is defined in its Document Type Definition or DTD: a set of definitions and rules describing exactly how a valid DocBook document is structured. DocBook is rapidly becoming a de facto standard for computer-technical documents, and it is supported by a growing number of tools and applications.

DocBook XML Characteristics

Important characteristics of DocBook – as opposed to “general” XML – are:

  • The DocBook DTD defines a limited number of tags, and it gives exact rules on how to use them: what attributes are possible for a tag A, whether element B can be nested within element C, and so on. If you use undefined tags, or if you don't follow the rules, your document isn't DocBook anymore (and DocBook-supporting processing tools may break on it).

  • DocBook tags always convey structure and semantics (meaning), never makeup. In DocBook, you'll find structural tags like <book>, <part>, <chapter>, <section>, <para>, <table>; and semantic tags like <filename>, <warning>, <emphasis>, <postcode>; but nothing like <font>, <bold>, <center>, <indent>, <backgroundcolor> – nothing that has to do with layout or makeup.

  • Because of this, a decision has to be taken somewhere on how the DocBook tags are translated into presentational makeup. This decision (or rather: the rendering rules) can be hardcoded in the tools but that would make things very inflexible. That's why the rules are mostly defined in stylesheets. A stylesheet is a document that tells the tool stuff like:

    Print chapter titles in a 24-point black font; start each chapter on a new page; use italics for emphasis; render warnings in a bold, 12-point red font; use smallcaps for acronyms; etc. etc.

    This approach enables the user to alter the stylesheets if he or she doesn't like the appearance of the final document. It would be a lot harder – if not impossible – to alter the tools themselves.

    Note

    Stylesheets that are used to convert DocBook XML to other formats are called transformation stylesheets. They are written in yet another type of XML, called XSLT (Extensible Stylesheet Language Transformations).
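To get a feeling for what "rules on how to use the tags" means, you can picture them as a table that says which elements may appear inside which other elements. The Python sketch below does just that. Be aware that the ALLOWED_CHILDREN table and the find_violations function are invented for this illustration; the table is a tiny, simplified subset and not the real DocBook DTD, which is much larger and also constrains order and attributes. Real validation is done by dedicated tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified rules: parent tag -> allowed child tags.
ALLOWED_CHILDREN = {
    "article": {"title", "section", "para"},
    "section": {"title", "section", "para"},
    "title": {"emphasis"},
    "para": {"emphasis", "filename"},
    "emphasis": set(),
    "filename": set(),
}

def find_violations(element):
    """Return messages for undefined tags and nestings the rules forbid."""
    if element.tag not in ALLOWED_CHILDREN:
        return [f"undefined tag <{element.tag}>"]
    problems = []
    for child in element:
        known = child.tag in ALLOWED_CHILDREN
        if known and child.tag not in ALLOWED_CHILDREN[element.tag]:
            problems.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        problems.extend(find_violations(child))
    return problems

good = "<article><title>Intro</title><para>See <filename>firebird.conf</filename>.</para></article>"
bad = "<article><para>Some <bold>text</bold><section/></para></article>"

print(find_violations(ET.fromstring(good)))  # []
print(find_violations(ET.fromstring(bad)))
# ['undefined tag <bold>', '<section> not allowed inside <para>']
```

Both sample documents are well-formed XML; only the first one follows the (toy) rules.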

Benefits of DocBook XML

DocBook has a lot of advantages for anybody writing technical documentation. These are the most important ones for us:

  • A DocBook XML document consists of pure, unpolluted content. You never have to worry about the presentational side of things while writing your doc; you can concentrate on structure and informational content. This practice may at first feel a little odd if you're used to writing text in e.g. Word, but I promise you: you'll soon get to love it.

  • Because DocBook is all about structure and meaning, it will be surprisingly easy to transform your outline into a DocBook skeleton.

  • Many people produce docs for the manual module. If they all used different formats, or even one single format like Word or HTML, their works would look very different because every contributor would make his or her own makeup decisions. Of course we could develop a set of makeup rules, but then every docwriter would have to be aware of those rules, and take care to apply them all the time. Nah... better put the rules in one central place: the stylesheets, and let the docmakers worry about documentation, not presentation. The stylesheets will ensure that all our documentation has the same look-and-feel.

  • If we don't like the makeup of our documents, we can easily change it if the makeup rules are in a stylesheet. Nothing needs to be altered in the DocBook source documents; all we have to do, after changing the stylesheets, is re-render the docs. Newly developed docs will automatically get the new look. Try to achieve that if the makeup instructions are scattered all over the documents themselves!

  • Another advantage is that DocBook is an open standard, not tied to any commercial application or even a particular OS. If you download the Firebird manual module, you can build the HTML and PDF docs from the DocBook source both under Linux and under Windows – and we can add support for more OS's if need be.

  • A DocBook document is pure text, which is ideal for use in git. Yes, a git repository can also contain binary files, but many useful features that git offers (showing the difference between two versions of a file, for instance) only work with text files.

Admittedly, none of these benefits is unique to DocBook. But DocBook has them all, and it's widely supported. That makes it the perfect choice for our Firebird documentation.
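As an illustration of how directly an outline maps onto DocBook structure, here is a small Python sketch that turns a nested outline into a skeleton of <article>, <section>, <title> and <para> elements. The outline contents, the document title and the add_section helper are made up for this example; in practice you would more likely type the skeleton by hand or in an XML editor.

```python
import xml.etree.ElementTree as ET

# A hypothetical outline: (section title, list of subsection titles).
outline = [
    ("Installing", ["Windows", "Linux"]),
    ("First steps", []),
]

def add_section(parent, title):
    """Append a <section> with a <title> and an empty <para> placeholder."""
    section = ET.SubElement(parent, "section")
    ET.SubElement(section, "title").text = title
    ET.SubElement(section, "para")
    return section

article = ET.Element("article")
ET.SubElement(article, "title").text = "My Firebird doc"
for title, subtitles in outline:
    section = add_section(article, title)
    for subtitle in subtitles:
        add_section(section, subtitle)

ET.indent(article)  # pretty-print the tree; requires Python 3.9 or later
print(ET.tostring(article, encoding="unicode"))
```

The result contains structure only. Nothing in it says how titles or paragraphs should look; that is left to the stylesheets.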

DocBook documentation on the Internet

Here are some links in case you want to find out more about DocBook:

  • http://opensource.bureau-cornavin.com/crash-course/

    Writing Documentation Using DocBook – A Crash Course by David Rugge, Mark Galassi and Eric Bischoff. A very nice tutorial, even though most of the tools discussed are not the ones we use.

  • http://docbook.org/tdg/en/

    DocBook – The Definitive Guide, by Norman Walsh and Leonard Muellner. Don't expect it to be a beginner-friendly tutorial – in fact, the first part is quite intimidating if you're a DocBook newbie. The reason I mention it here is its great online element reference, which I often consult while I'm writing.

  • http://www.tldp.org/HOWTO/DocBook-Demystification-HOWTO/

    The DocBook Demystification Howto is interesting if you want to know a little more about XML and DocBook than what we've told you here. It also contains quite a lot of material on SGML, and – again – on tools we don't use for the Firebird documentation subproject.

  • http://sourceforge.net/projects/docbook

    The DocBook open source project at SourceForge.

If you know of some other great online resource, please let us know by posting a message to the firebird-docs list.
