Firebird Documentation Index → Firebird Docbuilding Howto → Advanced: Improving the PDF |
Some of the topics covered in this section apply to the Ant build and old versions of Apache FOP. They may no longer apply when using the Gradle build, or the files may have a different path there. When in doubt, ask for help on the firebird-docs mailing list.
Due to limitations in our build tools, the PDF output may suffer from some irritating defects, such as:
Widowed headers and titles (appearing at the bottom of the page, with the corresponding text block starting on the next page).
Page breaks at awkward positions in tables or lists.
Overly wide horizontal justification spaces.
Squeezed, truncated or otherwise messed-up page-sized content. This is a new feature, introduced with FOP 0.93.
This part shows you how to deal with these problems, should the need arise.
First you have to understand how the PDF is built. Contrary to the HTML generation, this is a two-step process:
The DocBook XML source is converted to a Formatting Objects (FO) file. FO – formally called XSL-FO – is also an XML format, but unlike DocBook it's presentation-oriented. This step is performed by an XSL transformer called Saxon. The output goes into firebird-documentation/inter/filename.fo.
Another tool, Apache FOP (Formatting Objects Processor), then picks up filename.fo and converts it to filename.pdf, which is stored in firebird-documentation/dist/pdf.
If you give a build pdf command, two consecutive build targets are called internally: fo and fo2pdf, corresponding to the two steps described above. But you can also call them from the command line. For instance,
build fo -Did=qsg15
...transforms the 1.5 Quick Start Guide source to firebird-documentation/inter/qsg15.fo. And
build fo2pdf -Did=qsg15
...produces the PDF from the FO file (which must of course be present for this step to succeed).
In fact, build pdf is just a shortcut for build fo followed by build fo2pdf.
This setup allows us to edit the FO file manually before generating the final PDF. And that's exactly what we're going to do to fix some of those nasty problems that can spoil our PDFs.
The general procedure for improving the PDF output by editing the FO file is:
1. Build the PDF once as usual with build pdf [arguments].
2. Start reading the PDF and find the first trouble spot.
3. Open the FO file in an XML or text editor.
4. Find the location in the FO file that corresponds to the trouble spot in the PDF (we'll show you how later).
5. Edit the FO file to fix the problem (we'll show you how later), and save it.
6. Rebuild the PDF, but this time use build fo2pdf [arguments]. If you don't, you'll overwrite the changes you've just made to the FO file, get the same PDF as before, and have to start all over again.
7. Check if the problem is really solved and if so, find the next trouble spot in the PDF.
8. Repeat steps 4–7 until you've worked your way through the entire PDF.
Although this FO-editing approach suggests that the problem lies in the FO file, this is not the case. The FO file is all right, but Apache FOP doesn't support all the nice features in the XSL-FO specification (yet). With our manual editing, we force the PDF in a certain direction.
It is important to fix the problems in document order. Editing the FO in one spot may lead to vertical adjustments at the corresponding spot in the PDF: more lines, fewer lines, lines moving to the following page, etc. These adjustments may affect everything that comes after them.
For the same reason, you should always look for the next problem after you have fixed the previous one. For instance, don't make a list of all widowed headers in the PDF and then start fixing them all in the FO file. Fixing a widowed header moves all the text below it downward, possibly creating new widowed headers and un-widowing others.
In general, you can keep the FO file open throughout the process. Just don't forget to save your changes before you rebuild the PDF. You must close the PDF before every rebuild though: once it's open in Adobe Acrobat (or even in Adobe Reader), other processes can't write to it.
The entire process can be pretty time-consuming, so don't try to fix every tiny little imperfection, especially if you're a beginning FO hacker. In general, only the widowed headers are really ugly and make the document look very unprofessional. Fortunately, they have become very rare since we've moved to FOP 0.93.
The next section deals with the various problems and how to solve them.
Problem: Headers or titles at the bottom of the page.
Cause: Apache FOP doesn't support the keep-with-next attribute everywhere.
Since we've upgraded to Apache FOP 0.93, this problem – which used to be our biggest annoyance – has become extremely rare. Yet it may still occur under some circumstances. Or, more generally, there may be a page break you find awkward, e.g. after a line that announces what's to come and ends with a colon. This section helps you solve such cases.
Note that the example used here – a widowed section header – shouldn't occur anymore, but it's still usable to demonstrate the steps you have to take, especially for elements with an id attribute.
Solution: Force a page break at the start of the element (often a list, list item or table) that the title or header belongs to.
How: If the element has an id attribute (you can see this in the DocBook source), do a search on the id in the FO file. For example, suppose that you've just built the Firebird 2 Quick Start Guide and you find that the title "Creating a database using isql" is positioned at the bottom of a page. In the DocBook XML source you can see that this is the title of a section whose id is qsg2-databases-creating. If you search on qsg2-databases-creating from the top of the file, your first hit will probably look like this:
<fo:bookmark starting-state="hide"
internal-destination="qsg2-databases-creating">
The fo:bookmark elements correspond to the links in the navigation frame on the left side of the PDF. So this is not yet the section itself; you'll have to look further. Next find:
<fo:block text-align-last="justify" end-indent="24pt"
last-line-end-indent="-24pt"><fo:inline
keep-with-next.within-line="always"><fo:basic-link
internal-destination="qsg2-databases-creating">Creating a database...
Here, the id is an attribute value in a fo:basic-link. We're in the Table of Contents now. Still not there.
The third and fourth finds are often a couple of lines below the second; they serve to create a link from the page number citation in the ToC. But the fifth is usually the one we're looking for (unless there are any more forward links to the section in question):
<fo:block id="qsg2-databases-creating">
That's it! Most mid- and low-level hierarchical elements in DocBook (preface, section, appendix, para etc.) wind up as a fo:block in the FO file. Now we have to tell Apache FOP that it must start this section on a new page. Edit the line like this:
<fo:block id="qsg2-databases-creating" break-before="page">
Save the change and rebuild the PDF (remember: use build fo2pdf, not build pdf). The section title will now appear at the top of the following page, and you can move on to the next problem.
What if the element has no DocBook id? You'll have to search on (part of) the title/header then. This is a bit trickier, because the title may contain a line break in the FO file, in which case it won't be found. Or the title element has one or more children of its own (e.g. quote or emphasis). This too will keep you from finding it if you search on the full title. On the other hand: the more you shrink the search term, the higher the probability that you will get a number of unrelated hits. You'll have to use your own judgement here; if there is some characteristic text shortly before or after the title you can also search on that, and try to locate the title in the lines above and below it.
No matter how, once you've found the title, go upward in the FO file until you find the beginning of the section – often identifiable by the auto-generated FO id:
</fo:block>
<fo:block id="d0e2340">
  <fo:block>
    <fo:block>
      <fo:block keep-together="always" margin-left="0pc"
          font-family="sans-serif,Symbol,ZapfDingbats">
        <fo:block keep-with-next.within-column="always">
          <fo:block font-family="sans-serif" font-weight="bold"
              keep-with-next.within-column="always"
              space-before.minimum="0.8em" space-before.optimum...
              space-before.maximum="1.2em" color="#404090" hyph...
              text-align="start">
            <fo:block font-size="11pt" font-style="italic"
                space-before.minimum="0.88em" space-before.opti...
                space-before.maximum="1.32em">The DISTINCT keyword comes to the
                rescue!</fo:block>
As you see, there may be quite a number of lines between the section start and the title text. Notice, by the way, how the title is split over two lines here.
Once you've found the fo:block that corresponds to the section start, give it a break-before="page" attribute just like we did before.
Why look for the section start and not apply the break-before attribute to the fo:block immediately enclosing the title? Well, doing the latter will print the title on the next page all right, but links from the Outline and the ToC will point to the previous page, because the “invisible” section start – the block tag bearing the ID – lies before the page break.
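The placement described above can be sketched as follows. This fragment reuses the auto-generated id d0e2340 from the earlier listing; the nesting is abbreviated and most attributes are omitted, so treat it as an illustration rather than literal stylesheet output.

```xml
<!-- Fragment of an FO file; the fo: prefix is declared on fo:root. -->
<!-- Section start: the block bearing the id gets the page break,
     so Outline and ToC links land on the same page as the title. -->
<fo:block id="d0e2340" break-before="page">
  <fo:block>
    <!-- ... intermediate wrapper blocks omitted ... -->
    <!-- Title block: left untouched. -->
    <fo:block font-size="11pt" font-style="italic">The DISTINCT keyword comes to the rescue!</fo:block>
  </fo:block>
</fo:block>
```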
As said, the widowed header problem shouldn't occur anymore with sections, but it might still happen to some other objects like tables, figures etc. for which the stylesheets generate ids if you haven't assigned them yourself. In all those cases you can use the approach described above.
There are also numerous DocBook elements – in fact, the majority – for which the stylesheets don't generate ids. Examples are para, informaltable, the various list types, etc. In those cases, once you have located the text fragment in the FO file, simply apply the break-before attribute to the nearest enclosing fo:block.
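For an element without an id, the edit looks much the same. The fragment below is a hypothetical paragraph (its text and the space-before attribute are invented for illustration) whose nearest enclosing fo:block receives the break:

```xml
<!-- Hypothetical para rendered as a block; no id attribute is present. -->
<!-- break-before="page" moves this announcing line to the top of the
     next page, so it stays with the content that follows it. -->
<fo:block space-before.optimum="1em" break-before="page">The following
  commands are available:</fo:block>
```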
Problem: Table rows or list items split across page boundaries. (DocBook lists are implemented as fo:tables.)
Cause: Nothing in particular – there's no rule that forbids page breaks within table rows.
Solution: If you want to keep the row together, insert a hard page break at the start of the row.
How: Find the row by searching on text at the beginning of the row or at the end of the previous row. The element you're looking for is a fo:table-row, but don't use that for a search term, because many DocBook elements (not only <table>s) are implemented using fo:tables and thus contain fo:table-rows.
Once the start of the split row is found, add a break-before attribute like you did with widowed headers:
<fo:table-row break-before="page">
Alternatively, you can give the previous row a break-after attribute.
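A minimal sketch of the break-after variant. The cell contents are placeholders, and real rows in the FO file will carry more attributes than shown here.

```xml
<fo:table-body>
  <!-- Previous row: break-after="page" ends the page after this row... -->
  <fo:table-row break-after="page">
    <fo:table-cell><fo:block>Last row that fits on the page</fo:block></fo:table-cell>
  </fo:table-row>
  <!-- ...so this row starts on the next page instead of being split. -->
  <fo:table-row>
    <fo:table-cell><fo:block>Row that used to be split</fo:block></fo:table-cell>
  </fo:table-row>
</fo:table-body>
```

Both variants put the page break at the same place; use whichever row is easier to locate in the FO file.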
Problem: Very large horizontal justification spaces on lines above a long spaceless string. These long strings are often printed in a monospaced (fixed-width) font:
Cause: Apache FOP often doesn't hyphenate these strings. Therefore, if the string doesn't fit on the line it must be moved to the next line as a whole. This leaves the previous line with “too little” text, making large justification spaces necessary. Note that in the example above, the large spaces on the top line are caused by the string on the line below, not by the one on the line itself.
Solution: You may have good reasons to leave the string unbroken. In that case, accept the wide spaces as a consequence. Otherwise, insert a space (or hyphen-plus-space) at the point where the string should be broken.
How: First find the string in the FO file by searching on (part of) its contents. If it's monospaced in the PDF, you'll almost always find it within a fo:inline element. Then look at the PDF and estimate how much of the as yet unbroken string would fit in the large whitespace on the line above. Back in the FO file, insert a space – possibly preceded by a hyphen – in the string at a location where it's acceptable to break it. Rebuild the PDF (build fo2pdf!) and check the result. If you've broken the string too far to the right, it will still be entirely on the next line. Too far to the left and the whitespace may still be too wide to your liking. Adjust and rebuild until you're satisfied.
One surprise you may get during this job is that, once you've broken the string in one place, Apache FOP suddenly decides that it's OK to hyphenate the rest of the string. This will leave you with a part of the string on the first line that contains your own (now erroneous) space but also extends beyond it. You'll now have to delete your space and break the string again at the spot chosen by Apache.
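The manual break described above can be sketched as a before-and-after pair. The path and the font-family attribute are assumptions chosen for illustration; the attributes in your FO file may differ.

```xml
<!-- Before: one long string without any break opportunity -->
<fo:inline font-family="monospace">/opt/firebird/examples/empbuild/employee.fdb</fo:inline>

<!-- After: a plain space inserted where a line break is acceptable -->
<fo:inline font-family="monospace">/opt/firebird/examples/ empbuild/employee.fdb</fo:inline>
```

Keep in mind that if the string ends up fitting on one line after later edits, the inserted space will be visible in the PDF.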
An alternative approach to the wide-spaces problem is to insert zero-width space characters at each and every point where the culprit string may be broken, leaving it to Apache FOP to work out which one is best suited. This is guaranteed to work at the first try, but:
it's only feasible if you have an editor that lets you insert ZWSPs easily;
you can only do this in places where it's OK to break the string without a hyphen.
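If your editor has no convenient way to type a zero-width space, one option is the XML numeric character reference &#x200B;, which an XML parser resolves to U+200B when it reads the file. The sketch below uses a hypothetical path and font-family attribute; the result should still be checked in the rebuilt PDF.

```xml
<!-- &#x200B; is ZERO WIDTH SPACE (U+200B): invisible, but it marks a
     point where the line may be broken without a hyphen. -->
<fo:inline font-family="monospace">/opt/&#x200B;firebird/&#x200B;examples/&#x200B;empbuild/&#x200B;employee.fdb</fo:inline>
```

The character reference also has the advantage of being visible in the FO source, so it is easy to find and remove again.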
Problem: Tables, figures or other formal objects are truncated or some parts are printed on top of others.
Cause: Formal objects are given a keep-together.within-page="always" attribute by the stylesheets. As of FOP 0.93, this attribute is always enforced, even if the object is too large to fit on a page. The result: wrecked content that is crammed together on one page.
Solution: There are three alternatives. 1: Use the corresponding informal DocBook element instead. 2: Insert a processing instruction in the DocBook source. 3: Remove the attribute from the FO.
How: The first two solutions are applied to the DocBook source; the third involves editing the FO file:
If you don't mind leaving the element titleless, use informalequation / informalexample / informalfigure / informaltable instead of their formal counterparts equation, example, figure and table. These elements don't get the keep-together attribute during transformation, so they will be page-broken as necessary.
If it concerns a table and you want to keep the title, insert a processing instruction like this:
<table frame="all" id="ufb-about-tbl-features">
<?dbfo keep-together='auto'?>
<title>Summary of features</title>
...
(table content...)
...
</table>
Adding the instruction if you're working in the source text is easy enough. With XMLMind, it's a bit laborious:
Place the cursor somewhere in the title or select the entire title element.
Choose Edit -> Processing Instruction -> Insert Processing Instruction Before from the menu. A green line will appear above the title.
Type keep-together='auto' on that line.
With the cursor still on the green line, choose Edit -> Processing Instruction -> Change Processing Instruction Target from the menu. A dialogue box pops up.
In the dialogue box, change target to dbfo and click OK.
By the way: you can do the opposite with an informaltable if you absolutely don't want it broken at page borders. The procedure is the same, except that you must specify always instead of auto. Be sure that the informaltable does fit on one page, though!
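A sketch of that opposite case in the DocBook source. Since an informaltable has no title, the instruction is placed here as the first child of the element; that placement is an assumption modelled on the table example above, and the table content is a placeholder.

```xml
<informaltable frame="all">
  <!-- 'always' asks the stylesheets to keep the whole table on one page -->
  <?dbfo keep-together='always'?>
  <tgroup cols="2">
    <tbody>
      <row><entry>...</entry><entry>...</entry></row>
    </tbody>
  </tgroup>
</informaltable>
```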
We don't have a similar provision for the other formal objects because we probably don't need it. (Things like this require work on our custom stylesheets, so we only implement them if we really feel the need.)
Ye olde fo-hacking way... open the FO file, locate the element (tip: give it an id in the DocBook source so it's easy to find) and remove the keep-together.within-page="always" attribute. A disadvantage is that this procedure has to be repeated every time the source changes and a new PDF is built. The other two solutions are persistent.
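As a rough before-and-after sketch of the FO edit: the id is taken from the table example above, and it is assumed that the attribute sits on the fo:block carrying that id. Other attributes on the block are omitted.

```xml
<!-- Before: the formal object must stay on one page, even if it doesn't fit -->
<fo:block id="ufb-about-tbl-features" keep-together.within-page="always">
  <!-- ... title and fo:table ... -->
</fo:block>

<!-- After: attribute removed, so the table may break across pages -->
<fo:block id="ufb-about-tbl-features">
  <!-- ... title and fo:table ... -->
</fo:block>
```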
The official XSL-FO (Formatting Objects) page is here: http://www.w3.org/TR/xsl/
The Apache FOP homepage is here: http://xmlgraphics.apache.org/fop/
The Apache FOP compliance page is here: http://xmlgraphics.apache.org/fop/compliance.html. It contains a large object support table where you can look up which XSL-FO objects and attributes (properties) are supported. When consulting the table, please bear in mind that we currently use Apache FOP 0.93 (but with some home-made patches).