PDF::Builder::Content::Column_docs(3)  User Contributed Perl Documentation

NAME
PDF::Builder::Content::Column_docs -- column text formatting system

PDF::Builder::Content::Text/column and related routines
These routines form a sub-library for support of complex columnar
output with high level markup languages. Currently, a single
rectangular layout may be defined on a page, to be filled by
user-defined content. Any content which could not be fit within the
column confines is returned in an internal array format, and may be
passed to the next column() call to finish the formatting.

Future plans call for non-rectangular columns to be definable, as well
as flow from one column to another on a page, and column balancing.
Other possible enhancements call for support of non-Western writing
systems (e.g., bidirectional text, using the HarfBuzz library), proper
word-splitting and paragraph shaping (possibly using the Knuth-Plass
algorithm), and additional markup languages.

column
($rc, $next_y, $unused) = $text->column($page, $text, $grfx, $markup, $txt, %opts)

This method fills out a column of text on a page, returning any unused
portion that could not be fit, and where it left off on the page.

Tag names, CSS entries, markup type, etc. are case-sensitive (usually
lower-case letters only). For example, you cannot give a <P>
paragraph in HTML or a P selector in CSS styling.
$page is the page context. Currently, its only use is for page
annotations for links ('md1' []() and 'html' <a>), so if you're not
using those, you may pass anything such as "undef" for $page if you
wish.
$text is the text context, so that various font and text-output
operations may be performed. It is often, but not necessarily
always, the same as the object containing the "column" method.
$grfx is the graphics (gfx) context. It may be a dummy (e.g.,
undef) if no graphics are to be drawn, but graphical items such as
the column outline ('outline' option) and horizontal rule (<hr> in
HTML markup) use it. Currently, text-decoration underline (default
for links, 'md1' "[]()" and 'html' "<a>") or line-through or
overline use the text context, but may in the future require a
valid graphics context. Images (when implemented) will require a
graphics context.
$markup is information on what sort of markup is being used to
format and lay out the column's text:
'pre'
The input material has already been processed and is already in
the desired form. $txt is an array reference to the list of
hashes. This must be used when you are calling column() a
second (or later) time to output material left over from the
first call. It may also be used when the caller application has
already processed the text into the appropriate format, and
other markup isn't being used.
'none'
If 'none' is specified, there is no markup in use. At most, a
blank line or a new text array element specifies a new
paragraph, and that's it.
The input $txt is a list (anonymous array reference) of strings,
each containing one or more paragraphs. A single string may
also be given. An empty line between paragraphs may be used to
separate the paragraphs. Paragraphs may not span array
elements.
'md1'
This specifies a certain flavor of Markdown compatible with
Text::Markdown. See the full description below.
There are other flavors of Markdown, so other mdn flavors may
be defined in the future, such as POD from Perl code.
'html'
This specifies that a large subset of HTML markup is used,
along with some attributes and CSS.
Numeric entities (decimal &#nnn; and hexadecimal &#xnnn;) are
supported, as well as named entities (&mdash; for example).
The input txt is a list (anonymous array reference) of strings,
each containing one or more paragraphs and other markup. A
single string may also be given. Per normal HTML practice,
paragraph tags should be used to mark paragraphs. Note that
HTML::TreeBuilder is configured to automatically mark top
body-level text with paragraph tags, in case you forget to do
so, although it is probably better to do it yourself, to
maintain more control over the processing. Separate array
elements will first be glued together into a single string
before processing, permitting paragraphs to span array elements
if desired.
Other input formats
There are other markup languages out there, such as HTML-like
Pango, nroff-like man page, Markdown-like wikimedia, and Perl's
POD, that might be supported in the future (provided there are
supported Perl libraries for them). It is very unlikely that
TeX or LaTeX will ever be supported, as they both already have
excellent PDF output.
PDF::Builder currently only supports the markup languages
described above. If you want to use something else (e.g.,
Perl's POD, or man format, or even MS Word or some other
WYSIWYG format), you will need to find a converter utility to
convert it to a supported flavor of Markdown or HTML. Many such
converters already exist, so take a look (although you may well
have to do some cleanup before column() accepts the resulting
HTML as input).
Perhaps in the future, PDF::Builder will directly support
additional formats, but no promises.
$txt is the input text: a string, an array reference to multiple
strings, or an array reference to hashes. See $markup for details.
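The leftover handling described under 'pre' can be sketched as a small
loop. This is only an illustration: fill_rects() is a hypothetical
helper, not part of PDF::Builder, and it assumes that a true $rc means
some content did not fit (this section does not spell out the return
codes). Only the documented 'rect' option is passed.

```perl
use strict;
use warnings;

# Fill a list of rectangles, in order, with one body of content.
# $text_ctx is any object providing the column() method described above.
# ASSUMPTION: a true $rc means that some content could not be fit.
sub fill_rects {
    my ($text_ctx, $page, $grfx, $markup, $txt, @rects) = @_;

    my ($rc, $next_y, $unused);
    foreach my $rect (@rects) {
        ($rc, $next_y, $unused) =
            $text_ctx->column($page, $text_ctx, $grfx, $markup, $txt,
                              'rect' => $rect);
        last unless $rc;    # everything fit, nothing left to place

        # Leftover content is already in the internal format, so the
        # next call must use 'pre' markup and the returned array.
        ($markup, $txt) = ('pre', $unused);
    }
    return ($rc, $next_y, $unused);
}
```

A call such as fill_rects($text, $page, $grfx, 'md1', $content,
[50, 750, 240, 700], [310, 750, 240, 700]) would then pour the content
into two side-by-side rectangles (the coordinates are made up for
illustration). Anything still unplaced comes back in $unused.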
%opts Options -- a number of these are, despite the name,
mandatory.
'rect' => [x, y, width, height]
This defines a column as a rectangular area of a given width
and height (both in points) on the current page. In the future,
it is expected that more elaborate non-rectangular areas will
be definable, but for now, a simple rectangle is all that is
permitted. The column's upper left coordinate is "x, y".
The top text baseline is assumed to be relative to the UL
corner (based on the determined line height), and the column
outline clips that baseline, as it does additional baselines
down the page (interline spacing is "leading" multiplied by the
largest "font_size" or image height needed on that line).
Currently, 'rect' is required, as it is the only column shape
supported.
'relative' => [ x, y, scale(s) ]
'relative' defaults to "[ 0, 0, 1, 1 ]", and allows a column
outline (currently only 'rect') to be either absolute or
relative. "x" and "y" are added to each "x,y" coordinate pair,
after scaling. Scaling values:
(none)
The scaling defaults to 1 in both x and y dimensions (no change).
scale (one value)
The scaling in both the x (width) and y (height) dimensions uses
this value.
scale_x, scale_y (two values)
There are two separate scaling factors for the x dimension (width)
and y dimension (height).
This permits a generically-shaped outline to be defined, scaled
(perhaps not preserving the aspect ratio) and placed anywhere
on the page. This could save you from having to define
similarly-shaped columns from scratch multiple times. If you
want to define a relative outline, the lower left corner
(whether or not it contains a point, and whether or not it's
the first one listed) would usually be "0, 0", to have scaling
work as expected. In other words, your outline template should
be in the lower left corner of the page.
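The arithmetic implied by 'relative' can be sketched as follows.
resolve_rect() is a hypothetical helper, not part of PDF::Builder, and
it rests on an assumption: for a 'rect' of [x, y, width, height], all
four values are scaled, and the offset is then added to the x, y
corner only. The library's own handling may differ in detail.

```perl
use strict;
use warnings;

# Combine a 'rect' template with a 'relative' setting.
# ASSUMPTION: scale first, then add the x,y offset to the corner only.
sub resolve_rect {
    my ($rect, $relative) = @_;
    my ($x, $y, $w, $h) = @$rect;
    my ($dx, $dy, @scale) = @{ $relative // [0, 0] };

    my $sx = @scale     ? $scale[0] : 1;      # no scale given: use 1
    my $sy = @scale > 1 ? $scale[1] : $sx;    # one value: use it for both

    return [ $x * $sx + $dx, $y * $sy + $dy, $w * $sx, $h * $sy ];
}

# A 200 x 100 template whose lower left corner is at 0, 0 (its upper
# left corner is at 0, 100), scaled 1.5 by 2 and moved to 50, 600.
my $abs = resolve_rect([0, 100, 200, 100], [50, 600, 1.5, 2]);
print join(', ', @$abs), "\n";    # prints 50, 800, 300, 200
```

Because the template's lower left corner sits at 0, 0, the scaled
rectangle's lower left corner lands exactly on the offset (50, 600),
which is why the text above recommends defining templates there.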
'start_y' => $start_y
If omitted, it is assumed that you want to start at the top of
the defined column (the maximum "y" value minus the maximum
vertical extent of this line). If used, the normal value is
the "next_y" returned from the previous column() call. It is
the deepest extent reached by the previous line (plus leading),
and is the top-most point of the new first line of this
column() call.
Note that the "x" position will be determined by the column
shape and size (the left-most point of the baseline), so there
is no place to explicitly set an "x" position to start at.
'font_size' => $font_size
This is the starting font size (in points) to be used. Over the
course of the text, it may be modified by markup. The default
is 12pt. It is in turn overridden by any CSS or HTML font-size
settings.
The starting font size may be set in a number of ways. It may
be inherited from a previous "$text->font(..., font-size)"
statement; it may be set via the "font_size" option (overriding
any font method inheritance); it may default to 12pt (if
neither explicit way is given). For HTML markup, it may of
course be modified by the "font" tag or by CSS styling
"font-size". For Markdown, it may be modified by CSS styling.
'font_info' => $string
This permits the user to specify the starting font used in
column() (body font-family, font-style, font-weight, color).
column() will pick up any font already loaded
("$text->font($font, $size);", or using FontManager), and use
that as the "current" font. If no font has been loaded, and no
other instructions are given, the FontManager default (core
Times-Roman) will be used.
The "font_info" option for column() may be given to override
either of the two above methods. You may specify a $string of
'-fm-' to instruct column() to use the FontManager "default"
font (Times face core font). Or, you may pick a font face
known to FontManager (added by user code if not one of the 28
core fonts), and optionally give it style and weight: $string
of 'face:style:weight:color'. The style defaults to 'normal'
(non-italic), or 'normal' or '0' may be given. For italics, use
'italic' or '1'. The weight defaults to 'normal' (unbolded
weight), or 'normal' or '0' may be given. For bold (heavy)
text, use 'bold' or '1'. A color may be given as the last field.
Finally, the "style" option for column() may be given to
override any of the above settings, e.g., 'style'=>{ body {
font-family:... } and set the initial current font. Remember
that, as with anything font-related that column() does, the
'face' (family) used must already be known to FontManager
(explicitly loaded with add_font() if not one of the 28 core
fonts). Remember that the first 14 fonts are standard PDF, and
the second 14 are normally supplied with Windows (but not
always with other operating systems).
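The layout of the 'font_info' string can be illustrated with a small
parser. parse_font_info() is a hypothetical helper written for this
example only. It is not how column() itself reads the string. It
simply applies the documented defaults for an omitted style or weight.

```perl
use strict;
use warnings;

# Split a 'font_info' string of the form 'face:style:weight:color'.
# Hypothetical helper, not part of PDF::Builder.
sub parse_font_info {
    my ($string) = @_;

    # '-fm-' asks for the FontManager default font
    return { use_default => 1 } if $string eq '-fm-';

    my ($face, $style, $weight, $color) = split /:/, $string;
    my %info = (
        face   => $face,
        # style and weight default to 'normal' (0) when omitted
        italic => (defined $style  && $style  =~ /^(?:italic|1)$/) ? 1 : 0,
        bold   => (defined $weight && $weight =~ /^(?:bold|1)$/)   ? 1 : 0,
    );
    $info{color} = $color if defined $color && length $color;
    return \%info;
}

my $info = parse_font_info('Helvetica:italic:bold:blue');
print "$info->{face} italic=$info->{italic} bold=$info->{bold}\n";
```

The same string would be handed to column() as 'font_info' =>
'Helvetica:italic:bold:blue', provided the face name is one that
FontManager already knows.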
'marker_width' => $marker_width
'marker_gap' => $marker_gap
'marker_width' is the width of the gutter to the left of a list
item, where (for the first line of the item) the marker lives. The
marker contains the symbol (for bulleted/unordered lists) or
formatted number and "before" and "after" text (for
numbered/ordered lists). Both kinds of marker are followed by a
single space (marker_gap = 1em) before the item text starts. The
value is a length, in points.
The default is 1 em (1 times the font_size passed to column()),
and is not adjusted for any changes of font_size in the markup,
so that lists are indented consistently. This is usually fine
for unordered (bulleted) lists and single digit ordered
(numbered) lists, although you may need to make it wider for
two or three digit numbered lists. An explicit value passed in
is also not changed -- the gutter width for the marker will be
the same in all lists (keeping them aligned). If you plan to
have exceptionally long markers, such as an ordered list of
years in Roman numerals, e.g., (MCMXCIX), you may want to make
this gutter a bit wider.
A value may be given for the marker_gap, which is the gap
between the ($marker_width wide) marker and the start of the
list item's text. The default is $fs points (1 em), set by the
font_size in the markup.
The "list-style-position" CSS property may be given as the
standard 'outside' (the default) or 'inside', or (extension to
CSS) to indent the left side of second, third, etc.