groff_diff(7) Miscellaneous Information Manual groff_diff(7)

Name
groff_diff - differences between GNU roff and AT&T troff

Description
The GNU roff text processing system, groff, is a reimplementation and extension of AT&T troff, the typesetting system originating in Unix systems of the 1970s. groff removes many arbitrary limitations and adds features, both to the input language and to the page description language output by the troff formatter. We also note here differences arising from groff's implementation of AT&T troff features. See roff(7) for background.

GNU troff can operate in a manner that increases support for documents written for AT&T troff; see section "Compatibility mode" below.

Language
GNU troff features identifiers of arbitrary length; supports color output, non-integral type sizes, user-defined characters, and automatic hyphenation of languages other than English; adds more conditional expression operators; recognizes additional scaling units and arithmetic operators; enables general file I/O (in "unsafe mode" only); and exposes more formatter state.

Long names
GNU troff introduces many new requests; with three exceptions (cp, do, rj), they have names longer than two characters. The names of registers, fonts, strings/macros/diversions, environments, special characters, character classes, streams, hyphenation language codes, and colors can be of any length. Anywhere AT&T troff supports a parameterized escape sequence that uses an opening parenthesis "(" to introduce a two-character argument, groff supports a square-bracketed form "[]" where the argument within can be of arbitrary length.

Font families, abstract styles, and translation
GNU troff can group text typefaces into families. For example, groff ships with support for families containing each of the styles "R", "I", "B", and "BI" (roman [upright], italic [slanted], bold, and bold-italic).
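The bracketed long-name syntax and the family/style pairing can be sketched in a few lines of input. The register and string names below are invented for illustration, and the example assumes an output device that provides the T (Times) family; it is not taken from the groff distribution.

```roff
.\" Two-character names can use the AT&T "(" form;
.\" longer names require the bracketed form.
.nr xx 1
.nr chapter_count 12
.ds project_title Differences
The short register holds \n(xx; the long one holds \n[chapter_count].
The string interpolates as \*[project_title].
.\" Select family T, then the style B.  On a device where B is an
.\" abstract style, the formatter combines the two into a resolved
.\" font name (TB, assuming the device supplies that font).
.fam T
.ft B
This text is set in the bold member of family \n[.fam].
.\" Return to the previously selected font.
.ft
```

Because the document names only the style, changing the argument to fam is enough to switch the whole document to another family.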
So that a document need not be coupled to a specific font family, an output device can associate a style in the abstract sense with a mounting position. Thus the default family can combine with a style dynamically, producing a resolved font name. A document can translate, or remap, font names with the ftr request.

Applying the requests cs, bd, tkf, uf, or fspecial to an abstract style affects the member of the default family corresponding to that style.

The default family can be set with the fam request or -f command-line option. The styles directive in the output device's DESC file controls which mounting positions (if any) are initially associated with abstract styles rather than fonts, and the sty request can update this association.

Colors
groff supports color output with a variety of color spaces and up to 16 bits per channel. Some devices, particularly terminals, may be more limited.

When color support is enabled, two colors are current at any given time: the stroke color, with which glyphs, rules (lines), and geometric figures are drawn, and the fill color, which paints the interior of filled geometric figures.

The color, defcolor, gcolor, and fcolor requests; \m and \M escape sequences; and .color, .m, and .M registers exercise color support.

Hyphenation
GNU troff uses a hyphenation algorithm and language-specific pattern files (based on TeX's) to decide which words can be hyphenated and where. AT&T troff's hyphenation system ("suftab") was specific to English. New requests permit finer control over hyphenation breaking; hyphenation of a word might be suppressed due to a limit on consecutive hyphenated lines (hlm), a minimum line length threshold (hym), or because the line can instead be adjusted with additional inter-word space (hys).

The hla request selects a hyphenation language, whereas hpf and hpfa respectively load and append to the language's hyphenation patterns.
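As a brief illustration of the color and hyphenation requests named above, the following sketch defines a color, applies it as the stroke color, and adjusts two hyphenation settings. The color name and component values are arbitrary, and the example assumes that color support is enabled and that English hyphenation patterns have already been loaded (for instance by a localization file).

```roff
.\" Define a color in the rgb color space from three components.
.defcolor darkgreen rgb 0.1f 0.5f 0.2f
.\" Select it as the stroke color with \m; \m[] restores the
.\" previously selected stroke color.
\m[darkgreen]Dark green text\m[] followed by text in the prior color.
.\" Select English as the hyphenation language.
.hla en
.\" Permit at most two consecutive hyphenated output lines.
.hlm 2
```

The gcolor request could replace the \m escape sequence where a change of stroke color should persist across several input lines.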
If no hyphenation language is set or no patterns are loaded, GNU troff does not perform automatic hyphenation.

For automatic hyphenation to work, the formatter must know which letters are equivalent. For example, the letter "E" behaves like "e"; only the latter typically appears in hyphenation pattern files. GNU troff expects characters that participate in automatic hyphenation to be assigned hyphenation codes that define these equivalence classes. At startup, GNU troff assigns hyphenation codes to the letters "a"-"z", applies the same codes to "A"-"Z" in one-to-one correspondence, and assigns a code of zero to all other characters.

The hcode request enables application of hyphenation codes to characters outside the Unicode basic Latin set; without doing so, words containing such letters won't hyphenate properly even if the corresponding hyphenation patterns contain them. Localization files for the input character set and language configure hyphenation codes; see groff_tmac(5).

GNU troff's \: escape sequence works like \% but produces no hyphen if the word breaks at that location.

Fractional type sizes and new scaling units
When configuring the type size, AT&T troff ignored scaling units and interpreted all measurements in points. Combined with integer arithmetic, this design choice made it impossible to support, for instance, ten-and-a-half-point type. In GNU troff an output device can select a scaling factor that subdivides a point into "scaled points". A type size expressed in scaled points can thus represent a non-integral size in points.

A scaled point, scaling unit s, is equal to 1/sizescale points, where the device description file, DESC, specifies sizescale and otherwise defaults to 1; see groff_font(5). GNU troff also defines the typographical point, scaling unit z, which explicitly specifies a type size of potentially non-integral measure. The program multiplies typographical points by sizescale and converts the value to an integer.
Arguments GNU troff interprets in z units by default comprise those to the escape sequences \H and \s, to the request ps, the third argument to the cs request, and the second and fourth arguments to the tkf request.

In GNU troff, the register \n[.s] interpolates the type size in typographical points (z), whereas the register \n[.ps] interpolates it in scaled points (s). "\n[.ps]s", "\n[.s]z", and "1m" are co-equal by definition.

For example, if sizescale is 1000, then a scaled point is one thousandth of a point. Consequently, ".ps 10.5" is synonymous with ".ps 10.5z"; both set the type size to 10,500 scaled points or 10.5 typographical points.

It makes no sense to use the "z" scaling unit in a numeric expression whose default scaling unit is neither "u" nor "z", so GNU troff disallows this. Similarly, it is nonsensical to use scaling units other than "p", "s", "z", or "u" in a numeric expression whose default scaling unit is "z", and so GNU troff disallows those as well.

Output devices may be limited in the type sizes they can employ. The .s and .ps registers represent the type size selected by the formatter as it understands a device's capability. The read-only registers .psr and (string-valued) .sr interpolate the last requested type size in scaled points and in points as a decimal fraction, respectively. Like the actual current and previous type size, the requested ones are properties of an environment.

For example, if a document requests a type size of 10.95 points, and the nearest size permitted by a sizes request (or by the sizes or sizescale directives in the device's DESC file) is 11 points, groff uses the latter value.

A further two new measurement units available in groff are "M", which indicates hundredths of an em, and "f", which multiplies by 65,536. The latter provides convenient fractions for color definitions with the defcolor request. For example, 0.5f equals 32768u.
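The relationships among these units can be checked with a short input file that reports register values on the standard error stream. This is a sketch under assumptions: it presumes an output device whose DESC file sets sizescale to 1000 and permits a 10.5-point size, and the register names half and quarter_em are made up for the example.

```roff
.\" Request ten-and-a-half-point type; z is the default unit for ps.
.ps 10.5z
.\" .s reports typographical points and .ps reports scaled points;
.\" with sizescale 1000 the text above implies 10.5 and 10500.
.tm .s=\n[.s] points, .ps=\n[.ps] scaled points
.\" f multiplies by 65536, so the text above gives 32768 for 0.5f.
.nr half 0.5f
.\" M is hundredths of an em, so 25M is a quarter of an em.
.nr quarter_em 25M
.tm 0.5f=\n[half]u, 25M=\n[quarter_em]u
```

Running the file through the formatter with standard output discarded leaves only the two diagnostic lines, which makes it easy to compare devices with different sizescale values.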
Special fonts
GNU troff's "special" and fspecial requests permit a document to supplement the set of fonts the device configures for glyph search without having to use the fp request to manipulate the list of mounting positions, which can be tedious--by default, GNU troff mounts 40 fonts at startup when using the ps device.

Numeric expressions
GNU troff permits spaces in a numeric expression within parentheses, and offers three new operators.

e1>?e2
Interpolate the greater of expressions e1 and e2.

e1<?e2
Interpolate the lesser of expressions e1 and e2.

(c;e)
Interpolate expression e using c as the default scaling unit.

[...]

Dp dx_1 dy_1 ... dx_n dy_n
Draw a stroked polygon with vertex i at the current position + Σ_(j = 1)^(i - 1) (dx_j, dy_j). groff output drivers automatically close polygons, drawing a line from (dx_n, dy_n) back to (dx_1, dy_1). The drawing position is left at the last specified vertex, but this may change in a future version of GNU troff. Heirloom Doctools troff, like DWB troff, by default does not close the polygon. In its groff compatibility mode, Heirloom closes the polygon but leaves the drawing position unchanged--that is, at the polygon's initial drawing position.

DP dx_1 dy_1 ... dx_n dy_n
As Dp, but draw a filled rather than a stroked polygon.

Dt n
Set the line thickness to n basic units. AT&T troff output drivers use a thickness proportional to the type size; this is the GNU troff default. A negative n requests this explicitly. An n of zero selects the smallest available line thickness.

A difficulty arises in how the drawing position should be changed after the execution of these commands. This has little importance to most users, since the output of GNU grn and pic does not depend on it. Given a drawing command of the form Dz x_1 y_1 ... x_n y_n, where z is not c or e, AT&T troff treats each x_i as a horizontal motion, each y_i as a vertical one, and therefore assumes that the width of the drawn object is Σ_(i = 1)^n x_i, and its height is Σ_(i = 1)^n y_i. (Verify its assumption about height by examining the st and sb registers after using such a drawing command in a \w escape sequence.)
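The check suggested in the parenthetical remark takes only a few lines of input. This is merely a sketch: the line-drawing command and its dimensions are arbitrary choices, the register name w is invented, and no particular output values are asserted here.

```roff
.\" Measure a drawing command with \w.  "@" delimits the argument
.\" to \w so it cannot be confused with the "'" delimiters of \D.
.nr w \w@\D'l 2i 1i'@
.\" Report the measured width together with the st and sb
.\" registers, which \w sets as a side effect.
.tm width=\n[w]u st=\n[st]u sb=\n[sb]u
```

Substituting other drawing commands for the line lets you compare how each one contributes to the measured extents.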
Thus after executing a D command of the form Dz x_1 y_1 ... x_n y_n, the drawing position increases by (Σ_(i = 1)^n x_i, Σ_(i = 1)^n y_i). For the sake of compatibility, GNU troff follows this rule, even though it frustrates extensions to the D command that set drawing parameters rather than rendering objects, producing ugly results in the case of Dt and Df, or otherwise don't parameterize objects as a series of vertices, as with GNU troff's filled ellipse, DE. In a future release, GNU troff and its output drivers may abandon the application of this assumption to drawing commands not explicitly specified in the AT&T "Troff User's Manual". You can ensure predictable output by enclosing drawing commands in the zero-motion escape sequence \Z.

GNU troff implements fill color selection with another set of extensions.

DFc cyan magenta yellow
DFd
DFg gray
DFk cyan magenta yellow black
DFr red green blue
Set the components of the fill color as described under the \M escape sequence above. DFd restores the device's default fill color. The drawing position is not updated, in contrast to Df.

Device control syntax extension
GNU troff introduces a line continuation convention, permitting the argument to the x X command to contain newlines. A newline in the input is transformed to the sequence "newline+". When interpreting an x X command, a postprocessor should therefore be prepared for a plus sign after a newline; if it occurs, preserve the newline, discard the plus sign, and continue to collect the input into the argument of the x X command. A newline not followed by a plus sign terminates the x X command. An application of this feature is the embedding of PostScript or PDF language command streams into troff output.

GNU troff guarantees that the first three output commands it emits are as follows.
x T device
x res n h v
x init

Debugging
In addition to AT&T troff's debugging features, GNU troff emits more error diagnostics when syntactical or semantic nonsense is encountered and supports several warning categories; the output of these can be selected with "warn". Also see the -E, -w, and -W options of troff(1). A trace of the formatter's input processing stack can be emitted when errors or warnings occur by means of GNU troff's -b option, or produced on demand with the backtrace request. groff also adds more flexible diagnostic output requests (tmc and tm1).

Examine the state of the formatter with requests that write lists of defined colors (pcolor), composite character mappings (pcomposite), environments (pev), font translations (pftr), automatic hyphenation codes (phcode) and exceptions (phw), registers (pnr), open streams (pstream), and page location traps (pwh). Requests can also disclose to the standard error stream the internal properties and representations of characters and classes (pchar), macros (and strings and diversions) (pm), and the list of output nodes corresponding to the pending input line (pline).

Compatibility mode
Some syntactical and behavioral differences between AT&T and GNU troffs are thought too important to neglect; GNU troff therefore makes available a compatibility mode in an effort to keep documents prepared for AT&T troff rendering well.

Identifiers of arbitrary length may be GNU troff's most obvious innovation. AT&T troff interprets ".dsabcd" as defining a string "ab" with contents "cd". Normally, GNU troff interprets this input as calling a macro named "dsabcd". AT&T troff also interprets \*[ and \n[ as interpolating a string or register, respectively, named "[". GNU troff, however, normally interprets "[" as bracketing a long name (with "]" at the distal end). In compatibility mode, GNU troff interprets names in the traditional way; they thus can be two characters long at most.
See the -C option in troff(1) and, above, the .C and .cp registers, and cp and "do" requests, for more on compatibility mode.

The register \n[.cp] is specialized and may require a statement of rationale. When writing macro packages or documents that use GNU troff features and which may be mixed with other packages or documents that do not--common scenarios include serial processing of man pages or use of the "so" or mso requests--you may desire correct operation regardless of compatibility mode enablement in the surrounding context. It may occur to you to save the existing value of \n(.C into a register, say, _C, at the beginning of your file, turn compatibility mode off with ".cp 0", then restore it from that register at the end with ".cp \n(_C". At the same time, a modular design of a document or macro package may lead you to multiple layers of inclusion. You cannot use the same register name everywhere lest you "clobber" the value from a preceding or enclosing context. The two-character register name space of AT&T troff is confining, but employing GNU troff's more capacious one, as with ".nr _my_saved_C \n(.C", does not work in compatibility mode; the register name is too long. Employing the "do" request is no help: ".do nr _my_saved_C \n(.C" always saves zero to the register, because "do" turns compatibility mode off while it interprets its argument list.

GNU troff normally tracks the interpolation depth of escape sequence parameters and other delimited structures, but not in compatibility mode. See section "Miscellaneous" above.

The escape sequences \f, \H, \m, \M, \R, \s, and \S are transparent to control character recognition at the beginning of an input line, or after the conditional expression of an "if" or ie request, only in compatibility mode. That is, upon interpreting them, GNU troff normally no longer recognizes a control character on the input line; but in compatibility mode, it does, just like AT&T troff.
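Returning to the problem of saving the compatibility mode state: the .cp register exists for this purpose. The sketch below assumes that, within a "do" request, .cp interpolates the compatibility mode state that was in effect before "do" suspended it. The register name _my_saved_C is the hypothetical one used in the discussion above.

```roff
.\" At the beginning of the file: record the enclosing context's
.\" compatibility mode state, then turn compatibility mode off.
.do nr _my_saved_C \n[.cp]
.cp 0
.\" ... body of the document or macro package, which is free to
.\" use long names and other GNU troff extensions ...
.\" At the end of the file: restore the saved state, then remove
.\" the register so that it does not leak into the enclosing context.
.cp \n[_my_saved_C]
.do rr _my_saved_C
```

The final rr is issued through "do" because compatibility mode may be enabled again once the saved state is restored, in which case the long register name could not otherwise be read.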
Normally, the syntax form \sn accepts only a single character (a digit) for n, consistently with other forms that originated in AT&T troff, like \*, \$, \f, \g, \k, \n, and \z. In compatibility mode only, a non-zero n must be in the range 4-39. Legacy documents relying upon this quirk of parsing should migrate to another \s form.

[Background: The Graphic Systems C/A/T phototypesetter (the original device target for AT&T troff) supported only a few discrete type sizes in the range 6-36 points, so Ossanna contrived a special case in the parser to do what the user must have meant. Kernighan warned of this in the 1992 revision of CSTR #54 (§
2.3), and more recently, McIlroy referred to it as a "living fossil".]

Other differences
GNU troff does not emit output if it has nothing to format. For example, it treats an input document consisting solely of nr and tm requests as empty, and produces nothing on its standard output stream. AT&T troff does, creating a blank page.

Use of C0 control characters in identifiers is not portable; Solaris, Plan 9, and Heirloom Doctools troffs accept Control+B, Control+C, Control+E, Control+F, and Control+G (only); DWB 3.3 troff does not. GNU troff rejects C0 controls in identifiers with an error diagnostic.

Formatters that don't implement GNU troff extension request names tend to ignore them, and if they don't support a GNU troff extension escape sequence, they are liable to format its function selector character as text. For example, the adjustable, non-breaking space escape sequence \~ is also supported by Heirloom Doctools troff 050915 (September 2005), mandoc 1.9.5 (2009-09-21), neatroff (commit 1c6ab0f6e, 2016-09-13), and Plan 9 from User Space troff (commit 93f8143600, 2022-08-12), but not by Solaris or Documenter's Workbench troffs, which both render it as "~". The \A escape sequence (see subsection "Escape sequences" above) may be helpful in avoiding their use.

AT&T troff discards trailing spaces from input lines, like GNU troff, but when it does so, AT&T troff also cancels end-of-sentence detection. Use of the dummy character escape sequence \& is more portable.

When adjusting output lines to both margins, AT&T troff at first adjusts spaces starting from the right; GNU troff begins from the left. Both implementations adjust spaces from opposite ends on alternating output lines in this adjustment mode to prevent "rivers" in the text.

GNU troff does not always hyphenate words as AT&T troff does. The AT&T implementation uses a set of hard-coded rules specific to U.S. English, while GNU troff uses language-specific hyphenation pattern files derived from TeX.
Some versions of troff reserved meager storage for hyphenation exception words (arguments to the hw request); GNU troff has no such restriction.

When the hy request is invoked without an argument, GNU troff sets the automatic hyphenation mode to the value of the .hydefault register; the AT&T implementation sets it to "1", which is not suitable in GNU troff for some languages, including English.

Unlike GNU troff, AT&T troff does not recognize an occurrence of \% at the beginning of a word as suppressing its hyphenation; instead, it (uselessly) marks the start of the word as a potential hyphenation point, permitting output lines to end with hyphens that are not interior to a word.

GNU troff handles the dummy character \& differently from AT&T troff when it is followed by the hyphenation control escape sequence \% at the beginning of a word. GNU troff does not regard the dummy character as "starting" the word; AT&T troff does. Further, Heirloom Doctools troff does not honor an explicit hyphenation point marked with \% after a word-initial one.

GNU troff interprets request arguments representing file names and system commands in the same way it does the contents argument to the ds and "as" requests: it removes a leading neutral double quote (") from the argument to the cf, nx, pi, "so", and sy requests, and the second argument (if present) to the lf request, permitting initial embedded spaces in it, and reads it to the end of the input line in copy mode. This difference permits the formatter to handle files with spaces in their names, but requires more care with trailing comments, and doubling of an initial neutral double quote (") if the file name has one.

The existence of the .T string is a common feature of device-independent troffs--DWB 3.3, Solaris, Heirloom Doctools, and Plan 9 troff all support it--but valid values are specific to each implementation.

The (read-only) register .T interpolates 1 if GNU troff is run with the -T option, and 0 otherwise.
In contrast, AT&T troff interpolated 1 only if nroff was the formatter and was run with -T.

AT&T troff ignored attempts to remove read-only registers; GNU troff honors such requests.

The lf request sets the number of the current input line in AT&T troff, and the next in GNU troff.

AT&T troff had only environments named "0", "1", and "2". In GNU troff, any number of environments may exist, using any valid identifiers for their names.

As noted above in "Fractional type sizes and new scaling units", AT&T troff's ps request ignores scaling units and thus ".ps 10u" sets the type size to 10 points, whereas in GNU troff it sets the type size to 10 scaled points, possibly a much smaller measurement. AT&T's behavior also means that ".ps 10p" and ".ps 10z" are portable.

The ab request differs from AT&T troff: GNU troff writes no message to the standard error stream if no arguments are given, and it exits with a failure status instead of a successful one.

The bp request differs from AT&T troff: GNU troff does not accept a scaling unit on the argument, a page number; AT&T troff does (uselessly).

In AT&T troff the pm request reports macro, string, and diversion sizes in units of 128-byte blocks, and an argument reduces the report to a sum of the above in the same units. GNU troff reports their lengths in characters or nodes if given no arguments, and otherwise dumps the JSON-encoded name, contents, and other properties of each named argument.

AT&T troff ignores the ss request if the output is a terminal device; GNU troff rounds down the values of minimum inter-word and additional inter-sentence space each to the nearest multiple of 12.

GNU troff distinguishes characters from glyphs. Characters can be ordinary, special, or indexed, and populate strings and macros. Characters per se have not (yet) been formatted. Glyphs represent graphemes (supplied by the output device) and populate diversions. Formatting converts characters into (sequences of) glyphs.
GNU troff stores properties of the environment that affect how a glyph is rendered with the glyph node's data. Thus, subsequent formatting operations do not affect it, including bd, cs, tkf, tr, and fp requests.

Normally, a macro or string contains only a list of characters and a diversion contains only a list of nodes. However, applying the asciify or unformat requests to a diversion converts some of its nodes back into characters. Where the formatter cannot recover the character representation of a node, it stores a null character in the character list corresponding to a single node in the node list. Consequently, a glyph node does not behave as a character does in macro interpolation: it does not inherit special properties that the character from which it was constructed might have had.

One way to format a backslash in most documents is with the \e escape sequence; this formats the glyph of the current escape character, regardless of whether it is used in a diversion; it also works in both GNU troff and AT&T troff. (Naturally, if you've changed the escape character, you need to prefix the "e" with whatever it is--and you'll likely get something other than a backslash in the output.)

The other correct way, appropriate in contexts independent of the backslash's common use as a roff escape character--perhaps in discussion of character sets or other programming languages--is the special character escape sequence \(rs or \[rs], for "reverse solidus", from its name in the ECMA-6 and ISO 10646 standards. [AT&T troff's font description files did not define the rs special character, but those of its descendant Heirloom Doctools troff do, as of its 060716 release (July 2006).]

To store an escape sequence in a diversion that is interpreted when the diversion is interpolated, either use the traditional \! transparent output facility, or, if this is unsuitable, the new \? escape sequence.
See subsection "Escape sequences" above and sections "Diversions" and "Gtroff Internals" in Groff: The GNU Implementation of troff, the groff Texinfo manual.

Like AT&T troff, GNU troff maintains a buffer of device-independent output commands, populating the buffer as formatted output accumulates. GNU troff always flushes this buffer when processing a break; AT&T troff does so according to no obvious schedule (perhaps, if the buffer is of fixed size, the formatter performs the flush when the buffer runs out of room).

In the somewhat pathological case where a diversion exists containing a partially collected line and a partially collected line at the top-level diversion has never existed, AT&T troff outputs a partially collected but otherwise empty line (as if "\c" were in the top-level diversion) at the end of input; GNU troff does not.

Formatter output incompatibilities
Its extensions notwithstanding, GNU troff's page description language has some incompatibilities with that of AT&T troff, but better compatibility is sought; problem reports and patches are welcome. The following incompatibilities are known.

o The drawing position after rendering polygons is inconsistent with AT&T troff practice. Other implementations have diverged on this point as well.

o The output cannot be easily rescaled to other devices as AT&T troff's could.

Authors
This document was written by James Clark, Werner Lemberg, Bernd Warken, and G. Branden Robinson.

See also
Groff: The GNU Implementation of troff, by Trent A. Fisher and Werner Lemberg, is the primary groff manual. You can browse it interactively with "info groff".

"Troff User's Manual" by Joseph F. Ossanna, 1976 (revised by Brian W. Kernighan, 1992), AT&T Bell Laboratories Computing Science Technical Report No. 54, widely called simply "CSTR #54", documents the language, device and font description file formats, and page description language referred to collectively in groff documentation as AT&T troff.
"A Typesetter-independent TROFF" by Brian W. Kernighan, 1982, AT&T Bell Laboratories Computing Science Technical Report No. 97, provides additional insights into the device and font description file formats and page description language. groff(1), groff(7), roff(7) groff 1.24.0 2026-03-01 groff_diff(7)