LOCALE::PO4A::TRANSTRACTOR.3PM(1)

User Contributed Perl Documentation


NAME

Locale::Po4a::TransTractor - generic trans(lator ex)tractor.

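DESCRIPTION

The po4a (PO for anything) project aims to ease translation (and, more interestingly, the maintenance of translations) by using gettext tools in areas where they were not originally expected, such as documentation.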

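This class is the ancestor of every po4a parser. It is used to parse a document, search for translatable strings, extract them to a PO file, and replace them with their translations in the output document.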

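More formally, it takes the following arguments as input:

a document to translate;
a PO file containing the translations to use.

As output, it produces:

another PO file, resulting from the extraction of the translatable strings from the input document;
a translated document, with the same structure as the input document, but with all translatable strings replaced by the translations found in the PO file provided as input.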

Here is a graphical representation of this:

Input document --\                             /---> Output document
                  \                           /       (translated)
                   +-> parse() function -----+
                  /                           \
Input PO --------/                             \---> Output PO
                                                      (extracted)

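FUNCTIONS YOUR PARSER SHOULD OVERRIDE

parse(): This is where all the work takes place: parsing the input documents, generating the output, and extracting the translatable strings. This is fairly simple using the functions presented in the INTERNAL FUNCTIONS section below. See also the SYNOPSIS, which presents an example.
This function is called by the process() function below, but if you choose to use the new() function and add content to your document manually, you must call this function yourself.
docheader(): This function returns the header that should be added to the produced document, properly quoted as a comment in the target language. See the section "Educating developers about translations" in po4a(7) for what it is good for.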
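
As a rough illustration of overriding docheader(), the sketch below defines a hypothetical parser package, MyHtmlParser, for a format whose comments are written as <!-- ... -->. The package name and the header wording are assumptions made for this example. A real parser would inherit from Locale::Po4a::TransTractor; that is left out here so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical parser package (not part of po4a). A real parser would
# also declare: use parent 'Locale::Po4a::TransTractor';
package MyHtmlParser;

sub new { return bless {}, shift }

# Return the header to prepend to the generated document, quoted as a
# comment in the target format (here: HTML-style comments).
sub docheader {
    return "<!-- Generated by po4a. Edit the source document, not this file. -->\n";
}

package main;

my $header = MyHtmlParser->new->docheader();
print $header;
```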

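SYNOPSIS

The following example parses a list of paragraphs beginning with "<p>". For the sake of simplicity, we assume that the document is well formatted, i.e. that "<p>" tags are the only tags present, and that this tag is at the very beginning of each paragraph.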

sub parse {
  my $self = shift;
  PARAGRAPH: while (1) {
      my ($paragraph,$pararef)=("","");
      my $first=1;
      my ($line,$lref)=$self->shiftline();
      while (defined($line)) {
          if ($line =~ m/<p>/ && !$first--) {
              # Not the first time we see <p>.
              # Reput the current line in input,
              #  and put the built paragraph to output
              $self->unshiftline($line,$lref);
              # Now that the document is formed, translate it:
              #   - Remove the leading tag
              $paragraph =~ s/^<p>//s;
              #   - push to output the leading tag (untranslated) and the
              #     rest of the paragraph (translated)
              $self->pushline(  "<p>"
                              . $self->translate($paragraph,$pararef)
                              );
              next PARAGRAPH;
          } else {
              # Append to the paragraph
              $paragraph .= $line;
              $pararef = $lref unless(length($pararef));
          }
          # Reinit the loop
          ($line,$lref)=$self->shiftline();
      }
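      # Did not get a defined line? End of input file.
      # Flush the paragraph still being built, if any, then stop.
      if (length($paragraph)) {
          $paragraph =~ s/^<p>//s;
          $self->pushline("<p>" . $self->translate($paragraph,$pararef));
      }
      return;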
  }
}

Once you have implemented your parse function, you can use your document class through the public interface presented in the next section.

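PUBLIC INTERFACE for scripts using your parser

Constructor

process(%): This function can do everything you need to do with a po4a document in one invocation. Its arguments must be packed as a hash. Actions:

a.: Reads all the PO files specified in po_in_name
b.: Reads all the original documents specified in file_in_name
c.: Parses the document
d.: Reads and applies all the specified addenda
e.: Writes the translated document to file_out_name (if given)
f.: Writes the extracted PO file to po_out_name (if given)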
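
The actions above correspond roughly to calling the public methods described later on this page one after another. The sketch below uses a hypothetical stand-in class, RecordingDoc, whose methods only record that they were called, so it runs without po4a installed. The helper process_by_hand and the file names are placeholders for this example, and the real process() may handle charsets and errors in ways not shown here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a po4a document: each method only logs its name.
package RecordingDoc;

sub new { return bless { log => [] }, shift }

for my $name (qw(readpo read parse addendum write writepo)) {
    no strict 'refs';
    *{"RecordingDoc::$name"} = sub {
        my $self = shift;
        push @{ $self->{log} }, $name;
        return;
    };
}

package main;

# Manual equivalent of the actions a. to f. listed above.
sub process_by_hand {
    my ($doc, %args) = @_;
    $doc->readpo($_)   for @{ $args{po_in_name}   || [] };               # a.
    $doc->read($_, $_) for @{ $args{file_in_name} || [] };               # b.
    $doc->parse();                                                       # c.
    $doc->addendum($_) for @{ $args{addendum}     || [] };               # d.
    $doc->write($args{file_out_name}) if defined $args{file_out_name};   # e.
    $doc->writepo($args{po_out_name}) if defined $args{po_out_name};     # f.
    return $doc;
}

my $doc = process_by_hand(
    RecordingDoc->new,
    file_in_name  => ['manual.txt'],
    po_in_name    => ['fr.po'],
    file_out_name => 'manual.fr.txt',
);
print join(' ', @{ $doc->{log} }), "\n";
```

Since no addendum and no po_out_name are passed in this call, steps d. and f. are skipped.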

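Arguments, besides the ones accepted by new() (with expected type):

file_in_name (@): List of filenames from which the input document should be read.
file_in_charset ($): Charset used in the input document (UTF-8 is used if it is not specified).
file_out_name ($): Filename to which the output document should be written.
file_out_charset ($): Charset used in the output document (UTF-8 is used if it is not specified).
po_in_name (@): List of filenames from which the input PO files should be read, containing the translations that will be used to translate the document.
po_out_name ($): Filename to which the output PO file should be written, containing the strings extracted from the input document.
addendum (@): List of filenames from which the addenda should be read.
addendum_charset ($): Charset for the addenda.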

new(%): Creates a new po4a document. Accepted options (in the hash passed as a parameter):

verbose ($): Sets the verbosity.
debug ($): Sets the debugging.
wrapcol ($): The column at which text should be wrapped in the output document (default: 76).
A negative value means that lines are not wrapped at all.

It also accepts the following options for the underlying PO files: porefs, copyright-holder, msgid-bugs-address, package-name, package-version, wrap-po.

Manipulating document files

read($$$)

Appends the data of another input document to the end of the existing array "@{$self->{TT}{doc_in}}".

This function takes two mandatory arguments and an optional one.
* The filename to read on disk;
* The name to use as the filename when building the reference in the PO file;
* The charset to use to read that file (UTF-8 by default).

The array "@{$self->{TT}{doc_in}}" holds the input document data as an array of strings with alternating meanings.
* The string $textline, holding one line of the input text data.
* The string "$filename:$linenum", holding that line's location and called the "reference" ("linenum" starts at 1).

Please note that it parses nothing. You should call the parse() function once you are done packing input files into the document.
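
The alternating layout of the input array described for read() can be pictured with a small sketch. The helper build_doc_in is hypothetical and only mimics the layout; it does not read files or handle charsets the way read() does.

```perl
use strict;
use warnings;

# Build the alternating (line, reference) list for the lines of one file.
# Hypothetical helper for illustration only.
sub build_doc_in {
    my ($refname, @lines) = @_;
    my @doc_in;
    my $linenum = 0;
    for my $textline (@lines) {
        $linenum++;    # line numbers start at 1
        push @doc_in, $textline, "$refname:$linenum";
    }
    return @doc_in;
}

my @doc_in = build_doc_in('intro.txt', "<p>Hello\n", "world\n");
# @doc_in now holds:
#   ("<p>Hello\n", "intro.txt:1", "world\n", "intro.txt:2")
print scalar(@doc_in), " items\n";
```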

write($)

Writes the translated document to the given filename.

The translated document data is provided by:
* "$self->docheader()", holding the header text for the plugin, and
* "@{$self->{TT}{doc_out}}", holding each line of the main translated text in the array.

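Manipulating PO files

readpo($)

Adds the content of a file (whose name is passed as an argument) to the existing input PO. The old content is not discarded.

writepo($)

Writes the extracted PO file to the given filename.

stats()

Returns some statistics about the translation done so far. Please note that these are not the same statistics as the ones printed by msgfmt --statistic. Here, the statistics describe the recent usage of the PO file, while msgfmt reports the status of the file. It is a wrapper around the Locale::Po4a::Po::stats_get function applied to the input PO file. Example of use: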

[normal use of the po4a document...]
($percent,$hit,$queries) = $document->stats();
print "We found translations for $percent\%  ($hit from $queries) of strings.\n";

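Manipulating addenda

addendum($): Please refer to po4a(7) for more information on what addenda are and how translators should write them. To apply an addendum to the translated document, simply pass its filename to this function and you are done ;)
This function returns a non-null integer on error.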

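INTERNAL FUNCTIONS used to write derivative parsers

Getting input, providing output

Four functions are provided to get input and return output. They are very similar to Perl's shift/unshift and push/pop.

* Perl shift returns the first array item and drops it from the array.
* Perl unshift prepends an item to the array as the first array item.
* Perl pop returns the last array item and drops it from the array.
* Perl push appends an item to the array as the last array item.

The first pair is about input, while the second is about output. Mnemonic: in input, you are interested in the first line, which is what shift gives, and in output you want to add your result at the end, as push does.

shiftline(): This function returns the first line to be parsed and its corresponding reference (packed as an array) from the array "@{$self->{TT}{doc_in}}" and drops these first two array items. Here, the reference is provided as a string "$filename:$linenum".
unshiftline($$): Unshifts the last shifted line of the input document and its corresponding reference back to the head of "@{$self->{TT}{doc_in}}".
pushline($): Pushes a new line to the end of "@{$self->{TT}{doc_out}}".
popline(): Pops the last pushed line from the end of "@{$self->{TT}{doc_out}}".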
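
To make the behaviour of these four helpers concrete, here is a minimal sketch built on a hypothetical stand-in class, MiniTT. It is not the real TransTractor; it only reuses the "{TT}{doc_in}" and "{TT}{doc_out}" layout described above and maps each helper onto the matching Perl built-in.

```perl
use strict;
use warnings;

# Hypothetical stand-in (not the real TransTractor) mimicking the four helpers.
package MiniTT;

sub new {
    my ($class, @doc_in) = @_;
    return bless { TT => { doc_in => [@doc_in], doc_out => [] } }, $class;
}

# Return the first line and its reference, removing both from doc_in.
sub shiftline {
    my $self = shift;
    my $line = shift @{ $self->{TT}{doc_in} };
    my $ref  = shift @{ $self->{TT}{doc_in} };
    return ($line, $ref);
}

# Put a line and its reference back at the head of doc_in.
sub unshiftline {
    my ($self, $line, $ref) = @_;
    unshift @{ $self->{TT}{doc_in} }, $line, $ref;
    return;
}

# Append a line to doc_out.
sub pushline {
    my ($self, $line) = @_;
    push @{ $self->{TT}{doc_out} }, $line;
    return;
}

# Remove and return the last line of doc_out.
sub popline {
    my $self = shift;
    return pop @{ $self->{TT}{doc_out} };
}

package main;

my $tt = MiniTT->new("first\n", "in.txt:1", "second\n", "in.txt:2");
my ($line, $ref) = $tt->shiftline();    # takes "first\n" and "in.txt:1"
$tt->unshiftline($line, $ref);          # doc_in is back to its initial state
($line, $ref) = $tt->shiftline();       # same pair again
$tt->pushline(uc $line);                # doc_out now holds "FIRST\n"
my $popped = $tt->popline();            # doc_out is empty again
```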

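Marking strings as translatable

One function is provided to handle the text that should be translated.

translate($$$): Mandatory arguments:

A string to translate
The reference of this string (i.e. its position in the input file)
The type of this string (i.e. the textual description of its structural role; used in Locale::Po4a::Po::gettextization(); see also po4a(7), section "Gettextization: how does it work?")

This function can also take some extra arguments. They must be organized as a hash. For example: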

$self->translate("string","ref","type",
                 'wrap' => 1);

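wrap: boolean indicating whether whitespace in the string can be considered unimportant. If so, the function canonicalizes the string before looking for a translation or extracting it, and wraps the translation.
wrapcol: the column at which the text should be wrapped (default: the value of wrapcol specified during creation of the TransTractor, or 76).
A negative value is subtracted from the default.
comment: an extra comment to add to the entry.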
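
One possible reading of the wrapcol rule is sketched below, on the assumption that "subtracted from the default" means a negative per-call value is added to the default column. The helper effective_wrapcol is hypothetical and is not part of po4a.

```perl
use strict;
use warnings;

# Hypothetical helper: compute the column used for one translate() call.
#   $default   - wrapcol given when the TransTractor was created (or 76)
#   $requested - wrapcol passed to translate(), possibly undefined or negative
sub effective_wrapcol {
    my ($default, $requested) = @_;
    return $default unless defined $requested;         # option not given
    return $default + $requested if $requested < 0;    # e.g. 76 + (-4) = 72
    return $requested;                                  # explicit column
}

print effective_wrapcol(76, undef), "\n";    # 76
print effective_wrapcol(76, -4),    "\n";    # 72
print effective_wrapcol(76, 60),    "\n";    # 60
```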

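Actions:

Pushes the string, reference and type to po_out.
Returns the translation of the string (as found in po_in) so that the parser can build the doc_out.
Handles the charsets to recode the strings before sending them to po_out and before returning the translations.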
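
The first two actions can be modelled in a few lines. The sketch below is a deliberately simplified stand-in, not po4a's implementation: po_in is a plain hash from original string to translation, po_out is a list of extracted entries, and the name translate_sketch is hypothetical. Charset handling and re-wrapping of the result are left out, and falling back to the original string when no translation exists is an assumption of this example.

```perl
use strict;
use warnings;

# Hypothetical, simplified model of translate().
my %po_in = ('Hello world' => 'Bonjour le monde');
my @po_out;

sub translate_sketch {
    my ($string, $ref, $type, %options) = @_;
    if ($options{wrap}) {
        # Whitespace is not significant: normalize it before the lookup.
        $string =~ s/\s+/ /g;
        $string =~ s/^ | $//g;
    }
    # 1. Record the string, its reference and its type for the output PO.
    push @po_out, { msgid => $string, reference => $ref, type => $type };
    # 2. Return the translation, or the original string if there is none.
    return exists $po_in{$string} ? $po_in{$string} : $string;
}

my $out = translate_sketch("Hello\n   world\n", 'in.txt:1', 'paragraph',
                           wrap => 1);
print "$out\n";
```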

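Misc functions

verbose(): Returns whether the verbose option was passed during the creation of the TransTractor.
debug(): Returns whether the debug option was passed during the creation of the TransTractor.
get_in_charset(): This function returns the charset that was provided as the master charset.
get_out_charset(): This function returns the charset that should be used in the output document (usually useful to substitute the input document's detected charset where it has been found).
It uses the output charset specified on the command line. If none was specified, it uses the input PO's charset, and if the input PO has the default "CHARSET", it returns the input document's charset, so that no recoding is performed.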
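
The fallback order described for get_out_charset() can be written as a small decision function. This is a sketch of the rule as stated above, not po4a's code; choose_out_charset and its three parameters are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper. The three inputs stand for the charset given on the
# command line, the charset declared in the input PO file, and the charset
# of the input document.
sub choose_out_charset {
    my ($cmdline, $po_charset, $doc_charset) = @_;
    return $cmdline if defined $cmdline && length $cmdline;
    return $po_charset
        if defined $po_charset
        && length $po_charset
        && $po_charset ne 'CHARSET';    # "CHARSET" is the unset default
    return $doc_charset;                # same as input: no recoding needed
}

print choose_out_charset(undef, 'CHARSET', 'UTF-8'), "\n";
```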

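FUTURE DIRECTIONS

One shortcoming of the current TransTractor is that it cannot handle translated documents containing all languages, such as debconf templates or .desktop files.

To address this problem, the only interface changes needed are:

take a hash as po_in_name (one list per language)
add an argument to translate to indicate the target language

make a pushline_all function, which would call pushline on its content for all languages, using a map-like syntax: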

$self->pushline_all({ "Description[".$langcode."]=".
                      $self->translate($line,$ref,$langcode)
                    });

We will see whether this is enough ;)

AUTHORS

Denis Barbier <barbier@linuxfr.org>
Martin Quinson (mquinson#debian.org)
Jordi Vilalta <jvprat@gmail.com>

TRANSLATION

taotieren <admin@taotieren.com>

2025-11-22

perl v5.42.0
