.\" -*- mode: troff; coding: utf-8 -*-
.\" Automatically generated by Pod::Man v6.0.2 (Pod::Simple 3.45)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
.ie n \{\
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\"
.\" Required to disable full justification in groff 1.23.0.
.if n .ds AD l
.\" ========================================================================
.\"
.IX Title "MIME::WordDecoder 3"
.TH MIME::WordDecoder 3 2025-07-13 "perl v5.42.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH NAME
MIME::WordDecoder \- decode RFC 2047 encoded words to a local representation
.PP
WARNING: Most of this module is deprecated and may disappear.  The only
function you should use for MIME decoding is "mime_to_perl_string".
.SH SYNOPSIS
.IX Header "SYNOPSIS"
See MIME::Words for the basics of encoded words.
See "DESCRIPTION" for how this class works.
.PP
.Vb 1
\&    use MIME::WordDecoder;
\&
\&
\&    ### Get the default word\-decoder (used by unmime()):
\&    $wd = default MIME::WordDecoder;
\&
\&    ### Get a word\-decoder which maps to ISO\-8859\-1 (Latin1):
\&    $wd = supported MIME::WordDecoder "ISO\-8859\-1";
\&
\&
\&    ### Decode a MIME string into Latin1 via the decoder just obtained:
\&    $str = $wd\->decode(\*(AqTo: =?ISO\-8859\-1?Q?Keld_J=F8rn_Simonsen?= \*(Aq);
\&
\&    ### Decode a string using the default decoder, non\-OO style:
\&    $str = unmime(\*(AqTo: =?ISO\-8859\-1?Q?Keld_J=F8rn_Simonsen?= \*(Aq);
\&
\&    ### Decode a string to an internal Perl string, non\-OO style.
\&    ### The result is likely to have the UTF8 flag ON.
\&    $str = mime_to_perl_string(\*(AqTo: =?ISO\-8859\-1?Q?Keld_J=F8rn_Simonsen?= \*(Aq);
.Ve
.SH DESCRIPTION
.IX Header "DESCRIPTION"
WARNING: Most of this module is deprecated and may disappear.  It
duplicates (badly) the function of the standard \*(AqEncode\*(Aq module.
The only function you should rely on is mime_to_perl_string.
.PP
A MIME::WordDecoder consists, fundamentally, of a hash which maps
a character set name (US\-ASCII, ISO\-8859\-1, etc.) to a subroutine which
knows how to take bytes in that character set and turn them into
the target string representation.  Ideally, this target representation
would be Unicode, but we don\*(Aqt want to overspecify the translation
that takes place: if you want to convert MIME strings directly to Big5,
that\*(Aqs your own decision.
.PP
The subroutine will be invoked with three arguments: DATA (the data in
the given character set), CHARSET (the upcased character set name), and
DECODER (the decoder object itself).  See \fBhandler()\fR for details.
.PP
For example:
.PP
.Vb 6
\&    ### Keep 7\-bit characters as\-is, convert 8\-bit characters to \*(Aq#\*(Aq:
\&    sub keep7bit {
\&        local $_ = shift;
\&        tr/\ex00\-\ex7F/#/c;
\&        $_;
\&    }
.Ve
.PP
Here\*(Aqs a decoder which uses that:
.PP
.Vb 6
\&    ### Construct a decoder:
\&    $wd = MIME::WordDecoder\->new([\*(AqUS\-ASCII\*(Aq   => "KEEP",   ### sub { $_[0] }
\&                                  \*(AqISO\-8859\-1\*(Aq => \e&keep7bit,
\&                                  \*(AqISO\-8859\-2\*(Aq => \e&keep7bit,
\&                                  \*(AqBig5\*(Aq       => "WARN",
\&                                  \*(Aq*\*(Aq          => "DIE"]);
\&
\&    ### Convert some MIME text to a pure ASCII string...
\&    $ascii = $wd\->decode(\*(AqTo: =?ISO\-8859\-1?Q?Keld_J=F8rn_Simonsen?= \*(Aq);
\&
\&    ### ...which will now hold: "To: Keld J#rn Simonsen "
.Ve
.PP
The UTF\-8 built\-in decoder decodes everything into Perl\*(Aqs internal
string format, possibly turning on the internal UTF8 flag.  Use it like
this:
.PP
.Vb 4
\&    $wd = supported MIME::WordDecoder \*(AqUTF\-8\*(Aq;
\&    $perl_string = $wd\->decode(\*(AqTo: =?ISO\-8859\-1?Q?Keld_J=F8rn_Simonsen?= \*(Aq);
\&    # $perl_string now holds the decoded text as a Perl character
\&    # string; its internal "UTF8" flag may be set.
.Ve
.PP
Generally, you should use the UTF\-8 decoder in preference to "unmime".
.SH "PUBLIC INTERFACE"
.IX Header "PUBLIC INTERFACE"
.IP "default [DECODER]" 4
.IX Item "default [DECODER]"
\&\fIClass method.\fR
Get/set the default DECODER object.
.IP "supported CHARSET, [DECODER]" 4
.IX Item "supported CHARSET, [DECODER]"
\&\fIClass method.\fR
If just CHARSET is given, returns a decoder object which maps
data into that character set (the character set is forced to
all\-uppercase).
.Sp
.Vb 1
\&    $wd = supported MIME::WordDecoder "ISO\-8859\-1";
.Ve
.Sp
If DECODER is given, installs that object as the decoder for CHARSET:
.Sp
.Vb 2
\&    MIME::WordDecoder\->supported("ISO\-8859\-1" =>
\&                                 (new MIME::WordDecoder::ISO_8859 "1"));
.Ve
.Sp
You should not override this method.
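.Sp
Since this module duplicates the standard Encode module (see
DESCRIPTION), decoding encoded words into a Perl character string can
also be done with Encode alone.  The following is a minimal sketch,
assuming only the core Encode module and its \*(AqMIME\-Header\*(Aq
encoding; the helper name \f(CWdecode_header_text\fR is hypothetical,
and MIME::WordDecoder is not used at all:
.Sp
.Vb 9
\&    use strict;
\&    use warnings;
\&    use Encode qw(decode);
\&
\&    # Hypothetical helper: decode RFC 2047 encoded words in a header
\&    # value into a Perl character string, using only core Encode.
\&    sub decode_header_text {
\&        my ($raw) = @_;
\&        return decode(\*(AqMIME\-Header\*(Aq, $raw);
\&    }
\&
\&    my $to = decode_header_text(\*(AqTo: =?ISO\-8859\-1?Q?Keld_J=F8rn_Simonsen?=\*(Aq);
.Ve
.Sp
Text outside the encoded words (here, \*(AqTo: \*(Aq) is passed through
unchanged.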
.IP "new [\e@HANDLERS]" 4
.IX Item "new [@HANDLERS]"
\&\fIClass method, constructor.\fR
If \e@HANDLERS is given, the CHARSET => SUBREF pairs in \f(CW@HANDLERS\fR
are passed to \fBhandler()\fR to initialize the internal map.
.IP "handler CHARSET=>\e&SUBREF, ..." 4
.IX Item "handler CHARSET=>&SUBREF, ..."
\&\fIInstance method.\fR
Set the handler SUBREF for a given CHARSET, for as many pairs
as you care to supply.
.Sp
When performing the translation of a MIME\-encoded string, a
given SUBREF will be invoked when translating a block of text
in character set CHARSET.  The subroutine will be invoked with
the following arguments:
.Sp
.Vb 5
\&    DATA    \- the data in the given character set.
\&    CHARSET \- the upcased character set name, which may prove useful
\&              if you are using the same SUBREF for multiple CHARSETs.
\&    DECODER \- the decoder itself, in case it holds configuration
\&              information that your handler function needs.
.Ve
.Sp
For example:
.Sp
.Vb 5
\&    $wd = new MIME::WordDecoder;
\&    $wd\->handler(\*(AqUS\-ASCII\*(Aq   => "KEEP");
\&    $wd\->handler(\*(AqISO\-8859\-1\*(Aq => \e&handle_latin1,
\&                 \*(AqISO\-8859\-2\*(Aq => \e&handle_latin1,
\&                 \*(Aq*\*(Aq          => "DIE");
.Ve
.Sp
Notice that, much as with \f(CW%SIG\fR, the SUBREF can also be taken
from a set of special keywords:
.Sp
.Vb 4
\&    KEEP     Pass data through unchanged.
\&    IGNORE   Ignore data in this character set, without warning.
\&    WARN     Ignore data in this character set, with warning.
\&    DIE      Throw a fatal exception with a "can\*(Aqt handle character set" message.
.Ve
.Sp
The subroutine for the special CHARSET of \*(Aqraw\*(Aq is used
for raw (non\-MIME\-encoded) text, which is expected to be US\-ASCII.
The handler for \*(Aqraw\*(Aq defaults to whatever was specified for
\*(AqUS\-ASCII\*(Aq at the time of construction.
.Sp
The subroutine for the special CHARSET of \*(Aq*\*(Aq is used for
any unrecognized character set.  The default action for \*(Aq*\*(Aq is WARN.
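.Sp
The handler example above refers to \f(CWhandle_latin1\fR without
defining it.  The sketch below supplies a hypothetical definition, plus a
hypothetical \f(CWresolve_handler\fR that illustrates one way a CHARSET
could be looked up in a handler map, including the special keywords and
the \*(Aq*\*(Aq fallback.  Neither is MIME::WordDecoder\*(Aqs own code: both
names are made up for illustration, and the behavior of IGNORE and WARN
(returning an empty string) is an assumption.  Only the core Encode
module is required:
.Sp
.Vb 10
\&    use strict;
\&    use warnings;
\&    use Encode ();
\&
\&    # Hypothetical handler: turn ISO\-8859\-1/\-2 bytes into a Perl string.
\&    # Arguments are (DATA, CHARSET, DECODER); DECODER is unused here.
\&    sub handle_latin1 {
\&        my ($data, $charset, $decoder) = @_;
\&        return Encode::decode($charset, $data);
\&    }
\&
\&    # Assumed meanings of the special keywords (sketch only).
\&    my %KEYWORD = (
\&        KEEP   => sub { $_[0] },
\&        IGNORE => sub { \*(Aq\*(Aq },
\&        WARN   => sub { warn "ignoring text in character set $_[1]\en"; \*(Aq\*(Aq },
\&        DIE    => sub { die "can\*(Aqt handle character set $_[1]\en" },
\&    );
\&
\&    # Hypothetical lookup: the entry for the upcased CHARSET, else the
\&    # \*(Aq*\*(Aq entry, else WARN; keywords are turned into subroutines.
\&    sub resolve_handler {
\&        my ($map, $charset) = @_;
\&        my $h = $map\->{uc $charset} // $map\->{\*(Aq*\*(Aq} // \*(AqWARN\*(Aq;
\&        return ref $h ? $h : $KEYWORD{$h};
\&    }
\&
\&    my %map = (\*(AqUS\-ASCII\*(Aq   => \*(AqKEEP\*(Aq,
\&               \*(AqISO\-8859\-1\*(Aq => \e&handle_latin1,
\&               \*(Aq*\*(Aq          => \*(AqDIE\*(Aq);
\&    my $text = resolve_handler(\e%map, \*(Aqiso\-8859\-1\*(Aq)\->("J\exF8rn", \*(AqISO\-8859\-1\*(Aq);
.Ve
.Sp
Delegating the byte\-to\-character conversion to Encode keeps the handler
to one line and covers any character set name that Encode recognizes;
Encode dies on names it does not know.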
.IP "decode STRING" 4
.IX Item "decode STRING"
\&\fIInstance method.\fR
Decode a STRING which might contain MIME\-encoded words into the
decoder\*(Aqs target representation (for example, Latin1 or a Perl
character string).
.IP "unmime STRING" 4
.IX Item "unmime STRING"
\&\fIFunction, exported.\fR
Decode the given STRING using the decoder returned by \fBdefault()\fR.
See \fBdefault()\fR.
.Sp
You should consider using the UTF\-8 decoder instead.  It decodes
MIME strings into Perl\*(Aqs internal string format.
.IP "mime_to_perl_string STRING" 4
.IX Item "mime_to_perl_string STRING"
\&\fIFunction, exported.\fR
Decode the given STRING into a Perl character (Unicode) string.
You should use this function in preference to all others.
.Sp
The result of mime_to_perl_string is likely to have Perl\*(Aqs
UTF8 flag set.
.SH SUBCLASSES
.IX Header "SUBCLASSES"
.IP MIME::WordDecoder::ISO_8859 4
.IX Item "MIME::WordDecoder::ISO_8859"
A simple decoder which keeps US\-ASCII text and the 7\-bit characters
of text in any ISO\-8859 character set or UTF\-8, and additionally keeps
the 8\-bit characters of the ISO\-8859 character set it was constructed for.
.Sp
.Vb 2
\&    ### Construct:
\&    $wd = new MIME::WordDecoder::ISO_8859 2;    ### ISO\-8859\-2
\&
\&    ### What to translate unknown characters to (can also use empty):
\&    ### Default is "?".
\&    $wd\->unknown("?");
\&
\&    ### Collapse runs of unknown characters to a single unknown()?
\&    ### Default is false.
\&    $wd\->collapse(1);
.Ve
.Sp
According to \fBhttp://czyborra.com/charsets/iso8859.html\fR
(ca. November 2000):
.Sp
ISO 8859 is a full series of 10 (and soon even more) standardized
multilingual single\-byte coded (8bit) graphic character sets for
writing in alphabetic languages:
.Sp
.Vb 10
\&     1. Latin1 (West European)
\&     2. Latin2 (East European)
\&     3. Latin3 (South European)
\&     4. Latin4 (North European)
\&     5. Cyrillic
\&     6. Arabic
\&     7. Greek
\&     8. Hebrew
\&     9. Latin5 (Turkish)
\&    10. Latin6 (Nordic)
.Ve
.Sp
The ISO 8859 charsets are not even remotely as complete as the truly great
Unicode but they have been around and usable for quite a while (first
registered Internet charsets for use with MIME) and have already offered
a major improvement over the plain 7bit US\-ASCII.
.Sp
Characters 0 to 127 are always identical with US\-ASCII and the positions
128 to 159 hold some less used control characters: the so\-called
C1 set from ISO 6429.
.IP MIME::WordDecoder::US_ASCII 4
.IX Item "MIME::WordDecoder::US_ASCII"
A subclass of the ISO\-8859\-1 decoder which discards 8\-bit characters.
You\*(Aqre probably better off using ISO\-8859\-1.
.SH "SEE ALSO"
.IX Header "SEE ALSO"
MIME::Tools
.SH AUTHOR
.IX Header "AUTHOR"
Eryq (\fIeryq@zeegee.com\fR), ZeeGee Software Inc (\fIhttp://www.zeegee.com\fR).
Dianne Skoll (\fIdianne@skoll.ca\fR)