.\" -*- mode: troff; coding: utf-8 -*-
.\" Automatically generated by Pod::Man 5.0102 (Pod::Simple 3.45)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
.ie n \{\
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "Mail::SpamAssassin::Pyzor::Digest::Pieces 3"
.TH Mail::SpamAssassin::Pyzor::Digest::Pieces 3 2024-09-01 "perl v5.40.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH NAME
Mail::SpamAssassin::Pyzor::Digest::Pieces \- Pyzor backend logic module
.SH DESCRIPTION
.IX Header "DESCRIPTION"
This module houses backend logic for Mail::SpamAssassin::Pyzor::Digest.
.PP
It reimplements logic found in pyzor's \fIdigest.py\fR module
(<https://github.com/SpamExperts/pyzor/blob/master/pyzor/digest.py>).
.SH FUNCTIONS
.IX Header "FUNCTIONS"
.ie n .SS "$strings_ar = digest_payloads( $EMAIL_MIME )"
.el .SS "\f(CW$strings_ar\fP = digest_payloads( \f(CW$EMAIL_MIME\fP )"
.IX Subsection "$strings_ar = digest_payloads( $EMAIL_MIME )"
This imitates the corresponding object method in \fIdigest.py\fR.
It returns a reference to an array of strings. Each string can be either
a byte string or a character string (e.g., UTF\-8 decoded).
.PP
NB: RFC 2822 stipulates that message bodies should use CRLF
line breaks, not plain LF (nor plain CR).
We thus convert any plain CR in a quoted-printable message
body into CRLF. Python does not do this, so the output of
our implementation of \f(CWdigest_payloads()\fR diverges from that of the Python
original. The divergence does not affect the final digest, because
line-ending whitespace is trimmed in either case, but it must be taken into
account when comparing the output of our implementation with Python's.
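.PP
The following standalone sketch illustrates why the CR conversion leaves the
final lines unchanged. It is a hypothetical illustration, not code from this
module: \f(CWlone_cr_to_crlf()\fR and \f(CWtrimmed_lines()\fR are made-up
helpers, and splitting on CRLF, CR, or LF is an assumption chosen to mirror
Python's line-break handling.
.PP
.Vb 23
\&    use strict;
\&    use warnings;
\&
\&    # Hypothetical helper (not part of this module): rewrite every CR
\&    # that is not already followed by LF as CRLF.
\&    sub lone_cr_to_crlf {
\&        my ($text) = @_;
\&        $text =~ s/\er(?!\en)/\er\en/g;
\&        return $text;
\&    }
\&
\&    # Hypothetical helper: split on CRLF, CR, or LF, then trim trailing
\&    # whitespace from each line.
\&    sub trimmed_lines {
\&        my ($text) = @_;
\&        return map { my $l = $_; $l =~ s/\es+\ez//; $l }
\&            split /\er\en|\er|\en/, $text;
\&    }
\&
\&    my $python_payload = "Hello\erworld \en";               # lone CR kept
\&    my $perl_payload   = lone_cr_to_crlf($python_payload);  # "Hello\er\enworld \en"
\&
\&    print "payloads ", ($python_payload eq $perl_payload ? "match" : "differ"), "\en";
\&    print join("|", trimmed_lines($python_payload)), "\en";   # Hello|world
\&    print join("|", trimmed_lines($perl_payload)),   "\en";   # Hello|world
.Ve
.PP
The raw payloads differ, but once they are split into lines and trimmed, both
yield the same lines.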
.ie n .SS "normalize( $STRING )"
.el .SS "normalize( \f(CW$STRING\fP )"
.IX Subsection "normalize( $STRING )"
This imitates the corresponding object method in \fIdigest.py\fR.
It modifies \f(CW$STRING\fR in place.
.PP
As with the original implementation, if \f(CW$STRING\fR contains decoded
Unicode characters, those characters are matched under Unicode rules; a
non-breaking space, for example, counts as whitespace. So:
.PP
.Vb 1
\&    $str = "123\exc2\exa0";   # octets c2 a0: UTF\-8 encoding of U+00A0 (non\-breaking space)
\&
\&    normalize($str);
.Ve
.PP
The above leaves \f(CW$str\fR unchanged, but this:
.PP
.Vb 1
\&    utf8::decode($str);
\&
\&    normalize($str);
.Ve
.PP
\&... will strip the trailing non-breaking space (the character that the last
two bytes encoded) from \f(CW$str\fR.
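.PP
The sketch below shows the Perl behavior that can produce this difference:
under default regex semantics, \f(CW\es\fR matches only ASCII whitespace in a
byte string but follows Unicode rules in a decoded character string. It is a
hypothetical illustration, not this module's \f(CWnormalize()\fR; the made-up
\f(CWstrip_ws()\fR helper models only a whitespace-removing substitution,
which is an assumption about the relevant step.
.PP
.Vb 14
\&    use strict;
\&    use warnings;
\&
\&    # Hypothetical helper: models only whitespace removal, in place.
\&    sub strip_ws { $_[0] =~ s/\es+//g; return }
\&
\&    my $octets = "123\exc2\exa0";   # byte string: UTF\-8 encoding of U+00A0
\&    strip_ws($octets);
\&    print length($octets), "\en";   # 5: neither octet matches \es
\&
\&    my $chars = "123\exc2\exa0";
\&    utf8::decode($chars);          # character string "123\ex{a0}"
\&    strip_ws($chars);
\&    print length($chars), "\en";    # 3: U+00A0 matches \es
.Ve
.PP
Under \f(CWuse feature \*(Aqunicode_strings\*(Aq\fR (enabled by
\f(CWuse v5.12\fR or later) or the \f(CW/u\fR modifier, the byte string would
also lose its 0xA0 octet, so the result depends on the pragmas in effect.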
.ie n .SS "$yn = should_handle_line( $STRING )"
.el .SS "\f(CW$yn\fP = should_handle_line( \f(CW$STRING\fP )"
.IX Subsection "$yn = should_handle_line( $STRING )"
This imitates the corresponding object method in \fIdigest.py\fR.
It returns a boolean indicating whether the line should be included in the
digest.
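.PP
A minimal sketch, assuming the check is a minimum-length test.
\f(CWshould_handle_line_sketch()\fR is hypothetical, and the 8\-character
threshold is an assumption modeled on pyzor's \f(CWmin_line_length\fR
setting; check \fIdigest.py\fR before relying on it.
.PP
.Vb 14
\&    use strict;
\&    use warnings;
\&
\&    # ASSUMPTION: threshold modeled on pyzor\*(Aqs min_line_length.
\&    use constant MIN_LINE_LENGTH => 8;
\&
\&    # Hypothetical stand\-in for should_handle_line().
\&    sub should_handle_line_sketch {
\&        my ($line) = @_;
\&        return length($line) >= MIN_LINE_LENGTH ? 1 : 0;
\&    }
\&
\&    my @kept = grep { should_handle_line_sketch($_) } ("short", "long enough line");
\&    print scalar(@kept), "\en";   # 1
.Ve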
.ie n .SS "$sr = assemble_lines( \e@LINES )"
.el .SS "\f(CW$sr\fP = assemble_lines( \e@LINES )"
.IX Subsection "$sr = assemble_lines( @LINES )"
This assembles a string buffer from the lines in \f(CW@LINES\fR (passed by
reference) and returns a reference to it. That buffer is the sequence of
octets that will be hashed to produce the message digest.
.PP
Each member of \f(CW@LINES\fR is expected to be an \fBoctet string\fR, not a
character string.
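.PP
A standalone sketch of this contract. \f(CWassemble_lines_sketch()\fR is
hypothetical: it assumes the lines are concatenated with no separator and that
the digest is SHA\-1 (as in pyzor); neither detail is confirmed by this
documentation. It also hints at one practical reason for the octet
requirement: the core \f(CWDigest::SHA\fR functions croak on characters above
0xFF.
.PP
.Vb 19
\&    use strict;
\&    use warnings;
\&    use Digest::SHA qw(sha1_hex);
\&
\&    # Hypothetical stand\-in for assemble_lines(). ASSUMPTION: the lines
\&    # are concatenated with no separator.
\&    sub assemble_lines_sketch {
\&        my ($lines_ar) = @_;
\&        for my $line (@$lines_ar) {
\&            # Enforce the octet\-string requirement.
\&            die "wide character in line\en" if $line =~ /[^\ex00\-\exff]/;
\&        }
\&        my $buf = join "", @$lines_ar;
\&        return \e$buf;
\&    }
\&
\&    my $sr = assemble_lines_sketch([ "ab", "c" ]);
\&    # ASSUMPTION: SHA\-1 hashes the buffer into the digest.
\&    print sha1_hex($$sr), "\en";   # a9993e364706816aba3e25717850c26c9cd0d89d
.Ve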
.ie n .SS "($main, $sub, $encoding, $checkval) = parse_content_type( $CONTENT_TYPE )"
.el .SS "($main, \f(CW$sub\fP, \f(CW$encoding\fP, \f(CW$checkval\fP) = parse_content_type( \f(CW$CONTENT_TYPE\fP )"
.IX Subsection "($main, $sub, $encoding, $checkval) = parse_content_type( $CONTENT_TYPE )"
.ie n .SS "@lines = splitlines( $TEXT )"
.el .SS "\f(CW@lines\fP = splitlines( \f(CW$TEXT\fP )"
.IX Subsection "@lines = splitlines( $TEXT )"
This imitates Python's \f(CW\*(C`str.splitlines()\*(C'\fR (cf. \f(CW\*(C`pydoc str\*(C'\fR).
.PP
In list context, it returns a plain list; in scalar context, it returns the
number of items that list would contain.
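.PP
A sketch of the same semantics, useful for checking expectations against
Python. \f(CWsplitlines_sketch()\fR is hypothetical, not this module's code;
its set of line boundaries follows Python's documentation for
\f(CWstr.splitlines()\fR, and it assumes \f(CW$TEXT\fR is a decoded character
string (on a byte string, the \f(CW\ex{85}\fR alternative would also match a
stray 0x85 octet).
.PP
.Vb 17
\&    use strict;
\&    use warnings;
\&
\&    # Hypothetical sketch of str.splitlines() (keepends=False).
\&    # ASSUMPTION: $text is a decoded character string.
\&    sub splitlines_sketch {
\&        my ($text) = @_;
\&        my @lines = split
\&            /\er\en|[\en\er\ex0b\ex0c\ex1c\ex1d\ex1e\ex{85}\ex{2028}\ex{2029}]/,
\&            $text, \-1;
\&        # LIMIT \-1 keeps trailing empty fields; Python drops only the one
\&        # produced by a final line break.
\&        pop @lines if @lines && $lines[\-1] eq "";
\&        return wantarray ? @lines : scalar @lines;
\&    }
\&
\&    my @parts = splitlines_sketch("one\er\entwo\erthree\en");  # ("one", "two", "three")
\&    my $count = splitlines_sketch("one\er\entwo\erthree\en");  # 3
.Ve
.PP
As with Python, \f(CW"a\en\en"\fR yields \f(CW("a", "")\fR and an empty
string yields an empty list.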