Mail::SpamAssassin::Pyzor::Digest::Pieces(3) User Contributed Perl Documentation Mail::SpamAssassin::Pyzor::Digest::Pieces(3)

Mail::SpamAssassin::Pyzor::Digest::Pieces - Pyzor backend logic module

This module houses backend logic for Mail::SpamAssassin::Pyzor::Digest.

It reimplements logic found in pyzor's digest.py module (https://github.com/SpamExperts/pyzor/blob/master/pyzor/digest.py).

digest_payloads() imitates the corresponding object method in digest.py. It returns a reference to an array of strings. Each string can be either a byte string or a character string (e.g., UTF-8 decoded).

NB: RFC 2822 stipulates that message bodies should use CRLF line breaks, not plain LF (nor plain CR). This implementation therefore converts any plain CR in a quoted-printable message body into CRLF. Python does not do this, so the output of our implementation of digest_payloads() diverges from that of the Python original. The difference does not affect the final digest, since line-ending whitespace gets trimmed regardless, but it must be factored in when comparing the output of our implementation with the Python output.
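The CR handling described above can be sketched as follows. This is only an illustration: bare_cr_to_crlf() is a hypothetical helper name, not a function this module is documented to export, and the module's actual conversion code may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration only).
sub bare_cr_to_crlf {
    my ($body) = @_;

    # Replace each CR that is not already followed by LF with CRLF.
    # Existing CRLF pairs are left untouched by the negative lookahead.
    $body =~ s/\r(?!\n)/\r\n/g;

    return $body;
}

my $converted = bare_cr_to_crlf("one\rtwo\r\nthree\r");
```

When comparing against the Python output, a payload that contained a plain CR will carry an extra LF at that position on the Perl side.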

normalize() imitates the corresponding object method in digest.py. It modifies $STRING in place.

As with the original implementation, if $STRING contains (decoded) Unicode characters, those characters will be parsed accordingly. So:

    $str = "123\xc2\xa0";   # [ c2 a0 ] == \u00a0, non-breaking space
    normalize($str);

The above will leave $str alone, but this:

    utf8::decode($str);
    normalize($str);

... will trim the trailing non-breaking space from $str, i.e., the character that the last two bytes (c2 a0) encoded before decoding.
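The byte-versus-character distinction can be reproduced without this module. The sketch below uses a plain trailing-whitespace regex as a stand-in; trim_trailing_space() is a hypothetical name, and it is not claimed to match everything the real normalize() does. It assumes Perl's default regex semantics (no "unicode_strings" feature and no /u or /a modifier).

```perl
use strict;
use warnings;

# Hypothetical stand-in for the whitespace-trimming part of the behavior.
sub trim_trailing_space {
    my ($s) = @_;
    $s =~ s/\s+\z//;
    return $s;
}

my $bytes = "123\xc2\xa0";    # UTF-8 encoded octets, 5 bytes
my $chars = $bytes;
utf8::decode($chars);         # now 4 characters, ending in U+00A0

# In the undecoded string, neither 0xC2 nor 0xA0 counts as whitespace
# under the default rules, so nothing is trimmed.
my $trimmed_bytes = trim_trailing_space($bytes);

# In the decoded string, U+00A0 is matched as whitespace and removed.
my $trimmed_chars = trim_trailing_space($chars);

printf "%d %d\n", length($trimmed_bytes), length($trimmed_chars);
```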

This imitates the corresponding object method in digest.py. It returns a boolean.

This assembles a string buffer out of @LINES. The string is the buffer of octets that will be hashed to produce the message digest.

Each member of @LINES is expected to be an octet string, not a character string.
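If the lines at hand are character strings, they need to be encoded to octets first. The sketch below shows one way to do that with the core Encode module. The plain join and the SHA-1 call via Digest::SHA are assumptions used only to show that the resulting buffer is hashable; they are not a description of how this module assembles or hashes its buffer.

```perl
use strict;
use warnings;
use Encode ();
use Digest::SHA ();

# Character strings (decoded text), e.g. as produced by a MIME decoder.
my @char_lines = ( "caf\x{e9}", "\x{263a} smile" );

# Encode each line to UTF-8 octets before handing it on.
my @octet_lines = map { Encode::encode( 'UTF-8', $_ ) } @char_lines;

# Illustration only: concatenate the octets and hash them.
# Hashing the undecoded character strings instead would fail on
# the wide character U+263A.
my $buffer = join '', @octet_lines;
my $hex    = Digest::SHA::sha1_hex($buffer);
```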

Imitates "str.splitlines()". (cf. "pydoc str")

In list context, returns a plain list. In scalar context, returns the number of items that would have been returned in list context.
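A minimal sketch of this behavior follows. sketch_splitlines() is a hypothetical name, and the sketch splits only on CRLF, LF, and CR (the separators Python uses for byte strings); the module's own function may recognize more line boundaries.

```perl
use strict;
use warnings;

# Hypothetical stand-in (an assumption for illustration only).
sub sketch_splitlines {
    my ($text) = @_;

    # A limit of -1 keeps trailing empty fields, so blank lines survive.
    my @lines = split /\r\n|\n|\r/, $text, -1;

    # A final line break leaves one empty trailing field; Python's
    # splitlines() does not report it, so drop it here as well.
    pop @lines if @lines && $lines[-1] eq '';

    # List in list context, item count in scalar context.
    return wantarray ? @lines : scalar @lines;
}

my @list  = sketch_splitlines("a\r\nb\rc\n");
my $count = sketch_splitlines("a\r\nb\rc\n");
```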

2024-09-01 perl v5.40.0