.\" -*- mode: troff; coding: utf-8 -*-
.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
.ie n \{\
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "RSSLite 3"
.TH RSSLite 3 2024-07-13 "perl v5.38.2" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH NAME
XML::RSSLite \- lightweight, "relaxed" RSS (and XML\-ish) parser
.SH SYNOPSIS
.IX Header "SYNOPSIS"
.Vb 1
\&  use XML::RSSLite;
\&
\&  parseRSS(\e%result, \e$content);
\&
\&  print "=== Channel ===\en",
\&        "Title: $result{\*(Aqtitle\*(Aq}\en",
\&        "Desc: $result{\*(Aqdescription\*(Aq}\en",
\&        "Link: $result{\*(Aqlink\*(Aq}\en\en";
\&
\&  foreach $item (@{$result{\*(Aqitems\*(Aq}}) {
\&    print " \-\-\- Item \-\-\-\en",
\&          " Title: $item\->{\*(Aqtitle\*(Aq}\en",
\&          " Desc: $item\->{\*(Aqdescription\*(Aq}\en",
\&          " Link: $item\->{\*(Aqlink\*(Aq}\en\en";
\&  }
.Ve
.SH DESCRIPTION
.IX Header "DESCRIPTION"
This module attempts to extract the maximum amount of content from
available documents, and is less concerned with XML compliance than
alternatives. Rather than rely on XML::Parser, it uses heuristics and good
old-fashioned Perl regular expressions. It stores the data in a simple
hash structure, and "aliases" certain tags so that when done, you can
count on having the minimal data necessary for re-constructing a valid
RSS file. This means you get the basic title, description, and link for a
channel and its items.
.PP
This module extracts more usable links by parsing "scriptingNews" and
"weblog" formats in addition to RDF & RSS. It also "sanitizes" the
output for best results.
The munging includes:
.IP "Remove html tags to leave plain text" 4
.IX Item "Remove html tags to leave plain text"
.PD 0
.IP "Remove leading whitespace from URIs" 4
.IX Item "Remove leading whitespace from URIs"
.IP "By default, strip characters except 0\-9~!@#$%^&*()\-+=a\-zA\-Z[];',.:""<>?\es" 4
.IX Item "By default, strip characters except 0-9~!@#$%^&*()-+=a-zA-Z[];',.:""<>?s"
.IP "Use <url> tags when <link> is empty" 4
.IX Item "Use <url> tags when <link> is empty"
.IP "Use misplaced urls in <title> when <link> is empty" 4
.IX Item "Use misplaced urls in <title> when <link> is empty"
.IP "Extract links from <a href=...> if required" 4
.IX Item "Extract links from <a href=...> if required"
.IP "Limit links to ftp and http(s)" 4
.IX Item "Limit links to ftp and http(s)"
.IP "Join relative item urls (beginning with / or #) to the site base" 4
.IX Item "Join relative item urls (beginning with / or #) to the site base"
.PD
.SS EXPORT
.IX Subsection "EXPORT"
.ie n .IP "parseRSS($outHashRef, $inScalarRef, [$strip])" 4
.el .IP "parseRSS($outHashRef, \f(CW$inScalarRef\fR, [$strip])" 4
.IX Item "parseRSS($outHashRef, $inScalarRef, [$strip])"
.RS 4
.PD 0
.IP "inScalarRef \- required" 4
.IX Item "inScalarRef - required"
.PD
Reference to a scalar containing the document to be parsed. NOTE: The
contents will effectively be destroyed. Make a deep copy first if you care.
.IP "outHashRef \- required" 4
.IX Item "outHashRef - required"
Reference to the hash within which to store the parsed content.
.IP "strip \- optional" 4
.IX Item "strip - optional"
An expression indicating the level of winnowing to be performed on the
characters permitted in the results.
.RS 4
.IP "1 strip non-printable characters" 4
.IX Item "1 strip non-printable characters"
.PD 0
.IP "0 no characters are removed" 4
.IX Item "0 no characters are removed"
.IP "undefined (Default) strip everything but:" 4
.IX Item "undefined (Default) strip everything but:"
.PD
0\-9~!@#$%^&*()\-+= a\-zA\-Z[];',.:"<>?\et\en
.RE
.RS 4
.RE
.RE
.RS 4
.RE
.SS EXPORTABLE
.IX Subsection "EXPORTABLE"
.ie n .IP "parseXML(\e%parsedTree, \e$parseThis, 'topTag', $comments);" 4
.el .IP "parseXML(\e%parsedTree, \e$parseThis, 'topTag', \f(CW$comments\fR);" 4
.IX Item "parseXML(%parsedTree, $parseThis, 'topTag', $comments);"
.RS 4
.PD 0
.IP "parsedTree \- required" 4
.IX Item "parsedTree - required"
.PD
Reference to hash to store the parsed document within.
.IP "parseThis \- required" 4
.IX Item "parseThis - required"
Reference to scalar containing the document to parse.
.IP "topTag \- optional" 4
.IX Item "topTag - optional"
Tag to consider the root node; leaving this undefined is not recommended.
.IP "comments \- optional" 4
.IX Item "comments - optional"
.RS 4
.PD 0
.IP "false will remove comments from parseThis" 4
.IX Item "false will remove comments from parseThis"
.IP "true will not remove comments from parseThis" 4
.IX Item "true will not remove comments from parseThis"
.IP "array reference is true, comments are stored here" 4
.IX Item "array reference is true, comments are stored here"
.RE
.RS 4
.RE
.RE
.RS 4
.RE
.PD
.SS CAVEATS
.IX Subsection "CAVEATS"
This is not a conforming parser. It does not handle the following:
.IP \(bu 4
.Sp
.Vb 1
\&   <foo bar=">">
.Ve
.IP \(bu 4
.Sp
.Vb 1
\&   <foo><bar> <bar></bar> <bar></bar> </bar></foo>
.Ve
.IP \(bu 4
.Sp
.Vb 1
\&   <![CDATA[ ]]>
.Ve
.IP \(bu 4
.Sp
.Vb 1
\&   PI
.Ve
.PP
It is non-validating; without a DTD, the following cannot be properly
addressed:
.IP entities 4
.IX Item "entities"
.PD 0
.IP namespaces 4
.IX Item "namespaces"
.PD
Support for these may or may not arrive in a future release.
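.PP
As a rough illustration of the first caveat, the sketch below shows how a
tag pattern that stops at the first ">" truncates an attribute value that
itself contains ">". The pattern and the variable names are hypothetical,
chosen for this example only; they are not taken from the XML::RSSLite
source.
.PP
.Vb 1
\&  use strict;
\&  use warnings;
\&
\&  # Hypothetical pattern for illustration; not the actual regex of the module.
\&  my $doc = \*(Aq<foo bar=">">text</foo>\*(Aq;
\&
\&  # [^>]* cannot tell a quoted ">" from the end of the tag.
\&  my ($attrs) = $doc =~ /<foo\es*([^>]*)>/;
\&
\&  print "attributes seen: [$attrs]\en";   # the value is cut off at bar="
.Ve
.PP
Any parser that scans for the next ">" without tracking quotes is likely
to misread such input in a similar way, so documents of this kind are
probably better handled by a conforming parser.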
.SH "SEE ALSO"
.IX Header "SEE ALSO"
\&\fBperl\fR\|(1), \f(CW\*(C`XML::RSS\*(C'\fR, \f(CW\*(C`XML::SAX::PurePerl\*(C'\fR,
\&\f(CW\*(C`XML::Parser::Lite\*(C'\fR, \f(CW\*(C`XML::Parser\*(C'\fR
.SH AUTHOR
.IX Header "AUTHOR"
Jerrad Pierce <jpierce@cpan.org>.
.PP
Scott Thomason <scott@thomasons.org>
.SH LICENSE
.IX Header "LICENSE"
Portions Copyright (c) 2002,2003,2009 Jerrad Pierce, (c) 2000 Scott Thomason.
All rights reserved. This program is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.