.\" -*- mode: troff; coding: utf-8 -*-
.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
.ie n \{\
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "HTML::Parse 3pm"
.TH HTML::Parse 3pm 2024-07-13 "perl v5.38.2" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH NAME
HTML::Parse \- Deprecated, a wrapper around HTML::TreeBuilder
.SH VERSION
.IX Header "VERSION"
This document describes version 5.07 of HTML::Parse, released
August 31, 2017 as part of HTML-Tree.
.SH SYNOPSIS
.IX Header "SYNOPSIS"
.Vb 1
\&  See the documentation for HTML::TreeBuilder
.Ve
.SH DESCRIPTION
.IX Header "DESCRIPTION"
Disclaimer: This module is provided only for backwards compatibility
with earlier versions of this library.
New code should \fInot\fR use this module; use the
HTML::Parser and HTML::TreeBuilder modules directly instead.
.PP
The \f(CW\*(C`HTML::Parse\*(C'\fR module provides functions to parse HTML documents.
This module exports two functions:
.ie n .IP "parse_html($html) or parse_html($html, $obj)" 4
.el .IP "parse_html($html) or parse_html($html, \f(CW$obj\fR)" 4
.IX Item "parse_html($html) or parse_html($html, $obj)"
This function is essentially a synonym for \f(CW$obj\fR\->parse($html), where
\f(CW$obj\fR is assumed to be an instance of a subclass of
\f(CW\*(C`HTML::Parser\*(C'\fR.  Refer to HTML::Parser for more documentation.
.Sp
If \f(CW$obj\fR is not specified, it defaults to an internally
created new \f(CW\*(C`HTML::TreeBuilder\*(C'\fR object configured with \fBstrict_comment()\fR
turned on.  That class implements a parser that builds (and is) an HTML
syntax tree with HTML::Element objects as nodes.
.Sp
The return value from \fBparse_html()\fR is \f(CW$obj\fR.
.IP "parse_htmlfile($file, [$obj])" 4
.IX Item "parse_htmlfile($file, [$obj])"
Same as \fBparse_html()\fR, but reads the HTML to parse from the named file.
.Sp
Returns \f(CW\*(C`undef\*(C'\fR if the file could not be opened, or \f(CW$obj\fR otherwise.
.PP
When a \f(CW\*(C`HTML::TreeBuilder\*(C'\fR object is created, the following
variables control how parsing takes place:
.ie n .IP $HTML::Parse::IMPLICIT_TAGS 4
.el .IP \f(CW$HTML::Parse::IMPLICIT_TAGS\fR 4
.IX Item "$HTML::Parse::IMPLICIT_TAGS"
Setting this variable to true instructs the parser to try to
deduce implicit elements and implicit end tags.  If this variable is
false, you get a parse tree that just reflects the text as it stands,
which might be useful for quick & dirty parsing.  Default is true.
.Sp
Implicit elements have the \fBimplicit()\fR attribute set.
.ie n .IP $HTML::Parse::IGNORE_UNKNOWN 4
.el .IP \f(CW$HTML::Parse::IGNORE_UNKNOWN\fR 4
.IX Item "$HTML::Parse::IGNORE_UNKNOWN"
This variable controls whether unknown tags are represented as
elements in the parse tree or ignored.  Default is true (unknown tags
are ignored).
.ie n .IP $HTML::Parse::IGNORE_TEXT 4
.el .IP \f(CW$HTML::Parse::IGNORE_TEXT\fR 4
.IX Item "$HTML::Parse::IGNORE_TEXT"
Do not represent the text content of elements.  This saves space if
all you want is to examine the structure of the document.  Default is
false.
.ie n .IP $HTML::Parse::WARN 4
.el .IP \f(CW$HTML::Parse::WARN\fR 4
.IX Item "$HTML::Parse::WARN"
Call \fBwarn()\fR with an appropriate message for syntax errors.  Default is
false.
.SH REMEMBER!
.IX Header "REMEMBER!"
HTML::TreeBuilder objects should be explicitly destroyed when you're
finished with them.  See HTML::TreeBuilder.
.SH "SEE ALSO"
.IX Header "SEE ALSO"
HTML::Parser, HTML::TreeBuilder, HTML::Element
.SH AUTHOR
.IX Header "AUTHOR"
Current maintainers:
.IP \(bu 4
Christopher J. Madsen
.IP \(bu 4
Jeff Fearn
.PP
Original HTML-Tree author:
.IP \(bu 4
Gisle Aas
.PP
Former maintainers:
.IP \(bu 4
Sean M. Burke
.IP \(bu 4
Andy Lester
.IP \(bu 4
Pete Krawczyk
.PP
You can follow or contribute to HTML-Tree's development at
\&.
.SH "COPYRIGHT AND LICENSE"
.IX Header "COPYRIGHT AND LICENSE"
Copyright 1995\-1998 Gisle Aas, 1999\-2004 Sean M. Burke,
2005 Andy Lester, 2006 Pete Krawczyk, 2010 Jeff Fearn,
2012 Christopher J. Madsen.
.PP
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
.PP
The programs in this library are distributed in the hope that they
will be useful, but without any warranty; without even the implied
warranty of merchantability or fitness for a particular purpose.