PERLFAQ9(7)        Perl Programmers Reference Guide        PERLFAQ9(7)

NAME
       perlfaq9 - Networking (2003/01/31 17:36:57)

DESCRIPTION
       This section deals with questions related to networking, the
       internet, and a few on the web.

   What is the correct form of response from a CGI script?
       (Alan Flavell answers...)

       The Common Gateway Interface (CGI) specifies a software
       interface between a program ("CGI script") and a web server
       (HTTPD). It is not specific to Perl, and has its own FAQs,
       tutorials, and usenet group, comp.infosystems.www.authoring.cgi

       The original CGI specification is at:
       http://hoohoo.ncsa.uiuc.edu/cgi/

       Current best-practice RFC draft at: http://CGI-Spec.Golux.Com/

       Other relevant documentation listed in:
       http://www.perl.org/CGI_MetaFAQ.html

       These Perl FAQs cover only a few selected CGI issues. Perl
       programmers are strongly advised to use the CGI.pm module, which
       takes care of the details for them.

       The similarity between CGI response headers (defined in the CGI
       specification) and HTTP response headers (defined in the HTTP
       specification, RFC2616) is intentional, but can sometimes be
       confusing.

       The CGI specification defines two kinds of script: the "Parsed
       Header" script, and the "Non Parsed Header" (NPH) script. Check
       your server documentation to see what it supports. "Parsed
       Header" scripts are simpler in various respects. The CGI
       specification allows any of the usual newline representations in
       the CGI response (it's the server's job to create an accurate
       HTTP response based on it). So "\n" written in text mode is
       technically correct, and recommended. NPH scripts are trickier:
       they must put out a complete and accurate set of HTTP
       transaction response headers; the HTTP specification calls for
       records to be terminated with carriage-return and line-feed,
       i.e. ASCII \015\012 written in binary mode.

       Using CGI.pm gives excellent platform independence, including
       EBCDIC systems. CGI.pm selects an appropriate newline
       representation ($CGI::CRLF) and sets binmode as appropriate.
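       The difference between the two kinds of script can be sketched
       without any modules. This is only an illustration, not what
       CGI.pm does internally: the helper names parsed_header_response
       and nph_response are made up for this example, and the status
       line and headers are assumed values for a plain-text reply.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: response of a "Parsed Header" script.
# Plain "\n" is enough; the server builds the real HTTP response.
sub parsed_header_response {
    my ($body) = @_;
    return "Content-Type: text/plain\n\n" . $body;
}

# Hypothetical helper: response of an NPH script.
# Every header record ends in CR LF (\015\012), and the script
# itself must supply the HTTP status line.
sub nph_response {
    my ($body) = @_;
    my $CRLF    = "\015\012";
    my @headers = (
        "HTTP/1.0 200 OK",                    # assumed status line
        "Content-Type: text/plain",
        "Content-Length: " . length($body),   # assumes a byte string
    );
    return join($CRLF, @headers) . $CRLF . $CRLF . $body;
}

# Run directly, this behaves like a parsed-header CGI program.
print parsed_header_response("Hello, world\n") unless caller;
```

       An NPH script would call binmode(STDOUT) before printing the
       result of nph_response(), so that the \015\012 pairs reach the
       client unchanged.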
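   My CGI script runs from the command line but not the browser. (500
   Server Error)
       Several things could be wrong. You can go through the
       "Troubleshooting Perl CGI scripts" guide at
       http://www.perl.org/troubleshooting_CGI.html

       If, after that, you can show that you have read the FAQs and
       that your problem is not something simple, you will probably
       get a courteous and useful reply by posting to
       comp.infosystems.www.authoring.cgi (if the question concerns
       HTTP, HTML, or the CGI protocols). Questions that look like
       Perl questions but are really CGI questions are not well
       received when posted to comp.lang.perl.misc.

       The useful FAQs, related documents, and troubleshooting guides
       are listed in the CGI Meta FAQ:
       http://www.perl.org/CGI_MetaFAQ.html

   How can I get better error messages from a CGI program?
       Use the CGI::Carp module. It replaces "warn" and "die", plus
       the normal Carp module's "carp", "croak", and "confess"
       functions, with more verbose and safer versions. It still sends
       them to the normal server error log.

           use CGI::Carp;
           warn "This is a complaint";
           die "But this one is serious";

       The following use of CGI::Carp also redirects errors to a file
       of your choice, placed in a BEGIN block to catch compile-time
       warnings as well:

           BEGIN {
               use CGI::Carp qw(carpout);
               open(LOG, ">>/var/local/cgi-logs/mycgi-log")
                   or die "Unable to append to mycgi-log: $!\n";
               carpout(*LOG);
           }

       You can even arrange for fatal errors to go back to the client
       browser, which is nice for your own debugging, but might
       confuse the end user.

           use CGI::Carp qw(fatalsToBrowser);
           die "Bad error here";

       Even if the error happens before you get the HTTP header out,
       the module will try to take care of this to avoid the dreaded
       server 500 errors. Normal warnings still go out to the server
       error log (or wherever you've sent them with "carpout") with
       the application name and date stamp prepended.

   How do I remove HTML from a string?
       The most correct way (albeit not the fastest) is to use
       HTML::Parser from CPAN. Another mostly correct way is to use
       HTML::FormatText, which not only removes HTML but also attempts
       some simple formatting of the resulting plain text.

       Many folks attempt a simple-minded regular expression approach,
       like "s/<.*?>//g", but that fails in many cases because tags
       may continue over line breaks, they may contain quoted
       angle-brackets, or HTML comments may be present. Plus, folks
       forget to convert entities, such as "&lt;".

       Here's one "simple-minded" approach that works for most files:

           #!/usr/bin/perl -p0777
           s/<(?:[^>'"]*|(['"]).*?\1)*>//gs

       If you want a more complete solution, see the 3-stage striphtml
       program in
       http://www.cpan.org/authors/Tom_Christiansen/scripts/striphtml.gz .

       Here are some tricky cases that you should think about when
       picking a solution:

           <IMG SRC = "foo.gif" ALT = "A > B">

           <IMG SRC = "foo.gif"
                ALT = "A > B">

           <!-- <A comment> -->

           <script>if (a<b && a>c)</script>

           <# Just data #>

           <![INCLUDE CDATA [ >>>>>>>>>>>> ]]>

       If HTML comments include other tags, those solutions would also
       break on text like this:

           <!-- This section commented out.
               <B>You can't see me!</B>
           -->

   How do I extract URLs?
       You can easily extract all sorts of URLs from HTML with
       "HTML::SimpleLinkExtor", which handles anchors, images, objects,
       frames, and many other tags that can contain a URL. If you need
       anything more complex, you can create your own subclass of
       "HTML::LinkExtor" or "HTML::Parser". You might even use
       "HTML::SimpleLinkExtor" as an example for something specifically
       suited to your needs.

       You can use URI::Find to extract URLs from an arbitrary text
       document.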
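       A minimal sketch of the URI::Find approach follows. It assumes
       the URI::Find module from CPAN is installed; the wrapper name
       find_urls and the sample text are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI::Find;    # CPAN module, not part of the core distribution

# Hypothetical wrapper: return every URL found in a plain-text string.
sub find_urls {
    my ($text) = @_;
    my @found;

    # The callback runs once per URL. Its return value replaces the
    # matched text, so returning the original leaves $text unchanged.
    my $finder = URI::Find->new(sub {
        my ($uri, $original_text) = @_;
        push @found, "$uri";    # stringify the URI object
        return $original_text;
    });

    $finder->find(\$text);      # find() expects a reference to the text
    return @found;
}

# Run directly, list the URLs in a sample sentence.
unless (caller) {
    my $sample = "See http://www.perl.org/CGI_MetaFAQ.html for details";
    print "$_\n" for find_urls($sample);
}
```

       Because the callback's return value is substituted into the
       text, the same mechanism can also rewrite URLs in place, for
       example to wrap them in anchor tags.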
       Less complete solutions involving regular expressions can save
       you a lot of processing time if you know that the input is
       simple. One solution from Tom Christiansen runs 100 times
       faster than most module-based approaches, but it only extracts
       URLs from anchors where the first attribute is HREF and there
       are no other attributes.

           #!/usr/bin/perl -n00
           # qxurl - tchrist@perl.com
           print "$2\n" while m{
               < \s*
                 A \s+ HREF \s* = \s* (["']) (.*?) \1
               \s* >
           }gsix;

   How do I download a file from the user's machine? How do I open a
   file on another machine?
       In this case, download means to use the file upload feature of
       HTML forms. You allow the web surfer to specify a file to send
       to your web server. To you it looks like a download, and to the
       user it looks like an upload. No matter what you call it, you
       do it with what's known as multipart/form-data encoding. The
       CGI.pm module (which comes with Perl as part of the Standard
       Library) supports this in the start_multipart_form() method,
       which isn't the same as the startform() method.

       See the section in the CGI.pm documentation on file uploads for
       code examples and details.

   HTML ?