.\" -*- mode: troff; coding: utf-8 -*-
.\" Automatically generated by Pod::Man 5.0102 (Pod::Simple 3.45)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
.ie n \{\
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "Pegex::API 3"
.TH Pegex::API 3 2024-09-01 "perl v5.40.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "The Pegex API"
.IX Header "The Pegex API"
Pegex can be used in many ways: inside scripts, from the command line or as the
foundation of a modular parsing framework. This document details the various
ways to use Pegex.
.PP
At the most abstract level, Pegex works like this:
.PP
.Vb 1
\&    $result = $parser\->new($grammar, $receiver)\->parse($input);
.Ve
.PP
Which is to say, abstractly: a Pegex parser, under the direction of a Pegex
grammar, parses an input stream, and reports matches to a Pegex receiver, which
produces a result.
.PP
The parser, grammar, receiver and even the input are Pegex objects. These four
objects are involved in every Pegex parse operation, so let's review them
briefly:
.IP Pegex::Parser 4
.IX Item "Pegex::Parser"
The Pegex parsing engine. This engine applies the logic of the grammar to an
input text. A \fBparser\fR object contains a \fBgrammar\fR object and a \fBreceiver\fR
object. Its primary method is called \f(CW\*(C`parse\*(C'\fR. The default parser engine is
non-backtracking, recursive descent. However, there are parser subclasses for
various alternative types of parsing.
.IP Pegex::Grammar 4
.IX Item "Pegex::Grammar"
A Pegex grammar starts as a text file/string composed in the \fBPegex\fR syntax.
Before it can be used by a parser it must be compiled. After compilation, it is
turned into a data tree consisting of rules and regexes. In modules that are
based on a Pegex grammar, the grammar will be compiled into a class file. Pegex
itself uses a Pegex grammar class called Pegex::Pegex::Grammar to parse various
Pegex grammars.
.IP Pegex::Receiver 4
.IX Item "Pegex::Receiver"
A parser on its own has no idea what to do with the text it matches. A Pegex
\&\fBreceiver\fR is a class that contains methods corresponding to the rules in a
grammar. As a rule in the grammar matches, its corresponding receiver method
(if one exists) is called with the data that has been matched. It is the
receiver's job to take action on the data, often building it into some new
structure. Pegex will use Pegex::Tree::Wrap as the default receiver; it
produces a reasonably readable tree of the matched/captured data.
.IP Pegex::Input 4
.IX Item "Pegex::Input"
Pegex abstracts its input streams into an object interface as well. Any
operation that can take an input string can also take an input object. Pegex
will turn regular strings into these objects. This is probably the API concept
you will encounter the least, but it is covered here for completeness.
.PP
All of these object classes can be subclassed to achieve various results.
Normally, you will write your own Pegex grammar and a Pegex receiver to achieve
a task.
.ie n .SS "Starting Simple \- The ""pegex"" Function"
.el .SS "Starting Simple \- The \f(CWpegex\fP Function"
.IX Subsection "Starting Simple - The pegex Function"
The Pegex module exports a function called \f(CW\*(C`pegex\*(C'\fR that you can use for
smaller tasks. Here is an example:
.PP
.Vb 2
\&    use Pegex;
\&    use YAML;
\&
\&    $grammar = "
\&        expr: num PLUS num
\&        num: /( DIGIT+ )/
\&    ";
\&
\&    print Dump pegex($grammar)\->parse(\*(Aq2+2\*(Aq);
.Ve
.PP
This program would produce:
.PP
.Vb 3
\&    expr:
\&    \- num: 2
\&    \- num: 2
.Ve
.PP
Let's review what's happening here. The Pegex module is exporting a
\&\f(CW\*(C`pegex\*(C'\fR function. This function takes a Pegex grammar string as input.
Internally this function compiles the grammar string into a grammar object.
Then it creates a parser object containing the grammar object and returns it.
.PP
The parse method is called on the input string: \f(CW\*(Aq2+2\*(Aq\fR. The string matches,
and a nice data structure is returned.
.PP
So how was the data structure created? By the receiver object, of course! But
we didn't specify one, did we? Nope. It used the default receiver,
Pegex::Tree::Wrap. We could have said:
.PP
.Vb 1
\&    print Dump pegex($grammar, \*(AqPegex::Tree::Wrap\*(Aq)\->parse(\*(Aq2+2\*(Aq);
.Ve
.PP
This receiver basically generates a mapping, where rule names of matches are
the keys, and the leaf values are the regex captures.
.PP
The more basic receiver called Pegex::Tree generates a tree of sequences that
contain just the data (without the rule names). This code:
.PP
.Vb 1
\&    print Dump pegex($grammar, \*(AqPegex::Tree\*(Aq)\->parse(\*(Aq2+2\*(Aq);
.Ve
.PP
would produce:
.PP
.Vb 2
\&    \- 2
\&    \- 2
.Ve
.PP
If we wrote our own receiver class called \f(CW\*(C`Calculator\*(C'\fR like this:
.PP
.Vb 2
\&    package Calculator;
\&    use base \*(AqPegex::Tree\*(Aq;
\&
\&    sub got_expr {
\&        my ($receiver, $data) = @_;
\&        my ($a, $b) = @$data;
\&        return $a + $b;
\&    }
.Ve
.PP
Then, this:
.PP
.Vb 1
\&    print pegex($grammar, \*(AqCalculator\*(Aq)\->parse(\*(Aq2+2\*(Aq);
.Ve
.PP
would print:
.PP
.Vb 1
\&    4
.Ve
.SS "More Explicit Usage"
.IX Subsection "More Explicit Usage"
Continuing with the example above, let's see how to do it a little more
formally.
.PP
.Vb 5
\&    use Pegex::Parser;
\&    use Pegex::Grammar;
\&    use Pegex::Tree;
\&    use Pegex::Input;
\&    use YAML;
\&
\&    $grammar_text = "
\&        expr: num PLUS num
\&        num: /( DIGIT+ )/
\&    ";
\&
\&    $grammar = Pegex::Grammar\->new(text => $grammar_text);
\&    $receiver = Pegex::Tree\->new();
\&    $parser = Pegex::Parser\->new(
\&        grammar => $grammar,
\&        receiver => $receiver,
\&    );
\&    $input = Pegex::Input\->new(string => \*(Aq2+2\*(Aq);
\&
\&    print Dump $parser\->parse($input);
.Ve
.PP
This code does the same thing as the first example, but this time we've made
all the objects ourselves.
.SS "Precompiled Grammars"
.IX Subsection "Precompiled Grammars"
If you ship a Pegex grammar as part of a CPAN distribution, you'll want it to
be precompiled into a module. Pegex makes that easy.
.PP
Say the grammar text above is stored in a file called \f(CW\*(C`share/expr.pgx\*(C'\fR.
If you create a module called \f(CW\*(C`lib/MyThing/Grammar.pm\*(C'\fR with content like
this:
.PP
.Vb 6
\&    package MyThing::Grammar;
\&    use base \*(AqPegex::Grammar\*(Aq;
\&    use constant file => \*(Aq./share/expr.pgx\*(Aq;
\&    sub make_tree {
\&    }
\&    1;
.Ve
.PP
Then run this command line:
.PP
.Vb 1
\&    perl \-Ilib \-MMyThing::Grammar=compile
.Ve
.PP
It will rewrite your module to look something like this:
.PP
.Vb 10
\&    package MyThing::Grammar;
\&    use base \*(AqPegex::Grammar\*(Aq;
\&    use constant file => \*(Aq./share/expr.pgx\*(Aq;
\&    sub make_tree {
\&      { \*(Aq+toprule\*(Aq => \*(Aqexpr\*(Aq,
\&        \*(AqPLUS\*(Aq => { \*(Aq.rgx\*(Aq => qr/\eG\e+/ },
\&        \*(Aqexpr\*(Aq => {
\&          \*(Aq.all\*(Aq => [
\&            { \*(Aq.ref\*(Aq => \*(Aqnum\*(Aq },
\&            { \*(Aq.ref\*(Aq => \*(AqPLUS\*(Aq },
\&            { \*(Aq.ref\*(Aq => \*(Aqnum\*(Aq }
\&          ]
\&        },
\&        \*(Aqnum\*(Aq => { \*(Aq.rgx\*(Aq => qr/\eG([0\-9]+)/ }
\&      }
\&    }
\&    1;
.Ve
.PP
This command found the file where your grammar is, compiled it, and used
Data::Dumper to output it back into your module's \f(CW\*(C`make_tree\*(C'\fR method.
.PP
This is what a compiled Pegex grammar looks like. As soon as this module is
loaded, the grammar is ready to be used by Pegex.
.PP
\fIAutomatically rebuilding during development with environment variable\fR
.IX Subsection "Automatically rebuilding during development with environment variable"
.PP
If you find yourself needing to compile your grammar module a lot during
development, just set this environment variable like so:
.PP
.Vb 1
\&    export PERL_PEGEX_AUTO_COMPILE=MyThing::Grammar
.Ve
.PP
Now, every time the grammar module is loaded it will check to see if it needs
to be recompiled, and do it on the fly.
.PP
If you have more than one grammar to recompile, just list all the names
separated by commas.
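.PP
As a sketch of the comma\-separated form, assuming a second grammar module with
the hypothetical name \f(CW\*(C`MyThing::Other::Grammar\*(C'\fR (it is not part of the
example above and stands in for whatever other grammar class you maintain):
.PP
.Vb 2
\&    # MyThing::Other::Grammar is a hypothetical second grammar module
\&    export PERL_PEGEX_AUTO_COMPILE=MyThing::Grammar,MyThing::Other::Grammar
.Ve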
.PP
\fIAutomatically rebuilding during development using \fR\f(CI\*(C`make\*(C'\fR
.IX Subsection "Automatically rebuilding during development using make"
.PP
Alternatively, if your module uses \f(CW\*(C`ExtUtils::MakeMaker\*(C'\fR, you can have
\&\f(CW\*(C`make\*(C'\fR automatically rebuild your \f(CW\*(C`Grammar\*(C'\fR class if your \f(CW\*(C`.pgx\*(C'\fR file
is updated.
.PP
Simply add this at the bottom of your \f(CW\*(C`Makefile.PL\*(C'\fR:
.PP
.Vb 6
\&    sub MY::postamble {
\&    <<EOF;
\&    lib/MyThing/Grammar.pm : share/expr.pgx
\&    \et\e$(PERL) \-Ilib \-MMyThing::Grammar=compile
\&    EOF
\&    }
.Ve
.SH "See Also"
.IX Header "See Also"
.IP \(bu 4
Pegex::Parser
.IP \(bu 4
Pegex::Grammar
.IP \(bu 4
Pegex::Receiver
.IP \(bu 4
Pegex::Tree
.IP \(bu 4
Pegex::Tree::Wrap
.IP \(bu 4
Pegex::Input