.\" -*- mode: troff; coding: utf-8 -*-
.\" Automatically generated by Pod::Man 5.0102 (Pod::Simple 3.45)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
.ie n \{\
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds C`
. ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
. if \nF \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. if !\nF==2 \{\
. nr % 0
. nr F 2
. \}
. \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "Pegex::Overview 3"
.TH Pegex::Overview 3 2024-09-01 "perl v5.40.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "What is Pegex?"
.IX Header "What is Pegex?"
Pegex is a Friendly, Acmeist, PEG Parser framework.
Friendly means that it is simple to create, understand, modify and maintain Pegex parsers.
Acmeist means that the parsers will work automatically in many programming languages, as long as the language has some kind of traditional "regex" support.
PEG (Parsing Expression Grammar) is a newer style of grammar definition syntax in the Recursive\-Descent/BNF family.
.PP
The name "Pegex" comes from PEG + Regex.
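.PP
The two halves of the name can be illustrated with a small, self\-contained Python sketch.
It is not Pegex's implementation, grammar syntax or API; the rule format (tuples tagged \f(CW"seq"\fR and \f(CW"any"\fR) and the \f(CWmatch_rule\fR function are assumptions made only for this example.
Rules are built top down from sequences and PEG ordered choices, and every leaf is a regex that the compiled pattern's \f(CWmatch(text, pos)\fR method tries only at the current position:
.PP
.Vb 34
import re

# Hypothetical rule format for this sketch only (not Pegex syntax):
# a compiled regex is a leaf, ("seq", [...]) is a sequence, and
# ("any", [...]) is a PEG ordered choice.
NUMBER = re.compile("[0123456789]+")
BOOLEAN = re.compile("true|false")
SPACE = re.compile(" *")
COMMA = re.compile(",")
VALUE = ("any", [NUMBER, BOOLEAN])
PAIR = ("seq", [VALUE, SPACE, COMMA, SPACE, VALUE])


def match_rule(rule, text, pos):
    """Return the offset just past a match of rule at pos, or None."""
    if isinstance(rule, re.Pattern):
        m = rule.match(text, pos)  # a leaf regex is tried only at pos
        return m.end() if m else None
    kind, parts = rule
    if kind == "seq":
        for part in parts:
            pos = match_rule(part, text, pos)
            if pos is None:
                return None
        return pos
    for part in parts:  # "any": the first alternative that matches wins
        end = match_rule(part, text, pos)
        if end is not None:
            return end
    return None


print(match_rule(PAIR, "12, true", 0))  # 8
print(match_rule(PAIR, "maybe, 1", 0))  # None
.Ve
.PP
Because \f(CWVALUE\fR is an ordered choice, \f(CWNUMBER\fR is tried first and the first alternative that succeeds is kept; the sketch never goes back to retry \f(CWBOOLEAN\fR later.
Repetition, lookahead, error reporting and a receiver are left out to keep it short.
.PP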
With Pegex you define top\-down grammars that eventually break down to regex fragments; that is, the low\-level parsing matches are always done with regexes applied at the current position in the input stream.
.SH "What is Parsing?"
.IX Header "What is Parsing?"
It may seem like a silly question, but it's important to understand what parsing is and what a parser can do for you.
At the most basic level, "parsing" is the act of reading through an input, making sense of it, and possibly doing something with what is found.
.PP
Usually a parser gets its instructions about what means what from something called a grammar.
A grammar is a set of rules that defines how the input must be structured.
In many parsing methodologies, the input is preprocessed (possibly into tokens) before the parser and grammar get to look at it.
Although this is a common method, it is not the only approach: Pegex, for instance, applies its regexes directly to the input text.
.SH "How Pegex Works"
.IX Header "How Pegex Works"
Pegex parsing consists of four distinct parts, or objects:
.IP Parser 4
.IX Item "Parser"
The Pegex parsing engine
.IP Grammar 4
.IX Item "Grammar"
The rules of a particular syntax
.IP Receiver 4
.IX Item "Receiver"
The logic for processing matches
.IP Input 4
.IX Item "Input"
Text conforming to the grammar rules
.PP
Quite simply, a parser object is created with a grammar object and a receiver object.
Then the parser object's \f(CWparse()\fR method is called with an input object.
The parser applies the rules of the grammar to the input and invokes methods of the receiver as the rules match.
The parse either succeeds or results in an error.
On success, the result is whatever the receiver object decides it should be.
.PP
For example, consider a parser that turns the Markdown text language into HTML.
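.PP
A toy version of that example, restricted to plain text and \f(CW*emphasis*\fR, shows how the four parts divide the work.
It is a plain Python sketch, not Pegex's API; the class names, the \f(CWgot_\fR\fIrule\fR receiver methods and the \f(CWfinal()\fR hook are assumptions made for this illustration.
The grammar is just an ordered list of named regexes, the parser walks the input and tries each rule at the current position, and the receiver decides what every match turns into:
.PP
.Vb 63
import html
import re


# Hypothetical names throughout; this is not the Pegex API.
class InlineGrammar:
    """Ordered rules for a tiny inline Markdown subset."""
    rules = [
        ("emphasis", re.compile("[*]([^*]+)[*]")),
        ("text", re.compile("([^*]+)")),
    ]


class HtmlReceiver:
    """Turns each match into an HTML fragment."""
    def got_emphasis(self, body):
        return "<em>" + html.escape(body) + "</em>"

    def got_text(self, body):
        return html.escape(body)

    def final(self, results):
        return "".join(results)


class TreeReceiver:
    """Same grammar, different result: a list of (rule, text) pairs."""
    def got_emphasis(self, body):
        return ("emphasis", body)

    def got_text(self, body):
        return ("text", body)

    def final(self, results):
        return results


class Parser:
    def __init__(self, grammar, receiver):
        self.grammar = grammar
        self.receiver = receiver

    def parse(self, text):
        pos, results = 0, []
        while pos < len(text):
            for name, pattern in self.grammar.rules:  # first match wins
                m = pattern.match(text, pos)          # anchored at pos
                if m:
                    break
            else:
                raise ValueError(f"no rule matches at offset {pos}")
            # Hand the captured groups to the receiver method for this rule.
            handler = getattr(self.receiver, "got_" + name)
            results.append(handler(*m.groups()))
            pos = m.end()
        return self.receiver.final(results)


source = "Pegex is *friendly* & fast"
print(Parser(InlineGrammar(), HtmlReceiver()).parse(source))
# Pegex is <em>friendly</em> &amp; fast
print(Parser(InlineGrammar(), TreeReceiver()).parse(source))
# [('text', 'Pegex is '), ('emphasis', 'friendly'), ('text', ' & fast')]
.Ve
.PP
Swapping \f(CWHtmlReceiver\fR for \f(CWTreeReceiver\fR changes the result while the grammar and the parser stay the same.
Input that no rule can match, such as an unclosed \f(CW*\fR, raises \f(CWValueError\fR.
A grammar whose rules could match the empty string would also need a guard against looping without making progress, and a real PEG grammar would let rules refer to other rules.
.PP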
In the simplest terms, using Pegex for that Markdown\-to\-HTML parser might look like this (pseudocode):
.PP
.Vb 5
\&    parser = new Pegex.Parser(
\&        grammar: new Markdown.Grammar,
\&        receiver: new Markdown.Receiver.HTML
\&    )
\&    html = parser.parse(markdown)
.Ve
.SH "See Also"
.IX Header "See Also"
.IP \(bu 4
Pegex::API
.IP \(bu 4
Pegex::Syntax
.IP \(bu 4
Pegex::Tutorial