| <html> | 
 | <head> | 
 | <title>PLY Internals</title> | 
 | </head> | 
 | <body bgcolor="#ffffff"> | 
 |  | 
 | <h1>PLY Internals</h1> | 
 |   | 
 | <b> | 
 | David M. Beazley <br> | 
 | dave@dabeaz.com<br> | 
 | </b> | 
 |  | 
 | <p> | 
 | <b>PLY Version: 3.0</b> | 
 | <p> | 
 |  | 
 | <!-- INDEX --> | 
 | <div class="sectiontoc"> | 
 | <ul> | 
 | <li><a href="#internal_nn1">Introduction</a> | 
 | <li><a href="#internal_nn2">Grammar Class</a> | 
 | <li><a href="#internal_nn3">Productions</a> | 
 | <li><a href="#internal_nn4">LRItems</a> | 
 | <li><a href="#internal_nn5">LRTable</a> | 
 | <li><a href="#internal_nn6">LRGeneratedTable</a> | 
 | <li><a href="#internal_nn7">LRParser</a> | 
 | <li><a href="#internal_nn8">ParserReflect</a> | 
 | <li><a href="#internal_nn9">High-level operation</a> | 
 | </ul> | 
 | </div> | 
 | <!-- INDEX --> | 
 |  | 
 |  | 
 | <H2><a name="internal_nn1"></a>1. Introduction</H2> | 
 |  | 
 |  | 
 | This document describes classes and functions that make up the internal | 
 | operation of PLY.  Using this programming interface, it is possible to | 
 | manually build an parser using a different interface specification | 
 | than what PLY normally uses.  For example, you could build a gramar | 
 | from information parsed in a completely different input format.  Some of | 
 | these objects may be useful for building more advanced parsing engines | 
 | such as GLR. | 
 |  | 
 | <p> | 
 | It should be stressed that using PLY at this level is not for the | 
 | faint of heart.  Generally, it's assumed that you know a bit of | 
 | the underlying compiler theory and how an LR parser is put together. | 
 |  | 
 | <H2><a name="internal_nn2"></a>2. Grammar Class</H2> | 
 |  | 
 |  | 
 | The file <tt>ply.yacc</tt> defines a class <tt>Grammar</tt> that  | 
 | is used to hold and manipulate information about a grammar | 
 | specification.   It encapsulates the same basic information | 
 | about a grammar that is put into a YACC file including  | 
 | the list of tokens, precedence rules, and grammar rules.  | 
 | Various operations are provided to perform different validations | 
 | on the grammar.  In addition, there are operations to compute | 
 | the first and follow sets that are needed by the various table | 
 | generation algorithms. | 
 |  | 
 | <p> | 
 | <tt><b>Grammar(terminals)</b></tt> | 
 |  | 
 | <blockquote> | 
 | Creates a new grammar object.  <tt>terminals</tt> is a list of strings | 
 | specifying the terminals for the grammar.  An instance <tt>g</tt> of | 
 | <tt>Grammar</tt> has the following methods: | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>g.set_precedence(term,assoc,level)</tt></b> | 
 | <blockquote> | 
 | Sets the precedence level and associativity for a given terminal <tt>term</tt>.   | 
 | <tt>assoc</tt> is one of <tt>'right'</tt>, | 
 | <tt>'left'</tt>, or <tt>'nonassoc'</tt> and <tt>level</tt> is a positive integer.  The higher | 
 | the value of <tt>level</tt>, the higher the precedence.  Here is an example of typical | 
 | precedence settings: | 
 |  | 
 | <pre> | 
 | g.set_precedence('PLUS',  'left',1) | 
 | g.set_precedence('MINUS', 'left',1) | 
 | g.set_precedence('TIMES', 'left',2) | 
 | g.set_precedence('DIVIDE','left',2) | 
 | g.set_precedence('UMINUS','left',3) | 
 | </pre> | 
 |  | 
 | This method must be called prior to adding any productions to the | 
 | grammar with <tt>g.add_production()</tt>.  The precedence of individual grammar | 
 | rules is determined by the precedence of the right-most terminal. | 
 |  | 
 | </blockquote> | 
 | <p> | 
 | <b><tt>g.add_production(name,syms,func=None,file='',line=0)</tt></b> | 
 | <blockquote> | 
 | Adds a new grammar rule.  <tt>name</tt> is the name of the rule, | 
 | <tt>syms</tt> is a list of symbols making up the right hand | 
 | side of the rule, <tt>func</tt> is the function to call when | 
 | reducing the rule.   <tt>file</tt> and <tt>line</tt> specify | 
 | the filename and line number of the rule and are used for | 
 | generating error messages.     | 
 |  | 
 | <p> | 
 | The list of symbols in <tt>syms</tt> may include character | 
 | literals and <tt>%prec</tt> specifiers.  Here are some | 
 | examples: | 
 |  | 
 | <pre> | 
 | g.add_production('expr',['expr','PLUS','term'],func,file,line) | 
 | g.add_production('expr',['expr','"+"','term'],func,file,line) | 
 | g.add_production('expr',['MINUS','expr','%prec','UMINUS'],func,file,line) | 
 | </pre> | 
 |  | 
 | <p> | 
 | If any kind of error is detected, a <tt>GrammarError</tt> exception | 
 | is raised with a message indicating the reason for the failure. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>g.set_start(start=None)</tt></b> | 
 | <blockquote> | 
 | Sets the starting rule for the grammar.  <tt>start</tt> is a string | 
 | specifying the name of the start rule.   If <tt>start</tt> is omitted, | 
 | the first grammar rule added with <tt>add_production()</tt> is taken to be | 
 | the starting rule.  This method must always be called after all | 
 | productions have been added. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>g.find_unreachable()</tt></b> | 
 | <blockquote> | 
 | Diagnostic function.  Returns a list of all unreachable non-terminals | 
 | defined in the grammar.  This is used to identify inactive parts of | 
 | the grammar specification. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>g.infinite_cycle()</tt></b> | 
 | <blockquote> | 
 | Diagnostic function.  Returns a list of all non-terminals in the | 
 | grammar that result in an infinite cycle.  This condition occurs if | 
 | there is no way for a grammar rule to expand to a string containing | 
 | only terminal symbols. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>g.undefined_symbols()</tt></b> | 
 | <blockquote> | 
 | Diagnostic function.  Returns a list of tuples <tt>(name, prod)</tt> | 
 | corresponding to undefined symbols in the grammar. <tt>name</tt> is the | 
 | name of the undefined symbol and <tt>prod</tt> is an instance of  | 
 | <tt>Production</tt> which has information about the production rule | 
 | where the undefined symbol was used. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>g.unused_terminals()</tt></b> | 
 | <blockquote> | 
 | Diagnostic function.  Returns a list of terminals that were defined, | 
 | but never used in the grammar. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>g.unused_rules()</tt></b> | 
 | <blockquote> | 
 | Diagnostic function.  Returns a list of <tt>Production</tt> instances | 
 | corresponding to production rules that were defined in the grammar, | 
 | but never used anywhere.  This is slightly different | 
 | than <tt>find_unreachable()</tt>. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>g.unused_precedence()</tt></b> | 
 | <blockquote> | 
 | Diagnostic function.  Returns a list of tuples <tt>(term, assoc)</tt>  | 
 | corresponding to precedence rules that were set, but never used the | 
 | grammar.  <tt>term</tt> is the terminal name and <tt>assoc</tt> is the | 
 | precedence associativity (e.g., <tt>'left'</tt>, <tt>'right'</tt>,  | 
 | or <tt>'nonassoc'</tt>. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>g.compute_first()</tt></b> | 
 | <blockquote> | 
 | Compute all of the first sets for all symbols in the grammar.  Returns a dictionary | 
 | mapping symbol names to a list of all first symbols. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>g.compute_follow()</tt></b> | 
 | <blockquote> | 
 | Compute all of the follow sets for all non-terminals in the grammar. | 
 | The follow set is the set of all possible symbols that might follow a | 
 | given non-terminal.  Returns a dictionary mapping non-terminal names | 
 | to a list of symbols. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>g.build_lritems()</tt></b> | 
 | <blockquote> | 
 | Calculates all of the LR items for all productions in the grammar.  This | 
 | step is required before using the grammar for any kind of table generation. | 
 | See the section on LR items below. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | The following attributes are set by the above methods and may be useful | 
 | in code that works with the grammar.  All of these attributes should be | 
 | assumed to be read-only.  Changing their values directly will likely  | 
 | break the grammar. | 
 |  | 
 | <p> | 
 | <b><tt>g.Productions</tt></b> | 
 | <blockquote> | 
 | A list of all productions added.  The first entry is reserved for | 
 | a production representing the starting rule.  The objects in this list | 
 | are instances of the <tt>Production</tt> class, described shortly. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>g.Prodnames</tt></b> | 
 | <blockquote> | 
 | A dictionary mapping the names of nonterminals to a list of all | 
 | productions of that nonterminal. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>g.Terminals</tt></b> | 
 | <blockquote> | 
 | A dictionary mapping the names of terminals to a list of the | 
 | production numbers where they are used. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>g.Nonterminals</tt></b> | 
 | <blockquote> | 
 | A dictionary mapping the names of nonterminals to a list of the | 
 | production numbers where they are used. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>g.First</tt></b> | 
 | <blockquote> | 
 | A dictionary representing the first sets for all grammar symbols.  This is | 
 | computed and returned by the <tt>compute_first()</tt> method. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>g.Follow</tt></b> | 
 | <blockquote> | 
 | A dictionary representing the follow sets for all grammar rules.  This is | 
 | computed and returned by the <tt>compute_follow()</tt> method. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>g.Start</tt></b> | 
 | <blockquote> | 
 | Starting symbol for the grammar.  Set by the <tt>set_start()</tt> method. | 
 | </blockquote> | 
 |  | 
 | For the purposes of debugging, a <tt>Grammar</tt> object supports the <tt>__len__()</tt> and | 
 | <tt>__getitem__()</tt> special methods.  Accessing <tt>g[n]</tt> returns the nth production | 
 | from the grammar. | 
 |  | 
 |  | 
 | <H2><a name="internal_nn3"></a>3. Productions</H2> | 
 |  | 
 |  | 
 | <tt>Grammar</tt> objects store grammar rules as instances of a <tt>Production</tt> class.  This | 
 | class has no public constructor--you should only create productions by calling <tt>Grammar.add_production()</tt>. | 
 | The following attributes are available on a <tt>Production</tt> instance <tt>p</tt>. | 
 |  | 
 | <p> | 
 | <b><tt>p.name</tt></b> | 
 | <blockquote> | 
 | The name of the production. For a grammar rule such as <tt>A : B C D</tt>, this is <tt>'A'</tt>. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.prod</tt></b> | 
 | <blockquote> | 
 | A tuple of symbols making up the right-hand side of the production.  For a grammar rule such as <tt>A : B C D</tt>, this is <tt>('B','C','D')</tt>. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.number</tt></b> | 
 | <blockquote> | 
 | Production number.  An integer containing the index of the production in the grammar's <tt>Productions</tt> list. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.func</tt></b> | 
 | <blockquote> | 
 | The name of the reduction function associated with the production. | 
 | This is the function that will execute when reducing the entire | 
 | grammar rule during parsing. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.callable</tt></b> | 
 | <blockquote> | 
 | The callable object associated with the name in <tt>p.func</tt>.  This is <tt>None</tt> | 
 | unless the production has been bound using <tt>bind()</tt>. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.file</tt></b> | 
 | <blockquote> | 
 | Filename associated with the production.  Typically this is the file where the production was defined.  Used for error messages. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.lineno</tt></b> | 
 | <blockquote> | 
 | Line number associated with the production.  Typically this is the line number in <tt>p.file</tt> where the production was defined.  Used for error messages. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.prec</tt></b> | 
 | <blockquote> | 
 | Precedence and associativity associated with the production.  This is a tuple <tt>(assoc,level)</tt> where | 
 | <tt>assoc</tt> is one of <tt>'left'</tt>,<tt>'right'</tt>, or <tt>'nonassoc'</tt> and <tt>level</tt> is | 
 | an integer.   This value is determined by the precedence of the right-most terminal symbol in the production | 
 | or by use of the <tt>%prec</tt> specifier when adding the production. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.usyms</tt></b> | 
 | <blockquote> | 
 | A list of all unique symbols found in the production. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.lr_items</tt></b> | 
 | <blockquote> | 
 | A list of all LR items for this production.  This attribute only has a meaningful value if the | 
 | <tt>Grammar.build_lritems()</tt> method has been called.  The items in this list are  | 
 | instances of <tt>LRItem</tt> described below. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.lr_next</tt></b> | 
 | <blockquote> | 
 | The head of a linked-list representation of the LR items in <tt>p.lr_items</tt>.   | 
 | This attribute only has a meaningful value if the <tt>Grammar.build_lritems()</tt>  | 
 | method has been called.  Each <tt>LRItem</tt> instance has a <tt>lr_next</tt> attribute | 
 | to move to the next item.  The list is terminated by <tt>None</tt>. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.bind(dict)</tt></b> | 
 | <blockquote> | 
 | Binds the production function name in <tt>p.func</tt> to a callable object in  | 
 | <tt>dict</tt>.   This operation is typically carried out in the last step | 
 | prior to running the parsing engine and is needed since parsing tables are typically | 
 | read from files which only include the function names, not the functions themselves. | 
 | </blockquote> | 
 |  | 
 | <P> | 
 | <tt>Production</tt> objects support | 
 | the <tt>__len__()</tt>, <tt>__getitem__()</tt>, and <tt>__str__()</tt> | 
 | special methods. | 
 | <tt>len(p)</tt> returns the number of symbols in <tt>p.prod</tt> | 
 | and <tt>p[n]</tt> is the same as <tt>p.prod[n]</tt>.  | 
 |  | 
 | <H2><a name="internal_nn4"></a>4. LRItems</H2> | 
 |  | 
 |  | 
 | The construction of parsing tables in an LR-based parser generator is primarily | 
 | done over a set of "LR Items".   An LR item represents a stage of parsing one | 
 | of the grammar rules.   To compute the LR items, it is first necessary to | 
 | call <tt>Grammar.build_lritems()</tt>.  Once this step, all of the productions | 
 | in the grammar will have their LR items attached to them. | 
 |  | 
 | <p> | 
 | Here is an interactive example that shows what LR items look like if you | 
 | interactively experiment.  In this example, <tt>g</tt> is a <tt>Grammar</tt>  | 
 | object. | 
 |  | 
 | <blockquote> | 
 | <pre> | 
 | >>> <b>g.build_lritems()</b> | 
 | >>> <b>p = g[1]</b> | 
 | >>> <b>p</b> | 
 | Production(statement -> ID = expr) | 
 | >>> | 
 | </pre> | 
 | </blockquote> | 
 |  | 
 | In the above code, <tt>p</tt> represents the first grammar rule. In | 
 | this case, a rule <tt>'statement -> ID = expr'</tt>. | 
 |  | 
 | <p> | 
 | Now, let's look at the LR items for <tt>p</tt>. | 
 |  | 
 | <blockquote> | 
 | <pre> | 
 | >>> <b>p.lr_items</b> | 
 | [LRItem(statement -> . ID = expr),  | 
 |  LRItem(statement -> ID . = expr),  | 
 |  LRItem(statement -> ID = . expr),  | 
 |  LRItem(statement -> ID = expr .)] | 
 | >>> | 
 | </pre> | 
 | </blockquote> | 
 |  | 
 | In each LR item, the dot (.) represents a specific stage of parsing.  In each LR item, the dot | 
 | is advanced by one symbol.  It is only when the dot reaches the very end that a production | 
 | is successfully parsed. | 
 |  | 
 | <p> | 
 | An instance <tt>lr</tt> of <tt>LRItem</tt> has the following | 
 | attributes that hold information related to that specific stage of | 
 | parsing. | 
 |  | 
 | <p> | 
 | <b><tt>lr.name</tt></b> | 
 | <blockquote> | 
 | The name of the grammar rule. For example, <tt>'statement'</tt> in the above example. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>lr.prod</tt></b> | 
 | <blockquote> | 
 | A tuple of symbols representing the right-hand side of the production, including the | 
 | special <tt>'.'</tt> character.  For example, <tt>('ID','.','=','expr')</tt>. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>lr.number</tt></b> | 
 | <blockquote> | 
 | An integer representing the production number in the grammar. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>lr.usyms</tt></b> | 
 | <blockquote> | 
 | A set of unique symbols in the production.  Inherited from the original <tt>Production</tt> instance. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>lr.lr_index</tt></b> | 
 | <blockquote> | 
 | An integer representing the position of the dot (.).  You should never use <tt>lr.prod.index()</tt> | 
 | to search for it--the result will be wrong if the grammar happens to also use (.) as a character | 
 | literal. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>lr.lr_after</tt></b> | 
 | <blockquote> | 
 | A list of all productions that can legally appear immediately to the right of the | 
 | dot (.).  This list contains <tt>Production</tt> instances.   This attribute | 
 | represents all of the possible branches a parse can take from the current position. | 
 | For example, suppose that <tt>lr</tt> represents a stage immediately before | 
 | an expression like this: | 
 |  | 
 | <pre> | 
 | >>> <b>lr</b> | 
 | LRItem(statement -> ID = . expr) | 
 | >>> | 
 | </pre> | 
 |  | 
 | Then, the value of <tt>lr.lr_after</tt> might look like this, showing all productions that | 
 | can legally appear next: | 
 |  | 
 | <pre> | 
 | >>> <b>lr.lr_after</b> | 
 | [Production(expr -> expr PLUS expr),  | 
 |  Production(expr -> expr MINUS expr),  | 
 |  Production(expr -> expr TIMES expr),  | 
 |  Production(expr -> expr DIVIDE expr),  | 
 |  Production(expr -> MINUS expr),  | 
 |  Production(expr -> LPAREN expr RPAREN),  | 
 |  Production(expr -> NUMBER),  | 
 |  Production(expr -> ID)] | 
 | >>> | 
 | </pre> | 
 |  | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>lr.lr_before</tt></b> | 
 | <blockquote> | 
 | The grammar symbol that appears immediately before the dot (.) or <tt>None</tt> if | 
 | at the beginning of the parse.   | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>lr.lr_next</tt></b> | 
 | <blockquote> | 
 | A link to the next LR item, representing the next stage of the parse.  <tt>None</tt> if <tt>lr</tt> | 
 | is the last LR item. | 
 | </blockquote> | 
 |  | 
 | <tt>LRItem</tt> instances also support the <tt>__len__()</tt> and <tt>__getitem__()</tt> special methods. | 
 | <tt>len(lr)</tt> returns the number of items in <tt>lr.prod</tt> including the dot (.). <tt>lr[n]</tt> | 
 | returns <tt>lr.prod[n]</tt>. | 
 |  | 
 | <p> | 
 | It goes without saying that all of the attributes associated with LR | 
 | items should be assumed to be read-only.  Modifications will very | 
 | likely create a small black-hole that will consume you and your code. | 
 |  | 
 | <H2><a name="internal_nn5"></a>5. LRTable</H2> | 
 |  | 
 |  | 
 | The <tt>LRTable</tt> class is used to represent LR parsing table data. This | 
 | minimally includes the production list, action table, and goto table.  | 
 |  | 
 | <p> | 
 | <b><tt>LRTable()</tt></b> | 
 | <blockquote> | 
 | Create an empty LRTable object.  This object contains only the information needed to | 
 | run an LR parser.   | 
 | </blockquote> | 
 |  | 
 | An instance <tt>lrtab</tt> of <tt>LRTable</tt> has the following methods: | 
 |  | 
 | <p> | 
 | <b><tt>lrtab.read_table(module)</tt></b> | 
 | <blockquote> | 
 | Populates the LR table with information from the module specified in <tt>module</tt>. | 
 | <tt>module</tt> is either a module object already loaded with <tt>import</tt> or | 
 | the name of a Python module.   If it's a string containing a module name, it is | 
 | loaded and parsing data is extracted.   Returns the signature  value that was used | 
 | when initially writing the tables.  Raises a <tt>VersionError</tt> exception if | 
 | the module was created using an incompatible version of PLY. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>lrtab.bind_callables(dict)</tt></b> | 
 | <blockquote> | 
 | This binds all of the function names used in productions to callable objects | 
 | found in the dictionary <tt>dict</tt>.  During table generation and when reading | 
 | LR tables from files, PLY only uses the names of action functions such as <tt>'p_expr'</tt>, | 
 | <tt>'p_statement'</tt>, etc.  In order to actually run the parser, these names | 
 | have to be bound to callable objects.   This method is always called prior to | 
 | running a parser. | 
 | </blockquote> | 
 |  | 
 | After <tt>lrtab</tt> has been populated, the following attributes are defined. | 
 |  | 
 | <p> | 
 | <b><tt>lrtab.lr_method</tt></b> | 
 | <blockquote> | 
 | The LR parsing method used (e.g., <tt>'LALR'</tt>) | 
 | </blockquote> | 
 |  | 
 |  | 
 | <p> | 
 | <b><tt>lrtab.lr_productions</tt></b> | 
 | <blockquote> | 
 | The production list.  If the parsing tables have been newly | 
 | constructed, this will be a list of <tt>Production</tt> instances.  If | 
 | the parsing tables have been read from a file, it's a list | 
 | of <tt>MiniProduction</tt> instances.  This, together | 
 | with <tt>lr_action</tt> and <tt>lr_goto</tt> contain all of the | 
 | information needed by the LR parsing engine. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>lrtab.lr_action</tt></b> | 
 | <blockquote> | 
 | The LR action dictionary that implements the underlying state machine. | 
 | The keys of this dictionary are the LR states. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>lrtab.lr_goto</tt></b> | 
 | <blockquote> | 
 | The LR goto table that contains information about grammar rule reductions. | 
 | </blockquote> | 
 |  | 
 |  | 
 | <H2><a name="internal_nn6"></a>6. LRGeneratedTable</H2> | 
 |  | 
 |  | 
 | The <tt>LRGeneratedTable</tt> class represents constructed LR parsing tables on a | 
 | grammar.  It is a subclass of <tt>LRTable</tt>. | 
 |  | 
 | <p> | 
 | <b><tt>LRGeneratedTable(grammar, method='LALR',log=None)</tt></b> | 
 | <blockquote> | 
 | Create the LR parsing tables on a grammar.  <tt>grammar</tt> is an instance of <tt>Grammar</tt>, | 
 | <tt>method</tt> is a string with the parsing method (<tt>'SLR'</tt> or <tt>'LALR'</tt>), and | 
 | <tt>log</tt> is a logger object used to write debugging information.  The debugging information | 
 | written to <tt>log</tt> is the same as what appears in the <tt>parser.out</tt> file created | 
 | by yacc.  By supplying a custom logger with a different message format, it is possible to get | 
 | more information (e.g., the line number in <tt>yacc.py</tt> used for issuing each line of | 
 | output in the log).   The result is an instance of <tt>LRGeneratedTable</tt>. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | An instance <tt>lr</tt> of <tt>LRGeneratedTable</tt> has the following attributes. | 
 |  | 
 | <p> | 
 | <b><tt>lr.grammar</tt></b> | 
 | <blockquote> | 
 | A link to the Grammar object used to construct the parsing tables. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>lr.lr_method</tt></b> | 
 | <blockquote> | 
 | The LR parsing method used (e.g., <tt>'LALR'</tt>) | 
 | </blockquote> | 
 |  | 
 |  | 
 | <p> | 
 | <b><tt>lr.lr_productions</tt></b> | 
 | <blockquote> | 
 | A reference to <tt>grammar.Productions</tt>.  This, together with <tt>lr_action</tt> and <tt>lr_goto</tt> | 
 | contain all of the information needed by the LR parsing engine. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>lr.lr_action</tt></b> | 
 | <blockquote> | 
 | The LR action dictionary that implements the underlying state machine.  The keys of this dictionary are | 
 | the LR states. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>lr.lr_goto</tt></b> | 
 | <blockquote> | 
 | The LR goto table that contains information about grammar rule reductions. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>lr.sr_conflicts</tt></b> | 
 | <blockquote> | 
 | A list of tuples <tt>(state,token,resolution)</tt> identifying all shift/reduce conflicts. <tt>state</tt> is the LR state | 
 | number where the conflict occurred, <tt>token</tt> is the token causing the conflict, and <tt>resolution</tt> is | 
 | a string describing the resolution taken.  <tt>resolution</tt> is either <tt>'shift'</tt> or <tt>'reduce'</tt>. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>lr.rr_conflicts</tt></b> | 
 | <blockquote> | 
 | A list of tuples <tt>(state,rule,rejected)</tt> identifying all reduce/reduce conflicts.  <tt>state</tt> is the | 
 | LR state number where the conflict occurred, <tt>rule</tt> is the production rule that was selected | 
 | and <tt>rejected</tt> is the production rule that was rejected.   Both <tt>rule</tt> and </tt>rejected</tt> are | 
 | instances of <tt>Production</tt>.  They can be inspected to provide the user with more information. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | There are two public methods of <tt>LRGeneratedTable</tt>. | 
 |  | 
 | <p> | 
 | <b><tt>lr.write_table(modulename,outputdir="",signature="")</tt></b> | 
 | <blockquote> | 
 | Writes the LR parsing table information to a Python module.  <tt>modulename</tt> is a string  | 
 | specifying the name of a module such as <tt>"parsetab"</tt>.  <tt>outputdir</tt> is the name of a  | 
 | directory where the module should be created.  <tt>signature</tt> is a string representing a | 
 | grammar signature that's written into the output file. This can be used to detect when | 
 | the data stored in a module file is out-of-sync with the the grammar specification (and that | 
 | the tables need to be regenerated).  If <tt>modulename</tt> is a string <tt>"parsetab"</tt>, | 
 | this function creates a file called <tt>parsetab.py</tt>.  If the module name represents a | 
 | package such as <tt>"foo.bar.parsetab"</tt>, then only the last component, <tt>"parsetab"</tt> is | 
 | used. | 
 | </blockquote> | 
 |  | 
 |  | 
 | <H2><a name="internal_nn7"></a>7. LRParser</H2> | 
 |  | 
 |  | 
 | The <tt>LRParser</tt> class implements the low-level LR parsing engine. | 
 |  | 
 |  | 
 | <p> | 
 | <b><tt>LRParser(lrtab, error_func)</tt></b> | 
 | <blockquote> | 
 | Create an LRParser.  <tt>lrtab</tt> is an instance of <tt>LRTable</tt> | 
 | containing the LR production and state tables.  <tt>error_func</tt> is the | 
 | error function to invoke in the event of a parsing error. | 
 | </blockquote> | 
 |  | 
 | An instance <tt>p</tt> of <tt>LRParser</tt> has the following methods: | 
 |  | 
 | <p> | 
 | <b><tt>p.parse(input=None,lexer=None,debug=0,tracking=0,tokenfunc=None)</tt></b> | 
 | <blockquote> | 
 | Run the parser.  <tt>input</tt> is a string, which if supplied is fed into the | 
 | lexer using its <tt>input()</tt> method.  <tt>lexer</tt> is an instance of the | 
 | <tt>Lexer</tt> class to use for tokenizing.  If not supplied, the last lexer | 
 | created with the <tt>lex</tt> module is used.   <tt>debug</tt> is a boolean flag | 
 | that enables debugging.   <tt>tracking</tt> is a boolean flag that tells the | 
 | parser to perform additional line number tracking.  <tt>tokenfunc</tt> is a callable | 
 | function that returns the next token.  If supplied, the parser will use it to get | 
 | all tokens. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.restart()</tt></b> | 
 | <blockquote> | 
 | Resets the parser state for a parse already in progress. | 
 | </blockquote> | 
 |  | 
 | <H2><a name="internal_nn8"></a>8. ParserReflect</H2> | 
 |  | 
 |  | 
 | <p> | 
 | The <tt>ParserReflect</tt> class is used to collect parser specification data | 
 | from a Python module or object.   This class is what collects all of the | 
 | <tt>p_rule()</tt> functions in a PLY file, performs basic error checking, | 
 | and collects all of the needed information to build a grammar.    Most of the | 
 | high-level PLY interface as used by the <tt>yacc()</tt> function is actually | 
 | implemented by this class. | 
 |  | 
 | <p> | 
 | <b><tt>ParserReflect(pdict, log=None)</tt></b> | 
 | <blockquote> | 
 | Creates a <tt>ParserReflect</tt> instance. <tt>pdict</tt> is a dictionary | 
 | containing parser specification data.  This dictionary typically corresponds | 
 | to the module or class dictionary of code that implements a PLY parser. | 
 | <tt>log</tt> is a logger instance that will be used to report error | 
 | messages. | 
 | </blockquote> | 
 |  | 
 | An instance <tt>p</tt> of <tt>ParserReflect</tt> has the following methods: | 
 |  | 
 | <p> | 
 | <b><tt>p.get_all()</tt></b> | 
 | <blockquote> | 
 | Collect and store all required parsing information. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.validate_all()</tt></b> | 
 | <blockquote> | 
 | Validate all of the collected parsing information.  This is a seprate step | 
 | from <tt>p.get_all()</tt> as a performance optimization.  In order to | 
 | increase parser start-up time, a parser can elect to only validate the | 
 | parsing data when regenerating the parsing tables.   The validation | 
 | step tries to collect as much information as possible rather than | 
 | raising an exception at the first sign of trouble.  The attribute | 
 | <tt>p.error</tt> is set if there are any validation errors.  The | 
 | value of this attribute is also returned. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.signature()</tt></b> | 
 | <blockquote> | 
 | Compute a signature representing the contents of the collected parsing | 
 | data.  The signature value should change if anything in the parser | 
 | specification has changed in a way that would justify parser table | 
 | regeneration.   This method can be called after <tt>p.get_all()</tt>, | 
 | but before <tt>p.validate_all()</tt>. | 
 | </blockquote> | 
 |  | 
 | The following attributes are set in the process of collecting data: | 
 |  | 
 | <p> | 
 | <b><tt>p.start</tt></b> | 
 | <blockquote> | 
 | The grammar start symbol, if any. Taken from <tt>pdict['start']</tt>. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.error_func</tt></b> | 
 | <blockquote> | 
 | The error handling function or <tt>None</tt>. Taken from <tt>pdict['p_error']</tt>. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.tokens</tt></b> | 
 | <blockquote> | 
 | The token list. Taken from <tt>pdict['tokens']</tt>. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.prec</tt></b> | 
 | <blockquote> | 
 | The precedence specifier.  Taken from <tt>pdict['precedence']</tt>. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.preclist</tt></b> | 
 | <blockquote> | 
 | A parsed version of the precedence specified.  A list of tuples of the form | 
 | <tt>(token,assoc,level)</tt> where <tt>token</tt> is the terminal symbol, | 
 | <tt>assoc</tt> is the associativity (e.g., <tt>'left'</tt>) and <tt>level</tt> | 
 | is a numeric precedence level. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.grammar</tt></b> | 
 | <blockquote> | 
 | A list of tuples <tt>(name, rules)</tt> representing the grammar rules. <tt>name</tt> is the | 
 | name of a Python function or method in <tt>pdict</tt> that starts with <tt>"p_"</tt>. | 
 | <tt>rules</tt> is a list of tuples <tt>(filename,line,prodname,syms)</tt> representing | 
 | the grammar rules found in the documentation string of that function. <tt>filename</tt> and <tt>line</tt> contain location | 
 | information that can be used for debugging. <tt>prodname</tt> is the name of the  | 
 | production. <tt>syms</tt> is the right-hand side of the production.  If you have a | 
 | function like this | 
 |  | 
 | <pre> | 
 | def p_expr(p): | 
 |     '''expr : expr PLUS expr | 
 |             | expr MINUS expr | 
 |             | expr TIMES expr | 
 |             | expr DIVIDE expr''' | 
 | </pre> | 
 |  | 
 | then the corresponding entry in <tt>p.grammar</tt> might look like this: | 
 |  | 
 | <pre> | 
 | ('p_expr', [ ('calc.py',10,'expr', ['expr','PLUS','expr']), | 
 |              ('calc.py',11,'expr', ['expr','MINUS','expr']), | 
 |              ('calc.py',12,'expr', ['expr','TIMES','expr']), | 
 |              ('calc.py',13,'expr', ['expr','DIVIDE','expr']) | 
 |            ]) | 
 | </pre> | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.pfuncs</tt></b> | 
 | <blockquote> | 
 | A sorted list of tuples <tt>(line, file, name, doc)</tt> representing all of | 
 | the <tt>p_</tt> functions found. <tt>line</tt> and <tt>file</tt> give location | 
 | information.  <tt>name</tt> is the name of the function. <tt>doc</tt> is the | 
 | documentation string.   This list is sorted in ascending order by line number. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.files</tt></b> | 
 | <blockquote> | 
 | A dictionary holding all of the source filenames that were encountered | 
 | while collecting parser information.  Only the keys of this dictionary have | 
 | any meaning. | 
 | </blockquote> | 
 |  | 
 | <p> | 
 | <b><tt>p.error</tt></b> | 
 | <blockquote> | 
 | An attribute that indicates whether or not any critical errors  | 
 | occurred in validation.  If this is set, it means that that some kind | 
 | of problem was detected and that no further processing should be | 
 | performed. | 
 | </blockquote> | 
 |  | 
 |  | 
 | <H2><a name="internal_nn9"></a>9. High-level operation</H2> | 
 |  | 
 |  | 
 | Using all of the above classes requires some attention to detail.  The <tt>yacc()</tt> | 
 | function carries out a very specific sequence of operations to create a grammar. | 
 | This same sequence should be emulated if you build an alternative PLY interface. | 
 |  | 
 | <ol> | 
 | <li>A <tt>ParserReflect</tt> object is created and raw grammar specification data is | 
 | collected. | 
 | <li>A <tt>Grammar</tt> object is created and populated with information | 
 | from the specification data. | 
 | <li>A <tt>LRGenerator</tt> object is created to run the LALR algorithm over | 
 | the <tt>Grammar</tt> object. | 
 | <li>Productions in the LRGenerator and bound to callables using the <tt>bind_callables()</tt> | 
 | method. | 
 | <li>A <tt>LRParser</tt> object is created from from the information in the | 
 | <tt>LRGenerator</tt> object. | 
 | </ol> | 
 |  | 
 | </body> | 
 | </html> | 
 |  | 
 |  | 
 |  | 
 |  | 
 |  | 
 |  | 
 |  |