| |
| Version 3.2 |
| ----------------------------- |
| 03/24/09: beazley |
Added an extra check to avoid printing duplicate warning messages
about reduce/reduce conflicts.
| |
| 03/24/09: beazley |
Switched PLY over to a BSD license.
| |
| 03/23/09: beazley |
Performance optimization.  Found several opportunities to speed
up LR table generation.
| |
| 03/23/09: beazley |
| New warning message. PLY now warns about rules never |
| reduced due to reduce/reduce conflicts. Suggested by |
| Bruce Frederiksen. |
| |
| 03/23/09: beazley |
| Some clean-up of warning messages related to reduce/reduce errors. |
| |
| 03/23/09: beazley |
| Added a new picklefile option to yacc() to write the parsing |
| tables to a filename using the pickle module. Here is how |
| it works: |
| |
| yacc(picklefile="parsetab.p") |
| |
This option can be used if the normal parsetab.py file is
extremely large.  For example, on Jython, the parsing tables
can't be read at all if parsetab.py exceeds a certain size
threshold.
| |
| The filename supplied to the picklefile option is opened |
| relative to the current working directory of the Python |
interpreter.  If you need the file to go somewhere else, you will
need to supply an absolute path or a suitable relative path.
| |
| For maximum portability, the pickle file is written |
| using protocol 0. |
| |
| 03/13/09: beazley |
Fixed a bug in parser.out generation where the rule numbers
were off by one.
| |
| 03/13/09: beazley |
| Fixed a string formatting bug with one of the error messages. |
Reported by Richard Reitmeyer.
| |
| Version 3.1 |
| ----------------------------- |
| 02/28/09: beazley |
| Fixed broken start argument to yacc(). PLY-3.0 broke this |
| feature by accident. |
| |
| 02/28/09: beazley |
| Fixed debugging output. yacc() no longer reports shift/reduce |
| or reduce/reduce conflicts if debugging is turned off. This |
| restores similar behavior in PLY-2.5. Reported by Andrew Waters. |
| |
| Version 3.0 |
| ----------------------------- |
| 02/03/09: beazley |
| Fixed missing lexer attribute on certain tokens when |
| invoking the parser p_error() function. Reported by |
| Bart Whiteley. |
| |
| 02/02/09: beazley |
The lex() command now does all error-reporting and diagnostics
| using the logging module interface. Pass in a Logger object |
| using the errorlog parameter to specify a different logger. |
| |
| 02/02/09: beazley |
| Refactored ply.lex to use a more object-oriented and organized |
| approach to collecting lexer information. |
| |
| 02/01/09: beazley |
| Removed the nowarn option from lex(). All output is controlled |
| by passing in a logger object. Just pass in a logger with a high |
| level setting to suppress output. This argument was never |
| documented to begin with so hopefully no one was relying upon it. |
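
For example, a minimal sketch (the token rules here are hypothetical)
that silences lex() by handing it a logger set above the levels it uses:

    import logging
    import ply.lex as lex

    tokens = ('NUMBER',)
    t_NUMBER = r'\d+'
    t_ignore = ' \t'

    def t_error(t):
        t.lexer.skip(1)

    quiet = logging.getLogger('ply.quiet')
    quiet.addHandler(logging.StreamHandler())
    quiet.setLevel(logging.CRITICAL)   # lex() logs at ERROR/WARNING, so
                                       # nothing gets through

    lexer = lex.lex(errorlog=quiet)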
| |
| 02/01/09: beazley |
| Discovered and removed a dead if-statement in the lexer. This |
| resulted in a 6-7% speedup in lexing when I tested it. |
| |
| 01/13/09: beazley |
| Minor change to the procedure for signalling a syntax error in a |
| production rule. A normal SyntaxError exception should be raised |
| instead of yacc.SyntaxError. |
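
For illustration, a hypothetical rule (the grammar symbols and the
reserved-name check are invented for this sketch) might signal an
error like this:

    reserved_names = set(['if', 'while'])

    def p_assignment(p):
        'assignment : ID EQUALS expr'
        if p[1] in reserved_names:
            raise SyntaxError       # plain SyntaxError, not yacc.SyntaxError
        p[0] = ('assign', p[1], p[3])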
| |
| 01/13/09: beazley |
| Added a new method p.set_lineno(n,lineno) that can be used to set the |
| line number of symbol n in grammar rules. This simplifies manual |
| tracking of line numbers. |
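
A sketch of how this might be used (the rule itself is hypothetical):

    def p_expression_binop(p):
        'expression : expression PLUS expression'
        p[0] = ('+', p[1], p[3])
        p.set_lineno(0, p.lineno(1))   # result inherits the line number
                                       # of its left operand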
| |
| 01/11/09: beazley |
| Vastly improved debugging support for yacc.parse(). Instead of passing |
| debug as an integer, you can supply a Logging object (see the logging |
| module). Messages will be generated at the ERROR, INFO, and DEBUG |
| logging levels, each level providing progressively more information. |
The debugging trace also shows states, grammar rules, values passed
| into grammar rules, and the result of each reduction. |
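
For example, a sketch of what this looks like (assumes a parser and
some input data built elsewhere):

    import logging
    logging.basicConfig(level=logging.DEBUG, filename='parse.log')

    result = parser.parse(data, debug=logging.getLogger())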
| |
| 01/09/09: beazley |
| The yacc() command now does all error-reporting and diagnostics using |
| the interface of the logging module. Use the errorlog parameter to |
| specify a logging object for error messages. Use the debuglog parameter |
| to specify a logging object for the 'parser.out' output. |
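
For example, a sketch (the logger names are arbitrary; grammar rules
are assumed to be defined in the calling module):

    import logging
    import ply.yacc as yacc

    logging.basicConfig(level=logging.DEBUG, filename='yaccdebug.log')
    errlog = logging.getLogger('ply.errors')
    dbglog = logging.getLogger('ply.debug')

    parser = yacc.yacc(errorlog=errlog, debuglog=dbglog, debug=True)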
| |
| 01/09/09: beazley |
*HUGE* refactoring of the ply.yacc() implementation.  The high-level
| user interface is backwards compatible, but the internals are completely |
| reorganized into classes. No more global variables. The internals |
| are also more extensible. For example, you can use the classes to |
| construct a LALR(1) parser in an entirely different manner than |
| what is currently the case. Documentation is forthcoming. |
| |
| 01/07/09: beazley |
| Various cleanup and refactoring of yacc internals. |
| |
| 01/06/09: beazley |
Fixed a bug with precedence assignment.  yacc was assigning the
precedence of each rule based on the left-most token, when in fact
it should have been using the right-most token.  Reported by
Bruce Frederiksen.
| |
| 11/27/08: beazley |
| Numerous changes to support Python 3.0 including removal of deprecated |
statements (e.g., has_key) and the addition of compatibility code
| to emulate features from Python 2 that have been removed, but which |
| are needed. Fixed the unit testing suite to work with Python 3.0. |
| The code should be backwards compatible with Python 2. |
| |
| 11/26/08: beazley |
| Loosened the rules on what kind of objects can be passed in as the |
| "module" parameter to lex() and yacc(). Previously, you could only use |
| a module or an instance. Now, PLY just uses dir() to get a list of |
| symbols on whatever the object is without regard for its type. |
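
For instance, the change makes the following class-based style possible.
The CalcLexer class is a hypothetical example, not part of PLY:

    import ply.lex as lex

    class CalcLexer(object):
        # A lexer specification packaged as a class.
        tokens = ('NUMBER', 'PLUS')
        t_PLUS = r'\+'
        t_ignore = ' \t'

        def t_NUMBER(self, t):
            r'\d+'
            t.value = int(t.value)
            return t

        def t_error(self, t):
            t.lexer.skip(1)

    lexer = lex.lex(module=CalcLexer())   # an instance works; dir() finds
                                          # the rules on it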
| |
| 11/26/08: beazley |
| Changed all except: statements to be compatible with Python2.x/3.x syntax. |
| |
| 11/26/08: beazley |
| Changed all raise Exception, value statements to raise Exception(value) for |
| forward compatibility. |
| |
| 11/26/08: beazley |
Removed all print statements from lex and yacc, replacing them with
direct writes to sys.stdout and sys.stderr.  Preparation for Python 3.0 support.
| |
| 11/04/08: beazley |
Fixed a bug with referring to symbols on the parsing stack using negative
| indices. |
| |
| 05/29/08: beazley |
| Completely revamped the testing system to use the unittest module for everything. |
| Added additional tests to cover new errors/warnings. |
| |
| Version 2.5 |
| ----------------------------- |
| 05/28/08: beazley |
| Fixed a bug with writing lex-tables in optimized mode and start states. |
| Reported by Kevin Henry. |
| |
| Version 2.4 |
| ----------------------------- |
| 05/04/08: beazley |
| A version number is now embedded in the table file signature so that |
yacc can more gracefully accommodate changes to the output format
| in the future. |
| |
| 05/04/08: beazley |
| Removed undocumented .pushback() method on grammar productions. I'm |
| not sure this ever worked and can't recall ever using it. Might have |
| been an abandoned idea that never really got fleshed out. This |
| feature was never described or tested so removing it is hopefully |
| harmless. |
| |
| 05/04/08: beazley |
| Added extra error checking to yacc() to detect precedence rules defined |
| for undefined terminal symbols. This allows yacc() to detect a potential |
| problem that can be really tricky to debug if no warning message or error |
| message is generated about it. |
| |
| 05/04/08: beazley |
lex() now has an outputdir option that specifies the output directory
for tables when running in optimized mode.  For example:
| |
| lexer = lex.lex(optimize=True, lextab="ltab", outputdir="foo/bar") |
| |
This makes the behavior of specifying a table module and output
directory more closely aligned with that of yacc().
| |
| 05/04/08: beazley |
| [Issue 9] |
Fixed a filename bug when specifying the module name in lex() and yacc().
| If you specified options such as the following: |
| |
| parser = yacc.yacc(tabmodule="foo.bar.parsetab",outputdir="foo/bar") |
| |
| yacc would create a file "foo.bar.parsetab.py" in the given directory. |
| Now, it simply generates a file "parsetab.py" in that directory. |
| Bug reported by cptbinho. |
| |
| 05/04/08: beazley |
| Slight modification to lex() and yacc() to allow their table files |
| to be loaded from a previously loaded module. This might make |
| it easier to load the parsing tables from a complicated package |
| structure. For example: |
| |
| import foo.bar.spam.parsetab as parsetab |
| parser = yacc.yacc(tabmodule=parsetab) |
| |
Note: lex and yacc will never regenerate the table file if used
in this form---you will get a warning message instead.
| This idea suggested by Brian Clapper. |
| |
| |
| 04/28/08: beazley |
Fixed a bug with p_error() functions not being picked up correctly
when running in yacc(optimize=1) mode.  Patch contributed by
| Bart Whiteley. |
| |
| 02/28/08: beazley |
Fixed a bug with 'nonassoc' precedence rules.  Basically, the
'nonassoc' specifier was being ignored, which produced incorrect
run-time behavior in the parser.
| |
| 02/16/08: beazley |
| Slight relaxation of what the input() method to a lexer will |
| accept as a string. Instead of testing the input to see |
| if the input is a string or unicode string, it checks to see |
| if the input object looks like it contains string data. |
| This change makes it possible to pass string-like objects |
| in as input. For example, the object returned by mmap. |
| |
| import mmap, os |
| data = mmap.mmap(os.open(filename,os.O_RDONLY), |
| os.path.getsize(filename), |
| access=mmap.ACCESS_READ) |
| lexer.input(data) |
| |
| |
| 11/29/07: beazley |
Modification of ply.lex to allow token functions to be aliased.
| This is subtle, but it makes it easier to create libraries and |
| to reuse token specifications. For example, suppose you defined |
| a function like this: |
| |
| def number(t): |
| r'\d+' |
| t.value = int(t.value) |
| return t |
| |
| This change would allow you to define a token rule as follows: |
| |
| t_NUMBER = number |
| |
In this case, the token type will be set to 'NUMBER' and the
associated number() function will be used to process matching tokens.
| |
| 11/28/07: beazley |
| Slight modification to lex and yacc to grab symbols from both |
| the local and global dictionaries of the caller. This |
| modification allows lexers and parsers to be defined using |
| inner functions and closures. |
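
A minimal sketch of what this enables (the make_lexer() factory is
hypothetical):

    import ply.lex as lex

    def make_lexer():
        # The whole token specification lives in locals and inner
        # functions; lex() finds them in the caller's dictionaries.
        tokens = ('NUMBER',)
        t_ignore = ' \t'

        def t_NUMBER(t):
            r'\d+'
            t.value = int(t.value)
            return t

        def t_error(t):
            t.lexer.skip(1)

        return lex.lex()

    lexer = make_lexer()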
| |
| 11/28/07: beazley |
| Performance optimization: The lexer.lexmatch and t.lexer |
| attributes are no longer set for lexer tokens that are not |
| defined by functions. The only normal use of these attributes |
| would be in lexer rules that need to perform some kind of |
| special processing. Thus, it doesn't make any sense to set |
| them on every token. |
| |
| *** POTENTIAL INCOMPATIBILITY *** This might break code |
| that is mucking around with internal lexer state in some |
| sort of magical way. |
| |
| 11/27/07: beazley |
| Added the ability to put the parser into error-handling mode |
| from within a normal production. To do this, simply raise |
| a yacc.SyntaxError exception like this: |
| |
| def p_some_production(p): |
| 'some_production : prod1 prod2' |
| ... |
| raise yacc.SyntaxError # Signal an error |
| |
| A number of things happen after this occurs: |
| |
- The last symbol shifted onto the symbol stack is discarded
and the parser state is backed up to what it was before the
rule reduction.
| |
| - The current lookahead symbol is saved and replaced by |
| the 'error' symbol. |
| |
| - The parser enters error recovery mode where it tries |
| to either reduce the 'error' rule or it starts |
| discarding items off of the stack until the parser |
| resets. |
| |
| When an error is manually set, the parser does *not* call |
| the p_error() function (if any is defined). |
| *** NEW FEATURE *** Suggested on the mailing list |
| |
| 11/27/07: beazley |
| Fixed structure bug in examples/ansic. Reported by Dion Blazakis. |
| |
| 11/27/07: beazley |
| Fixed a bug in the lexer related to start conditions and ignored |
| token rules. If a rule was defined that changed state, but |
| returned no token, the lexer could be left in an inconsistent |
| state. Reported by |
| |
| 11/27/07: beazley |
| Modified setup.py to support Python Eggs. Patch contributed by |
| Simon Cross. |
| |
11/09/07: beazley
| Fixed a bug in error handling in yacc. If a syntax error occurred and the |
parser rolled the entire parse stack back, the parser would be left in an
| inconsistent state that would cause it to trigger incorrect actions on |
| subsequent input. Reported by Ton Biegstraaten, Justin King, and others. |
| |
| 11/09/07: beazley |
| Fixed a bug when passing empty input strings to yacc.parse(). This |
| would result in an error message about "No input given". Reported |
| by Andrew Dalke. |
| |
| Version 2.3 |
| ----------------------------- |
| 02/20/07: beazley |
| Fixed a bug with character literals if the literal '.' appeared as the |
| last symbol of a grammar rule. Reported by Ales Smrcka. |
| |
| 02/19/07: beazley |
| Warning messages are now redirected to stderr instead of being printed |
| to standard output. |
| |
| 02/19/07: beazley |
| Added a warning message to lex.py if it detects a literal backslash |
character inside the t_ignore declaration.  This is to help catch
problems that might occur if someone accidentally defines t_ignore
| as a Python raw string. For example: |
| |
| t_ignore = r' \t' |
| |
| The idea for this is from an email I received from David Cimimi who |
| reported bizarre behavior in lexing as a result of defining t_ignore |
| as a raw string by accident. |
| |
| 02/18/07: beazley |
| Performance improvements. Made some changes to the internal |
| table organization and LR parser to improve parsing performance. |
| |
| 02/18/07: beazley |
| Automatic tracking of line number and position information must now be |
| enabled by a special flag to parse(). For example: |
| |
| yacc.parse(data,tracking=True) |
| |
| In many applications, it's just not that important to have the |
| parser automatically track all line numbers. By making this an |
| optional feature, it allows the parser to run significantly faster |
| (more than a 20% speed increase in many cases). Note: positional |
| information is always available for raw tokens---this change only |
| applies to positional information associated with nonterminal |
| grammar symbols. |
| *** POTENTIAL INCOMPATIBILITY *** |
| |
| 02/18/07: beazley |
| Yacc no longer supports extended slices of grammar productions. |
| However, it does support regular slices. For example: |
| |
| def p_foo(p): |
'''foo : a b c d e'''
| p[0] = p[1:3] |
| |
| This change is a performance improvement to the parser--it streamlines |
| normal access to the grammar values since slices are now handled in |
| a __getslice__() method as opposed to __getitem__(). |
| |
| 02/12/07: beazley |
| Fixed a bug in the handling of token names when combined with |
| start conditions. Bug reported by Todd O'Bryan. |
| |
| Version 2.2 |
| ------------------------------ |
| 11/01/06: beazley |
Added lexpos() and lexspan() methods to grammar symbols.  These
mirror the functionality of lineno() and linespan().  For
| example: |
| |
| def p_expr(p): |
| 'expr : expr PLUS expr' |
p.lexpos(1)               # Lexing position of left-hand expression
p.lexpos(2)               # Lexing position of PLUS
start,end = p.lexspan(3)  # Lexing range of right-hand expression
| |
| 11/01/06: beazley |
| Minor change to error handling. The recommended way to skip characters |
| in the input is to use t.lexer.skip() as shown here: |
| |
| def t_error(t): |
| print "Illegal character '%s'" % t.value[0] |
| t.lexer.skip(1) |
| |
| The old approach of just using t.skip(1) will still work, but won't |
| be documented. |
| |
| 10/31/06: beazley |
| Discarded tokens can now be specified as simple strings instead of |
| functions. To do this, simply include the text "ignore_" in the |
| token declaration. For example: |
| |
| t_ignore_cppcomment = r'//.*' |
| |
| Previously, this had to be done with a function. For example: |
| |
| def t_ignore_cppcomment(t): |
| r'//.*' |
| pass |
| |
| If start conditions/states are being used, state names should appear |
| before the "ignore_" text. |
| |
| 10/19/06: beazley |
| The Lex module now provides support for flex-style start conditions |
| as described at http://www.gnu.org/software/flex/manual/html_chapter/flex_11.html. |
Please refer to this document to understand this change note.  Refer to
the PLY documentation for a PLY-specific explanation of how this works.
| |
| To use start conditions, you first need to declare a set of states in |
| your lexer file: |
| |
| states = ( |
| ('foo','exclusive'), |
| ('bar','inclusive') |
| ) |
| |
| This serves the same role as the %s and %x specifiers in flex. |
| |
Once a state has been declared, tokens for that state can be
| declared by defining rules of the form t_state_TOK. For example: |
| |
t_PLUS = r'\+'          # Rule defined in INITIAL state
t_foo_NUM = r'\d+'      # Rule defined in foo state
t_bar_NUM = r'\d+'      # Rule defined in bar state

t_foo_bar_NUM = r'\d+'  # Rule defined in both foo and bar
t_ANY_NUM = r'\d+'      # Rule defined in all states
| |
| In addition to defining tokens for each state, the t_ignore and t_error |
| specifications can be customized for specific states. For example: |
| |
| t_foo_ignore = " " # Ignored characters for foo state |
| def t_bar_error(t): |
| # Handle errors in bar state |
| |
With token rules, the following methods can be used to change states:
| |
| def t_TOKNAME(t): |
| t.lexer.begin('foo') # Begin state 'foo' |
| t.lexer.push_state('foo') # Begin state 'foo', push old state |
| # onto a stack |
| t.lexer.pop_state() # Restore previous state |
| t.lexer.current_state() # Returns name of current state |
| |
| These methods mirror the BEGIN(), yy_push_state(), yy_pop_state(), and |
| yy_top_state() functions in flex. |
| |
Start states offer one way to write sub-lexers.  For example, a token
rule or the parser might instruct the lexer to start generating a
different set of tokens depending on the context.
| |
| example/yply/ylex.py shows the use of start states to grab C/C++ |
| code fragments out of traditional yacc specification files. |
| |
| *** NEW FEATURE *** Suggested by Daniel Larraz with whom I also |
| discussed various aspects of the design. |
| |
| 10/19/06: beazley |
| Minor change to the way in which yacc.py was reporting shift/reduce |
| conflicts. Although the underlying LALR(1) algorithm was correct, |
| PLY was under-reporting the number of conflicts compared to yacc/bison |
| when precedence rules were in effect. This change should make PLY |
| report the same number of conflicts as yacc. |
| |
| 10/19/06: beazley |
| Modified yacc so that grammar rules could also include the '-' |
| character. For example: |
| |
| def p_expr_list(p): |
| 'expression-list : expression-list expression' |
| |
| Suggested by Oldrich Jedlicka. |
| |
| 10/18/06: beazley |
| Attribute lexer.lexmatch added so that token rules can access the re |
| match object that was generated. For example: |
| |
| def t_FOO(t): |
| r'some regex' |
| m = t.lexer.lexmatch |
| # Do something with m |
| |
| |
| This may be useful if you want to access named groups specified within |
| the regex for a specific token. Suggested by Oldrich Jedlicka. |
| |
| 10/16/06: beazley |
| Changed the error message that results if an illegal character |
| is encountered and no default error function is defined in lex. |
| The exception is now more informative about the actual cause of |
| the error. |
| |
| Version 2.1 |
| ------------------------------ |
| 10/02/06: beazley |
| The last Lexer object built by lex() can be found in lex.lexer. |
| The last Parser object built by yacc() can be found in yacc.parser. |
| |
| 10/02/06: beazley |
| New example added: examples/yply |
| |
| This example uses PLY to convert Unix-yacc specification files to |
| PLY programs with the same grammar. This may be useful if you |
| want to convert a grammar from bison/yacc to use with PLY. |
| |
| 10/02/06: beazley |
| Added support for a start symbol to be specified in the yacc |
| input file itself. Just do this: |
| |
| start = 'name' |
| |
| where 'name' matches some grammar rule. For example: |
| |
| def p_name(p): |
| 'name : A B C' |
| ... |
| |
| This mirrors the functionality of the yacc %start specifier. |
| |
| 09/30/06: beazley |
Some new examples added:
| |
| examples/GardenSnake : A simple indentation based language similar |
| to Python. Shows how you might handle |
| whitespace. Contributed by Andrew Dalke. |
| |
| examples/BASIC : An implementation of 1964 Dartmouth BASIC. |
| Contributed by Dave against his better |
| judgement. |
| |
| 09/28/06: beazley |
| Minor patch to allow named groups to be used in lex regular |
| expression rules. For example: |
| |
| t_QSTRING = r'''(?P<quote>['"]).*?(?P=quote)''' |
| |
| Patch submitted by Adam Ring. |
| |
| 09/28/06: beazley |
| LALR(1) is now the default parsing method. To use SLR, use |
| yacc.yacc(method="SLR"). Note: there is no performance impact |
| on parsing when using LALR(1) instead of SLR. However, constructing |
| the parsing tables will take a little longer. |
| |
| 09/26/06: beazley |
| Change to line number tracking. To modify line numbers, modify |
| the line number of the lexer itself. For example: |
| |
| def t_NEWLINE(t): |
| r'\n' |
| t.lexer.lineno += 1 |
| |
| This modification is both cleanup and a performance optimization. |
| In past versions, lex was monitoring every token for changes in |
| the line number. This extra processing is unnecessary for a vast |
| majority of tokens. Thus, this new approach cleans it up a bit. |
| |
| *** POTENTIAL INCOMPATIBILITY *** |
| You will need to change code in your lexer that updates the line |
| number. For example, "t.lineno += 1" becomes "t.lexer.lineno += 1" |
| |
| 09/26/06: beazley |
| Added the lexing position to tokens as an attribute lexpos. This |
| is the raw index into the input text at which a token appears. |
| This information can be used to compute column numbers and other |
| details (e.g., scan backwards from lexpos to the first newline |
| to get a column position). |
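
For example, a helper along these lines (a sketch, not part of PLY)
computes a 1-based column number:

    def find_column(input_text, token):
        # Scan back from the token to the most recent newline.
        last_newline = input_text.rfind('\n', 0, token.lexpos)
        return token.lexpos - last_newline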
| |
| 09/25/06: beazley |
| Changed the name of the __copy__() method on the Lexer class |
| to clone(). This is used to clone a Lexer object (e.g., if |
| you're running different lexers at the same time). |
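
A sketch of typical use (assumes token rules are defined in the
calling module):

    lexer = lex.lex()
    sublexer = lexer.clone()          # independent copy with the same rules

    lexer.input("first chunk of text")
    sublexer.input("second chunk")    # advancing one lexer does not
                                      # disturb the other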
| |
| 09/21/06: beazley |
Limitations related to the use of the re module have been eliminated.
Several users reported problems with regular expressions containing
more than 100 named groups.  To solve this, lex.py is now capable
of automatically splitting its master regular expression into
smaller expressions as needed.  This should, in theory, make it
possible to specify an arbitrarily large number of tokens.
| |
| 09/21/06: beazley |
Improved error checking in lex.py.  Rules that match the empty string
are now rejected (otherwise they cause the lexer to enter an infinite
loop).  An extra check for rules containing '#' has also been added.
Since lex compiles regular expressions in verbose mode, '#' is
interpreted as a regex comment; to match a literal '#', it is critical
to use '\#' instead.
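
For example, a hypothetical rule matching a literal '#':

    t_COMMENT = r'\#.*'    # with an unescaped '#', the rest of the
                           # pattern would be treated as a regex comment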
| |
| 09/18/06: beazley |
| Added a @TOKEN decorator function to lex.py that can be used to |
| define token rules where the documentation string might be computed |
| in some way. |
| |
| digit = r'([0-9])' |
| nondigit = r'([_A-Za-z])' |
| identifier = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)' |
| |
| from ply.lex import TOKEN |
| |
| @TOKEN(identifier) |
| def t_ID(t): |
| # Do whatever |
| |
| The @TOKEN decorator merely sets the documentation string of the |
| associated token function as needed for lex to work. |
| |
| Note: An alternative solution is the following: |
| |
| def t_ID(t): |
| # Do whatever |
| |
| t_ID.__doc__ = identifier |
| |
| Note: Decorators require the use of Python 2.4 or later. If compatibility |
| with old versions is needed, use the latter solution. |
| |
| The need for this feature was suggested by Cem Karan. |
| |
| 09/14/06: beazley |
| Support for single-character literal tokens has been added to yacc. |
| These literals must be enclosed in quotes. For example: |
| |
def p_expr_plus(p):
| "expr : expr '+' expr" |
| ... |
| |
def p_expr_minus(p):
| 'expr : expr "-" expr' |
| ... |
| |
| In addition to this, it is necessary to tell the lexer module about |
| literal characters. This is done by defining the variable 'literals' |
| as a list of characters. This should be defined in the module that |
| invokes the lex.lex() function. For example: |
| |
| literals = ['+','-','*','/','(',')','='] |
| |
| or simply |
| |
literals = '+-*/()='
| |
| It is important to note that literals can only be a single character. |
| When the lexer fails to match a token using its normal regular expression |
| rules, it will check the current character against the literal list. |
| If found, it will be returned with a token type set to match the literal |
| character. Otherwise, an illegal character will be signalled. |
| |
| |
| 09/14/06: beazley |
| Modified PLY to install itself as a proper Python package called 'ply'. |
| This will make it a little more friendly to other modules. This |
| changes the usage of PLY only slightly. Just do this to import the |
modules:
| |
| import ply.lex as lex |
| import ply.yacc as yacc |
| |
| Alternatively, you can do this: |
| |
| from ply import * |
| |
which imports both the lex and yacc modules.
Change suggested by Lee June.
| |
| 09/13/06: beazley |
| Changed the handling of negative indices when used in production rules. |
| A negative production index now accesses already parsed symbols on the |
| parsing stack. For example, |
| |
| def p_foo(p): |
| "foo: A B C D" |
| print p[1] # Value of 'A' symbol |
| print p[2] # Value of 'B' symbol |
| print p[-1] # Value of whatever symbol appears before A |
| # on the parsing stack. |
| |
p[0] = some_val  # Sets the value of the 'foo' grammar symbol
| |
| This behavior makes it easier to work with embedded actions within the |
| parsing rules. For example, in C-yacc, it is possible to write code like |
| this: |
| |
| bar: A { printf("seen an A = %d\n", $1); } B { do_stuff; } |
| |
| In this example, the printf() code executes immediately after A has been |
| parsed. Within the embedded action code, $1 refers to the A symbol on |
| the stack. |
| |
| To perform this equivalent action in PLY, you need to write a pair |
| of rules like this: |
| |
| def p_bar(p): |
| "bar : A seen_A B" |
| do_stuff |
| |
| def p_seen_A(p): |
| "seen_A :" |
| print "seen an A =", p[-1] |
| |
The second rule "seen_A" is merely an empty production which should be
reduced as soon as A is parsed in the "bar" rule above.  The negative
index p[-1] is used to access whatever symbol appeared
| before the seen_A symbol. |
| |
| This feature also makes it possible to support inherited attributes. |
| For example: |
| |
| def p_decl(p): |
| "decl : scope name" |
| |
| def p_scope(p): |
| """scope : GLOBAL |
| | LOCAL""" |
| p[0] = p[1] |
| |
| def p_name(p): |
| "name : ID" |
if p[-1] == "GLOBAL":
    # ...
elif p[-1] == "LOCAL":
    # ...
| |
| In this case, the name rule is inheriting an attribute from the |
| scope declaration that precedes it. |
| |
| *** POTENTIAL INCOMPATIBILITY *** |
If you are currently using negative indices within existing grammar rules,
your code will break.  This should be extremely rare, if it occurs at
all; the argument to a grammar rule is not usually processed in the
same way as a list of items.
| |
| Version 2.0 |
| ------------------------------ |
| 09/07/06: beazley |
| Major cleanup and refactoring of the LR table generation code. Both SLR |
| and LALR(1) table generation is now performed by the same code base with |
| only minor extensions for extra LALR(1) processing. |
| |
| 09/07/06: beazley |
| Completely reimplemented the entire LALR(1) parsing engine to use the |
| DeRemer and Pennello algorithm for calculating lookahead sets. This |
| significantly improves the performance of generating LALR(1) tables |
| and has the added feature of actually working correctly! If you |
| experienced weird behavior with LALR(1) in prior releases, this should |
| hopefully resolve all of those problems. Many thanks to |
| Andrew Waters and Markus Schoepflin for submitting bug reports |
| and helping me test out the revised LALR(1) support. |
| |
| Version 1.8 |
| ------------------------------ |
| 08/02/06: beazley |
| Fixed a problem related to the handling of default actions in LALR(1) |
| parsing. If you experienced subtle and/or bizarre behavior when trying |
| to use the LALR(1) engine, this may correct those problems. Patch |
contributed by Russ Cox.  Note: This patch has been superseded by
| revisions for LALR(1) parsing in Ply-2.0. |
| |
| 08/02/06: beazley |
| Added support for slicing of productions in yacc. |
| Patch contributed by Patrick Mezard. |
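
For example, a hypothetical rule collecting its right-hand-side values
with a slice:

    def p_triple(p):
        'triple : item item item'
        p[0] = p[1:4]    # values of all three 'item' symbols, as a list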
| |
| Version 1.7 |
| ------------------------------ |
| 03/02/06: beazley |
Fixed an infinite recursion problem in the ReduceToTerminals() function that
| would sometimes come up in LALR(1) table generation. Reported by |
| Markus Schoepflin. |
| |
| 03/01/06: beazley |
| Added "reflags" argument to lex(). For example: |
| |
| lex.lex(reflags=re.UNICODE) |
| |
| This can be used to specify optional flags to the re.compile() function |
| used inside the lexer. This may be necessary for special situations such |
| as processing Unicode (e.g., if you want escapes like \w and \b to consult |
the Unicode character property database).  The need for this was suggested by
| Andreas Jung. |
| |
| 03/01/06: beazley |
| Fixed a bug with an uninitialized variable on repeated instantiations of parser |
| objects when the write_tables=0 argument was used. Reported by Michael Brown. |
| |
| 03/01/06: beazley |
| Modified lex.py to accept Unicode strings both as the regular expressions for |
| tokens and as input. Hopefully this is the only change needed for Unicode support. |
| Patch contributed by Johan Dahl. |
| |
| 03/01/06: beazley |
| Modified the class-based interface to work with new-style or old-style classes. |
| Patch contributed by Michael Brown (although I tweaked it slightly so it would work |
| with older versions of Python). |
| |
| Version 1.6 |
| ------------------------------ |
| 05/27/05: beazley |
| Incorporated patch contributed by Christopher Stawarz to fix an extremely |
| devious bug in LALR(1) parser generation. This patch should fix problems |
| numerous people reported with LALR parsing. |
| |
| 05/27/05: beazley |
| Fixed problem with lex.py copy constructor. Reported by Dave Aitel, Aaron Lav, |
| and Thad Austin. |
| |
| 05/27/05: beazley |
| Added outputdir option to yacc() to control output directory. Contributed |
| by Christopher Stawarz. |
| |
| 05/27/05: beazley |
| Added rununit.py test script to run tests using the Python unittest module. |
| Contributed by Miki Tebeka. |
| |
| Version 1.5 |
| ------------------------------ |
| 05/26/04: beazley |
| Major enhancement. LALR(1) parsing support is now working. |
| This feature was implemented by Elias Ioup (ezioup@alumni.uchicago.edu) |
| and optimized by David Beazley. To use LALR(1) parsing do |
| the following: |
| |
| yacc.yacc(method="LALR") |
| |
| Computing LALR(1) parsing tables takes about twice as long as |
| the default SLR method. However, LALR(1) allows you to handle |
| more complex grammars. For example, the ANSI C grammar |
| (in example/ansic) has 13 shift-reduce conflicts with SLR, but |
| only has 1 shift-reduce conflict with LALR(1). |
| |
| 05/20/04: beazley |
| Added a __len__ method to parser production lists. Can |
| be used in parser rules like this: |
| |
| def p_somerule(p): |
| """a : B C D |
| | E F" |
| if (len(p) == 3): |
| # Must have been first rule |
| elif (len(p) == 2): |
| # Must be second rule |
| |
| Suggested by Joshua Gerth and others. |
| |
| Version 1.4 |
| ------------------------------ |
| 04/23/04: beazley |
| Incorporated a variety of patches contributed by Eric Raymond. |
| These include: |
| |
| 0. Cleans up some comments so they don't wrap on an 80-column display. |
| 1. Directs compiler errors to stderr where they belong. |
| 2. Implements and documents automatic line counting when \n is ignored. |
| 3. Changes the way progress messages are dumped when debugging is on. |
| The new format is both less verbose and conveys more information than |
| the old, including shift and reduce actions. |
| |
| 04/23/04: beazley |
Added a Python setup.py file to simplify installation.  Contributed
| by Adam Kerrison. |
| |
| 04/23/04: beazley |
| Added patches contributed by Adam Kerrison. |
| |
| - Some output is now only shown when debugging is enabled. This |
| means that PLY will be completely silent when not in debugging mode. |
| |
- An optional parameter "write_tables" can be passed to yacc() to
control whether or not parsing tables are written.  By default,
it is true, but it can be turned off if you don't want the yacc
table file.  Note: disabling this will cause yacc() to regenerate
the parsing table each time it runs (see the sketch below).
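
A sketch of the write_tables option mentioned above (grammar rules are
assumed to be defined in the calling module):

    parser = yacc.yacc(write_tables=0)   # no parsetab.py is written; the
                                         # tables are rebuilt on every run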
| |
| 04/23/04: beazley |
Added patches contributed by David McNab.  This patch adds two
| features: |
| |
| - The parser can be supplied as a class instead of a module. |
| For an example of this, see the example/classcalc directory. |
| |
| - Debugging output can be directed to a filename of the user's |
| choice. Use |
| |
| yacc(debugfile="somefile.out") |
| |
| |
| Version 1.3 |
| ------------------------------ |
| 12/10/02: jmdyck |
| Various minor adjustments to the code that Dave checked in today. |
| Updated test/yacc_{inf,unused}.exp to reflect today's changes. |
| |
| 12/10/02: beazley |
| Incorporated a variety of minor bug fixes to empty production |
| handling and infinite recursion checking. Contributed by |
| Michael Dyck. |
| |
| 12/10/02: beazley |
| Removed bogus recover() method call in yacc.restart() |
| |
| Version 1.2 |
| ------------------------------ |
| 11/27/02: beazley |
| Lexer and parser objects are now available as an attribute |
| of tokens and slices respectively. For example: |
| |
| def t_NUMBER(t): |
| r'\d+' |
| print t.lexer |
| |
| def p_expr_plus(t): |
'expr : expr PLUS expr'
| print t.lexer |
| print t.parser |
| |
| This can be used for state management (if needed). |
| |
| 10/31/02: beazley |
| Modified yacc.py to work with Python optimize mode. To make |
| this work, you need to use |
| |
| yacc.yacc(optimize=1) |
| |
| Furthermore, you need to first run Python in normal mode |
| to generate the necessary parsetab.py files. After that, |
| you can use python -O or python -OO. |
| |
| Note: optimized mode turns off a lot of error checking. |
| Only use when you are sure that your grammar is working. |
| Make sure parsetab.py is up to date! |
| |
| 10/30/02: beazley |
| Added cloning of Lexer objects. For example: |
| |
| import copy |
| l = lex.lex() |
| lc = copy.copy(l) |
| |
| l.input("Some text") |
| lc.input("Some other text") |
| ... |
| |
| This might be useful if the same "lexer" is meant to |
| be used in different contexts---or if multiple lexers |
| are running concurrently. |
| |
| 10/30/02: beazley |
| Fixed subtle bug with first set computation and empty productions. |
| Patch submitted by Michael Dyck. |
| |
| 10/30/02: beazley |
| Fixed error messages to use "filename:line: message" instead |
| of "filename:line. message". This makes error reporting more |
| friendly to emacs. Patch submitted by François Pinard. |
| |
| 10/30/02: beazley |
| Improvements to parser.out file. Terminals and nonterminals |
| are sorted instead of being printed in random order. |
| Patch submitted by François Pinard. |
| |
| 10/30/02: beazley |
| Improvements to parser.out file output. Rules are now printed |
| in a way that's easier to understand. Contributed by Russ Cox. |
| |
| 10/30/02: beazley |
| Added 'nonassoc' associativity support. This can be used |
| to disable the chaining of operators like a < b < c. |
To use, simply specify 'nonassoc' in the precedence table:
| |
| precedence = ( |
| ('nonassoc', 'LESSTHAN', 'GREATERTHAN'), # Nonassociative operators |
| ('left', 'PLUS', 'MINUS'), |
| ('left', 'TIMES', 'DIVIDE'), |
| ('right', 'UMINUS'), # Unary minus operator |
| ) |
| |
| Patch contributed by Russ Cox. |
| |
| 10/30/02: beazley |
| Modified the lexer to provide optional support for Python -O and -OO |
| modes. To make this work, Python *first* needs to be run in |
| unoptimized mode. This reads the lexing information and creates a |
| file "lextab.py". Then, run lex like this: |
| |
| # module foo.py |
| ... |
| ... |
| lex.lex(optimize=1) |
| |
| Once the lextab file has been created, subsequent calls to |
| lex.lex() will read data from the lextab file instead of using |
| introspection. In optimized mode (-O, -OO) everything should |
| work normally despite the loss of doc strings. |
| |
| To change the name of the file 'lextab.py' use the following: |
| |
| lex.lex(lextab="footab") |
| |
| (this creates a file footab.py) |
| |
| |
| Version 1.1 October 25, 2001 |
| ------------------------------ |
| |
| 10/25/01: beazley |
| Modified the table generator to produce much more compact data. |
| This should greatly reduce the size of the parsetab.py[c] file. |
| Caveat: the tables still need to be constructed so a little more |
| work is done in parsetab on import. |
| |
| 10/25/01: beazley |
| There may be a possible bug in the cycle detector that reports errors |
| about infinite recursion. I'm having a little trouble tracking it |
| down, but if you get this problem, you can disable the cycle |
| detector as follows: |
| |
| yacc.yacc(check_recursion = 0) |
| |
| 10/25/01: beazley |
| Fixed a bug in lex.py that sometimes caused illegal characters to be |
| reported incorrectly. Reported by Sverre Jørgensen. |
| |
07/08/01: beazley
| Added a reference to the underlying lexer object when tokens are handled by |
| functions. The lexer is available as the 'lexer' attribute. This |
| was added to provide better lexing support for languages such as Fortran |
| where certain types of tokens can't be conveniently expressed as regular |
| expressions (and where the tokenizing function may want to perform a |
| little backtracking). Suggested by Pearu Peterson. |
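
As an illustrative sketch (a hypothetical rule, not from PLY itself): a
Fortran-style Hollerith constant such as 5HHELLO encodes its own length,
something a regular expression alone cannot express, so the rule reads
ahead through the lexer object:

    def t_HOLLERITH(t):
        r'\d+H'
        n = int(t.value[:-1])          # digits give the character count
        start = t.lexer.lexpos
        t.value = t.lexer.lexdata[start:start + n]
        t.lexer.lexpos = start + n     # advance past the consumed text
        return t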
| |
06/20/01: beazley
| Modified yacc() function so that an optional starting symbol can be specified. |
| For example: |
| |
| yacc.yacc(start="statement") |
| |
| Normally yacc always treats the first production rule as the starting symbol. |
| However, if you are debugging your grammar it may be useful to specify |
| an alternative starting symbol. Idea suggested by Rich Salz. |
| |
| Version 1.0 June 18, 2001 |
| -------------------------- |
| Initial public offering |
| |