PARSO(1) parso PARSO(1)

parso - parso Documentation

Release v0.8.4. (Installation)

Parso is a Python parser that supports error recovery and round-trip parsing. It runs on multiple Python versions and can parse the syntax of Python versions other than the one it runs on. Parso can also list multiple syntax errors in a Python file.

Parso has been battle-tested by jedi. It was pulled out of jedi to be useful for other projects as well.

Parso consists of a small API to parse Python and analyse the syntax tree.

A simple example:

>>> import parso
>>> module = parso.parse('hello + 1', version="3.9")
>>> expr = module.children[0]
>>> expr
PythonNode(arith_expr, [<Name: hello@1,0>, <Operator: +>, <Number: 1>])
>>> print(expr.get_code())
hello + 1
>>> name = expr.children[0]
>>> name
<Name: hello@1,0>
>>> name.end_pos
(1, 5)
>>> expr.end_pos
(1, 9)

To list multiple issues:

>>> grammar = parso.load_grammar()
>>> module = grammar.parse('foo +\nbar\ncontinue')
>>> error1, error2 = grammar.iter_errors(module)
>>> error1.message
'SyntaxError: invalid syntax'
>>> error2.message
"SyntaxError: 'continue' not properly in loop"

On any system you can install parso directly from the Python package index using pip:

sudo pip install parso

If you want to install the current development version (master branch):

sudo pip install -e git+https://github.com/davidhalter/parso.git#egg=parso

If you prefer not to use an automated package installer, you can download a current copy of parso and install it manually.

To install it, navigate to the directory containing setup.py on your console and type:

sudo python setup.py install

parso is built around grammars. You can create Python grammars by calling parso.load_grammar(). Grammars (with a custom tokenizer and custom parser trees) can also be created by directly instantiating parso.Grammar(). More information about the resulting objects can be found in the parser tree documentation.

The simplest way of using parso is without even loading a grammar (parso.parse()):

>>> import parso
>>> parso.parse('foo + bar')
<Module: @1-1>

Typically, if you want to work with one specific Python version, use parso.load_grammar():

Loads a parso.Grammar. The default version is the current Python version.
  • version (str) -- A python version string, e.g. version='3.8'.
  • path (str) -- A path to a grammar file
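
A minimal sketch of pinning the grammar to one Python version, assuming parso is installed and ships a grammar for the requested version (3.8 is only an illustration):

```python
import parso

# Load the grammar for one specific Python version.
grammar = parso.load_grammar(version="3.8")

source = "x = 1\n"
module = grammar.parse(source)

# The tree reproduces the exact input.
round_trip_ok = module.get_code() == source
# Valid code yields no issues.
issues = list(grammar.iter_errors(module))
print(round_trip_ok, len(issues))
```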

You will get back a grammar object that you can use to parse code and find issues in it:

parso.load_grammar() returns instances of this class.

Creating custom non-Python grammars by calling this is not yet supported.

text -- A BNF representation of your grammar.
If you want to parse a Python file, this is most likely where you want to start.

If you need finer grained control over the parsed instance, there will be other ways to access it.

  • code (str) -- A unicode or bytes string. When it's not possible to decode bytes to a string, raises a UnicodeDecodeError.
  • error_recovery (bool) -- If enabled, any code will be returned. If it is invalid, it will be returned as an error node. If disabled, you will get a ParseError when encountering syntax errors in your code.
  • start_symbol (str) -- The grammar rule (nonterminal) that you want to parse. Only allowed to be used when error_recovery is False.
  • path (str) -- The path to the file you want to open. Only needed for caching.
  • cache (bool) -- Keeps a copy of the parser tree in RAM and on disk if a path is given. Returns the cached trees if the corresponding files on disk have not changed. Note that this stores pickle files on your file system (e.g. for Linux in ~/.cache/parso/).
  • diff_cache (bool) -- Diffs the cached python module against the new code and tries to parse only the parts that have changed. Returns the same (changed) module that is found in cache. Using this option requires you to not do anything anymore with the cached modules under that path, because the contents of it might change. This option is still somewhat experimental. If you want stability, please don't use it.
  • cache_path (str) -- If given, saves the parso cache in this directory. If not given, defaults to the default cache location on each platform.
A subclass of parso.tree.NodeOrLeaf. Typically a parso.python.tree.Module.
Given a parso.tree.NodeOrLeaf returns a generator of parso.normalizer.Issue objects. For Python this is a list of syntax/indentation errors.
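
The effect of error_recovery can be sketched as follows. This assumes the exception raised in strict mode is exposed as parso.ParserSyntaxError with a message attribute, as in recent parso releases; the broken snippet is only an illustration.

```python
import parso

grammar = parso.load_grammar(version="3.9")
broken = "foo +\n"

# With error recovery (the default) a tree is always returned,
# and it still reproduces the input exactly.
tree = grammar.parse(broken)
recovered_code = tree.get_code()

# Without error recovery the first syntax error raises an exception.
try:
    grammar.parse(broken, error_recovery=False)
    raised = False
except parso.ParserSyntaxError as exc:
    raised = True
    print(exc.message)
```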

parso is able to find multiple errors in your source code. Iterating through those errors yields the following instances:

An integer code that stands for the type of error.
A message (string) for the issue.
The start position of the error as a tuple (line, column). As always in parso, the first line is 1 and the first column is 0.
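
A short sketch of reading these three attributes (code, message, start_pos) from each issue, reusing the snippet from the introduction. The output format is an arbitrary choice for illustration.

```python
import parso

grammar = parso.load_grammar(version="3.9")
module = grammar.parse("foo +\nbar\ncontinue")

# Collect (code, message, start_pos) for every reported issue.
issues = [(issue.code, issue.message, issue.start_pos)
          for issue in grammar.iter_errors(module)]

for code, message, start_pos in issues:
    line, column = start_pos  # line is 1-based, column is 0-based
    print(f"{line}:{column} [{code}] {message}")
```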

parso also offers some utility functions that can be really useful:

A utility function to avoid loading grammars. Params are documented in parso.Grammar.parse().
version (str) -- The version used by parso.load_grammar().
Intended for Python code. In contrast to Python's str.splitlines(), looks at form feeds and other special characters as normal text. Just splits \n and \r\n. Also different: Returns [""] for an empty string input.

In Python 2.7, str.splitlines treats form feeds as normal characters. In Python 3, however, str.splitlines also splits on form feeds.
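
The difference from str.splitlines() can be seen in a small comparison. This is a sketch that assumes the utility is exposed as parso.split_lines; the sample text is made up.

```python
import parso

text = "first\nsecond\r\nthird\x0cstill third"

# parso keeps the form feed (\x0c) as ordinary text inside the line.
parso_lines = parso.split_lines(text)
# str.splitlines() treats the form feed as a line boundary.
builtin_lines = text.splitlines()

# The two also disagree on the empty string.
empty_parso = parso.split_lines("")
empty_builtin = "".splitlines()

print(parso_lines)
print(builtin_lines)
```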

Checks for unicode BOMs and PEP 263 encoding declarations. Then returns a unicode object like in bytes.decode().
  • encoding -- See bytes.decode() documentation.
  • errors -- See bytes.decode() documentation. errors can be 'strict', 'replace' or 'ignore'.
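
A sketch of decoding source bytes, assuming the function is exposed as parso.python_bytes_to_unicode. The byte strings are invented for illustration.

```python
import parso

# Source bytes with a PEP 263 declaration; \xe9 is "e with acute" in latin-1.
raw = b"# -*- coding: latin-1 -*-\nname = '\xe9'\n"
text = parso.python_bytes_to_unicode(raw)

# Without a declaration the `encoding` argument (default "utf-8") applies.
plain = parso.python_bytes_to_unicode("caf\u00e9".encode("utf-8"))

# Undecodable bytes can be replaced instead of raising an error.
lossy = parso.python_bytes_to_unicode(b"x = '\xff'\n", errors="replace")

print(text)
```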

  • jedi (which is used by IPython and a lot of editor plugins).
  • mutmut (mutation tester)

The parser tree is returned by calling parso.Grammar.parse().

NOTE:

Note that parso positions are always 1 based for lines and zero based for columns. This means the first position in a file is (1, 0).
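
The position convention can be checked with a small sketch; the two-line source is only an illustration.

```python
import parso

module = parso.parse("a = 1\nbb = 2\n", version="3.9")

# Look up the leaf that covers line 2, column 1 (inside the name "bb").
leaf = module.get_leaf_for_position((2, 1))
print(leaf.value, leaf.start_pos, leaf.end_pos)

# The very first leaf of a file starts at (1, 0).
first_leaf = module.get_first_leaf()
print(first_leaf.start_pos)
```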

Generally there are two types of classes you will deal with: parso.tree.Leaf and parso.tree.BaseNode.

Bases: NodeOrLeaf

The super class for all nodes. A node has children, a type and possibly a parent node.

A list of NodeOrLeaf child nodes.
Returns the starting position of the prefix as a tuple, e.g. (3, 4).
(line, column)
Returns the start_pos of the prefix, which is the end_pos of the previous leaf. For the operator + in 2 + 1, get_start_pos_of_prefix() would be (1, 1), while start_pos is (1, 2).
(line, column)
Returns the end position of the prefix as a tuple, e.g. (3, 4).
(line, column)
Returns the code that was the input for the parser for this node.
include_prefix -- If False, removes the prefix (whitespace and comments) of e.g. a statement.
Get the parso.tree.Leaf at position
  • position (tuple) -- A position tuple, row, column. Rows start from 1
  • include_prefixes (bool) -- If False, None will be returned if position falls on whitespace or comments before a leaf
parso.tree.Leaf at position, or None
Returns the first leaf of a node or itself if this is a leaf.
Returns the last leaf of a node or itself if this is a leaf.
Bases: NodeOrLeaf

Leaves are basically tokens with a better API. Leaves know exactly where they were defined and what text precedes them.

str() The value of the current token.
str() Typically a mixture of whitespace and comments. Stuff that is syntactically irrelevant for the syntax tree.
Returns the starting position of the prefix as a tuple, e.g. (3, 4).
(line, column)
Returns the start_pos of the prefix, which is the end_pos of the previous leaf. For the operator + in 2 + 1, get_start_pos_of_prefix() would be (1, 1), while start_pos is (1, 2).
(line, column)
Returns the first leaf of a node or itself if this is a leaf.
Returns the last leaf of a node or itself if this is a leaf.
Returns the code that was the input for the parser for this node.
include_prefix -- If False, removes the prefix (whitespace and comments) of e.g. a statement.
Returns the end position of the prefix as a tuple, e.g. (3, 4).
(line, column)
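
How value and prefix relate on a leaf can be sketched like this, using a made-up snippet with a leading comment:

```python
import parso

module = parso.parse("# note\nx = 1\n", version="3.9")
leaf = module.get_first_leaf()  # the Name leaf for "x"

value = leaf.value    # the token text
prefix = leaf.prefix  # the comment and newline before the token
with_prefix = leaf.get_code()
without_prefix = leaf.get_code(include_prefix=False)

# The prefix starts on line 1 although the token itself is on line 2.
prefix_start = leaf.get_start_pos_of_prefix()
print(repr(prefix), leaf.start_pos, prefix_start)
```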

All nodes and leaves have these methods/properties:

Bases: object

The base class for nodes and leaves.

The type is a string that typically matches the types of the grammar file.
The parent BaseNode of this node or leaf. None if this is the root node.
Returns the root node of a parser tree. The returned node doesn't have a parent node like all the other nodes/leaves.
Returns the node immediately following this node in this parent's children list. If this node does not have a next sibling, it is None.
Returns the node immediately preceding this node in this parent's children list. If this node does not have a previous sibling, it is None.
Returns the previous leaf in the parser tree. Returns None if this is the first element in the parser tree.
Returns the next leaf in the parser tree. Returns None if this is the last element in the parser tree.
Returns the starting position of the prefix as a tuple, e.g. (3, 4).
(line, column)
Returns the end position of the prefix as a tuple, e.g. (3, 4).
(line, column)
Returns the start_pos of the prefix, which is the end_pos of the previous leaf. For the operator + in 2 + 1, get_start_pos_of_prefix() would be (1, 1), while start_pos is (1, 2).
(line, column)
Returns the first leaf of a node or itself if this is a leaf.
Returns the last leaf of a node or itself if this is a leaf.
Returns the code that was the input for the parser for this node.
include_prefix -- If False, removes the prefix (whitespace and comments) of e.g. a statement.
Recursively looks at the parents of this node or leaf and returns the first found node that matches node_types. Returns None if no matching node is found.
node_types -- type names that are searched for.
Returns a formatted dump of the parser tree rooted at this node or leaf. This is mainly useful for debugging purposes.

The indent parameter is interpreted in a similar way as ast.dump(). If indent is a non-negative integer or string, then the tree will be pretty-printed with that indent level. An indent level of 0, negative, or "" will only insert newlines. None selects the single line representation. Using a positive integer indent indents that many spaces per level. If indent is a string (such as "\t"), that string is used to indent each level.

indent -- Indentation style as described above. The default indentation is 4 spaces, which yields a pretty-printed dump.
>>> import parso
>>> print(parso.parse("lambda x, y: x + y").dump())
Module([
    Lambda([
        Keyword('lambda', (1, 0)),
        Param([
            Name('x', (1, 7), prefix=' '),
            Operator(',', (1, 8)),
        ]),
        Param([
            Name('y', (1, 10), prefix=' '),
        ]),
        Operator(':', (1, 11)),
        PythonNode('arith_expr', [
            Name('x', (1, 13), prefix=' '),
            Operator('+', (1, 15), prefix=' '),
            Name('y', (1, 17), prefix=' '),
        ]),
    ]),
    EndMarker('', (1, 18)),
])

This is the syntax tree for Python 3 syntaxes. The classes represent syntax elements like functions and imports.

All of the nodes can be traced back to the Python grammar file. If you want to know how a tree is structured, just analyse that file (for each Python version it's a bit different).

There's a lot of logic here that makes it easier for Jedi (and other libraries) to deal with a Python syntax tree.

By using parso.tree.NodeOrLeaf.get_code() on a module, you can get back the 1-to-1 representation of the input given to the parser. This is important if you want to refactor a parser tree.
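
A minimal sketch of the round-trip property, using deliberately odd spacing as an illustration:

```python
import parso

source = "x  =  [1,2 ,3]   # odd spacing\n\n\nprint( x )\n"
module = parso.parse(source, version="3.9")

# Whitespace and comments are kept in leaf prefixes, so nothing is lost.
regenerated = module.get_code()

# A child node returns only its own slice of the input.
first_stmt = module.children[0].get_code()
print(regenerated == source)
print(repr(first_stmt))
```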

>>> from parso import parse
>>> parser = parse('import os')
>>> module = parser.get_root_node()
>>> module
<Module: @1-1>

Any subclass of Scope, including Module, has a method iter_imports:

>>> list(module.iter_imports())
[<ImportName: import os@1,0>]

A few things have changed when looking at Python grammar files:

  • Param does not exist in Python grammar files. It is essentially a part of a parameters node. parso splits it up to make it easier to analyse parameters. However this just makes it easier to deal with the syntax tree, it doesn't actually change the valid syntax.
  • A few nodes like lambdef and lambdef_nocond have been merged in the syntax tree to make it easier to deal with them.

Bases: object
Returns the string leaf of a docstring. e.g. r'''foo'''.
Bases: object

Some Python specific utilities.

Given a (line, column) tuple, returns a Name or None if there is no name at that position.
Bases: PythonMixin, Leaf
Basically calls parso.tree.NodeOrLeaf.get_start_pos_of_prefix().
Bases: _LeafWithoutNewlines
The type is a string that typically matches the types of the grammar file.
Bases: PythonLeaf

Contains NEWLINE and ENDMARKER tokens.

The type is a string that typically matches the types of the grammar file.
Bases: _LeafWithoutNewlines

A string. Sometimes it is important to know if the string belongs to a name or not.

The type is a string that typically matches the types of the grammar file.
Returns True if the name is being defined.
Returns None if there's no definition for a name.
import_name_always -- Specifies if an import name is always a definition. Normally foo in from foo import bar is not a definition.
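
The difference between a defining and a referencing Name can be sketched as follows. The lookup goes through Module.get_used_names(), assumed here to map each identifier to its Name leaves in source order.

```python
import parso

module = parso.parse("x = 1\nprint(x)\n", version="3.9")

# Both occurrences of "x": the assignment target, then the reference.
x_def, x_ref = module.get_used_names()["x"]

defined = x_def.is_definition()     # target of the assignment
referenced = x_ref.is_definition()  # only read here
definition_type = x_def.get_definition().type
print(defined, referenced, definition_type)
```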
Bases: Literal
The type is a string that typically matches the types of the grammar file.
Bases: Literal
The type is a string that typically matches the types of the grammar file.
Bases: PythonLeaf

f-strings contain f-string expressions and normal python strings. These are the string parts of f-strings.

The type is a string that typically matches the types of the grammar file.
Bases: PythonLeaf

f-strings contain f-string expressions and normal python strings. These are the string parts of f-strings.

The type is a string that typically matches the types of the grammar file.
Bases: PythonLeaf

f-strings contain f-string expressions and normal python strings. These are the string parts of f-strings.

The type is a string that typically matches the types of the grammar file.
Bases: _LeafWithoutNewlines, _StringComparisonMixin
The type is a string that typically matches the types of the grammar file.
Bases: _LeafWithoutNewlines, _StringComparisonMixin
The type is a string that typically matches the types of the grammar file.
Bases: PythonBaseNode, DocstringMixin

Super class for the parser tree, which represents the state of a python text file. A Scope is either a function, class or lambda.

Returns a generator of funcdef nodes.
Returns a generator of classdef nodes.
Returns a generator of import_name and import_from nodes.
Returns the part that is executed by the function.
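
A sketch of the Scope iterators on a small made-up module. In this example each iterator reports only what is defined directly in the scope it is called on, not in nested scopes.

```python
import parso

source = (
    "import os\n"
    "\n"
    "class A:\n"
    "    def method(self):\n"
    "        pass\n"
    "\n"
    "def top():\n"
    "    pass\n"
)
module = parso.parse(source, version="3.9")

module_funcs = [f.name.value for f in module.iter_funcdefs()]
module_classes = [c.name.value for c in module.iter_classdefs()]
imported = [name.value
            for imp in module.iter_imports()
            for name in imp.get_defined_names()]

# The method only shows up when iterating over the class scope.
klass = next(module.iter_classdefs())
class_funcs = [f.name.value for f in klass.iter_funcdefs()]
print(module_funcs, module_classes, imported, class_funcs)
```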
Bases: Scope

The top scope, which is always a module. Depending on the underlying parser this may be a full module or just a part of a module.

The type is a string that typically matches the types of the grammar file.
Returns all the Name leafs that exist in this module. This includes both definitions and references of names.
Bases: PythonBaseNode
The type is a string that typically matches the types of the grammar file.
Bases: Scope
Returns the Name leaf that defines the function or class name.
list of Decorator
Bases: ClassOrFunc

Used to store the parsed contents of a python class.

The type is a string that typically matches the types of the grammar file.
Returns the arglist node that defines the super classes. It returns None if there are no arguments.
Bases: ClassOrFunc

Used to store the parsed contents of a python function.

Children:

0. <Keyword: def>
1. <Name>
2. parameter list (including open-paren and close-paren <Operator>s)
3. or 5. <Operator: :>
4. or 6. Node() representing function body
3. -> (if annotation is also present)
4. annotation (if present)
The type is a string that typically matches the types of the grammar file.
Returns a list of Param().
Returns the Name leaf that defines the function or class name.
Returns a generator of yield_expr.
Returns a generator of return_stmt.
Returns a generator of raise_stmt. Includes raise statements inside try-except blocks
Checks if a function is a generator or not.
Returns the test node after -> or None if there is no annotation.
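
A sketch of inspecting a function node. The function source is invented, and the attribute names used on each parameter (name, star_count, default) are those described in the Param section of this document.

```python
import parso

source = "def gen(a, b=2, *args, **kwargs) -> int:\n    yield a\n"
module = parso.parse(source, version="3.9")
func = next(module.iter_funcdefs())

# One entry per parameter: (name, star_count, default source or None).
params = [
    (p.name.value,
     p.star_count,
     p.default.get_code(include_prefix=False) if p.default is not None else None)
    for p in func.get_params()
]
is_generator = func.is_generator()
return_annotation = func.annotation.get_code(include_prefix=False)
print(func.name.value, params, is_generator, return_annotation)
```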
Bases: Function

Lambdas are basically trimmed functions, so give it the same interface.

Children:

 0. <Keyword: lambda>
 *. <Param x> for each argument x
-2. <Operator: :>
-1. Node() representing body
The type is a string that typically matches the types of the grammar file.
Raises an AttributeError. Lambdas don't have a defined name.
Returns None, lambdas don't have annotations.
Bases: Flow
The type is a string that typically matches the types of the grammar file.
E.g. returns all the test nodes that are named as x, below:

if x:
    pass
elif x:
    pass
Searches for the branch in which the node is and returns the corresponding test node (see function above). However if the node is in the test node itself and not in the suite return None.
Checks if a node is defined after else.
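
A sketch of querying an if statement, assuming the methods described above are named get_test_nodes() and is_node_after_else() as in recent parso releases:

```python
import parso

source = "if x:\n    pass\nelif y:\n    pass\nelse:\n    pass\n"
module = parso.parse(source, version="3.9")
if_stmt = module.children[0]

# The conditions that follow `if` and `elif`, in source order.
tests = [t.get_code(include_prefix=False) for t in if_stmt.get_test_nodes()]

# The last leaf of the statement lies inside the else branch.
after_else = if_stmt.is_node_after_else(if_stmt.get_last_leaf())
print(tests, after_else)
```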
Bases: Flow
The type is a string that typically matches the types of the grammar file.
Bases: Flow
The type is a string that typically matches the types of the grammar file.
Returns the input node y from: for x in y:.
Bases: Flow
The type is a string that typically matches the types of the grammar file.
Returns the test nodes found in except_clause nodes. Returns [None] for except clauses without an exception given.
Bases: Flow
The type is a string that typically matches the types of the grammar file.
Returns a list of Name that the with statement defines. The defined names are set after as.
Bases: PythonBaseNode
The path is the list of names that leads to the searched name.
Bases: Import
The type is a string that typically matches the types of the grammar file.
Returns a list of Name that the import defines. The defined names are set after import or, in case an alias (as) is present, that name is returned.
The level parameter of __import__.
The import paths defined in an import statement. Typically an array like this: [<Name: datetime>, <Name: date>].
Bases: Import

For import_name nodes. Covers normal imports without from.

The type is a string that typically matches the types of the grammar file.
Returns a list of Name that the import defines. The defined name is always the first name after import or, in case an alias (as) is present, that name is returned.
The level parameter of __import__.
This checks for the special case of nested imports, without aliases and from statement:
import foo.bar
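
A sketch contrasting the two import node types on made-up statements. The method names get_defined_names(), level and is_nested() are assumed from recent parso releases.

```python
import parso

source = "from datetime import date as d\nimport os.path\n"
module = parso.parse(source, version="3.9")
import_from, import_name = module.iter_imports()

# `from ... import ... as d` defines the alias.
from_defined = [n.value for n in import_from.get_defined_names()]
from_level = import_from.level  # 0 for an absolute import

# `import os.path` defines only the first name and counts as nested.
name_defined = [n.value for n in import_name.get_defined_names()]
nested = import_name.is_nested()
print(from_defined, from_level, name_defined, nested)
```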
Bases: PythonBaseNode

For the following statements: assert, del, global, nonlocal, raise, return, yield.

pass, continue and break are not in there, because they are just simple keywords and the parser reduces it to a keyword.

Keyword statements start with the keyword and end with _stmt. You can crosscheck this with the Python grammar.
Bases: PythonBaseNode
The type is a string that typically matches the types of the grammar file.
Bases: PythonBaseNode, DocstringMixin
The type is a string that typically matches the types of the grammar file.
Returns a list of Name defined before the = sign.
Returns the right-hand-side of the equals.
Returns a generator of +=, =, etc. or None if there is no operation.
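
A sketch of reading the targets and the right-hand side of an assignment, assuming the methods are named get_defined_names() and get_rhs() as in recent parso releases:

```python
import parso

module = parso.parse("a = b = 1 + 2\n", version="3.9")
# The module holds a simple_stmt, whose first child is the expr_stmt.
expr_stmt = module.children[0].children[0]

targets = [n.value for n in expr_stmt.get_defined_names()]
rhs = expr_stmt.get_rhs().get_code(include_prefix=False)
print(targets, rhs)
```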
Bases: PythonBaseNode
The type is a string that typically matches the types of the grammar file.
Bases: PythonBaseNode

It's a helper class that makes business logic with params much easier. The Python grammar defines no param node. It defines it in a different way that is not really suited to working with parameters.

The type is a string that typically matches the types of the grammar file.
Is 0 in case of foo, 1 in case of *foo or 2 in case of **foo.
The default is the test node that appears after the =. Is None in case no default is present.
The annotation is the test node that appears after :. Is None in case no annotation is present.
The Name leaf of the param.
Property for the positional index of a parameter.
Returns the function/lambda of a parameter.
Like all the other get_code functions, but includes the param include_comma.
include_comma (bool) -- If enabled, includes the comma in the string output.
Bases: PythonBaseNode
The type is a string that typically matches the types of the grammar file.
Returns a list of Name that the comprehension defines.
alias of SyncCompFor
Bases: Mapping

This class exists for the sole purpose of creating an immutable dict.

Recursively looks at the parents of a node and returns the first found node that matches node_types. Returns None if no matching node is found.

This function is deprecated, use NodeOrLeaf.search_ancestor() instead.

  • node -- The ancestors of this node will be checked.
  • node_types -- type names that are searched for.

If you want to contribute anything to parso, just open an issue or pull request to discuss it. We welcome changes! Please check the CONTRIBUTING.md file in the repository first.

The deprecation process is as follows:

1. A deprecation is announced in the next major/minor release.
2. We wait at least a year and at least two minor releases before we remove the deprecated functionality.

The test suite depends on pytest:

pip install pytest

To run the tests use the following:

pytest

If you want to test only a specific Python version (e.g. Python 3.9), it's as easy as:

python3.9 -m pytest

Tests are also run automatically on GitHub Actions.

  • Source Code on GitHub
  • GitHub Actions Testing
  • Python Package Index

parso contributors

April 5, 2024 0.8