wraptypes is a general utility for creating ctypes wrappers from C header
files. The front-end is tools/wraptypes/wrap.py; for usage, run:
python tools/wraptypes/wrap.py -h
There are three components to wraptypes:
- The preprocessor interprets preprocessor declarations and converts the source header files into a list of tokens.
- The C parser parses the preprocessed tokens for type and function declarations and calls handle_ methods on the class CParser in a similar manner to a SAX parser.
- The ctypes parser (ctypesparser.py) interprets C declarations and types from CParser and creates corresponding ctypes declarations, calling handle_ methods on the class CtypesParser.
wrap.py provides CtypesWrapper, a simple subclass of CtypesParser, which
writes the ctypes declarations found to a file in a format that can be
imported as a module.
The parsers are built upon a modified version of PLY, a Python
implementation of lex and yacc. The modified source is included in
the wraptypes directory. The modifications are:
- Grammar is abstracted out of Parser, so multiple grammars can easily be defined in the same module.
- Tokens and symbols keep track of their filename as well as line number.
- Lexer state can be pushed onto a stack.
The first time the parsers are run (or after they are modified), PLY creates
parsetab.py in the current directory, containing the generated state
machines, which can take a few seconds to generate.
The file parser.out is created if debugging is enabled, and contains the
parser description (of the last parser that was generated), which is essential
for debugging.
The grammar and parser are defined in
There is only one lexer state. Each token has a type which is a string (e.g.
'CHARACTER_CONSTANT') and a value. Token values, when read directly from
the source file, are only ever strings. When tokens are written to the output
list they sometimes have tuple values (for example, a
PP_DEFINE token on
Two lexer classes are defined:
PreprocessorLexer, which reads a stack of
files (actually strings) as input, and
TokenListLexer, which reads from a
list of already-parsed tokens (used for parsing expressions).
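To make "a stack of files as input" concrete, here is a minimal sketch of a lexer over a stack of inputs: pushing a new input (as an #include would) suspends the current one, and lexing resumes in the outer input once the pushed one is exhausted. Each token keeps its filename and line number, like the modified PLY tokens. The names Token, StackLexer, push_input and token, and the whitespace-splitting "tokenizer", are hypothetical stand-ins, not the PreprocessorLexer API.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Token:
    type: str
    value: str
    filename: str  # tokens remember where they came from...
    lineno: int    # ...including the line number


def _words(filename: str, text: str) -> Iterator[Token]:
    # Splitting on whitespace stands in for real C tokenization here.
    for lineno, line in enumerate(text.splitlines(), start=1):
        for word in line.split():
            yield Token("WORD", word, filename, lineno)


class StackLexer:
    """Hypothetical lexer that reads from a stack of (filename, text) inputs."""

    def __init__(self) -> None:
        self._stack: List[Iterator[Token]] = []

    def push_input(self, filename: str, text: str) -> None:
        # The pushed input is read first; the previous one resumes afterwards.
        self._stack.append(_words(filename, text))

    def token(self) -> Optional[Token]:
        while self._stack:
            tok = next(self._stack[-1], None)
            if tok is not None:
                return tok
            self._stack.pop()  # this input is exhausted; resume the outer one
        return None  # every input has been consumed


lexer = StackLexer()
lexer.push_input("main.h", "a b\nc")
first = lexer.token()
lexer.push_input("inc.h", "x y")  # what an #include would do mid-file
rest = []
while (tok := lexer.token()) is not None:
    rest.append((tok.value, tok.filename, tok.lineno))
```

After the loop, rest holds the included file's tokens followed by the remainder of main.h, each tagged with its own filename and line.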
The preprocessing entry-point is the
PreprocessorParser class. This creates a
PreprocessorLexer and its grammar during construction. The
system include path includes the GCC search path by default but can be
modified by altering the
framework_path lists. The
system_headers dict allows header files that don’t exist on the filesystem to
be found as if they were on the search path. For example, by setting:
system_headers['stdlib.h'] = '''#ifndef STDLIB_H
#define STDLIB_H
/* ... */
#endif
'''
you can insert your own custom header in place of the one on the filesystem. This is useful when parsing headers from network locations.
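The lookup order this implies can be sketched as follows, assuming an entry in system_headers takes precedence over any file on disk and the search path is tried in order. The function resolve_system_header and its return shape are hypothetical, for illustration only.

```python
import os
from typing import Dict, List, Optional, Tuple


def resolve_system_header(header: str,
                          system_headers: Dict[str, str],
                          include_path: List[str]) -> Optional[Tuple[str, str]]:
    """Return (filename, source text) for a system #include, or None.

    Assumption: a system_headers entry wins over the filesystem, so the
    header is "found" even if no such file exists on the search path.
    """
    if header in system_headers:
        return header, system_headers[header]
    for directory in include_path:
        candidate = os.path.join(directory, header)
        if os.path.exists(candidate):
            with open(candidate) as f:
                return candidate, f.read()
    return None  # not found anywhere


system_headers = {"stdlib.h": "#ifndef STDLIB_H\n#define STDLIB_H\n#endif\n"}
found = resolve_system_header("stdlib.h", system_headers, ["/usr/include"])
```

Because the dict is checked first, this also lets you shadow a real header that does exist on disk.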
Parsing begins when
parse is called. Specify one or both of a filename
and a string of data. If
the debug kwarg is True, syntax errors dump the
parser state instead of just the line number where they occurred.
The production rules specify the actions; these are implemented in
PreprocessorGrammar. The actions call the following methods on
PreprocessorParser:
- include(self, header), to push another file onto the lexer.
- include_system(self, header), to search the system path for a file to push onto the lexer.
- error(self, message, filename, line), to signal a parse error. Not all syntax errors get this far, due to limitations in the parser. A parse error at EOF will just print to stderr.
- write(self, tokens), to write tokens to the output list. This is the default action when no preprocessing declaratives are being parsed.
The parser has a stack of
ExecutionState objects, which specify whether the
current tokens being parsed are ignored or not (tokens are ignored in an
#if that evaluates to 0). This is a little more complicated than just a
boolean flag: the parser must also ignore #elif conditions that can have no
True if the top-most
ExecutionState allows declaratives and
conditionals to be parsed, respectively. The execution state stack is
modified with the
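Why a boolean is not enough can be shown with a small sketch of a conditional stack. Each level remembers whether its enclosing group is enabled and whether any branch of its #if chain has already been taken; once one has, later #elif conditions can have no effect and are not evaluated, and nothing is evaluated inside a skipped group. The classes CondState and ConditionalStack and the method names are hypothetical and only loosely modelled on ExecutionState; conditions are passed as callables so they can be skipped.

```python
from typing import Callable, List


class CondState:
    """One level of #if nesting (hypothetical sketch)."""

    def __init__(self, parent_enabled: bool, condition: bool):
        self.parent_enabled = parent_enabled
        # Tokens are emitted only if the enclosing group is enabled too.
        self.enabled = parent_enabled and condition
        # True once some branch of this #if/#elif/#else chain was taken.
        self.taken = self.enabled


class ConditionalStack:
    def __init__(self) -> None:
        self.stack: List[CondState] = [CondState(True, True)]  # top level

    def enabled(self) -> bool:
        return self.stack[-1].enabled

    def push_if(self, evaluate: Callable[[], int]) -> None:
        if self.enabled():
            self.stack.append(CondState(True, bool(evaluate())))
        else:
            # Conditions inside a skipped group are never evaluated.
            self.stack.append(CondState(False, False))

    def elif_(self, evaluate: Callable[[], int]) -> None:
        state = self.stack[-1]
        if state.taken or not state.parent_enabled:
            state.enabled = False  # cannot matter, so skip evaluating it
        else:
            state.enabled = bool(evaluate())
            state.taken = state.enabled

    def else_(self) -> None:
        state = self.stack[-1]
        state.enabled = state.parent_enabled and not state.taken
        state.taken = True

    def endif(self) -> None:
        if len(self.stack) == 1:
            raise ValueError("#endif without #if")
        self.stack.pop()


calls = []


def cond(value: int) -> Callable[[], int]:
    def evaluate() -> int:
        calls.append(value)  # record which conditions were evaluated
        return value
    return evaluate


cs = ConditionalStack()
cs.push_if(cond(0))   # #if 0
cs.elif_(cond(1))     # #elif 1    -> this branch is taken
cs.elif_(cond(1))     # #elif 1    -> skipped without evaluation
```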
PreprocessorParser has a
PreprocessorNamespace which keeps track of
the currently defined macros. You can create and specify your own namespace,
or use one that is created by default. The default namespace includes GCC
platform macros needed for parsing system headers, and some of the STDC
macros.
Macros are expanded when tokens are written to the output list, and when
conditional expressions are parsed.
PreprocessorNamespace.apply_macros(tokens) takes care of this, replacing
function parameters, variable arguments, macro objects and (mostly) avoiding
infinite recursion. It does not yet handle the
which are needed to parse the Windows system headers.
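The recursion guard can be illustrated with a sketch of object-like macro expansion in which a macro name is not re-expanded while its own replacement is being rescanned (a simplification of the hide-set rule in the C standard). The function expand and the representation of macros as token lists are assumptions; function-like macros and variable arguments, which apply_macros also handles, are left out.

```python
from typing import Dict, List, Optional, Set


def expand(tokens: List[str], macros: Dict[str, List[str]],
           expanding: Optional[Set[str]] = None) -> List[str]:
    """Expand object-like macros without recursing forever (sketch)."""
    if expanding is None:
        expanding = set()
    out: List[str] = []
    for tok in tokens:
        if tok in macros and tok not in expanding:
            # Rescan the replacement with this name disabled, which stops
            # "#define FOO FOO + 1" (or A -> B -> A) from looping.
            out.extend(expand(macros[tok], macros, expanding | {tok}))
        else:
            out.append(tok)
    return out


macros = {
    "RED_SHIFT": ["8"],
    "RED_MASK": ["(", "0x0f", "<<", "RED_SHIFT", ")"],
    "FOO": ["FOO", "+", "1"],
    "A": ["B"],
    "B": ["A"],
}
mask_tokens = expand(["RED_MASK"], macros)
```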
The process for evaluating a conditional (#if or #elif) is:
- Tokens between the conditional directive and NEWLINE are expanded by apply_macros.
- The resulting list of tokens is used to construct a TokenListLexer.
- This lexer is used as input to a ConstantExpressionParser. This parser uses the ConstantExpressionGrammar, which builds up an AST of ExpressionNode objects.
- parse is called on the ConstantExpressionParser, which returns the resulting top-level ExpressionNode, or None if there was a syntax error.
- The evaluate method of the ExpressionNode is called with the preprocessor’s namespace as the evaluation context. This allows the expression nodes to resolve names against the currently defined macros.
- The result of evaluate is always an int; non-zero values are treated as True.
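The last two steps can be sketched with a tiny AST whose nodes evaluate against a context and always return an int. The node classes (ExprNode, Const, Ident, Defined, BinOp) and the use of a plain dict of already-evaluated macro values as the context are hypothetical simplifications, not the preprocessor's ExpressionNode classes.

```python
import operator
from dataclasses import dataclass
from typing import Dict

Context = Dict[str, int]  # assumption: macro name -> already-evaluated value

_OPS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "<<": operator.lshift, ">>": operator.rshift,
    "==": operator.eq, "!=": operator.ne, "<": operator.lt,
    ">": operator.gt, "<=": operator.le, ">=": operator.ge,
}


class ExprNode:
    def evaluate(self, ctx: Context) -> int:
        raise NotImplementedError


@dataclass
class Const(ExprNode):
    value: int

    def evaluate(self, ctx: Context) -> int:
        return self.value


@dataclass
class Ident(ExprNode):
    name: str

    def evaluate(self, ctx: Context) -> int:
        # In #if, an identifier that is not a defined macro evaluates to 0.
        return int(ctx.get(self.name, 0))


@dataclass
class Defined(ExprNode):
    name: str

    def evaluate(self, ctx: Context) -> int:
        return int(self.name in ctx)


@dataclass
class BinOp(ExprNode):
    op: str
    left: ExprNode
    right: ExprNode

    def evaluate(self, ctx: Context) -> int:
        # && and || short-circuit, as in C.
        if self.op == "&&":
            return int(bool(self.left.evaluate(ctx)) and bool(self.right.evaluate(ctx)))
        if self.op == "||":
            return int(bool(self.left.evaluate(ctx)) or bool(self.right.evaluate(ctx)))
        return int(_OPS[self.op](self.left.evaluate(ctx), self.right.evaluate(ctx)))


# #if defined(__GNUC__) && __GNUC__ >= 4
expr = BinOp("&&", Defined("__GNUC__"), BinOp(">=", Ident("__GNUC__"), Const(4)))
```

Evaluating expr with {"__GNUC__": 4} yields 1, and with an empty context yields 0; the caller then treats any non-zero result as True.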
Because pyglet requires special knowledge of the preprocessor declaratives
that were encountered in the source, these are encoded as pseudo-tokens within
the output token list. For example, after a
#ifndef is evaluated, it
is written to the token list as a
#define is handled specially. After applying it to the namespace, it is
parsed as an expression immediately. This is allowed (and often expected) to
fail. If it does not fail, a
PP_DEFINE_CONSTANT token is created, and the
value is the result of evaluating the expression. Otherwise, a PP_DEFINE
token is created, and the value is the string concatenation of the tokens
defined. Special handling of parseable expressions makes it simple to later
parse constants defined as, for example:
#define RED_SHIFT 8
#define RED_MASK (0x0f << RED_SHIFT)
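The "try to evaluate, fall back to text" rule for #define can be sketched as below. As a swapped-in technique, Python's own expression grammar (via the standard ast module) stands in for the C constant-expression grammar, restricted to integer and float literals, a few operators, and names of constants defined earlier. The function classify_define, the tuple shapes it returns, and the constants dict are hypothetical.

```python
import ast
import operator
from typing import Dict, Tuple, Union

Number = Union[int, float]

_BINOPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.LShift: operator.lshift, ast.RShift: operator.rshift,
    ast.BitOr: operator.or_, ast.BitAnd: operator.and_,
}
_UNOPS = {ast.USub: operator.neg, ast.Invert: operator.invert}


def _eval(node: ast.AST, constants: Dict[str, Number]) -> Number:
    if isinstance(node, ast.Expression):
        return _eval(node.body, constants)
    if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.Name) and node.id in constants:
        return constants[node.id]  # refer to a previously defined constant
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return _BINOPS[type(node.op)](_eval(node.left, constants),
                                      _eval(node.right, constants))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNOPS:
        return _UNOPS[type(node.op)](_eval(node.operand, constants))
    raise ValueError("not a constant expression")


def classify_define(name: str, replacement: str,
                    constants: Dict[str, Number]) -> Tuple[str, str, object]:
    """Return a PP_DEFINE_CONSTANT or PP_DEFINE pseudo-token (sketch)."""
    try:
        value = _eval(ast.parse(replacement, mode="eval"), constants)
    except (SyntaxError, ValueError, TypeError):
        # Failure is expected for most macros: keep the replacement text.
        return ("PP_DEFINE", name, replacement)
    constants[name] = value
    return ("PP_DEFINE_CONSTANT", name, value)


constants: Dict[str, Number] = {}
shift = classify_define("RED_SHIFT", "8", constants)
mask = classify_define("RED_MASK", "(0x0f << RED_SHIFT)", constants)
other = classify_define("APIENTRY", "__stdcall", constants)
```

Because RED_SHIFT was recorded as a constant, RED_MASK evaluates to 0x0f << 8; an unknown identifier such as __stdcall makes evaluation fail, so that macro is kept as text.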
The preprocessor can be tested and debugged by running its module
stand-alone with a header file as the sole argument. The resulting token list
will be written to stdout.
The lexer for the C parser, CLexer, takes as input the list of tokens output
by the preprocessor. The special preprocessor pseudo-tokens
are intercepted here and handled immediately; hence they can appear anywhere
in the source header file without causing problems with the parser. At this
stage, IDENTIFIER tokens which are found to be the name of a defined type
(the set of defined types is updated continuously during parsing) are
The entry-point to parsing C source is the
CParser class. This creates a
preprocessor in its constructor, and defines some default types such as
__int64_t. These can be disabled with kwargs.
Preprocessing can be quite time-consuming, especially on OS X where thousands
of #include declaratives are processed when Carbon is parsed. To minimise
the time required to parse similar (or the same, while debugging) header
files, the token list from preprocessing is cached and reused where possible.
This is handled by
CPreprocessorParser, which checks with
CParser whether the desired file is cached. The cache is checked
against the file’s modification timestamp as well as a “memento” that
describes the currently defined tokens. This is intended to avoid using a
cached file that would otherwise be parsed differently due to the defined
macros. It is by no means perfect; for example, it won’t pick up on a macro
that has been defined differently. It seems to work well enough for the
header files pyglet requires.
The header cache is saved and loaded automatically in the working directory
as .header.cache. The cache should be deleted if you make changes to the
preprocessor, or are experiencing cache errors (these are usually accompanied
by a “what-the?” exclamation from the user).
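A rough sketch of the validity check: an entry is reused only if the file's modification time and a "memento" of the defined macros both match. Here the memento is assumed to be the sorted macro names alone, which reproduces the stated limitation that a macro defined differently goes unnoticed. The function names, the tuple layout of an entry, and the use of pickle for .header.cache are all assumptions, not the real cache format.

```python
import pickle
from typing import Dict, List, Optional, Tuple

# (mtime, memento, tokens) per header filename; the layout is an assumption.
CacheEntry = Tuple[float, Tuple[str, ...], List[str]]


def make_memento(defined_macros: Dict[str, str]) -> Tuple[str, ...]:
    # Only the names are recorded, not the values, so a macro redefined
    # with a different value is NOT detected.
    return tuple(sorted(defined_macros))


def lookup(cache: Dict[str, CacheEntry], filename: str, mtime: float,
           defined_macros: Dict[str, str]) -> Optional[List[str]]:
    """Return cached tokens, or None if missing or stale.

    The caller supplies mtime, e.g. from os.path.getmtime(filename).
    """
    entry = cache.get(filename)
    if entry is None:
        return None
    cached_mtime, memento, tokens = entry
    if cached_mtime != mtime or memento != make_memento(defined_macros):
        return None
    return tokens


def store(cache: Dict[str, CacheEntry], filename: str, mtime: float,
          defined_macros: Dict[str, str], tokens: List[str]) -> None:
    cache[filename] = (mtime, make_memento(defined_macros), tokens)


def save_cache(cache: Dict[str, CacheEntry], path: str = ".header.cache") -> None:
    with open(path, "wb") as f:
        pickle.dump(cache, f)


def load_cache(path: str = ".header.cache") -> Dict[str, CacheEntry]:
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except (OSError, EOFError, pickle.UnpicklingError):
        return {}  # missing or corrupt cache: start afresh
```

Deleting the cache file simply makes load_cache start from an empty dict, which matches the advice to delete it when in doubt.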
The actions in the grammar construct parts of a “C object model” and call
methods on CParser. The C object model is not at all complete, containing
only what pyglet (and any other ctypes-wrapping application) requires. The
classes in the object model are:
- A Declaration is a single declaration occurring outside of a function body. This includes type declarations, function declarations and variable declarations. Its attributes include type (a Type object) and storage (for example, ‘typedef’, ‘const’, ‘static’, ‘extern’, etc).
- A declarator is a thing being declared. Declarators have an identifier (the name of it, None if the declarator is abstract, as in some function parameter declarations), an optional initializer (currently ignored), an optional array (a linked list giving the dimensions of the array) and an optional list of parameters (if the declarator is a function).
- A pointer is a type of declarator that is dereferenced via pointer to another declarator.
- Array has size (an int, its dimension, or None if unsized) and array, a pointer to the next array dimension, if any.
- A function parameter consisting of a
- Type has a list of qualifiers (e.g. ‘const’, ‘volatile’, etc) and specifiers (the meaty bit).
- A base TypeSpecifier is just a string, such as 'unsigned'. Note that types can have multiple TypeSpecifiers; not all combinations are valid.
- A struct specifier is the specifier for a struct or union type (a union if is_union is True). tag gives the optional struct tag, and declarations is the meat (an empty list for an opaque or unspecified struct).
- EnumSpecifier is the specifier for an enum type. tag gives the optional enum tag, and enumerators is the list of Enumerator objects (an empty list for an unspecified enum).
- Enumerators exist only within EnumSpecifier. Each contains expression, an ExpressionNode object.
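As an illustration of how the declarator pieces fit together, here are a few dataclass re-creations using the attribute names listed above (identifier, array, parameters, initializer, pointer, size). These are not the cparser classes; how pointer declarators nest, and the helpers array_dimensions and declarator_name, are assumptions for the sake of the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative re-creations; attribute names follow the list above,
# everything else is an assumption.


@dataclass
class Array:
    size: Optional[int]               # None if unsized, as in `int x[]`
    array: Optional["Array"] = None   # next array dimension, if any


@dataclass
class Declarator:
    identifier: Optional[str]         # None for an abstract declarator
    array: Optional[Array] = None
    parameters: Optional[list] = None  # set only for function declarators
    initializer: object = None        # currently ignored


@dataclass
class Pointer(Declarator):
    pointer: Optional[Declarator] = None  # the declarator this one points to


def array_dimensions(declarator: Declarator) -> List[Optional[int]]:
    """Walk the linked list of Array objects, outermost dimension first."""
    dims: List[Optional[int]] = []
    node = declarator.array
    while node is not None:
        dims.append(node.size)
        node = node.array
    return dims


def declarator_name(declarator: Declarator) -> Optional[str]:
    """Follow pointer declarators down to the one that carries the name."""
    while isinstance(declarator, Pointer) and declarator.pointer is not None:
        declarator = declarator.pointer
    return declarator.identifier


# int grid[4][8];
grid = Declarator("grid", array=Array(4, Array(8)))
# int *p;  (the nesting used here is an assumption)
p = Pointer(None, pointer=Declarator("p"))
```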
The ExpressionNode object hierarchy is similar to that used in the
preprocessor, but more fully featured, and uses a different
EvaluationContext which can evaluate identifiers and the
operator (currently it actually just returns 0 for both).
Methods are called on CParser as declarations and preprocessor declaratives are parsed. They are mostly self-explanatory. For example:
- handle_ifndef(self, name, filename, lineno)
  An #ifndef was encountered testing the macro name.
- handle_declaration(self, declaration, filename, lineno)
  declaration is an instance of Declaration.
These methods should be overridden by a subclass to provide functionality.
DebugCParser does this and prints out the arguments to each handle_ method.
CParser can be tested in isolation by running it stand-alone with the
filename of a header as the sole argument. A
DebugCParser will be
constructed and used to parse the header.
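The SAX-like callback style can be sketched as below: a base class whose handle_ methods do nothing, a subclass that overrides them (recording calls where DebugCParser prints them), and a dispatcher that routes events to handle_<kind>. HandlerBase, RecordingHandler and feed are hypothetical; only the two method signatures come from the description above.

```python
from typing import Any, List, Tuple


class HandlerBase:
    """Stand-in for CParser's callback interface (hypothetical).

    The default implementations ignore every event.
    """

    def handle_ifndef(self, name: str, filename: str, lineno: int) -> None:
        pass

    def handle_declaration(self, declaration: Any, filename: str, lineno: int) -> None:
        pass


class RecordingHandler(HandlerBase):
    """Like DebugCParser in spirit, but records calls instead of printing."""

    def __init__(self) -> None:
        self.events: List[Tuple[Any, ...]] = []

    def handle_ifndef(self, name, filename, lineno):
        self.events.append(("ifndef", name, filename, lineno))

    def handle_declaration(self, declaration, filename, lineno):
        self.events.append(("declaration", declaration, filename, lineno))


def feed(handler: HandlerBase, events: List[Tuple[Any, ...]]) -> None:
    """Dispatch (kind, *args) tuples to handle_<kind>, SAX style."""
    for kind, *args in events:
        getattr(handler, "handle_" + kind)(*args)


events = [
    ("ifndef", "STDLIB_H", "stdlib.h", 1),
    ("declaration", "int x;", "stdlib.h", 3),  # a real parser passes a Declaration
]
recorder = RecordingHandler()
feed(recorder, events)
```

The point of the design is that the parser never needs to know what a subclass does with each event.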
CtypesParser is implemented in
ctypesparser.py. It is a subclass of
CParser and implements the
handle_ methods to provide a more
ctypes-friendly interpretation of the declarations.
To use, subclass and override the methods:
- handle_ctypes_constant(self, name, value, filename, lineno)
  An integer or float constant (in a #define).
- handle_ctypes_type_definition(self, name, ctype, filename, lineno)
  A typedef declaration. See below for the type of ctype.
- handle_ctypes_function(self, name, restype, argtypes, filename, lineno)
  A function declaration with the given return type and argument list.
- handle_ctypes_variable(self, name, ctype, filename, lineno)
  Any other non-
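A minimal sketch of a subclass-style handler with those four signatures. It assumes, for illustration only, that restype, argtypes and ctype arrive as real ctypes types (in wraptypes they are CtypesType objects that would first need converting), and builds function prototypes with the standard ctypes.CFUNCTYPE factory. PrototypeCollector is a hypothetical name.

```python
import ctypes


class PrototypeCollector:
    """Hypothetical handler mirroring the handle_ctypes_* signatures.

    Assumes real ctypes types are passed in, which is a simplification.
    """

    def __init__(self):
        self.constants = {}
        self.types = {}
        self.prototypes = {}
        self.variables = {}

    def handle_ctypes_constant(self, name, value, filename, lineno):
        self.constants[name] = value

    def handle_ctypes_type_definition(self, name, ctype, filename, lineno):
        self.types[name] = ctype

    def handle_ctypes_function(self, name, restype, argtypes, filename, lineno):
        # CFUNCTYPE creates a C function pointer type (cdecl convention).
        self.prototypes[name] = ctypes.CFUNCTYPE(restype, *argtypes)

    def handle_ctypes_variable(self, name, ctype, filename, lineno):
        self.variables[name] = ctype


collector = PrototypeCollector()
collector.handle_ctypes_constant("RED_MASK", 3840, "foo.h", 2)
collector.handle_ctypes_function("abs", ctypes.c_int, [ctypes.c_int], "stdlib.h", 10)
# A prototype can wrap a Python callable, which is handy for testing it.
abs_proto = collector.prototypes["abs"]
abs_callback = abs_proto(lambda x: x if x >= 0 else -x)
```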
Types are represented by instances of
CtypesType. This is more easily
manipulated than a “real” ctypes type. There are subclasses such as
CtypesFunction; see the module for details.
The CtypesType class implements the
visit method, which can be used,
Visitor pattern style, to traverse the type hierarchy. Call the visit
method of any type with a visitor implementation; the types of
pointers, array bases, function parameters and return types are traversed
automatically (struct members are not, however).
This is useful when writing the contents of a struct or enum. Before writing
a type declaration for a struct type (which would consist only of the struct’s
name), visit the type and handle the
visit_struct method on the visitor
to print out the struct’s members first. Similarly for enums.
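The traversal rule and the "write members first" technique can be sketched with a toy type hierarchy: pointers, return types and parameters are followed automatically, while a struct only calls visit_struct and leaves its members to the visitor. The classes CType, CSimple, CPointer, CFunction, CStruct and StructWriter are hypothetical stand-ins, not the CtypesType subclasses.

```python
from typing import List, Set, Tuple


class CType:
    """Hypothetical base type; leaf types have nothing to traverse."""

    def visit(self, visitor: "StructWriter") -> None:
        pass


class CSimple(CType):
    def __init__(self, name: str):
        self.name = name


class CPointer(CType):
    def __init__(self, destination: CType):
        self.destination = destination

    def visit(self, visitor):
        self.destination.visit(visitor)  # pointers are followed automatically


class CFunction(CType):
    def __init__(self, restype: CType, argtypes: List[CType]):
        self.restype = restype
        self.argtypes = argtypes

    def visit(self, visitor):
        self.restype.visit(visitor)      # so are return types
        for argtype in self.argtypes:    # and parameters
            argtype.visit(visitor)


class CStruct(CType):
    def __init__(self, tag: str, members: List[Tuple[str, CType]]):
        self.tag = tag
        self.members = members

    def visit(self, visitor):
        visitor.visit_struct(self)       # members are NOT traversed here


class StructWriter:
    """Emits each struct's definition after the structs it depends on."""

    def __init__(self):
        self.lines: List[str] = []
        self.seen: Set[str] = set()

    def visit_struct(self, struct: CStruct) -> None:
        if struct.tag in self.seen:
            return
        self.seen.add(struct.tag)        # mark first, so self-reference terminates
        for _, member_type in struct.members:
            member_type.visit(self)      # write member structs first
        names = ", ".join(name for name, _ in struct.members)
        self.lines.append("struct %s: %s" % (struct.tag, names))


point = CStruct("point", [("x", CSimple("int")), ("y", CSimple("int"))])
rect = CStruct("rect", [("tl", point), ("br", point)])
writer = StructWriter()
# void fill(struct rect *r);
CFunction(CSimple("void"), [CPointer(rect)]).visit(writer)
```

Visiting the function reaches rect through the pointer parameter, and rect's visit_struct writes point before rect itself.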
ctypesparser.py cannot be run stand-alone.
wrap.py provides a
straightforward implementation that writes a module of ctypes wrappers. It
can filter the output based on the originating filename. See the module
docstring for usage and extension details.
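In the same spirit, here is a hypothetical writer that emits Python source for constants and functions and filters by originating filename. It is not wrap.py's code: the class ModuleWriter, the only_files option, the _lib library handle in the generated text, and passing ctypes type names as strings are all assumptions made for the example.

```python
import io


class ModuleWriter:
    """Hypothetical writer in the spirit of wrap.py (not its actual code)."""

    def __init__(self, out, only_files=None):
        self.out = out                # any file-like object
        self.only_files = only_files  # None means "emit everything"

    def _wanted(self, filename):
        return self.only_files is None or filename in self.only_files

    def handle_ctypes_constant(self, name, value, filename, lineno):
        if self._wanted(filename):
            self.out.write("%s = %r  # %s:%d\n" % (name, value, filename, lineno))

    def handle_ctypes_function(self, name, restype, argtypes, filename, lineno):
        if self._wanted(filename):
            # restype/argtypes are ctypes type names here; _lib is assumed to
            # be a library handle defined at the top of the generated module.
            self.out.write("%s = _lib.%s\n" % (name, name))
            self.out.write("%s.restype = %s\n" % (name, restype))
            self.out.write("%s.argtypes = [%s]\n" % (name, ", ".join(argtypes)))


buf = io.StringIO()
writer = ModuleWriter(buf, only_files={"foo.h"})
writer.handle_ctypes_constant("RED_MASK", 3840, "foo.h", 2)
writer.handle_ctypes_constant("EOF", -1, "stdio.h", 50)  # filtered out
writer.handle_ctypes_function("foo_init", "c_int", ["c_void_p", "c_uint"], "foo.h", 10)
print(buf.getvalue())
```

Filtering on filename keeps declarations pulled in from system headers (stdio.h here) out of the generated module.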