*** Shed Skin Python-to-C++ Compiler ***
Copyright 2005, 2006 Mark Dufour; License GNU GPL version 2 or later (See LICENSE)
ABOUT
Shed Skin is an experimental Python-to-C++ compiler. It accepts pure Python programs and generates optimized C++ code. In combination with a C++ compiler, this allows Python programs to be translated into highly efficient machine language. For a set of 27 non-trivial test programs, measurements show a typical speedup of 2-40 times over Psyco (about 11 times on average), and 2-220 times over CPython (about 39 times on average).
The high performance and elegant approach of Shed Skin (it is only 6500 lines!) come at a cost. First, it currently only accepts programs that are statically typed. This simply means that variables can only ever have a single type. So e.g. a = 1; a = '1' is not allowed. Of course, a single type can be 'abstract' or 'generic' (as in C++), so that e.g. a = A(); a = B(), where A and B have a common base class, is allowed.
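The common-base-class case can be sketched as follows. The class names 'shape', 'circle' and 'square' and the method 'area' are hypothetical, chosen only for illustration. The variable 's' is rebound to instances of two different classes, but both derive from 'shape', so under the rule above it keeps a single (abstract) type.

```python
# Hypothetical classes for illustration; not part of Shed Skin.
class shape:
    def area(self):
        return 0.0

class circle(shape):
    def __init__(self, radius):
        self.radius = radius
    def area(self):
        return 3.14159 * self.radius * self.radius

class square(shape):
    def __init__(self, side):
        self.side = side
    def area(self):
        return self.side * self.side

s = circle(1.0)     # 's' holds a circle ...
first = s.area()
s = square(2.0)     # ... and later a square; both are shapes, so 's' has one abstract type
second = s.area()
```

Rebinding 's' to an unrelated type, such as a string, would break the single-type rule.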
Second, Python programs cannot currently freely use the Python standard library. However, some common imports are supported (see lib/*.py), and many others can easily be added. The problem is a practical one, since in theory it is possible to create bindings for most library modules. A simple work-around is to compile only the critical parts of a Python program, and communicate with them through e.g. files and standard input and output. This way, the 'main' program can use the full Python dynamics and standard library, while the whole program is still written in pure Python.
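On the side of the 'main' program, this work-around might look like the sketch below. It assumes a Python version that ships the standard 'subprocess' module. The helper name 'run_worker' and the executable name './critical' are assumptions made for this example, not something Shed Skin provides.

```python
import subprocess
import sys

def run_worker(command, text):
    """Send 'text' to the standard input of 'command'; return its standard output."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = proc.communicate(text.encode('ascii'))
    if proc.returncode != 0:
        raise RuntimeError('worker failed with exit code %d' % proc.returncode)
    return out.decode('ascii')

# In real use the command would be the compiled program, e.g. ['./critical']
# (a hypothetical name). Here a small Python one-liner stands in for it, so
# that the sketch can be run without compiling anything.
stand_in = [sys.executable, '-c',
            'import sys; sys.stdout.write(str(sum([int(x) for x in sys.stdin.read().split()])))']
result = run_worker(stand_in, '1 2 3\n')
```

Exchanging plain text keeps the compiled part free of library dependencies; the 'main' program does all parsing and formatting.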
Shed Skin is still alpha software, and there are some other minor, often temporary, limitations. Please read the LIMITATIONS section carefully before trying to compile a program. The only thing I ask in return for making the software freely available under the GPL is that you send me an email when you encounter a problem that is not listed among these limitations. This is the fastest way to get your program supported, since I usually do not fix problems I do not know about :-) Please also let me know if you would like me to implement certain library functions.
INSTALLATION
On Windows XP:
- Install Python 2.3 or newer; make sure 'python' can be called from the command-line
- Unzip the installation package into some directory
- Enter this directory from the command-line and run 'init.bat'
- Try the 'hello world' example by running 'ss test; make run' (If this fails, it may be because you have an old version of MinGW installed. Uninstall or upgrade it and try again.)
- To run a full test suite of 175 examples, run 'unit.py'. This may take a while.
On UNIX:
- Install Python 2.3 or newer; make sure 'python' can be called from the command-line
- Install the Boehm-Demers-Weiser GC version 6.x: http://www.hpl.hp.com/personal/Hans_Boehm/gc/
  Make sure to use: ./configure --enable-cplusplus
  Make sure that you have the development version (i.e. including include files and libgc.so)
  (Under Gentoo GNU/Linux this whole step is simply: 'emerge boehm-gc')
  (Under Debian GNU/Linux this whole step is simply: 'apt-get install libgc-dev')
- Make sure you have C++ development tools installed (the compiler has been tested with g++ 3.4/4.0)
- Unzip the installation package into some directory
- Enter this directory from the command-line and run 'python setup.py'
- Copy 'ss' to somewhere in your path. (Under some distros, such as Ubuntu, there may already be a utility named 'ss' - make sure you use the right one)
- Try the 'hello world' example by running 'ss test; make run' (If you get an error during 'make run', you may have to modify the FLAGS file. For example, in case of something like "undefined reference to `dlopen'", add ' -ldl' to the LFLAGS line and rerun 'ss test; make run')
- To run a full test suite of 175 programs, run 'python unit.py'. This may take a while!
USAGE
Compilation:
- On Windows XP, make sure you are on a command-line where you have run 'init.bat' (see above)
- Go to the directory with the Python project that has to be compiled
- Compile it using 'ss name', where the project's main module is called 'name.py' (For each program file, a version annotated with type information will be generated, e.g. 'name.ss.py')
- Run 'make' or 'make run' to build (and run) the resulting C++ code.
Unit tests:
- Run 'python unit.py' to perform all unit tests ('-h' gives a list of options)
LIMITATIONS
You can only import functions/variables that exist in lib/*.py. See IMPLEMENTING LIBRARIES.
The type analysis currently does not scale very well beyond a few hundred lines of code.
Do not mix different types in a single variable or container (instance variable):
a = 1
a = "hoi" # wrong
[1, '1'] # wrong
Except in binary tuples (in the future: tuples up to a certain length):
t = (1, [1]) # okay
t = (1, 1.0, "1") # wrong
None may only be mixed with non-scalar types:
l = [1]
l = None # okay
m = 1
m = None # wrong
def fun(x=None): # wrong - use a special value for x here, for example x=-1
    return x
fun(1)
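A sketch of the work-around suggested for the last case: instead of None, an integer default that cannot occur as a real argument marks 'no value given'. The choice of -1 and the function body are assumptions for illustration; any value outside the valid input range would do.

```python
def fun(x=-1):      # -1 stands for 'no argument given' (assumed never to be a valid input)
    if x == -1:
        return 0    # fall back to some default behaviour
    return x * 2

a = fun()           # uses the special value
b = fun(21)
```

This way 'x' is an integer on every path, so it is never mixed with None.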
Anonymous functions can only be mixed if they have the same (non-generic) signature:
x = lambda x,y: x+y # okay
x = lambda a,b: a-b
somefunc(x)
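A self-contained variant of this example, with a hypothetical 'somefunc' that simply calls the function it is given. Both lambdas take two integers and return an integer, so they share the same non-generic signature.

```python
def somefunc(f):
    # 'f' is always called with two ints and returns an int
    return f(7, 3)

x = lambda a, b: a + b
r1 = somefunc(x)
x = lambda a, b: a - b      # same signature as before, so 'x' may be rebound
r2 = somefunc(x)
```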
Try not to mix floats and ints together (mostly temporary):
a = [1.0]
a = [1] # wrong - use a float here, too
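One way to follow this advice is to convert explicitly wherever an integer has to enter a float computation. The variable names below are hypothetical; the sketch keeps the integer list and the float list strictly separate.

```python
counts = [3, 5, 8]              # a list of ints, never mixed with floats
total = 0.0                     # float accumulator, started from a float literal
for c in counts:
    total += float(c)           # convert explicitly instead of mixing int and float
mean = total / float(len(counts))

samples = [1.0]                 # a list of floats ...
samples = [mean]                # ... rebound to another list of floats
```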
Do not pass classes or modules around (temporary):
class bla: pass
x = bla # wrong
Class attributes should always be accessed using a class identifier:
self.class_attr # wrong
bla.class_attr # okay
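In a method this could look as follows. The attribute 'count' and the method 'register' are hypothetical names; the point is only that the class attribute is reached through 'bla', never through 'self'.

```python
class bla:
    count = 0                   # class attribute

    def register(self):
        bla.count += 1          # accessed through the class identifier, not 'self'
        return bla.count

b1 = bla()
b2 = bla()
b1.register()
total = b2.register()           # both instances update the same attribute
```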
Generators cannot be methods (yet).
Anonymous function passing works reasonably, but methods cannot be passed around (yet), and anonymous function references cannot be generic (yet)!
Do not use:
-variable numbers of arguments
-more than 3 arguments to 'zip', 'min' and 'max' (temporary)
-arbitrary-size arithmetic
-reflection (getattr, hasattr), eval, or other really dynamic stuff
-multiple/dynamic inheritance, generator expressions, nested functions
-operator overloading (it may not always work)
Finally, the compiler has not been optimized for string usage.
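Some of the constructs in the 'Do not use' list have straightforward replacements. The sketch below shows two of them, under the assumption that a list argument and nested calls are acceptable to the compiler (this has not been checked against Shed Skin itself). The function name 'total' is hypothetical.

```python
# Instead of a variable number of arguments, i.e. 'def total(*args)':
def total(values):
    result = 0
    for v in values:
        result += v
    return result

t = total([1, 2, 3, 4])

# Instead of 'max(a, b, c, d)', which has more than 3 arguments:
a = 4
b = 9
c = 2
d = 7
largest = max(max(a, b), max(c, d))
```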
TIPS
- I sometimes see dictionaries used like this:
statistics = {'nodes': 28, 'solutions': set()}
Shed Skin does not support this 'non-uniform' use of container types ('nodes' and 'solutions' have different types.) You can easily code around this problem by using a 'statistics' class:
class statistics: pass
s = statistics(); s.nodes = 28; s.solutions = set()
- The type analysis may currently end up in an infinite loop; if this happens, it sometimes helps to run Shedskin with the --infinite command-line option.
- Having abstract types means you can not only use custom classes, but also builtin Python container classes in an abstract way, e.g., as follows:
a = [1,2,3]
a = (1,2,3)
for e in a: print e # 'a' is an abstract sequence of type 'int'!
b = {1: '1', 2: '2', 3: '3'}
b = xrange(10) # 'b' is an abstract iterable of type 'int'!
for e in b: print e
- Not only builtin Python container classes can be used in a generic way (a list with integers becomes a generic list<int>, for example), but also custom classes. For the following code, 'm1' and 'm2' become a matrix<int> and a matrix<double>, respectively:
class matrix:
    def __init__(self, hop):
        self.unit = hop
m1 = matrix([1,2])
m2 = matrix([1.0, 2.0])
- The evaluation order of arguments to 'print' changes with translation to C++. While this may be fixed later, for now it's better not to depend on this order (e.g., by using separate print statements):
print f(), g() # in generated code, g is called before f!
print 'hoei', raw_input() # raw_input is called before 'hoei' is printed!
- While tuples with different types of elements with length > 2 are not yet supported, it can be useful to simulate them:
a = (1, '1', 1.0) # not supported
a = (1, ('1', 1.0)) # supported
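Two of the tips above, sketched together. The first part fixes the evaluation order by calling each function in its own statement before anything is printed. The second part unpacks a simulated 3-tuple built from nested binary tuples. The functions 'f' and 'g' and the list 'calls' are hypothetical and exist only to make the call order visible; the sketch has not been checked against Shed Skin itself.

```python
calls = []                  # records the order in which f and g run

def f():
    calls.append('f')
    return 1

def g():
    calls.append('g')
    return 2

# Instead of 'print f(), g()': one call per statement, so the order is
# fixed by the statement order rather than by argument evaluation.
first = f()
second = g()
# 'print first, second' is now safe: both values already exist.

# Simulating the 3-tuple (1, '1', 1.0) with nested binary tuples.
a = (1, ('1', 1.0))
number, rest = a            # 'rest' is the inner binary tuple
text, real = rest
```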
IMPLEMENTING LIBRARIES
A library module in Shedskin consists of two parts: a 'type-model', that models the type behaviour of the module, to be used during type analysis; and an implementation in C++, that implements the module or forms a bridge to something that does. In the 'lib' dir you can find many examples. See, for example, the files 'lib/os/path.py' and 'lib/os/path.?pp'.
We show how to implement a module for which a pure Python implementation is available (e.g., extracted from PyPy). In that case, Shedskin can often be used to generate the C++ implementation part. While the type-model can be the original Python implementation, it is better to create a minimized version of it, which shows how types 'flow' through each function or method. For example, the function random.random can be modeled simply as:
def random(): return 1.0
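To illustrate what such a minimized model might look like for functions that take arguments, here is a sketch for two hypothetical functions, 'count' and 'first'. Neither is taken from the actual files in lib/. Each body only returns a value of the right type, or passes an element of an argument through, without doing the real work.

```python
# Hypothetical type-model functions; the real computations are omitted on purpose.

def count(seq, item):
    # Whatever the arguments are, the result is an integer.
    return 1

def first(seq):
    # The result has the same type as the elements of 'seq',
    # so an element is returned to let that type 'flow' through.
    return seq[0]

n = count(['a', 'b'], 'a')
e = first([1.5, 2.5])
```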
The following steps show how to add support for the 'stat' module.
- Save a pure Python implementation somewhere as 'stat.py' (e.g. take a copy from PyPy).
- Write a test program, that uses every part of the module, in such a way that Shedskin can determine every type that 'flows' through the module:
import stat
stat.S_ISDIR(1)
..
- Compile the test program and the pure Python implementation of the 'stat' module to C++ (assuming the test program is called 'test.py'):
ss test
- Make sure the test program works:
make run
- Add the 'stat' module to the Shedskin library, by moving 'stat.py' and the newly generated 'stat.?pp' files into the 'lib' dir.
- Optionally, change 'stat.py' into a minimal type-model, by removing anything that is not needed in a type analysis. For example, the definition of 'S_ISDIR' can simply become:
def S_ISDIR(x): return 1
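As a sketch of the second and the last of these steps: a slightly fuller test program for 'stat', followed by what minimal type-model definitions could look like. 'S_ISDIR', 'S_ISREG' and 'S_IMODE' are functions of the standard 'stat' module. The selection of exactly these three, the argument value 1, and the suffix '_model' (used here only so that both parts fit in one listing) are assumptions for illustration.

```python
# Test program: call every function of interest once, with arguments of the
# types they will receive in real programs (here: plain integers).
import stat

d = stat.S_ISDIR(1)
r = stat.S_ISREG(1)
m = stat.S_IMODE(1)

# Minimal type-model versions: same parameters, trivially typed results.
# In 'lib/stat.py' they would carry the original names, without the suffix.
def S_ISDIR_model(mode): return 1
def S_ISREG_model(mode): return 1
def S_IMODE_model(mode): return 1
```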
If you cannot start from a pure Python implementation, start by writing a type model, and use Shedskin to create a C++ framework for the module. Then fill in the details yourself and move the result to the 'lib' dir.
If you'd like to implement a nested library module, e.g. os.path, start by creating an empty 'os' dir and place the 'path.py' file in it. Write a test program that simply imports 'os.path', compile it with Shedskin, test the result, and move 'os/path.py' and 'os/path.?pp' to 'lib/os'.
THANKS
Thanks to Google, for sponsoring this project via their SoC program; to Bearophile, for finding bugs/missing features and for keeping me motivated; to Jeff Miller, for implementing the random module; and to Denis de Leeuw Duarte for helping me run Shed Skin under OSX.
CONTACT
mark.dufour@gmail.com
http://mark.dufour.googlepages.com
