zzzeek's Guide to Python 3 Porting

January 24, 2011 at 05:36 PM | Code

update 2012-11-18:

This blog post discusses a Python 3 approach that's heavily centered on the 2to3 tool. These days, I'm much more in favor of the "in place" approach, even when still supporting versions as far back as Python 2.4. Mako 0.7.4 is now an "in place" library, supporting Python 2.4 through 3.x with no changes. For a good introduction to the "in place" approach, see Supporting Python 2 and 3 without 2to3 conversion.

Just the other day, Ben asked me, "OK, where is there an online HOWTO of how to port to Python 3?". I hit the Google expecting to see at least three or four blog posts with the basic steps, an overview, the things Guido laid out for us at Pycon '10 (and maybe even '09? I don't remember). Surprisingly, other than the link to the 2to3 tool and Guido's original guide, there isn't a whole lot.

So here are the steps I've used to produce released versions of SQLAlchemy and Mako on PyPI which are cross-compatible for Py2k and Py3k.

1. Make Sure You're Good for 2.6 at Least

Run your test suite with Python 2.6 or 2.7, using the -3 flag, and make sure there are no warnings. For example, take the following ridiculous program:

def foo(somedict):
    if somedict.has_key("hi"):
        print somedict["hi"]

assert callable(foo)
foo({"hi":"there"})

Running with -3 has some things to say:

classics-MacBook-Pro:~ classic$ python -3 test.py
test.py:5: DeprecationWarning: callable() not supported in 3.x; use isinstance(x, collections.Callable)
  assert callable(foo)
test.py:2: DeprecationWarning: dict.has_key() not supported in 3.x; use the in operator
  if somedict.has_key("hi"):
there

So we fix all those things. If our code needs to support old versions of Python as well, like 2.3 or 2.4, we may have to use runtime version and/or library detection for some things - as an example, Python 2.4 doesn't have collections.Callable. More on that later. For now let's assume we can get our whole test suite to pass without any warnings with the -3 flag.
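One way the fixes for the program above might look, as a minimal sketch: the hasattr() check is just one possible stand-in for callable() (it is an assumption here, not the only option), and the function returns its value so that the single-argument print() call at the end behaves the same on Python 2 and Python 3.

```python
def foo(somedict):
    # the "in" operator replaces dict.has_key(), which is gone in Python 3
    if "hi" in somedict:
        return somedict["hi"]
    return None


# hasattr() works on every version, including those without callable()
assert hasattr(foo, "__call__")

result = foo({"hi": "there"})
# print with one parenthesized argument is valid on both Py2k and Py3k
print(result)
```

Run under python -3, a program like this should no longer emit the two DeprecationWarnings shown earlier.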

2. Run the whole library through 2to3 and see how we do

This is the step we're all familiar with. Run the 2to3 tool to get a first pass. For example, here's how I run it on Mako:

classics-MacBook-Pro:mako classic$ 2to3 mako/ test/ -w

2to3 dumps out to stdout everything it's doing, and with the -w flag it also rewrites the files in place. I usually do a clone of my source tree to a second, scratch tree so that I can make alterations to the original, Py2K tree as I go along, which remains the Python 2.x source that gets committed.

It's typical with a larger application or library that some things, or even many things, didn't survive the 2to3 process intact.

In the case of SQLAlchemy, along with the usual string/unicode/bytes types of issues, we had problems regarding the name changes of iteritems() to items() and itervalues() to values() on dictionary types - some of our custom dictionary types would be broken. When your code produces no warnings with -3 and the 2to3 tool is still producing non-working code, there are three general approaches towards achieving cross-compatibility, listed here from lowest to highest severity.

2a. Try to replace idioms that break in Py3K with cross-version ones

Easiest is if the code in question can be modified so that it works on both platforms, as run through the 2to3 tool for the Py3k version. This is generally where a lot of the bytes/unicode issues wind up. For example, code like this:

hexlify(somestring)

...doesn't work in Py3k when somestring is a text string, because hexlify() needs bytes. So a change like this might be appropriate:

hexlify(somestring.encode('utf-8'))

Or, in Mako, the render() method returns an encoded string, which on Py3k is bytes. A unit test was doing this:

html_error = template.render()
assert "RuntimeError: test" in html_error

We fixed it to instead say this:

html_error = template.render()
assert "RuntimeError: test" in str(html_error)
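The hexlify() change above assumes the value is always text. Where a value may arrive as either text or bytes, a small helper along these lines is one option. This is only a sketch: hexlify_text() and text_type are hypothetical names for illustration, not part of Mako or SQLAlchemy.

```python
import sys
from binascii import hexlify

# the "text" type is unicode on Python 2 and str on Python 3;
# the name "unicode" is only evaluated when running on Python 2
text_type = str if sys.version_info >= (3, 0) else unicode  # noqa: F821


def hexlify_text(value, encoding="utf-8"):
    """Hex-encode value, encoding text to bytes first if needed."""
    if isinstance(value, text_type):
        value = value.encode(encoding)
    return hexlify(value)
```

The result is a byte string on both platforms, so callers that need text still have to decode it explicitly.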

2b. Use Runtime Version Flags to Handle Usage / Library Incompatibilities

SQLAlchemy has a util package which includes code similar to this:

import sys
py3k = sys.version_info >= (3, 0)
py3k_flag = getattr(sys, 'py3kwarning', False)
py26 = sys.version_info >= (2, 6)
jython = sys.platform.startswith('java')
win32 = sys.platform.startswith('win')

This is basically getting some flags upfront that we can use to select behaviors specific to different platforms. Other parts of the library can say from sqlalchemy.util import py3k if we need to switch off some runtime behavior for Py3k (or Jython, or an older Python version).

In Mako we use this flag to do things like switching among 'unicode' and 'str' template filters:

if util.py3k:
    self.default_filters = ['str']
else:
    self.default_filters = ['unicode']

We use it to mark certain unit tests as unsupported (skip_if() is a decorator we use in our Nose tests which raises SkipTest if the given expression is True):

@skip_if(lambda: util.py3k)
def test_quoting_non_unicode(self):
    # ...
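For readers who haven't seen a decorator like skip_if(), here is a rough sketch of how one could be written. It is not Mako's actual implementation; it assumes unittest.SkipTest (available in Python 2.7 and 3.x) as the exception to raise, and the reason argument is an addition for illustration.

```python
import functools
import unittest


def skip_if(predicate, reason="unsupported on this platform"):
    """Skip the decorated test when predicate() returns True."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kw):
            # evaluate lazily, at test run time rather than import time
            if predicate():
                raise unittest.SkipTest(reason)
            return fn(*args, **kw)
        return wrapper
    return decorate
```

Passing a callable rather than a plain boolean defers the check until the test actually runs.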

For our previously mentioned issue with callable() (which apparently is coming back in Python 3.2), we have a block in SQLAlchemy's compat.py module like this, which returns to us callable(), cmp(), and reduce():

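if py3k:
    # Python 3.0 has no callable() or cmp() builtins, so define equivalents
    def callable(fn):
        return hasattr(fn, '__call__')
    def cmp(a, b):
        return (a > b) - (a < b)

    from functools import reduce
else:
    # on Python 2, re-export the builtins unchanged
    import __builtin__
    callable = __builtin__.callable
    cmp = __builtin__.cmp
    reduce = __builtin__.reduce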
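An alternative to switching on the py3k flag is to detect the features themselves, which is the "library detection" idea mentioned earlier. The following is a sketch of that variant, not what SQLAlchemy's compat.py does; the _fallback_* names are made up for illustration.

```python
try:
    import builtins  # Python 3
except ImportError:
    import __builtin__ as builtins  # Python 2


def _fallback_callable(fn):
    return hasattr(fn, '__call__')


def _fallback_cmp(a, b):
    return (a > b) - (a < b)


# use the builtin when this interpreter has one, else the fallback
callable = getattr(builtins, 'callable', _fallback_callable)
cmp = getattr(builtins, 'cmp', _fallback_cmp)

try:
    from functools import reduce  # Python 2.6 and later, including 3.x
except ImportError:
    reduce = builtins.reduce  # older Python 2
```

The advantage is that a builtin which reappears in a later release gets picked up automatically, without touching the version checks.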

2c. Use a Preprocessor

The "runtime flags" approach is probably as far as 90% of Python libraries need to go. In SQLAlchemy, we took a more heavy-handed approach, which is to bolt a preprocessor onto the 2to3 tool. The advantage here is that you can handle incompatible syntaxes, you don't need to be concerned about whatever latency a runtime boolean flag might introduce into some critical section, and in my opinion it's a little easier to read, particularly in class declarations where you can maintain the same level of indentation.

The preprocessor is part of the SQLAlchemy distribution and you can also download it here. It currently uses a monkeypatch approach to work.

I've mentioned the usage of a preprocessor in some other forums and in talks, but as yet I don't know of anyone else using this approach. I would welcome suggestions on how we could do this better, such as a way to get a regular 2to3 "fixer" to do it without the need for monkeypatching (I couldn't get that to work; the system doesn't read comment lines, for one thing), or some other approach that has similar advantages to the preprocessor.

An example is our IdentityMap dict subclass, paraphrased here, where we had to define iteritems() on the Python 2 platform as returning an iterator, but on Python 3 that needs to be the items() method:

class IdentityMap(dict):
    # ...

    def items(self):
    # Py2K
        return list(self.iteritems())

    def iteritems(self):
    # end Py2K
        return iter(self._get_items())

Above, the "# Py2K / # end Py2K" comments are picked up, and when passed to the 2to3 tool, the code looks like this:

class IdentityMap(dict):
    # ...

    def items(self):
    # start Py2K
    #    return list(self.iteritems())
    #
    #def iteritems(self):
    # end Py2K
        return iter(self._get_items())

We also use it in cases where new syntactical features are useful. When we re-throw DBAPI exceptions, it's nice for us to use Python 3's from keyword to do it so that we can chain the exceptions together, something we can't do in Python 2:

# Py3K
#raise MyException(e) from e
# Py2K
raise MyException(e), None, sys.exc_info()[2]
# end Py2K

Left alone, the 2to3 tool turns the three-argument raise above into a with_traceback() call, and the version shipped with Python 2.6 does it incorrectly (this was fixed in 2.7). The from keyword also has a slightly different meaning than with_traceback(), in that both exceptions are preserved in a "chain". Run through the preprocessor, we get:

# start Py3K
raise MyException(e) from e
# end Py3K
# start Py2K
#raise MyException(e), None, sys.exc_info()[2]
# end Py2K
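As an aside on the difference between from and with_traceback() mentioned above, the following Python 3-only sketch shows what each form records on the new exception. MyException and the two helper functions are stand-ins for illustration.

```python
import sys


class MyException(Exception):
    pass


def reraise_chained():
    try:
        raise ValueError("low-level failure")
    except ValueError as e:
        # explicit chaining: the original is stored as __cause__
        raise MyException(e) from e


def reraise_with_traceback():
    try:
        raise ValueError("low-level failure")
    except ValueError as e:
        # reuses the original traceback; __cause__ is left unset
        raise MyException(e).with_traceback(sys.exc_info()[2])
```

With from, the original exception is available as __cause__. With with_traceback() alone, __cause__ stays None, although Python 3 still records the original as __context__ because the raise happens inside the except block.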

After the preprocessor modifies the incoming text stream, it passes it off to the 2to3 tool where the remaining Python 2 idioms are converted to Python 3. The tool ignores code that's already Python 3 compatible (luckily).
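To make the text transformation concrete, here is a minimal sketch of a comment-driven preprocessor that reproduces the two before/after examples above. It is not the sa2to3 module that ships with SQLAlchemy; the preprocess() name and the regular expression are assumptions for illustration, and it only handles the marker layout shown in this post.

```python
import re

# matches "# Py2K", "# Py3K", "# end Py2K", "# end Py3K" at any indentation
_MARKER = re.compile(r"^(\s*)# (end )?(Py2K|Py3K)\s*$")


def preprocess(source):
    """Comment out Py2K blocks and uncomment Py3K blocks."""
    out = []
    state = None   # None, "Py2K" or "Py3K"
    indent = ""    # indentation of the marker that opened the block
    for line in source.splitlines():
        match = _MARKER.match(line)
        if match:
            marker_indent, is_end, name = match.groups()
            if is_end:
                out.append("%s# end %s" % (marker_indent, name))
                state = None
            else:
                # a new marker implicitly closes a block that is still open
                if state is not None:
                    out.append("%s# end %s" % (indent, state))
                out.append("%s# start %s" % (marker_indent, name))
                state, indent = name, marker_indent
        elif state == "Py3K":
            # uncomment: drop the first "#" after the leading whitespace
            out.append(re.sub(r"^(\s*)#", r"\1", line, count=1))
        elif state == "Py2K":
            # comment out: put "#" at the marker's indentation column
            if line.startswith(indent):
                body = line[len(indent):]
            else:
                body = line.lstrip()
            out.append(indent + "#" + body)
        else:
            out.append(line)
    return "\n".join(out)
```

The returned text is what would then be handed to 2to3 for the usual conversions.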

3. Create a dual-platform distribution with Distutils/Distribute

Now that we have a source tree that becomes a fully working Python 3 application via script, we can integrate this script with our setup.py script using the use_2to3 directive. Clarification is appreciated here, but I think the case is that distutils itself allows the flag, while it only actually works if you have Distribute installed. The guidelines in Porting to Python 3 — A Guide are helpful here; we reproduce Armin's code example in its entirety:

import sys

from setuptools import setup

# if we are running on python 3, enable 2to3 and
# let it use the custom fixers from the custom_fixers
# package.
extra = {}
if sys.version_info >= (3, 0):
    extra.update(
        use_2to3=True,
        use_2to3_fixers=['custom_fixers']
    )


setup(
    name='Your Library',
    version='1.0',
    classifiers=[
        # make sure to use :: Python *and* :: Python :: 3 so
        # that pypi can list the package on the python 3 page
        'Programming Language :: Python',
        'Programming Language :: Python :: 3'
    ],
    packages=['yourlibrary'],
    # make sure to add custom_fixers to the MANIFEST.in
    include_package_data=True,
    **extra
)

For SQLAlchemy, we modify this approach slightly to ensure our preprocessor is patched in:

extra = {}
if sys.version_info >= (3, 0):
    # monkeypatch our preprocessor
    # onto the 2to3 tool.
    from sa2to3 import refactor_string
    from lib2to3.refactor import RefactoringTool
    RefactoringTool.refactor_string = refactor_string

    extra.update(
        use_2to3=True,
    )

With the use_2to3 flag, our source distribution can now be built and installed with either a Python 2 or Python 3 interpreter, and if Python 3, 2to3 is run on the source files before installing.

I've seen several packages which maintain two entirely separate source trees, one being the Python 3 version. I sincerely hope fewer packages choose to do it that way, since it means more work for the maintainers (or alternatively, slower releases for Python 3), more bugs (since unit tests aren't run against the same source tree), and it just doesn't seem like the best way to do things. Eventually, when Python 3 is our default development platform, we'll use 3to2 to maintain the Python 2 version in the other direction.

4. Add the Python :: 3 Classifier!

I forget to do this sometimes. As in the example above, remember to add 'Programming Language :: Python :: 3' to your classifiers! This is the primary method of announcing that your package works with Python 3:

setup(
    name='Your Library',
    version='1.0',
    classifiers=[
        # make sure to use :: Python *and* :: Python :: 3 so
        # that pypi can list the package on the python 3 page
        'Programming Language :: Python',
        'Programming Language :: Python :: 3'
    ],
    packages=['yourlibrary'],
    # make sure to add custom_fixers to the MANIFEST.in
    include_package_data=True,
    **extra
)

Further Reading

Guido's own porting guide:

http://docs.python.org/release/3.0.1/whatsnew/3.0.html

Armin Ronacher's porting guide:

http://lucumr.pocoo.org/2010/2/11/porting-to-python-3-a-guide/

Armin again, writing forwards-compatible Python code:

http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/

Dave Beazley, Porting Py65 (and my Superboard) to Python 3:

http://dabeaz.blogspot.com/2011/01/porting-py65-and-my-superboard-to.html