zzzeek

Gerrit is Awesome

Thu, 21 Apr 2016 12:10:00 EDT

In the past week I've started using Gerrit for all development on SQLAlchemy, including for pull requests, feature branch development, and bug fixes across maintenance branches. I was first introduced to Gerrit through my involvement in the Openstack project, where initially I was completely bewildered by it. Eventually, I figured out roughly enough what was going on to be productive with it and to ultimately prefer it for adding code changes to a project. But making the big leap of actually moving SQLAlchemy and my other key projects over to it required the extra effort of learning Gerrit a little more fundamentally as well as installing and integrating it with SQLAlchemy's existing workflows.

What is Gerrit?

I don't consider myself to be any authority on Gerrit, so my description here is based on my working impression of what it does; the details may not be entirely accurate. Gerrit is a "code review" tool, intended to allow collaboration around code changes targeted at a project. At that level, what Gerrit is doing can be compared to a pull request - allows proposal of a change, shows you how it's different from what's there already, allows comments on the change including against specific code, and then provides workflow to allow the change to be merged into the code repository. For those familiar with pull requests, this seems exactly the same, yet while I've always been very unsatisifed with pull requests for a long time, I am completely satisifed with Gerrit's model. So to understand that requires more understanding of Gerrit.

At its core, Gerrit is maintaining a set of git repositories that are basically mirrors of your projects' actual git repositories. Then, it adds some additional, non-mirrored (e.g. only to its local copy) reference paths to these git repositories such that you can push commits to them which represent Code Reviews. These commits resemble feature branches in that they are branched off of "master" or another maintenance branch, but the way we work with them is different. The most immediate difference is that unlike a traditional feature branch, the individual "code review" is always a single commit containing the entire new feature all at once, whereas a feature branch may consist of any number of commits. In order to allow additional changes and refinements to the "Code Review" as it proceeds, you create a brand new changeset that encompasses the previous changeset completely, plus your new changes. Since there is no longer a linear history in Git between these refinements, Gerrit adds an additional "Change-Id" identifier to the commit message which is how Gerrit keeps track of each change in a code review. So instead of a long-lived feature branch that changes by adding new revisions to the end of it, you have a series of discrete all-at-once commits, each one containing the whole feature at once.

In ASCII art parlance, the traditional Git branching model, also used by pull requests, can be seen like this:

rev 1 -> rev 2 -> rev 3 -> rev 4 -> rev 5          -> master
                    |
                    +-> rev 3a -> rev 3b -> rev 3c -> feature branch

Above, the feature branch is forked off from the code at some point, and as development continues, a new revision is added to the feature branch. The multiple commits of the feature branch will eventually be merged into master, either using a traditional merge, or by "rebasing" the feature branch onto the top of master.

In Gerrit, the open-ended nature of Git is taken advantage of in an entirely different way, and the mapping of Gerrit to Git might look more like this:

rev 1 -> rev 2 -> rev 3 -> rev 4 -> rev 5 -> master
                    |
                    +-> rev 3a
                    |     |
                    +-> rev 3a/3b
                    |     |
                    +-> rev 3a/3b/3c
                          |
                          v
                       Change Id: Iabcdef

The above model would be difficult to work with using Git alone; Gerrit maintains the list of changesets conforming to a Change Id in its own database, and provides a full blown GUI and command line API on top so that the history of development of the "Feature" is very clear and easy to work with.

Getting Used to It

The major caveat with this whole approach is that there's a non-trivial conceptual hill to climb in order to use it, particularly if you're not accustomed to "git rebase" (as I wasn't). When we are in our code change that we've pushed as a code review, and we want to revise it and push it up as a new version of that code review, we need to replace the old commit with the new one, not add it on as additional history. The most fundamental thing this means is that instead of saying git commit, we have to say, git commit --amend, meaning, squash our current changes into the most recent commit. The hash of the recent commit is replaced with a brand new one, and the old commit is essentially floating, except for the fact that Gerrit will track it.

When we amend a single commit, we get the same commit message and we basically leave it alone; the commit message represents what the change will be a whole, not the little adjustment we're making here. The commentary about adjustments to our feature occurs in the comment stream on the review itself. With Gerrit's approach, you no longer have commits like, "fix whitespace", "add test", "add documentation" - at the end of the day there will be just one commit message with, "Add Feature XYZ; XYZ does QPR ...". I consider that an advantage, because intermediary commits within a feature branch like "fix whitespace" are really just noise; I don't miss them.

The other thing that has to be dealt with is that Gerrit won't allow you to push up a code review that can't be cleanly merged into the working branch it will be a part of. Again, this is a conceptually odd thing to adjust to but when you start doing it you realize what a great idea it is; when we work with tradtional feature branches and pull requests, there's always that time when we realize we've fallen so far behind master, and oh now we have to merge master into our branch, and fix all the conflicts, so that we can merge back up later without making a huge mess.

With Gerrit's approach, nothing ever gets pushed up that can't be immediately merged, which means if you're targeting a project that has a lot of activity, you'll find yourself having to rebase your code reviews every time - but the key word here is "rebase". Instead of merging master back into our feature branch, we are doing a straight up rebase of our single commit against the state of master; and because there's only one commit to deal with, this is usually a very simple process - we aren't burdened with trying to knit the changesets together in just the right way, or worrying about awkward merge artifacts cluttering up our feature branch forever; we flatten everything into our single changeset, and all the clutter of how we merged things together is discarded. As long as you're comfortable with "git rebase", it's a predictable and consistent process. If you are using Gerrit for any amount of time, you will be very comfortable with rebasing :).

Advantages

The advantages to the above Gerrit model are in my opinion huge wins, including:

You always have a feature packaged up as a single, clean commit, so that even with "git show", you can see the change represented by the feature all at once in its entirety, with no stream of trivial "work in progress" commits cluttering it up. Because Gerrit maintains its own database of past versions and commentary, the history of how this change was developed is persisted permanently, but outside of your Git repository.
You don't clutter up your Git repository with tons of old feature branches. Git of course allows you to delete feature branches, but then you lose all the change history. With Gerrit you get to hold on to a permanent record of how a new feature or change was produced and none of it clutters up your main repository.
Any number of developers can collaborate on a single change with no bumps in the process. This is something that is just not practical at all with pull requests; if a user submits a pull request to me, and I'd like to add some changes to it myself, the usual answer is that I need to pull that pull request into my own feature branch somewhere, and change it there - totally separated from where the original pull request is. Or, the developer of that pull request can opt to give my account write access to their repository; however in practice, I've never seen anyone do this, and it also means I'm awkwardly working on my own project via someone else's account. This is the most continuously frustrating thing about pull requests and is a key reason I'm so much happier with Gerrit. With pull requests I often found myself having to painstakingly describe exactly how some part of code should look in order to get the submitter to do it in their branch, so that I wouldn't have to "take it over" and chase them away forever. With a Gerrit code review I can just push up a modification to their change, and the contributor can jump right back on with no bump in continuity. When I've made a lot of changes to someone's change I add "Co-Authored By: " with my name to the commit message so that a viewer can see it was a two-person job.
The "change as a single commit" model means we no longer have to deal with pull requests that consist of two dozen changesets, re-merges of master, weird merge artifacts like where you see zillions of merged commits interspersed with the ones you care about, and so on. When I go to "git merge" someone's pull request, I usually need to go through the effort to manually squash it and add my own changelog notes, which means that the Github or Bitbucket UX has no record at all of me "merging" their pull request, and most horribly on Bitbucket I have to explicitly mark the PR as "DECLINED", when meanwhile I just wanted to squash it. Gerrit on the other hand is built for rebasing and squashing from the ground up, and represents an accurate record of exactly what was merged, with no chance that I went off and modified someone's pull request locally after the fact in a short-lived feature branch. Everything on a code contribution from start to final merge to master is visible exactly within the Gerrit UX, with complete accuracy and permanence.
The workflow model of Gerrit is far richer than that of a pull request and is also fully customizable. Different roles, such as that of developers, contributors, and automated CI systems, can each get their own workflow flags each with different meanings. Typically, a core developer of the project needs to assign a "+2 Approved" flag, the CI systems in play need to assign "+1 Verified" and in Openstack and on my own system there's additionally a "+1 Ready to Merge" flag; when all those are up, you press "submit" and the code review is pushed to the target branch, and its done. With Bitbucket and Gerrit pull requests, I never pushed the button, as I was always having to add changelog notes, make fixes, squash it, fixing merge conflicts, all of which I can now push straight into that same code review under one consistent interface.
Integration with Jenkins is way better for Gerrit than it is with Github, and for Bitbucket I've not even been able to find any Jenkins integration. Once I pull a code change into Gerrit I can easily grant the contributor access to continue working on it, or disallow changes, so that I can on a per-user basis decide who gets to push new changes that automatically kick off builds and not have to worry that someone will submit a malicious pull request when I'm not looking. With the Jenkins Github pull request plugin, it was simply not an option to enable automatic CI unless I were using something like Travis, which currently I'm not.

Overall, with Gerrit I now have a single, consistent way to do all new code; all of my feature branches, all of my random .patch files sitting in my home directory, pull requests from github or bitbucket; all of it goes straight into Gerrit where I have them all under one interface and where changes, history, code review, workflow, and comments on them are permanent, without creating any branch junk in my main git repo!

Gerrit has a ton more features as well, including an extremely detailed permissioning model which you can map to specific paths in each git repo; a Lucene-powered search feature, a replication engine so that changes to the local git repo can be pushed anywhere else; it is itself a full blown http and ssh server for the git repositories as well as for its command-line and GUI interfaces, and generally has all-around industrial style features.

Where Gerrit is falling short is that on the "self-install" side, it is definitely a little obtuse and unforgiving about things. It was not that easy to set up, and you'll likely have to spend a lot of time reading stack traces in the logs to figure out various issues - documentation, links to packages, and other online help is a little scattered and sometimes incomplete, especially for the plugins. It's not very user friendly on the adminstration side yet. But hosting it yourself is still the way to go, so that when you hit a view like "all", you see just your projects and not a huge list of other people's projects.

Implementation

The move to self-hosting services again is kind of a pendulum swing for me; a couple of years ago I was entirely gleeful to move my issue tracking off of Trac and onto Bitbucket, but i think the reason for that was mostly due to Trac's inability to limit spam accounts as well as a burdensome performance model and very little upstream development. With Gerrit, I'm self-hosting again, however all authorization is pushed up to Github with OAuth, so that I'm not dealing with fake user accounts and so far not any spammers either.

I'm using the Gerrit Oauth plugin to provide Github logins, and this plugin also includes support for Google Oauth which I haven't turned on yet, and Bitbucket OAuth which I couldn't get working.

For integration with community-supplied code submissions, I am still using pull requests as the primary entrypoint into the system. Github and Bitbucket require that you leave the "pull request" button on in any case, so people are naturally going to send you code on both of them no matter what you say, so we might as well use them. The pull request allows me to have a quick preview of what we're dealing with, and I then import it into Gerrit using a semi-manual process. To achieve this consistently, I've created the prtogerrit script which communicates with both the Github and Bitbucket APIs to pull in any target pull request, squash it, push it into Gerrit as a new review, and add comments to the pull request referring the contributor to our Gerrit registration page. Pull requests submitted through Bitbucket still necessarily get that big ugly "DECLINED" status, however it is now consistent for all BB pull requests and the messaging is clear that the work is continued on Gerrit.

Sum Up

Since I've been using Gerrit, I've been surprised at how much more productive I became; being able to see every code change being worked on by anyone all under one interface is very freeing, and the workflow that lets me have the entire change from implementation to tests to changelog and migration notes in one clean, mergable commit, fully run through multiple Jenkins CI jobs before it gets merged has made me totally comfortable pushing a big red "merge" button at the end, which was never the case with pull requests and other ad-hoc patchfiles and feature branches.

Asynchronous Python and Databases

Sun, 15 Feb 2015 12:10:00 EST

Note on SQLAlchemy, November, 2020

To quote the last part of this blog post, "I think it's [asyncio] really well done, it's fun to use, and I still am interested in having more of a SQLAlchemy story for it, because I'm sure folks will still want this no matter what anyone says.". To that end, SQLAlchemy 1.4 now includes complete support for asyncio, thanks to an approach that internally uses the "greenlet" library to bridge the gap between asyncio's syntactical requirements and SQLAlchemy's internals, so that we didn't have to rewrite everything. The greenlet approach is also made available as an optional public-facing API feature so that an asyncio-centric application may opt in to making use of traditional ORM patterns such as lazy loading within specific segments of an application.

Read the migration notes for the new asyncio feature at Asynchronous Support for Core and ORM.

The asynchronous programming topic is difficult to cover. These days, it's not just about one thing, and I'm mostly an outsider to it. However, because I deal a lot with relational databases and the Python stack's interaction with them, I have to field a lot of questions and issues regarding asynchronous IO and database programming, both specific to SQLAlchemy as well as towards Openstack.

As I don't have a simple opinion on the matter, I'll try to give a spoiler for the rest of the blog post here. I think that the Python asyncio library is very neat, promising, and fun to use, and organized well enough that it's clear that some level of SQLAlchemy compatibility is feasible, most likely including most parts of the ORM. As asyncio is now a standard part of Python, this compatiblity layer is something I am interested in producing at some point.

All of that said, I still think that asynchronous programming is just one potential approach to have on the shelf, and is by no means the one we should be using all the time or even most of the time, unless we are writing HTTP or chat servers or other applications that specifically need to concurrently maintain large numbers of arbitrarily slow or idle TCP connections (where by "arbitrarily" we mean, we don't care if individual connections are slow, fast, or idle, throughput can be maintained regardless). For standard business-style, CRUD-oriented database code, the approach given by asyncio is never necessary, will almost certainly hinder performance, and arguments as to its promotion of "correctness" are very questionable in terms of relational database programming. Applications that need to do non-blocking IO on the front end should leave the business-level CRUD code behind the thread pool.

With my assumedly entirely unsurprising viewpoint revealed, let's get underway!

What is Asynchronous IO?

Asynchronous IO is an approach used to achieve concurrency by allowing processing to continue while responses from IO operations are still being waited upon. To achieve this, IO function calls are made to be non blocking, so that they return immediately, before the actual IO operation is complete or has even begun. A typically OS-dependent polling system (such as epoll) is used within a loop in order to query a set of file descriptors in search of the next one which has data available; when located, it is acted upon, and when the operation is complete, control goes back to the polling loop in order to act upon the next descriptor with data available.

Non-blocking IO in its classical use case is for those cases where it's not efficient to dedicate a thread of execution towards waiting for a socket to have results. It's an essential technique for when you need to listen to lots of TCP sockets that are arbitrarily "sleepy" or slow - the best example is a chat server, or some similar kind of messaging system, where you have lots of connections connected persistently, only sending data very occasionally; e.g. when a connection actually sends data, we consider it to be an "event" to be responded to.

In recent years, the asynchronous IO approach has also been successfully applied to HTTP related servers and applications. The theory of operation is that a very large number of HTTP connections can be efficiently serviced without the need for the server to dedicate threads to wait on each connection individually; in particular, slow HTTP clients need not get in the way of the server being able to serve lots of other clients at the same time. Combine this with the renewed popularity of so-called long polling approaches, and non-blocking web servers like nginx have proven to work very well.

Asynchronous IO and Scripting

Asynchronous IO programming in scripting languages is heavily centered on the notion of an event loop, which in its most classic form uses callback functions that receive a call once their corresponding IO request has data available. A critical aspect of this type of programming is that, since the event loop has the effect of providing scheduling for a series of functions waiting for IO, a scripting language in particular can replace the need for threads and OS-level scheduling entirely, at least within a single CPU. It can in fact be a little bit awkward to integrate multithreaded, blocking IO code with code that uses non-blocking IO, as they necessarily use different programming approaches when IO-oriented methods are invoked.

The relationship of asynchronous IO to event loops, combined with its growing popularity for use in web-server oriented applications as well as its ability to provide concurrency in an intuitive and obvious way, found itself hitting a perfect storm of factors for it to become popular on one platform in particular, Javascript. Javascript was designed to be a client side scripting language for browsers. Browsers, like any other GUI app, are essentially event machines; all they do is respond to user-initiated events of button pushes, key presses, and mouse moves. As a result, Javascript has a very strong concept of an event loop with callbacks and, until recently, no concept at all of multithreaded programming.

As an army of front-end developers from the 90's through the 2000's mastered the use of these client-side callbacks, and began to use them not just for user-initiated events but for network-initiated events via AJAX connections, the stage was set for a new player to come along, which would transport the ever growing community of Javascript programmers to a new place...

The Server

Node.js is not the first attempt to make Javascript a server side language. However, a key reason for its success was that there were plenty of sophisticated and experienced Javascript programmers around by the time it was released, and that it also fully embraces the event-driven programming paradigm that client-side Javascript programmers are already well-versed in and comfortable with.

In order to sell this, it followed that the "non-blocking IO" approach needed to be established as appropriate not just for the classic case of "tending to lots of usually asleep or arbitrarily slow connections", but as the de facto style in which all web-oriented software should be written. This meant that any network IO of any kind now had to be interacted with in a non-blocking fashion, and this of course includes database connections - connections which are normally relatively few per process, with numbers of 10-50 being common, are usually pooled so that the latency associated with TCP startup is not much of an issue, and for which the response times for a well-architected database, naturally served over the local network behind the firewall and often clustered, are extremely fast and predictable - in every way, the exact opposite of the use case for which non-blocking IO was first intended. The Postgresql database supports an asynchronous command API in libpq, stating a primary rationale for it as - surprise! using it in GUI applications.

node.js already benefits from an extremely performant JIT-enabled engine, so it's likely that despite this repurposing of non-blocking IO for a case in which it was not intended, scheduling among database connections using non-blocking IO works acceptably well. (authors note: the comment here regarding libuv's thread pool is removed, as this only regards file IO.)

The Spectre of Threads

Well before node.js was turning masses of client-side Javascript developers into async-only server side programmers, the multithreaded programming model had begun to make academic theorists complain that they produce non-deterministic programs, and asynchronous programming, having the side effect that the event-driven paradigm effectively provides an alternative model of programming concurrency (at least for any program with a sufficient proportion of IO to keep context switches high enough), quickly became one of several hammers used to beat multithreaded programming over the head, centered on the two critiques that threads are expensive to create and maintain in an application, being inappropriate for applications that wish to tend to hundreds or thousands of connections simultaneously, and secondly that multithreaded programming is difficult and non-deterministic. In the Python world, continued confusion over what the GIL does and does not do provided for a natural tilling of land fertile for the async model to take root more strongly than might have occurred in other scenarios.

How do you like your Spaghetti?

The callback style of node.js and other asynchronous paradigms was considered to be problematic; callbacks organized for larger scale logic and operations made for verbose and hard-to-follow code, commonly referred to as callback spaghetti. Whether callbacks were in fact spaghetti or a thing of beauty was one of the great arguments of the 2000's, however I fortunately don't have to get into it because the async community has clearly acknowledged the former and taken many great steps to improve upon the situation.

In the Python world, one approach offered in order to allow for asyncrhonous IO while removing the need for callbacks is the "implicit async IO" approach offered by eventlet and gevent. These take the approach of instrumenting IO functions to be implicitly non-blocking, organized such that a system of green threads may each run concurrently, using a native event library such as libev to schedule work between green threads based on the points at which non-blocking IO is invoked. The effect of implicit async IO systems is that the vast majority of code which performs IO operations need not be changed at all; in most cases, the same code can literally be used in both blocking and non-blocking IO contexts without any changes (though in typical real-world use cases, certainly not without occasional quirks).

In constrast to implicit async IO is the very promising approach offered by Python itself in the form of the previously mentioned asyncio library, now available in Python 3. Asyncio brings to Python fully standardized concepts of "futures" and coroutines, where we can produce non-blocking IO code that flows in a very similar way to traditional blocking code, while still maintaining the explicit nature of when non blocking operations occur.

SQLAlchemy? Asyncio? Yes?

Now that asyncio is part of Python, it's a common integration point for all things async. Because it maintains the concepts of meaningful return values and exception catching semantics, getting an asyncio version of SQLAlchemy to work for real is probably feasible; it will still require at least several external modules that re-implement key methods of SQLAlchemy Core and ORM in terms of async results, but it seems that the majority of code, even within execution-centric parts, can stay much the same. It no longer means a rewrite of all of SQLAlchemy, and the async aspects should be able to remain entirely outside of the central library itself. I've started playing with this. It will be a lot of effort but should be doable, even for the ORM where some of the patterns like "lazy loading" will just have to work in some more verbose way.

However. I don't know that you really would generally want to use an async-enabled form of SQLAlchemy.

Taking Async Web Scale

As anticipated, let's get into where it's all going wrong, especially for database-related code.

Issue One - Async as Magic Performance Fairy Dust

Many (but certainly not all) within both the node.js community as well as the Python community continue to claim that asynchronous programming styles are innately superior for concurrent performance in nearly all cases. In particular, there's the notion that the context switching approaches of explicit async systems such as that of asyncio can be had virtually for free, and as the Python has a GIL, that all adds up in some unspecified/non-illustrated/apples-to-oranges way to establish that asyncio will totally, definitely be faster than using any kind of threaded approach, or at the very least, not any slower. Therefore any web application should as quickly as possible be converted to use a front-to-back async approach for everything, from HTTP request to database calls, and performance enhancements will come for free.

I will address this only in terms of database access. For HTTP / "chat" server styles of communication, either listening as a server or making client calls, asyncio may very well be superior as it can allow lots more sleepy/arbitrarily slow connections to be tended towards in a simple way. But for local database access, this is just not the case.

1. Python is Very , Very Slow compared to your database

Update - redditor Riddlerforce found valid issues with this section, in that I was not testing over a network connection. Results here are updated. The conclusion is the same, but not as hyperbolically amusing as it was before.

Let's first review asynchronous programming's sweet spot, the I/O Bound application:

I/O Bound refers to a condition in which the time it takes to complete a computation is determined principally by the period spent waiting for input/output operations to be completed. This circumstance arises when the rate at which data is requested is slower than the rate it is consumed or, in other words, more time is spent requesting data than processing it.

A great misconception I seem to encounter often is the notion that communication with the database takes up a majority of the time spent in a database-centric Python application. This perhaps is a common wisdom in compiled languages such as C or maybe even Java, but generally not in Python. Python is very slow, compared to such systems; and while Pypy is certainly a big help, the speed of Python is not nearly as fast as your database, when dealing in terms of standard CRUD-style applications (meaning: not running large OLAP-style queries, and of course assuming relatively low network latencies). As I worked up in my PyMySQL Evaluation for Openstack, whether a database driver (DBAPI) is written in pure Python or in C will incur significant additional Python-level overhead. For just the DBAPI alone, this can be as much as an order of magnitude slower. While network overhead will cause more balanced proportions between CPU and IO, just the CPU time spent by Python driver itself still takes up twice the time as the network IO, and that is without any additional database abstraction libraries, business logic, or presentation logic in place.

This script, adapted from the Openstack entry, illustrates a pretty straightforward set of INSERT and SELECT statements, and virtually no Python code other than the barebones explicit calls into the DBAPI.

MySQL-Python, a pure C DBAPI, runs it like the following over a network:

DBAPI (cProfile):  
     47503 function calls in 14.863 seconds
DBAPI (straight time):  , total seconds 12.962214

With PyMySQL, a pure-Python DBAPI,and a network connection we're about 30% slower:

DBAPI (cProfile):  
     23807673 function calls in 21.269 seconds
DBAPI (straight time):  , total seconds 17.699732

Running against a local database, PyMySQL is an order of magnitude slower than MySQLdb:

DBAPI:  , total seconds 9.121727

DBAPI:  , total seconds 1.025674

To highlight the actual proportion of these runs that's spent in IO, the following two RunSnakeRun displays illustrate how much time is actually for IO within the PyMySQL run, both for local database as well as over a network connection. The proportion is not as dramatic over a network connection, but in that case network calls still only take 1/3rd of the total time; the other 2/3rds is spent in Python crunching the results. Keep in mind this is just the DBAPI alone; a real world application would have database abstraction layers, business and presentation logic surrounding these calls as well:

Local connection - clearly not IO bound.

Network connection - not as dramatic, but still not IO bound (8.7 sec of socket time vs. 24 sec for the overall execute)

Let's be clear here, that when using Python, calls to your database, unless you're trying to make lots of complex analytical calls with enormous result sets that you would normally not be doing in a high performing application, or unless you have a very slow network, do not typically produce an IO bound effect. When we talk to databases, we are almost always using some form of connection pooling, so the overhead of connecting is already mitigated to a large extent; the database itself can select and insert small numbers of rows very fast on a reasonable network. The overhead of Python itself, just to marshal messages over the wire and produce result sets, gives the CPU plenty of work to do which removes any unique throughput advantages to be had with non-blocking IO. With real-world activities based around database operations, the proportion spent in CPU only increases.

2. AsyncIO uses appealing, but relatively inefficient Python paradigms

At the core of asyncio is that we are using the @asyncio.coroutine decorator, which does some generator tricks in order to have your otherwise synchronous looking function defer to other coroutines. Central to this is the yield from technique, which causes the function to stop its execution at that point, while other things go on until the event loop comes back to that point. This is a great idea, and it can also be done using the more common yield statement as well. However, using yield from, we are able to maintain at least the appearance of the presence of return values:

@asyncio.coroutine
def some_coroutine():
    conn = yield from db.connect()
    return conn

That syntax is fantastic, I like it a lot, but unfortunately, the mechanism of that return conn statement is necessarily that it raises a StopIteration exception. This, combined with the fact that each yield from call more or less adds up to the overhead of an individual function call separately. I tweeted a simple demonstration of this, which I include here in abbreviated form:

def return_with_normal():
    """One function calls another normal function, which returns a value."""

    def foo():
        return 5

    def bar():
        f1 = foo()
        return f1

    return bar

def return_with_generator():
    """One function calls another coroutine-like function,
    which returns a value."""

    def decorate_to_return(fn):
        def decorate():
            it = fn()
            try:
                x = next(it)
            except StopIteration as y:
                return y.args[0]
        return decorate

    @decorate_to_return
    def foo():
        yield from range(0)
        return 5

    def bar():
        f1 = foo()
        return f1

    return bar

return_with_normal = return_with_normal()
return_with_generator = return_with_generator()

import timeit

print(timeit.timeit("return_with_generator()",
    "from __main__ import return_with_generator", number=10000000))
print(timeit.timeit("return_with_normal()",
    "from __main__ import return_with_normal", number=10000000))

The results we get are that the do-nothing yield from + StopIteration take about six times longer:

yield from: 12.52761328802444
normal: 2.110536064952612

To which many people said to me, "so what? Your database call is much more of the time spent". Never minding that we're not talking here about an approach to optimize existing code, but to prevent making perfectly fine code more slow than it already is. The PyMySQL example should illustrate that Python overhead adds up very fast, even just within a pure Python driver, and in the overall profile dwarfs the time spent within the database itself. However, this argument still may not be convincing enough.

So, I will here present a comprehensive test suite which illustrates traditional threads in Python against asyncio, as well as gevent style nonblocking IO. We will use psycopg2 which is currently the only production DBAPI that even supports async, in conjunction with aiopg which adapts psycopg2's async support to asyncio and psycogreen which adapts it to gevent.

The purpose of the test suite is to load a few million rows into a Postgresql database as fast as possible, while using the same general set of SQL instructions, such that we can see if in fact the GIL slows us down so much that asyncio blows right past us with ease. The suite can use any number of connections simultaneously; at the highest I boosted it up to using 350 concurrent connections, which trust me, will not make your DBA happy at all.

The results of several runs on different machines under different conditions are summarized at the bottom of the README. The best performance I could get was running the Python code on one laptop interfacing to the Postgresql database on another, but in virtually every test I ran, whether I ran just 15 threads/coroutines on my Mac, or 350 (!) threads/coroutines on my Linux laptop, threaded code got the job done much faster than asyncio in every case (including the 350 threads case, to my surprise), and usually faster than gevent as well. Below are the results from running 120 threads/processes/connections on the Linux laptop networked to the Postgresql database on a Mac laptop:

Python2.7.8 threads (22k r/sec, 22k r/sec)
Python3.4.1 threads (10k r/sec, 21k r/sec)
Python2.7.8 gevent (18k r/sec, 19k r/sec)
Python3.4.1 asyncio (8k r/sec, 10k r/sec)

Above, we see asyncio significantly slower for the first part of the run (Python 3.4 seemed to have some issue here in both threaded and asyncio), and for the second part, fully twice as slow compared to both Python2.7 and Python3.4 interpreters using threads. Even running 350 concurrent connections, which is way more than you'd usually ever want a single process to run, asyncio could hardly approach the efficiency of threads. Even with the very fast and pure C-code psycopg2 driver, just the overhead of the aiopg library on top combined with the need for in-Python receipt of polling results with psycopg2's asynchronous library added more than enough Python overhead to slow the script right down.

Remember, I wasn't even trying to prove that asyncio is significantly slower than threads; only that it wasn't any faster. The results I got were more dramatic than I expected. We see also that an extremely low-latency async approach, e.g. that of gevent, is also slower than threads, but not by much, which confirms first that async IO is definitely not faster in this scenario, but also because asyncio is so much slower than gevent, that it is in fact the in-Python overhead of asyncio's coroutines and other Python constructs that are likely adding up to very significant additional latency on top of the latency of less efficient IO-based context switching.

Issue Two - Async as Making Coding Easier

This is the flip side to the "magic fairy dust" coin. This argument expands upon the "threads are bad" rhetoric, and in its most extreme form goes that if a program at some level happens to spawn a thread, such as if you wrote a WSGI application and happen to run it under mod_wsgi using a threadpool, you are now doing "threaded programming", of the caliber that is just as difficult as if you were doing POSIX threading exercises throughout your code. Despite the fact that a WSGI application should not have the slightest mention of anything to do with in-process shared and mutable state within in it, nope, you're doing threaded programming, threads are hard, and you should stop.

The "threads are bad" argument has an interesting twist (ha!), which is that it is being used by explicit async advocates to argue against implicit async techniques. Glyph's Unyielding post makes exactly this point very well. The premise goes that if you've accepted that threaded concurrency is a bad thing, then using the implicit style of async IO is just as bad, because at the end of the day, the code looks the same as threaded code, and because IO can happen anywhere, it's just as non-deterministic as using traditional threads. I would happen to agree with this, that yes, the problems of concurrency in a gevent-like system are just as bad, if not worse, than a threaded system. One reason is that concurrency problems in threaded Python are fairly "soft" because already the GIL, as much as we hate it, makes all kinds of normally disastrous operations, like appending to a list, safe. But with green threads, you can easily have hundreds of them without breaking a sweat and you can sometimes stumble across pretty weird issues that are normally not possible to encounter with traditional, GIL-protected threads.

As an aside, it should be noted that Glyph takes a direct swipe at the "magic fairy dust" crowd:

Unfortunately, “asynchronous” systems have often been evangelized by emphasizing a somewhat dubious optimization which allows for a higher level of I/O-bound concurrency than with preemptive threads, rather than the problems with threading as a programming model that I’ve explained above. By characterizing “asynchronousness” in this way, it makes sense to lump all 4 choices together.

I’ve been guilty of this myself, especially in years past: saying that a system using Twisted is more efficient than one using an alternative approach using threads. In many cases that’s been true, but:

the situation is almost always more complicated than that, when it comes to performance,

“context switching” is rarely a bottleneck in real-world programs, and

it’s a bit of a distraction from the much bigger advantage of event-driven programming, which is simply that it’s easier to write programs at scale, in both senses (that is, programs containing lots of code as well as programs which have many concurrent users).

People will quote Glyph's post when they want to talk about how you'll have fewer bugs in your program when you switch to asyncio, but continue to promise greater performance as well, for some reason choosing to ignore this part of this very well written post.

Glyph makes a great, and very clear, argument for the twin points that both non-blocking IO should be used, and that it should be explicit. But the reasoning has nothing to do with non-blocking IO's original beginnings as a reasonable way to process data from a large number of sleepy and slow connections. It instead has to do with the nature of the event loop and how an entirely new concurrency model, removing the need to expose OS-level context switching, is emergent.

While we've come a long way from writing callbacks and can now again write code that looks very linear with approaches like asyncio, the approach should still require that the programmer explicitly specify all those function calls where IO is known to occur. It begins with the following example:

def transfer(amount, payer, payee, server):
    if not payer.sufficient_funds_for_withdrawal(amount):
        raise InsufficientFunds()
    log("{payer} has sufficient funds.", payer=payer)
    payee.deposit(amount)
    log("{payee} received payment", payee=payee)
    payer.withdraw(amount)
    log("{payer} made payment", payer=payer)
    server.update_balances([payer, payee])

The concurrency mistake here in a threaded perspective is that if two threads both run transfer() they both may withdraw from payer such that payer goes below InsufficientFunds, without this condition being raised.

The explcit async version is then:

@coroutine
def transfer(amount, payer, payee, server):
    if not payer.sufficient_funds_for_withdrawal(amount):
        raise InsufficientFunds()
    log("{payer} has sufficient funds.", payer=payer)
    payee.deposit(amount)
    log("{payee} received payment", payee=payee)
    payer.withdraw(amount)
    log("{payer} made payment", payer=payer)
    yield from server.update_balances([payer, payee])

Where now, within the scope of the process we're in, we know that we are only allowing anything else to happen at the bottom, when we call yield from server.update_balances(). There is no chance that any other concurrent calls to payer.withdraw() can occur while we're in the function's body and have not yet reached the server.update_balances() call.

He then makes a clear point as to why even the implicit gevent-style async isn't sufficient. Because with the above program, the fact that payee.deposit() and payer.withdraw() do not do a yield from, we are assured that no IO might occur in future versions of these calls which would break into our scheduling and potentially run another transfer() before ours is complete.

(As an aside, I'm not actually sure, in the realm of "we had to type yield from and that's how we stay aware of what's going on", why the yield from needs to be a real, structural part of the program and not just, for example, a magic comment consumed by a gevent/eventlet-integrated linter that tests callstacks for IO and verifies that the corresponding source code has been annotated with special comments, as that would have the identical effect without impacting any libraries outside of that system and without incurring all the Python performance overhead of explicit async. But that's a different topic.)

Regardless of style of explicit coroutine, there's two flaws with this approach.

One is that asyncio makes it so easy to type out yield from that the idea that it prevents us from making mistakes loses a lot of its plausibility. A commenter on Hacker News made this great point about the notion of asynchronous code being easier to debug:

It's basically, "I want context switches syntactically explicit in my code. If they aren't, reasoning about it is exponentially harder."

And I think that's pretty clearly a strawman. Everything the author claims about threaded code is true of any re-entrant code, multi-threaded or not. If your function inadvertently calls a function which calls the original function recursively, you have the exact same problem.

But, guess what, that just doesn't happen that often. Most code isn't re-entrant. Most state isn't shared.

For code that is concurrent and does interact in interesting ways, you are going to have to reason about it carefully. Smearing "yield from" all over your code doesn't solve.

In practice, you'll end up with so many "yield from" lines in your code that you're right back to "well, I guess I could context switch just about anywhere", which is the problem you were trying to avoid in the first place.

In my benchmark code, one can see this last point is exactly true. Here's a bit of the threaded version:

cursor.execute(
    "select id from geo_record where fileid=%s and logrecno=%s",
    (item['fileid'], item['logrecno'])
)
row = cursor.fetchone()
geo_record_id = row[0]

cursor.execute(
    "select d.id, d.index from dictionary_item as d "
    "join matrix as m on d.matrix_id=m.id where m.segment_id=%s "
    "order by m.sortkey, d.index",
    (item['cifsn'],)
)
dictionary_ids = [
    row[0] for row in cursor
]
assert len(dictionary_ids) == len(item['items'])

for dictionary_id, element in zip(dictionary_ids, item['items']):
    cursor.execute(
        "insert into data_element "
        "(geo_record_id, dictionary_item_id, value) "
        "values (%s, %s, %s)",
        (geo_record_id, dictionary_id, element)
    )

Here's a bit of the asyncio version:

yield from cursor.execute(
    "select id from geo_record where fileid=%s and logrecno=%s",
    (item['fileid'], item['logrecno'])
)
row = yield from cursor.fetchone()
geo_record_id = row[0]

yield from cursor.execute(
    "select d.id, d.index from dictionary_item as d "
    "join matrix as m on d.matrix_id=m.id where m.segment_id=%s "
    "order by m.sortkey, d.index",
    (item['cifsn'],)
)
rows = yield from cursor.fetchall()
dictionary_ids = [row[0] for row in rows]

assert len(dictionary_ids) == len(item['items'])

for dictionary_id, element in zip(dictionary_ids, item['items']):
    yield from cursor.execute(
        "insert into data_element "
        "(geo_record_id, dictionary_item_id, value) "
        "values (%s, %s, %s)",
        (geo_record_id, dictionary_id, element)
    )

Notice how they look exactly the same? The fact that yield from is present is not in any way changing the code that I write, or the decisions that I make - this is because in boring database code, we basically need to do the queries that we need to do, in order. I'm not going to try to weave an intelligent, thoughtful system of in-process concurrency into how I call into the database or not, or try to repurpose when I happen to need database data as a means of also locking out other parts of my program; if I need data I'm going to call for it.

Whether or not that's compelling, it doesn't actually matter - using async or mutexes or whatever inside our program to control concurrency is in fact completely insufficient in any case. Instead, there is of course something we absolutely must always do in real world boring database code in the name of concurrency, and that is:

Database Code Handles Concurrency through ACID, Not In-Process Synchronization

Whether or not we've managed to use threaded code or coroutines with implicit or explicit IO and find all the race conditions that would occur in our process, that matters not at all if the thing we're talking to is a relational database, especially in today's world where everything runs in clustered / horizontal / distributed ways - the handwringing of academic theorists regarding the non-deterministic nature of threads is just the tip of the iceberg; we need to deal with entirely distinct processes, and regardless of what's said, non-determinism is here to stay.

For database code, you have exactly one technique to use in order to assure correct concurrency, and that is by using ACID-oriented constructs and techniques. These unfortunately don't come magically or via any known silver bullet, though there are great tools that are designed to help steer you in the right direction.

All of the example transfer() functions above are incorrect from a database perspective. Here is the correct one:

def transfer(amount, payer, payee, server):
    with transaction.begin():
        if not payer.sufficient_funds_for_withdrawal(amount, lock=True):
            raise InsufficientFunds()
        log("{payer} has sufficient funds.", payer=payer)
        payee.deposit(amount)
        log("{payee} received payment", payee=payee)
        payer.withdraw(amount)
        log("{payer} made payment", payer=payer)
        server.update_balances([payer, payee])

See the difference? Above, we use a transaction. To call upon the SELECT of the payer funds and then modify them using autocommit would be totally wrong. We then must ensure that we retrieve this value using some appropriate system of locking, so that from the time that we read it, to the time that we write it, it is not possible to change the value based on a stale assumption. We'd probably use a SELECT .. FOR UPDATE to lock the row we intend to update. Or, we might use "read committed" isolation in conjunction with a version counter for an optimistic approach, so that our function fails if a race condition occurs. But in no way does the fact that we're using threads, greenlets, or whatever concurrency mechanism in our single process have any impact on what strategy we use here; our concurrency concerns involve the interaction of entirely separate processes.

Sum up!

Please note I am not trying to make the point that you shouldn't use asyncio. I think it's really well done, it's fun to use, and I still am interested in having more of a SQLAlchemy story for it, because I'm sure folks will still want this no matter what anyone says.

My point is that when it comes to stereotypical database logic, there are no advantages to using it versus a traditional threaded approach, and you can likely expect a small to moderate decrease in performance, not an increase. This is well known to many of my colleagues, but recently I've had to argue this point nonetheless.

An ideal integration situation if one wants to have the advantages of non-blocking IO for receiving web requests without needing to turn their business logic into explicit async is a simple combination of nginx with uWsgi, for example.

Introduction to SQLAlchemy - Pycon 2013 - Wrapup

Thu, 04 Apr 2013 12:10:00 EDT

Video is up for my Pycon 2013 tutorial Introduction to SQLAlchemy.

For those who want to follow along at home, the full code and prerequisite material is available here:

Introduction to SQLAlchemy - Pycon 2013

Tue, 05 Mar 2013 12:10:00 EST

Preparations are just about complete for my upcoming tutorial Introduction to SQLAlchemy. There's a good crowd of people already attending, and I think registration is still open in case more people want to sign up.

But in any case, if you are coming, this year there is prerequisite material, including the software installs as well as a "Relational Overview" section that covers the basics of SQL and relational databases. Everyone coming to the tutorial should read through this document, so that we're all on roughly the same page regarding upfront SQL knowledge, and try to get the software installed. If there's any issues with the software, please report bugs to me and we'll try to get them resolved by tutorial time. We will also be available at the Wednesday 6:30pm tutorial setup session to help with installs.

Historically, tutorials pack the whole three hours of material up pretty solidly, and I hope I can balance getting lots of coverage versus not talking too fast. Thanks for signing up !

Pycon Canada - the SQLAlchemy Session In Depth

Wed, 14 Nov 2012 08:31:00 EST

Video is up for my Pycon.ca talk, The SQLAlchemy Session - In Depth. In this talk, I delve into the key philosophies behind the design of SQLAlchemy's Session system. Starting with a brief review of the ACID model, I contrast the approach of the so-called "active record" pattern to that of the more explicit Session pattern, and how the two approaches integrate with the ACID model at work within a relational database. Afterwards, I present an HTML animation of a Session object at work.

Video (setting the quality to 480p is recommended)
Slides (pdf format)
HTML Demo (tar.gz)

The Absolutely Simplest Consistent Hashing Example

Sat, 07 Jul 2012 12:01:00 EDT

Lately I've been studying Redis a lot. When using key/value databases like Redis, as well as caches like Memcached, if you want to scale keys across multiple nodes, you need a consistent hashing algorithm. Consistent hashing is what we use when we want to distribute a set of keys along a span of key/value servers in a...well consistent fashion.

If you Google around to learn what consistent hashing means, the article that most directly tells you "the answer" without a lot of handwringing is Consistent Hashing by Tom White. Not only does it explain the concept very clearly, it even has a plain and simple code example in Java.

The recipe in Tom's post is dependent on the capabilities of Java's TreeMap which we don't have in Python, but after some contemplation it became apparent that the functionality of circle.tailMap(hash) is something we already have using bisect, that is, we have a sorted array of integers, and a new number. Where in the array does the new number go? bisect.bisect() will give you that, with the same efficiency as TreeMap.

As a sanity check, I searched a bit more for Python implementations. I found a recipe by Amir Salihefendic, which seems to be based on the Java recipe and is pretty nice, but in the post he's searching the circle for hash values using a linear search, ouch! Turns out Amir is in fact using bisect in his Python Cheese Shop package hash_ring, but by then it was too late, I had already written my own recipe as well as tests (which hash_ring doesn't appear to have, at least in the downloaded distribution). There's also Continuum, taking a slightly more heavy-handed approach (three separate classes and an expensive IndexError being caught to detect keys beyond the circle). Both systems, Continuum more so, seem to encourage using hostnames directly as keys - as noted by Jeremy Zawodny, with a persistent system like Redis this is a bad idea as it means you can't move a particular key set to a new host.

So spending a bit of NIH capital, here's my recipe, which provides a dictionary interface so that you can store hostnames or even actual client instances, keyed to symbolic names:

import bisect
import md5

class ConsistentHashRing(object):
    """Implement a consistent hashing ring."""

    def __init__(self, replicas=100):
        """Create a new ConsistentHashRing.

        :param replicas: number of replicas.

        """
        self.replicas = replicas
        self._keys = []
        self._nodes = {}

    def _hash(self, key):
        """Given a string key, return a hash value."""

        return long(md5.md5(key).hexdigest(), 16)

    def _repl_iterator(self, nodename):
        """Given a node name, return an iterable of replica hashes."""

        return (self._hash("%s:%s" % (nodename, i))
                for i in xrange(self.replicas))

    def __setitem__(self, nodename, node):
        """Add a node, given its name.

        The given nodename is hashed
        among the number of replicas.

        """
        for hash_ in self._repl_iterator(nodename):
            if hash_ in self._nodes:
                raise ValueError("Node name %r is "
                            "already present" % nodename)
            self._nodes[hash_] = node
            bisect.insort(self._keys, hash_)

    def __delitem__(self, nodename):
        """Remove a node, given its name."""

        for hash_ in self._repl_iterator(nodename):
            # will raise KeyError for nonexistent node name
            del self._nodes[hash_]
            index = bisect.bisect_left(self._keys, hash_)
            del self._keys[index]

    def __getitem__(self, key):
        """Return a node, given a key.

        The node replica with a hash value nearest
        but not less than that of the given
        name is returned.   If the hash of the
        given name is greater than the greatest
        hash, returns the lowest hashed node.

        """
        hash_ = self._hash(key)
        start = bisect.bisect(self._keys, hash_)
        if start == len(self._keys):
            start = 0
        return self._nodes[self._keys[start]]

The map is used as a dictionary of node names to whatever you want, such as here we use Redis clients:

import redis
cr = = ConsistentHashRing(100)

cr["node1"] = redis.StrictRedis(host="host1")
cr["node2"] = redis.StrictRedis(host="host2")

client = cr["some key"]
data = client.get("some key")

I wanted to validate that the ring is in fact producing standard deviations like those mentioned in the Java article, so this is tested like the following:

import unittest
import collections
import random
import math

class ConsistentHashRingTest(unittest.TestCase):
    def test_get_distribution(self):
        ring = ConsistentHashRing(100)

        numnodes = 10
        numhits = 1000
        numvalues = 10000

        for i in range(1, 1 + numnodes):
            ring["node%d" % i] = "node_value%d" % i

        distributions = collections.defaultdict(int)
        for i in xrange(numhits):
            key = str(random.randint(1, numvalues))
            node = ring[key]
            distributions[node] += 1

        # count of hits matches what is observed
        self.assertEquals(sum(distributions.values()), numhits)

        # I've observed standard deviation for 10 nodes + 100
        # replicas to be between 10 and 15.   Play around with
        # the number of nodes / replicas to see how different
        # tunings work out.
        standard_dev = self._pop_std_dev(distributions.values())
        self.assertLessEqual(standard_dev, 20)

        # if the stddev is good, it's safe to assume
        # all nodes were used
        self.assertEquals(len(distributions), numnodes)

        # just to test getting keys, see that we got the values
        # back and not keys or indexes or whatever.
        self.assertEquals(
                set(distributions.keys()),
                set("node_value%d" % i for i in range(1, 1 + numnodes))
            )

    def _pop_std_dev(self, population):
        mean = sum(population) / len(population)
        return math.sqrt(
                sum(pow(n - mean, 2) for n in population)
                / len(population)
            )

Server Side Templates and API Centric Development

Mon, 18 Jun 2012 12:01:00 EDT

We're here to talk about the rise of the API-focused application, and how it interacts with templates for rendering HTML for web browsers. The simple point I hope to make is: you don't necessarily need to use all client side templates in order to write an effective API-centric web application.

I'll do this by illustrating a simple web application, with a single API-oriented method (that is, returns JSON-oriented data), where the way it's rendered and the style of template in use is entirely a matter of declarative configuration. Rendering of full pages as well as Ajax delivered "fragments" are covered using both server- and client-side rendering approaches.

API Centric Development

Most dynamic web applications we write today have the requirement that they provide APIs - that is, methods which return pure data for the consumption by a wide variety of clients. The trend here is towards organizing web applications from the start to act like APIs. In a nutshell, it means we are moving away from mode of data models injected into templates:

def get_balance(request):
    balance = BankBalance(
        amount = Amount(10000, currency="euro"),
        as_of_date = datetime.datetime(2012, 6, 14, 12, 0, 0)
    )
    return render_template("balance.html", balance=balance)

and instead towards returning JSON-compatible data structures, with the "view" being decided somewhere else:

@view_config('balance', renderer='json')
def balance(request):
    return {
        'amount':10000,
        'currency':'euro',
        'as_of_date':'2012-06-14 14:12:00'
    }

This is a fine trend, allowing us to make a clearer line between server side concepts and rendering concepts, and giving us an API-centric view of our data from day one.

Templates

There's a misunderstanding circulating about API-centric development which says that we have to use client side templates:

API-centric development. If you take a client-side rendering approach, odds are the server side of your web application is going to look more like an API than it would if you were doing entirely server-side rendering. And if/when you have plans to release an API, you'd probably already be 90% of the way there. (http://openmymind.net/2012/5/30/Client-Side-vs-Server-Side-Rendering/#comment-544974348)

and:

Lastly, another issue i can see, assuming you develop using MVC. You need to have the view, model and controller very tightly coupled to make his way work. The controller needs to know how the view works to create html that can slot right in. It's easier if the controller only has to pass data, not layout information. (http://www.reddit.com/r/programming/comments/ufyf3/clientside_vs_serverside_rendering/c4v5vjb)

This second quote is specific to the approach of delivering rendered HTML in an ajax response, to be directly rendered into a DOM element, which we will also demonstrate here. How the "model" is more tightly coupled to anything when you have a server side vs client side template, I have absolutely no idea.

Why might we want to stick with server side templates? As a Python developer, in my view the main reason is that they are still a lot easier to develop with, assuming we aren't developing our server in Javascript as well. Consider if our bank account balance needed locale-specific number, currency and date formatting, and also needed to convert the timestamp from UTC into a preferred timezone. A server side approach allows us to easily inject more functionality into our template as part of its standard environment for all render calls, such as something like this:

def render_template(request, template, **template_ns):
    number_formatter = number_formatter(request.user.preferred_locale)
    template_ns['currency_format'] = currency_formatter(number_formatter)
    template_ns['date_format'] = date_formatter(
                                request.user.preferred_timezone,
                                request.user.preferred_date_format)
    return lookup.get_template(template).render(**template_ns)

The template can, if it needs to, call upon these methods like this:


    Balance: ${data['amount'] | currency_format(data['currency'])}
    As of: ${data['as_of_date'] | date_format}

With a client side template, we have to implement all of currency_formatter, number_formatter, date_formatter in Javascript. These methods may need to respond to business-specific inputs, such as specific user preferences or other rules, that also need to be pushed up to the client. All the additional Javascript logic we're pushing out to the client brings forth the need for it to be unit tested, which so far means we need to add significant complexity to the testing environment by running a Javascript engine like node.js or something else in order to test all this functionality. Remember, we're not already using node.js to do the whole thing. If we were, then yes everything is different, but I wasn't planning on abandoning Python for web development just yet.

Some folks might argue that elements like date conversion and number formatting should still remain on the server, but just be applied by the API to the data being returned directly. To me, this is specifically the worst thing you can do - it basically means that the client side approach is forcing you to move presentation-level concepts directly into your API's data format. Almost immediately, you'll find yourself having to inject HTML entities for currency symbols and other browser-specific markup into this data, at the very least complicating your API with presentational concerns and in the worst case pretty much ruining the purity of your API data. While the examples here may be a little contrived, you can be sure that more intricate cases come up in practice that present an ongoing stream of hard decisions between complicating/polluting server-generated API data with presentation concepts versus building a much heavier client than initially seemed necessary.

Also, what about performance? Don't client side/server side templates perform/scale/respond worse/better? My position here is "if we're just starting out, then who knows, who cares". If I've built an application and I've observed that its responsiveness or scalability would benefit from some areas switching to client side rendering, then that's an optimization that can be made later. We've seen big sites like LinkedIn switch from server to client side rendering, and Twitter switch from client to server side rendering, both in the name of "performance"! Who knows! Overall, I don't think the difference between pushing out json strings versus HTML fragments is something that warrants concern up front, until the application is more fully formed and specific issues can addressed as needed.

The Alternative

Implementing a larger client-side application than we might have originally preferred is all doable of course, but given the additional steps of building a bootstrap system for a pure client-side approach, reimplementing lots of Python functionality, in some cases significant tasks such as timezone conversion, into Javascript, and figuring out how to unit test it all, is a lot of trouble for something that is pretty much effortless in a server side system - must we go all client-side in order to be API centric?

Of course not!

The example I've produced illustrates a single, fake API method that produces a grid of random integers:

@view_config(route_name='api_ajax', renderer='json')
def api(request):
    return [
        ["%0.2d" % random.randint(1, 99)  for col in xrange(10)]
        for row in xrange(10)
    ]

and delivers it in four ways - two use server side rendering, two use client side rendering. Only one client template and one server side template is used, and there is only one API call, which internally knows nothing whatsoever about how it is displayed. Basically, the design of our server component is un-impacted by what style of rendering we use, save for different routing declarations which we'll see later.

For client side rendering of this data, we'll use a handlebars.js template:

<p>
    API data -
    {{#if is_page}}
        Displayed via server-initiated, client-rendered page
    {{/if}}
    {{#unless is_page}}
        Displayed via client-initiated, client-rendered page
    {{/unless}}
p>

<table>
    {{#each data}}
        <tr>
            {{#each this}}
                <td>{{this}}td>
            {{/each}}
        tr>
    {{/each}}
table>

and for server side rendering, we'll use a Mako template:

<%inherit file="layout.mako"/>

${display_api(True)}

<%def name="display_api(inline=False)">
    
        API data -
        % if inline:
            Displayed inline within a server-rendered page
        % else:
            Displayed via server-rendered ajax call
        % endif
    
            %forrowindata:                            %forcolinrow:                                    % endfor                    % endfor    
    

    
${col}




def>

The Mako template is using a <%def> to provide indirection between the rendering of the full page, and the rendering of the API data. This is not a requirement, but is here because we'll be illustrating also how to dual purpose a single Mako template such that part of it can be used for a traditional full page render as well as for an ajax-delivered HTML fragment, with no duplication. It's essentially a Pyramid port of the same technique I illustrated with Pylons four years ago in my post Ajax the Mako Way, which appears to be somewhat forgotten. Among other things, Pyramid's Mako renderer does not appear integrate the capability to call upon page defs directly, even though I had successfully lobbied to get the critical render_def() into its predecessor Pylons. Here, I've implemented my own Mako renderer for Pyramid.

Key here is that we are separating the concept of how the API interface is constructed, versus what the server actually produces. Above, note we're using the Pyramid "json" renderer for our API data. Note the term "renderer". Interpreting our API method as a JSON API, as a call to a specific client-side template plus JSON API, or as a server side render or ajax call is just a matter of declaration. The way we organize our application in an API-centric fashion has nothing to do with where the rendering takes place. To illustrate four different ways of interpreting the same API method, we just need to add four different @view_config directives:

@view_config(route_name='server_navigate', renderer='home.mako')
@view_config(route_name='server_ajax', renderer='home|display_api.mako')
@view_config(route_name='client_navigate', renderer='display_api.handlebars')
@view_config(route_name='api_ajax', renderer='json')
def api(request):
    return [
        ["%0.2d" % random.randint(1, 99)  for col in xrange(10)]
        for row in xrange(10)
    ]

The rendering methods here are as follows:

Method One, Server Via Server - The api() view method returns the data, which is received by the home.mako template, which renders the full page, and passes the data to the display_api() def within the same phase for all-at-once server side rendering.
Method Two, Server Via Client - A hyperlink on the page initiates an ajax call to the server's server_ajax route, which invokes the api() view method, and returns the data directly to the display_api def present in the home.mako template. For this case, I had to create my own Pyramid Mako renderer that receives a custom syntax, where the page name and def name are separated by a pipe character.
Method Three, Client Via Server - Intrinsic to any client side rendered approach is that the server needs to first deliver some kind of HTML layout, as nothing more than a launching point for all the requisite javascript needed to start rendering the page for real. This method illustrates that, by delivering the handlebars_base.mako template which serves as the bootstrap point for any server-initiated page call that renders with a client side template. In this mode, it also embeds the data returned by api() within the