Archive for the ‘SQLAlchemy’ Category.

SQLAlchemy - Breaking the 80% Barrier Since Day One

As the 0.5 release of SQLAlchemy has now become available, here's a little retrospective on the "relational" nature of SQLAlchemy. SQLAlchemy's equal treatment of any relation and its deep transformational abilities of such are the core value that makes us highly unique within the database access/ORM field. It's the key to our 80% Busting power which allow an application architected around SQLAlchemy to smoothly co-evolve with an ever-more complex set of queries and schemas.

To illustrate this, we'll walk through an 0.5 feature that draws upon the three years of effort that's gone into this capability.

As is typical, we start in an entirely boring way:

from sqlalchemy import Column, Integer, Unicode, create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
 
Base = declarative_base()
 
class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    data = Column(Unicode)
 
    def __repr__(self):
        return "Foo(%r)" % self.data
 
engine = create_engine('sqlite:///:memory:', echo=True)
Base.metadata.create_all(engine)
 
session = sessionmaker(engine)()

To those unfamiliar with SQLA, the above example is specific to the ORM portion of SQLAlchemy (as opposed to the SQL expression language, an independent library upon which the ORM builds). It consists of the requisite imports, a declarative_base() class which offers us an easy platform with which to construct database-enabled classes, a Foo mapped class with some pretty generic columns, and the specification of a datasource, CREATE TABLE statements as needed, and an ORM session to pull it together. Everything above is detailed in the Object Relational Tutorial (now with Sphinx!).

The data we'll start with is five objects with predictable data values:

session.add_all([
    Foo(data=u'f1'),
    Foo(data=u'f2'),
    Foo(data=u'f3'),
    Foo(data=u'f4'),
    Foo(data=u'f5'),
])
 
session.commit()

In 0.5, we can now query individual columns at the ORM level. So starting with a query like this:

query = session.query(Foo.id, Foo.data)

We can receive the results of this query using all() (we'll move to doctest format where the output can be viewed):

>>> print query.all()
SELECT foo.id AS foo_id, foo.data AS foo_data 
FROM foo
[]
[(1, u'f1'), (2, u'f2'), (3, u'f3'), (4, u'f4'), (5, u'f5')]

The individual column query feature is nice - but still very boring ! It's nothing you can't do with any other tool, and SQLA actually lagged behind a bit in offering this capability in a straightforward way at the ORM level, which is partially because it was always possible with the SQL expression language part of SQLAlchemy, and partially because our Query has a broad usage contract that took a while to adapt to this model. Water under the bridge....

Things remain patently boring as we decide to limit the results to just the first three rows:

>>> print query.limit(3).all()
SELECT foo.id AS foo_id, foo.data AS foo_data 
FROM foo 
LIMIT 3 OFFSET 0
[]
[(1, u'f1'), (2, u'f2'), (3, u'f3')]

Did I see something flicker in the corner ? Not really, we're just adding a descending order by so that we get the last three rows instead. Yaawwwnnnnn:

>>> print query.order_by(Foo.data.desc()).limit(3).all()
SELECT foo.id AS foo_id, foo.data AS foo_data 
FROM foo ORDER BY foo.data DESC 
LIMIT 3 OFFSET 0
[]
[(5, u'f5'), (4, u'f4'), (3, u'f3')]

Barely staying awake, we'd like to sort the above rows in name ascending order so that we get ['f3', 'f4', 'f5'] instead. To do this, it's, errr, umm.......

Eighty Percent Time !!!

What just happened ? Proceeding through an entirely boring series of modifications to our query, we've suddenly hit something that is not entirely obvious. This is where any afternoon-coded SQL tool falls off, because to limit by the last three rows, and then order the limited results in reverse, requires a subquery. Not just a scalar subquery like WHERE x=(SELECT y FROM table) either - a subquery that acts the same way as we've been treating our table - a "selectable" which delivers rows that correspond to our Foo class, to which we can then apply an ascending ORDER BY.

Let's be fair. You could get the results you want using a WHERE subquery, such as in conjunction with IN. "SELECT * FROM foo WHERE id IN (SELECT id FROM foo ORDER BY data DESC LIMIT 3) ORDER BY data" would do it. There's other ways too like EXISTS, or maybe issuing a JOIN to the subquery (another tall order for some tools). However, let me respectfully say that this is lame. You're rearranging your SQL and introducing potentially reduced query optimization because your tool won't let you do the most obvious thing (or more importantly, exactly what you want to do). This is a typical 80% boundary. You pull this one last Jenga stick out and the whole thing collapses, as your tool no longer supports the natural progression of expression construction that direct SQL offers you. You need to drop into raw SQL or you need to restructure your whole query to work around the tool's limitations.

Before we continue, here's a pop quiz. What would be the expected behavior of the following:

query = session.query(Foo.id, Foo.data)
query = query.order_by(Foo.data.desc()).limit(3)
query = query.order_by(Foo.data)
print query.all()

Where above, we order by data descending, then LIMIT 3, then order by data ascending. Does it:

  • a. Issue ORDER BY data DESC, data and then issue the LIMIT ?
  • b. Issue ORDER BY data DESC LIMIT 3, and then ORDER BY on a subquery of the preceding statement ?

As it turns out, the answer to this question is subjective. Depending on the perspective one comes from, we've observed based on talking to our community that some expect "a" and some expect "b". So in refusing to guess, here's what it does:

>>> query = session.query(Foo.id, Foo.data)
>>> query = query.order_by(Foo.data.desc()).limit(3)
>>> query = query.order_by(Foo.data)
Traceback (most recent call last):
    ...
sqlalchemy.exc.InvalidRequestError: Query.order_by() being called on a Query which already has LIMIT or OFFSET applied. To modify the row-limited results of a Query, call from_self() first.  Otherwise, call order_by() before limit() or offset() are applied.

We went with "please tell us which answer you'd like". Specifying from_self() means "yes, we really want to wrap the whole thing in a subquery before continuing":

>>> query = query.from_self().order_by(Foo.data)
>>> print query.all()
SELECT anon_1.foo_id AS anon_1_foo_id, anon_1.foo_data AS anon_1_foo_data 
FROM (SELECT foo.id AS foo_id, foo.data AS foo_data 
FROM foo ORDER BY foo.data DESC 
 LIMIT 3 OFFSET 0) AS anon_1 ORDER BY anon_1.foo_data
[]
[(3, u'f3'), (4, u'f4'), (5, u'f5')]

The mechanism by which SQLAlchemy uses to wrap tables in subqueries but consistently target the columns back to our mapped columns and entities is called Column Correspondence - this is something I described in detail in this blog post. I'm not familiar with any formalized system that describes column correspondence from a relational standpoint, but if someone out there is, I'd appreciate the education. I'm not at all a formalist and I've built this whole thing in my garage.

We still haven't hit the "retrospective" and/or "day one" part of the story yet. from_self() is just one of many ways to get at SQLAlchemy's "relational" guts, the big "under the hood" feature that's taken (and continues to take) a long time to get right. In the old days these guts were exposed in the ORM in extremely limited ways. As we've progressed, we're able to allow more generalized access to it, such as via from_self(). Let's illustrate another from_self() example that can link back to one of SQLA's original "80% busters".

Like all SQLAlchemy examples, we add a second class Bar, relate it to Foo, and add some more data:

from sqlalchemy import ForeignKey
from sqlalchemy.orm import relation, backref
 
class Bar(Base):
    __tablename__ = 'bar'
    id = Column(Integer, primary_key=True)
    data = Column(Unicode)
    foo_id = Column(Integer, ForeignKey('foo.id'))
    foo = relation(Foo, backref=backref('bars', collection_class=set))
 
    def __repr__(self):
        return "Bar(%r)" % self.data
 
Base.metadata.create_all(engine)
 
f3 = session.query(Foo).filter(Foo.data==u'f3').one()
f5 = session.query(Foo).filter(Foo.data==u'f5').one()
 
session.add_all([
    Bar(data=u'b1', foo=f3),
    Bar(data=u'b2', foo=f5),
    Bar(data=u'b3', foo=f5),
    Bar(data=u'b4', foo=f5),
    ])
session.commit()

We've added four Bar objects, each of which references a Foo object via many-to-one. The corresponding Foo object references each Bar via a one-to-many collection. For background, this is also ORM Tutorial stuff.

Let's now do the obvious thing of selecting all the data at once. Just for fun we'll do the query like this (yes, you can mix columns and full entities freely):

>>> query = session.query(Foo.id, Foo.data, Bar).outerjoin(Foo.bars)
>>> print query.all()
SELECT foo.id AS foo_id, foo.data AS foo_data, bar.id AS bar_id, 
  bar.data AS bar_data, bar.foo_id AS bar_foo_id 
FROM foo LEFT OUTER JOIN bar ON foo.id = bar.foo_id
[]
[(1, u'f1', None), (2, u'f2', None), (3, u'f3', Bar(u'b1')), (4, u'f4', None), (5, u'f5', Bar(u'b2')), (5, u'f5', Bar(u'b3')), (5, u'f5', Bar(u'b4'))]

Above we're using outerjoin(Foo.bars) to say "outer join from the foo table to the bar table". Foo.bars was configured via the backref for Bar.foo. We can see we get f5 back three times since three Bar rows match.

So what if we'd like to select all the Foo and Bar rows, but like before we want to get the last three Foos ? We can't do the same thing we did earlier - a straight LIMIT will be limited by the total number of rows, including the multiple f5 rows:

>>> print query.order_by(Foo.data.desc()).limit(3).all()
SELECT foo.id AS foo_id, foo.data AS foo_data, bar.id AS bar_id, 
  bar.data AS bar_data, bar.foo_id AS bar_foo_id 
FROM foo LEFT OUTER JOIN bar ON foo.id = bar.foo_id ORDER BY foo.data DESC 
 LIMIT 3 OFFSET 0
[]
[(5, u'f5', Bar(u'b2')), (5, u'f5', Bar(u'b3')), (5, u'f5', Bar(u'b4'))]

Once again, to get the right data, we can do some less straightforward IN or EXISTS methodology, or we can do the most direct thing and join bar to the subquery of foo which we'd like. Query allows us to build up the statement exactly as we'd do it when thinking in SQL:

>>> query = session.query(Foo).order_by(Foo.data.desc()).limit(3)

from_self() will create the subuquery for us, but we also want to change the columns we're selecting from, since we'll be adding Bar via an outer join. For this purpose from_self() takes the same parameters as session.query():

>>> print query.from_self(Foo.id, Foo.data, Bar).outerjoin(Foo.bars).\
...     order_by(Foo.data).all()
SELECT anon_1.foo_id AS anon_1_foo_id, anon_1.foo_data AS anon_1_foo_data, 
  bar.id AS bar_id, bar.data AS bar_data, bar.foo_id AS bar_foo_id 
FROM (SELECT foo.id AS foo_id, foo.data AS foo_data 
FROM foo ORDER BY foo.data DESC 
LIMIT 3 OFFSET 0) AS anon_1 
LEFT OUTER JOIN bar ON anon_1.foo_id = bar.foo_id 
ORDER BY anon_1.foo_data
[]
[(3, u'f3', Bar(u'b1')), (4, u'f4', None), (5, u'f5', Bar(u'b2')), (5, u'f5', Bar(u'b3')), (5, u'f5', Bar(u'b4'))]

Above, we can see that not only does the subquery created by from_self() target the result columns to our Foo entity, it also adapts the join criterion so that everything just works.

Those who have worked with eager loading might recognize the above query - it is in fact the same kind of query that's been available since 0.1, that of "eager loading" a set of rows with a LEFT OUTER JOIN, but intelligently wrapping the primary query in a subquery so that LIMIT/OFFSET remains effective. The basic idea like this:

>>> from sqlalchemy.orm import eagerload
>>> for f in session.query(Foo).options(eagerload(Foo.bars)).\
...             order_by(Foo.data.desc()).limit(3):
...     print f, [b for b in f.bars]
... 
SELECT anon_1.foo_id AS anon_1_foo_id, anon_1.foo_data AS anon_1_foo_data, 
  bar_1.id AS bar_1_id, bar_1.data AS bar_1_data, 
  bar_1.foo_id AS bar_1_foo_id 
FROM (SELECT foo.id AS foo_id, foo.data AS foo_data 
FROM foo ORDER BY foo.data DESC 
LIMIT 3 OFFSET 0) AS anon_1 
LEFT OUTER JOIN bar AS bar_1 ON anon_1.foo_id = bar_1.foo_id 
 ORDER BY anon_1.foo_data DESC
[]
Foo(u'f5') [Bar(u'b2'), Bar(u'b3'), Bar(u'b4')]
Foo(u'f4') []
Foo(u'f3') [Bar(u'b1')]

Where above, we just select the Foo objects, and the Bar rows are delivered to collections attached to each Foo object. The fact that we've applied limit(3) indicates that the selection of Foo rows should take place within a subquery, to which the Bar rows are left outer joined. We didn't render the query quite as cleanly in 0.1 but by the 0.4 series we had gotten it to this point.

Finally, we can adapt our eager loaded query above to look just like the "last three rows of foo, outer joined to bar, ordered by 'data'" query by combining the eagerload() with from_self():

>>> for f in session.query(Foo).order_by(Foo.data.desc()).\
...             limit(3).from_self().options(eagerload(Foo.bars)).\
...             order_by(Foo.data):
...     print f, [b for b in f.bars]
... 
SELECT anon_1.foo_id AS anon_1_foo_id, anon_1.foo_data AS anon_1_foo_data, 
 bar_1.id AS bar_1_id, bar_1.data AS bar_1_data, 
 bar_1.foo_id AS bar_1_foo_id 
FROM (SELECT foo.id AS foo_id, foo.data AS foo_data 
FROM foo ORDER BY foo.data DESC 
LIMIT 3 OFFSET 0) AS anon_1 
LEFT OUTER JOIN bar AS bar_1 ON anon_1.foo_id = bar_1.foo_id 
ORDER BY anon_1.foo_data
[]
Foo(u'f3') [Bar(u'b1')]
Foo(u'f4') []
Foo(u'f5') [Bar(u'b2'), Bar(u'b3'), Bar(u'b4')]

I'm super excited about the 0.5 release (not to mention 0.6 for which we have a lot planned) since it represents the coming together of the original vision of offering the full relational model, years of feedback from real users with lots of production experience, and an ever more solid maturity to the internals which at this point have probably had about three full turnovers in construction.

Timing All Queries

A few people have been asking about this one as of late, it's quite easy to do with a ConnectionProxy. Here's a quick recipe (yes, this is 0.5):

from sqlalchemy.interfaces import ConnectionProxy
import time
import logging
 
logging.basicConfig()
logger = logging.getLogger("myapp.sqltime")
logger.setLevel(logging.DEBUG)
 
class TimerProxy(ConnectionProxy):
    def cursor_execute(self, execute, cursor, statement, parameters, context, executemany):
        now = time.time()
        try:
            return execute(cursor, statement, parameters, context)
        finally:
            total = time.time() - now
            logger.debug("Query: %s" % statement)
            logger.debug("Total Time: %f" % total)
 
if __name__ == '__main__':
    from sqlalchemy import *
 
    engine= create_engine('sqlite://', proxy=TimerProxy())
 
    m1 = MetaData(engine)
    t1 = Table("sometable", m1, 
            Column("id", Integer, primary_key=True),
            Column("data", String(255), nullable=False),
        )
 
    conn = engine.connect()
    m1.create_all(conn)
 
    conn.execute(
        t1.insert(), 
        [{"data":"entry %d" % x} for x in xrange(100000)]
    )
 
    conn.execute(
        t1.select().where(t1.c.data.between("entry 25", "entry 7800")).order_by(desc(t1.c.data))
    )

Output:

DEBUG:myapp.sqltime:Query: PRAGMA table_info("sometable")
DEBUG:myapp.sqltime:Total Time: 0.000233
DEBUG:myapp.sqltime:Query: 
CREATE TABLE sometable (
    id INTEGER NOT NULL, 
    data VARCHAR(255) NOT NULL, 
    PRIMARY KEY (id)
)
 
 
DEBUG:myapp.sqltime:Total Time: 0.000397
DEBUG:myapp.sqltime:Query: INSERT INTO sometable (data) VALUES (?)
DEBUG:myapp.sqltime:Total Time: 1.147780
DEBUG:myapp.sqltime:Query: SELECT sometable.id, sometable.data 
FROM sometable 
WHERE sometable.data BETWEEN ? AND ? ORDER BY sometable.data DESC
DEBUG:myapp.sqltime:Total Time: 1.064579

ConnectionProxy features two methods that can be overridden, execute() and cursor_execute(). The difference is that the former hooks onto the overall execute() method of Connection and is given the application level arguments, such as the ClauseElement and parameter dictionary, whereas the latter is called at the lower level where we have a string SQL statement and bind parameters compiled and formatted against the specific DBAPI in use.

Selecting Booleans

Ticket 798, allow conjunctions to act as column elements, is complete after a long wait, finally attended to for the great reason that I suddenly needed this feature myself ;). A few tweaks and we can now talk about asking yes/no questions of our database.

As we head into the 0.5 era of SQLAlchemy, one theme is that if we're using the ORM, we're going to need select() constructs a lot less, if at all. Most things can now be done using Query objects. The other day, I needed to ask my database a question regarding if some particular data were available, and that's it. The query needed to be as fast as possible so I wanted to use EXISTS, so that the database needs to only access the first row of the selected rows in order to give an answer. Additionally, I needed to ask this question of a few different sets of criterion, which could be most efficiently achieved by combining them together into a single statement using OR.

Using the current trunk of SQLAlchemy 0.5, we can use Query() against boolean values:

from sqlalchemy import *
from sqlalchemy.orm import *
engine = create_engine('sqlite://', echo=True)
Session = scoped_session(sessionmaker(bind=engine))
 
true, false = literal(True), literal(False)
 
(ret, ), = Session.query(true)
print ret

which returns:

True

The question we just asked our SQLite database is SELECT 1, where "1" is SQLite's way of representing true (in Postgres it's true, in MySQL it can be 1 or true. Using literal(True) means we don't have to worry about this). SQLite tells us, "yes, True is True". Note the way I'm getting the row out of Query. At the moment that seems to be a fun way to go, but you could also say:

ret, = Session.query(true).one()

and of course:

ret = Session.query(true).one()[0]

We can also ask our database about boolean values combined with OR and AND:

>>> (ret, ), = Session.query(or_(true, false))
>>> ret
True
 
>>> (ret, ), = Session.query(and_(true, false))
>>> ret
False

To demonstrate EXISTS, let's build a table. We're using declarative, which is pretty much the only way I roll these days (hey, better late than never....):

from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Keyword(Base):
    __tablename__ = "keyword"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))
 
Base.metadata.create_all(engine)

One way we could find out if our Keyword table has any of a certain set of keywords, is to do something familiar and ask for a count:

keywords = ["beans", "lentils", "legumes"]
(ret, ), = Session.query(func.count(Keyword.id)).filter(Keyword.name.in_(keywords))

But to eliminate the need for the database to actually count the full set of rows, we can ask it just if any rows at all exist, using the exists() construct:

(ret, ), = Session.query(exists().where(Keyword.name.in_(keywords)))

Which issues the SQL:

SELECT EXISTS (SELECT * FROM keyword WHERE keyword.name IN (?, ?, ?)) AS anon_1

The return value is boolean, indicating True for rows were found, False for no rows were found. In my case, I was querying among a bunch of tables. Let's suppose we have a database which represents cookbooks and individual recipes, both of which have lists of keywords associated. Let's declare that up:

recipe_keywords = Table("recipe_keyword", Base.metadata, 
                    Column("recipe_id", Integer, ForeignKey("recipe.id")),
                    Column("keyword_id", Integer, ForeignKey("keyword.id"))
                    )
 
cookbook_keywords = Table("cookbook_keyword", Base.metadata, 
                    Column("cookbook_id", Integer, ForeignKey("cookbook.id")),
                    Column("keyword_id", Integer, ForeignKey("keyword.id"))
                    )
 
class Recipe(Base):
    __tablename__ = "recipe"
    id = Column(Integer, primary_key=True)
    description = Column(Text)
    keywords = relation(Keyword, secondary=recipe_keywords)
 
class Cookbook(Base):
    __tablename__ = "cookbook"
    id = Column(Integer, primary_key=True)
    description = Column(Text)
    keywords = relation(Keyword, secondary=cookbook_keywords)
 
Base.metadata.create_all(engine)

The above schema defines a recipe and a cookbook table, each of which relate to keyword via the recipe_keyword or cookbook_keyword association tables, respectively.

The question of "Do any recipes feature any of our above three keywords?" can be answered by:

(ret, ), = Session.query(
    exists().where(Recipe.id==recipe_keywords.c.recipe_id).
            where(recipe_keywords.c.keyword_id==Keyword.id).
            where(Keyword.name.in_(keywords))
)

Above we ask "do any rows exist, where the ID of the recipe table matches the recipe ID of the recipe_keywords table, and the keyword ID of the recipe_keywords table matches the ID of a keyword table row, and the name of the keyword is in the list ["beans", "lentils", "legumes"]". SQL is:

SELECT EXISTS (SELECT * 
FROM recipe, recipe_keyword, keyword 
WHERE (recipe.id = recipe_keyword.recipe_id AND recipe_keyword.keyword_id = keyword.id) 
AND keyword.name IN (?, ?, ?)) AS anon_1

Spelling out the full join from Recipe to Keyword above is a little bit verbose. We have the option to let SQLAlchemy create this for us using the Recipe.keywords relation already set up, using the ORM level join() function, which provides direct access to the ORM's join algorithm:

from sqlalchemy.orm import join
(ret, ), = Session.query(
        exists().select_from(join(Recipe, Keyword, Recipe.keywords)).
                where(Keyword.name.in_(keywords))
)

This generates:

SELECT EXISTS (SELECT * 
FROM recipe JOIN recipe_keyword AS recipe_keyword_1 ON 
recipe.id = recipe_keyword_1.recipe_id JOIN keyword ON 
keyword.id = recipe_keyword_1.keyword_id 
WHERE keyword.name IN (?, ?, ?)) AS anon_1

We now know how to ask the database if it has any recipe rows which relate to a given set of keyword names. To ask the database if it has any cookbook rows or recipe rows matching our keywords at the same time, we can double up on exists() clauses using or_():

(ret, ), = Session.query(or_(
        exists().select_from(join(Recipe, Keyword, Recipe.keywords)).
                where(Keyword.name.in_(keywords)),
 
        exists().select_from(join(Cookbook, Keyword, Cookbook.keywords)).
            where(Keyword.name.in_(keywords))
))

At this point, our eyes begin to glaze over when viewing the SQL itself. But that's fine; just remember that whenever this happens, somewhere in the world another "ORM's aren't worth it" blog post fades away:

SELECT ((EXISTS (SELECT * 
FROM recipe JOIN recipe_keyword AS recipe_keyword_1 ON recipe.id = recipe_keyword_1.recipe_id  
JOIN keyword ON keyword.id = recipe_keyword_1.keyword_id 
WHERE keyword.name IN (?, ?, ?))) OR (EXISTS (SELECT * 
FROM cookbook JOIN cookbook_keyword AS cookbook_keyword_1 ON 
cookbook.id = cookbook_keyword_1.cookbook_id JOIN keyword ON 
keyword.id = cookbook_keyword_1.keyword_id 
WHERE keyword.name IN (?, ?, ?)))) AS anon_1

Those readers who are familiar with some of SQLA's advanced Query operators might recognize that the above EXISTS queries look a lot like an any() expression. This is because, they are! Though in this case, almost. As a review, an any() expression creates an EXISTS clause which is correlated with the enclosing query, such as:

rows = Session.query(Recipe).filter(Recipe.keywords.any(Keyword.name.in_(keywords))).all()

This query produces a regular SELECT from the recipe table, then embeds a correlated EXISTS inside the WHERE clause:

SELECT recipe.id AS recipe_id, recipe.description AS recipe_description 
FROM recipe 
WHERE EXISTS (SELECT 1 
FROM recipe_keyword, keyword 
WHERE (recipe.id = recipe_keyword.recipe_id AND keyword.id = recipe_keyword.keyword_id) AND
keyword.name IN (?, ?, ?))

When any() is used, it explicitly states that the exists() clause should correlate() to the recipe table, which allows it to work in scenarios where SQLAlchemy's usual "auto" correlation cannot make the right decision (we have tests which illustrate this). So to use any() in our "column-based" approach, we just need to turn off that correlation using correlate(None). Our "recipes/cookbooks which exist" query can be stated as:

(ret, ), = Session.query(or_(
    Recipe.keywords.any(Keyword.name.in_(keywords)).correlate(None),
    Cookbook.keywords.any(Keyword.name.in_(keywords)).correlate(None)
    ))

While the any() approach above is nice, the correlation part of it has me preferring the more explicit exists() version.

SQLAlchemy code swarm

Django's already got one, and Brian Rosner made one for us too: link. SQLA's pre-release development is apparent here which is why it floats around "zzzeek" for so long.

Reddit.com Goes Open Source

Reddit has opened their source up, and we can now see just what they've been up to. It's been known for some time that Reddit was (re)built using Pylons and Mako templates, contrary to their FAQ which still states that they use web.py. As it turns out, they've also built something of their own database layer, which seems to include a homegrown caching layer and ultimately is built on top of SQLAlchemy, using the SQLA expression language to generate queries. Connections are served with the QueuePool, and they use the threadlocal setting, so that they can get implicit access to transactions in progress. They vertically partition their database access among four separate engines across four distinct areas of functionality on the site.

This is currently the highest volume website I'm aware of using SQLAlchemy and Pylons, and is a testament to the stability of our core components (I hope). Python in general is not too prominent in New York City where I work; Java, PHP and .NET are still the default "goto" platforms, and most developers here look at you kind of funny when you mention Python. Look how well-known Java advocate Ted Neward says "even Python!", as though we're the most fringe Java alternative imaginable. I hope examples like Reddit continue to illustrate that Python presents the best mix of performance, stability, and rapid development for web development today, not to mention one of the broadest software ecosystems in the field (which I've always maintained is a good thing).