Django-style Database Routers in SQLAlchemy

January 11, 2012 at 06:50 PM | Code, SQLAlchemy

Updated August, 2014 - the approach here really won't work very well if you are using transactions! See the notes inline.

As luck would have it, I'm currently assigned to work with some existing Django code. It's been quite a long time I've gone without needing to do so, but now that I'm there, I can start fleshing out how some of the use cases for Django can be approximated in SQLAlchemy.

Today it's something that's really quite simple - how to mimic the Multiple Databases example in Django's docs. The Django approach is to build a "router" object into the system which can then decide on a query-by-query basis which of several databases should be used for each query. For this, they provide an interface which includes db_for_read(), db_for_write(), allow_relation() and allow_syncdb(). In particular, db_for_read() and db_for_write() are used when emitting most SQL. These receive an argument called model, which the docs state is a "type", i.e. a class. Because the database determination is based on type, we call this in SQLAlchemy vertical partitioning - databases are partitioned along types.

The specific example here has the following behavior:

  1. Write operations occur on a database referred to as leader.
  2. Read operations are split among two different databases referred to as follower1 and follower2.
  3. Operations for a specific subset of classes denoted by myapp occur on a database referred to as other.

When using SQLAlchemy, we of course don't have the concept of "apps" as a delineating factor between different types. So we'll use the old fashioned approach and just use the type itself, that is, what hierarchy it inherits from, to determine which subsystem of the application it heralds from.

The SQLAlchemy Session makes all decisions about what Engine to use within the get_bind() method. This method is given pretty much the same information that db_for_read() and db_for_write() receive, where a Mapper is passed in, if available, and we can also check if the operation is a "write" just by checking if the Session is flushing.

Imports

We'll build up our example in the usual way, as a full script broken out. First the imports:

from sqlalchemy import Column, Integer, String, MetaData, create_engine
from sqlalchemy.orm import scoped_session, sessionmaker, Session
from sqlalchemy.ext.declarative import declarative_base
import random
import shutil

Engine Registry

Then, some Engine instances. Note that, unlike Django, SQLAlchemy is not a framework, and doesn't have a settings.py file or an existing registry of engines. Assuming we can decide on a decent place to put our own registry, we just stick them in a dict:

engines = {
    'leader': create_engine('sqlite:///leader.db',
                            logging_name='leader'),
    'other': create_engine('sqlite:///other.db',
                            logging_name='other'),
    'follower1': create_engine('sqlite:///follower1.db',
                            logging_name='follower1'),
    'follower2': create_engine('sqlite:///follower2.db',
                            logging_name='follower2'),
    }

The example here uses SQLite databases, so that it's quick and easy to actually run the script. However, as I'm not familiar with replication functionality in SQLite, we'll have to "fake" the part where "leader" replicates to "follower", just for the sake of example.

Routing

Next comes our implementation of get_bind(). We're building into a more open-ended space here, which can good or bad thing, depending on where you're coming from. Instead of building two "router" classes with two db routing methods each, we just need to make one method with some conditionals:

class RoutingSession(Session):

    def get_bind(self, mapper=None, clause=None):
        if mapper and issubclass(mapper.class_, OtherBase):
            return engines['other']
        elif self._flushing:
            return engines['leader']
        else:
            return engines[
                    random.choice(['follower1', 'follower2'])
                ]

So above, we use a three-state conditional to make exactly those same choices the Django example does. When we want to detect the classes destined for the "other" engine, we look for a particular base class called OtherBase. We could just as easily do other kinds of checks here, such as checking for a particular attribute on the class.

We also look at a state variable called _flushing to determine operations destined for the leader. The Session._flushing flag is set as soon as the flush procedure begins, and remains on until the method completes. SQLAlchemy uses this flag mainly to prevent re-entrant calls to autoflush when it's already inside of the flush process.

That's all we need in the way of Session, for the moment. To wire this Session up into the standard scoped_session(sessionmaker(autocommit=True)) process, we can pass it into sessionmaker():

Session = scoped_session(sessionmaker(class_=RoutingSession, autocommit=True))

Note also that, with some dependency on the replication system we have in place, using this approach may very well mean we can't use a transaction for our normal work. If we write some rows into the leader, and don't commit our transaction, we won't see those rows if we try to SELECT from the follower because the transaction is isolated from any other node until committed. Hence the session is set up with autocommit=True, which uses an ad-hoc transaction for each SELECT statement, and one for each flush operation.

Model Setup

Next, let's build up the "model". This is the part where we need to come up with an answer for Django's syncdb feature, which of course in SQLAlchemy is MetaData.create_all(). We now have two different schemas - one schema is shared between leader, follower1, and follower2, the other is destined for other. We'll split out our table metadata among these backends using two different MetaData() objects. To achieve that transparently, I'll use a trick I don't think people are quite aware of yet, which is to use abstract declarative bases.

Starting with a useful declarative base that will give us an id and a data column, as well as a decent __repr__():

class Base(object):
    id = Column(Integer, primary_key=True)
    data = Column(String(50))
    def __repr__(self):
        return "%s(id=%r, data=%r)" % (
            self.__class__.__name__,
            self.id, self.data
        )

Base = declarative_base(cls=Base)

We then split out Base into two subtypes, which won't be mapped. To tell declarative not to map these non-mixin, direct descendants of Base, we use a fairly new flag called __abstract__:

class DefaultBase(Base):
    __abstract__ = True
    metadata = MetaData()

class OtherBase(Base):
    __abstract__ = True
    metadata = MetaData()

and holy crap look at that a MetaData on each one! You can do that? Sure can - the classes we build on top of DefaultBase will put the Table objects into one MetaData, the classes we built on top of OtherBase will put them into another. We've basically just replaced the .metadata attribute declarative sets up with our own - there's no magic. The DefaultBase and OtherBase classes are also not mapped and are transparent to the mapping mechanism.

Let's build out three "models" (I really prefer to say "class", let's see if I can get to the end of this post without complaining more about it...):

class Model1(DefaultBase):
    __tablename__ = 'model1'

class Model2(DefaultBase):
    __tablename__ = 'model2'

class Model3(OtherBase):
    __tablename__ = 'model3'

Bang. For those unfamiliar with declarative, because these objects ultimately descend from Base above, they will have an id and a data column available, where id is a surrogate primary key.

Table Creation

We use MetaData.create_all() here. Anything that's DefaultBase should have tables created in leader, follower1, follower2:

for eng in 'leader', 'follower1', 'follower2':
    DefaultBase.metadata.create_all(engines[eng])

OtherBase then goes to other:

OtherBase.metadata.create_all(engines['other'])

Usage

We now have the tables built, we can show off things getting sent to leader within an explicit transaction block:

s = Session()

with s.begin():
    s.add_all([
        Model1(data='m1_a'),
        Model2(data='m2_a'),
        Model1(data='m1_b'),
        Model2(data='m2_b'),
        Model3(data='m3_a'),
        Model3(data='m3_b'),
    ])

If we have SQL echoing on, we'd see transactions starting on each of leader and other, the requisite INSERT statements, then the two COMMIT calls. Note that the BEGIN for other doesn't occur until the unit of work actually comes across SQL destined for this engine:

INFO sqlalchemy.engine.base.Engine.leader BEGIN (implicit)
INFO sqlalchemy.engine.base.Engine.leader INSERT INTO model1 (data) VALUES (?)
INFO sqlalchemy.engine.base.Engine.leader ('m1_a',)
INFO sqlalchemy.engine.base.Engine.leader INSERT INTO model1 (data) VALUES (?)
INFO sqlalchemy.engine.base.Engine.leader ('m1_b',)
INFO sqlalchemy.engine.base.Engine.other BEGIN (implicit)
INFO sqlalchemy.engine.base.Engine.other INSERT INTO model3 (data) VALUES (?)
INFO sqlalchemy.engine.base.Engine.other ('m3_a',)
INFO sqlalchemy.engine.base.Engine.other INSERT INTO model3 (data) VALUES (?)
INFO sqlalchemy.engine.base.Engine.other ('m3_b',)
INFO sqlalchemy.engine.base.Engine.leader INSERT INTO model2 (data) VALUES (?)
INFO sqlalchemy.engine.base.Engine.leader ('m2_a',)
INFO sqlalchemy.engine.base.Engine.leader INSERT INTO model2 (data) VALUES (?)
INFO sqlalchemy.engine.base.Engine.leader ('m2_b',)
INFO sqlalchemy.engine.base.Engine.other COMMIT
INFO sqlalchemy.engine.base.Engine.leader COMMIT

Now let's query. Normally, if we had a Postgresql or MySQL replication environment set up, the data we write to leader would have been copied out to follower1 and follower2. So cover your eyes for a moment while we pretend:

##### PRETEND PRETEND PRETEND ######
# now let's pretend "leader' is replicating to "follower1", "follower2"
shutil.copy("leader.db", "follower1.db")
shutil.copy("leader.db", "follower2.db")
##### END PRETEND END PRETEND END PRETEND ######

(note that SQLAlchemy as of 0.7 doesn't use a connection pool by default when it talks to SQLite - it just opens from the file each time. As long as nobody's talking to the database, we can swap out the file).

We can do some querying - SQL echoing is displayed inline here:

>>> print s.query(Model1).all()
INFO sqlalchemy.engine.base.Engine.follower2 BEGIN (implicit)
INFO sqlalchemy.engine.base.Engine.follower2 SELECT model1.id AS model1_id, model1.data AS model1_data
FROM model1
[Model1(id=1, data='m1_a'), Model1(id=2, data='m1_b')]

>>> print s.query(Model2).all()
INFO sqlalchemy.engine.base.Engine.follower1 BEGIN (implicit)
INFO sqlalchemy.engine.base.Engine.follower1 SELECT model2.id AS model2_id, model2.data AS model2_data
FROM model2
[Model2(id=1, data='m2_a'), Model2(id=2, data='m2_b')]

>>> print s.query(Model3).all()
INFO sqlalchemy.engine.base.Engine.other BEGIN (implicit)
INFO sqlalchemy.engine.base.Engine.other SELECT model3.id AS model3_id, model3.data AS model3_data
FROM model3
[Model3(id=1, data='m3_a'), Model3(id=2, data='m3_b')]

Manual Access

Django also includes a "manual" selection mode via the using() modifier on the QuerySet object. Let's illustrate a modified version of our RoutingSession with a similar method added. We add one more check in get_bind(), to look for a local attribute called name - then we build a quick "generative" method that copies the state from one RoutingSession into another, so that the second session shares the same state:

class RoutingSession(Session):

    def get_bind(self, mapper=None, clause=None ):
        if self._name:
            return engines[self._name]
        elif mapper and issubclass(mapper.class_, OtherBase):
            return engines['other']
        elif self._flushing:
            return engines['leader']
        else:
            return engines[
                    random.choice(['follower1','follower2'])
                ]

    _name = None
    def using_bind(self, name):
        s = RoutingSession()
        vars(s).update(vars(self))
        s._name = name
        return s

So assuming we plugged in the above Session, we can explicitly query the leader as:

m1 = Session().using_bind("leader").query(Model1).first()

Admittedly, the using_bind() method above might be something I should add some helpers for, like a @generative decorator similar to that available for Query, so that the vars.update() approach is not needed.

I hope this example is useful, and sheds some light on how we do things in SQLAlchemy.

Download the example: sqlalchemy_multiple_dbs.py


Alembic Documentation Ready for Review

November 08, 2011 at 12:01 PM | Code, SQLAlchemy

Many of you are aware that for quite some time I've had available a new migrations tool called Alembic. I wrote the majority of this code last year and basically haven't had much time to work on it, save for a little bit of integration at my day job.

Alembic has several areas that improve upon the current crop of migration tools, including:

  • Full control over how migrations run, including multiple database support, transactional DDL, etc.
  • A super-minimal style of writing migrations, not requiring full table definitions for simple operations.
  • No monkeypatching or repurposing of core SQLAlchemy objects.
  • Ability to generate migrations as SQL scripts, critical for working in large organizations on restricted-access production systems.
  • A non-linear versioning model that allows, somewhat rudimentally, for branching and merging of multiple migration file streams.
  • By popular request, designs for some degree of automation will be added, where "obivous" migrations of tables or columns being added or dropped as well as simple column attribute changes can be detected and rendered into migration scripts automatically.

I get asked about migration tools quite often, and I've always mentioned that I have such a tool available in an early form, it just lacks documentation, hoping that one out of so many eager users would be able to chip in a couple of days to help get it going. Turns out it's simply not possible to get someone to document your project for you; so here at the PloneConf sprints I've taken the time at the sprints to create initial documentation. Originally I was going to do some work on Dogpile but I've literally been asked about schema migrations, either in person or via reddit or twitter, about nine times in the past three days.

The documentation for Alembic will provide an overview of the tool, including theory of operation as well as philosophy, and some indicators of features not yet implemented. I would like feedback on the current approach. Alembic is not yet released though can be checked out from Bitbucket for testing. It's a pretty simple tool, 1163 lines of code at the moment, with plenty of room to support a lot more database features in a straightforward way.

I also have an open question about the notion of "automatic" migrations - how does one discern detecting whether or not a table or column has had a name change, or if a new table/column was added and an old one dropped? Just a curiosity as I attempt to avoid reading South's source code.


Value Agnostic Types, Part II

October 29, 2011 at 12:01 PM | Code, SQLAlchemy

In last week's post I illustrated how to use a "value agnostic type" in conjunction with an ORM-mapped class, using the @hybrid_proprerty decorator. Pressed for details about the hypothetically better TypeDecorator approach, we'll illustrate that here. The TypeDecorator approach as I've mentioned has a few non-intuitive edges, though after working out the example they're not so bad. I do in fact use the TypeDecorator approach in my day job as often as I use the pure hybrid approach.

On SQL Objects and Types

The aspect of the approach I'm a little unsatisfied with is that SQLAlchemy currently doesn't provide a first class pathway to linking operators with the types they operate upon. This is a situation I'd like to improve in an upcoming release, but let's see what the general idea is.

SQLAlchemy separates the concerns of a "SQL-representation" object and a "type-representation" object. That is, when you deal with a Column associated with a table, it is distinct from its type:

someint = Column('int_col', Integer)
somestring = Column('str_col', String)

where above, we have two Column objects. Each has a type attribute, which refers to the SQL type of the column. Above, these are illustrated as Integer and String.

I've seen other systems that do things more directly:

someint = Integer("int_col")
somestring = String("str_col")

That approach is at first simpler, since you can now put special operators upon Integer and String specific to those types, thereby linking Python's typing behavior directly to that of the database. However this fails to model how the relational database actually works. Suppose we derive new SQL constructs from someint and somestring, a "label" and a "scalar subquery", respectively:

label_of_someint = someint.label("some label")
subquery = select([somestring]).as_scalar()

Now what's the type-dependent behavior of those two objects? In this case, the rationale for SQLAlchemy's approach is clear - each has a type attribute copied straight from the origin expression, which applies the same behavior to the derived construct as the original. On the other hand, the "expression-is-the-type" system would in theory require IntegerLabel, StringLabel, IntegerSubquery, StringSubquery classes - an explosion of subclasses and behaviors awkwardly linked together. Welding the datatype to the SQL expression type doesn't really work.

SQLAlchemy's approach is more comprehensive, but at the moment only has limited hooks into the type when an expression is generated. When an expression is constructed, the type is consulted specifically for a few things, including coercion of the operator, such as coercing the add() operator to the concat() operator in the case of strings, as well as hints on what the return type of the expression should be, and what kind of type the "other" side should be treated as, if not already known. It doesn't yet have a way to add new operators to the SQL construct which contains the type, or to replace a value with a SQL expression - something that would in particular help a lot with things like PostGIS. These problems are entirely solvable, however, and may occur in a future release.

As it turns out, these limitations aren't terrible in the case of our Amount type, and we'll use the TypeDecorator to create a new "currency" type that will work just as well without the ORM, that is with plain SQL expressions, as with it. ORM-mapped classes no longer will need to use @hybrid_property to coerce columns into the type. And we'll even get Amount objects back directly in result sets. Overall, after I tried this out, I realized it's definitely the better way to go.

Reintroducing the Amount type

We'll reuse the Amount type from last week. It's mostly the same, except we'll remove __clause_element__(), and additionally provide a classmethod version of the as_currency() method, which we'll use when dealing with SQL expressions. Because a SQLAlchemy custom type can't export new operations to the enclosing SQL construct, special operations on currency types will be done via this function called externally to the target, rather than a method call from the SQL expression construct itself:

from sqlalchemy.ext.hybrid import hybrid_method
from sqlalchemy import type_coerce, Numeric

class Amount(object):
    def __init__(self, amount, currency):
        self.currency = currency
        self.amount = amount

    def __add__(self, other):
        return Amount(
                self.amount +
                other.as_currency(self.currency).amount,
                self.currency
            )

    def __sub__(self, other):
        return Amount(
                self.amount -
                other.as_currency(self.currency).amount,
                self.currency
            )

    def __lt__(self, other):
        return self.amount < other.as_currency(self.currency).amount

    def __gt__(self, other):
        return self.amount > other.as_currency(self.currency).amount

    def __eq__(self, other):
        return self.amount == other.as_currency(self.currency).amount

    @hybrid_method
    def as_currency(self, other_currency):
        return Amount(
                    currency_lookup[(self.currency, other_currency)] * self.amount,
                    other_currency)

    @as_currency.expression
    def as_currency(self, target, other_currency):
        if hasattr(target, '__clause_element__'):
            target = target.__clause_element__()
        currency = target.type.currency
        return Amount(
                    currency_lookup[(currency, other_currency)]
                     * type_coerce(target, Numeric),
                    other_currency)

    def __str__(self):
        return "%s %s" % (self.amount, self.currency)

    def __repr__(self):
        return "Amount(%s)" % self

The basic thing we've added above is a class level version of as_currency(), which accepts a target and a currency. This would be used like:

Amount.as_currency(mytable.c.balance, "usd")

This method needs to take a peek at what it gets to see if it's directly a SQL expression, or an ORM-oriented expression that needs the __clause_element__() method to resolve down to the SQL expression. Consider this to be "rough edge" number one, if you will, though there's not much to it.

Another special step in that method is that we're using type_coerce(value, Numeric) to coerce the target, which will be a SQL expression using our new TypeDecorator, so that it's treated as a Numeric when multiplying by the exchange rate. If we didn't do that, the type coercion rule used by our type would take effect and assume the exchange rate is also an Amount type, which is not the case. While that coercion rule can be customized to handle an incoming Decimal value, it's more appropriately left as is, since expressions against Amount should always be using another Amount object.

Oddly enough we used @hybrid_method to provide the "dual purposed" behavior of as_currency() at the instance level and at the class level. Nothing to do with the ORM here, just a handy place to use it.

Our TypeDecorator is then pretty much from the docs:

from sqlalchemy import TypeDecorator, Numeric

class AmountType(TypeDecorator):
    impl = Numeric

    def __init__(self, currency):
        self.currency = currency
        super(AmountType, self).__init__()

    def process_bind_param(self, value, dialect):
        if value is not None:
            value = value.as_currency(self.currency).amount
        return value

    def process_result_value(self, value, dialect):
        if value is not None:
            value = Amount(value, self.currency)
        return value

Now if we modify our mapping from last week, we can take out the hybrid, and just use AmountType directly:

class BankAccount(Base):
    __tablename__ = 'bank_account'
    id = Column(Integer, primary_key=True)

    balance = Column('balance', AmountType("usd"))

    def __repr__(self):
        return "BankAccount(%s)" % self.balance

And that's it ! What changes with this usage, and what stays the same ? Most simple things stay the same, such as creating new BankAccount objects:

session.add_all([
    account,
    BankAccount(balance=Amount(10000, "cad")),
    BankAccount(balance=Amount(5700, "usd")),
    BankAccount(balance=Amount(89682, "aud")),
])

as well as basic comparison operations:

>>> print session.query(BankAccount).\
...    filter(BankAccount.balance == Amount(10000, "cad")).one()
SELECT bank_account.id AS bank_account_id,
        bank_account.balance AS bank_account_balance
FROM bank_account
WHERE bank_account.balance = ?
(9886.11,)
BankAccount(9886.1100 usd)

A change is when we want to use an Amount specific operator on the SQL side, namely as_currency(). We use the new class-level version, which produces a new Amount object given against the underlying balance column, coerced into a Numeric and multiplied by the exchange rate:

>>> print Amount.as_currency(BankAccount.balance, "cad")
:param_1 * bank_account.balance cad

In other words, "(exchange rate * numeric balance) in canadian dollars". The returned Amount object maintains the "hybrid" behavior we saw last week:

>>> print session.query(BankAccount).\
...    filter(Amount.as_currency(BankAccount.balance, "cad") > Amount(9999, "cad")).\
...    filter(Amount.as_currency(BankAccount.balance, "cad") < Amount(10001, "cad")).\
...    one()
SELECT bank_account.id AS bank_account_id, bank_account.balance AS bank_account_balance
FROM bank_account
WHERE ? * bank_account.balance > ? AND ? * bank_account.balance < ?
(1.01152, 9999.0, 1.01152, 10001.0)
BankAccount(9886.1100 usd)

With the typed approach, we can finally get Amount objects straight back from a result set:

>>> print session.query(BankAccount.balance).all()
SELECT bank_account.balance AS bank_account_balance FROM bank_account
[(Amount(4000.0000 usd),), (Amount(9886.1100 usd),), (Amount(5700.0000 usd),), (Amount(92338.3808 usd),)]

and it works without the ORM too !

>>> from sqlalchemy import select
>>> bank_account = BankAccount.__table__
>>> for amount, in session.execute(
...            select([bank_account.c.balance]).\
...            where(Amount.as_currency(bank_account.c.balance, "cad") > Amount(9999, "cad"))
...            ).fetchall():
...     print "Amount:", amount, \
...                 "As USD:", amount.as_currency("usd"), \
...                 "As CAD:", amount.as_currency("cad")
SELECT bank_account.balance FROM bank_account
WHERE ? * bank_account.balance > ?
(1.01152, 9999.0)
Amount: 9886.1100 usd As USD: 9886.1100 usd As CAD: 9999.9980 cad
Amount: 92338.3808 usd As USD: 92338.3808 usd As CAD: 93402.1190 cad

Downloads

Download the revised version of account_currency.py:


Hybrids and Value Agnostic Types

October 21, 2011 at 12:01 PM | Code, SQLAlchemy

Update: This is now part one of a two part series. Be sure to read Value Agnostic Types, Part II for an updated approach using TypeDecorator.

A "hybrid" in SQLAlchemy is this awesome idea thought up by Ants Aasma. The hybrid takes advantage of the ability of Python classes to have an attribute which has customizable behavior both at the class level as well as at the instance level. What's that mean? Just this:

class my_descriptor(object):
    def __get__(self, instance, owner):
        if instance is None:
            return "I'm at the class level!"
        else:
            return "I'm at the instance level!"

class MyClass(object):
    some_attribute = my_descriptor()

Above, my_descriptor is what's known as a Python Descriptor. It's a way to have an attribute on an object where the value produced comes from a function. Above, we've made one that will tell you one thing when invoked on the class, i.e. class level behavior:

>>> MyClass.some_attribute
I'm at the class level!

And another at the instance level, i.e. instance level behavior:

>>> my_object = MyClass()
>>> my_object.some_attribute
I'm at the instance level!

That's all we're dealing with here. As it turns out, descriptors are very commonly used in object relational mappers and other "declarative" kinds of systems, where attributes are defined at the class level and maintain behavior on the class, as well as on instances. Suppose we have a SQLAlchemy class like this:

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, Numeric
Base = declarative_base()

class BankAccount(Base):
    __tablename__ = 'bank_account'
    id = Column(Integer, primary_key=True)

    balance = Column(Numeric)
    pending = Column(Numeric)

We've made a new class BankAccount, which represents something like a bank account. It contains fields representing the current account balance, as well as a "pending" balance - say it's checks that haven't cleared yet.

Let's set up a sqlite database as usual and save some accounts:

from sqlalchemy import create_engine
engine = create_engine('sqlite://', echo=True)
Base.metadata.create_all(engine)

from sqlalchemy.orm import Session
session = Session(engine)
session.add_all([
    BankAccount(balance=4000, pending=500),
    BankAccount(balance=10000, pending=0),
])
session.commit()

Descriptors are used here as follows: first, the BankAccount class, as a result of being mapped by SQLAlchemy, has three descriptors on it, named id, balance, and pending. These descriptors have class- and instance-level behavior. When invoked at the class level, they produce SQL expressions:

>>> print BankAccount.balance + BankAccount.pending
balance + pending
>>> print select([BankAccount.id, BankAccount.balance])
SELECT id, balance FROM bank_account

Invoked at the instance level, they provide access to values that are present in __dict__, the same way any other Python object works:

>>> account = BankAccount()
>>> account.balance = 1000
>>> account.pending = 500
>>> account.__dict__
{'balance': 1000, 'pending': 500, '_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x1014dde50>}

(_sa_instance_state is SQLAlchemy's tracking object for this particular BankAccount). SQLAlchemy's descriptors, used at the instance level, also keep track of changes to data, and can also invoke queries against the database to load more data if needed.

Taking Advantage of the Descriptor with a Hybrid

The main thing to understand about the previous section is that the attributes of BankAccount, that is id, balance, and pending, are ultimately powered by Python functions, which work both on an instance as well as on a class. The first argument passed to these functions is either self, that is, an instance of BankAccount, or cls, that is, the BankAccount class itself. Both self and cls have the same attributes present on them, and in many ways can be treated identically; we often don't have to care that we're dealing with self or cls. Hybrids take advantage of this symmetry and allow us to expand upon it. The core idea of the Hybrid is not as much a SQLAlchemy thing as it is a general Python thing - you can use the same idea with most other ORMs that feature usage of descriptors in this way.

To illustrate, let's give our BankAccount class a new data member called total, representing the balance amount plus the pending amount. When we have a BankAccount loaded in memory, this is simple addition of the existing values, and there's no need to go out to the database for this. We'll use the well known Python descriptor helper called @property so that we can say myaccount.total as though it were a regular attribute:

class BankAccount(Base):
    __tablename__ = 'bank_account'
    id = Column(Integer, primary_key=True)

    balance = Column(Numeric)
    pending = Column(Numeric)

    @property
    def total(self):
        return self.balance + self.pending

With the above, we can get the total amount present in an account:

>>> BankAccount(balance=1000, pending=500).total
1500

Simple enough. But the hybrid can add a new twist to the above, which is that we can transparently calculate the area over on the database side as well. The expression we have:

self.balance + self.pending

can be written against the BankAccount class, using a query, like this:

>>> print session.query(BankAccount.balance + BankAccount.pending)
SELECT bank_account.balance + bank_account.pending AS anon_1
FROM bank_account

The hybrid provides a single descriptor that can emit either an expression against the instance, or against the class, where self might be either an instance of BankAccount, or the BankAccount class:

from sqlalchemy.ext.hybrid import hybrid_property

class BankAccount(Base):
    __tablename__ = 'bank_account'
    id = Column(Integer, primary_key=True)

    balance = Column(Numeric)
    pending = Column(Numeric)

    @hybrid_property
    def total(self):
        return self.balance + self.pending

So we can now do this:

>>> print session.query(BankAccount.total).filter(BankAccount.total.between(3000, 5000)).all()
SELECT bank_account.balance + bank_account.pending AS anon_1
FROM bank_account
WHERE bank_account.balance + bank_account.pending BETWEEN ? AND ?
(3000, 5000)
[(Decimal('4500.0000000000'),)]

We can see that the total attribute works at the SQL level by emitting a SQL expression, at the instance level like a regular @property:

>>> myaccount = session.query(BankAccount).first()
>>> print myaccount.total
4500.0000000000

While this is a simple example, hybrids have more options for real world cases, particularly the very common case that the SQL expression needs to be defined separately. That, and everything else hybrids can do, are described fully in the document Hybrid Attributes on the SQLAlchemy site.

Value Agnostic Types

Now to introduce a variant on this technique which is not in the docs. The behavior of the hybrid will be offloaded into a new type that's generally not SQLAlchemy specific. We'll apply the technique towards the problem of representing a monetary amount in a currency-agnostic fashion, where exchange rates can be applied relatively transparently. We'll introduce first how we'll work with amounts and exchange rates, starting with a simple lookup dictionary of rates:

from decimal import Decimal

# US dollars, British pounds, Canadian dollar, Euro, Australian dollar
symbols = ('usd', 'gbp', 'cad', 'eur', 'aud')
currency_lookup = dict(
            ((currency_from, currency_to), Decimal(str(rate)))
            for currency_to, values in zip(
                symbols,
                [
                    (1, 1.59009, 0.988611, 1.37979, 1.02962),
                    (0.628895, 1, 0.621732, 0.867748, 0.647525),
                    (1.01152, 1.6084, 1, 1.39569, 1.04148),
                    (0.724743, 1.1524, 0.716489, 1, 0.746213),
                    (0.971228, 1.54434, 0.960166, 1.34009, 1),
                ])
            for currency_from, rate in zip(symbols, values)
    )

The above creates a dictionary currency_lookup which contains exchange rates between pairs of symbols. If we had the value of 5000 in US dollars (USD), and wanted to convert to Australian dollars (AUD), we'd look up the key ("usd", "aud") in the dictionary, giving us a rate to multiply by 5000:

>>> print currency_lookup[("usd", "aud")] * 5000
4856.140000

Next we'll build an interface for this dictionary, an Amount object that represents the combination of a monetary amount, and a particular currency symbol. The Amount object acts like a numeric value, but intercepts Python math and comparison operations to first coerce the "right" side of an equation to use the same currency as the "left" side:

class Amount(object):
    def __init__(self, amount, currency):
        self.currency = currency
        self.amount = amount

    def __add__(self, other):
        return Amount(
                self.amount +
                other.as_currency(self.currency).amount,
                self.currency
            )

    def __sub__(self, other):
        return Amount(
                self.amount -
                other.as_currency(self.currency).amount,
                self.currency
            )

    def __lt__(self, other):
        return self.amount < other.as_currency(self.currency).amount

    def __gt__(self, other):
        return self.amount > other.as_currency(self.currency).amount

    def __eq__(self, other):
        return self.amount == other.as_currency(self.currency).amount

    def as_currency(self, other_currency):
        return Amount(
                    currency_lookup[(self.currency, other_currency)] * self.amount,
                    other_currency)

    def __str__(self):
        return "%2.4f %s" % (self.amount, self.currency)

Above, while we have only implemented a small set of Python comparison and math operations, the Amount implementation above can handle addition, subtraction, the equals operator, and the less than/greater than operations. All coercion is via a single method as_currency() which just looks up the exchange rate in the currency_lookup table, and converts the "other" side of any equation into a new Amount that is against the source currency:

>>> # Convert 5000 US dollars into Canadian dollars
>>> print Amount(5000, "usd").as_currency("cad")
5057.6000 cad

>>> # Calculate 5000 USD + 1000 USD:
>>> print Amount(5000, "usd") + Amount(1000, "usd")
6000.000 usd

>>> # Calculate 5000 USD + 500 GBP:
>>> print Amount(5000, "usd") + Amount(500, "gbp")
5795.0450 usd

>>> # Take the above and convert to CAD:
>>> print (Amount(5000, "usd") + Amount(500, "gbp")).as_currency("cad")
5861.8039 cad

The thing to keep in mind going forward is that, while we're about to do something more interesting with our Amount class, it will always be used as an instance of an Amount - we're not going to map this class at all. What we're instead going to do is take advantage of the hybrid approach and apply it to the argument we pass to Amount. Suppose we passed a SQLAlchemy expression construct to Amount, instead of a numeric value. What would happen ?

>>> from sqlalchemy.sql import column
>>> balance_col = column('balance')
>>> print Amount(balance_col, "aud").as_currency("eur").amount
:balance_1 * balance

Passing in a sqlalchemy.sql.column object to Amount instead of a numeric value, we got a SQL expression straight out of the Amount object, which itself had no idea it wasn't handling a number. Note we needed to access .amount in order to display it, as the expression would not have made it through our number formatting Amount.__str__() method. But otherwise, we can pass any type of object into Amount which accommodates basic calculations and it will work fine.

Mapping with Value Agnostic Types

To apply the above Amount object to a mapping, we'll again go back to hybrids. The technique we'll show here uses the object model to coerce between Amount and a raw numeric value. Another option is to use custom types via TypeDecorator - this is detailed in the next post, Value Agnostic Types, Part II. A great advantage of using a hybrid with the value agnostic type is that the whole thing is almost totally orthogonal to SQLAlchemy itself, with very few moving parts or library dependencies.

Let's start up a new BankAccount class, this time the BankAccount will specifically be an account that stores funds in US dollars. We'll store just a balance for now, and the public interface for the balance will be via the Amount object:

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, Numeric
from sqlalchemy.ext.hybrid import hybrid_property
Base = declarative_base()

class BankAccount(Base):
    __tablename__ = 'bank_account'
    id = Column(Integer, primary_key=True)

    _balance = Column('balance', Numeric)

    @hybrid_property
    def balance(self):
        return Amount(self._balance, "usd")

    @balance.setter
    def balance(self, value):
        self._balance = value.as_currency("usd").amount

    def __repr__(self):
        return "BankAccount(%s)" % self.balance

The balance column is applied to a private attribute _balance, and the public accessor balance uses a hybrid to coerce to/from an Amount object, hardcoding "usd" as the base exchange currency. Given a new BankAccount, we can operate on "balance" at the Python level, using Amount objects as value objects:

>>> account = BankAccount(balance=Amount(4000, "usd"))
>>> print account.balance
4000.0000 usd

>>> print account.balance.as_currency("gbp")
2515.5800 gbp

>>> print account.balance > Amount(500, "cad")
True

>>> print account.balance + Amount(500, "cad") - Amount(50, "eur")
4425.3160 usd

All we're doing here is accessing the .balance attribute on an instance of BankAccount, where the Amount instance returned acts against the underlying value.

But note we've applied @hybrid_property to .balance. That means we want to use it as a SQL expression too. Used in a SQL context, it returns the same Amount object, passing in the ._balance Column object, working as it did earlier. Starting with some test data:

from sqlalchemy import create_engine
from sqlalchemy.orm import Session
from sqlalchemy import func

engine = create_engine('sqlite://', echo=True)
Base.metadata.create_all(engine)

session = Session(engine)

session.add_all([
    account,
    BankAccount(balance=Amount(10000, "cad")),
    BankAccount(balance=Amount(5700, "usd")),
    BankAccount(balance=Amount(89682, "aud")),
])
session.commit()

Above, all the balances are coerced from their source currencies to USD, via the @balance.setter portion of our hybrid, which is using value.as_currency("usd").

We can then query BankAccount objects using the .balance column more or less as is, such as in a simple comparison:

>>> print session.query(BankAccount).\
...    filter(BankAccount.balance == Amount(10000, "cad")).one()
SELECT bank_account.balance AS bank_account_balance, bank_account.id AS bank_account_id
FROM bank_account
WHERE bank_account.balance = ?
(9886.11,)
BankAccount(9886.1100 usd)

Notice above the value was converted on the Python side before being converted to a SQL expression. This is due to the coercion occurring upon the right side, which is an Amount object against a numeric in-Python value.

If we told the left side to convert itself, then we'd be getting the currency conversions on the SQL side as well:

>>> print session.query(BankAccount).\
...    filter(BankAccount.balance.as_currency("cad") > Amount(9999, "cad")).\
...    filter(BankAccount.balance.as_currency("cad") < Amount(10001, "cad")).\
...    one()
SELECT bank_account.balance AS bank_account_balance, bank_account.id AS bank_account_id
FROM bank_account
WHERE ? * bank_account.balance > ? AND ? * bank_account.balance < ?
(1.01152, 9999.0, 1.01152, 10001.0)
BankAccount(9886.1100 usd)

So far the above operations take advantage of the fact that the __eq__(), __lt__() and __gt__() methods of Amount are returning SQL expressions. To interpret the Amount object directly, at the moment we need to use the .amount accessor:

>>> # print all amounts as CAD
>>> print session.query(BankAccount.balance.as_currency("cad").amount).all()
SELECT ? * bank_account.balance AS anon_1
FROM bank_account
(1.01152,)
[(Decimal('4046.0800000000'),), (Decimal('9999.9979872000'),), (Decimal('5765.6640000000'),), (Decimal('93402.1189872768'),)]

>>> # return the average balance in EUR
>>> print session.query(func.avg(BankAccount.balance.as_currency("eur").amount)).scalar()
SELECT avg(? * bank_account.balance) AS avg_1
FROM bank_account
(0.724743,)
20279.1228162

The situation can be improved in most cases if we add a __clause_element__() method to our Amount class - the full example script includes this. There's also a few other rough edges to this approach which I may try to address in a future release. But overall, I use this pattern quite a bit as presented in several ways. It in fact stretches all the way back to similar patterns I used when I was primarily a Java programmer, though the Python version is much more flexible. It allows you to forget about certain kinds of numeric conversion details which now take care of themselves, even in SQL expressions, in a simple and transparent way.

Downloads


SQLAlchemy - an Architectural Retrospective

September 25, 2011 at 12:01 PM | Code, SQLAlchemy, Talks

The video of my PyGotham talk, SQLAlchemy, an Architectural Retrospective is now available at pyvideo. This talk details my current thinking on SQLAlchemy philosophy and then proceeds through architectural overviews of several key areas, including the core operation of the Unit of Work. The accompanying slides are available in PDF form as well. Enjoy !