Hybrids and Value Agnostic Types

October 21, 2011 at 12:01 PM | Code, SQLAlchemy

Update: This is now part one of a two part series. Be sure to read Value Agnostic Types, Part II for an updated approach using TypeDecorator.

A "hybrid" in SQLAlchemy is this awesome idea thought up by Ants Aasma. The hybrid takes advantage of the ability of Python classes to have an attribute which has customizable behavior both at the class level as well as at the instance level. What's that mean? Just this:

class my_descriptor(object):
    def __get__(self, instance, owner):
        if instance is None:
            return "I'm at the class level!"
        else:
            return "I'm at the instance level!"

class MyClass(object):
    some_attribute = my_descriptor()

Above, my_descriptor is what's known as a Python Descriptor. It's a way to have an attribute on an object where the value produced comes from a function. Above, we've made one that will tell you one thing when invoked on the class, i.e. class level behavior:

>>> MyClass.some_attribute
I'm at the class level!

And another at the instance level, i.e. instance level behavior:

>>> my_object = MyClass()
>>> my_object.some_attribute
I'm at the instance level!

That's all we're dealing with here. As it turns out, descriptors are very commonly used in object relational mappers and other "declarative" kinds of systems, where attributes are defined at the class level and maintain behavior on the class, as well as on instances. Suppose we have a SQLAlchemy class like this:

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, Numeric
Base = declarative_base()

class BankAccount(Base):
    __tablename__ = 'bank_account'
    id = Column(Integer, primary_key=True)

    balance = Column(Numeric)
    pending = Column(Numeric)

We've made a new class BankAccount, which represents something like a bank account. It contains fields representing the current account balance, as well as a "pending" balance - say it's checks that haven't cleared yet.

Let's set up a sqlite database as usual and save some accounts:

from sqlalchemy import create_engine
engine = create_engine('sqlite://', echo=True)
Base.metadata.create_all(engine)

from sqlalchemy.orm import Session
session = Session(engine)
session.add_all([
    BankAccount(balance=4000, pending=500),
    BankAccount(balance=10000, pending=0),
])
session.commit()

Descriptors are used here as follows: first, the BankAccount class, as a result of being mapped by SQLAlchemy, has three descriptors on it, named id, balance, and pending. These descriptors have class- and instance-level behavior. When invoked at the class level, they produce SQL expressions:

>>> print BankAccount.balance + BankAccount.pending
balance + pending
>>> print select([BankAccount.id, BankAccount.balance])
SELECT id, balance FROM bank_account

Invoked at the instance level, they provide access to values that are present in __dict__, the same way any other Python object works:

>>> account = BankAccount()
>>> account.balance = 1000
>>> account.pending = 500
>>> account.__dict__
{'balance': 1000, 'pending': 500, '_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x1014dde50>}

(_sa_instance_state is SQLAlchemy's tracking object for this particular BankAccount). SQLAlchemy's descriptors, used at the instance level, also keep track of changes to data, and can also invoke queries against the database to load more data if needed.

Taking Advantage of the Descriptor with a Hybrid

The main thing to understand about the previous section is that the attributes of BankAccount, that is id, balance, and pending, are ultimately powered by Python functions, which work both on an instance as well as on a class. The first argument passed to these functions is either self, that is, an instance of BankAccount, or cls, that is, the BankAccount class itself. Both self and cls have the same attributes present on them, and in many ways can be treated identically; we often don't have to care that we're dealing with self or cls. Hybrids take advantage of this symmetry and allow us to expand upon it. The core idea of the Hybrid is not as much a SQLAlchemy thing as it is a general Python thing - you can use the same idea with most other ORMs that feature usage of descriptors in this way.

To illustrate, let's give our BankAccount class a new data member called total, representing the balance amount plus the pending amount. When we have a BankAccount loaded in memory, this is simple addition of the existing values, and there's no need to go out to the database for this. We'll use the well known Python descriptor helper called @property so that we can say myaccount.total as though it were a regular attribute:

class BankAccount(Base):
    __tablename__ = 'bank_account'
    id = Column(Integer, primary_key=True)

    balance = Column(Numeric)
    pending = Column(Numeric)

    @property
    def total(self):
        return self.balance + self.pending

With the above, we can get the total amount present in an account:

>>> BankAccount(balance=1000, pending=500).total
1500

Simple enough. But the hybrid can add a new twist to the above, which is that we can transparently calculate the area over on the database side as well. The expression we have:

self.balance + self.pending

can be written against the BankAccount class, using a query, like this:

>>> print session.query(BankAccount.balance + BankAccount.pending)
SELECT bank_account.balance + bank_account.pending AS anon_1
FROM bank_account

The hybrid provides a single descriptor that can emit either an expression against the instance, or against the class, where self might be either an instance of BankAccount, or the BankAccount class:

from sqlalchemy.ext.hybrid import hybrid_property

class BankAccount(Base):
    __tablename__ = 'bank_account'
    id = Column(Integer, primary_key=True)

    balance = Column(Numeric)
    pending = Column(Numeric)

    @hybrid_property
    def total(self):
        return self.balance + self.pending

So we can now do this:

>>> print session.query(BankAccount.total).filter(BankAccount.total.between(3000, 5000)).all()
SELECT bank_account.balance + bank_account.pending AS anon_1
FROM bank_account
WHERE bank_account.balance + bank_account.pending BETWEEN ? AND ?
(3000, 5000)
[(Decimal('4500.0000000000'),)]

We can see that the total attribute works at the SQL level by emitting a SQL expression, at the instance level like a regular @property:

>>> myaccount = session.query(BankAccount).first()
>>> print myaccount.total
4500.0000000000

While this is a simple example, hybrids have more options for real world cases, particularly the very common case that the SQL expression needs to be defined separately. That, and everything else hybrids can do, are described fully in the document Hybrid Attributes on the SQLAlchemy site.

Value Agnostic Types

Now to introduce a variant on this technique which is not in the docs. The behavior of the hybrid will be offloaded into a new type that's generally not SQLAlchemy specific. We'll apply the technique towards the problem of representing a monetary amount in a currency-agnostic fashion, where exchange rates can be applied relatively transparently. We'll introduce first how we'll work with amounts and exchange rates, starting with a simple lookup dictionary of rates:

from decimal import Decimal

# US dollars, British pounds, Canadian dollar, Euro, Australian dollar
symbols = ('usd', 'gbp', 'cad', 'eur', 'aud')
currency_lookup = dict(
            ((currency_from, currency_to), Decimal(str(rate)))
            for currency_to, values in zip(
                symbols,
                [
                    (1, 1.59009, 0.988611, 1.37979, 1.02962),
                    (0.628895, 1, 0.621732, 0.867748, 0.647525),
                    (1.01152, 1.6084, 1, 1.39569, 1.04148),
                    (0.724743, 1.1524, 0.716489, 1, 0.746213),
                    (0.971228, 1.54434, 0.960166, 1.34009, 1),
                ])
            for currency_from, rate in zip(symbols, values)
    )

The above creates a dictionary currency_lookup which contains exchange rates between pairs of symbols. If we had the value of 5000 in US dollars (USD), and wanted to convert to Australian dollars (AUD), we'd look up the key ("usd", "aud") in the dictionary, giving us a rate to multiply by 5000:

>>> print currency_lookup[("usd", "aud")] * 5000
4856.140000

Next we'll build an interface for this dictionary, an Amount object that represents the combination of a monetary amount, and a particular currency symbol. The Amount object acts like a numeric value, but intercepts Python math and comparison operations to first coerce the "right" side of an equation to use the same currency as the "left" side:

class Amount(object):
    def __init__(self, amount, currency):
        self.currency = currency
        self.amount = amount

    def __add__(self, other):
        return Amount(
                self.amount +
                other.as_currency(self.currency).amount,
                self.currency
            )

    def __sub__(self, other):
        return Amount(
                self.amount -
                other.as_currency(self.currency).amount,
                self.currency
            )

    def __lt__(self, other):
        return self.amount < other.as_currency(self.currency).amount

    def __gt__(self, other):
        return self.amount > other.as_currency(self.currency).amount

    def __eq__(self, other):
        return self.amount == other.as_currency(self.currency).amount

    def as_currency(self, other_currency):
        return Amount(
                    currency_lookup[(self.currency, other_currency)] * self.amount,
                    other_currency)

    def __str__(self):
        return "%2.4f %s" % (self.amount, self.currency)

Above, while we have only implemented a small set of Python comparison and math operations, the Amount implementation above can handle addition, subtraction, the equals operator, and the less than/greater than operations. All coercion is via a single method as_currency() which just looks up the exchange rate in the currency_lookup table, and converts the "other" side of any equation into a new Amount that is against the source currency:

>>> # Convert 5000 US dollars into Canadian dollars
>>> print Amount(5000, "usd").as_currency("cad")
5057.6000 cad

>>> # Calculate 5000 USD + 1000 USD:
>>> print Amount(5000, "usd") + Amount(1000, "usd")
6000.000 usd

>>> # Calculate 5000 USD + 500 GBP:
>>> print Amount(5000, "usd") + Amount(500, "gbp")
5795.0450 usd

>>> # Take the above and convert to CAD:
>>> print (Amount(5000, "usd") + Amount(500, "gbp")).as_currency("cad")
5861.8039 cad

The thing to keep in mind going forward is that, while we're about to do something more interesting with our Amount class, it will always be used as an instance of an Amount - we're not going to map this class at all. What we're instead going to do is take advantage of the hybrid approach and apply it to the argument we pass to Amount. Suppose we passed a SQLAlchemy expression construct to Amount, instead of a numeric value. What would happen ?

>>> from sqlalchemy.sql import column
>>> balance_col = column('balance')
>>> print Amount(balance_col, "aud").as_currency("eur").amount
:balance_1 * balance

Passing in a sqlalchemy.sql.column object to Amount instead of a numeric value, we got a SQL expression straight out of the Amount object, which itself had no idea it wasn't handling a number. Note we needed to access .amount in order to display it, as the expression would not have made it through our number formatting Amount.__str__() method. But otherwise, we can pass any type of object into Amount which accommodates basic calculations and it will work fine.

Mapping with Value Agnostic Types

To apply the above Amount object to a mapping, we'll again go back to hybrids. The technique we'll show here uses the object model to coerce between Amount and a raw numeric value. Another option is to use custom types via TypeDecorator - this is detailed in the next post, Value Agnostic Types, Part II. A great advantage of using a hybrid with the value agnostic type is that the whole thing is almost totally orthogonal to SQLAlchemy itself, with very few moving parts or library dependencies.

Let's start up a new BankAccount class, this time the BankAccount will specifically be an account that stores funds in US dollars. We'll store just a balance for now, and the public interface for the balance will be via the Amount object:

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, Numeric
from sqlalchemy.ext.hybrid import hybrid_property
Base = declarative_base()

class BankAccount(Base):
    __tablename__ = 'bank_account'
    id = Column(Integer, primary_key=True)

    _balance = Column('balance', Numeric)

    @hybrid_property
    def balance(self):
        return Amount(self._balance, "usd")

    @balance.setter
    def balance(self, value):
        self._balance = value.as_currency("usd").amount

    def __repr__(self):
        return "BankAccount(%s)" % self.balance

The balance column is applied to a private attribute _balance, and the public accessor balance uses a hybrid to coerce to/from an Amount object, hardcoding "usd" as the base exchange currency. Given a new BankAccount, we can operate on "balance" at the Python level, using Amount objects as value objects:

>>> account = BankAccount(balance=Amount(4000, "usd"))
>>> print account.balance
4000.0000 usd

>>> print account.balance.as_currency("gbp")
2515.5800 gbp

>>> print account.balance > Amount(500, "cad")
True

>>> print account.balance + Amount(500, "cad") - Amount(50, "eur")
4425.3160 usd

All we're doing here is accessing the .balance attribute on an instance of BankAccount, where the Amount instance returned acts against the underlying value.

But note we've applied @hybrid_property to .balance. That means we want to use it as a SQL expression too. Used in a SQL context, it returns the same Amount object, passing in the ._balance Column object, working as it did earlier. Starting with some test data:

from sqlalchemy import create_engine
from sqlalchemy.orm import Session
from sqlalchemy import func

engine = create_engine('sqlite://', echo=True)
Base.metadata.create_all(engine)

session = Session(engine)

session.add_all([
    account,
    BankAccount(balance=Amount(10000, "cad")),
    BankAccount(balance=Amount(5700, "usd")),
    BankAccount(balance=Amount(89682, "aud")),
])
session.commit()

Above, all the balances are coerced from their source currencies to USD, via the @balance.setter portion of our hybrid, which is using value.as_currency("usd").

We can then query BankAccount objects using the .balance column more or less as is, such as in a simple comparison:

>>> print session.query(BankAccount).\
...    filter(BankAccount.balance == Amount(10000, "cad")).one()
SELECT bank_account.balance AS bank_account_balance, bank_account.id AS bank_account_id
FROM bank_account
WHERE bank_account.balance = ?
(9886.11,)
BankAccount(9886.1100 usd)

Notice above the value was converted on the Python side before being converted to a SQL expression. This is due to the coercion occurring upon the right side, which is an Amount object against a numeric in-Python value.

If we told the left side to convert itself, then we'd be getting the currency conversions on the SQL side as well:

>>> print session.query(BankAccount).\
...    filter(BankAccount.balance.as_currency("cad") > Amount(9999, "cad")).\
...    filter(BankAccount.balance.as_currency("cad") < Amount(10001, "cad")).\
...    one()
SELECT bank_account.balance AS bank_account_balance, bank_account.id AS bank_account_id
FROM bank_account
WHERE ? * bank_account.balance > ? AND ? * bank_account.balance < ?
(1.01152, 9999.0, 1.01152, 10001.0)
BankAccount(9886.1100 usd)

So far the above operations take advantage of the fact that the __eq__(), __lt__() and __gt__() methods of Amount are returning SQL expressions. To interpret the Amount object directly, at the moment we need to use the .amount accessor:

>>> # print all amounts as CAD
>>> print session.query(BankAccount.balance.as_currency("cad").amount).all()
SELECT ? * bank_account.balance AS anon_1
FROM bank_account
(1.01152,)
[(Decimal('4046.0800000000'),), (Decimal('9999.9979872000'),), (Decimal('5765.6640000000'),), (Decimal('93402.1189872768'),)]

>>> # return the average balance in EUR
>>> print session.query(func.avg(BankAccount.balance.as_currency("eur").amount)).scalar()
SELECT avg(? * bank_account.balance) AS avg_1
FROM bank_account
(0.724743,)
20279.1228162

The situation can be improved in most cases if we add a __clause_element__() method to our Amount class - the full example script includes this. There's also a few other rough edges to this approach which I may try to address in a future release. But overall, I use this pattern quite a bit as presented in several ways. It in fact stretches all the way back to similar patterns I used when I was primarily a Java programmer, though the Python version is much more flexible. It allows you to forget about certain kinds of numeric conversion details which now take care of themselves, even in SQL expressions, in a simple and transparent way.

Downloads