Archive for March 2006

Mapping Heterogeneous Types with SQLAlchemy

This example illustrates how to map a class to a database table, with a child attribute that stores a list of objects of varying types. To accomplish this, we'll use a setup that features a Mapper created against a SELECT statement, as well as a custom MapperExtension that overrides the mapper's method of instantiating new objects.

First lets start with the database:

from sqlalchemy import *
db = create_engine('sqlite://', echo=True)

companies = Table('companies', db,
   Column('company_id', Integer, primary_key=True),
   Column('name', String(50))).create()

people = Table('people', db,
   Column('person_id', Integer, primary_key=True),
   Column('company_id', Integer, ForeignKey('companies.company_id')),
   Column('name', String(50))).create()

engineers = Table('engineers', db,
   Column('person_id', Integer, ForeignKey('people.person_id'), primary_key=True),
   Column('description', String(50))).create()

managers = Table('managers', db,
   Column('person_id', Integer, ForeignKey('people.person_id'), primary_key=True),
   Column('description', String(50))).create()

We start with an engine configured to talk to an SQLite in-memory database. Next we define four tables, which will store information about companies and their employees. The companies table will store one row per company and describe the company's name. The people table will then store one row per employee, with a foreign key referencing the companies table to indicate the company this employee belongs to. Next, the engineers and managers tables store one row in either table for each row in the people table, and store additional information about the employee specific to his or her general job category. Such a relationship is known as an inheritance relationship between tables.

The tables are also created in the same line they are defined (there will soon be a new feature that enables the creation of all the tables in one step). Now, lets define the classes that will represent our domain model, that is a Company that has People working in it, each of whom can be either an Engineer or a Manager:

# create our classes.  The Engineer and Manager classes extend from Person.
class Person(object):
    def __repr__(self):
        return "Ordinary person %s" % self.name
class Engineer(Person):
    def __repr__(self):
        return "Engineer %s, description %s" % (self.name, self.description)
class Manager(Person):
    def __repr__(self):
        return "Manager %s, description %s" % (self.name, self.description)
class Company(object):
    def __repr__(self):
        return "Company %s" % self.name

We create a Company object which will reference a list of Person objects, and a Person class which has two subclasses Engineer and Manager. These classes have nothing defined on them except for a repr method that will give us some concise information about an instantiated object. Note that the repr methods reference attribute names that are otherwise nonexistent. These attributes will be created, corresponding to table columns as well as defined relationships, when we associate each class with a mapper, and begin creating new instances of the class.

So the mappers join together the domain model and its persistence scheme:

assign_mapper(Person, people)
assign_mapper(Engineer, engineers, inherits=Person.mapper)
assign_mapper(Manager, managers, inherits=Person.mapper)

The above three mappers define persistence for the Engineer and Manager classes, as well as the base class Person. The inheritance relationship between the classes and the inheritance relationship between the database tables is expressed within the mappers using the inherits keyword argument. As these mappers are the first mappers created for the classes, SQLAlchemy marks them as the primary mappers for the classes. This means that, upon a save, these mappers will handle the job of saving and updating new and existing Person subclasses. Once the primary mappers are created, you are free to create as many other mappers as you like for the purposes of loading Person, Engineer, and Manager objects in any number of exotic ways, without any concern for their structure with regards to saving objects, since a save operation is handled strictly by these primary mappers.

At this point, we could begin creating Engineer and Manager objects and have the ability to save and load them to/from the database. But the special mapper in this example is going to be that of the Company class, which aims to have a single property, employees, which will represent a list of both Engineer and Manager objects. SQLAlchemy knows how to associate one class to another via the relation keyword. But here, we want a single relation to correspond to two kinds of classes. To accomplish this, we will create an extension for the mapper that applies to the Company class.

At the start of any mapper configuration is establishing the method by which the data will come from the database, independent of any Python code. Usually, this is straightforward enough that SQLAlchemy can handle the job of creating the queries involved, but here, we need to think like a DBA. We would like to query the people table as well as the engineers and managers tables simultaneously. Furthermore, we would like to do it in such a way that if more "inheriting" tables were added, such as accountants, analysts, clerks etc., the query has a decent chance of scaling to fit those as well.

While it may seem to some like all those tables should be part of a big JOIN, further consideration leads to the notion that a UNION, specifically a UNION ALL, combining together the results of several queries that each join the people table to a different inheriting table, is the best type of query to use. This is because a single large combination of JOINS forms an ever-expanding cartesian product and causes an exponential increase in the number of rows that must be considered in a query, whereas a UNION can aggregate the results of many smaller queries more efficiently since each query is small. UNION ALL specifically removes the need for the database to filter intersecting rows for duplicates, as we are sure that the rows from each query do not overlap.

One extra thing we will need from our query is the ability to distinguish which table each row came from. To achieve this, we will add a "fixed" column to each sub-select which will populate each row with an indication of what type of row this is. So the query would then look something like:

SELECT people.person_id, people.company_id, people.name,
engineers.person_id, engineers.description, 'engineer' AS type from people JOIN engineers
ON people.person_id=engineers.person_id
UNION ALL
SELECT people.person_id, people.company_id, people.name,
managers.person_id, managers.description, 'manager' AS type from people JOIN managers
ON people.person_id=managers.person_id

SQLAlchemy wants you to think like a DBA. It cares very little, if at all, what kind of query you use to create your rows, it only cares about the rows themselves. The above query will give us rows that look something like:

personid  companyid   name               personid    description    type
--------  ---------   -------            ----------  -------------  -----
    1         1       dilbert                1       its dilbert    engineer
    2         1       pointy haired boss     2       manager guy    manager
    3         1       wally                  3       hey wally!     engineer

So the above query is more or less exactly what we will use to get the Company's mapper to load a list of Person objects. Lets construct a SQLAlchemy query like the one above:

person_join = select(
                [people, managers.c.description,column("'manager'").label('type')],
                people.c.person_id==managers.c.person_id).union_all(
            select(
            [people, engineers.c.description, column("'engineer'").label('type')],
            people.c.person_id==engineers.c.person_id)).alias('pjoin')

Not exactly the same, in that we took the extra person_id column out of the joined table, and we also named the entire thing under an alias pjoin. Since we will be creating a Mapper to the above query, SQLAlchemy's mappers like the incoming Selectable object to have a name.

The next special task will be constructing a MapperExtension that will be injected into the Company's related employees mapper, in order to construct the proper type of object based on the information present in each incoming row:

class PersonLoader(MapperExtension):
    def create_instance(self, mapper, row, imap, class_):
        if row['pjoin_type'] =='engineer':
            return Engineer()
        elif row['pjoin_type'] =='manager':
            return Manager()
        else:
            return Person()

personLoaderExtension = PersonLoader()

Above, all we do is look at the special hardcoded "type" row, and use it to send the correct object type. We include a fallback case that delivers a plain Person object, in case the query were ever changed to support new inheritance tables ahead of the MapperExtension, or if the database somehow had rows in the people table that did not correspond to any rows in the managers or engineers tables.

At this point, everything is ready to go, creating the Company mapper looks almost like a regular mapper except we reference our special UNION-enabled mapper for its employees relation:

assign_mapper(Company, companies, properties={
    'employees': relation(mapper(Person, person_join, extension=personLoaderExtension), private=True)
})

The above code creates a mapper for the Company class, and a single relation called employees which maps to the Person class using our new UNION ALL query, utilizing the new MapperExtension to override the default process of instantiating new objects. It also specifies the relation with private=True, indicating that when a Company object is deleted, its corresponding employees should all be deleted as well (the corporate world can be cruel...)

Time to run it. Lets create some data:

c = Company(name='company1')
c.employees.append(Manager(name='pointy haired boss', description='manager1'))
c.employees.append(Engineer(name='dilbert', description='engineer1'))
c.employees.append(Engineer(name='wally', description='engineer2'))
c.employees.append(Manager(name='jsmith', description='manager2'))
objectstore.commit()

Running this produces the SQL:

INSERT INTO companies (name) VALUES (?)
['company1']
INSERT INTO people (company_id, name) VALUES (?, ?)
[1, 'pointy haired boss']
INSERT INTO people (company_id, name) VALUES (?, ?)
[1, 'jsmith']
INSERT INTO managers (person_id, description) VALUES (?, ?)
[1, 'manager1']
INSERT INTO managers (person_id, description) VALUES (?, ?)
[2, 'manager2']
INSERT INTO people (company_id, name) VALUES (?, ?)
[1, 'dilbert']
INSERT INTO people (company_id, name) VALUES (?, ?)
[1, 'wally']
INSERT INTO engineers (person_id, description) VALUES (?, ?)
[3, 'engineer1']
INSERT INTO engineers (person_id, description) VALUES (?, ?)
[4, 'engineer2']

SQLAlchemy did its job of inserting rows in the proper order, including the companies table first, and the correct rows in the people table before dependent rows in the managers and engineers tables. Next, lets clear out the objectstore so that we can test loading data clean from the database, and get some data:

objectstore.clear()

c = Company.get(1)
for e in c.employees:
    print e, e._instance_key

The _instance_key is a special identifier SQLAlchemy adds to mapped objects in order to track their database identity. Selecting the single saved Company object and later its employees attribute produces the SQL output:

SELECT companies.company_id AS companies_company_id, companies.name AS companies_name
FROM companies
WHERE companies.company_id = ?
[1]
SELECT pjoin.person_id AS pjoin_person_id, pjoin.description AS pjoin_description, pjoin.type AS pjoin_type, pjoin.company_id AS pjoin_company_id, pjoin.name AS pjoin_name
FROM (SELECT people.person_id AS person_id, people.company_id AS company_id, people.name AS name, managers.description AS description, 'manager' AS type
FROM people, managers
WHERE people.person_id = managers.person_id UNION ALL SELECT people.person_id AS person_id, people.company_id AS company_id, people.name AS name, engineers.description AS description, 'engineer' AS type
FROM people, engineers
WHERE people.person_id = engineers.person_id) AS pjoin
WHERE pjoin.company_id = ?
[1]

And then some program output:

Manager pointy haired boss, description manager1 (, (1, 1))
Manager jsmith, description manager2 (, (2, 2))
Engineer dilbert, description engineer1 (, (3, 3))
Engineer wally, description engineer2 (, (4, 4))

So we got our correct types back ! Lets modify "dilbert" and save that change:

dilbert = Engineer.mapper.get_by(name='dilbert')
dilbert.description = 'hes dibert!'
objectstore.commit()

SQL:

SELECT engineers.person_id AS engineers_person_id, people.person_id AS people_person_id, engineers.description AS engineers_description, people.company_id AS people_company_id, people.name AS people_name
FROM people JOIN engineers ON people.person_id = engineers.person_id
WHERE people.name = ?
 LIMIT 1 OFFSET 0
['dilbert']
UPDATE engineers SET description=? WHERE engineers.person_id = ?
[['hes dibert!', 3]]

You can see that even though we modified an Engineer object that is represented across two tables, SQLAlchemy was able to figure out that only the description column in the engineers table actually needed an update.

Lets finish up by making sure that new data comes back, and a DELETE to clean up at the end:

objectstore.clear()
c = Company.get(1)
for e in c.employees:
    print e, e._instance_key

objectstore.delete(c)
objectstore.commit()

Program output, including echoed SQL:

SELECT companies.company_id AS companies_company_id, companies.name AS companies_name
FROM companies
WHERE companies.company_id = ?
[1]
SELECT pjoin.person_id AS pjoin_person_id, pjoin.description AS pjoin_description, pjoin.type AS pjoin_type, pjoin.company_id AS pjoin_company_id, pjoin.name AS pjoin_name
FROM (SELECT people.person_id AS person_id, people.company_id AS company_id, people.name AS name, managers.description AS description, 'manager' AS type
FROM people, managers
WHERE people.person_id = managers.person_id UNION ALL SELECT people.person_id AS person_id, people.company_id AS company_id, people.name AS name, engineers.description AS description, 'engineer' AS type
FROM people, engineers
WHERE people.person_id = engineers.person_id) AS pjoin
WHERE pjoin.company_id = ?
[1]

Manager pointy haired boss, description manager1 (, (1, 1))
Manager jsmith, description manager2 (, (2, 2))
Engineer dilbert, description hes dibert! (, (3, 3))
Engineer wally, description engineer2 (, (4, 4))

DELETE FROM managers WHERE managers.person_id = ?
[[1], [2]]
DELETE FROM people WHERE people.person_id = ?
[[1], [2]]
DELETE FROM engineers WHERE engineers.person_id = ?
[[3], [4]]
DELETE FROM people WHERE people.person_id = ?
[[3], [4]]
DELETE FROM companies WHERE companies.company_id = ?
[[1]]

SQLAlchemy also deleted rows in the reverse order in which they were inserted.

A working version of this example is included with the latest version of SQLAlchemy as of version 0.1.4, in the file examples/polymorph/polymorph.py.

Update: The example also works with Eager Loading, as of SVN revision 1175. Just add lazy=False to the employees relation.

Update 4/3/2006: The polymorphic example above has been refined into one that can load more diverse properties of objects, utilizing some API enhancements on Mapper that were added for this reason. The latest example is in "examples/polymorph/polymorph2.py" and is available with version 0.1.5 of SQLAlchemy or the latest trunk.