Introduction to SQLAlchemy - Pycon 2013 - Wrapup

April 04, 2013 at 12:10 PM | Code, Talks, SQLAlchemy

Introduction to SQLAlchemy - Pycon 2013

March 05, 2013 at 12:10 PM | Code, Talks, SQLAlchemy

Preparations are just about complete for my upcoming tutorial Introduction to SQLAlchemy. There's a good crowd of people already attending, and I think registration is still open in case more people want to sign up.

But in any case, if you are coming, this year there is prerequisite material, including the software installs as well as a "Relational Overview" section that covers the basics of SQL and relational databases. Everyone coming to the tutorial should read through this document, so that we're all on roughly the same page regarding upfront SQL knowledge, and try to get the software installed. If there's any issues with the software, please report bugs to me and we'll try to get them resolved by tutorial time. We will also be available at the Wednesday 6:30pm tutorial setup session to help with installs.

Historically, tutorials pack the whole three hours of material up pretty solidly, and I hope I can balance getting lots of coverage versus not talking too fast. Thanks for signing up !

Pycon Canada - the SQLAlchemy Session In Depth

November 14, 2012 at 08:31 AM | Code, Talks, SQLAlchemy

Video is up for my talk, The SQLAlchemy Session - In Depth. In this talk, I delve into the key philosophies behind the design of SQLAlchemy's Session system. Starting with a brief review of the ACID model, I contrast the approach of the so-called "active record" pattern to that of the more explicit Session pattern, and how the two approaches integrate with the ACID model at work within a relational database. Afterwards, I present an HTML animation of a Session object at work.

Supporting a (Very Interesting) New Database

October 25, 2012 at 08:12 PM | Code, SQLAlchemy

Updated 1/23/2014 - Akiban is now rolled into FoundationDB, where it serves as the "FoundationDB SQL Layer". Work has continued on the DBAPI and SQLAlchemy libraries; see the updated links at the bottom.

As SQLAlchemy 0.8 nears its first beta release, possibly within the week, lots of new features and additions became apparent as the development cycle went on. One area that saw more activity than usual was in the area of new dialects. Going forward, SQLAlchemy will be much more capable of supporting externally-installed dialects, thanks to a new testing suite. The motiviation for this new suite started when I was tasked with supporting a new and very interesting database vendor.

A Late Add

I was approached some months ago by a vendor known as Akiban, with a request that was totally new to me. On the initial emails, the idea seemed to be some kind of database that can produce JSON documents from tables, which by itself didn't seem at all very novel; there are already products like HTSQL and SlashDB (friend of SQLAlchemy!) which provide RESTful services around relational databases.

But digging in, it became apparent that this company was taking a much more elaborate path to that goal - this company is producing their own relational database from scratch. The server itself, Akiban server, is written in Java and is designed to act in large part like Postgresql, with some MySQL compatibility added in as well. Within the span of this obviously monumental task, they have built what seems to be exactly one twist, but it is a really interesting twist, which they call a "table grouping".

I was very flattered that Akiban not only wanted me to provide SQLAlchemy support for their database, but to help them come up with their Python DBAPI story overall. And, they wanted not just a SQLAlchemy dialect and Python DBAPI story, they also wanted my thoughts on how modern ORMs can integrate with the unique features of this system.

The reason all three of these areas are up for grabs with Akiban, is that while it acts pretty much like a regular relational database most of the time, it does something strange when the "table grouping" idea is in use. At the DDL level, a "table grouping" is established just like a foreign key:

CREATE TABLE customer (
    name VARCHAR(30) NOT NULL

CREATE TABLE cust_order (
    customer_id INTEGER,
    ordernum VARCHAR(30) NOT NULL,
    timestamp DATETIME,
    GROUPING FOREIGN KEY (customer_id) REFERENCES customer

This single keyword on GROUPING causes Akiban to organize the data in a way I don't think any other database does, which is that it stores the rows for cust_order interleaved within those of customer. Meaning, it stores the data hierarchically, on disk. This special hierarchy is then made available through SQL. This is the second part that is really surprising, which is in order to support this, there were no changes at all needed to the SQL syntax. The only change made to SQL was a semantic one.

Normally, SQL allows us to query for a correlated SELECT in the columns or WHERE clause of a SELECT statement:

SELECT name, (
                SELECT ordernum
                FROM cust_order
                AND ordernum like 'order1%'
        ) AS ordernum
FROM customer

The correlated SELECT must return exactly one column, and one row, or your database will complain. Here, Akiban merely took away the "complain" part, and when specifying a correlated subquery with multiple columns and potentially multiple rows, you instead get back a result that is hierarchical, assuming the tables you're embedding are related to the parent via a "grouping" - the "nested" rows are fetched in groups along with each corresponding parent row, and delivered inline. We would get such a result from Akiban given a statement like that below:

SELECT name, (
                SELECT ordernum, timestamp
                FROM cust_order
        ) AS orders
FROM customer

But what does a "hierarchical" result look like? Well, for Akiban, since they wanted this to all work over the Postgresql protocol, for now they actually send you back a JSON string per parent row, as one column. Such as this:

{"name":"Some Customer", "orders":[{"ordernum":"some order", "timestamp":"20121005121700"}, {"ordernum":"some other order", "timestamp":"20121012184507"}]}

With that, Akiban asked me what I can do with this. Particularly, how can an ORM like SQLAlchemy take advantage of it?

well it's just nested

The first thing I wanted to do with this data was to make that JSON look like a regular SQL result again. Having worked with "eager loading" for so many years, I already tend to see SQL products as "hierarchies" - a regular JOIN of customer and order would have rows like:        ordernum            timestamp
-------------        ----------------    --------------
Some Customer        some order          20121005121700
Some Customer        some other order    20121012184507
Some Other Customer  ...                 ...

The result set from a correlated subquery on a "grouped" foreign key, in my mind, looks pretty similar:        orders
-------------        ----------------------------------
Some Customer        ordernum            timestamp
                     ----------------    --------------
                     some order          20121005121700
                     some other order    20121012184507
Some Other Customer  ...                 ...

When we normally have a subquery in the "columns" clause of a SELECT, the result of that SELECT comes back as a single result set column. My idea was that just as they don't have to change the syntax of SQL to express a "grouping" here, we didn't have to change the idea on the result side either - we just have a nested cursor, where a single result set column orders returns a new cursor as its value.

I initially studied Postgresql's protocol quite a bit, mostly by reading the source to the pure-Python pg8000 DBAPI for Postgresql, and proposed to Akiban that they just add some extra messages to the Postgresql protocol to support this concept without the need for JSON being involved. I could extend or fork pg8000 into a new Akiban-specific DBAPI. But for the time being, what I ended up doing was to stick with the DBAPI that already works with Akiban, psycopg2, and extend it on the outside of the JSON to coerce the result back into a cursor shape. With psycopg2's very extensible design, I was able to make use of its Connection extension system to intercept JSON-formatted results back into cursor-like results, as well as to reuse its existing typing system in order to apply the same string interception it uses on raw Postgresql messages to the JSON-encoded fields as well. The result of this stage of the game is Akiban for Python, an extension to psycopg2 that introduces a new result-row datatype, the nested cursor, which is transparently returned when you emit an Akiban "nested" SQL statement - the presense of JSON is entirely concealed.

turtles all the way down (or up, in this case)

With a DBAPI providing nested cursors, the next task was an Akiban dialect for SQLAlchemy which can wrap these nested cursors into SQLAlchemy's cursor-wrapping ResultProxy object. Over the years, I've had to make ResultProxy objects deal with so many curveballs, from cx_Oracle's streaming LOB objects and OUT parameters, to the buffering required with psycopg2's "server side cursors", to pysqlite's quirk with column name descriptions being prepended with the table name when the SQL statement is a UNION of two SELECT statements, that this task was relatively easy to get going. Slightly more tricky was to get the statement compiler to keep track of "nested" select() constructs and to relate them to the nested cursors ultimately produced. The end result is that we can build an Akiban nested query and get its results:

from sqlalchemy_akiban import nested

stmt = nested([order.c.ordernum, order.c.timestamp]).\
            where(order.c.customer_id ==
stmt = select([, stmt.label('orders')])

for customer_row in connection.execute(stmt):
    print "customer name:", customer_row['name']
    for order_row in customer_row['orders']:
        print "ordernum:", order_row['ordernum']
        print "timestamp:", order_row['timestamp']

And with the above, all of the usual SQLAlchemy Core capabilities like result-set typing and column targeting work out the same. At the ORM level, I added similar nesting features into Query (meaning, you can get back nested tuples) and also produced a new "loader strategy" specific for "Akiban" style loading on relationships, so that a "nested" cursor can provide for the simplest eager loading implementation ever. The familiar syntax we use with joinedload() and subqueryload() gets the addition of nestedload(), which we drop in in the same way:

from sqlalchemy_akiban.orm import nestedload_all

result = session.query(Customer).options(
                    orm.nestedload_all(Customer.orders, Order.items)).\
                        filter( == 1)

Above, the Customer.orders and Order.items collections are populated directly from nested results, using a single statement that optimizes as fast as a straight select of just the customer table and nothing else:

SELECT AS customer_id, AS customer_name,
            (SELECT, item.order_id, item.price, item.quantity
            FROM item
            WHERE "order".id = item.order_id
        ) AS anon_2,
        "order".id, "order".customer_id, "order".order_info
        FROM "order"
        WHERE = "order".customer_id
) AS anon_1
FROM customer
WHERE = %(id_1)s

All of the above was possible with pretty much no changes to SQLAlchemy itself, save for one new dialect hook which allows the cursor context to be available to type objects - since we introduced an extension type called NestedResult which needs more information than most types. The ORM, compiler, etc. needed no changes. The result of this stage is available at SQLAlchemy Akiban.

Pycon 2012 : Hand Coded Applications with SQLAlchemy

March 12, 2012 at 12:01 PM | Talks, SQLAlchemy, Code

Here's the slides from my Pycon 2012 talk, "Hand Coded Applications with SQLAlchemy". I had a great time with this talk and thanks all for coming !

Update: Here's the video!