Archive for August 2007

OK.....*more* speed enhancements

Not satisfied with my last post, Robert Brewer decided to kick me around some more this weekend, posting all kinds of colorful graphs comparing SQLAlchemy's expression compilation and execution speeds to Geniusql and Storm. Not surprisingly, SA was significantly slower than Geniusql and Storm in most tests except one, despite 0.4 being all around way faster than any previous version.

While I haven't looked at Geniusql's source code, I've looked at Storm's; its a far smaller package than SQLAlchemy, and does things in a very simple way. That's great, and is also a result of it being an in-house product designed to do just what they need and only that (plus, Gustavo's code is really great...too bad he couldn't stick around). Both products were clearly written with speed in mind from day one, and its clear that Robert is very proud of every execution-related optimization he's made (although the "we don't use bind params" thing I think he's going to need to look at eventually).

On the other hand, I'm more proud of the five-levels deep SQL expressions SQLAlchemy's ORM can come up with for you, if that's what you're asking it for. Two years ago, SA was the first to bring a really comprehensive version of this idea to the Python world (with Ian's SQLBuilder being the initial inspiration), and inventing all the patterns there was my primary focus. Now we have other projects doing the same thing, and honoring "the relational model" instead of just "tables" is a given. Well, if back then our only big idea was just, "we execute so fast!", it probably wouldn't have made much of a splash. So I think I made the right choice in initial priorities.

Anyway, since Robert keeps poking at me, I went in and cut out some more cruft today. And just like that, just poking around for slow spots, I cut out 37% more method calls out of the "Insert" speed test in his suite; from 8835 to 5420 in step_1a and from 4825 to 3055 in step_2 running against Postgres. As well as 43% method call reduction in step_6 ("Editing"), from 23473 to 13375. So he can drop those dark blue lines a little closer to the other two. As I've said before, there's just a lot of functionality here that takes a lot of time and energy to keep in clean shape. And I haven't even gotten into adding options to "inline" the execution of sequences and defaults; that's coming up.

As far as the overhead of checking out from the connection pool, Robert showed a damning illustration; his pool checks out in just two function calls where SA is using something like 40. Well it seemed like what he really meant was, "thats two function calls, for a connection which is already pulled from the queue and is stored in a dictionary against the current thread". Well, thats just plain misleading. How many function calls does Geniusql use when it does pull from the queue? Considering we have had an option on our pool since version 0.1 that does exactly the same thing, I'm declaring prior art on this idea. So, I've also reinstated the "thread-localness" of the "contextual" connection which is used for connectionless execution, and cleaned out some more cruft. Now the SA connection pool uses 21 function calls on a fresh checkout, or just two function calls, just like Geniusql, on a thread-local checkout. Why wasn't it like that already ? We never profiled, nobody ever complained. SA tries to do more, has been adding user-requested features for two years, and is little more heavyweight than the latest contenders - but nevertheless, it works just fine for people. We just don't have anyone complaining about this stuff (please...complain ! It makes us better). When someone does complain about performance in some area, we fix it (or at least improve as much as we can, and then continue angsting about it for months).

In all honesty, while we aren't a featherlight speed-demon, we perform just fine, and from all accounts remarkably faster than a lot of other ORM tools. We're not the Mako of ORMs, we're the Genshi.

But now that the dogs are on our tail, and they've got enormous line graphs sprawled across planetpython, seems like we're going to have to just grind code through the profiler just that much more often.

Edit Again: Oh wow, Robert's latest stack reveals he was testing a never-released development version...at least rev 3248, if not earlier, based on the method names. So heres another breakdown of individual tests (a unified diff of rev 3248 vs. rev 3388). Average method reduction is 32%:

==> sa_step_1_create_tables.log <==
-         3128 function calls (3001 primitive calls) in 0.132 CPU seconds
+         2086 function calls (2021 primitive calls) in 0.042 CPU seconds

30%

 ==> sa_step_1a_populate.log <==
-         8835 function calls (8772 primitive calls) in 0.120 CPU seconds
+         5420 function calls (5414 primitive calls) in 0.055 CPU seconds

38%

 ==> sa_step_2_insert.log <==
-         4825 function calls (4795 primitive calls) in 0.044 CPU seconds
+         3055 function calls in 0.030 CPU seconds

36%

 ==> sa_step_3_Properties.log <==
-         48840 function calls (46440 primitive calls) in 0.577 CPU seconds
+         33443 function calls (31203 primitive calls) in 0.284 CPU seconds

31%

 ==> sa_step_4_Expressions.log <==
-         207251 function calls (198931 primitive calls) in 2.458 CPU seconds
+         148381 function calls (140781 primitive calls) in 1.039 CPU seconds

28%

 ==> sa_step_5_Aggregates.log <==
-         18991 function calls (18271 primitive calls) in 0.180 CPU seconds
+         13151 function calls (12561 primitive calls) in 0.123 CPU seconds

30%

 ==> sa_step_6_Editing.log <==
-         23473 function calls (22513 primitive calls) in 0.497 CPU seconds
+         13375 function calls (12535 primitive calls) in 0.117 CPU seconds

43%

 ==> sa_step_7_Multiview.log <==
-         44213 function calls (42403 primitive calls) in 0.464 CPU seconds
+         32001 function calls (30621 primitive calls) in 0.238 CPU seconds

27%

Major Speed Enhancements in SQLAlchemy 0.4

A few weeks ago, Flávio Codeço Coelho posted a comparison of Python database interfaces, with a test that focused primarily on mass insert speed. While he had good things to say about our performance, I didn't find the results to be very impressive, and not surprisingly - mass insert speed is something we've never really focused on. The idea being, if you have to insert 100,000 records, you'd probably be using a bulk insert tool or raw SQL scripts anyway.

SQLAlchemy's development has historically been focused on the richness of its SQL expression language and the completeness of its object relational mapper. While it was important that applications using these features don't use up all the system's memory and grind to a halt, at the same time we've never had a "speed-crazy" mindset.

Contrast this to my other project, Mako, where speed was the singlemost important priority; every line of code written was checked against a benchmark suite to measure it's impact. The language itself is pretty barebones, and even a little inconvenient in some ways. That effort provided a pretty fast product which it's rumored will form the basis of the newly designed reddit.com.

But while the "speed at all costs" mindset is very relevant for a template language, it somehow doesn't have the same urgency in the land of SQL, where if people truly want blazing speed, they use raw SQL. The point of a SQL toolkit and/or an ORM is to enhance the developer experience, to bring richness and ease to common (and not so common) use cases. Such as, being able to issue SQL statements with Chinese-language identifiers with all the DBAPI-divergent trickery hidden behind the scenes, or more commonly just to have smooth Unicode support, numeric and date consistency, richly featured default generators, support for database-specific features like Oracle OUT parameters and Psycopg2's server-side cursors, a healthy dose of hooks and options for ORM behavior, and everything else. The sheer volume of behaviors we are shooting to deliver place a burden on the architecture, which constantly has to refactor and adapt in order to accomodate (and SQLAlchemy is nothing if not accommodating). Not to mention it heavily weighs upon the ability for a small group of unpaid volunteers to get around to reviewing, streamlining, and optimizing all of it.

Nevertheless, the 0.4 release is just plain ginormous (and is currently available as a beta), and is changing, streamlining, refactoring just about everything. We have more users, more developers, more trac tickets, so a lot of what was always just "potential" is finally becoming a reality. We even have Robert Brewer of Dejavu (and of course CherryPy) writing benchmark tests for us (although, he might just be trying to get Geniusql to kick our butt...in the most friendly way of course). There's a lot of energy right now so I try to get as much production out of that wave as possible while it lasts.

So the particularly egregious slothiness SQLAlchemy exhibited in Flávio's tests (and really, he picked probably the worst thing he could within SA) has been overhauled. An adaptation of his test, which cranks it up by not only inserting but also fully selecting the results, inserts and selects 50,000 rows from an in-memory SQLite database. When run against 0.3, Hotspot shows this extremely large number of function calls:

2850532 function calls (2850529 primitive calls) in 19.501 CPU seconds

That's really bad. In 0.4 trunk, we have it down to this:

900525 function calls (900513 primitive calls) in 8.128 CPU seconds

That's 68% fewer function calls to insert and iterate 50,000 rows of three columns each. It's not that the new system is all that blazingly fast; a raw insert of 100,000 rows with SQLite takes 4 seconds on my machine, while with SA takes 14 seconds. Still, that's a whole lot better than 0.3 which takes 32 seconds. Speedups like this are not due to any brilliant re-architecting, it's just having enough people helping out such that we have the resources to address these old, crufty functionalities that got written at 2:30 AM eight months ago.

For everything that gets said about SQLAlchemy, we are really just barely out of the "idea" phase, and it was meant to take a long time for this thing to be really great; especially at "homegrown" pace. Our version number being only 40% of 1.0 is quite intentional.