OK.....*more* speed enhancements
Not satisfied with my last post, Robert Brewer decided to kick me around some more this weekend, posting all kinds of colorful graphs comparing SQLAlchemy's expression compilation and execution speeds to Geniusql and Storm. Not surprisingly, SA was significantly slower than Geniusql and Storm in most tests except one, despite 0.4 being all around way faster than any previous version.
While I haven't looked at Geniusql's source code, I've looked at Storm's; its a far smaller package than SQLAlchemy, and does things in a very simple way. That's great, and is also a result of it being an in-house product designed to do just what they need and only that (plus, Gustavo's code is really great...too bad he couldn't stick around). Both products were clearly written with speed in mind from day one, and its clear that Robert is very proud of every execution-related optimization he's made (although the "we don't use bind params" thing I think he's going to need to look at eventually).
On the other hand, I'm more proud of the five-levels deep SQL expressions SQLAlchemy's ORM can come up with for you, if that's what you're asking it for. Two years ago, SA was the first to bring a really comprehensive version of this idea to the Python world (with Ian's SQLBuilder being the initial inspiration), and inventing all the patterns there was my primary focus. Now we have other projects doing the same thing, and honoring "the relational model" instead of just "tables" is a given. Well, if back then our only big idea was just, "we execute so fast!", it probably wouldn't have made much of a splash. So I think I made the right choice in initial priorities.
Anyway, since Robert keeps poking at me, I went in and cut out some more cruft today. And just like that, just poking around for slow spots, I cut out 37% more method calls out of the "Insert" speed test in his suite; from 8835 to 5420 in step_1a and from 4825 to 3055 in step_2 running against Postgres. As well as 43% method call reduction in step_6 ("Editing"), from 23473 to 13375. So he can drop those dark blue lines a little closer to the other two. As I've said before, there's just a lot of functionality here that takes a lot of time and energy to keep in clean shape. And I haven't even gotten into adding options to "inline" the execution of sequences and defaults; that's coming up.
As far as the overhead of checking out from the connection pool, Robert showed a damning illustration; his pool checks out in just two function calls where SA is using something like 40. Well it seemed like what he really meant was, "thats two function calls, for a connection which is already pulled from the queue and is stored in a dictionary against the current thread". Well, thats just plain misleading. How many function calls does Geniusql use when it does pull from the queue? Considering we have had an option on our pool since version 0.1 that does exactly the same thing, I'm declaring prior art on this idea. So, I've also reinstated the "thread-localness" of the "contextual" connection which is used for connectionless execution, and cleaned out some more cruft. Now the SA connection pool uses 21 function calls on a fresh checkout, or just two function calls, just like Geniusql, on a thread-local checkout. Why wasn't it like that already ? We never profiled, nobody ever complained. SA tries to do more, has been adding user-requested features for two years, and is little more heavyweight than the latest contenders - but nevertheless, it works just fine for people. We just don't have anyone complaining about this stuff (please...complain ! It makes us better). When someone does complain about performance in some area, we fix it (or at least improve as much as we can, and then continue angsting about it for months).
In all honesty, while we aren't a featherlight speed-demon, we perform just fine, and from all accounts remarkably faster than a lot of other ORM tools. We're not the Mako of ORMs, we're the Genshi.
But now that the dogs are on our tail, and they've got enormous line graphs sprawled across planetpython, seems like we're going to have to just grind code through the profiler just that much more often.
Edit Again: Oh wow, Robert's latest stack reveals he was testing a never-released development version...at least rev 3248, if not earlier, based on the method names. So heres another breakdown of individual tests (a unified diff of rev 3248 vs. rev 3388). Average method reduction is 32%:
==> sa_step_1_create_tables.log <==
- 3128 function calls (3001 primitive calls) in 0.132 CPU seconds
+ 2086 function calls (2021 primitive calls) in 0.042 CPU seconds
30%
==> sa_step_1a_populate.log <==
- 8835 function calls (8772 primitive calls) in 0.120 CPU seconds
+ 5420 function calls (5414 primitive calls) in 0.055 CPU seconds
38%
==> sa_step_2_insert.log <==
- 4825 function calls (4795 primitive calls) in 0.044 CPU seconds
+ 3055 function calls in 0.030 CPU seconds
36%
==> sa_step_3_Properties.log <==
- 48840 function calls (46440 primitive calls) in 0.577 CPU seconds
+ 33443 function calls (31203 primitive calls) in 0.284 CPU seconds
31%
==> sa_step_4_Expressions.log <==
- 207251 function calls (198931 primitive calls) in 2.458 CPU seconds
+ 148381 function calls (140781 primitive calls) in 1.039 CPU seconds
28%
==> sa_step_5_Aggregates.log <==
- 18991 function calls (18271 primitive calls) in 0.180 CPU seconds
+ 13151 function calls (12561 primitive calls) in 0.123 CPU seconds
30%
==> sa_step_6_Editing.log <==
- 23473 function calls (22513 primitive calls) in 0.497 CPU seconds
+ 13375 function calls (12535 primitive calls) in 0.117 CPU seconds
43%
==> sa_step_7_Multiview.log <==
- 44213 function calls (42403 primitive calls) in 0.464 CPU seconds
+ 32001 function calls (30621 primitive calls) in 0.238 CPU seconds
27%