Happy Thanksgiving

November 24, 2010 at 08:34 PM | Code, SQLAlchemy

Motivations:

  1. Went through all that trouble to rebuild my blog, still nothing on my mind. Let's make something up !
  2. Holiday cheer ! Since I am nothing if not cheery.

The exercise below uses a database to make a turkey. It does so via an entirely convoluted and pointless series of steps, persisting and restoring a needlessly esoteric set of data with a SQLite database through a series of obnoxious arithmetical translations, however delicious the final answer may be.

It does, at least, illustrate some query/domain model techniques I've been using on the job as of late. The @classmethod constructs you see below are a slimmed down version of SQLAlchemy's "hybrid attrbutes" example, attribute helpers that are going to play a much more prominent role in 0.7, where we begin to de-emphasize synonym and comparable_property in favor of the hybrid, a simpler and more versatile component.

In their full form, these components allow domain classes to define transformational expressions that work both as SQL expressions as well as instance attributes. Below, a more succinct version of them is used to associate composed SQL elements with a domain model. Ramped up to scale in a real application, I'm able to generate large paginated displays, containing a wide range of values mathemetically derived from others, all evaluated in SQL and pulled straight from a single Query object composed using this technique. In other contexts, when I load individual objects from the database, those same derivation methods are available to me at the instance level, pulling their components from the in-memory state of the object and interpreted by Python, instead of by the relational engine.

Surmounting the barrier of absurdity inherent in this exercise did yield a small bounty - writing this example uncovered some surprisingly glaring expression language bugs I have to fix (see all those self_group() calls ? Those shouldn't have to be there...ouch).

Hope you enjoy it:

packed_turkey = \
    '789c2d954b121d350c45e7bd12129cc292fc4d522936c18c82193b604465ef9cd'\
    '3c9c0cf7eb66c5d5d5da97f8fbbeaf92f67f59655a3e5aef1fc737a3d1f224f0b'\
    'f6479b6d8cdb7271bc66bb31db1ad9ee1e8d65ac16bb05b6b7c560ea839dd89cc'\
    'c6ca3df36a2b77d3783d339a2ade3cbdcadd576659bc73792516d7b7b3b3676d9'\
    '3281509dc11ceb0eb0700284d5d66e031fc969f6c42200e573b719cb8976f8054'\
    '84436362fe72c165ef01c9793ee58c44e8c9c8d5b0c7c70561b87f88a7d5f7280'\
    '36f143e431011dc9d50444dde7978c013917ec153c9f9b832452e8e5a717ec754'\
    '19d36f56aa0ebc7e2e56a7b1707975b93abfbdee7d70941333ba82f335948ff6b'\
    'd192d06092d90103109887572476810d7e92c7a2f2c5407050c5336413a070b40'\
    'f4e00631c84961054c3b908b8782a481a6100182d7c3cd03bcd741a6719d33577'\
    '6c5ed78dc82fe9004b60c633cca70db01637a05a80984be59598b1fbf349ae025'\
    '905c90caeafebbd002ab16019a41f7750898858c5a807c38edbf75160ae838a38'\
    'e27692b0816986ec1073cce737f83c686020a6970a00aa40c2ac05ec3e4ca98c9'\
    '3ec7d84a2e3fe728161caf29ccf9f5b2590f64b8812cb400e5b46bbd7e18767e6'\
    '3095f82811e31333c20e44380f1eae95b25e9f2ad310b1bc3243e91c221be7150'\
    '37a57a66c6e4b0fe1a422721fc9148a295298aa7fd6f3174f3732a828246733b8'\
    'd94e333b4d3ec404e4a680140a4e9a9515adac1cd2d4ac01cbc18a0ca4d6a6b01'\
    'a4c923db89eaab781114dc323f88804cee7f14cf88d150bb95c78219bd4123a47'\
    '2b446ef977a084778544c2811b2009ff8322c0f053820c1b8ae897954282c3def'\
    '0d6cdee926bca1c64cd760321d42a04b367b28ff93004b5627fb20ab77965a85e'\
    'd672516a4c59db07f84f852f756a7ad42d712fa5b04c026adc47c05d0acd95b35'\
    'cc2a44da89728589081a23a2cc142c6de2ceb0197c7466198dd36658b287bd6c6'\
    '92da2ceaaa08aa08ea22ae0be86b8de1e2027ef07fe07e0274582deebdd5b6189'\
    'ca56bc99b324c479b0e8c2c0a22727fc0ceb8969a33ad60cce7df89b80748969a'\
    'b5d10de7f1fcc167219f2f748a7c3e23bafb7ca7cdcfe7eb78054a72415cf3f93'\
    'b793951482e05e44045047ebb45884cd2a08745e3b7810312316d406f43826493'\
    '07dcb4dd2f8585711ebf387e6ac20cd073cdc09b67aba028e2e1f48ab72014e4c'\
    '7ef001b9674d91615e5b505e078beada0f9541a2e0a29081b641ace6dc75da5d1'\
    '80f17171211f775a25169c7dc9a806e5908af25a667ce4545a7443b42ba95a375'\
    'e69aa63bbaa969ad9d88491e3f9365e7cf5fc0f2961791f'

from sqlalchemy import create_engine, Integer, Float, \
                        CHAR, Column, ForeignKey, \
                        func, cast, case, and_
from sqlalchemy.orm import Session
from sqlalchemy.ext.declarative import declarative_base
import math, binascii, zlib

img = dict(
    (line[0], [int(x) for x in line[1:].split(",")])
    for line in zlib.decompress(binascii.unhexlify(packed_turkey)).\
    split("\n") if line)

class classproperty(property):
    """Class level @property."""
    def __get__(desc, self, cls):
        return desc.fget(cls)

Base = declarative_base()

class SpiralPoint(Base):
    """Store a character and its position along a spiral."""

    __tablename__ = 'spiral_point'

    def __init__(self, character, value):
        self.value = value
        self.character = character
        self.sqrt = math.sqrt(value)

    value = Column(Integer, primary_key=True)
    """The value."""

    character = Column(CHAR(1))
    """The character."""

    sqrt = Column(Float, nullable=False)
    """Store the square root of the value, SQLite
    doesn't have sqrt() built in.

    (custom SQLite functions are beyond the scope
    of this "exercise", as it were)
    """

    @classproperty
    def nearest_odd(cls):
        """Return the nearest odd number below this SpiralPoint's sqrt."""

        return (func.round(cls.sqrt / 2) * 2 - 1).\
                    self_group().label('nearest_odd')

    @classproperty
    def nearest_even(cls):
        """Return the nearest even number below this SpiralPoint's sqrt."""

        return (cls.nearest_odd + 1).\
                    self_group().label('nearest_even')

    @classproperty
    def center_distance(cls):
        """How far from the 'center' is this value ?"""

        return (cls.nearest_even / 2).label('center_distance')

    @classproperty
    def quadrant(cls):
        """Which side of the 'center' is this value part of ?"""

        return (cls.value - (
                cls.nearest_odd * cls.nearest_odd
            )).label('quadrant')

e = create_engine('sqlite://', echo=True)
Base.metadata.create_all(e)

session = Session(e)
session.add_all([SpiralPoint(char, val)
                    for char in img
                    for val in img[char]])
session.commit()

total_x = 60
total_y = 24

# load our turkey !
turkey = session.query(
    SpiralPoint.character,
    cast(case([
        (
            SpiralPoint.quadrant < SpiralPoint.nearest_even,
            (total_x / 2) + SpiralPoint.center_distance
        ),
        (
            and_(
                SpiralPoint.nearest_even <= SpiralPoint.quadrant,
                SpiralPoint.quadrant < SpiralPoint.nearest_even * 2
            ),
            (
                (total_x / 2) + SpiralPoint.center_distance - 1) -
                (SpiralPoint.quadrant - SpiralPoint.nearest_even).self_group()
            ),
        (
            and_(
                SpiralPoint.nearest_even * 2 <= SpiralPoint.quadrant,
                SpiralPoint.quadrant < SpiralPoint.nearest_even *3
            ),
            (total_x / 2) - SpiralPoint.center_distance
        )
    ],
        else_ = (
                    (total_x / 2) - SpiralPoint.center_distance + 1) +
                    (SpiralPoint.quadrant - (SpiralPoint.nearest_even * 3)
                )
    ), Integer).label('x'),

    cast(case([
        (
            SpiralPoint.quadrant < SpiralPoint.nearest_even,
            (total_y / 2) - SpiralPoint.center_distance + 1 + SpiralPoint.quadrant
        ),
        (
            and_(
                SpiralPoint.nearest_even <= SpiralPoint.quadrant,
                SpiralPoint.quadrant < SpiralPoint.nearest_even * 2
            ),
            (total_y / 2) + SpiralPoint.center_distance),
        (
            and_(
                SpiralPoint.nearest_even * 2 <= SpiralPoint.quadrant,
                SpiralPoint.quadrant < SpiralPoint.nearest_even *3
            ),
            (total_y / 2) + SpiralPoint.center_distance - 1 -
            (SpiralPoint.quadrant - (SpiralPoint.nearest_even * 2)).self_group()
        )
    ],
        else_ = (total_y / 2) - SpiralPoint.center_distance
    ), Integer).label('y')
)

# serve our turkey !
grid = [
        [' ' for x in xrange(total_x)]
        for y in xrange(total_y)
]

for char, x, y in turkey:
    grid[y][x] = char

for g in grid:
    print "".join([c for c in g])

Download Source


How Coders Blog

November 21, 2010 at 08:18 PM | Code, Mako/Pylons

It all started just a few days ago, as I had a really rare desire to blog something, and had to go back to my klunky old Wordpress blog and re-figure out how to use it.

I've had Wordpress running for maybe three years, after trying out some other not so spectacular platforms like Serendipity. Years ago Movable Type was the bomb because all it did was generate files for you, but then they got on the PHP bandwagon and became a huge beast just like all the rest. Wordpress at least had marketshare and a lot of plugins.

Running WP is mostly a miserable affair for a coder. We generally don't go for WYSIWYG editors, and we certainly don't want to sit there typing HTML tags, and we need to display lots of code samples which we'd like highlighted. I managed to hack up my WP to use a Markdown plugin for content entry and wp_syntax for syntax highlighting, where getting them to work together was a herculean effort involving direct modification of the plugins. This herculean effort needed to be repeated every few years when it became necessary to upgrade Wordpress, as I had to re-figure-out and re-write all my PHP hacks to make my system work again. Just shoveling around all those PHP files, each one a huge mess of spaghetti, hardcoded SQL, and who knows what future vulnerabilities that you're now going to run on your server, is a distasteful affair.

Which comes down to the worst thing about WP, is that you have to upgrade all the fricking time, as it is simultaneously the most security-hole ridden piece of crap as well as the most highly targeted application by various worms and other web nasties. As paranoid as I was about enabling the PHP interpreter on my server, a pretty harmful nasty managed to stick some backdoor-related files in my /tmp/ directory around 2008 or so, prompting me to literally delete various .php files from the wp-admin/ directory and add additional passwords on the whole thing, as these were php files meant to provide "file upload" features which might as well been designed exclusively for worms and hackers. Searching WP's trac finds hundreds of issues tagged "security", many of them just closed as "can't reproduce" even though the unfortunate reporter of the bug clearly got hacked several times, long after my most recent version of 2.5. Here's an admin exploit in 2.6.1, an improperly escaped eval() (they were using eval!) in 2.8.4.

So the other day, when as is always the case when I go back to my WP admin page, a giant "YOU NEED TO UPGRADE RIGHT NOW!" warning has been sitting there for eighteen months, I got fed up and tweeted:

what do I use to blog where I write posts as ReST files, generate->static site + Disqus, keep the whole thing in VC and use rsync to pub ?

Turns out that field has gone really well since the bad old days when I had to decide between one PHP piece of junk or the other, and a whole bunch of people have already been thinking the same thing. Here's what I got back:

All look extremely promising - but what was even better was how obvious the decision was for me personally - the one that uses my own stuff (i.e. Mako, plus some SQLA utilities for WP import) which is Blogofile. In just two days I got everything the crap out of Wordpress and got ReST-powered, static, Pygments-syntax highlighting, entirely-invisible-to-PHP-worms blog that looks better and I'll never need to upgrade anything. The comments go to Disqus, which is both good and bad. Good because the data-receiving, spam catching dynamic side of the equation is on someone else's damn server. Bad because, there you go they've got my data, as well as my general distaste of smarmy highly designed social media dashboards. But it does look nice.

Blogofile worked terrifically, was designed exactly with my needs in mind by someone who sees things similarly to me, and was super easy to customize and tweak. It did need a little bit of tweaking to work with RST and Pygments, but this is all laid out for you (the coding blogger) in an obvious way that's easy to customize. Publishing is the easiest part, just push to a local Mercurial via ssh, and a two line hg hook to up, rebuild and copy the files - rsync isn't needed at all.

What's hard to ignore about all these platforms is that, your dad will never blog like this. You simply have to be a programmer to get excited about writing posts as plain markup, checking them into a VC and configuring shell scripts to publish, not to mention building the whole blog out using Python scripts and templates. So this is no threat to the world of hosted blog services and dynamically-oriented systems. But in the Python and Ruby worlds this is how we should be doing it.


Quick Mako vs. Jinja Speed Test

November 19, 2010 at 08:18 PM | Code, Mako/Pylons

Updated March 6, 2011 - Mako 0.4 has ironed out some of the bumps and is within 10% of Jinja2's speed for this test.

I'm really glad about Pyramid and all the great work Pylons + BFG is going to accomplish. Also glad that someone did a matchup against other frameworks, including Rails, and its looking great!

Here we address this statement made by Seth (nice to meet you, Seth!):

Jinja2 was consistently around 25-50r/s faster than Mako for me

and when I read that, I said to myself, "yeah, probably, Armin wrote Jinja2 well after Mako, and probably did the same thing I did with Cheetah when I wrote Mako, ensured it was just a teeny bit faster".

As is my pessimistic nature, I made the same assumptions with ORMs a long time ago, I said "yeah OK, they focused more on speed than I did, they're probably right !" Until I went and tried it out, and saw that wasn't the case at all.

So I whipped up a quick test for this one, running the templates directly without any web frameworks involved in a timeit run of 10000 render() calls. Shrugs Oh well, I'm getting 18%-21% faster performance from Mako, the latest Jinja2 is 24% faster using Seth's exact Jinja2 templates, compared against two versions of the Mako template which duplicate the Jinja2 templates down to the newline:

classics-MacBook-Pro:mako_v_jinja classic$ python run.py
jinja2 2.5: 7.5499420166
mako 0.3.6: 6.17144298553
mako 0.3.6 using def: 5.95005702972

Edit: ah crap, forgot to upgrade Jinja2 - Armin wins !:

jinja2 2.5.5: 4.56899094582
mako 0.3.6: 6.26432800293
mako 0.3.6 using def: 6.06626796722

Update March 2011 - some of the issues have been addressed in Mako 0.4.0, Mako now nearly the same:

jinja2 2.5.5: 4.35861802101
mako 0.4.0: 4.83493804932
mako 0.4.0 using def: 4.82003712654

All three versions use a basic template inheritance setup. The first Mako test uses the traditional next.body() approach to render the "body", the second does more exactly the method used by the Jinja2 template, declaring a "block" (in Mako's case a <%def>) and then calling it as a method, i.e. self.content().

I was surprised myself This is hardly surprising ! I've hardly done anything at all with Mako speedwise in years (with one exception below) and assumed newer module-compiled systems were smoking me by now (as I've been told these guys did).

The ironic thing is that Mako got a pretty big speed boost in version 0.3.4, when we started using Armin's own MarkupSafe, the library that was written originally for Jinja specifically, to do escaping. It's written in C and is a huge improvement over the very slow cgi routine we were using. Jinja2 and Mako are almost like cousins at this point - we also use some AST utility code written by Armin.


SQLAlchemy 0.6 ....Getting Warmer

October 12, 2009 at 11:07 AM | Code, SQLAlchemy

It's super-hard to find good large blocks of time to work on SQLAlchemy at the moment...but I did manage to get most of the major "whats new?" bits up on the wiki. I'm really close to pulling the trigger on a beta release, and we already have people running trunk. There's just a certain presence of mind I like to have before releasing that hasn't been clicking on weekends.

Read about the big new items and what to expect when upgrading, at http://www.sqlalchemy.org/trac/wiki/06Migration.


SQLAlchemy 0.5.4p1 Recommended for All Ages

May 18, 2009 at 12:37 PM | Code, SQLAlchemy

I don't usually blog about releases, especially point releases, but this one is pretty significant in that we've repaired some very severe speed bumps that were impacting the flush() process. Anyone working with large numbers of objects that has observed the Session to slow down as it gets bigger should download this release, as that issue has been resolved. Other latencies within flush() have also been flattened, and a few spreadsheet jobs that I run here which were taking 20-30 minutes now complete in about five.

It's still not nearly as fast as running a single huge executemany() to insert lots of rows, but if your experience with this release is like mine you should see much faster runs for large data update operations. As usual there's dozens of other fixes and enhancements too.

Work on the 0.6 series continues, as this release is focused on expanding the world of compatibility for SQLAlchemy, including compatibility with Python 3K, Jython, and many more DBAPI implementations. It also refactors DDL generation to work within the same compiler framework as that of non-DDL expressions, so you can easily create and execute CreateTable kinds of objects. Database reflection has been greatly enhanced with a new Inspector API too.