Thoughts on Beaker

October 01, 2011 at 12:01 PM | Code, Mako/Pylons

Beaker is a widely used caching and HTTP session library published by Ben Bangert. It dates back to the early days of Pylons, when Pylons moved off of the Myghty base to the Python Paste infrastructure. Beaker's guts in fact were originally the internals to the caching and session system of Myghty itself, which in turn was loosely based on the caching code from HTML::Mason (which if those two colons don't look familiar means we've made the jump across the ages to Perl).

The key neato thing nestled deep inside of Beaker is what I later was told is called a "dogpile lock". It's a simple system, though slightly less simple than dict.get(), whereby the cache can put one thread to work generating a new value for a certain cache key while all the other threads, and even other processes, continue to return the expired value until the new value is ready. I spent weeks getting this aspect to do what it was supposed to. It does things in a slightly more complex but more comprehensive way than Mason did; it coordinates workers using a mutex and/or lockfiles, instead of a simple counter that estimates the creation time for a new value.
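To make the idea concrete, here is a minimal single-process sketch of the pattern. It is not Beaker's or Dogpile's actual code: the DogpileCache class and its attribute names are made up for illustration, and it only coordinates threads with a mutex (no lockfiles, no cross-process behavior).

```python
import threading
import time


class DogpileCache:
    """Toy cache for one key: one thread regenerates, the rest serve the stale value."""

    def __init__(self, creator, expiretime):
        self.creator = creator          # function that builds a fresh value
        self.expiretime = expiretime    # seconds a value counts as fresh
        self.mutex = threading.Lock()   # held by whichever thread is regenerating
        self.value = None
        self.created_at = None          # None means "never generated"

    def _is_expired(self):
        if self.created_at is None:
            return True
        return time.time() - self.created_at > self.expiretime

    def _regenerate(self):
        self.value = self.creator()
        self.created_at = time.time()

    def get(self):
        if not self._is_expired():
            return self.value
        if self.created_at is None:
            # Nothing stale to hand out yet: every caller waits for the first value.
            with self.mutex:
                if self.created_at is None:
                    self._regenerate()
            return self.value
        # A stale value exists: only the thread that wins the lock regenerates.
        if self.mutex.acquire(blocking=False):
            try:
                if self._is_expired():
                    self._regenerate()
            finally:
                self.mutex.release()
        # Threads that lost the race fall through and return the expired value.
        return self.value
```

The design choice that matters is the non-blocking acquire: a thread that can't get the lock doesn't wait, it just hands back whatever value is already there.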

The storage backends are implemented in terms of a "container", which deals with an individual key, including checking if it's expired, regenerating it, and so forth, and a "namespace", which is the entryway to storing the data. It deals with keys and values like everything else does, and has backends for pickled files, DBM files, relational databases, memcached, and GAE. The general idea here came from Myghty but has been highly modified since then.

Beaker then adds something of a coarse-grained facade over this model, including something of a generic get/put API as well as a system of function decorators that cache the return value of a function. In recent releases it also supports the notion of a "region", which is just a way to package up a particular cache configuration under a single name that can later be referenced by high level caching operations.

Beaker also implements an HTTP session object on top of the same storage backends, taking this from Myghty as well. Implementing HTTP sessions on top of the cache backends was a completely off-the-cuff idea in Myghty, and over the years in Beaker it's had to be wrangled and re-wrangled over basic issues resulting from this mismatch.

As for the backends themselves, at this point all of them have accumulated a fair degree of woe. The file- and DBM- backends basically perform pretty poorly (DBM maybe not as much). The memcached backend has always been problematic - we slowly learned that the cmemcache library is basically a non-starter, and that pylibmc is dramatically faster than the usual memcache library. The memcached backend slowly sprouted an awkward way to select among these libraries and more awkwardness to attempt to accommodate pylibmc's special concurrency API. Then when users started combining the memcached backend with HTTP sessions, all kinds of new issues occurred, namely that any particular key might be removed from memcached at any time. That caused us to rework the session implementation even more awkwardly to store the whole session under one "key", defeating the purpose of how the session object works with other backends. As for the SQLAlchemy backends, there are two, and I've no idea why.

The HTTP session as a "bag of random keys", to paraphrase a term I think PJE originally used, is not something I really do these days, as I'm pretty skeptical of the approach of storing a bag of key/value pairs inside of a big serialized BLOB as a solution to anything, really. A few keys in a cookie-based session of very limited size is fine. But a backend-oriented system where you're loading up a giant bag of dozens of keys referencing full object structures inside of a good sized serialized blob inside of a file, database, or memcached-type of system, is both horribly inefficient and a sign of sloppy, ad-hoc coding.

Beaker eventually provided a purely client side encrypted cookie session as an alternative to the server-based sessions. This session implementation stores all the data in an encrypted string (Ben spent a lot of time and got a lot of advice getting the encryption right here) which is stored completely on the client. All of the performance, scalability and failover issues introduced by typical memory or file based HTTP session implementations are solved immediately. So this session implementation, I use lots. I only put a few critical keys in it to track the user, but anything that is more significant is part of the object model which is persisted in the database as a first class object.

The only downside to the cookie-only session is that, unless you coordinate the state of the session with server-side state (which IMHO you should), you can't "force" it to expire sooner than it normally would, or otherwise prevent the replay of previous state, within an attack scenario. While an attacker can't craft a session, (s)he can re-use the same cookie over and over again, and that can only be guarded against by comparing its state to that of a memo on the server; a single "generation" counter can achieve this. I.e. session comes in, generation is "5", the server says it should be "7", reject it. So it may be considered that this turns the encrypted cookie session into a fancy session cookie for a server-side session. But I am actually OK with that. I put very, very little into unstructured sessions.
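A rough sketch of the generation check, assuming the cookie has already been decrypted and verified into a plain dict. The server_generation table and both function names are hypothetical; in practice the memo would live in the database next to the user record.

```python
# Hypothetical server-side memo: user id -> current session generation.
server_generation = {"alice": 7}


def accept_session(session):
    """Accept the cookie only if its generation matches the server's memo."""
    expected = server_generation.get(session["user"])
    return expected is not None and session["generation"] == expected


def invalidate_sessions(user):
    """Bump the generation so every previously issued cookie is rejected."""
    server_generation[user] = server_generation.get(user, 0) + 1


replayed = {"user": "alice", "generation": 5}   # an old cookie being replayed
current = {"user": "alice", "generation": 7}    # the cookie the server expects
```

Here accept_session(replayed) is rejected while accept_session(current) passes; calling invalidate_sessions("alice") then expires the current cookie too, without touching anything on the client.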

What I'm getting at here is that I'm not super-interested in most of what Beaker has - a lot of the APIs are awkward, all of the storage backends are either mostly useless or slow and crufty, and I don't care much for the server-side session thing especially using backends that were designed for caching. Overall Beaker creates a coarse opacity on top of a collection of things that I think good application developers should be much more aware of. I really think developers should know how their tools work.

Beaker has some isolated features which I think are great. These are the dogpile lock, the encrypted client-side cookie session, the concept of "cache regions" whereby a set of cache configuration is referencable by a single name, and some nice function decorators - these allow you to apply caching to a function, similarly to:

@cached(region="file_based", namespace="somenamespace")
def return_some_data(x, y, z):
    # ...
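For a sense of what "a region" buys you, here is a toy decorator along the same lines. This is not Beaker's implementation: the regions dict, the in-memory _store and this cached() function are all assumptions made for the sketch, and it uses a plain dict instead of a file-based backend.

```python
import functools
import time

# Hypothetical region registry: region name -> cache configuration.
regions = {
    "short_term": {"expire": 60},
}

# Toy in-memory backend: (namespace, function name, args) -> (created_at, value).
_store = {}


def cached(region, namespace):
    """Cache a function's return value using the named region's settings."""
    expire = regions[region]["expire"]

    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            key = (namespace, fn.__name__, args)
            hit = _store.get(key)
            if hit is not None and time.time() - hit[0] < expire:
                return hit[1]
            value = fn(*args)
            _store[key] = (time.time(), value)
            return value
        return wrapper
    return decorate


call_log = []  # records each real invocation, to show when the cache is used


@cached(region="short_term", namespace="somenamespace")
def return_some_data(x, y, z):
    call_log.append((x, y, z))
    return x + y + z
```

The function only names the region; the expiration time (and, in a real system, the backend) is configured once, somewhere else.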

I'm not sure what's in store for Beaker. But what I'm doing is:

Created Dogpile, which is just the "dogpile lock" portion of Beaker, cleaned up and turned into a very clear, succinct system which you can use to build your own caching implementation. The README includes a full example using Pylibmc.
In Mako, for either 0.5.1 or 0.6.0 (not sure yet, though I just put out 0.5.0 the other day for unrelated reasons) the built-in Beaker support is genericized into a plugin system. You can write your own plugins very easily for Mako using Dogpile or similar, if you want caching inside your templates that isn't based off of Beaker. The system supports entrypoints so if a Beaker2 comes out, it would publish a cache backend that Mako could then use.

So we have that. Then, for the community I would like to see:

The solution for the key/value backend (or not). Today, high-performance, sophisticated key/value stores are ubiquitous. We of course have memcached. We also have Riak and Redis, which appear very feasible for temporary storage; I'm not sure if Tokyo Tyrant is still popular. And of course there's the more permanent-oriented systems like Mongo and Cassandra. I am sure that someone has built some kind of generic facade over these, and there are probably many. These are the key/value systems you should be using, either via a modern facade or one of the libraries directly; Beaker's system, which was originally built to store keys inside of dictionaries or pickled files, is absolutely nothing more than a quaint, historical novelty in comparison.
A new library that is just the encrypted cookie session. I'd use this. It would be very nice for this portion of Beaker to be its own thing. Until then I may just rip out that part of Beaker and stick it in a lib/ folder somewhere in my work projects for internal use.
For server-side HTTP sessions, I really think people should be rolling solutions for this as needed, probably using the encrypted cookie session for client side state linked to first class model objects in the datastore (relational or non-relational, doesn't matter). If accessing the datastore is a performance issue, you'd be using caching for those data structures, the same way you'd cache data structures that are not specific to a user (probably based on primary key or on their originating query). In the end this is similar to HTTP sessions stored directly in the cache backend, except there's a well defined model layer in between.
People that really insist on the pattern of server-side HTTP sessions as bags of keys inside of serialized blobs on a server should be served by some new library that someone else can maintain. I think I am done with supporting this pattern and I think Ben may be as well.

The theme I've been pushing in my recent talk on SQLAlchemy, and I think I'm pushing here, is that if you're writing a big application, one which you'll be spending months or years with, it is very much worth it to spend a day or so building up foundational patterns. I really have never understood the point of, "I just got the whole app up and running from start to finish on the plane!" Well great, you got the app up in three hours, now you can spend the next six months reworking the whole thing to actually scale (or if it was only a proof-of-concept, then why does it need a cache or even a database?). Did three hours for a quick and dirty implementation versus three days to do it right really improve your life within the scope of the total time spent? Breaking up Beaker into components and encouraging people to use patterns instead of relying upon opaque, pushbutton libraries is part of that idea.

My Blogofile Hacks

December 06, 2010 at 07:26 PM | Code, Mako/Pylons

Update: - Spurred on by Daniel Nouri, configuration examples have been upgraded to 0.7.

I'm having a completely great time with Blogofile. Publishing the whole site static and in one step, letting Disqus handle all the community is so much better than worrying about upgrades and spam. I've noticed some other folks using Blogofile and I wanted to share the key changes I used to make it work the way I want. Ideally, Blogofile itself could provide these features since they are pretty basic; I'm being somewhat lazy by just posting them here rather than requesting features via the BF mailing list, but the changes aren't generic to a plain Blogofile install so they would need to be "featurized" in order to be part of Blogofile's default setup. Until then, these adjustments work right now for an 0.6 installation.

Getting Syntax Highlighting to work with ReST

We're all using Sphinx for our docs now so we've all become experts at Restructured Text. I know this because I can actually type out `[text] <[hyperlink]>`_ from memory. Blogofile supports .rst but the syntax highlighting that's included appears to be tailored towards Markdown. My approach here to allow .rst highlighting is not as nice as that of Sphinx since I continue to be very mystified by Docutils, but it gets the job done.

Step 1 - Put the RST Filter First

We'll be using Docutils' built in system of :: followed by indentation to establish a code block, so the syntax highlighter will detect the HTML generated, instead of Blogofile's default approach of using a special tag $$code(lang=python) which doesn't make it through the rst parser in any case (in fact it's the Pygments HTML the filter generates that doesn't). Change the order in _config.py as follows:

blog.post_default_filters = {
    "rst": "rst, syntax_highlight"
}

Step 2 - New Syntax Filter

I don't need a lot of options in my code blocks other than what language is in use. Below is a simplified syntax_highlight.py that looks for a language name using a comment of the form #!<language name>:

import re
import os

from pygments import util, formatters, lexers, highlight
import blogofile_bf as bf

css_files_written = set()

code_block_re = re.compile(
    r"<pre class=\"literal-block\">\n"
    r"(?:#\!(?P<lang>\w+)\n)?"
    r"(?P<code>.*?)"
    r"</pre>", re.DOTALL
)

def highlight_code(code, language, formatter):
    try:
        lexer = lexers.get_lexer_by_name(language)
    except util.ClassNotFound:
        lexer = lexers.get_lexer_by_name("text")
    highlighted = "\n\n" + \
                  highlight(code, lexer, formatter) + \
                  "\n\n"
    return highlighted

def write_pygments_css(style, formatter, location="/css"):
    path = bf.util.path_join("_site",bf.util.fs_site_path_helper(location))
    bf.util.mkdir(path)
    css_path = os.path.join(path,"pygments_"+style+".css")
    if css_path in css_files_written:
        return #already written, no need to overwrite it.
    # the with block closes the file even if the write fails
    with open(css_path,"w") as f:
        f.write(formatter.get_style_defs(".pygments_"+style))
    css_files_written.add(css_path)

from mako.filters import html_entities_unescape

def run(src):

    style = bf.config.filters.syntax_highlight.style
    css_class = "pygments_"+style
    formatter = formatters.HtmlFormatter(
            linenos=False, cssclass=css_class, style=style)
    write_pygments_css(style,formatter)

    def repl(m):
        lang = m.group('lang')
        code = m.group('code')

        code = html_entities_unescape(code)

        return highlight_code(code,lang,formatter)

    return code_block_re.sub(repl, src)

That's it. All you do now when writing a code example:

This is some blog text.  Some code::

    #!python
    print "hello world"
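To see what the filter's regular expression pulls out, here's a small standalone check. The HTML string is my assumption of roughly what Docutils emits for the literal block above (quotes escaped as entities), and the stdlib html.unescape stands in for Mako's html_entities_unescape so the snippet runs without Blogofile or Mako installed.

```python
import html
import re

# Same pattern as in syntax_highlight.py above.
code_block_re = re.compile(
    r"<pre class=\"literal-block\">\n"
    r"(?:#\!(?P<lang>\w+)\n)?"
    r"(?P<code>.*?)"
    r"</pre>", re.DOTALL
)

# Assumed shape of the Docutils output for the example post.
rendered = (
    "<p>This is some blog text.  Some code:</p>\n"
    '<pre class="literal-block">\n'
    "#!python\n"
    "print &quot;hello world&quot;\n"
    "</pre>\n"
)

m = code_block_re.search(rendered)
lang = m.group("lang")                  # language named by the #! comment
code = html.unescape(m.group("code"))   # entities undone before highlighting

# A block with no #!<language> line still matches; lang is then None.
plain = code_block_re.search('<pre class="literal-block">\nsome text\n</pre>')
```

The unescape step matters because Pygments would otherwise highlight the literal entity text rather than the quote characters it stands for.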

Permalinks

Blogofile's auto-permalink regular expression needs some serious help. Any kind of punctuation characters or whatever just get dumped into the URL, and there's no hook to change how the auto-permalink generation works. We definitely don't want to have to type permalinks into all our posts that look just like the titles.

With Blogofile 0.6, I had a post-processor in the 0.initial.py controller file that rewrote the permalink on each Post object. As of Blogofile 0.7, the 0.initial.py controller seems to be gone, but the Post class is now part of the blog buildout in the file _controllers/blog/post.py, allowing you to change how things are done. So inside the __post_process() method of Post, I replace the "slug" generation code with the following:

--- _controllers/blog/post.py
+++ _controllers/blog/post.py
@@ -166,7 +168,24 @@
                datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

         if not self.slug:
-            self.slug = re.sub("[ ?]", "-", self.title).lower()
+
+            ########## THIS CODE ADDED FOR TECHSPOT #############
+            slug = self.title.lower()
+
+            # convert ellipses to spaces
+            slug = re.sub(r'\.{2,}', ' ', slug)
+
+            # flatten everything non alpha or . into a single -
+            slug = re.sub(r'[^0-9a-zA-Z\.]+', '-', slug)
+
+            # trim off leading/trailing -
+            slug = re.sub(r'^-+|-+$', '', slug)
+            self.slug = slug
+
+            #######################################################
+
+            # original
+            #self.slug = re.sub("[ ?]", "-", self.title).lower()

This allows the blog.auto_permalink.path configuration in _config.py to remain at "/:year/:month/:day/:title", and special characters will be nicely treated in permalinks.
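Here are the same three substitutions pulled out into a standalone function so they can be tried on a few titles. The make_slug name is just for this sketch; in Blogofile the code lives inline in the Post class as shown in the diff.

```python
import re


def make_slug(title):
    slug = title.lower()
    # convert ellipses to spaces
    slug = re.sub(r'\.{2,}', ' ', slug)
    # flatten every run of characters that aren't alphanumeric or "." into one "-"
    slug = re.sub(r'[^0-9a-zA-Z\.]+', '-', slug)
    # trim off leading/trailing "-"
    slug = re.sub(r'^-+|-+$', '', slug)
    return slug


print(make_slug('In Response to "Stupid Template Languages"'))
# in-response-to-stupid-template-languages
print(make_slug('Quick Mako vs. Jinja Speed Test'))
# quick-mako-vs.-jinja-speed-test
print(make_slug('Wait... what?'))
# wait-what
```

Note that a single "." survives into the slug, which is why "vs." keeps its dot while the ellipsis does not.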

You need to have permalink generation turned on for the above to work - if your posts don't have permalinks inside them, they don't appear to be generated otherwise.

Hope these are helpful and happy static file blogging !

In Response to "Stupid Template Languages"

December 04, 2010 at 10:15 AM | Code, Mako/Pylons

Responding to Daniel Greenfield's critique of "smart" template languages ("Stupid Template Languages"). In this post we see a typical critique of Mako, that its allowance of Python code in templates equates to an encouragement of the placement of large amounts of business logic in templates:

I often work on projects crafted by others, some who decided for arcane/brilliant/idiotic reasons to mix the kernel of their applications in template function/macros. This is only possible in Smart Template Languages! If they were using a Stupid Template Language they would have been forced put their kernel code in a Python file where it applies, not in a template that was supposed to just render HTML or XML or plain text.

What it comes down to is that Smart Template Languages designers assume that developers are smart enough to avoid making this mistake. Stupid Template Languages designers assume that developers generally lack the discipline to avoid creating horrific atrocities that because of unnecessary complexity have a bus factor of 1.

Though I'm the author of Mako, I have lots of experience with restricted templating systems, being the original creator of several, including one that became known as FreeMarker. It's my experience that non-trivial projects using such systems virtually always bring forth situations where HTML needs to be stuffed into concatenated strings inside view logic - areas where some intricate interaction of tags and data are needed, or even not so intricate interactions.

Then you have HTML tags, including the choice of tag and its CSS attributes, shoved inside your code, where no HTML person will ever see it. You only need to look as far as Django's own template documentation to see them actually encouraging it ! This to me is infinitely worse than a little bit of code in templates, and I am always struck by the Django community's critiques of Mako's allowance of small amounts of Python in template code, as they continue to stuff HTML in their Python code as they see fit, seeing no issue at all. Mako's philosophy is that no HTML should ever be in your code anywhere, and to that end it allows your custom tag libraries to be built as templates as well.

I commonly hear the critique of Mako, as we see here, "you could write your whole application in your template ! I've seen people do it!" That argument is entirely a straw man. In contrast, my nightmare experiences with applications are those where entire HTML pages have been shoved into collections of hundreds of Perl modules or Java classes, each broken up into dozens of functions and entirely unreadable and unmaintainable. I would bet that there are some Django apps out there which do some of the same thing. Is that the way Django intended ? Absolutely not. But they can't save the world from bad code.

There's an infinite number of ways to write an application incorrectly; just because a particular library doesn't physically prevent you from doing so doesn't mean it's encouraging this behavior. Mako has no responsibility to "assume" that developers "won't be stupid". I can assure you that no library or framework has ever achieved the feat of eliminating developer stupidity. If that is to be Django's main selling point, they can run with it.

The PHP mindset is one of the greatest evils in web development but Mako's existence is not an endorsement. We don't encourage the placement of business logic in HTML templates, nor the placement of HTML tags into business logic - something that others do.

How Coders Blog

November 21, 2010 at 08:18 PM | Code, Mako/Pylons

It all started just a few days ago, as I had a really rare desire to blog something, and had to go back to my klunky old Wordpress blog and re-figure out how to use it.

I've had Wordpress running for maybe three years, after trying out some other not so spectacular platforms like Serendipity. Years ago Movable Type was the bomb because all it did was generate files for you, but then they got on the PHP bandwagon and became a huge beast just like all the rest. Wordpress at least had marketshare and a lot of plugins.

Running WP is mostly a miserable affair for a coder. We generally don't go for WYSIWYG editors, and we certainly don't want to sit there typing HTML tags, and we need to display lots of code samples which we'd like highlighted. I managed to hack up my WP to use a Markdown plugin for content entry and wp_syntax for syntax highlighting, where getting them to work together was a herculean effort involving direct modification of the plugins. This herculean effort needed to be repeated every few years when it became necessary to upgrade Wordpress, as I had to re-figure-out and re-write all my PHP hacks to make my system work again. Just shoveling around all those PHP files, each one a huge mess of spaghetti, hardcoded SQL, and who knows what future vulnerabilities that you're now going to run on your server, is a distasteful affair.

Which brings us to the worst thing about WP: you have to upgrade all the fricking time, as it is simultaneously the most security-hole ridden piece of crap as well as the most highly targeted application by various worms and other web nasties. As paranoid as I was about enabling the PHP interpreter on my server, a pretty harmful nasty managed to stick some backdoor-related files in my /tmp/ directory around 2008 or so, prompting me to literally delete various .php files from the wp-admin/ directory and add additional passwords on the whole thing, as these were php files meant to provide "file upload" features which might as well have been designed exclusively for worms and hackers. Searching WP's trac finds hundreds of issues tagged "security", many of them just closed as "can't reproduce" even though the unfortunate reporter of the bug clearly got hacked several times, long after my most recent version of 2.5. Here's an admin exploit in 2.6.1, an improperly escaped eval() (they were using eval!) in 2.8.4.

So the other day, when I went back to my WP admin page and, as is always the case, found a giant "YOU NEED TO UPGRADE RIGHT NOW!" warning that had been sitting there for eighteen months, I got fed up and tweeted:

what do I use to blog where I write posts as ReST files, generate->static site + Disqus, keep the whole thing in VC and use rsync to pub ?

Turns out that field has gone really well since the bad old days when I had to decide between one PHP piece of junk or the other, and a whole bunch of people have already been thinking the same thing. Here's what I got back:

Blogofile: http://www.blogofile.com/
CodeRanger: https://github.com/coderanger/coderanger.github.com
Jekyll: https://github.com/mojombo/jekyll
Hyde: https://github.com/lakshmivyas/hyde
Pelican: http://alexis.notmyidea.org/pelican/
Rest2Web: http://www.voidspace.org.uk/python/rest2web/

All look extremely promising - but what was even better was how obvious the decision was for me personally - the one that uses my own stuff (i.e. Mako, plus some SQLA utilities for WP import), which is Blogofile. In just two days I got everything the crap out of Wordpress and got a ReST-powered, static, Pygments-syntax-highlighting, entirely-invisible-to-PHP-worms blog that looks better, and I'll never need to upgrade anything. The comments go to Disqus, which is both good and bad. Good because the data-receiving, spam catching dynamic side of the equation is on someone else's damn server. Bad because, there you go, they've got my data, and because of my general distaste for smarmy, highly designed social media dashboards. But it does look nice.

Blogofile worked terrifically, was designed exactly with my needs in mind by someone who sees things similarly to me, and was super easy to customize and tweak. It did need a little bit of tweaking to work with RST and Pygments, but this is all laid out for you (the coding blogger) in an obvious way that's easy to customize. Publishing is the easiest part, just push to a local Mercurial via ssh, and a two line hg hook to up, rebuild and copy the files - rsync isn't needed at all.

What's hard to ignore about all these platforms is that, your dad will never blog like this. You simply have to be a programmer to get excited about writing posts as plain markup, checking them into a VC and configuring shell scripts to publish, not to mention building the whole blog out using Python scripts and templates. So this is no threat to the world of hosted blog services and dynamically-oriented systems. But in the Python and Ruby worlds this is how we should be doing it.

Quick Mako vs. Jinja Speed Test

November 19, 2010 at 08:18 PM | Code, Mako/Pylons

Updated March 6, 2011 - Mako 0.4 has ironed out some of the bumps and is within 10% of Jinja2's speed for this test.

I'm really glad about Pyramid and all the great work Pylons + BFG is going to accomplish. Also glad that someone did a matchup against other frameworks, including Rails, and it's looking great!

Here we address this statement made by Seth (nice to meet you, Seth!):

Jinja2 was consistently around 25-50r/s faster than Mako for me

and when I read that, I said to myself, "yeah, probably, Armin wrote Jinja2 well after Mako, and probably did the same thing I did with Cheetah when I wrote Mako, ensured it was just a teeny bit faster".

As is my pessimistic nature, I made the same assumptions with ORMs a long time ago, I said "yeah OK, they focused more on speed than I did, they're probably right !" Until I went and tried it out, and saw that wasn't the case at all.

So I whipped up a quick test for this one, running the templates directly without any web frameworks involved in a timeit run of 10000 render() calls. Shrugs. My first run had Mako 18%-21% faster, but once Jinja2 was upgraded (see the edit below) the latest Jinja2 is 24% faster. The test uses Seth's exact Jinja2 templates, compared against two versions of the Mako template which duplicate the Jinja2 templates down to the newline:

classics-MacBook-Pro:mako_v_jinja classic$ python run.py
jinja2 2.5: 7.5499420166
mako 0.3.6: 6.17144298553
mako 0.3.6 using def: 5.95005702972

Edit: ah crap, forgot to upgrade Jinja2 - Armin wins !:

jinja2 2.5.5: 4.56899094582
mako 0.3.6: 6.26432800293
mako 0.3.6 using def: 6.06626796722

Update March 2011 - some of the issues have been addressed in Mako 0.4.0, and Mako is now nearly the same:

jinja2 2.5.5: 4.35861802101
mako 0.4.0: 4.83493804932
mako 0.4.0 using def: 4.82003712654
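For anyone checking the percentages against the timings above: they work out if "X% faster" is read as the reduction in total time relative to the slower run. That reading is my assumption about how the figures were derived; the percent_faster helper is just for this sketch.

```python
def percent_faster(slow, fast):
    """Percent reduction in run time, relative to the slower run."""
    return (slow - fast) / slow * 100


# First run: Mako 0.3.6 against the outdated Jinja2 2.5.
mako_vs_old_jinja = percent_faster(7.5499420166, 6.17144298553)
mako_def_vs_old_jinja = percent_faster(7.5499420166, 5.95005702972)

# After upgrading: Jinja2 2.5.5 against the faster Mako 0.3.6 variant.
jinja_vs_mako_def = percent_faster(6.06626796722, 4.56899094582)

# March 2011: Jinja2 2.5.5 against Mako 0.4.0.
jinja_vs_mako_04 = percent_faster(4.83493804932, 4.35861802101)

for label, value in [
    ("mako 0.3.6 vs jinja2 2.5", mako_vs_old_jinja),
    ("mako 0.3.6 (def) vs jinja2 2.5", mako_def_vs_old_jinja),
    ("jinja2 2.5.5 vs mako 0.3.6 (def)", jinja_vs_mako_def),
    ("jinja2 2.5.5 vs mako 0.4.0", jinja_vs_mako_04),
]:
    print("%s: %.1f%% faster" % (label, value))
```

Under that reading the first run lands in the 18%-21% range, the upgraded Jinja2 comes in a bit over 24% ahead, and the Mako 0.4.0 gap falls just under 10%.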

All three versions use a basic template inheritance setup. The first Mako test uses the traditional next.body() approach to render the "body", the second does more exactly the method used by the Jinja2 template, declaring a "block" (in Mako's case a <%def>) and then calling it as a method, i.e. self.content().

This is hardly surprising ! (My first reaction, before the Jinja2 upgrade above, was surprise.) I've hardly done anything at all with Mako speedwise in years (with one exception below) and assumed newer module-compiled systems were smoking me by now (as I've been told these guys did).

The ironic thing is that Mako got a pretty big speed boost in version 0.3.4, when we started using Armin's own MarkupSafe, the library that was written originally for Jinja specifically, to do escaping. It's written in C and is a huge improvement over the very slow cgi routine we were using. Jinja2 and Mako are almost like cousins at this point - we also use some AST utility code written by Armin.

zzzeek
