Using Beaker for Caching? Why You'll Want to Switch to dogpile.cache

April 19, 2012 at 12:01 PM | Code

Continuing on where I left off regarding Beaker in October (see Thoughts on Beaker), my new replacement for Beaker caching, dogpile.cache, has had a bunch of early releases. While I'm still considering it "alpha" until I know a few people have taken it around the block, it should be pretty much set for early testing and hopefully can be tagged as production quality in the near future.

The core of Beaker's caching mechanism is based on code I first wrote in 2005. It was adapted from what was basically my first Python program ever, a web template engine called Myghty, which in turn was based on a Perl system called HTML::Mason. The caching scenarios Beaker was designed for were primarily those of storing data in files, such as DBM files. A key assumption made at that time was that every backend would provide some way of returning a flag indicating whether or not a key was present, and that this check would precede the actual fetch of the value from the cache. Another assumption was that the lock applied to these backends to deal with the dogpile situation would be, at its most "distributed" scope, a file-based lock using flock().

When memcached support was added to Beaker, these assumptions proved to be architectural shortcomings. There is no "check for a key" function in memcached; there's only get(). Beaker's dogpile lock calls "check for a key" twice before fetching the value, and against memcached each of those checks is itself a full get() followed by an "unpickle" of a pickled object. The upshot is that, in general, Beaker pulls your object over the network and unpickles it three times on every cache hit. Users of Beaker are also well familiar with the awkward lock files Beaker insists on generating, even though there are more appropriate ways to lock for distributed caches.
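To make that cost concrete, here is a minimal sketch of a check-then-fetch pattern running against a get-only store. It is not Beaker's actual code: GetOnlyStore, has_key(), check_then_get() and single_get() are hypothetical names made up for illustration, and a plain dict stands in for memcached.

```python
import pickle


class GetOnlyStore(object):
    """Hypothetical stand-in for a memcached client: only get() and set()."""

    def __init__(self):
        self._data = {}
        self.unpickles = 0

    def set(self, key, value):
        self._data[key] = pickle.dumps(value)

    def get(self, key):
        raw = self._data.get(key)
        if raw is None:
            return None
        # every successful get() pays for a full unpickle
        self.unpickles += 1
        return pickle.loads(raw)


def has_key(store, key):
    # with no native "exists" call, a presence check is a full fetch
    return store.get(key) is not None


def check_then_get(store, key):
    """Two presence checks followed by the real fetch (illustrative only)."""
    if has_key(store, key) and has_key(store, key):
        return store.get(key)
    return None


def single_get(store, key):
    """One fetch; the caller inspects the result to detect a miss."""
    return store.get(key)


store = GetOnlyStore()
store.set("widget:2", {"id": 2})

check_then_get(store, "widget:2")
checked = store.unpickles

store.unpickles = 0
single_get(store, "widget:2")
single = store.unpickles

print("check-then-get unpickles: %d" % checked)
print("single get unpickles: %d" % single)
```

On a cache hit the check-then-get path unpickles the value three times, while the single get() unpickles it once.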

If for no other reason than these, dogpile.cache's entirely new and much simpler architecture is a vast improvement. The test program below illustrates the improvement in unpickling behavior, as well as dogpile.cache's simplified API:

class Widget(object):
    """Sample object to be cached.

    Counts pickles and unpickles.

    """
    pickles = 0
    unpickles = 0

    def __init__(self, id):
        self.id = id

    def __getstate__(self):
        Widget.pickles +=1
        return self.__dict__

    def __setstate__(self, state):
        Widget.unpickles +=1
        self.__dict__.update(state)

def test_beaker():
    from beaker import cache

    cache_manager = cache.CacheManager(cache_regions={
        'default': {
            'type': 'memcached',
            'url': '127.0.0.1:11211',
            'expiretime': 1,
            'lock_dir': '.',
            'key_length': 250
        }
    })

    @cache_manager.region("default", "some_key")
    def get_widget_beaker(id):
        return Widget(id)

    _run_test(get_widget_beaker)

def test_dogpile():
    from dogpile.cache import make_region
    from dogpile.cache.util import sha1_mangle_key

    region = make_region(key_mangler=sha1_mangle_key).configure(
        'dogpile.cache.memcached',
        expiration_time = 1,
        arguments = {
            'url':["127.0.0.1:11211"],
        },
    )

    @region.cache_on_arguments()
    def get_widget_dogpile(id):
        return Widget(id)

    _run_test(get_widget_dogpile)

def _run_test(get_widget):
    """Store an object, retrieve from the cache.

    Wait two seconds, then exercise a regeneration.

    """
    import time

    Widget.pickles = Widget.unpickles = 0

    # create and cache a widget.
    # no unpickle necessary.
    w1 = get_widget(2)

    # get it again.  one pull from cache
    # equals one unpickle needed.
    w1 = get_widget(2)

    time.sleep(2)

    # get from cache, will pull out the
    # object but also the fact that it's
    # expired (costs one unpickle).
    # newly generated object
    # cached and returned.
    w1 = get_widget(2)

    # %-formatting prints the same text under Python 2 and Python 3
    print("Total pickles: %d" % Widget.pickles)
    print("Total unpickles: %d" % Widget.unpickles)

print("beaker")
test_beaker()

print("dogpile")
test_dogpile()

Running this with a clean memcached you get:

beaker
Total pickles: 2
Total unpickles: 6
dogpile
Total pickles: 2
Total unpickles: 2

Run it a second time, so that the Widget is already in the cache. Now you get ten unpickles with Beaker compared to dogpile.cache's three:

beaker
Total pickles: 2
Total unpickles: 10
dogpile
Total pickles: 2
Total unpickles: 3

The advantages of dogpile.cache go way beyond that:

  • dogpile.cache includes distinct memcached backends for pylibmc, memcache and bmemcached. These are all explicitly available via different backend names, in contrast to Beaker's approach of deciding for you which memcached backend it wants to use.
  • A dedicated API-space for backend-specific arguments, such as all the special arguments pylibmc offers.
  • A Redis backend is provided.
  • The system of "dogpile locking" is completely modular, and in the case of memcached and Redis, a "distributed lock" option is provided which will use the "set key if not exists" feature of those backends to provide the dogpile lock. A plain threaded mutex can be specified also.
  • Cache regions and function decorators are open-ended. You can plug in your own system of generating cache keys from decorated functions, as well as whatever kind of "key mangling" you'd like to apply to keys going into the cache (such as encoding, hashing, etc.).
  • No lockfiles whatsoever unless you use the provided DBM backend; and there, you tell it exactly where to put the lockfile, or tell it to use a regular mutex instead.
  • New backends are ridiculously simple to write, and can be popped in using regular setuptools entry points or in-application using the register_backend() function.
  • Vastly simplified scope - there's no dilution of the task at hand with session, cookie, or encryption features.
  • Python 3 compatible in-place with no 2to3 step needed.
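
The "set key if not exists" locking mentioned in the list above can be sketched roughly as follows. This is an illustration of the general idea only, not dogpile.cache's implementation: FakeStore and get_or_create() are hypothetical names, an in-memory dict stands in for memcached or Redis, and the sketch leaves out things a real distributed lock needs, such as a timeout on the lock key in case its holder dies.

```python
import threading
import time


class FakeStore(object):
    """Hypothetical in-memory stand-in for memcached or Redis."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def add(self, key, value):
        """Set key only if it does not exist yet; return True on success."""
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key):
        with self._mutex:
            return self._data.get(key)

    def set(self, key, value):
        with self._mutex:
            self._data[key] = value

    def delete(self, key):
        with self._mutex:
            self._data.pop(key, None)


def get_or_create(store, key, creator, wait=0.01):
    """Return the cached value, letting only one caller run creator()."""
    lock_key = "_lock_" + key
    while True:
        value = store.get(key)
        if value is not None:
            return value
        if store.add(lock_key, 1):
            # we hold the lock: everyone else waits instead of piling on
            try:
                # re-check, since another caller may have stored the value
                # just before we acquired the lock
                value = store.get(key)
                if value is None:
                    value = creator()
                    store.set(key, value)
                return value
            finally:
                store.delete(lock_key)
        time.sleep(wait)


calls = []


def slow_creator():
    calls.append(1)
    time.sleep(0.05)
    return "widget"


store = FakeStore()
results = []
threads = [
    threading.Thread(
        target=lambda: results.append(get_or_create(store, "w", slow_creator))
    )
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("creator calls: %d" % len(calls))
print("values returned: %d" % len(results))
```

Eight threads ask for the same missing key at once, but only the one that wins add() on the lock key runs the expensive creator; the rest poll until the value appears.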

So I'm hoping we can all soon get modernized onto dogpile.cache.

dogpile.cache documentation.