Selecting Booleans

September 09, 2008 at 01:29 PM | Code, SQLAlchemy

Ticket 798, allow conjunctions to act as column elements, is complete after a long wait, finally attended to for the great reason that I suddenly needed this feature myself ;). A few tweaks and we can now talk about asking yes/no questions of our database.

As we head into the 0.5 era of SQLAlchemy, one theme is that if we're using the ORM, we're going to need select() constructs a lot less, if at all. Most things can now be done using Query objects. The other day, I needed to ask my database a question regarding if some particular data were available, and that's it. The query needed to be as fast as possible so I wanted to use EXISTS, so that the database needs to only access the first row of the selected rows in order to give an answer. Additionally, I needed to ask this question of a few different sets of criterion, which could be most efficiently achieved by combining them together into a single statement using OR.

Using the current trunk of SQLAlchemy 0.5, we can use Query() against boolean values:

from sqlalchemy import *
from sqlalchemy.orm import *
engine = create_engine('sqlite://', echo=True)
Session = scoped_session(sessionmaker(bind=engine))

true, false = literal(True), literal(False)

(ret, ), = Session.query(true)
print ret

which returns:

True

The question we just asked our SQLite database is SELECT 1, where "1" is SQLite's way of representing true (in Postgres it's true, in MySQL it can be 1 or true. Using literal(True) means we don't have to worry about this). SQLite tells us, "yes, True is True". Note the way I'm getting the row out of Query. At the moment that seems to be a fun way to go, but you could also say:

ret, = Session.query(true).one()

and of course:

ret = Session.query(true).one()[0]

We can also ask our database about boolean values combined with OR and AND:

>>> (ret, ), = Session.query(or_(true, false))
>>> ret
True

>>> (ret, ), = Session.query(and_(true, false))
>>> ret
False

To demonstrate EXISTS, let's build a table. We're using declarative, which is pretty much the only way I roll these days (hey, better late than never....):

from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Keyword(Base):
    __tablename__ = "keyword"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))

Base.metadata.create_all(engine)

One way we could find out if our Keyword table has any of a certain set of keywords, is to do something familiar and ask for a count:

keywords = ["beans", "lentils", "legumes"]
(ret, ), = Session.query(func.count(Keyword.id)).filter(Keyword.name.in_(keywords))

But to eliminate the need for the database to actually count the full set of rows, we can ask it just if any rows at all exist, using the exists() construct:

(ret, ), = Session.query(exists().where(Keyword.name.in_(keywords)))

Which issues the SQL:

SELECT EXISTS (SELECT * FROM keyword WHERE keyword.name IN (?, ?, ?)) AS anon_1

The return value is boolean, indicating True for rows were found, False for no rows were found. In my case, I was querying among a bunch of tables. Let's suppose we have a database which represents cookbooks and individual recipes, both of which have lists of keywords associated. Let's declare that:

recipe_keywords = Table("recipe_keyword", Base.metadata,
                    Column("recipe_id", Integer, ForeignKey("recipe.id")),
                    Column("keyword_id", Integer, ForeignKey("keyword.id"))
                    )

cookbook_keywords = Table("cookbook_keyword", Base.metadata,
                    Column("cookbook_id", Integer, ForeignKey("cookbook.id")),
                    Column("keyword_id", Integer, ForeignKey("keyword.id"))
                    )

class Recipe(Base):
    __tablename__ = "recipe"
    id = Column(Integer, primary_key=True)
    description = Column(Text)
    keywords = relation(Keyword, secondary=recipe_keywords)

class Cookbook(Base):
    __tablename__ = "cookbook"
    id = Column(Integer, primary_key=True)
    description = Column(Text)
    keywords = relation(Keyword, secondary=cookbook_keywords)

Base.metadata.create_all(engine)

The above schema defines a recipe and a cookbook table, each of which relate to keyword via the recipe_keyword or cookbook_keyword association tables, respectively.

The question of "Do any recipes feature any of our above three keywords?" can be answered by:

(ret, ), = Session.query(
    exists().where(Recipe.id==recipe_keywords.c.recipe_id).
            where(recipe_keywords.c.keyword_id==Keyword.id).
            where(Keyword.name.in_(keywords))
)

Above we ask "do any rows exist, where the ID of the recipe table matches the recipe ID of the recipe_keywords table, and the keyword ID of the recipe_keywords table matches the ID of a keyword table row, and the name of the keyword is in the list ["beans", "lentils", "legumes"]". SQL is:

SELECT EXISTS (SELECT *
FROM recipe, recipe_keyword, keyword
WHERE (recipe.id = recipe_keyword.recipe_id AND recipe_keyword.keyword_id = keyword.id)
AND keyword.name IN (?, ?, ?)) AS anon_1

Spelling out the full join from Recipe to Keyword above is a little bit verbose. We have the option to let SQLAlchemy create this for us using the Recipe.keywords relation already set up, using the ORM level join() function, which provides direct access to the ORM's join algorithm:

from sqlalchemy.orm import join
(ret, ), = Session.query(
        exists().select_from(join(Recipe, Keyword, Recipe.keywords)).
                where(Keyword.name.in_(keywords))
)

This generates:

SELECT EXISTS (SELECT *
FROM recipe JOIN recipe_keyword AS recipe_keyword_1 ON
recipe.id = recipe_keyword_1.recipe_id JOIN keyword ON
keyword.id = recipe_keyword_1.keyword_id
WHERE keyword.name IN (?, ?, ?)) AS anon_1

We now know how to ask the database if it has any recipe rows which relate to a given set of keyword names. To ask the database if it has any cookbook rows or recipe rows matching our keywords at the same time, we can double up on exists() clauses using or_():

(ret, ), = Session.query(or_(
        exists().select_from(join(Recipe, Keyword, Recipe.keywords)).
                where(Keyword.name.in_(keywords)),

        exists().select_from(join(Cookbook, Keyword, Cookbook.keywords)).
            where(Keyword.name.in_(keywords))
))

At this point, our eyes begin to glaze over when viewing the SQL itself. But that's fine; just remember that whenever this happens, somewhere in the world another "ORM's aren't worth it" blog post fades away:

SELECT ((EXISTS (SELECT *
FROM recipe JOIN recipe_keyword AS recipe_keyword_1 ON recipe.id = recipe_keyword_1.recipe_id
JOIN keyword ON keyword.id = recipe_keyword_1.keyword_id
WHERE keyword.name IN (?, ?, ?))) OR (EXISTS (SELECT *
FROM cookbook JOIN cookbook_keyword AS cookbook_keyword_1 ON
cookbook.id = cookbook_keyword_1.cookbook_id JOIN keyword ON
keyword.id = cookbook_keyword_1.keyword_id
WHERE keyword.name IN (?, ?, ?)))) AS anon_1

Those readers who are familiar with some of SQLA's advanced Query operators might recognize that the above EXISTS queries look a lot like an any() expression. This is because, they are! Though in this case, almost. As a review, an any() expression creates an EXISTS clause which is correlated with the enclosing query, such as:

rows = Session.query(Recipe).filter(Recipe.keywords.any(Keyword.name.in_(keywords))).all()

This query produces a regular SELECT from the recipe table, then embeds a correlated EXISTS inside the WHERE clause:

SELECT recipe.id AS recipe_id, recipe.description AS recipe_description
FROM recipe
WHERE EXISTS (SELECT 1
FROM recipe_keyword, keyword
WHERE (recipe.id = recipe_keyword.recipe_id AND keyword.id = recipe_keyword.keyword_id) AND
keyword.name IN (?, ?, ?))

When any() is used, it explicitly states that the exists() clause should correlate() to the recipe table, which allows it to work in scenarios where SQLAlchemy's usual "auto" correlation cannot make the right decision (we have tests which illustrate this). So to use any() in our "column-based" approach, we just need to turn off that correlation using correlate(None). Our "recipes/cookbooks which exist" query can be stated as:

(ret, ), = Session.query(or_(
    Recipe.keywords.any(Keyword.name.in_(keywords)).correlate(None),
    Cookbook.keywords.any(Keyword.name.in_(keywords)).correlate(None)
    ))

While the any() approach above is nice, the correlation part of it has me preferring the more explicit exists() version.