In the most general sense an enumeration is an exact listing of all the
elements of a set. In software design, enums are typically sets of fixed
string values that define some kind of discriminating value within an
application. In contrast to a generic "dropdown" list, such as a selection of
timezones, country names, or years in a date picker, the enum usually refers
to values that are also explicit within the application's source code, such as
"debit" or "credit" in an accounting application, "draft" or "publish" in a
CMS, "everyone", "friends of friends", or "friends only" in your typical
social media sell-your-details-to-the-highest-bidder system. Differing values
have a direct impact on business logic. Adding new values to the list usually
corresponds with the addition of some new logic in the application to
accommodate its meaning.
The requirements for an application-level enumeration are usually:
- Can represent a single value within application logic with no chance of specifying
a non-existent value (i.e., we don't want to hardcode strings or numbers).
- Can associate each value with a textual description suitable for a user interface.
- Can get the list of all possible values, usually for user interface display.
- Can efficiently associate the discriminatory value with many database records.
Representing an enumerated value in a relational database often goes like this:
CREATE TABLE employee_type (
id INTEGER PRIMARY KEY,
description VARCHAR(30) NOT NULL
);
CREATE TABLE employee (
id INTEGER PRIMARY KEY,
name VARCHAR(60) NOT NULL,
type INTEGER REFERENCES employee_type(id)
);
INSERT INTO employee_type (id, description) VALUES
(1, 'Part Time'),
(2, 'Full Time'),
(3, 'Contractor');
Above we use the example of a database of employees and their status. Advantages
of the above include:
- The choice of "employee type" is constrained.
- The textual descriptions of employees are associated with the constrained value.
- New employee types can be added just by adding a new row.
- Queries can be written directly against the data that produce textual displays
of discriminator values, without leaving the database console.
But as we all know this approach also has disadvantages:
- It's difficult to avoid hardcoding integer IDs in our application. Adding a character
based "code" field to the employee_type table, even making the character
field the primary key, can ameliorate this, but this is not information that would
otherwise be needed in the database. Our DBAs also got grumpy when we proposed
a character-based primary key.
- To display choices in dropdowns, as well as to display the textual description of
the value associated with a particular piece of data, we need to query the
database for the text - either by loading them into an in-memory lookup
ahead of time, or by joining to the lookup table when we query the base
table. This adds noise and boilerplate to the application.
- For each new data-driven enumerative type used by the application, we need to add
a new table, and populate.
- When the descriptive names change, we have to update the database, tying database
migration work to what would normally be a user-interface-only update.
- Whatever framework we build around these lookup tables, doesn't really work
for enumerations that don't otherwise need to be persisted.
- If we moved to a non-relational database, we'd probably do this completely
differently.
Basically, this approach is tedious and puts information about the enum further
away from the application code than we'd prefer.
An alternative to the lookup table is to use a database supplied enumeration. Both MySQL
and Postgresql (as of 8.3) offer an ENUM type for this purpose. It's
fairly straightforward to create an approximation of an ENUM datatype in most databases by using
a CHAR column in conjunction with a CHECK constraint, that tests incoming rows to be
within one of a set of possible values.
SQLAlchemy provides an Enum type which abstracts this technique:
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String, Enum
Base = declarative_base()
class Employee(Base):
__tablename__ = 'employee'
id = Column(Integer, primary_key=True)
name = Column(String(60), nullable=False)
type = Column(Enum('part_time', 'full_time', 'contractor', name='employee_types'))
On backends that support ENUM, a metadata.create_all()
emits the appropriate DDL to generate the type. The 'name' field of the Enum is used as the
name of the type created in PG:
CREATE TYPE employee_types AS ENUM ('part_time','full_time','contractor')
CREATE TABLE employee (
id SERIAL NOT NULL,
name VARCHAR(60) NOT NULL,
type employee_types,
PRIMARY KEY (id)
)
On those that don't, it emits a VARCHAR datatype and additionally
emits DDL to generate an appropriate CHECK constraint. Here, the 'name' field
is used as the name of the constraint:
CREATE TABLE employee (
id INTEGER NOT NULL,
name VARCHAR(60) NOT NULL,
type VARCHAR(10),
PRIMARY KEY (id),
CONSTRAINT employee_types CHECK (type IN ('part_time', 'full_time', 'contractor'))
)
In the case of PG's native ENUM, we're using the same space as a regular
integer (four bytes on PG). In the case of CHAR/VARCHAR, keeping the size of
the symbols down to one or two characters should keep the size under four bytes
(database-specific overhead and encoding concerns may vary results).
To combine the ENUM database type with the other requirements of source-code level identification
and descriptive naming, we'll encapsulate the whole thing into a base class
that can be used to generate all kinds of enums:
class EnumSymbol(object):
"""Define a fixed symbol tied to a parent class."""
def __init__(self, cls_, name, value, description):
self.cls_ = cls_
self.name = name
self.value = value
self.description = description
def __reduce__(self):
"""Allow unpickling to return the symbol
linked to the DeclEnum class."""
return getattr, (self.cls_, self.name)
def __iter__(self):
return iter([self.value, self.description])
def __repr__(self):
return "<%s>" % self.name
class EnumMeta(type):
"""Generate new DeclEnum classes."""
def __init__(cls, classname, bases, dict_):
cls._reg = reg = cls._reg.copy()
for k, v in dict_.items():
if isinstance(v, tuple):
sym = reg[v[0]] = EnumSymbol(cls, k, *v)
setattr(cls, k, sym)
return type.__init__(cls, classname, bases, dict_)
def __iter__(cls):
return iter(cls._reg.values())
class DeclEnum(object):
"""Declarative enumeration."""
__metaclass__ = EnumMeta
_reg = {}
@classmethod
def from_string(cls, value):
try:
return cls._reg[value]
except KeyError:
raise ValueError(
"Invalid value for %r: %r" %
(cls.__name__, value)
)
@classmethod
def values(cls):
return cls._reg.keys()
Where above, DeclEnum is the public interface. There's a bit of fancy pants
stuff in there, but here's what it looks like in usage.
We build an EmployeeType class, as a subclass of DeclEnum, that has all the
things we want at once, with zero of anything else:
class EmployeeType(DeclEnum):
part_time = "part_time", "Part Time"
full_time = "full_time", "Full Time"
contractor = "contractor", "Contractor"
If we're trying to save space on a non-ENUM platform, we might use single
character values:
class EmployeeType(DeclEnum):
part_time = "P", "Part Time"
full_time = "F", "Full Time"
contractor = "C", "Contractor"
Our application references individual values using the class level symbols:
employee = Employee(name, EmployeeType.part_time)
# ...
if employee.type is EmployeeType.part_time:
# do something with part time employee
These symbols are global constants, hashable, and even pickleable,
thanks to the special __reduce__ above.
To get at value/description pairs for a dropdown, we can iterate the class as well
as the symbols themselves to get 2-tuples:
>>> for key, description in EmployeeType:
... print key, description
P Part Time
F Full Time
C Contractor
To convert from a string value, as passed to us in a web request, to an EmployeeType symbol,
we use from_string():
type = EmployeeType.from_string('P')
The textual description is always available directly from the symbol itself:
print EmployeeType.contractor.description
So we have application level constants, textual descriptions, and iteration.
The last step is persistence. We'll use SQLAlchemy's TypeDecorator to
augment the Enum() type such that it can read and write our custom values:
from sqlalchemy.types import SchemaType, TypeDecorator, Enum
import re
class DeclEnumType(SchemaType, TypeDecorator):
def __init__(self, enum):
self.enum = enum
self.impl = Enum(
*enum.values(),
name="ck%s" % re.sub(
'([A-Z])',
lambda m:"_" + m.group(1).lower(),
enum.__name__)
)
def _set_table(self, table, column):
self.impl._set_table(table, column)
def copy(self):
return DeclEnumType(self.enum)
def process_bind_param(self, value, dialect):
if value is None:
return None
return value.value
def process_result_value(self, value, dialect):
if value is None:
return None
return self.enum.from_string(value.strip())
The idea of TypeDecorator, for those who haven't worked with it, is to provide
a wrapper around a plain database type to provide additional marshaling behavior
above what we need just to get consistency from the DBAPI. The impl datamember
refers to the type being wrapped. In this case, DeclEnumType generates
a new Enum object using information from a given DeclEnum subclass. The name
of the enum is derived from the name of our class, using the world's
shortest camel-case-to-underscore converter.
The addition of SchemaType as well as the _set_table() method
represent a little bit of inside knowledge about the sqlalchemy.types
module. TypeDecorator currently does not automatically figure out from its
impl that it needs to export additional functionality related to the
generation of the CHECK constraint and/or the CREATE TYPE. SQLAlchemy will try
to improve upon this at some point.
We can nicely wrap the creation of DeclEnumType into our DeclEnum via a new class
method:
class DeclEnum(object):
"""Declarative enumeration."""
# ...
@classmethod
def db_type(cls):
return DeclEnumType(cls)
So the full declaration and usage of our type looks like:
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
class EmployeeType(DeclEnum):
part_time = "P", "Part Time"
full_time = "F", "Full Time"
contractor = "C", "Contractor"
Base = declarative_base()
class Employee(Base):
__tablename__ = 'employee'
id = Column(Integer, primary_key=True)
name = Column(String(60), nullable=False)
type = Column(EmployeeType.db_type())
Our Employee class will persist its 'type' field into a new ENUM on the
database side, and on the Python side we use exclusively EmployeeType.part_time,
EmployeeType.full_time, EmployeeType.contractor as values for the
'type' attribute.
The enum is also ideal for so-called polymorphic-discriminators, where different
values indicate the usage of different subclasses of Employee:
class Employee(Base):
__tablename__ = 'employee'
id = Column(Integer, primary_key=True)
name = Column(String(60), nullable=False)
type = Column(EmployeeType.db_type())
__mapper_args__ = {'polymorphic_on':type}
class PartTimeEmployee(Employee):
__mapper_args__ = {'polymorphic_identity':EmployeeType.part_time}
TypeDecorator also takes care of coercing Python values used in expressions into the
appropriate SQLAlchemy type, so that the constants are usable in queries:
session.query(Employee).filter_by(type=EmployeeType.contractor).all()
A runnable demo of the enumeration recipe is packed up at decl_enum.py