Dataclasses vs namedtuple: The evolution of python code generators.

May 17, 2020

Whether you want to create concise data structures that implement important functionality behind the scenes or simply want to save time writing code, code generators have their place in everyday python development. Knowing whether to use them, and which one to use, can matter from both a code-quality and a functionality perspective.

In this article, we look at two generations of python code generators: the ‘classic’ namedtuple and the more modern dataclass.

Say we want to build the next Amazon, and our software needs to keep track of orders placed by customers so that they get delivered to the right people. We could use a simple tuple as the data structure to represent an Order, storing the name of the item ordered along with the name and address of the recipient.

order = ('Face mask', 'Anastasia', '10, Miso Street')

For the most part, you could say this gets the job done. However, there are concerns, firstly from a readability perspective: it’s not easy to tell what each item in the tuple represents just by looking at the code. Using tuples also means we always need to remember the number of items (when declaring a new order) and the index of each item (when accessing its value), as in:

>>> order[1]
'Anastasia'

As you can imagine, this isn’t quite sustainable for many scenarios and there are better ways to handle this.

Enter the namedtuple.

A namedtuple makes it possible to declare and reference items in a tuple by name. As the name implies, a namedtuple is basically a tuple that supports referencing its items by name. Rewriting the above functionality using namedtuple, we have:

from collections import namedtuple

Order = namedtuple('Order', 'name recipient address')  # Declaration of the named tuple
order = Order(name='Face mask', recipient='Anastasia', address='10, Miso Street')
>>> order.recipient
'Anastasia'

Notice that we now reference object attributes by their name. This makes for a more intuitive and logical code representation in general.
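
Since a namedtuple is still a tuple under the hood, the familiar positional access and tuple unpacking keep working alongside the new names:

>>> order[1]                          # index access still works
'Anastasia'
>>> item, recipient, address = order  # so does tuple unpacking
>>> address
'10, Miso Street'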

Next, to really appreciate the namedtuple, let’s take a look at the code it generates when the Order class is declared.

class Order(tuple):
    'Order(name, recipient, address)'

    __slots__ = ()

    _fields = ('name', 'recipient', 'address')

    def __new__(_cls, name, recipient, address):
        'Create new instance of Order(name, recipient, address)'
        return _tuple.__new__(_cls, (name, recipient, address))

    @classmethod
    def _make(cls, iterable, new=tuple.__new__, len=len):
        'Make a new Order object from a sequence or iterable'
        result = new(cls, iterable)
        if len(result) != 3:
            raise TypeError('Expected 3 arguments, got %d' % len(result))
        return result

    def __repr__(self):
        'Return a nicely formatted representation string'
        return 'Order(name=%r, recipient=%r, address=%r)' % self

    def _asdict(self):
        'Return a new OrderedDict which maps field names to their values'
        return OrderedDict(zip(self._fields, self))

    def _replace(_self, **kwds):
        'Return a new Order object replacing specified fields with new values'
        result = _self._make(map(kwds.pop, ('name', 'recipient', 'address'), _self))
        if kwds:
            raise ValueError('Got unexpected field names: %r' % kwds.keys())
        return result

    def __getnewargs__(self):
        'Return self as a plain tuple.  Used by copy and pickle.'
        return tuple(self)

    __dict__ = _property(_asdict)

    def __getstate__(self):
        'Exclude the OrderedDict from pickling'
        pass

    name = _property(_itemgetter(0), doc='Alias for field number 0')

    recipient = _property(_itemgetter(1), doc='Alias for field number 1')

    address = _property(_itemgetter(2), doc='Alias for field number 2')

Yikes.

Generating this metadata and these methods helps implement many behaviours, ranging from how an Order object is displayed when printed to how equality and sorting are handled. So it not only saves you the time of writing them yourself, it also implements functionality most people would normally not think to write.
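
A quick interactive tour shows some of that generated behaviour in action (exact output may vary slightly across python versions; _asdict, for example, returns a plain dict from python 3.8 onwards):

>>> order                                        # the generated __repr__
Order(name='Face mask', recipient='Anastasia', address='10, Miso Street')
>>> order == ('Face mask', 'Anastasia', '10, Miso Street')   # still compares like a plain tuple
True
>>> order._asdict()                              # field names mapped to values
OrderedDict([('name', 'Face mask'), ('recipient', 'Anastasia'), ('address', '10, Miso Street')])
>>> order._replace(address='11, New address')    # a brand new Order with one field changed
Order(name='Face mask', recipient='Anastasia', address='11, New address')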

PS: To view the generated class in your python interpreter, add verbose=True as an argument to your namedtuple declaration (the verbose argument was removed in python 3.7, so this only works on older versions).

It’s all well and good so far, but as you may have already noticed, there are also drawbacks to using namedtuples.

  • Firstly, as previously mentioned, they are effectively tuples and tuples are immutable. This means that items cannot be changed once they have been initialised.

    >>> order.address = '11, New address'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: can't set attribute
    
  • There are limited ways in which you can customise a namedtuple.

Dataclasses: namedtuple on steroids

Introduced in python 3.7, dataclasses are essentially mutable namedtuples with defaults. Think of them as namedtuples with a high level of customisability, including options for mutable or immutable types. On top of regular classes, dataclasses make use of two other interesting python concepts: decorators and type annotations.

Here’s the new look of our Order data structure using dataclasses.

from dataclasses import dataclass

@dataclass
class Order:
    name: str
    recipient: str
    address: str
    shipped: bool = False

The @dataclass decorator lets python know that we are declaring a dataclass and generates the needed methods and attributes, much like the namedtuple factory did. Declaring objects and referencing attributes work in a similar way. The newly added shipped attribute shows that fields can also have default values.
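
For instance, the generated __eq__ compares instances field by field, just as the namedtuple version did:

>>> Order('Face mask', 'Anastasia', '10, Miso Street') == Order('Face mask', 'Anastasia', '10, Miso Street')
True
>>> Order('Face mask', 'Anastasia', '10, Miso Street') == Order('Gloves', 'Boris', '5, Other Road')
False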

A similar typed, class-based syntax is also available for namedtuples via the typing.NamedTuple class.
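
As a quick sketch of that class-based form (the result is still an immutable tuple underneath; the examples that follow continue with the dataclass version):

from typing import NamedTuple

class Order(NamedTuple):
    name: str
    recipient: str
    address: str
    shipped: bool = False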

Now with the mutability of dataclasses, we can make changes to our object attributes.

>>> order = Order(name='Face mask', recipient='Anastasia', address='10, Miso Street')
>>> order
Order(name='Face mask', recipient='Anastasia', address='10, Miso Street', shipped=False)
>>> order.address = '11, New address'
>>> order
Order(name='Face mask', recipient='Anastasia', address='11, New address', shipped=False)

Dataclasses come with built-in provisions for customising class behaviour in several ways, including:

  • Selecting which fields are used for comparison, hashing or the repr output
  • Making objects ordered, sortable or hashable
  • Attaching metadata to individual fields

To illustrate some of these options, we have the following:

from dataclasses import dataclass, field

@dataclass(order=True, frozen=True)
class Order:
    name: str
    recipient: str
    address: str = field(compare=False, repr=False, metadata={'address_type': 'shipping'})
    shipped: bool = False
    amount: float = 0
    
    def tax_due(self) -> float:
        return self.amount * 5 / 100

Let’s take a look at the new additions one by one:

  • The new field function allows for customisation at the field level. In its most basic usage, no arguments need to be given.
  • The order=True and frozen=True dataclass arguments enable ordering and immutability respectively, for cases where you do need namedtuple-like immutable objects (see the session after this list).
  • The compare, repr and metadata arguments supply field-specific options for the address field: [1] it is excluded when comparing objects, [2] it is hidden when the object is printed, and [3] it carries an extra metadata entry, ‘address_type’, telling us what kind of address it is.
  • The tax_due method shows that a dataclass is still an ordinary class: it calculates the tax due on an order based on its purchase amount.
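
A rough interactive session with the class above (output abbreviated) shows these options working together:

>>> from dataclasses import fields
>>> a = Order(name='Face mask', recipient='Anastasia', address='10, Miso Street', amount=20)
>>> b = Order(name='Gloves', recipient='Boris', address='5, Other Road', amount=10)
>>> a                               # address is hidden because repr=False
Order(name='Face mask', recipient='Anastasia', shipped=False, amount=20)
>>> a < b                           # order=True generates the comparison methods
True
>>> a.tax_due()
1.0
>>> a.address = 'somewhere else'    # frozen=True makes instances immutable
Traceback (most recent call last):
  ...
dataclasses.FrozenInstanceError: cannot assign to field 'address'
>>> {f.name: f.metadata for f in fields(a) if f.metadata}
{'address': mappingproxy({'address_type': 'shipping'})}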

This ability to quite easily plug in (and pull out) class and field properties to suit specific scenarios is the power of dataclasses in a nutshell.

Even though code generators are already a major enabler of libraries like object-relational mappers (ORMs), they can also be applied to much more common, everyday use cases we face when developing python solutions. It is, however, important to discern scenarios where they may be overkill or simply not suited to a specific use case. For the most part, as demonstrated in this article, they are a good option to have when looking to write more concise code with less ‘boilerplate’ while focusing on the business logic of your application.
