Reducing boilerplate with Python dataclasses

I used to think of dataclasses as something like a fancy namedtuple. To borrow an example from the Python docs, I might do this


from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float
    

rather than this


from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
    

But it turns out dataclasses are a lot more useful than that. Rather than "fancy namedtuples", dataclasses are more like "fancy classes". They can

All at the same time! Here's a simplified Book written as a standard class.


class Book:
    def __init__(self, title, author, year=2022):
        self.title = title
        self.author = author
        self.year = year

    def __repr__(self):
        return f"Book(title={self.title!r}, author={self.author!r}, year={self.year!r})"

    def cite(self):
        return f"{self.author}, {self.title} ({self.year})"
    

That works. But there's a chunk of boilerplate to give the Book instance its data and get a nice string representation. Only the cite method has logic that's particularly specific to the Book class. (The attribute names reflect Book characteristics, like title, but nothing uniquely book-like happens in the __init__ or __repr__.)

Worse, this class can't be used in some basic ways. Python doesn't know how to compare Book instances, so it doesn't recognise when one book is the same as another and it can't sort a list of books.


book = Book("Histories", "Herodotus", -430)
identical_book = Book("Histories", "Herodotus", -430)
book == identical_book
# False

book2 = Book("The History of England", "Thomas Macaulay", 1848)
sorted([book, book2])
# TypeError: '<' not supported between instances of 'Book' and 'Book'
      

The Book class has a __repr__ method, allowing the use of both str(book) and repr(book). But it doesn't have other dunder (double underscore) methods that support operators like == and <. In other words, a little more boilerplate is needed.


from functools import total_ordering

@total_ordering
class Book:
    # Methods hidden for brevity
    # __init__, __repr__, cite
  
    def __eq__(self, other):
        # If both objects are books, check if all values are equal
        # If other object is not a Book, return NotImplemented so Python
        # falls back to its default comparison (which gives False for ==)
        if other.__class__ is self.__class__:
            return (self.title, self.author, self.year) == (other.title, other.author, other.year)
        return NotImplemented
    
    def __gt__(self, other):
        # If both objects are books, check if values of the book on the left are 
        # greater than values on the right
        if other.__class__ is self.__class__:
            return (self.title, self.author, self.year) > (other.title, other.author, other.year)
        return NotImplemented
    
      

Book instances can now be compared and sorted, thanks to the __eq__ (equality) and __gt__ (greater than) dunder methods and the total_ordering decorator. The decorator isn't essential; it reduces repetition by letting the programmer define __eq__ and one other comparison dunder method (__gt__, in this case). Without total_ordering, it would be necessary to add the other comparison dunder methods: __ge__, __lt__ and __le__.
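As a minimal sketch of what the decorator fills in, the class below writes only __eq__ and __gt__ by hand and then uses the other comparison operators anyway. The _key helper is my own shorthand for illustration and isn't part of the class shown above.

```python
from functools import total_ordering


@total_ordering
class Book:
    def __init__(self, title, author, year=2022):
        self.title = title
        self.author = author
        self.year = year

    def _key(self):
        # Hypothetical helper: the tuple of values used for comparisons
        return (self.title, self.author, self.year)

    def __eq__(self, other):
        if other.__class__ is self.__class__:
            return self._key() == other._key()
        return NotImplemented

    def __gt__(self, other):
        if other.__class__ is self.__class__:
            return self._key() > other._key()
        return NotImplemented


histories = Book("Histories", "Herodotus", -430)
england = Book("The History of England", "Thomas Macaulay", 1848)

# Only __eq__ and __gt__ were written by hand; total_ordering derives the rest
print(histories < england)   # True
print(histories <= england)  # True
print(england >= histories)  # True
print([b.title for b in sorted([england, histories])])
# ['Histories', 'The History of England']

# Comparing with a non-Book: == quietly gives False, > raises a TypeError
print(histories == "Histories")  # False
```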

A dataclass provides this behaviour with even less effort. The dataclass and the standard class below work identically. They have the same init behaviour, same string representation, even the same comparison and sort behaviour. But the dataclass takes care of the boilerplate.


from dataclasses import dataclass

@dataclass(order=True)
class Book:
    title: str
    author: str
    year: int = 2022

    def cite(self):
        return f"{self.author}, {self.title} ({self.year})"
    

from functools import total_ordering

@total_ordering
class Book:
    def __init__(self, title, author, year=2022):
        self.title = title
        self.author = author
        self.year = year

    def __repr__(self):
        return f"Book(title={self.title!r}, author={self.author!r}, year={self.year!r})"
    
    def __eq__(self, other):
        if other.__class__ is self.__class__:
            return (self.title, self.author, self.year) == (other.title, other.author, other.year)
        return NotImplemented

    def __gt__(self, other):
        if other.__class__ is self.__class__:
            return (self.title, self.author, self.year) > (other.title, other.author, other.year)
        return NotImplemented

    def cite(self):
        return f"{self.author}, {self.title} ({self.year})"
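
To check the dataclass version against the earlier examples, here is a short sketch that repeats the equality and sorting tests from above. The expected results in the comments assume the dataclass exactly as written in this post.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Book:
    title: str
    author: str
    year: int = 2022

    def cite(self):
        return f"{self.author}, {self.title} ({self.year})"


book = Book("Histories", "Herodotus", -430)
identical_book = Book("Histories", "Herodotus", -430)
book2 = Book("The History of England", "Thomas Macaulay", 1848)

print(book == identical_book)  # True: the generated __eq__ compares field values
print(sorted([book2, book]) == [book, book2])  # True: order=True makes books sortable
print(book)  # Book(title='Histories', author='Herodotus', year=-430)
print(Book("Untitled", "Anonymous").year)  # 2022, the default value
```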
    

Dataclasses allow some simple customisation. In the example above, order=True is used to add comparison behaviour to the class. By default, dataclasses only provide __init__, __repr__ and __eq__. (In other words, that's what you get when using the dataclass decorator without any arguments.) Behaviours can be turned off by passing the relevant argument to the decorator, e.g. repr=False, or, in some cases, by defining the method yourself, e.g. adding your own __repr__ to the class.


from dataclasses import dataclass

# Default settings (automatic __init__, __repr__, __eq__)
@dataclass
class Book:
    title: str
    author: str
    year: int = 2022


# Don't create __repr__ method
@dataclass(repr=False)
class Book:
    title: str
    author: str
    year: int = 2022


# Use custom __repr__ method
@dataclass
class Book:
    title: str
    author: str
    year: int = 2022

    def __repr__(self):
        return f"Book: {self.title}"
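
Here's a rough sketch of how those settings play out. The class names BookNoRepr and BookCustomRepr are made up so that both variants can live in the same file.

```python
from dataclasses import dataclass


@dataclass(repr=False)
class BookNoRepr:  # hypothetical name, for illustration
    title: str
    author: str
    year: int = 2022


@dataclass
class BookCustomRepr:  # hypothetical name, for illustration
    title: str
    author: str
    year: int = 2022

    def __repr__(self):
        return f"Book: {self.title}"


no_repr = BookNoRepr("Histories", "Herodotus", -430)
custom = BookCustomRepr("Histories", "Herodotus", -430)

# With repr=False, Python falls back to the default object representation,
# something like <__main__.BookNoRepr object at 0x...>
print(repr(no_repr))

# A hand-written __repr__ is kept rather than replaced
print(repr(custom))  # Book: Histories

# The other default behaviours are untouched, e.g. __eq__ is still generated
print(no_repr == BookNoRepr("Histories", "Herodotus", -430))  # True
```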
    

After a certain point, it's worth implementing the dunder methods manually rather than tweaking and overriding dataclass behaviour. For example, the preferred ordering of Books might not be title, then author, then year. Perhaps ordering should be by author first. Or perhaps the author shouldn't be included in the ordering at all. In cases like those, custom dunder methods are still needed.
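As one possible sketch of the "author first" case, the dataclass below keeps its generated __init__, __repr__ and __eq__ but gets a hand-written __lt__, with total_ordering supplying the remaining comparisons. The choice of author, then title, then year is my assumption about what such an ordering might look like.

```python
from dataclasses import dataclass
from functools import total_ordering


@total_ordering
@dataclass  # order is left at its default (False) so __lt__ can be written by hand
class Book:
    title: str
    author: str
    year: int = 2022

    def __lt__(self, other):
        # Assumed preference: compare by author first, then title, then year
        if other.__class__ is self.__class__:
            return (self.author, self.title, self.year) < (other.author, other.title, other.year)
        return NotImplemented


books = [
    Book("The History of England", "Thomas Macaulay", 1848),
    Book("Histories", "Herodotus", -430),
]
print([b.author for b in sorted(books)])  # ['Herodotus', 'Thomas Macaulay']
```

Passing order=True here as well would raise a TypeError when the class is created, because the dataclass decorator refuses to overwrite a comparison method that the class already defines.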

Dataclasses do a lot, though, and they're a great way to add Pythonic behaviour to classes.

Useful things

For more info about dataclasses, try Eric V. Smith's proposal for adding them to Python, PEP 557 -- Data Classes. It's excellent and I wish I'd read it sooner.

Dunder methods (a.k.a. magic methods) can add a lot more Pythonic behaviour to custom classes, e.g. letting them behave like Python lists with indexing, slicing and looping. I like Rafe Kettler's Guide to Python's Magic Methods. A few things are specific to Python 2, but they're noted in the short appendix and aren't likely to be relevant when you're starting out. In my experience, it's easiest to start with "Comparison magic methods" and then move onto "Making custom sequences", which are things that behave like lists.
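To give a flavour of the sequence side, here is a small sketch of a list-like class. Library is a hypothetical class invented for this example, and it only covers __len__ and __getitem__ rather than everything a full custom sequence would need.

```python
class Library:  # hypothetical class, for illustration only
    def __init__(self, titles):
        self._titles = list(titles)

    def __len__(self):
        return len(self._titles)

    def __getitem__(self, index):
        # Delegating to the inner list handles both integer indexes and slices
        result = self._titles[index]
        return Library(result) if isinstance(index, slice) else result


library = Library(
    ["Histories", "The History of England", "History of the Peloponnesian War"]
)

print(library[0])        # Histories
print(len(library[1:]))  # 2: slicing returns a smaller Library

# Looping works too: Python falls back to calling __getitem__ with 0, 1, 2, ...
# until an IndexError is raised
for title in library:
    print(title)
```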