Kent's Korner
by Kent S Johnson

2008-08-21 20:16:52

collections.defaultdict

defaultdict is a little-known gem tucked away in the collections module of the standard library.

A defaultdict is a dict with a default value - a value that will be returned for missing keys. When you create a defaultdict, you provide a factory function. This is a function with no arguments that is called to create a default value as needed.

For example, the int() function, called with no arguments, returns the integer 0. It can be used as the factory for a dict with default values of 0:

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> d[0]
0
>>> d[1] = 3
>>> d
defaultdict(<type 'int'>, {0: 0, 1: 3})

Notice that there are no parentheses () after int. You are passing the function object to defaultdict(). If you write defaultdict(int()) you will pass the number 0 to defaultdict() which raises TypeError.

defaultdict is useful whenever you want a dictionary that accumulates values. For example, one way to count words in a file is this:

words = open('words.txt').read().split()
counts = dict()
for word in words:
  if word in counts:
    counts[word] += 1
  else:
    counts[word] = 1

With defaultdict, this program becomes

from collections import defaultdict
words = open('words.txt').read().split()
counts = defaultdict(int)
for word in words:
  counts[word] += 1

By using defaultdict, there is no need to check whether the word is in the counts dictionary or not.

Another common requirement is to map a key to a list of values. Using list as the value factory, this is simple. Suppose items is a list of key, value pairs where keys may repeat. To create a dict mapping each unique key to a list of corresponding values, use defaultdict(list):

d = defaultdict(list)
for key, value in items:
  d[key].append(item)

Again, there is no need to check for the existence of the key. This could also be written using dict.setdefault() but the defaultdict version is much easier to understand. Compare:

d = dict()
for key, value in items:
  d.setdefault(key, []).append(item)

Recursive defaultdict

Here is one way to create a defaultdict whose values are also defaultdicts, nested arbitrarily deep:

class recursivedefaultdict(defaultdict):
  def __init__(self):
    self.default_factory = type(self)

For a few alternatives see these threads.

 
© Kent S Johnson Creative Commons License

Short, introductory essays about Python.

kentsjohnson.com

Kent's Korner home

Weblog

Other Essays