2008-08-21 20:16:52
collections.defaultdict
defaultdict is a little-known gem tucked away in the collections module of the standard library.
A defaultdict is a dict with a default value - a value that will be returned for missing keys. When you create a defaultdict, you provide a factory function. This is a function with no arguments that is called to create a default value as needed.
For example, the int() function, called with no arguments, returns the integer 0. It can be used as the factory for a dict with default values of 0:
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> d[0]
0
>>> d[1] = 3
>>> d
defaultdict(<type 'int'>, {0: 0, 1: 3})
Notice that there are no parentheses () after int. You are passing the function object to defaultdict(). If you write defaultdict(int()) you will pass the number 0 to defaultdict() which raises TypeError.
defaultdict is useful whenever you want a dictionary that accumulates values. For example, one way to count words in a file is this:
words = open('words.txt').read().split()
counts = dict()
for word in words:
if word in counts:
counts[word] += 1
else:
counts[word] = 1
With defaultdict, this program becomes
from collections import defaultdict
words = open('words.txt').read().split()
counts = defaultdict(int)
for word in words:
counts[word] += 1
By using defaultdict, there is no need to check whether the word is in the counts dictionary or not.
Another common requirement is to map a key to a list of values. Using list as the value factory, this is simple. Suppose items is a list of key, value pairs where keys may repeat. To create a dict mapping each unique key to a list of corresponding values, use defaultdict(list):
d = defaultdict(list)
for key, value in items:
d[key].append(item)
Again, there is no need to check for the existence of the key. This could also be written using dict.setdefault() but the defaultdict version is much easier to understand. Compare:
d = dict()
for key, value in items:
d.setdefault(key, []).append(item)
Here is one way to create a defaultdict whose values are also defaultdicts, nested arbitrarily deep:
class recursivedefaultdict(defaultdict):
def __init__(self):
self.default_factory = type(self)
For a few alternatives see these threads.
|