Kent's Korner
by Kent S Johnson

2007-07-28 17:24:44

Sorting

Python has powerful sorting capabilities built-in. Lists can be sorted using their sort() method:

>>> data = [('Kent', 1), ('Tim', 2), ('Bill', 3), ('Alex', 4), ('Mark', 5), ('Bill', 6)]
>>> data.sort()
>>> data
[('Alex', 4), ('Bill', 3), ('Bill', 6), ('Kent', 1), ('Mark', 5), ('Tim', 2)]

list.sort() sorts the list according to the natural order of the elements of the list. In this case the elements are tuples and they are sorted according to the first elements, with the second element as a tie-breaker.

Note that list.sort() is a mutating method - the list is sorted in place - and it does not return a useful value (it returns None).

Sorting with a comparison function

What if you want to sort in a different order than the natural order of the list elements? There are several options. The one beginners seem to find is the cmp parameter to sort(). From the docs:

cmp specifies a custom comparison function of two arguments (list items) which should return a negative, zero or positive number depending on whether the first argument is considered smaller than, equal to, or larger than the second argument.

This is fairly easy to understand and similar to what is available in other languages such as C and Java, but it is the wrong way to customize a sort in Python. The cmp function will be called many times by the sort, with each call fairly expensive as the function is written in Python. I won't say any more about cmp except to repeat - don't use it.

Decorate-Sort-Undecorate

Until Python 2.4, the best way to customize a sort is with an idiom called Decorate-Sort-Undecorate. Another name for the idiom is Schwartzian transform, named after Randal Schwartz who popularized the technique in the perl community.

DSU involves three steps. In the Decorate step, the elements of the original list are decorated with the portions of the element that are needed for the desired sort. For example, to sort the data list according to the second element of each tuple, a decorated list is created like this:

>>> deco = [ (item[1], item) for item in data]
>>> deco
[(4, ('Alex', 4)), (3, ('Bill', 3)), (6, ('Bill', 6)), (1, ('Kent', 1)), (5, ('Mark', 5)), (2, ('Tim', 2))]

If you want the decorated sort to be stable, or you want to avoid comparing the original data items, you can include the index of the item as a tie-breaker:

[ (item[1], i, item) for i, item in enumerate(data) ]

In the second step, the decorated list is sorted. This gives the data in the correct order:

>>> deco.sort()
>>> deco
[(1, ('Kent', 1)), (2, ('Tim', 2)), (3, ('Bill', 3)), (4, ('Alex', 4)), (5, ('Mark', 5)), (6, ('Bill', 6))]

Finally the decoration is removed - Undecorate - to give the original data in the desired order:

>>> data = [item[1] for item in deco ]
>>> data
[('Kent', 1), ('Tim', 2), ('Bill', 3), ('Alex', 4), ('Mark', 5), ('Bill', 6)]

Sorting with a key function

Python 2.4 added a streamlined method to accomplish a DSU. The sort() method now has a key parameter. From the docs:

key specifies a function of one argument that is used to extract a comparison key from each list element.

To sort a list by the second element of each list item, the key would be lambda x: x[1]. The sort is done with

data.sort(key=lambda x: x[1])

The operator module

Key functions like lambda x: x[1] or lambda x: x.a are common and useful enough that Python has a way to create high-performance versions. This is done with the functions operator.itemgetter() and operator.attrgetter(). These two functions accept an argument of an index or attribute name, and return a function that will access that index or attribute from its argument. For example,

>>> import operator
>>> k = operator.itemgetter(1)
>>> k( ('Kent', 1) )
1
>>> [k(item) for item in data ]
[1, 2, 3, 4, 5, 6]

This function can be used as a sort key:

>>> data.sort(key=k)

More commonly, itemgetter() is invoked directly in the call to sort():

>>> data.sort(key=operator.itemgetter(1))

operator.attrgetter() is used similarly to fetch object attributes.

Sorting on multiple keys

To sort a list of items with primary and secondary sort keys, provide a key function that returns a tuple of keys. In Python 2.5, both itemgetter() and attrgetter() accept multiple arguments and return functions that create the correct tuples. For earlier versions you have to write your own (simple) function.

Final notes

sort() is order-preserving - if two items compare as equal, their order will not change in a sort.

sort() is fast on a wide variety of data. It takes advantage of partial ordering when available while keeping good performance on random data. The current sort was written by Tim Peters and is called timsort. Extensive notes on the implementation can be found in the file Objects/listsort.txt in the Python source distribution.

A reverse parameter allows sorting in reverse order:

>>> data.sort(key=k, reverse=True)
>>> data
[('Bill', 6), ('Mark', 5), ('Alex', 4), ('Bill', 3), ('Tim', 2), ('Kent', 1)]

The standalone sorted() function sorts a sequence and returns a new sorted sequence. The original sequence is not changed. Because sorted() returns a value it is useful in an expression. For example to process a dictionary's keys and values in sorted order, use

for k, v in sorted(d.items()):

Since Python 2.5, the built-in functions max() and min() also take a key parameter similar to the one used by sort().

 
© Kent S Johnson Creative Commons License

Short, introductory essays about Python.

kentsjohnson.com

Kent's Korner home

Weblog

Other Essays