I Was Wrong about Python Dictionaries


Often enough, in Python, I find myself wanting to do something with a key’s value in a dictionary, but only if the key exists. I think that the most obvious way to write this is:

if key in some_dict:
    value = some_dict[key]
    ...

However, I always thought looking up the key in the dictionary twice was wasteful. Instead I preferred to write the following code, which remains readable but does only a single dictionary look up:

value = some_dict.get(key)
if value is not None:

Surely one dictionary look up is faster than two!

Unfortunately I think my assumption was wrong.

Here’s a test program:

import timeit

# Number of calls per timing run. The original value isn't stated here;
# pick something large enough that the runs take a measurable time.
NUM_EXECUTIONS = 10000000

def one_look_up(d, k):
    v = d.get(k)
    if v is not None:
        return v * 2

def two_look_ups(d, k):
    if k in d:
        v = d[k]
        return v * 2


def run_dict_test(look_up_func):
    print "===== %s =====" % (look_up_func.__name__,)
    d = {1: 2}
    print "when key is present:"
    print timeit.repeat(lambda: look_up_func(d, 1), number=NUM_EXECUTIONS)
    print "when key is not present:"
    print timeit.repeat(lambda: look_up_func(d, 3), number=NUM_EXECUTIONS)

run_dict_test(one_look_up)
run_dict_test(two_look_ups)


The return v * 2 is there just to show that we’re doing something with the value.[1] It may be unnecessary, but it seemed to me a more realistic test. two_look_ups assigns the value from the dictionary to a local variable because that makes it more similar to one_look_up, and also because it represents the real and very common case where you want to use the value more than once within the if block.
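As a quick sanity check (mine, not from the original post), the two functions really are interchangeable: both return the doubled value when the key exists and fall through to None when it doesn’t.

```python
def one_look_up(d, k):
    v = d.get(k)
    if v is not None:
        return v * 2

def two_look_ups(d, k):
    if k in d:
        v = d[k]
        return v * 2

d = {1: 2}
# Same result when the key is present...
assert one_look_up(d, 1) == two_look_ups(d, 1) == 4
# ...and both implicitly return None when it is absent.
assert one_look_up(d, 3) is None and two_look_ups(d, 3) is None
```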

First, here’s my Python version information:

Python 2.7.7 (default, Jun  2 2014, 01:41:14)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin

Next, the results:

===== one_look_up =====
when key is present:
[2.6594371795654297, 2.6614291667938232, 2.683712959289551]
when key is not present:
[2.5561981201171875, 2.453350067138672, 2.4554250240325928]

===== two_look_ups =====
when key is present:
[2.159899950027466, 2.1492769718170166, 2.135672092437744]
when key is not present:
[1.6867189407348633, 1.6827809810638428, 1.684412956237793]

When the key was present, dict.get needed 24.3% more time to run than using in and subscript. When the key was not present, dict.get needed 45.8% more time. It turns out my micro-optimization wasn’t an optimization at all. I’m happy that I can now write more idiomatic Python without feeling “wasteful.”

After some searching and reading, I hypothesize that one_look_up is more expensive primarily because it has to look up the get method every time. I’ll test that hypothesis with this code:

import timeit

# Same caveat as before: the original NUM_EXECUTIONS value isn't shown.
NUM_EXECUTIONS = 10000000


def no_look_up(get, k):
    v = get(k)
    if v is not None:
        return v * 2

print "===== no_look_up ====="
d = {1: 2}
get = d.get  # cache the bound method so it isn't looked up on every call
print "when key is present:"
print timeit.repeat(lambda: no_look_up(get, 1), number=NUM_EXECUTIONS)
print "when key is not present:"
print timeit.repeat(lambda: no_look_up(get, 3), number=NUM_EXECUTIONS)

The result:

===== no_look_up =====
when key is present:
[2.3476529121398926, 2.3353281021118164, 2.3154399394989014]
when key is not present:
[2.123357057571411, 2.1545472145080566, 2.144979953765869]

This implementation needs 8.41% and 26.2% more time, respectively, than in and subscript. Still slower, but the gap narrowed significantly.
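For what it’s worth, here’s a sketch (my own, hypothetical example) of the one place this bound-method caching could plausibly show up in real code: a hot loop, where the attribute look-up is hoisted out and paid once instead of once per iteration. Even there, the numbers above suggest in and subscript would still win.

```python
def doubled_values(d, keys):
    # Hoist the bound-method look-up out of the loop: one attribute
    # access total, rather than one per iteration of the loop.
    get = d.get
    result = []
    for k in keys:
        v = get(k)
        if v is not None:
            result.append(v * 2)
    return result

# Keys absent from the dict are simply skipped.
print(doubled_values({1: 2, 2: 5}, [1, 2, 3]))  # [4, 10]
```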

There are other ways I could accomplish “do something with d[k] if k exists.” For example, I could experiment with calling __contains__ and __getitem__ directly, or with try/except. But most of those approaches are un-Pythonic, so I wouldn’t sacrifice readability for them unless I really needed to squeeze every last bit of performance out of some code. (try/except actually might be idiomatic, but only if I were fairly certain that the exception case would occur very infrequently.)
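For completeness, the try/except variant would look something like this (my sketch, not code from the post). It does a single look-up and never touches the get method, but a raised KeyError is comparatively expensive, which is why it only pays off when misses are rare.

```python
def try_except_look_up(d, k):
    # EAFP style: attempt the subscript and handle the miss. Cheap when
    # the key exists; an exception is raised and caught when it doesn't.
    try:
        v = d[k]
    except KeyError:
        return None
    return v * 2

print(try_except_look_up({1: 2}, 1))  # 4
print(try_except_look_up({1: 2}, 3))  # None
```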

  1. Separately I tried just return v, and added a return None to each function as well. The results came out nearly the same.