I Was Wrong about Python Dictionaries
Often enough, in Python, I find myself wanting to do something with a key’s value in a dictionary, but only if the key exists. I think that the most obvious way to write this is:
if key in some_dict:
    do_something_with(some_dict[key])
However, I always thought looking up the key in the dictionary twice was wasteful. Instead I preferred to write the following code, which remains readable but does only a single dictionary look up:
value = some_dict.get(key)
if value is not None:
    do_something_with(value)
Surely one dictionary look up is faster than two!
Unfortunately I think my assumption was wrong.
Here’s a test program:
import timeit


def one_look_up(d, k):
    v = d.get(k)
    if v is not None:
        return v * 2


def two_look_ups(d, k):
    if k in d:
        v = d[k]
        return v * 2


NUM_EXECUTIONS = 10000000


def run_dict_test(look_up_func):
    print "===== %s =====" % (look_up_func.__name__,)
    d = {1: 2}
    print "when key is present:"
    # timeit.repeat returns a list of total times, one per run (3 runs by default)
    print timeit.repeat(lambda: look_up_func(d, 1), number=NUM_EXECUTIONS)
    print "when key is not present:"
    print timeit.repeat(lambda: look_up_func(d, 3), number=NUM_EXECUTIONS)
    print


run_dict_test(one_look_up)
run_dict_test(two_look_ups)
return v * 2 is there just to show that we’re doing something with the value.[1] This may be unnecessary but it seemed to me a more realistic test. two_look_ups assigns the value from the dictionary to a local variable because this makes it more similar to one_look_up, and also because it represents the real and very common case where you want to use the value more than once within the if block.
First, here’s my Python version information:
Python 2.7.7 (default, Jun 2 2014, 01:41:14)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Next, the results:
===== one_look_up =====
when key is present:
[2.6594371795654297, 2.6614291667938232, 2.683712959289551]
when key is not present:
[2.5561981201171875, 2.453350067138672, 2.4554250240325928]
===== two_look_ups =====
when key is present:
[2.159899950027466, 2.1492769718170166, 2.135672092437744]
when key is not present:
[1.6867189407348633, 1.6827809810638428, 1.684412956237793]
When the key was present, dict.get needed 24.3% more time to run than using in and subscript. When the key was not present, dict.get needed 45.8% more time. It turns out my micro-optimization wasn’t an optimization at all. I’m happy that I can now write more idiomatic Python without feeling “wasteful.”
After some searching and reading, I hypothesize that one_look_up is more expensive primarily because it has to look up the get method every time. I’ll test that hypothesis with this code:
import timeit

NUM_EXECUTIONS = 10000000


def no_look_up(get, k):
    v = get(k)
    if v is not None:
        return v * 2


print "===== no_look_up ====="
d = {1: 2}
get = d.get  # bind the method once, outside the timed function
print "when key is present:"
print timeit.repeat(lambda: no_look_up(get, 1), number=NUM_EXECUTIONS)
print "when key is not present:"
print timeit.repeat(lambda: no_look_up(get, 3), number=NUM_EXECUTIONS)
The result:
===== no_look_up =====
when key is present:
[2.3476529121398926, 2.3353281021118164, 2.3154399394989014]
when key is not present:
[2.123357057571411, 2.1545472145080566, 2.144979953765869]
This implementation needs just 8.41% and 26.2% more time, respectively, than in and subscript. Still slower, but we did narrow the gap significantly.
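Out of curiosity, one way to see where the remaining cost might come from is to disassemble the two functions. This is just a quick sketch of my own, beyond what I measured above:

import dis


def two_look_ups(d, k):
    if k in d:
        v = d[k]
        return v * 2


def one_look_up(d, k):
    v = d.get(k)
    if v is not None:
        return v * 2


# 'k in d' compiles to a single COMPARE_OP (in) and 'd[k]' to a single
# BINARY_SUBSCR, both handled directly inside the bytecode eval loop.
dis.dis(two_look_ups)

# 'd.get(k)' needs LOAD_ATTR to fetch the bound method and then
# CALL_FUNCTION to invoke it on every call.
dis.dis(one_look_up)

Even with the bound method cached, each call to get still goes through CALL_FUNCTION, while in and subscript stay inside the eval loop, which would explain why no_look_up narrows the gap without closing it.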
There are other ways I could accomplish “do something with d[k] if k exists.” For example, I could experiment with calling __contains__ and __getitem__ directly, or with try/except. But those approaches are mostly un-Pythonic, so I wouldn’t sacrifice readability for them unless I really needed to squeeze every last bit of performance out of some code. (try/except actually might be idiomatic, but only if I were fairly certain that the exception path would be hit very infrequently.)
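For completeness, a try/except version of the original pattern would look something like the sketch below, using the same placeholder names as at the top of the post; I haven’t benchmarked it here:

try:
    value = some_dict[key]
except KeyError:
    pass  # key missing; nothing to do
else:
    do_something_with(value)

Doing the work in the else block keeps the except clause from accidentally swallowing a KeyError raised inside do_something_with itself.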
[1] Separately I tried just return v, and added a return None to each function as well. The results came out nearly the same.