More about Pickling Time Zones in Python
In my prior post on using dateutil for time zone information in Python, I half-jokingly posted a bit at the end about the size of pickled pytz tzinfo instances compared to pickled dateutil tzinfo instances. Paul Ganssle later emailed me to speculate that pytz’s pickles are smaller because pytz stores only the key (name) of the time zone when you pickle one of its instances, whereas a pickled dateutil instance contains all of the information about its time zone.
I’m not one to ignore the generous gift of someone’s time and knowledge, so let me dig into that a bit further. (Though Paul emailed me about four months ago, so I guess I am one to postpone acknowledgment of said gift.)
The TL;DR here is that Paul is basically right, and understanding this difference could be useful if you find yourself pickling/unpickling tzinfo instances from either library.
A Side Trip into Python Pickling
Let’s first look at what happens when we pickle a simple object in Python 2.7.15:
>>> class A(object):
...     def __init__(self, x):
...         self.x = x
...
>>> an_instance = A(42)
>>> import pickle, pickletools
>>> pickletools.dis(pickle.dumps(an_instance))
0: c GLOBAL 'copy_reg _reconstructor'
25: p PUT 0
28: ( MARK
29: c GLOBAL '__main__ A'
41: p PUT 1
44: c GLOBAL '__builtin__ object'
64: p PUT 2
67: N NONE
68: t TUPLE (MARK at 28)
69: p PUT 3
72: R REDUCE
73: p PUT 4
76: ( MARK
77: d DICT (MARK at 76)
78: p PUT 5
81: S STRING 'x'
86: p PUT 6
89: I INT 42
93: s SETITEM
94: b BUILD
95: . STOP
highest protocol among opcodes = 0
I read this as calling copy_reg._reconstructor with the class A, its base class object, and None, then applying {'x': 42} as the state of the new object (that is the BUILD opcode at offset 94). In other words, roughly:
obj = copy_reg._reconstructor(A, object, None)
obj.__dict__.update({'x': 42})
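As a cross-check on that reading, an object can be asked directly for the value pickle would serialize. This is a minimal sketch, assuming Python 3 (where copy_reg was renamed copyreg) rather than the 2.7.15 used above; rebuild is a hypothetical helper that only mimics what the REDUCE and BUILD opcodes do for a plain object.

```python
import copyreg


class A(object):
    def __init__(self, x):
        self.x = x


def rebuild(reduce_value):
    """Mimic REDUCE followed by BUILD for a plain object (illustrative only)."""
    func, args, state = reduce_value[:3]
    obj = func(*args)            # REDUCE: call the reconstructor
    obj.__dict__.update(state)   # BUILD: apply the saved instance dict
    return obj


an_instance = A(42)
# Protocol 0 is the protocol shown in the disassembly above.
reduce_value = an_instance.__reduce_ex__(0)
func, args, state = reduce_value[:3]
print(func is copyreg._reconstructor, args, state)

copy_of_instance = rebuild(reduce_value)
print(type(copy_of_instance) is A, copy_of_instance.x)
```

In this sketch the reconstructor arguments come back as (A, object, None), and the {'x': 42} dict arrives separately as the state, which matches the order of the opcodes in the disassembly.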
Pickling UTC
Now let’s see what we get when we pickle dateutil’s UTC instance:
>>> import dateutil.tz
>>> pickletools.dis(pickle.dumps(dateutil.tz.UTC))
0: c GLOBAL 'copy_reg _reconstructor'
25: p PUT 0
28: ( MARK
29: c GLOBAL 'dateutil.tz.tz tzutc'
51: p PUT 1
54: c GLOBAL 'datetime tzinfo'
71: p PUT 2
74: g GET 2
77: ( MARK
78: t TUPLE (MARK at 77)
79: R REDUCE
80: p PUT 3
83: t TUPLE (MARK at 28)
84: p PUT 4
87: R REDUCE
88: p PUT 5
91: . STOP
highest protocol among opcodes = 0
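Again, if I’m reading this right, this is roughly equivalent to the call below. (Strictly, the opcodes at offsets 74 through 79 build the third argument by calling datetime.tzinfo with no arguments, so the pickle passes a bare tzinfo instance where I have written an empty tuple.)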
copy_reg._reconstructor(dateutil.tz.tz.tzutc, datetime.tzinfo, ())
We can test that:
>>> import copy_reg, datetime, dateutil.tz.tz
>>> copy_reg._reconstructor(dateutil.tz.tz.tzutc, datetime.tzinfo, ())
tzutc()
Looks right to me.
Now, what do you get when you pickle pytz’s UTC?
>>> import pytz
>>> pickletools.dis(pickle.dumps(pytz.UTC))
0: c GLOBAL 'pytz _UTC'
11: p PUT 0
14: ( MARK
15: t TUPLE (MARK at 14)
16: R REDUCE
17: p PUT 1
20: . STOP
highest protocol among opcodes = 0
I believe this says to just call pytz._UTC():
>>> pytz._UTC()
<UTC>
>>> pytz._UTC() is pytz.UTC
True
As Paul pointed out to me in his email, pytz has some custom pickling code, which is why we don’t see copy_reg._reconstructor here.
Pickling Non-UTC Time Zones
As in my original post, the difference is more dramatic for non-UTC time zones. Whereas dateutil used the default Python pickling behavior for its UTC object, dateutil does customize pickling for tzinfo objects it reads from the tz database. Let’s try pickling US central time[1] from dateutil:
>>> dateutil_central = dateutil.tz.gettz("America/Chicago")
>>> pickletools.dis(pickle.dumps(dateutil_central))
0: c GLOBAL 'dateutil.tz.tz tzfile'
23: p PUT 0
26: ( MARK
27: N NONE
28: S STRING '/usr/share/zoneinfo/America/Chicago'
67: p PUT 1
70: t TUPLE (MARK at 26)
71: p PUT 2
74: R REDUCE
75: p PUT 3
78: ( MARK
79: d DICT (MARK at 78)
80: p PUT 4
83: S STRING '_trans_list'
98: p PUT 5
101: ( MARK
102: I INT -1633276800
115: I INT -1615158000
128: I INT -1601848800
[...copious output omitted...]
7842: S STRING '_ttinfo_std'
7857: p PUT 76
7861: g GET 30
7865: s SETITEM
7866: b BUILD
7867: . STOP
highest protocol among opcodes = 0
>>> len(pickle.dumps(dateutil_central))
7868
7,868 bytes! Looking at the output, I believe this supports my guess from my earlier post: a pickled dateutil instance carries all the time zone information it needs. I’ll confirm that in a moment, but first let’s check out pytz’s pickle size for the same time zone:
>>> pytz_central = pytz.timezone('America/Chicago')
>>> pickletools.dis(pickle.dumps(pytz_central))
0: c GLOBAL 'pytz _p'
9: p PUT 0
12: ( MARK
13: S STRING 'America/Chicago'
32: p PUT 1
35: I INT -21060
43: I INT 0
46: S STRING 'LMT'
53: p PUT 2
56: t TUPLE (MARK at 12)
57: p PUT 3
60: R REDUCE
61: p PUT 4
64: . STOP
highest protocol among opcodes = 0
>>> len(pickle.dumps(pytz_central))
65
This is a much smaller pickle, and as I said before, Paul gave me a hint as to why: pytz just pickles the key (time zone name) along with a bit of other information. When you unpickle it, it loads the data for that time zone from the tz database.
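The idea of pickling only a key can be sketched in a few lines. Everything here is hypothetical: FAKE_TZ_DB stands in for the tz database, and KeyedZone and load_zone are illustrative names rather than anything pytz defines.

```python
import pickle

# Hypothetical stand-in for the tz database: key -> list of transition times.
FAKE_TZ_DB = {"Example/Zone": list(range(-1633276800, 0, 15724800))}


class KeyedZone(object):
    """Illustrative zone object that pickles itself as just its key."""

    def __init__(self, key, transitions):
        self.key = key
        self.transitions = transitions

    def __reduce__(self):
        # Only the key goes into the pickle; the data is looked up again later.
        return load_zone, (self.key,)


def load_zone(key):
    # Used both for normal construction and during unpickling.
    return KeyedZone(key, FAKE_TZ_DB[key])


zone = load_zone("Example/Zone")
keyed_pickle = pickle.dumps(zone, protocol=0)
data_pickle = pickle.dumps(zone.transitions, protocol=0)
restored = pickle.loads(keyed_pickle)
print(len(keyed_pickle), len(data_pickle))
print(restored.transitions == zone.transitions)
```

The keyed pickle stays the same size no matter how many transitions the zone has, while pickling the transition list itself grows with the data. The cost is that load_zone needs the table to be present at unpickling time.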
Pickled dateutil Instances Are Usable Without the Tz Database
I believe that unpickling and using a dateutil.tz.tz.tzfile instance will not need to reference the tz database. We can actually test this by denying dateutil access to the tz database on my system (/usr/share/zoneinfo), then unpickling and using one of dateutil’s tzinfo instances.[2]
First, pickle one of its instances out to a file:
>>> pickle.dump(dateutil_central, open('/tmp/central', 'wb'))
Next I will run this script as root (necessary because it uses chroot):
import os, os.path, pickle, datetime, dateutil.tz

# A few unpickling bits are not loaded until you use them, so use them
# before chroot.
pickle.loads(pickle.dumps(('foo', u'bar')))
os.chroot('/tmp')

# Confirm that we can't load a time zone by name because the tz
# database files are no longer accessible.
assert not os.path.exists('/usr/share/zoneinfo')
print "/usr/share/zoneinfo does not exist"
try:
    central = dateutil.tz.gettz('America/Chicago')
except ImportError as ex:
    print "dateutil gave expected ImportError: %s" % (ex,)
else:
    raise Exception(
        'should not have been able to load time zone, got: %r' % (central,))

# Unpickle and use our pickled dateutil tzinfo! Works even when the
# tz database can't be accessed.
central = pickle.load(open('/central', 'rb'))
print "I unpickled this tzinfo: ", central
print "naively, now is: ", datetime.datetime.now()
utc_now = datetime.datetime.utcnow().replace(tzinfo=dateutil.tz.UTC)
print "now in UTC is: ", utc_now
print "converted to central time:", utc_now.astimezone(central)
(If I weren’t so lazy I would have made this chroot somewhere other than /tmp, which is most likely quite insecure. Don’t do this at home!)
The result:
$ sudo python chrooted_demo.py
/usr/share/zoneinfo does not exist
dateutil gave expected ImportError: No module named zoneinfo
I unpickled this tzinfo: tzfile('/usr/share/zoneinfo/America/Chicago')
naively, now is: 2018-12-26 13:21:37.416167
now in UTC is: 2018-12-26 19:21:37.416184+00:00
converted to central time: 2018-12-26 13:21:37.416184-06:00
This confirms that an unpickled dateutil tzinfo object has all the information it needs, and thus computations using unpickled instances will yield the same results no matter what version of the tz database is available at the time, or even if no tz database is available at all.
Conclusion
Perhaps this is a trade-off: when you unpickle and use a dateutil tzinfo instance, you’re going to get the same results as you would have when using the original instance.[3] In contrast, unpickling a pytz tzinfo instance will load the information for that time zone from whatever tz database is included with whatever version of pytz you’re using at the time.
So if you want to make sure you get the same results when using a tzinfo object no matter when or where you unpickle it, dateutil seems superior. If you instead want to always be using the most recent version of the tz database when you unpickle a tzinfo, and/or if you want smaller pickles, pytz seems superior.
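The trade-off can be made concrete with a toy model. This is a sketch under stated assumptions, not either library’s implementation: TOY_TZ_DB, SnapshotZone, and LookupZone are hypothetical names. SnapshotZone carries its data inside the pickle, the way dateutil’s instances appear to, and LookupZone pickles only its key, the way pytz’s appear to.

```python
import pickle

# Hypothetical stand-in for the tz database: key -> UTC offset in seconds.
TOY_TZ_DB = {"Example/Zone": -21600}


class SnapshotZone(object):
    """Default pickling: the offset travels inside the pickle."""

    def __init__(self, key):
        self.key = key
        self.offset = TOY_TZ_DB[key]


class LookupZone(SnapshotZone):
    """Custom pickling: only the key travels; data is re-read on load."""

    def __reduce__(self):
        return LookupZone, (self.key,)


snapshot_pickle = pickle.dumps(SnapshotZone("Example/Zone"))
lookup_pickle = pickle.dumps(LookupZone("Example/Zone"))

# Simulate unpickling later, against a newer database with changed rules.
TOY_TZ_DB["Example/Zone"] = -18000

snapshot = pickle.loads(snapshot_pickle)
lookup = pickle.loads(lookup_pickle)
print(snapshot.offset, lookup.offset)
```

The snapshot keeps the offset it was pickled with, while the lookup version picks up whatever the table says at unpickling time. Which of those counts as correct depends on whether you care more about reproducibility or about current rules.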
This is probably not something the vast majority of people need to worry themselves with, but I found it interesting to dig into these libraries a bit further, and also to play with the pickletools module and understand more about Python pickling.
[1] Yes, I did check that the Chicago Manual of Style, seventeenth edition, tells me not to capitalize time zone names.
[2] I’m on macOS today, and I kind of hate fs_usage. If I were instead using Linux today, I would have included strace in my testing so that I could more definitively confirm that no tz database files were being accessed. In particular, it seems plausible that, after unpickling a tzinfo instance, dateutil could prefer to use the information from the system’s tz database instead of what it unpickled if the tz database is available. I think this is very unlikely based on what Paul has told me, though, and I haven’t seen a hint of it in dateutil’s code, either.
[3] When I say, “you’re going to get the same results”, I’m assuming you can unpickle it in the first place, of course, and that there have been no big changes in the underlying dateutil library.