The Case for pytz over dateutil
When it comes to dealing with dates and times in Python, my first choice is the standard library’s datetime module. However, while datetime supports time zones in general, it has no knowledge of any specific time zones. It lacks a time zone database, information about all the world’s time zones past and present. PEP 431 would add such a database to the standard library, but it seems to be stalled.
I am aware of two libraries you can use when you need time zone information in Python, pytz and dateutil. pytz seems to be the most popular time zone library by far. However, dateutil has the same time zone information and a lot of other useful date and time functionality besides. Is there any reason to use pytz when dateutil seems to be a superset of pytz?
As it turns out, yes, there is at least one good reason to prefer pytz.
Whenever Possible, Use UTC
Before I divulge that reason, I should tell you that I am a firm believer that the software you write should internally use only UTC. Convert all times to UTC on input, and convert to local times only when producing output for the user. To do otherwise is to invite errors.
I mention this because pytz and dateutil have some differences that are only relevant if you, for example, perform arithmetic on non-UTC times. I’m not addressing those differences here because I will (hopefully) never need to care about them. I avoid them by simply using UTC whenever possible.
Lest you think this advice regarding UTC is unsubstantiated, here’s an example: The United States’ Eastern time zone entered daylight saving time[1] on April 7, 2002 at 7:00 a.m. UTC. Local clocks jumped from 1:59:59 a.m. EST straight to 3:00:00 a.m. EDT. In this example I’ll do some arithmetic on a local time.
>>> from datetime import datetime, timedelta
>>> import pytz
>>> fmt = '%Y-%m-%d %H:%M:%S %Z (%z)'
>>> pytz_eastern = pytz.timezone("America/New_York")
>>> utc_ny_dst_start = datetime(2002, 4, 7, 7, tzinfo=pytz.utc)
>>> local_ny_dst_start = utc_ny_dst_start.astimezone(pytz_eastern)
>>> local_ny_dst_start.strftime(fmt)
'2002-04-07 03:00:00 EDT (-0400)'
>>> (local_ny_dst_start - timedelta(minutes=1)).strftime(fmt)
'2002-04-07 02:59:00 EDT (-0400)'
Oops! 2:59 a.m. never happened in the Eastern time zone on April 7, 2002. I used pytz here, but if I had used time zones from dateutil instead I would get exactly the same results. (pytz’s normalize method can actually fix this problem, but just use UTC whenever possible, OK?)
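For comparison, here is a minimal sketch of the same subtraction done in UTC first, using only the standard library. The fixed-offset EST object is an assumption made for illustration: it stands in for a real time zone lookup, and it is only valid here because we already know the result falls before the switch to EDT.

```python
from datetime import datetime, timedelta, timezone

fmt = '%Y-%m-%d %H:%M:%S %Z (%z)'

# Assumption for illustration: a hand-built fixed-offset zone for EST.
# A real application would get its zones from a time zone database.
EST = timezone(timedelta(hours=-5), "EST")

# The instant DST began in US Eastern time in 2002, expressed in UTC.
utc_ny_dst_start = datetime(2002, 4, 7, 7, tzinfo=timezone.utc)

# Do the arithmetic in UTC, where every minute exists exactly once...
utc_minute_before = utc_ny_dst_start - timedelta(minutes=1)

# ...and convert to local time only for display.
local_minute_before = utc_minute_before.astimezone(EST).strftime(fmt)
print(local_minute_before)  # 2002-04-07 01:59:00 EST (-0500)
```

The minute before the switch comes out as 1:59 a.m. EST, a time that really did occur.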
I was most recently reminded that I should use UTC internally by Taavi Burns’ excellent PyCon 2012 presentation, What You Need to Know about datetimes, which itself quotes this advice from Armin Ronacher’s “Dealing with Timezones in Python.”
The Case for pytz
My prior example demonstrated a time that “never occurred,” 2:59 a.m. on April 7, 2002 in the US Eastern time zone. When daylight saving time ends, though, you have a different problem: times that “happen twice.” Later on in 2002, daylight saving time ended in the Eastern time zone on October 27 at 6:00 a.m. UTC. The local clocks went from 1:59:59 a.m. EDT to 1:00:00 a.m. EST, and all times between 1:00:00 a.m. and 1:59:59 a.m. “happened twice”: once in Eastern Daylight Time and then again in Eastern Standard Time. Therefore a time such as “1:30 a.m. on October 27, 2002” is ambiguous. Did I mean 1:30 a.m. EDT or 1:30 a.m. EST?
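To make the ambiguity concrete, here is a small standard-library sketch. The EDT and EST objects are fixed-offset stand-ins I am assuming for illustration, not zones from a real time zone database. It shows two instants an hour apart in UTC producing the same wall-clock reading.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: fixed-offset stand-ins for the two halves
# of US Eastern time, rather than a real time zone database.
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")

# Two different instants, one hour apart in UTC.
utc_first = datetime(2002, 10, 27, 5, 30, tzinfo=timezone.utc)
utc_second = datetime(2002, 10, 27, 6, 30, tzinfo=timezone.utc)

# Before the switch the zone is on EDT; after it, on EST.
local_first = utc_first.astimezone(EDT)
local_second = utc_second.astimezone(EST)

# Strip the tzinfo to see what a wall clock would have shown.
wall_first = local_first.replace(tzinfo=None)
wall_second = local_second.replace(tzinfo=None)

print(wall_first, wall_second)    # both read 2002-10-27 01:30:00
print(wall_first == wall_second)  # True: same wall-clock reading
print(utc_second - utc_first)     # 1:00:00: yet an hour apart
```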
datetime’s API has problems with these ambiguous times. To demonstrate this problem, let’s say you read October 27, 2002 6:00 a.m. UTC from your database, and now you want to display this date to the user in his or her local time zone, which is Eastern time.
>>> from dateutil import tz
>>> datu_eastern = tz.gettz("America/New_York")
>>> utc_1am_est = datetime(2002, 10, 27, 6, tzinfo=tz.tzutc())
>>> utc_1am_est.astimezone(datu_eastern).strftime(fmt)
'2002-10-27 01:00:00 EST (-0500)'
Great, daylight saving time ended at 6:00 a.m. UTC and the output is 1:00 a.m. EST as expected. Now, what if you did the same thing with the hour before that, 5:00 a.m. UTC? That should still be in EDT.
>>> utc_1am_edt = datetime(2002, 10, 27, 5, tzinfo=tz.tzutc())
>>> utc_1am_edt.astimezone(datu_eastern).strftime(fmt)
'2002-10-27 01:00:00 EST (-0500)'
Oops! 5:00 a.m. UTC on October 27, 2002 was 1:00 a.m. EDT, not EST. Both the time zone abbreviation and the time zone offset are wrong.
This problem can be traced back to datetime’s API, whose documentation describes the limitation:
[T]he tzinfo.dst() method must consider times in the “repeated hour” to be in standard time. […] Applications that can’t bear such ambiguities should avoid using hybrid tzinfo subclasses; there are no ambiguities when using UTC, or any other fixed-offset tzinfo subclass (such as a class representing only EST (fixed offset -5 hours), or only EDT (fixed offset -4 hours)).
pytz, however, gets this right:
>>> utc_1am_est.astimezone(pytz_eastern).strftime(fmt)
'2002-10-27 01:00:00 EST (-0500)'
>>> utc_1am_edt.astimezone(pytz_eastern).strftime(fmt)
'2002-10-27 01:00:00 EDT (-0400)'
This difference in correctness is why I think pytz should be preferred to dateutil when you need to work with time zones. It may seem unlikely that your application will ever hit the problem I’ve demonstrated here, but in many cases I bet you can imagine how it is possible, and that’s enough for me. I prefer to err on the side of correctness.
In case you’re wondering how pytz works here while dateutil does not, it seems that pytz actually uses two separate tzinfo instances for the same time zone, one for standard time and another for daylight saving time:
>>> utc_12am_edt = datetime(2002, 10, 27, 4, tzinfo=pytz.utc)
>>> one_hour = timedelta(hours=1)
>>> local_12am_edt = utc_12am_edt.astimezone(pytz_eastern)
>>> local_1am_edt = (utc_12am_edt + one_hour).astimezone(pytz_eastern)
>>> local_1am_est = (utc_12am_edt + one_hour*2).astimezone(pytz_eastern)
>>> local_2am_est = (utc_12am_edt + one_hour*3).astimezone(pytz_eastern)
>>> local_12am_edt.tzinfo is local_1am_edt.tzinfo
True
>>> local_1am_edt.tzinfo is local_1am_est.tzinfo
False
>>> local_1am_est.tzinfo is local_2am_est.tzinfo
True
Even though I passed in the same pytz_eastern to every astimezone call, the tzinfo attached to each datetime instance differs depending on whether the datetime should be in standard time or daylight saving time.[2]
Note that pytz’s documentation actually says that you should call its time zone instances’ normalize methods on the result of astimezone when converting to a non-UTC time zone, but I have my doubts whether this is necessary. I have not yet found any circumstances where normalize was necessary when merely converting from one time zone to another via astimezone.
Constructing datetimes in a Repeated Hour
If you ever need to construct a local time directly, perhaps as a result of parsing a string, it’s nigh impossible to get the DST version of the hour where daylight saving time ends with dateutil.
>>> datetime(2002, 10, 27, 1, tzinfo=datu_eastern).strftime(fmt)
'2002-10-27 01:00:00 EST (-0500)'
That’s it. EST is your only option. There’s no way to tell the datetime library which 1 a.m. you meant, EDT or EST. As stated in the passage quoted from the datetime documentation, above, you will always get standard time.
pytz, on the other hand, gives you a way out of this problem using the localize methods of its tzinfo instances:
>>> naive_1am = datetime(2002, 10, 27, 1)
>>> pytz_eastern.localize(naive_1am, is_dst=False).strftime(fmt)
'2002-10-27 01:00:00 EST (-0500)'
>>> pytz_eastern.localize(naive_1am, is_dst=True).strftime(fmt)
'2002-10-27 01:00:00 EDT (-0400)'
>>> pytz_eastern.localize(naive_1am, is_dst=None).strftime(fmt)
Traceback (most recent call last):
  File "<ipython-input-12-6e97a68309e9>", line 1, in <module>
    pytz_eastern.localize(naive_1am, is_dst=None).strftime(fmt)
  File ".../lib/python2.7/site-packages/pytz/tzinfo.py", line 349, in localize
    raise AmbiguousTimeError(dt)
AmbiguousTimeError: 2002-10-27 01:00:00
In that last case, is_dst=None means, “I don’t know if this is supposed to be daylight saving time or not, so raise an error if it’s ambiguous.”
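To see why the choice matters, here is a rough standard-library sketch of what the flag decides. The resolve_repeated_hour helper and the fixed-offset EDT and EST objects are hypothetical names of my own, not pytz’s implementation, and the helper assumes the caller already knows the time falls in the repeated hour. The point is only that the two answers map to UTC instants an hour apart.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: fixed-offset stand-ins for Eastern time.
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")

def resolve_repeated_hour(naive, is_dst):
    """Hypothetical helper: attach EDT or EST to a naive datetime that is
    already known to fall in the repeated hour."""
    if is_dst is None:
        # Mirror the "refuse to guess" behavior described above.
        raise ValueError("ambiguous local time: %s" % naive)
    return naive.replace(tzinfo=EDT if is_dst else EST)

naive_1am = datetime(2002, 10, 27, 1)

# The same wall-clock reading resolves to two different UTC instants.
as_edt_utc = resolve_repeated_hour(naive_1am, is_dst=True).astimezone(timezone.utc)
as_est_utc = resolve_repeated_hour(naive_1am, is_dst=False).astimezone(timezone.utc)
print(as_edt_utc)  # 2002-10-27 05:00:00+00:00
print(as_est_utc)  # 2002-10-27 06:00:00+00:00
```

If you store times in UTC, guessing wrong here means storing a time stamp that is off by a full hour.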
Any Reason to Use dateutil?
Based on the preceding argument, I feel pretty strongly that I should use pytz instead of dateutil for my time zone needs. That said, I have found an argument in dateutil’s favor: support for Windows’ built-in time zone data.
Both pytz and dateutil use the well-known “tz database”. Both Python libraries include a copy of this database, but on *nix systems both libraries will prefer to use your system’s database if available. I think the assumption is that your OS’s time zone information is more likely to be up-to-date.
Windows doesn’t use the tz database, though. It has its own time zone database stored in the Windows Registry.
On Windows, dateutil will use Windows’ database before falling back to its included tz database. In contrast, pytz doesn’t know how to use the Windows time zone database. pytz can only use the tz database.
This is probably an argument in dateutil’s favor, but I find it to be a particularly weak argument compared with the fact that common time zone usage with dateutil may produce an incorrect result. For the vast majority of applications, no matter the platform, I suspect an up-to-date pytz is a superior choice.
Premature Optimizations
For fun, here are a few other points of comparison between these libraries.
While we’re on the topic of included time zone databases, I’ll mention that, as of this writing, pytz’s included tz database seems to be 2.3 MiB. In contrast, dateutil keeps a compressed tarball of the tz database, weighing in at just 208 KiB. That could be a meaningful difference if you’re tight on storage space (e.g. an embedded system). Perhaps the pytz author would accept a patch!
On the other hand, Armin Ronacher’s blog entry mentions that datetime instances with tzinfo “often cause much larger pickles.” For your consideration:
>>> import pickle
>>> now = datetime.utcnow()
>>> len(pickle.dumps(now, 2))
44
>>> len(pickle.dumps(now.replace(tzinfo=pytz.utc), 2))
61
>>> len(pickle.dumps(now.replace(tzinfo=tz.tzutc()), 2))
73
pytz’s UTC is a little smaller than dateutil’s. That could make a difference if, for example, you wanted to store aware (as opposed to naïve) datetime instances. (But really, how hard is it to call .replace(tzinfo=pytz.utc) when reading in time stamps?)
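If you want to repeat this kind of measurement without installing either library, a sketch along these lines should do. It uses the standard library’s own timezone.utc (available in Python 3) in place of the two libraries’ UTC objects, which is my substitution and not part of the comparison above. Exact byte counts depend on the Python version and pickle protocol, so none are shown.

```python
import pickle
from datetime import datetime, timezone

# A fixed time stamp keeps the measurement repeatable.
naive = datetime(2012, 3, 10, 12, 0, 0)

# Assumption: the stdlib's UTC singleton stands in for pytz.utc / tz.tzutc().
aware = naive.replace(tzinfo=timezone.utc)

naive_size = len(pickle.dumps(naive, 2))
aware_size = len(pickle.dumps(aware, 2))
print(naive_size, aware_size)  # the aware pickle is the larger of the two

# The aware value survives a round trip through pickle unchanged.
round_tripped = pickle.loads(pickle.dumps(aware, 2))
print(round_tripped == aware)  # True
```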
Speaking of UTC, it seems like dateutil makes a new tzinfo object for UTC every time you ask for it:
>>> tz.tzutc() is tz.tzutc()
False
Contrast with pytz, which just has its single utc member, rather than creating a new object every time you want to use it.
1. Not “daylight savings time”! I am disappointed that she doesn’t think there is a clear standard for capitalizing time zone names. I usually prefer Chicago, but their time zone capitalization rules are too complicated for my tastes, so I’m choosing the AP Style Guide’s rules for time zones.
2. Incidentally, neither of those tzinfo attributes is the same object as pytz_eastern, but each tzinfo attribute, as well as pytz_eastern, is an instance of datetime.tzinfo according to isinstance.