The Case for pytz over dateutil

Published 2014-05-25

**This article is outdated.** Please see instead [my new article on dateutil](/posts/dateutil-preferred/).

When it comes to dealing with dates and times in Python, my first choice is the standard library’s datetime module. However, while datetime supports time zones in general, it has no knowledge of any specific time zones. It lacks a time zone database: a record of all the world’s time zones, past and present. PEP 431 would add such a database to the standard library, but it seems to be stalled.

I am aware of two libraries you can use when you need time zone information in Python: pytz and dateutil. pytz seems to be the more popular of the two by far. However, dateutil has the same time zone information and a lot of other useful date and time functionality besides. Is there any reason to use pytz when dateutil seems to be a superset of pytz?

As it turns out, yes, there is at least one good reason to prefer pytz.

Whenever Possible, Use UTC

Before I divulge that reason, I should tell you that I am a firm believer that the software you write should internally use only UTC. Convert all times to UTC on input, and convert to local times only when producing output for the user. To do otherwise is to invite errors.

I mention this because pytz and dateutil have some differences that are only relevant if you, for example, perform arithmetic on non-UTC times. I’m not addressing those differences here because I will (hopefully) never need to care about them. I avoid them by simply using UTC whenever possible.

Lest you think this advice regarding UTC is unsubstantiated, here’s an example: The United States’ Eastern time zone entered daylight saving time¹ on April 7, 2002 at 7:00 a.m. UTC. Local clocks went from 1:59:59 a.m. EST to 3:00:00 a.m. EDT. In this example I’ll do some arithmetic on a local time.

>>> from datetime import datetime, timedelta
>>> import pytz
>>> fmt = '%Y-%m-%d %H:%M:%S %Z (%z)'
>>> pytz_eastern = pytz.timezone("America/New_York")
>>> utc_ny_dst_start = datetime(2002, 4, 7, 7, tzinfo=pytz.utc)
>>> local_ny_dst_start = utc_ny_dst_start.astimezone(pytz_eastern)
>>> local_ny_dst_start.strftime(fmt)
'2002-04-07 03:00:00 EDT (-0400)'
>>> (local_ny_dst_start - timedelta(minutes=1)).strftime(fmt)
'2002-04-07 02:59:00 EDT (-0400)'

Oops! 2:59 a.m. never happened in the Eastern time zone on April 7, 2002. I used pytz here, but if I had used time zones from dateutil instead I would get exactly the same results. (pytz’s normalize method can actually fix this problem—but just use UTC whenever possible, OK?)
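
For comparison, here is a minimal sketch of the same subtraction done in UTC, using only the standard library. It assumes Python 3 (for datetime.timezone), and the names EST, EDT and to_utc are my own illustrative stand-ins rather than anything from pytz or dateutil. The fixed offsets only describe Eastern time around this one transition.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for the two Eastern offsets (illustrative names,
# not part of pytz or dateutil).
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")

fmt = '%Y-%m-%d %H:%M:%S %Z (%z)'


def to_utc(aware_local):
    """Normalize an aware datetime to UTC at the input boundary."""
    if aware_local.tzinfo is None:
        raise ValueError("expected an aware datetime")
    return aware_local.astimezone(timezone.utc)


# Input: 3:00 a.m. EDT on April 7, 2002, the first instant of DST.
stored = to_utc(datetime(2002, 4, 7, 3, tzinfo=EDT))

# The arithmetic happens in UTC, where every minute exists exactly once.
one_minute_earlier = stored - timedelta(minutes=1)

# Output: one minute before the transition was still standard time,
# so the result is displayed with the EST offset.
shown = one_minute_earlier.astimezone(EST)
print(shown.strftime(fmt))  # 2002-04-07 01:59:00 EST (-0500)
```

Choosing EST by hand for the display step is exactly the job a real time zone library does for you; the point is only that the subtraction itself never touches a local time.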

I was most recently reminded that I should use UTC internally by Taavi Burns’ excellent PyCon 2012 presentation, What You Need to Know about datetimes, which itself quotes this advice from Armin Ronacher’s “Dealing with Timezones in Python.”

The Case for `pytz`

My prior example demonstrated a time that “never occurred,” 2:59 a.m. on April 7, 2002 in the US Eastern time zone. When daylight saving time ends, though, you have a different problem: times that “happen twice.” Later on in 2002, daylight saving time ended in the Eastern time zone on October 27 at 6:00 a.m. UTC. The local clocks went from 1:59:59 a.m. EDT to 1:00:00 a.m. EST, and all times between 1:00:00 a.m. and 1:59:59 a.m. “happened twice”: once in Eastern Daylight Time and then again in Eastern Standard Time. Therefore a time such as “1:30 a.m. on October 27, 2002” is ambiguous. Did I mean 1:30 a.m. EDT or 1:30 a.m. EST?
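
To make the ambiguity concrete, here is a small standard-library sketch (assuming Python 3; EST and EDT are hand-made fixed offsets of my own, not real time zone objects from either library). It shows that the two readings of 1:30 a.m. share a wall-clock time but are different instants.

```python
from datetime import datetime, timedelta, timezone

# Hand-made fixed offsets standing in for the two halves of Eastern time.
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")

# The two possible meanings of "1:30 a.m. on October 27, 2002".
first = datetime(2002, 10, 27, 1, 30, tzinfo=EDT)   # before clocks fell back
second = datetime(2002, 10, 27, 1, 30, tzinfo=EST)  # after clocks fell back

# Identical on the wall clock...
same_wall_clock = first.replace(tzinfo=None) == second.replace(tzinfo=None)

# ...but a full hour apart as instants (aware subtraction compares in UTC).
gap = second - first

print(same_wall_clock)                 # True
print(first.astimezone(timezone.utc))  # 2002-10-27 05:30:00+00:00
print(second.astimezone(timezone.utc)) # 2002-10-27 06:30:00+00:00
print(gap)                             # 1:00:00
```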

datetime’s API has problems with these ambiguous times. To demonstrate this problem, let’s say you read October 27, 2002 6:00 a.m. UTC from your database, and now you want to display this date to the user in his or her local time zone, which is Eastern time.

>>> from dateutil import tz
>>> datu_eastern = tz.gettz("America/New_York")
>>> utc_1am_est = datetime(2002, 10, 27, 6, tzinfo=tz.tzutc())
>>> utc_1am_est.astimezone(datu_eastern).strftime(fmt)
'2002-10-27 01:00:00 EST (-0500)'

Great, daylight saving time ended at 6:00 a.m. UTC and the output is 1:00 a.m. EST as expected. Now, what if you did the same thing with the hour before that, 5:00 a.m. UTC? That should still be in EDT.

>>> utc_1am_edt = datetime(2002, 10, 27, 5, tzinfo=tz.tzutc())
>>> utc_1am_edt.astimezone(datu_eastern).strftime(fmt)
'2002-10-27 01:00:00 EST (-0500)'

Oops! 5:00 a.m. UTC on October 27, 2002 was 1:00 a.m. EDT, not EST. Both the time zone abbreviation and the time zone offset are wrong.

This problem can be traced back to datetime’s API, whose documentation acknowledges the limitation:

[T]he tzinfo.dst() method must consider times in the “repeated hour” to be in standard time. […] Applications that can’t bear such ambiguities should avoid using hybrid tzinfo subclasses; there are no ambiguities when using UTC, or any other fixed-offset tzinfo subclass (such as a class representing only EST (fixed offset -5 hours), or only EDT (fixed offset -4 hours)).

pytz, however, gets this right:

>>> utc_1am_est.astimezone(pytz_eastern).strftime(fmt)
'2002-10-27 01:00:00 EST (-0500)'
>>> utc_1am_edt.astimezone(pytz_eastern).strftime(fmt)
'2002-10-27 01:00:00 EDT (-0400)'

This difference in correctness is why I think pytz should be preferred to dateutil when you need to work with time zones. It may seem unlikely that your application will ever hit the problem I’ve demonstrated here, but in many cases I bet you can imagine how it is possible, and that’s enough for me. I prefer to err on the side of correctness.

In case you’re wondering how pytz works here while dateutil does not, it seems that pytz actually uses two separate tzinfo instances for the same time zone, one for standard time and another for daylight saving time:

>>> utc_12am_edt = datetime(2002, 10, 27, 4, tzinfo=pytz.utc)
>>> one_hour = timedelta(hours=1)
>>> local_12am_edt = utc_12am_edt.astimezone(pytz_eastern)
>>> local_1am_edt = (utc_12am_edt + one_hour).astimezone(pytz_eastern)
>>> local_1am_est = (utc_12am_edt + one_hour*2).astimezone(pytz_eastern)
>>> local_2am_est = (utc_12am_edt + one_hour*3).astimezone(pytz_eastern)
>>> local_12am_edt.tzinfo is local_1am_edt.tzinfo
True
>>> local_1am_edt.tzinfo is local_1am_est.tzinfo
False
>>> local_1am_est.tzinfo is local_2am_est.tzinfo
True

Even though I passed in the same pytz_eastern to every astimezone call, the tzinfo attached to each datetime instance differs depending on whether the datetime should be in standard time or daylight saving time.²
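
The idea of attaching a different fixed-offset tzinfo depending on the instant can be sketched with the standard library alone. This is a toy under loud assumptions, not how pytz is implemented: it assumes Python 3, it only knows the two 2002 transition instants mentioned in this article, and to_eastern_2002 is a hypothetical helper of my own.

```python
from datetime import datetime, timedelta, timezone

# One fixed-offset tzinfo per offset (illustrative stand-ins).
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")

# The 2002 Eastern transitions, expressed in UTC (taken from this article).
DST_START_2002 = datetime(2002, 4, 7, 7, tzinfo=timezone.utc)
DST_END_2002 = datetime(2002, 10, 27, 6, tzinfo=timezone.utc)

fmt = '%Y-%m-%d %H:%M:%S %Z (%z)'


def to_eastern_2002(utc_dt):
    """Toy converter, valid only for instants in 2002.

    Picks the fixed-offset tzinfo in effect at the given UTC instant,
    so the result never depends on an ambiguous wall-clock time.
    """
    tz = EDT if DST_START_2002 <= utc_dt < DST_END_2002 else EST
    return utc_dt.astimezone(tz)


local_1am_edt = to_eastern_2002(datetime(2002, 10, 27, 5, tzinfo=timezone.utc))
local_1am_est = to_eastern_2002(datetime(2002, 10, 27, 6, tzinfo=timezone.utc))

print(local_1am_edt.strftime(fmt))  # 2002-10-27 01:00:00 EDT (-0400)
print(local_1am_est.strftime(fmt))  # 2002-10-27 01:00:00 EST (-0500)
print(local_1am_edt.tzinfo is local_1am_est.tzinfo)  # False
```

Because the offset is chosen from the UTC instant rather than from the local wall-clock time, the repeated hour poses no problem for the conversion.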

Note that pytz’s documentation actually says that you should call its time zone instances’ normalize methods on the result of astimezone when converting to a non-UTC time zone, but I have my doubts whether this is necessary. I have not yet found any circumstances where normalize was necessary when merely converting from one time zone to another via astimezone.

Constructing `datetimes` in a Repeated Hour

If you ever need to construct a local time directly, perhaps as a result of parsing a string, dateutil makes it nigh impossible to get the DST version of the repeated hour at the end of daylight saving time.

>>> datetime(2002, 10, 27, 1, tzinfo=datu_eastern).strftime(fmt)
'2002-10-27 01:00:00 EST (-0500)'

That’s it. EST is your only option. There’s no way to tell the datetime library which 1 a.m. you meant, EDT or EST. As stated in the passage quoted from the datetime documentation, above, you will always get standard time.

pytz, on the other hand, gives you a way out of this problem using the localize methods of its tzinfo instances:

>>> naive_1am = datetime(2002, 10, 27, 1)
>>> pytz_eastern.localize(naive_1am, is_dst=False).strftime(fmt)
'2002-10-27 01:00:00 EST (-0500)'
>>> pytz_eastern.localize(naive_1am, is_dst=True).strftime(fmt)
'2002-10-27 01:00:00 EDT (-0400)'
>>> pytz_eastern.localize(naive_1am, is_dst=None).strftime(fmt)
Traceback (most recent call last):
  File "<ipython-input-12-6e97a68309e9>", line 1, in <module>
    pytz_eastern.localize(naive_1am, is_dst=None).strftime(fmt)
  File ".../lib/python2.7/site-packages/pytz/tzinfo.py", line 349, in localize
    raise AmbiguousTimeError(dt)
AmbiguousTimeError: 2002-10-27 01:00:00

In that last case, is_dst=None means, “I don’t know if this is supposed to be daylight saving time or not, so raise an error if it’s ambiguous.”
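
To show the shape of that three-way choice without depending on pytz, here is a toy sketch using only the standard library. Everything in it is an assumption of mine: it requires Python 3, it handles only the repeated hour of October 27, 2002, and localize_repeated_hour and AmbiguousWallTime are hypothetical names, not pytz’s API.

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")

# Wall-clock bounds of the repeated hour on October 27, 2002.
REPEATED_START = datetime(2002, 10, 27, 1)
REPEATED_END = datetime(2002, 10, 27, 2)


class AmbiguousWallTime(ValueError):
    """Hypothetical error for a wall-clock time that happened twice."""


def localize_repeated_hour(naive, is_dst):
    """Attach an offset to a naive time inside the repeated hour.

    is_dst=True picks the first (EDT) occurrence, is_dst=False the
    second (EST) occurrence, and is_dst=None refuses to guess.
    """
    if not (REPEATED_START <= naive < REPEATED_END):
        raise ValueError("this toy only handles the repeated hour")
    if is_dst is None:
        raise AmbiguousWallTime(naive)
    return naive.replace(tzinfo=EDT if is_dst else EST)


naive_1am = datetime(2002, 10, 27, 1)
as_edt = localize_repeated_hour(naive_1am, is_dst=True)
as_est = localize_repeated_hour(naive_1am, is_dst=False)

print(as_edt.astimezone(timezone.utc))  # 2002-10-27 05:00:00+00:00
print(as_est.astimezone(timezone.utc))  # 2002-10-27 06:00:00+00:00
```

Passing is_dst=None to this toy raises AmbiguousWallTime, which mirrors the “raise an error if it’s ambiguous” behavior described above.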

Any Reason to Use `dateutil`?

Based on the preceding argument, I feel pretty strongly that I should use pytz instead of dateutil for my time zone needs. That said, I have found an argument in dateutil’s favor: support for Windows’ built-in time zone data.

Both pytz and dateutil use the well-known “tz database”. Both Python libraries include a copy of this database, but on *nix systems both libraries will prefer to use your system’s database if available. I think the assumption is that your OS’s time zone information is more likely to be up-to-date.

Windows doesn’t use the tz database, though. It has its own time zone database stored in the Windows Registry.

On Windows, dateutil will use Windows’ database before falling back to its included tz database. In contrast, pytz doesn’t know how to use the Windows time zone database. pytz can only use the tz database.

This is probably an argument in dateutil’s favor, but I find it to be a particularly weak argument compared with the fact that common time zone usage with dateutil may produce an incorrect result. For the vast majority of applications, no matter the platform, I suspect an up-to-date pytz is a superior choice.

Premature Optimizations

For fun, here are a few other points of comparison between these libraries.

While we’re on the topic of included time zone databases, I’ll mention that, as of this writing, pytz’s included tz database seems to be 2.3 MiB. In contrast, dateutil keeps a compressed tarball of the tz database, weighing in at just 208 KiB. That could be a meaningful difference if you’re tight on storage space (e.g. an embedded system). Perhaps the pytz author would accept a patch!

On the other hand, Armin Ronacher’s blog entry mentions that datetime instances with tzinfo, “often cause much larger pickles.” For your consideration:

>>> import pickle
>>> now = datetime.utcnow()
>>> len(pickle.dumps(now, 2))
44
>>> len(pickle.dumps(now.replace(tzinfo=pytz.utc), 2))
61
>>> len(pickle.dumps(now.replace(tzinfo=tz.tzutc()), 2))
73

pytz’s UTC is a little smaller than dateutil’s. That could make a difference if, for example, you wanted to store aware (as opposed to naïve) datetime instances. (But really, how hard is it to call .replace(tzinfo=pytz.utc) when reading in time stamps?)
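
As a sketch of that “store naïve, reattach UTC on read” approach, here is a standard-library version. It assumes Python 3 and uses datetime.timezone.utc in place of pytz.utc; load_utc is a hypothetical helper of mine. I have left out the byte counts because they vary with the Python version and pickle protocol.

```python
import pickle
from datetime import datetime, timezone

# A naive time stamp that, by convention, is understood to be UTC.
stamp = datetime(2014, 5, 25, 12, 0, 0)

# Compare the pickled size with and without an attached tzinfo.
naive_size = len(pickle.dumps(stamp, 2))
aware_size = len(pickle.dumps(stamp.replace(tzinfo=timezone.utc), 2))


def load_utc(blob):
    """Unpickle a naive datetime and mark it as UTC on the way in."""
    return pickle.loads(blob).replace(tzinfo=timezone.utc)


restored = load_utc(pickle.dumps(stamp, 2))
print(naive_size < aware_size)  # True
print(restored.tzinfo)          # UTC
```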

Speaking of UTC, it seems like dateutil makes a new tzinfo object for UTC every time you ask for it:

>>> tz.tzutc() is tz.tzutc()
False

Contrast this with pytz, which has a single utc member rather than creating a new object every time you want to use it.

¹ Not “daylight savings time”! I am disappointed that she doesn’t think there is a clear standard for capitalizing time zone names. I usually prefer Chicago, but their time zone capitalization rules are too complicated for my tastes, so I’m choosing the AP Style Guide’s rules for time zones.
² Incidentally, neither of those tzinfo attributes is the same object as pytz_eastern, but each tzinfo attribute, as well as pytz_eastern, is an instance of datetime.tzinfo according to isinstance.
