Convert date string to ISO 8601 format (with "T" and "Z")

2

Given a date string in the format 01/01/2018 13:00:40, how can I convert it to ISO 8601 format, with the "T" and the "Z"?

Example: 2018-01-01T13:00:40Z

I got it this way:

datetime.strftime(
    datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S"),
    "%Y-%m-%dT%H:%M:%SZ"
)

But I don't think this is the best way. Is there another method for this conversion?

  • 2

    One detail: the Z at the end indicates that the date is in UTC, so it is important to know which timezone the input date is in. For example, if 01/01/2018 13:00:40 is a date and time in Brasilia time, converting it to UTC gives 2018-01-01T15:00:40Z (on 1 January Brasilia is on daylight saving time, 2 hours behind UTC, so 13h in Brasilia = 15h UTC (Z)). Without knowing the timezone of the input date, there is no way to know the corresponding value in UTC. It is not just a matter of putting a "Z" at the end and being done :-)

2 answers

3

Yes, when it comes to creating the datetime object there isn't much else to do. Since your date is not in any ISO format, you need to specify the format manually, as you did.

from datetime import datetime
date = datetime.strptime('15/03/2018 13:00:40', '%d/%m/%Y %H:%M:%S')

However, the desired format is part of ISO and already has native methods that handle it:

print(date.isoformat())  # 2018-03-15T13:00:40

Simply being able to use isoformat() instead of strftime already simplifies your code.

If the input is always in Brasilia time, you can set the time zone by replacing the tzinfo field of the date object:

from datetime import datetime, timedelta, timezone
tzone = timezone(timedelta(hours=-3))
date = datetime.strptime('15/03/2018 13:00:40', '%d/%m/%Y %H:%M:%S')
date = date.replace(tzinfo=tzone)

print( date.isoformat() )  # 2018-03-15T13:00:40-03:00
  • Thanks for the explanation, Anderson, but the question of how to include the "T" and the "Z" in the format remains. In your example with isoformat() (which I had already tried), only the "T" is included, not the "Z".

2

The format you want to convert to (with "T" and "Z") is defined by the ISO 8601 standard, and the "Z" at the end indicates that the date and time are in UTC.

The problem is that the string contains only the date and time, so there is no way to convert it to UTC without making some arbitrary assumption (which directly affects the final result).


To explain a little better: when the string has only the date and time, the datetime returned by strptime is called "naive", meaning it has no information about the timezone.

In your case, the string "01/01/2018 13:00:40" produces a datetime corresponding to January 1, 2018, at 13:00:40. But in which timezone? It is not possible to know, because the resulting datetime is naive. This can be checked using the rule described in the documentation:

from datetime import datetime

dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
# check whether it is naive
print(dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None) # True

Here, tzinfo is the timezone information.

Since the datetime has no timezone information, we have no way of knowing the corresponding value in UTC. In each part of the world (in each timezone), January 1, 2018 at 13h occurred at a different instant, and therefore the date and time in UTC will be different.

For example, if we consider "1 January 2018 at 13:00:40 in Brasilia time", then converting to UTC should give "2018-01-01T15:00:40Z". Note that the time in UTC is 15h, since in January 2018 Brasilia was on daylight saving time and therefore 2 hours behind UTC.

But if we consider "1 January 2018 at 13:00:40 in Japan’s time zone", when converting to UTC the result would be "2018-01-01T04:00:40Z" (time changed to 4 am), as Japan’s time zone is 9 hours ahead of UTC.

And depending on the place, even the day, month, and year may be different. "January 1, 2018 at 13:00:40 in Samoa's time zone" in UTC is "2017-12-31T23:00:40Z" (December 31, 2017, at 23:00:40), and "January 1, 2018 at 13:00:40 in Niue's time zone" (a small island country in the South Pacific) in UTC is "2018-01-02T00:00:40Z" (just after midnight on January 2, 2018).
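
The arithmetic in these examples can be reproduced with a small sketch. It is only an illustration: the offsets are hard-coded for 1 January 2018 (an assumption that holds for that date, not in general), and the names offsets and results are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Same wall-clock reading, interpreted in four different places.
naive = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")

# UTC offsets in effect on 1 January 2018 (hard-coded for illustration only).
offsets = {
    "Brasilia": timedelta(hours=-2),  # daylight saving time
    "Japan": timedelta(hours=9),
    "Samoa": timedelta(hours=14),     # daylight saving time
    "Niue": timedelta(hours=-11),
}

results = {}
for place, offset in offsets.items():
    # Attach the offset without changing the clock reading, then convert to UTC.
    aware = naive.replace(tzinfo=timezone(offset))
    results[place] = aware.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

for place, text in results.items():
    print(place, text)
```

Each place yields a different UTC value for the same input string, which is why the timezone has to be decided before the "Z" can be added.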

That is, depending on the timezone to which the date and time refer, the UTC result will be different. And since the string "01/01/2018 13:00:40" has no timezone information, the options are: choose one arbitrarily, or use the default configured on the system.


Consider the system timezone

We can simply use the astimezone method, passing a timezone corresponding to UTC as the parameter. According to the documentation: "If self is naive, it is presumed to represent time in the system timezone"; that is, when the datetime is naive (which is our case), the date and time are assumed to be in the system's timezone.
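
To see which zone Python actually assumed, one option is to call astimezone() with no arguments, which attaches the system timezone to the naive datetime. This is just a sketch for inspection; the variable names are illustrative and the printed offset depends on the machine.

```python
from datetime import datetime, timezone

naive = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")

# With no argument, astimezone() treats the naive value as system local time
# and returns an aware datetime carrying that local offset.
local = naive.astimezone()
as_utc = naive.astimezone(timezone.utc)

print(local.isoformat())   # offset shown depends on the machine's configuration
print(as_utc.isoformat())  # same instant, expressed in UTC
```

Both values represent the same instant; only the offset used to display it differs.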

To get a timezone corresponding to UTC we have two options. The first is to build a timezone object, passing the offset (the difference from UTC) and the name as parameters. Since I want a timezone that corresponds to UTC, the offset is zero (and for that we use a timedelta). The name is "Z", so that I can use the %Z format (which, according to the documentation, prints the timezone name):

from datetime import datetime, timezone, timedelta

dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
dt = dt.astimezone(timezone(timedelta(hours=0), 'Z'))
print(dt.strftime("%Y-%m-%dT%H:%M:%S%Z"))

I didn't use the isoformat method because, instead of "Z", it prints the offset (in this case "+00:00", so I would have to use replace('+00:00', 'Z')). Instead, I used a format string with the specific fields, including %Z to display the name I defined in the timezone.
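
For comparison, the isoformat route could look like the sketch below. To keep the output independent of the machine, it assumes the input string is already in UTC and builds the aware datetime with timezone.utc; the names raw and with_z are illustrative.

```python
from datetime import datetime, timezone

# Assumption for this sketch: the input string is already in UTC.
dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
dt = dt.replace(tzinfo=timezone.utc)

raw = dt.isoformat()                 # the offset is rendered as "+00:00"
with_z = raw.replace("+00:00", "Z")  # swap the offset for "Z"

print(raw)
print(with_z)
```

The replace only makes sense when the datetime really is in UTC; with any other offset the string would be left unchanged.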


Another option (for Python <= 3.8) is to use the pytz module, which supports the IANA timezones (highly recommended if you want to keep up with the constant changes in timezones, such as daylight saving time rules, which change all the time). It already has a UTC-specific timezone:

import pytz
from datetime import datetime

dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
dt = dt.astimezone(pytz.utc)
print(dt.strftime("%Y-%m-%dT%H:%M:%S%Z").replace('UTC', 'Z'))

The problem is that the timezone's name is "UTC" (not "Z"), and after testing every option available for strftime, none returned "Z". So the way out was to do a replace at the end.


A disadvantage of this solution is that you depend entirely on the system's timezone, and the result can vary greatly. For example, on my machine the result was 2018-01-01T15:00:40Z (15h instead of 13h), since my system's timezone is Brasilia time. But running on Ideone.com and on Repl.it, the result was 2018-01-01T13:00:40Z (13h, probably because those environments are configured with UTC).


Another way to show how the result varies with the configuration is to change the environment's timezone. For example, on Linux you can set the TZ variable to change the timezone. I ran the following script (in a file called date_test.py):

import pytz
from datetime import datetime, timezone, timedelta

dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
print('--- no timezone')
print(dt.strftime("%Y-%m-%dT%H:%M:%S%Z"))

print('--- convert to UTC')
dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
dt = dt.astimezone(pytz.utc)
print(dt.strftime("%Y-%m-%dT%H:%M:%S%Z").replace('UTC', 'Z'))

print('--- convert to UTC with timedelta')
dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
dt = dt.astimezone(timezone(timedelta(hours=0), 'Z'))
print(dt.strftime("%Y-%m-%dT%H:%M:%S%Z"))

On the command line, I set the TZ variable to different timezones:

$ TZ=Pacific/Niue python3 date_test.py
--- no timezone
2018-01-01T13:00:40
--- convert to UTC
2018-01-02T00:00:40Z
--- convert to UTC with timedelta
2018-01-02T00:00:40Z

$ TZ=America/Sao_Paulo python3 date_test.py
--- no timezone
2018-01-01T13:00:40
--- convert to UTC
2018-01-01T15:00:40Z
--- convert to UTC with timedelta
2018-01-01T15:00:40Z

$ TZ=Asia/Tokyo python3 date_test.py
--- no timezone
2018-01-01T13:00:40
--- convert to UTC
2018-01-01T04:00:40Z
--- convert to UTC with timedelta
2018-01-01T04:00:40Z

$ TZ=Pacific/Apia python3 date_test.py
--- no timezone
2018-01-01T13:00:40
--- convert to UTC
2017-12-31T23:00:40Z
--- convert to UTC with timedelta
2017-12-31T23:00:40Z

Notice how each Timezone results in a different value (confirming what has already been explained earlier).

To avoid this, another solution would be to consider that the string refers to a specific Timezone.


Consider a specific Timezone before converting to UTC

In this case I'll use the pytz module (recommended for Python <= 3.8; at the end I also cover the alternative for Python >= 3.9), since it supports IANA timezones (the names in the Continent/Region format used in the previous example).

The advantage of pytz is that we don't need to use timedelta, which is extremely error-prone. There is a more detailed explanation in this answer (in the "Do not use timedelta" section), but basically: an identifier like America/Sao_Paulo carries the entire history of changes to this timezone, such as daylight saving time transitions (when they occurred, and what the offset was before and after each change).

For example, in the America/Sao_Paulo timezone (which corresponds to Brasilia time), the offset during daylight saving time is -02:00 (2 hours behind UTC), but during "normal" time it is -03:00 (3 hours behind UTC). However, the rules vary widely from one year to the next (each year daylight saving time begins and ends on a different date, and in some years there was none). If you were to use timedelta, you would have to know whether the date and time you are manipulating fall within daylight saving time; otherwise the wrong offset would be used (and consequently the conversion to UTC would be wrong). With the identifier America/Sao_Paulo, pytz consults the timezone history and uses the correct value (it does the "dirty work" for you).
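
The offset change can be observed directly. The sketch below uses the standard-library zoneinfo module (Python >= 3.9, covered at the end of this answer) instead of pytz, only so that it runs without third-party packages; it assumes the IANA timezone database is available on the system, and the variable names are illustrative.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library from Python 3.9

sao_paulo = ZoneInfo("America/Sao_Paulo")

# Same clock reading on two dates: one during daylight saving time, one outside it.
summer = datetime(2018, 1, 1, 13, 0, 40, tzinfo=sao_paulo)
winter = datetime(2018, 7, 1, 13, 0, 40, tzinfo=sao_paulo)

print(summer.isoformat())  # 2018-01-01T13:00:40-02:00
print(winter.isoformat())  # 2018-07-01T13:00:40-03:00
```

The same identifier gives -02:00 in January and -03:00 in July 2018, which a single fixed timedelta cannot express.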

We have already seen the disadvantage of pytz above (you have to do the replace), but given the advantages (no need to "guess" the correct offset of each timezone at each time of year), I find that perfectly acceptable.

The code goes like this:

from datetime import datetime
import pytz

dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
# set the timezone (no conversion)
dt = pytz.timezone('America/Sao_Paulo').localize(dt)
# convert to UTC
dt = dt.astimezone(pytz.utc)
print(dt.strftime("%Y-%m-%dT%H:%M:%S%Z").replace('UTC', 'Z'))

In this case, I assumed that the string "01/01/2018 13:00:40" corresponds to a date and time in Brasilia time. This is the arbitrary assumption I mentioned at the beginning. Since we have no information about the timezone to which the string refers, we have to assume some specific timezone in order not to depend on what is configured on the system (if you have this timezone information somewhere, use the correct one instead of America/Sao_Paulo).

So first I set the timezone on the datetime (which ceases to be naive), and then I convert to UTC. Now the result is always 2018-01-01T15:00:40Z, regardless of the timezone configured on the system (see Ideone.com and Repl.it, for example). If I repeat the test above, changing the TZ variable, the result remains 2018-01-01T15:00:40Z, because the code now always assumes that the date and time are in Brasilia time before converting to UTC (it no longer depends on the system timezone).


Note: if you assume that the string represents a date and time in UTC, then just do:

dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
dt = pytz.utc.localize(dt)
print(dt.strftime("%Y-%m-%dT%H:%M:%S%Z").replace('UTC', 'Z'))

Now the result will always be 2018-01-01T13:00:40Z, regardless of the system timezone.

Obviously, in this particular case you could just use isoformat() or dt.strftime("%Y-%m-%dT%H:%M:%S") and concatenate the "Z", but I still prefer to use localize, because the datetime is then no longer naive and represents a date and time in UTC (which prevents errors if you need to convert it to another timezone, for example).
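
One way to guard against appending "Z" to a naive value by accident is a small helper that refuses naive datetimes. This is a hypothetical sketch (to_iso_z is not part of datetime or pytz) and it uses only the standard library:

```python
from datetime import datetime, timezone

def to_iso_z(dt):
    """Format an aware datetime as ISO 8601 in UTC with a trailing "Z"."""
    # Same naive check as in the datetime documentation.
    if dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None:
        raise ValueError("naive datetime: the timezone must be set first")
    # The literal "Z" is safe here because the value was just converted to UTC.
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
print(to_iso_z(dt.replace(tzinfo=timezone.utc)))  # 2018-01-01T13:00:40Z
```

Calling to_iso_z(dt) with the naive dt raises ValueError, which forces the caller to decide on a timezone first.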


Python >= 3.9

From Python 3.9 onward it is possible to use the zoneinfo module, which also supports IANA timezones. It works similarly to pytz.

For example, in case you consider that the date is in a specific Timezone before converting to UTC:

from datetime import datetime
from zoneinfo import ZoneInfo

# assume the date is in the America/Sao_Paulo timezone
dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S").replace(tzinfo=ZoneInfo('America/Sao_Paulo'))
# convert to UTC
dt = dt.astimezone(ZoneInfo('UTC'))
print(dt.strftime("%Y-%m-%dT%H:%M:%S%Z").replace('UTC', 'Z')) # 2018-01-01T15:00:40Z

Or, if you want to consider that the date is already in UTC:

from datetime import datetime
from zoneinfo import ZoneInfo

dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S").replace(tzinfo=ZoneInfo('UTC'))
print(dt.strftime("%Y-%m-%dT%H:%M:%S%Z").replace('UTC', 'Z')) # 2018-01-01T13:00:40Z
