The format you want to convert to (with "T" and "Z") is defined by the ISO 8601 standard, and the "Z" at the end indicates that the date and time are in UTC.
The problem is that the string contains only the date and time, so there is no way to convert it to UTC without making some arbitrary assumption (which directly affects the final result).
To explain a little better: when the string has only the date and time, the datetime returned by strptime is called "naive", meaning it has no information about the time zone.
In your case, the string "01/01/2018 13:00:40" produces a datetime corresponding to January 1, 2018, at 13:00:40. But in which time zone? It is not possible to know, because the resulting datetime is naive. This can be checked using the rules described in the documentation:
from datetime import datetime
dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
# check whether it is naive
print(dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None) # True
Here, tzinfo is the attribute that holds the time zone information.
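As a small sketch of that check, the rule from the documentation can be wrapped in a helper. The name is_naive is hypothetical (it is not part of the datetime module), and timezone.utc is used only to build an aware value for comparison:

```python
from datetime import datetime, timezone


def is_naive(dt: datetime) -> bool:
    """Return True when dt carries no usable time zone information."""
    return dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None


naive_dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
aware_dt = naive_dt.replace(tzinfo=timezone.utc)  # same fields, now tagged as UTC

print(is_naive(naive_dt))  # True
print(is_naive(aware_dt))  # False
```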
Since the datetime has no time zone information, there is no way to know the corresponding value in UTC. In each part of the world (in each time zone), January 1, 2018 at 13:00 occurred at a different instant, so the date and time in UTC will be different.
For example, if we consider "January 1, 2018 at 13:00:40 in Brasília time", then converting to UTC should give "2018-01-01T15:00:40Z". Note that the time in UTC is 15:00, since in January 2018 Brasília time was in daylight saving time and therefore 2 hours behind UTC.
But if we consider "January 1, 2018 at 13:00:40 in Japan’s time zone", converting to UTC gives "2018-01-01T04:00:40Z" (the time changes to 4 am), since Japan’s time zone is 9 hours ahead of UTC.
And depending on the place, even the day, month and year may be different. "January 1, 2018 at 13:00:40 in Samoa’s time zone" in UTC is "2017-12-31T23:00:40Z" (December 31, 2017, at 23:00:40), and "January 1, 2018 at 13:00:40 in Niue’s time zone" (a small island country in the South Pacific) in UTC is "2018-01-02T00:00:40Z" (just after midnight on January 2, 2018).
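The four cases above can be reproduced with a short sketch. It assumes Python 3.9+ (for the standard-library zoneinfo module) and that the IANA time zone database is available on the system; the variable names are only illustrative:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# The same naive date and time, interpreted in four different time zones.
naive = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
zones = ["America/Sao_Paulo", "Asia/Tokyo", "Pacific/Apia", "Pacific/Niue"]

results = {}
for name in zones:
    aware = naive.replace(tzinfo=ZoneInfo(name))  # attach the zone, no conversion
    as_utc = aware.astimezone(timezone.utc)       # now convert the instant to UTC
    results[name] = as_utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    print(name, results[name])

# America/Sao_Paulo 2018-01-01T15:00:40Z
# Asia/Tokyo 2018-01-01T04:00:40Z
# Pacific/Apia 2017-12-31T23:00:40Z
# Pacific/Niue 2018-01-02T00:00:40Z
```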
That is, depending on the time zone the date and time refer to, the UTC result will be different. And since the string "01/01/2018 13:00:40" has no time zone information, the options are: choose one arbitrarily, or use the default configured on the system.
Considering the system’s time zone
We can simply use the astimezone method, passing a time zone corresponding to UTC as the parameter. According to the documentation: "If self is naive, it is presumed to represent time in the system timezone"; that is, when the datetime is naive (which is our case), the date and time are assumed to be in the system’s time zone.
To get a time zone corresponding to UTC we have two options. The first is to build a timezone object, passing the offset (the difference relative to UTC) and the name as parameters. Since I want a time zone that corresponds to UTC, the offset is zero (and for that we use a timedelta). And the name is "Z", because then I can use the %Z format (which, according to the documentation, prints the time zone name):
from datetime import datetime, timezone, timedelta
dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
dt = dt.astimezone(timezone(timedelta(hours=0), 'Z'))
print(dt.strftime("%Y-%m-%dT%H:%M:%S%Z"))
I didn’t use the isoformat method because instead of "Z" it prints the offset (in this case, "+00:00", so I would have to use replace('+00:00', 'Z')). Instead, I used a format string with the specific fields, including %Z to display the name defined in the timezone.
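For comparison, a minimal sketch of the isoformat route mentioned above. It assumes the datetime is already aware and in UTC; timezone.utc is the standard library's built-in UTC constant, used here instead of the custom "Z" timezone:

```python
from datetime import datetime, timezone

dt = datetime(2018, 1, 1, 13, 0, 40, tzinfo=timezone.utc)

iso = dt.isoformat()                # the offset is printed as +00:00
iso_z = iso.replace("+00:00", "Z")  # swap the zero offset for the "Z" designator

print(iso)    # 2018-01-01T13:00:40+00:00
print(iso_z)  # 2018-01-01T13:00:40Z
```

The replace is only safe because the value is known to be in UTC; for any other offset the string would be left unchanged.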
Another option (for Python <= 3.8) is to use the pytz module, which supports the IANA time zones (highly recommended if you want to keep up with the constant changes in time zones, such as daylight saving time rules, which change all the time). It already has a UTC-specific time zone:
import pytz
from datetime import datetime
dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
dt = dt.astimezone(pytz.utc)
print(dt.strftime("%Y-%m-%dT%H:%M:%S%Z").replace('UTC', 'Z'))
The problem is that the name of this time zone is "UTC" (not "Z"), and of all the options available for strftime that I tested, none returned "Z". So the way out was to do a replace at the end.
A disadvantage of this solution is that you are completely dependent on the system’s time zone, and the result can vary greatly. For example, on my machine the result was 2018-01-01T15:00:40Z (15:00 instead of 13:00), since my system’s time zone corresponds to Brasília time. But running on Ideone.com and on Repl.it, the result was 2018-01-01T13:00:40Z (13:00, probably because these environments are configured with UTC).
Another way to show how the result varies with the configuration is to change the environment’s time zone. For example, on Linux you can set the TZ variable to change the time zone. In this case, I ran the following script (in a file called date_test.py):
import pytz
from datetime import datetime, timezone, timedelta
dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
print('--- without timezone')
print(dt.strftime("%Y-%m-%dT%H:%M:%S%Z"))
print('--- convert to UTC')
dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
dt = dt.astimezone(pytz.utc)
print(dt.strftime("%Y-%m-%dT%H:%M:%S%Z").replace('UTC', 'Z'))
print('--- convert to UTC with timedelta')
dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
dt = dt.astimezone(timezone(timedelta(hours=0), 'Z'))
print(dt.strftime("%Y-%m-%dT%H:%M:%S%Z"))
On the command line, I set the TZ variable to different time zones:
$ TZ=Pacific/Niue python3 date_test.py
--- without timezone
2018-01-01T13:00:40
--- convert to UTC
2018-01-02T00:00:40Z
--- convert to UTC with timedelta
2018-01-02T00:00:40Z
$ TZ=America/Sao_Paulo python3 date_test.py
--- without timezone
2018-01-01T13:00:40
--- convert to UTC
2018-01-01T15:00:40Z
--- convert to UTC with timedelta
2018-01-01T15:00:40Z
$ TZ=Asia/Tokyo python3 date_test.py
--- without timezone
2018-01-01T13:00:40
--- convert to UTC
2018-01-01T04:00:40Z
--- convert to UTC with timedelta
2018-01-01T04:00:40Z
$ TZ=Pacific/Apia python3 date_test.py
--- without timezone
2018-01-01T13:00:40
--- convert to UTC
2017-12-31T23:00:40Z
--- convert to UTC with timedelta
2017-12-31T23:00:40Z
Notice how each time zone results in a different value (confirming what was explained earlier).
To avoid this, another solution is to assume that the string refers to a specific time zone.
Considering a specific time zone before converting to UTC
In that case, I’ll use the pytz module (recommended for Python <= 3.8; at the end I also cover the alternative for Python >= 3.9), since it supports IANA time zones (which are the names in the Continent/Region format used in the previous example).
The advantage of pytz is that we don’t need to use timedelta, which is extremely error-prone. There is a more detailed explanation in this answer (in the "Do not use timedelta" section), but basically: an identifier like America/Sao_Paulo carries the entire history of changes to that time zone, such as daylight saving time changes (when they occurred, and what the offset was before and after each change).
For example, in the America/Sao_Paulo time zone (which corresponds to Brasília time), the offset during daylight saving time is -02:00 (2 hours behind UTC), but during "normal" time it is -03:00 (3 hours behind UTC). However, the rules vary widely from one year to the next (each year, daylight saving time begins and ends on a different date, not to mention the years in which there was none). If you were to use timedelta, you would have to know whether the date and time you are handling fall in daylight saving time or not; otherwise the wrong value would be used (and consequently the conversion to UTC would be wrong). But with the identifier America/Sao_Paulo, pytz consults the time zone history and uses the correct value (it does the "dirty work" for you).
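To make the difference concrete, here is a small sketch that prints the offset of America/Sao_Paulo in January and in July of 2018. It uses the standard-library zoneinfo module (Python 3.9+, assuming the IANA database is available on the system) rather than pytz, only so that it runs without extra packages:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/Sao_Paulo")

january = datetime(2018, 1, 1, 13, 0, 40, tzinfo=tz)  # daylight saving time in effect
july = datetime(2018, 7, 1, 13, 0, 40, tzinfo=tz)     # "normal" time

print(january.strftime("%z"))  # -0200
print(july.strftime("%z"))     # -0300

# A hard-coded timedelta(hours=-3) is right for July but wrong for January.
fixed_guess = timedelta(hours=-3)
print(january.utcoffset() == fixed_guess)  # False
print(july.utcoffset() == fixed_guess)     # True
```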
The disadvantage of pytz we have seen above (you have to do the replace), but given the advantages (no need to "guess" the correct offset of each time zone at each time of year), I find it perfectly acceptable in this case.
The code goes like this:
from datetime import datetime
import pytz
dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
# set the time zone (no conversion)
dt = pytz.timezone('America/Sao_Paulo').localize(dt)
# convert to UTC
dt = dt.astimezone(pytz.utc)
print(dt.strftime("%Y-%m-%dT%H:%M:%S%Z").replace('UTC', 'Z'))
In this case, I assumed that the string "01/01/2018 13:00:40" corresponds to a date and time in Brasília time. This is the arbitrary assumption I mentioned at the beginning. Since we have no information about the time zone the string refers to, we have to assume a specific one in order not to depend on what is configured on the system (if you do have this time zone information somewhere, use the correct one instead of America/Sao_Paulo).
So first I set the time zone on the datetime (which ceases to be naive), and then I convert it to UTC. Now the result is always 2018-01-01T15:00:40Z, regardless of the time zone configured on the system (see Ideone.com and Repl.it, for example). If I repeat the test above, changing the TZ variable, the result remains 2018-01-01T15:00:40Z, because now the code always assumes that the date and time are in Brasília time before converting to UTC (it no longer depends on the system’s time zone).
Note: if you assume that the string represents a date and time in UTC, then just do:
from datetime import datetime
import pytz
dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")
dt = pytz.utc.localize(dt)
print(dt.strftime("%Y-%m-%dT%H:%M:%S%Z").replace('UTC', 'Z'))
Now the result will always be 2018-01-01T13:00:40Z, regardless of the system’s time zone.
Obviously, in this particular case you could even use isoformat() or dt.strftime("%Y-%m-%dT%H:%M:%S") and concatenate the "Z", but I still prefer to use localize, so that the datetime is no longer naive and actually represents a date and time in UTC (which prevents errors if you need to convert it to another time zone, for example).
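As an illustration of why an aware datetime is safer, the sketch below converts the UTC value to another time zone. It is only an example under assumptions: it uses timezone.utc and zoneinfo from the standard library (Python 3.9+, with the IANA database available) instead of pytz, and Asia/Tokyo is an arbitrary target:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

naive = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S")

# Declare that the parsed fields are in UTC (no conversion happens here).
dt_utc = naive.replace(tzinfo=timezone.utc)

# Because dt_utc is aware, converting to another zone is unambiguous.
dt_tokyo = dt_utc.astimezone(ZoneInfo("Asia/Tokyo"))

print(dt_utc.isoformat())    # 2018-01-01T13:00:40+00:00
print(dt_tokyo.isoformat())  # 2018-01-01T22:00:40+09:00
```

Had the value stayed naive, astimezone would have interpreted it in the system's time zone first, and the Tokyo result would depend on the machine.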
Python >= 3.9
Starting with Python 3.9 it is possible to use the zoneinfo module, which also supports IANA time zones. It works similarly to pytz.
For example, if you assume that the date is in a specific time zone before converting to UTC:
from datetime import datetime
from zoneinfo import ZoneInfo
# assume the date is in the America/Sao_Paulo time zone
dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S").replace(tzinfo=ZoneInfo('America/Sao_Paulo'))
# convert to UTC
dt = dt.astimezone(ZoneInfo('UTC'))
print(dt.strftime("%Y-%m-%dT%H:%M:%S%Z").replace('UTC', 'Z')) # 2018-01-01T15:00:40Z
Or, if you want to assume that the date is already in UTC:
from datetime import datetime
from zoneinfo import ZoneInfo
dt = datetime.strptime("01/01/2018 13:00:40", "%d/%m/%Y %H:%M:%S").replace(tzinfo=ZoneInfo('UTC'))
print(dt.strftime("%Y-%m-%dT%H:%M:%S%Z").replace('UTC', 'Z')) # 2018-01-01T13:00:40Z
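If this conversion is needed in several places, the two zoneinfo variants could be folded into one function. The sketch below is just one possible shape: to_utc_z and its parameters are hypothetical names, and the source time zone must still be supplied by the caller, since the string itself does not carry it:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_utc_z(text: str, source_tz: str, fmt: str = "%d/%m/%Y %H:%M:%S") -> str:
    """Parse text as a date and time in source_tz and format it in UTC with a 'Z' suffix."""
    local = datetime.strptime(text, fmt).replace(tzinfo=ZoneInfo(source_tz))
    # After converting to UTC the offset is zero, so a literal "Z" is correct here.
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_utc_z("01/01/2018 13:00:40", "America/Sao_Paulo"))  # 2018-01-01T15:00:40Z
print(to_utc_z("01/01/2018 13:00:40", "UTC"))                # 2018-01-01T13:00:40Z
```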
One detail: the Z at the end indicates that the date is in UTC, so it is important to know which time zone the input date is in. For example, if 01/01/2018 13:00:40 is a date and time in Brasília time, converting to UTC gives 2018-01-01T15:00:40Z (January 1 falls in daylight saving time, so 2 hours behind UTC -> 13:00 in Brasília = 15:00 UTC (Z)). Without knowing the time zone of the input date, there is no way to know the corresponding value in UTC; it is not just a matter of sticking a "Z" at the end and calling it done :-) – hkotsubo