Rather than simply concatenating the column data into a Python list of timestamps, it is better to keep the result inside the dataframe.
In this case, the best tool is the apply method: it can call a function row by row on your dataframe and collect the value returned for each row into a Series. That Series shares its Index with the original dataframe, so it can be concatenated as an extra column. (After that, you can delete the columns holding the separate date elements.)
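A minimal sketch of that behaviour on a made-up frame. The column names "a" and "b" and the function "total" are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Hypothetical toy frame with a non-default index
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=["x", "y", "z"])

def total(row):
    # With axis=1 the function is called once per row;
    # `row` is a Series keyed by column name
    return row["a"] + row["b"]

result = df.apply(total, axis=1)

# The resulting Series reuses the frame's index,
# so it can be attached directly as a new column
df["total"] = result
```

Because the index is shared, the values line up with the right rows even when the index is not the default 0, 1, 2...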
And while we're at it: unlike a "csv" file, where "everything is text", a dataframe can hold richer objects such as datetimes. A datetime stores a timestamp (date, hours, minutes) that can be sorted, can take time zones and daylight saving time into account, can be subtracted from another datetime to get a duration, and so on.
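As a small illustration of what datetime objects allow, here is a sketch with two made-up instants. They are naive datetimes, so no time zone or daylight saving handling is involved in this example:

```python
from datetime import datetime

# Two made-up instants (naive datetimes, no time zone attached)
start = datetime(2020, 1, 15, 8, 30)
end = datetime(2020, 1, 16, 10, 0)

# Datetimes compare chronologically, so they can be sorted directly
ordered = sorted([end, start])

# Subtracting two datetimes gives a timedelta, i.e. a duration
duration = end - start
hours = duration.total_seconds() / 3600
```

None of this is possible while the values are still plain text or separate integer columns.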
If the applied function returns a datetime object, pandas automatically creates a Series with that content:
    from datetime import datetime

    import pandas as pd

    def processa(linha):
        # Turn the desired columns into a list of integer values:
        valores = [int(val) for val in (linha[" yyyy"], linha[" mm"], linha[" dd"], linha[" hour"], linha[" min"])]
        # Create the datetime object:
        # Python's "datetime" constructor takes, in order, the values
        # for year, month, day, hours and minutes - the "*" operator
        # unpacks those arguments, which are in a list, into the call:
        return datetime(*valores)

    # Read your dataframe:
    df = pd.read_csv("B116353.csv")
    # Create the series with the dates and times:
    timestamps = df.apply(processa, axis=1)
    timestamps.name = "timestamps"
    # Build a new dataframe with the columns of interest -
    # find the position of the " min" column; the remaining columns start right after it:
    remainder_start = list(df.columns).index(" min")
    new_df = pd.concat(
        (df[["id_argos", " id_wmo"]],
         timestamps,
         df[list(df.columns)[remainder_start + 1:]]
        ),
        axis=1
    )
Done - now you have a "timestamps" column of datetime objects that combines the numbers from the 5 columns, and you can carry on processing your dataframe.
The last item passed to the "concat" call uses "pure" Python (that is, no pandas) to pick the names of all the columns after " min" without having to type them out: slicing the list of column names produces a list of strings, and passing that list as an index to the dataframe selects those columns. The "concat" call then combines the first two columns of the original frame, the timestamp series we created, and all the remaining columns into a new dataframe.
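To try the column selection without the CSV file, here is a self-contained sketch on an in-memory frame. All the values are made up, the trailing columns " lat" and " lon" are hypothetical stand-ins for whatever follows " min" in the real file, and the helper name "to_timestamp" is only illustrative:

```python
from datetime import datetime

import pandas as pd

# Made-up sample rows; " lat" and " lon" are hypothetical trailing columns
df = pd.DataFrame({
    "id_argos": [116353, 116353],
    " id_wmo": [31951, 31951],
    " yyyy": [2012, 2012],
    " mm": [7, 7],
    " dd": [1, 1],
    " hour": [0, 1],
    " min": [30, 0],
    " lat": [-25.1, -25.2],
    " lon": [-45.3, -45.4],
})

def to_timestamp(row):
    # int() also copes with values that pandas upcast to float in the row
    parts = [int(row[col]) for col in (" yyyy", " mm", " dd", " hour", " min")]
    return datetime(*parts)

timestamps = df.apply(to_timestamp, axis=1)
timestamps.name = "timestamps"

# Plain-Python list slicing: every column name after " min"
columns = list(df.columns)
remainder = columns[columns.index(" min") + 1:]

new_df = pd.concat((df[["id_argos", " id_wmo"]], timestamps, df[remainder]), axis=1)
```

The slice keeps working if the file gains or loses trailing columns, since nothing after " min" is named explicitly.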
This creates the list, but it does not help bring the generated dates back into the dataframe so the analysis can continue. The format method, although it works, is very impractical; from Python 3.6 onwards it is much more convenient to use f-strings for that kind of expression. – jsbueno
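For comparison, a short sketch of the two styles mentioned in the comment, using made-up values and an assumed output layout:

```python
year, month, day, hour, minute = 2012, 7, 1, 0, 30  # made-up values

# str.format: placeholders and their arguments are kept apart
with_format = "{}-{:02d}-{:02d} {:02d}:{:02d}".format(year, month, day, hour, minute)

# f-string (Python 3.6+): the expressions sit inside the string itself
with_fstring = f"{year}-{month:02d}-{day:02d} {hour:02d}:{minute:02d}"
```

Both produce the same text; the f-string is easier to read because each value appears where it is used.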