One way to do this would be (I don’t know if it’s the most efficient, but it’s possible and it works):
Know where your separators are DataFrame
, that is, which rows have empty text values for each of the columns, and save the index in a list
Rotate for each index in the list linhasVazias
and separate the Series you own into subseries according to the index (each subseries containing an Action)
Reformat this DataFrame
containing the resulting sub-series in the new format
Saving at the end df, you will receive the new information
Here’s the code where I do these operations:
linhasVazias = df[(df['c0'] == "") & (df['c1'] == "") ].index.tolist()
df_final = pd.DataFrame({'c0': [], 'c1': [], 'c2': []})
anterior = -1
for i in linhasVazias:
# Separa a série relacionada
temp = df[anterior+1 : i]
# Cria a nova coluna com o nome da ação
temp['c2'] = temp.iloc[0][0]
# Remove a primeira linha, com o nome da ação
temp = temp.drop([anterior+1], axis = 0)
# Salva no novo dataFrame as linhas relacionadas
df_final = df_final.append(temp)
anterior = i
# Reseta os index no novo DataFrame, excluindo a coluna dos valores antigos
df_final = df_final.reset_index(drop = True)
OBS:
- here I used "C0", "C1" and "C2" to name the columns
- for your case, which has a very large DF, I do not know if the processing will be efficient, but worth the test
What are the names of the columns in your csv, @Filipe?
– Luan Naufal
In my original (as I said above, the action name is in the "Date" column): Date ; Price
– Filipe
I answered below using generic names for the columns, but you can change, in case the
c0
would turn theData
: https://answall.com/a/348663/132077– Luan Naufal