Selecting different ranges on a giant dataframe in Rstudio

Question

Asked 6 years, 8 months ago

I have a very large CSV file with dates and closing prices for multiple stocks; it is too big to open in Excel.

The stock name is in the same column as the dates and appears only at the beginning of each series, as shown below:

I have limited knowledge of R and need a function that reads each of these ranges separately.

NOTE: The stock name is always in parentheses: (AÇÃO X)

What are the names of the columns in your csv, @Filipe?

– Luan Naufal

2018/12/07 at 15:22
In my original file the columns are Date ; Price (as I said above, the stock name is in the "Date" column).

– Filipe

2018/12/07 at 16:32
I answered below using generic names for the columns, but you can change them; in your case c0 would become Date: https://answall.com/a/348663/132077

– Luan Naufal

2018/12/07 at 16:33

1 answer

by Luan Naufal • **517** points · Answer 1 · 2018-12-07T16:31:28+00:00

One way to do this would be (I don’t know if it’s the most efficient, but it’s possible and it works):

1. Find where the separators are in your DataFrame, that is, which rows have empty text in every column, and save their indexes in a list.
2. Loop over each index in the list linhasVazias and split the data into sub-series at those indexes (each sub-series holding one stock).
3. Rebuild a DataFrame from the resulting sub-series in the new format.
4. At the end, df_final holds the reformatted data.

Here’s the code where I do these operations:

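import pandas as pd

# Positions of the separator rows (both columns empty).
# Assumes df has the default 0..n-1 index, so labels match positions.
linhasVazias = df[(df['c0'] == "") & (df['c1'] == "")].index.tolist()

partes = []
anterior = -1

# len(df) acts as a final boundary so the last series is kept
# even when the data does not end with an empty row
for i in linhasVazias + [len(df)]:
    # Slice out the series between two separators (copy avoids SettingWithCopyWarning)
    temp = df.iloc[anterior + 1 : i].copy()
    anterior = i
    if temp.empty:
        continue

    # Create the new column with the stock name, taken from the first row
    temp['c2'] = temp.iloc[0, 0]

    # Drop the first row, which only carries the stock name
    partes.append(temp.iloc[1:])

# Combine the pieces and reset the index, discarding the old index values.
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
df_final = pd.concat(partes).reset_index(drop=True)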
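
The code above takes df as given, with columns c0 and c1 and with empty cells stored as empty strings. By default pandas reads empty cells as NaN, so the comparison with "" would find nothing. A minimal sketch of loading the file so that those assumptions hold is shown here; the sample text, the `;` separator rows and the header line are assumptions based on the layout described in the question, not the real file.

```python
import io
import pandas as pd

# Hypothetical sample mimicking the layout described in the question
sample = """Date;Price
(AÇÃO X);
2018-01-02;10.5
2018-01-03;10.7
;
(AÇÃO Y);
2018-01-02;20.1
;
"""

df = pd.read_csv(
    io.StringIO(sample),    # replace with the path to the real CSV file
    sep=";",                # columns are separated by semicolons
    header=0,               # the first line is a header that gets replaced...
    names=["c0", "c1"],     # ...by the generic names used in the answer
    dtype=str,              # keep everything as text for now
    keep_default_na=False,  # empty cells stay "" instead of becoming NaN
)

# Same separator test as in the answer
linhasVazias = df[(df["c0"] == "") & (df["c1"] == "")].index.tolist()
```

Because every value is loaded as text, the price column can be converted afterwards, for example with pd.to_numeric once the series have been separated.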

Notes:
- here I used "c0", "c1" and "c2" to name the columns
- since your DataFrame is very large, I do not know whether this will run efficiently, but it is worth testing
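
If the loop turns out to be slow on a very large file, a possible alternative is to avoid slicing altogether: mark the rows that hold a stock name, copy that name downward with ffill, and then drop the name and separator rows. The sketch below is not the method from the answer and has not been timed on a large file; the sample DataFrame is hypothetical, and the parentheses test relies on the note in the question that names always look like (AÇÃO X).

```python
import pandas as pd

# Hypothetical sample in the same layout (empty strings mark the separators)
df = pd.DataFrame({
    "c0": ["(AÇÃO X)", "2018-01-02", "2018-01-03", "", "(AÇÃO Y)", "2018-01-02", ""],
    "c1": ["", "10.5", "10.7", "", "", "20.1", ""],
})

# Rows whose first column looks like "(NAME)" carry the stock name
is_name = df["c0"].str.startswith("(") & df["c0"].str.endswith(")")

# Rows where both columns are empty are separators
is_blank = (df["c0"] == "") & (df["c1"] == "")

# Keep the name only on its own row, then carry it down to the rows below
df["c2"] = df["c0"].where(is_name).ffill()

# Keep only the actual date/price rows
df_final = df[~is_name & ~is_blank].reset_index(drop=True)
```

Here df_final has one row per date and price, with the stock name repeated in c2, which is the same shape the loop produces.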