Replace certain values with the mean in a pandas DataFrame

Hello,

I have a DataFrame like the one in the image below. I would like to replace the NaN values in the QTDVENDADIARIA column with the mean of the two previous records, (40+27)/2, and do the same for the price column, (2.38+2.38)/2. In the EMISSAO column, where the value is NaT, I would like to put the date from the SEQ_DATA column. The same applies to the next NaN value in the QTDVENDADIARIA column, (68+54)/2, and in the PRECO column, (2.38+2.38)/2. I tried the following:

df.fillna(df.mean(),axis=1)

but this fills in the mean of the whole column, and in the date column it inserts a value where there was nothing.

[image: enter image description here]

  • If QTDVENDADIARIA is NaN in the row whose index is zero, what would you replace it with?

  • I would use the average of the next two.

1 answer

fillna alone is not flexible enough to apply a rule like yours, so in this case I would use iterrows to go through the DataFrame and make the adjustments.

Following your example, I created the following DataFrame:

[image: enter image description here]
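In case the image does not load, a frame of the same shape could be built roughly as below. This is only a sketch: the dates and the number of rows are assumptions, and only the quantities 40, 27, 68, 54 and the price 2.38 are taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the example data: the dates are made up;
# only 40, 27, 68, 54 and 2.38 come from the question.
seq_data = pd.date_range("2020-01-01", periods=6, freq="D")

df = pd.DataFrame({
    "SEQ_DATA": seq_data,
    "EMISSAO": seq_data,
    "QTDVENDADIARIA": [40.0, 27.0, np.nan, 68.0, 54.0, np.nan],
    "PRECOMEDIODIARIO": [2.38, 2.38, np.nan, 2.38, 2.38, np.nan],
})

# Blank out the issue date on the rows that have no sales recorded
df.loc[[2, 5], "EMISSAO"] = pd.NaT

print(df)
```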

To fill in the empty fields I used the following code:

import pandas as pd

# Assumes df has the default integer index (0, 1, 2, ...)
for index, row in df.iterrows():

    if index > 1:
        # From the third row on, use the mean of the two previous rows
        if pd.isnull(row['QTDVENDADIARIA']):
            df.loc[index, 'QTDVENDADIARIA'] = (df.loc[index-1, 'QTDVENDADIARIA'] + df.loc[index-2, 'QTDVENDADIARIA']) / 2

        if pd.isnull(row['PRECOMEDIODIARIO']):
            df.loc[index, 'PRECOMEDIODIARIO'] = (df.loc[index-1, 'PRECOMEDIODIARIO'] + df.loc[index-2, 'PRECOMEDIODIARIO']) / 2

    else:
        # Rows 0 and 1 have no two previous rows, so use the next two instead
        if pd.isnull(row['QTDVENDADIARIA']):
            df.loc[index, 'QTDVENDADIARIA'] = (df.loc[index+1, 'QTDVENDADIARIA'] + df.loc[index+2, 'QTDVENDADIARIA']) / 2

        if pd.isnull(row['PRECOMEDIODIARIO']):
            df.loc[index, 'PRECOMEDIODIARIO'] = (df.loc[index+1, 'PRECOMEDIODIARIO'] + df.loc[index+2, 'PRECOMEDIODIARIO']) / 2

    # A missing date is copied from SEQ_DATA, whatever the row
    if pd.isnull(row['EMISSAO']):
        df.loc[index, 'EMISSAO'] = row['SEQ_DATA']

A brief explanation of the code:

The iterrows function goes through every row of the DataFrame; I use it to find where the empty fields are.

For the value fields, if the field is empty I compute the average from the two previous rows (index -1 and index -2).

When the date field is empty, I simply copy the value from the SEQ_DATA column.

The initial if checks which index the loop is at because, at index 0 or 1, the calculation with the previous rows would raise an error. I based this on your reply in the comments, and assumed that the rule you gave for index 0 applies to index 1 as well.

Once that is done, this is the result:

[image: enter image description here]
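For isolated gaps, the same rule can also be approximated without an explicit loop, by computing the neighbour means with shift first and then passing them to fillna. This is a different technique from the iterrows loop, not an equivalent one: it only looks at the original values, so a gap whose neighbour is itself missing stays NaN. The function name fill_from_neighbors and the sample values are hypothetical; only 40, 27, 68, 54 and 2.38 come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only 40, 27, 68, 54 and 2.38 come from the question.
df = pd.DataFrame({
    "QTDVENDADIARIA": [40.0, 27.0, np.nan, 68.0, 54.0, np.nan],
    "PRECOMEDIODIARIO": [2.38, 2.38, np.nan, 2.38, 2.38, np.nan],
})


def fill_from_neighbors(s):
    """Fill NaN with the mean of the two previous rows (next two for rows 0 and 1)."""
    prev_mean = (s.shift(1) + s.shift(2)) / 2    # mean of rows i-1 and i-2
    next_mean = (s.shift(-1) + s.shift(-2)) / 2  # mean of rows i+1 and i+2
    # The first two positions have no two previous rows
    is_head = pd.Series(np.arange(len(s)) < 2, index=s.index)
    candidate = next_mean.where(is_head, prev_mean)
    # fillna only touches the positions that are actually missing
    return s.fillna(candidate)


for col in ["QTDVENDADIARIA", "PRECOMEDIODIARIO"]:
    df[col] = fill_from_neighbors(df[col])

print(df)
```

With this sample, the two missing quantities become (40+27)/2 = 33.5 and (68+54)/2 = 61.0, and the missing prices become 2.38.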
