Repetition of values within a Dataframe

I’m conducting a data analysis on a dataset which displays temperature data (from January to December) over the years.

After importing the dataset, I found that it contains some invalid temperature values of 999.90. How can I replace these values with the average for the corresponding month?

1 answer


You can first replace the value 999.90 with np.nan and then use pandas.DataFrame.fillna.

In [1]: import numpy as np                                                                                                                            

In [2]: import pandas as pd                                                                                                                           

In [3]: df = pd.DataFrame([[10,2,4,5], [20,7,8,9], [30, 999.9, 40, 50]], columns=list('ABCD'))  

In [4]: df                                                                                                                                            
Out[4]: 
    A      B   C   D
0  10    2.0   4   5
1  20    7.0   8   9
2  30  999.9  40  50

In [5]: df = df.replace(999.90, np.nan)    

In [6]: df                                                                                                                                            
Out[6]: 
    A    B   C   D
0  10  2.0   4   5
1  20  7.0   8   9
2  30  NaN  40  50


In [7]: df.fillna(df.mean())                                                                                                                          
Out[7]: 
    A    B   C   D
0  10  2.0   4   5
1  20  7.0   8   9
2  30  4.5  40  50

In [8]: df                                                                                                                                            
Out[8]: 
    A    B   C   D
0  10  2.0   4   5
1  20  7.0   8   9
2  30  NaN  40  50
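
Note that Out[8] still shows NaN because fillna returns a new DataFrame and leaves df unchanged unless the result is assigned back. As a minimal sketch of the whole workflow, assuming a hypothetical layout with one row per year and one column per month (the names temps, JAN, FEB and MAR and all values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per year, one column per month (three months shown).
temps = pd.DataFrame(
    {
        "JAN": [25.0, 26.0, 999.9],
        "FEB": [27.0, 999.9, 29.0],
        "MAR": [24.0, 25.0, 26.0],
    },
    index=[2001, 2002, 2003],
)

# Mark the 999.9 sentinel as missing so it does not distort the means.
temps = temps.replace(999.9, np.nan)

# DataFrame.mean() skips NaN by default and returns one mean per column (month).
monthly_means = temps.mean()

# fillna returns a new DataFrame, so assign the result back to keep it.
temps = temps.fillna(monthly_means)

print(temps)
```

Passing a Series to fillna fills each column with the value whose label matches that column, so every missing month gets the mean of that same month across the years.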
  • I got it. The laptop I was using was acting up! I replaced the 999.90 values with NaN and then filled the NaNs with the column means using df_rec2_nan.fillna(df_rec2_nan.mean(0), inplace=True), which gave the same values as drec4s's solution. Thank you!
