Python: fill missing values with data from other rows


Good morning. I have a serious missing-data problem in my DataFrame. I need to fill each missing CO2 value with CO2 values from other times that had similar conditions, and I can't do that using only the information in that row. My DataFrame covers one year, with one record every 30 minutes. Temperature and radiation have no missing values; only CO2 has gaps.

import numpy as np
import pandas as pd

df = pd.read_hdf('./dados.hd5')

df.head()

Year_DoY_Hour          Temperatura    radiacao    CO2
2016-01-01 00:00:00    22.44570       0           380
2016-01-01 00:30:00    22.445700      0           390
...
2016-01-15 00:00:00    22.88300       0           379
2016-01-15 00:30:00    22.445700      0           381
2016-01-15 01:00:00    22.388300      0           NaN
...
2016-01-30 00:00:00    22.400000      0           350
2016-01-30 00:30:00    16.393900      0           375
2016-01-30 01:00:00    17.133900      0           365
  • (a) Temperature must be within ±2.5 °C of the temperature at the missing timestamp;
  • (b) Radiation must be within ±50 W/m² of the radiation at the missing timestamp;
  • The search window is ±3 days around the timestamp where CO2 is NaN;
  • Average the CO2 values of the rows that satisfy (a) and (b), and put that average where the CO2 value is missing.
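The rules above can be sketched as a loop over the NaN timestamps. This is a minimal sketch, not a tested solution for the real file: it assumes a unique DatetimeIndex (Year_DoY_Hour) and the column names from the table, treats the tolerances as inclusive (<=), and the function name fill_co2_from_similar and the toy data are hypothetical. Donor values are always read from the original data, so a gap filled earlier is never reused to fill a later one.

```python
import numpy as np
import pandas as pd


def fill_co2_from_similar(df, temp_tol=2.5, rad_tol=50.0, days=3):
    """Fill NaN CO2 values with the mean CO2 of 'similar' rows.

    A row is similar when it lies within +/- `days` of the gap and its
    temperature and radiation differ by at most temp_tol and rad_tol.
    Assumes a unique DatetimeIndex and the question's column names.
    """
    df = df.sort_index()             # label slicing below needs a sorted index
    out = df.copy()                  # results go here; donors come from df only
    window = pd.Timedelta(days=days)
    for t in df[df["CO2"].isna()].index:
        t0 = df.at[t, "Temperatura"]
        r0 = df.at[t, "radiacao"]
        win = df.loc[t - window : t + window]  # inclusive on both ends
        similar = (
            ((win["Temperatura"] - t0).abs() <= temp_tol)
            & ((win["radiacao"] - r0).abs() <= rad_tol)
            & win["CO2"].notna()
        )
        if similar.any():
            out.at[t, "CO2"] = win.loc[similar, "CO2"].mean()
        # otherwise the gap stays NaN (no comparable rows in the window)
    return out


# Hypothetical toy data, not the asker's file
toy = pd.DataFrame(
    {
        "Temperatura": [22.0, 23.0, 30.0, 22.4, 21.0, 22.4],
        "radiacao": [0, 0, 0, 0, 100, 0],
        "CO2": [370.0, 380.0, 999.0, np.nan, 500.0, 1.0],
    },
    index=pd.to_datetime([
        "2016-01-12 01:00", "2016-01-14 01:00", "2016-01-15 00:30",
        "2016-01-15 01:00", "2016-01-16 01:00", "2016-01-20 01:00",
    ]),
)
filled = fill_co2_from_similar(toy)
print(filled.at[pd.Timestamp("2016-01-15 01:00"), "CO2"])  # 375.0
```

In the toy data only the rows from 2016-01-12 and 2016-01-14 pass both tolerances inside the window (30.0 °C fails (a), 100 W/m² fails (b), and 2016-01-20 is outside ±3 days), so the gap gets the mean of 370 and 380. If no row qualifies, the value stays NaN and could then be handled by interpolation.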

In the DataFrame above, CO2 is NaN at 2016-01-15 01:00:00, and I can't figure out how to use the temperature and radiation to fill in that CO2 value. I believe it can be done with conditions, but I haven't managed to get it working.

  • Hello! Your question isn't very clear. Do you want to read your file and handle the NaN values you find when reading it?

  • Hello! That's right: when it finds a NaN in CO2, it should take the temperature and radiation at that same time and search the previous 3 days and the next 3 days for similar values.

2 answers


Lucas, this process is called interpolation.

Since your data is in a DataFrame, take a look at the pandas documentation, in particular the section on working with missing data.

Following the docs, try running:

df['CO2'].interpolate()
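A small illustration of what the command above does, using made-up half-hourly values: the default method is linear, and interpolate() returns a new Series rather than modifying the DataFrame, so the result has to be assigned back.

```python
import numpy as np
import pandas as pd

# Made-up half-hourly CO2 values with one gap
idx = pd.date_range("2016-01-15 00:00", periods=4, freq="30min")
co2 = pd.Series([379.0, 381.0, np.nan, 385.0], index=idx, name="CO2")

# interpolate() returns a new Series; assign it back to keep the result,
# e.g. df['CO2'] = df['CO2'].interpolate()
filled = co2.interpolate()  # default method='linear'
print(filled.iloc[2])       # 383.0, halfway between 381 and 385
```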

You can also define which interpolation method to use:

method : {'linear', 'time', 'index', 'values', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh', 'polynomial', 'spline', 'piecewise_polynomial', 'from_derivatives', 'pchip', 'akima'}

For example:

df['CO2'].interpolate(method='linear')
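The choice of method matters when rows are missing entirely: 'linear' treats consecutive rows as equally spaced, while 'time' (which requires a DatetimeIndex, like Year_DoY_Hour) weights by the actual time gaps. A minimal comparison with made-up values:

```python
import numpy as np
import pandas as pd

# Uneven timestamps: the 01:00 and 01:30 rows are missing entirely (made-up values)
idx = pd.to_datetime(["2016-01-15 00:00", "2016-01-15 00:30", "2016-01-15 02:00"])
co2 = pd.Series([380.0, np.nan, 386.0], index=idx)

linear = co2.interpolate(method="linear")  # treats rows as equally spaced
by_time = co2.interpolate(method="time")   # weights by the real time distance
print(linear.iloc[1], by_time.iloc[1])     # 383.0 381.5
```

Here 00:30 is 30 of the 120 minutes between the known points, so 'time' gives 380 + 6 × 0.25 = 381.5 instead of the midpoint.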

You can also combine interpolation with conditions so that values are only filled under certain circumstances.
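The answer doesn't say which conditions it means. As one illustration (this is not the asker's ±2.5 °C / ±50 W/m² rule), interpolate itself accepts limit and limit_area, which control which gaps get filled:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0, np.nan])

# limit=1: fill at most one NaN in each run of consecutive NaNs
short_only = s.interpolate(limit=1)
# limit_area="inside": fill only NaNs with valid values on both sides
inside_only = s.interpolate(limit_area="inside")
print(inside_only.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, nan]
```

Note that limit counts from the start of each gap, so a long gap is partially filled (index 1 becomes 2.0 while indices 2 and 3 stay NaN) rather than skipped.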

  • I've already tried this method, but now I need to apply the conditions I described above.



import pandas as pd

# Index of the rows where CO2 is NaN
nan_index = df[df['CO2'].isnull()].index
# For every NaN
for idx in nan_index:
    # Value of the other column you want to match on ('coluna' = placeholder column)
    dado_nan = df.at[idx, 'coluna']
    # Replace with the mean CO2 of the rows whose 'coluna' is within the desired range
    faixa = (df['coluna'] - dado_nan).abs() < 2.5
    df.loc[idx, 'novaColuna'] = df.loc[faixa, 'CO2'].mean()

  • Thank you very much, it worked.
