Complete values in a table, with values of the table itself?

Asked

Viewed 83 times

0

I have the table

    data        ativo   valor     ajuste       arquivo
17/07/2020  teste1  4,35       nan       9032800000190015197.txt
16/07/2020  teste1  nan        nan       9032800000190015197.txt
15/07/2020  teste1  3,12       nan       9032800000190015197.txt
14/07/2020  teste1  nan        nan       9032800000190015197.txt
13/07/2020  teste1  nan       6367.55    9032800000190015197.txt
17/07/2020  teste2  3,7     -1462.825    9032800000190015197.txt
16/07/2020  teste2  nan        nan       9032800000190015197.txt
15/07/2020  teste2  nan        nan       9032800000190015197.txt
14/07/2020  teste2  nan      -235.825    9032800000190015197.txt
13/07/2020  teste2  nan        nan       9032800000190015197.txt
17/07/2020  teste3  5          nan       9032800000190015197.txt
16/07/2020  teste3  nan        nan       9032800000190015197.txt
15/07/2020  teste3  6,7        nan       9032800000190015197.txt
14/07/2020  teste3  nan        nan       9032800000190015197.txt
13/07/2020  teste3  nan        nan       9032800000190015197.txt

and I need to get the following result

   date        ativo   valor     ajuste       arquivo
17/07/2020  teste1  4,35       nan       9032800000190015197.txt
16/07/2020  teste1  4,35       nan       9032800000190015197.txt
15/07/2020  teste1  3,12       nan       9032800000190015197.txt
14/07/2020  teste1  3,12       nan       9032800000190015197.txt
13/07/2020  teste1  3,12      6367.55    9032800000190015197.txt
17/07/2020  teste2  3,7     -1462.825    9032800000190015197.txt
16/07/2020  teste2  3,7        nan       9032800000190015197.txt
15/07/2020  teste2  3,7        nan       9032800000190015197.txt
14/07/2020  teste2  3,7      -235.825    9032800000190015197.txt
13/07/2020  teste2  3,7        nan       9032800000190015197.txt
17/07/2020  teste3  5          nan       9032800000190015197.txt
16/07/2020  teste3  5          nan       9032800000190015197.txt
15/07/2020  teste3  6,7        nan       9032800000190015197.txt
14/07/2020  teste3  6,7        nan       9032800000190015197.txt
13/07/2020  teste3  6,7        nan       9032800000190015197.txt

I have values absent in the field value and I need to fill in the previous values of the same table, considering that the value must be of the same active and date lower. And the value is not the highest or the lowest, but the value of the previous day that has a non-zero value.

I couldn’t find a solution that met these conditions. Does anyone have any idea how to do it? Using pandas.DataFrame.backfill, couldn’t.

I tried SQL, and I couldn’t, I made this code

select a.dt_data, a.co_ativo,
nullif(a.vr_valor, (select b.vr_valor from tabela b 
where b.co_ativo = a.co_ativo and b.dt_data between '2020-07-01' and '2020-07-31' and b.vr_valor is not null order by b.dt_data desc limit 1))
from tabela a
where a.dt_data between '2020-07-01' and '2020-07-31'
order by a.dt_data desc 

unsuccessful.

1 answer

0


It is possible to make this substitution by grouping by active with the function groupby together with the ffill, in this way:

df['data'] = pd.to_datetime(df['data'], dayfirst=True)
df.sort_values(['ativo', 'data'], ascending = [True,False], inplace = True)

df['valor'] = df.groupby(['ativo'])['valor'].ffill()
#ou df = df.assign(valor = df.groupby(['ativo'])['valor'].ffill())
#resultado
    data        ativo   valor
0   2020-07-17  teste1  4,35
1   2020-07-16  teste1  4,35
2   2020-07-15  teste1  3,12
3   2020-07-14  teste1  3,12
4   2020-07-13  teste1  3,12
5   2020-07-17  teste2  3,7
6   2020-07-16  teste2  3,7
7   2020-07-15  teste2  3,7
8   2020-07-14  teste2  3,7
9   2020-07-13  teste2  3,7
10  2020-07-17  teste3  5
11  2020-07-16  teste3  5
12  2020-07-15  teste3  6,7
13  2020-07-14  teste3  6,7
14  2020-07-13  teste3  6,7

Edited

Using the new data sampling, the code still gives the expected output:

    s = io.StringIO("""    data        ativo   valor     ajuste       arquivo
17/07/2020  teste1  4,35       nan       9032800000190015197.txt
16/07/2020  teste1  nan        nan       9032800000190015197.txt
15/07/2020  teste1  3,12       nan       9032800000190015197.txt
14/07/2020  teste1  nan        nan       9032800000190015197.txt
13/07/2020  teste1  nan       6367.55    9032800000190015197.txt
17/07/2020  teste2  3,7     -1462.825    9032800000190015197.txt
16/07/2020  teste2  nan        nan       9032800000190015197.txt
15/07/2020  teste2  nan        nan       9032800000190015197.txt
14/07/2020  teste2  nan      -235.825    9032800000190015197.txt
13/07/2020  teste2  nan        nan       9032800000190015197.txt
17/07/2020  teste3  5          nan       9032800000190015197.txt
16/07/2020  teste3  nan        nan       9032800000190015197.txt
15/07/2020  teste3  6,7        nan       9032800000190015197.txt
14/07/2020  teste3  nan        nan       9032800000190015197.txt
13/07/2020  teste3  nan        nan       9032800000190015197.txt
    """)

df = pd.read_csv(s, sep="\s\s+")

df['data'] = pd.to_datetime(df['data'], dayfirst=True)
df.sort_values(['ativo', 'data'], ascending = [True,False], inplace = True)
df['valor'] = df.groupby(['ativo'])['valor'].ffill()
df.head(20)

    data        ativo   valor   ajuste      arquivo
0   2020-07-17  teste1  4,35    NaN         9032800000190015197.txt
1   2020-07-16  teste1  4,35    NaN         9032800000190015197.txt
2   2020-07-15  teste1  3,12    NaN         9032800000190015197.txt
3   2020-07-14  teste1  3,12    NaN         9032800000190015197.txt
4   2020-07-13  teste1  3,12    6367.550    9032800000190015197.txt
5   2020-07-17  teste2  3,7     -1462.825   9032800000190015197.txt
6   2020-07-16  teste2  3,7     NaN         9032800000190015197.txt
7   2020-07-15  teste2  3,7     NaN         9032800000190015197.txt
8   2020-07-14  teste2  3,7     -235.825    9032800000190015197.txt
9   2020-07-13  teste2  3,7     NaN         9032800000190015197.txt
10  2020-07-17  teste3  5       NaN         9032800000190015197.txt
11  2020-07-16  teste3  5       NaN         9032800000190015197.txt
12  2020-07-15  teste3  6,7     NaN         9032800000190015197.txt
13  2020-07-14  teste3  6,7     NaN         9032800000190015197.txt
14  2020-07-13  teste3  6,7     NaN         9032800000190015197.txt
  • Thank you, Terry, for your contribution. Playing the code in my data frame, did not execute the replacements this is due to the fact that I have more columns in the file?

  • @Kabelo Having more columns shouldn’t be a problem. In your DF the column date was already in the format of Date? 'Cause if she was already this could go wrong

  • I also added another option using assign to return the filled values to the column valor. If you can test :)

  • If the problem persists, I will ask you to modify your available test data in the question so that I can reproduce on my machine what is happening to you. Using the data you have provided so far, my output is equal to the way you expected it to be correct

  • I inserted the columns I have in the file with the same data types. This alternative did not work.

  • @Kabelo For me the code keeps working, giving the expected output.

  • 1

    Terry, it worked, man!! Re-start the kernel here and rolled!! Thank you so much!!!

Show 2 more comments

Browser other questions tagged

You are not signed in. Login or sign up in order to post.