Apply dynamic filter to a dynamic dataframe in Python Pandas

1

TL;DR

Does anyone know how to apply a filter that will sum the values of a column of a dynamic table?

The problem

I'll show you the DataFrame first, since that makes it easier to explain.

import pandas as pd

dados = [["Cidade_a", 2, '-'],
         ["Cidade_b", 5, 7],
         ["Cidade_c", 'X', 9]]

df = pd.DataFrame(dados, columns=['Nome', 'var_1', 'var_2'])

Which generates the following DataFrame:

Nome       var_1 var_2
Cidade_a    2    '-'
Cidade_b    5     7
Cidade_c   'X'    9

I need to add a total row below this data, ignoring the values "X" and "-". That is, the sum for var_1 would be 7 and the sum for var_2 would be 16.

If that were all, it would not be a problem, but I need to do the same with other tables: there are more than 10 of them and the number may grow in the next few days. Each table also has a different number of columns.

What I’ve already tried

I dropped the Nome column and tried to build a dictionary dynamically to hold the totals. But when I change which columns Pandas should check, it raises an error saying it could not find a column named coluna, instead of looking up the columns var_1 and var_2. This is the code I used. Based on it, I would write a function to apply to all the tables I work with.

valor = dict()

for coluna in colunas:
    
    valor[coluna] = tabela[(tabela.coluna != 'X') & (tabela.coluna != '-')][coluna].sum()

With that result, all that remained was to append it to the DataFrame I was working on with:

df[len(df)] = valor

I have thought about doing this in the spreadsheet and then "cutting out" the columns I want with the totals already included, but then I would have to add more than 100 lines of code for each column in the complete spreadsheet.

Suggestions are welcome.

2 answers

3


I don’t know if this is exactly what you want, but see if it helps you.

First, here is a function that creates another DataFrame with the totals:

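def total(tabela):
    # Label for the totals row
    valor = {"Nome": "Total"}

    for coluna in tabela.columns:
        if coluna != "Nome":
            # Keep only rows whose value is neither 'X' nor '-', then sum them
            valor[coluna] = tabela[(tabela[coluna] != 'X') & (tabela[coluna] != '-')][coluna].sum()

    # One-row DataFrame, indexed right after the last row of the table
    valor = pd.DataFrame(valor, index=[len(tabela)])

    return valor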
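
It may help to see why the code in the question failed while the bracket lookup works. The following is only a small sketch using the question's sample data; the names por_colchetes and erro are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    [["Cidade_a", 2, "-"], ["Cidade_b", 5, 7], ["Cidade_c", "X", 9]],
    columns=["Nome", "var_1", "var_2"],
)

coluna = "var_1"

# Bracket access evaluates the variable, so this selects the column "var_1".
por_colchetes = df[coluna]

# Attribute access uses the literal name after the dot, so pandas looks for
# a column called "coluna" and raises AttributeError because none exists.
try:
    df.coluna
    erro = None
except AttributeError as exc:
    erro = exc
```

So whenever the column name is stored in a variable, as in a loop over the columns, the bracket form is the one to use.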

Then you concatenate this new dataframe with the old one:

# Use a different name so the function total() is not overwritten
linha_total = total(df)
df = pd.concat([df, linha_total])

Then you'll get output like this:

   Nome      var_1  var_2
0  Cidade_a  2      "-"
1  Cidade_b  5      7
2  Cidade_c  "X"    9
3  Total     7      16
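
Since the question mentions more than ten tables with different numbers of columns, the same idea could be applied in a loop. This is only a sketch under some assumptions: the tables are kept in a dict called tabelas, the second table is invented sample data, and the helper linha_de_total with its ignorar and rotulo parameters is a hypothetical variation of the function above, not something from the question:

```python
import pandas as pd

def linha_de_total(tabela, ignorar=("X", "-"), rotulo="Nome"):
    """Return a one-row DataFrame with the column sums, skipping marker values."""
    valor = {rotulo: "Total"}
    for coluna in tabela.columns:
        if coluna != rotulo:
            # Drop the rows holding a marker value before summing the column.
            validos = tabela.loc[~tabela[coluna].isin(ignorar), coluna]
            valor[coluna] = validos.sum()
    return pd.DataFrame(valor, index=[len(tabela)])

# Hypothetical container for the tables; each may have different columns.
tabelas = {
    "tabela_1": pd.DataFrame(
        [["Cidade_a", 2, "-"], ["Cidade_b", 5, 7], ["Cidade_c", "X", 9]],
        columns=["Nome", "var_1", "var_2"],
    ),
    "tabela_2": pd.DataFrame(
        [["Cidade_d", 1, 4, "X"], ["Cidade_e", "-", 6, 3]],
        columns=["Nome", "var_1", "var_2", "var_3"],
    ),
}

# Build a new dict where every table has its totals row appended.
com_totais = {
    nome: pd.concat([tabela, linha_de_total(tabela)])
    for nome, tabela in tabelas.items()
}
```

Because the function reads the column names from each table, nothing has to change when a new table or a new column shows up; a new marker value only needs to be added to ignorar.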

1

import numpy as np
df.loc[len(df)] = np.insert('Total', 1, df.drop(columns='Nome').replace('X|-', 0, regex=True).sum(0))
  • Explain your answer better.

  • Is that a request?

  • This answer worked as well as the other one. I would have a little more difficulty if I had to make any changes to the code, but it worked.

  • This code makes it easier to handle exceptional values, which seems to be your main problem. In the other answer, you would have to add a new condition (tabela[coluna] != '<valor>') every time a different non-integer value appears. In this one, you just add it to the regex: 'X|-|<valor>'.
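
If the set of exceptional values is not known in advance, a different technique from the two answers could avoid listing them at all: pd.to_numeric with errors="coerce" turns anything non-numeric into NaN, and sum() skips NaN by default. A minimal sketch with the question's sample data, where the names numeros and totais are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    [["Cidade_a", 2, "-"], ["Cidade_b", 5, 7], ["Cidade_c", "X", 9]],
    columns=["Nome", "var_1", "var_2"],
)

# Turn every non-numeric entry into NaN; sum() skips NaN by default.
numeros = df.drop(columns="Nome").apply(pd.to_numeric, errors="coerce")
totais = numeros.sum()

# Append the totals as a new row labelled "Total".
# Note: the coerced columns are floats, so the totals come out as floats.
df.loc[len(df)] = ["Total", *totais]
```

The trade-off is that this silently ignores any non-numeric value, including typos, so it fits best when every such value really is meant to be skipped.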
