TL;DR
Does anyone know how to sum the values of a column in a dynamic table while filtering out certain values?
The problem
I'll show you the DataFrame first, since that makes it easier to explain.
import pandas as pd
dados = [["Cidade_a", 2, '-'],
         ["Cidade_b", 5, 7],
         ["Cidade_c", 'X', 9]]
df = pd.DataFrame(dados, columns=['Nome', 'var_1', 'var_2'])
This generates the following DataFrame:
Nome      var_1  var_2
Cidade_a  2      -
Cidade_b  5      7
Cidade_c  X      9
I need to add a total row below this data, ignoring the values "X" and "-". That is, the sum for var_1 would be 7 and the sum for var_2 would be 16.
If that were all, it would not be a problem. However, I need to do the same task on other tables: there are more than 10 of them, and that number may grow in the next few days. Each table also has a different number of columns.
What I’ve already tried
I dropped the Nome column and tried to build a dictionary dynamically to hold the sums. But when I vary which column Pandas should check, it raises an error saying it could not find a column named coluna, instead of looking up the columns var_1 and var_2. This is the code I used. Based on it, I would write a function to apply to all the tables I work with.
valor = dict()
for coluna in colunas:
    valor[coluna] = tabela[(tabela.coluna != 'X') & (tabela.coluna != '-')][coluna].sum()
With that result, all that was left was to add it to the DataFrame I was working with:
df[len(df)] = valor
I have thought about doing this in the spreadsheet and then "cutting out" the columns I want with the totals already included, but then I would have to add more than 100 lines of code for each column in the full spreadsheet.
Suggestions are welcome.
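For reference, here is a minimal sketch of one way the loop above could be made to work. It is not taken from any of the answers. The error described comes from `tabela.coluna`, which looks for a column literally named "coluna"; bracket indexing, `tabela[coluna]`, uses the value of the loop variable instead. The sketch also swaps the explicit `!= 'X'` and `!= '-'` conditions for `pd.to_numeric(..., errors="coerce")`, which turns every non-numeric entry into NaN so that `sum()` skips it. The function name `somar_colunas` and its `coluna_rotulo` parameter are hypothetical, introduced only for illustration.

```python
import pandas as pd

def somar_colunas(tabela, coluna_rotulo="Nome"):
    """Return a copy of `tabela` with a 'Total' row appended (hypothetical helper)."""
    # Every column except the label column is summed, whatever the table's width.
    colunas = [c for c in tabela.columns if c != coluna_rotulo]
    valor = {coluna_rotulo: "Total"}
    for coluna in colunas:
        # tabela[coluna] uses the loop variable's value;
        # tabela.coluna would look for a column literally named "coluna".
        numeros = pd.to_numeric(tabela[coluna], errors="coerce")  # 'X', '-' -> NaN
        valor[coluna] = numeros.sum()  # NaN entries are skipped by default
    # Append the totals as a new row rather than a new column.
    return pd.concat([tabela, pd.DataFrame([valor])], ignore_index=True)

dados = [["Cidade_a", 2, "-"],
         ["Cidade_b", 5, 7],
         ["Cidade_c", "X", 9]]
df = pd.DataFrame(dados, columns=["Nome", "var_1", "var_2"])
resultado = somar_colunas(df)
print(resultado)
```

Since the function only assumes a label column, the same call could be repeated over a list or dict of tables. Note that coercion silently ignores any non-numeric value, not just "X" and "-", which may or may not be what you want.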
Explain your answer better.
– Edu Mendonça
Is that a request?
– Guilherme Brügger
Your answer worked as well as the other one; however, I would have a bit more trouble if I had to change the code. But it worked.
– R. C. Junior
This code favours the handling of exceptional values, which seems to be your biggest problem. In the other answer, you would have to add a new condition (tabela[coluna] != '<valor>') every time a different non-integer value appears. In this one, you just add it to the regex 'X|-|<valor>'.
– Guilherme Brügger
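The answer's code is not reproduced in this thread, so the following is only a sketch of the regex idea described in the comment above, under the assumption that the placeholder values are matched as whole cells. The names `IGNORAR` and `soma_ignorando` are hypothetical. Using `str.fullmatch` rather than a substring search means a cell such as "-3" would not be mistaken for the "-" placeholder.

```python
import pandas as pd

# Hypothetical pattern: extend it with "|<valor>" when a new placeholder shows up.
IGNORAR = r"X|-"

def soma_ignorando(serie, padrao=IGNORAR):
    """Sum a column, dropping cells that fully match the placeholder pattern."""
    # Cast to str so numbers and placeholders can be tested by the same regex.
    marcador = serie.astype(str).str.fullmatch(padrao)
    # Whatever is left is expected to be numeric; to_numeric raises otherwise.
    return pd.to_numeric(serie[~marcador]).sum()

tabela = pd.DataFrame(
    [["Cidade_a", 2, "-"], ["Cidade_b", 5, 7], ["Cidade_c", "X", 9]],
    columns=["Nome", "var_1", "var_2"],
)
print(soma_ignorando(tabela["var_1"]), soma_ignorando(tabela["var_2"]))
```

With this shape, a placeholder that is not yet in the pattern causes `pd.to_numeric` to raise an error instead of being ignored silently, so new exceptional values get noticed when they appear.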