Filter a header inside df in pandas

Asked

Viewed 66 times

0

I’m new to pandas and I’m having a hard time. I need to filter the columns in my dataframe, check if they are in a defined list and, if they are, group the columns and add up the values. It sounds simple, but I’m having a really hard time. The first image was how I tried to do it, but it went wrong. The second is my dataframe and the third is my list that I need to filter and return the sum of the columns within this list. Thanks in advance! inserir a descrição da imagem aqui

inserir a descrição da imagem aqui

inserir a descrição da imagem aqui

import pandas as pd
df = pd.read_csv(patch + 'PARTNER_23022021002403838.csv',sep=',',encoding='utf-8')
df = df.pivot_table('Value',df.index,['Flow','Country'])
df.columns = df.columns.droplevel(0)
grupos = {'Mundo' : ['Australia', 'Austria', 'Belgium', 'Canada', 'Chile', 'Czech Republic',
      'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece',
      'Hungary', 'Iceland', 'Ireland', 'Israel', 'Italy', 'Japan', 'Korea',
      'Luxembourg', 'Mexico', 'Netherlands', 'New Zealand', 'Norway',
      'Poland', 'Portugal', 'Slovak Republic', 'Slovenia', 'Spain', 'Sweden',
      'Switzerland', 'Turkey', 'United Kingdom', 'United States', 'Australia',
      'Austria', 'Belgium', 'Canada', 'Chile', 'Czech Republic', 'Denmark',
      'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary',
      'Iceland', 'Ireland', 'Israel', 'Italy', 'Japan', 'Korea', 'Luxembourg',
      'Mexico', 'Netherlands', 'New Zealand', 'Norway', 'Poland', 'Portugal',
      'Slovak Republic', 'Slovenia', 'Spain', 'Sweden', 'Switzerland',
      'Turkey', 'United Kingdom', 'United States'],
        'Desenvolvidos' : ['United Kingdom', 'United States', 'Canada', 'Japan', 'Norway', 'Switzerland' ],
        'America Latina' : ['Chile','Mexico']}
paises = []
df_fim = pd.DataFrame()
for item in df.columns:
   if item in grupos['Mundo']:
       paises.append(item)
       df_resultado = df.loc[:,paises].sum(axis=1)
       df_fim = pd.concat([df_fim,df_resultado],axis=0)
       df_fim.rename(columns = {'0':'B'},inplace=True)

   else:
       print('erro')

  • 2

    João Victor, instead of code images enter the code itself in the question, there is a block for indentation. Also put the dataset for testing if possible for people who will try to help you. Hug!

  • 1

    Thanks for the help, I’m new here

1 answer

0


As lmonferrari commented, having part of the data becomes easier for someone to help.
See if this helps:

First, I will generate a false data with 4 columns a, b, x, y with 5 lines.

colunas_A = ['a', 'b']
colunas_B = ['x', 'y']

df = pd.DataFrame()

for i, col in enumerate(colunas_A + colunas_B):
    n = (i+1)*100
    df[col] = range(n,n+5)

print(df)

In theory you already have the table and a list of columns,
that in my example would be the lists selected1 and selected2. And now the selection with the sum by columns:

selecao1 = ['a','x']
selecao2 = ['y','b']

soma_sel1 = df.filter(items=selecao1).sum(axis=1)
soma_sel2 = df.filter(items=selecao2).sum(axis=1)

df['soma_sel_1'] = soma_sel1
df['soma_sel_2'] = soma_sel2

print(df)

Browser other questions tagged

You are not signed in. Login or sign up in order to post.