Filter a header inside df in pandas

Question

Filter a header inside df in pandas

Asked 4 years, 4 months ago

Viewed 66 times

0

I’m new to pandas and I’m having a hard time. I need to filter the columns in my dataframe, check if they are in a defined list and, if they are, group the columns and add up the values. It sounds simple, but I’m having a really hard time. The first image was how I tried to do it, but it went wrong. The second is my dataframe and the third is my list that I need to filter and return the sum of the columns within this list. Thanks in advance!

import pandas as pd
df = pd.read_csv(patch + 'PARTNER_23022021002403838.csv',sep=',',encoding='utf-8')
df = df.pivot_table('Value',df.index,['Flow','Country'])
df.columns = df.columns.droplevel(0)
grupos = {'Mundo' : ['Australia', 'Austria', 'Belgium', 'Canada', 'Chile', 'Czech Republic',
      'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece',
      'Hungary', 'Iceland', 'Ireland', 'Israel', 'Italy', 'Japan', 'Korea',
      'Luxembourg', 'Mexico', 'Netherlands', 'New Zealand', 'Norway',
      'Poland', 'Portugal', 'Slovak Republic', 'Slovenia', 'Spain', 'Sweden',
      'Switzerland', 'Turkey', 'United Kingdom', 'United States', 'Australia',
      'Austria', 'Belgium', 'Canada', 'Chile', 'Czech Republic', 'Denmark',
      'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary',
      'Iceland', 'Ireland', 'Israel', 'Italy', 'Japan', 'Korea', 'Luxembourg',
      'Mexico', 'Netherlands', 'New Zealand', 'Norway', 'Poland', 'Portugal',
      'Slovak Republic', 'Slovenia', 'Spain', 'Sweden', 'Switzerland',
      'Turkey', 'United Kingdom', 'United States'],
        'Desenvolvidos' : ['United Kingdom', 'United States', 'Canada', 'Japan', 'Norway', 'Switzerland' ],
        'America Latina' : ['Chile','Mexico']}
paises = []
df_fim = pd.DataFrame()
for item in df.columns:
   if item in grupos['Mundo']:
       paises.append(item)
       df_resultado = df.loc[:,paises].sum(axis=1)
       df_fim = pd.concat([df_fim,df_resultado],axis=0)
       df_fim.rename(columns = {'0':'B'},inplace=True)

   else:
       print('erro')

2

João Victor, instead of code images enter the code itself in the question, there is a block for indentation. Also put the dataset for testing if possible for people who will try to help you. Hug!

– lmonferrari

2021/02/23 at 19:18
1

Thanks for the help, I’m new here

– Joao Victor

2021/02/23 at 22:07

1 answer

Browser other questions tagged python database list pandas

You are not signed in. Login or sign up in order to post.

by famaral42 • **126** points · Answer 1 · 2021-02-24T21:18:45+00:00

As lmonferrari commented, having part of the data becomes easier for someone to help.
See if this helps:

First, I will generate a false data with 4 columns a, b, x, y with 5 lines.

colunas_A = ['a', 'b']
colunas_B = ['x', 'y']

df = pd.DataFrame()

for i, col in enumerate(colunas_A + colunas_B):
    n = (i+1)*100
    df[col] = range(n,n+5)

print(df)

In theory you already have the table and a list of columns,
that in my example would be the lists selected1 and selected2. And now the selection with the sum by columns:

selecao1 = ['a','x']
selecao2 = ['y','b']

soma_sel1 = df.filter(items=selecao1).sum(axis=1)
soma_sel2 = df.filter(items=selecao2).sum(axis=1)

df['soma_sel_1'] = soma_sel1
df['soma_sel_2'] = soma_sel2

print(df)