Python: conditional sum with variable condition

Question

Python: conditional sum with variable condition

Asked 5 years, 2 months ago

Viewed 723 times

2

good morning.

The task is as follows, I am trying to make a 'a,b,c' curve of products per company. I have the following df as an example:

df = pd.DataFrame({"empresa":["HST", "HST", "HST", "HSC", "HSC","HSC","HSC"],
               "produto":["prod1", "prod2", "prod3", "prod4", "prod5", "prod6", "prod7"],
               "vl_total": [100,200,150,50,200,300,400]
               })

To make the abc curve, you need to create a column with the accumulated sum of the total value per company, and get the following value (vl_acm):

df = pd.DataFrame({"empresa":["HST", "HST", "HST", "HSC", "HSC","HSC","HSC"],
               "produto":["prod1", "prod2", "prod3", "prod4", "prod5", "prod6", "prod7"],
               "vl_total": [100,200,150,50,200,300,400],
               "vl_acm": [100,300,450,50,250,550,950]
               })

In Excel it is very easy to do this by applying the function sum up and putting the values of the column 'company' as conditional, but in Python I could not find a way to do it.

Thanks in advance guys.

Alive, what defines '...trying to make a turn...'?? Analyze this post I did once. https://answall.com/questions/444139/erro-utilizando-matplotlib/444339#444339 is this what you need? Set the example of the result you expect to get and the code you’ve already done to achieve it.

– Ernesto Casanova

2020/06/05 at 14:01
Hello Ernesno. I’m referring to the a,b,c curve of product value, is a widely used classification in logistics. This post is about graphics, I saw no relation with what I requested above. The result I hope to get is the 'vl_acm' column of the second dataframe I posted above. I made it manually to illustrate, what I’m trying to do is to generate a solution to calculate this column in Python. I didn’t post code because unfortunately I don’t even know where to start... thank you!

– Alvaro Baraldi

2020/06/05 at 14:28
Okay, I’m no expert in logistics, so I couldn’t infer that from your question, it wasn’t clear to anyone who wasn’t an expert. However, I think you should try to make some code and then share if there are problems, just so you can learn

– Ernesto Casanova

2020/06/05 at 14:38
Yeah, I’m doing my best. Unfortunately, with the knowledge I have today, the closest I could get was to create a column with the total accumulated value, using the code 'df['vl_acm'] = df['vl_total']. cumsum()', but this solution does not respect the 'company' column as a condition, as I need it to be done. Anyway, thanks for the comment.

– Alvaro Baraldi

2020/06/05 at 14:48

3 answers

Browser other questions tagged python pandas summing-up

You are not signed in. Login or sign up in order to post.

by afonso • **748** points · Answer 1 · 2020-06-05T17:05:14+00:00

You can perform this sum using the method cumsum after doing the groupby per company:

df = pd.DataFrame({"empresa":["HST", "HST", "HST", "HSC", "HSC","HSC","HSC"],
               "produto":["prod1", "prod2", "prod3", "prod4", "prod5", "prod6", "prod7"],
               "vl_total": [100,200,150,50,200,300,400]
               })
df['vl_acm'] = df.groupby('empresa')['vl_total'].cumsum()
df

Out[2]:
    empresa produto vl_total    vl_acm
0   HST     prod1   100         100
1   HST     prod2   200         300
2   HST     prod3   150         450
3   HSC     prod4   50          50
4   HSC     prod5   200         250
5   HSC     prod6   300         550
6   HSC     prod7   400         950

by JoaoP_L • 51 points · Answer 2 · 2020-06-05T16:42:53+00:00

Buddy, I’m a beginner, but I’m still gonna try to help you because I believe I’ve found a solution. Surely there are more efficient ways but to help you I thought as follows:

import pandas as pd
df = pd.DataFrame({"empresa":["HST", "HST", "HST", "HSC", "HSC","HSC","HSC"],
               "produto":["prod1", "prod2", "prod3", "prod4", "prod5", "prod6", "prod7"],
               "vl_total": [100,200,150,50,200,300,400]
               })
filtro = df["empresa", "vl_total"] # Aqui você consegue colocar na váriavel filtro apenas
                                   # as colunhas 'empresa' e 'vl_total'
vl_acm = [] # Cria uma nova lista que será a coluna que você quer atribuir ao df
valores_dos_indices = [] # Cria um vetor para colocar os indices certos para posteriormente
                        # serem colocados os valores nas linhas corretas
for i in range(filtro["empresa"]): # O valor 'i' passará por todas as empresas
    if filtro["empresa"][i] == 'HSC': # Se a empresa correspondente ao indice 'i' for igual ao valor 'HSC'
        vl_acm.append(filtro["vl_total"][i]) # O valor da empressa  correspondente ao indice 'i' será atribuida ao vetor criado 
        valores_dos_indices.append(i)

df["vl_acm"] = [] # Cria uma nova coluna no df, por enquanto vazia
for i in valores_dos_indices: # Percorre todos os indices já marcados anteriormente
    df['vl_acm'].append(vl_acm[i]) # Preenche a nova coluna com os valores correspondentes

Is that about what you wanted? If not, explain to me better so I can help you. Abs

by jfaccioni • **1,283** points · Answer 3 · 2020-06-05T18:26:02+00:00

You need to use pandas.groupby to group by company, and then cumsum to make the cumulative sum for each group (enterprise).

Knowing this is a matter of a line:

df = pd.DataFrame({
    "empresa":["HST", "HST", "HST", "HSC", "HSC","HSC","HSC"], 
    "produto":["prod1", "prod2", "prod3", "prod4", "prod5", "prod6", "prod7"], 
    "vl_total": [100,200,150,50,200,300,400]
})

df['vl_acm'] = df.groupby('empresa').cumsum()

Dataframe obtained at the end:

  empresa produto  vl_total  vl_acm
0     HST   prod1       100     100
1     HST   prod2       200     300
2     HST   prod3       150     450
3     HSC   prod4        50      50
4     HSC   prod5       200     250
5     HSC   prod6       300     550
6     HSC   prod7       400     950