Create column in dataframe that contains the list of row values except null values (Nan)

Question

Create column in dataframe that contains the list of row values except null values (Nan)

Asked 5 years, 8 months ago

Viewed 95 times

0

Code and file used: https://github.com/CaioEuzebio/Python-DataScience-MachineLearning/tree/master/SalesLogistics

I’m working on an analysis using pandas. Basically I need to sort orders by quantity of products, and that contain the same products.

Example: I have order 1 and order 2, both have product A and product B. Using as key the list of products and quantity of products I will create a pivot that will index this combination of products and will return me the order who own the same products.

The purpose of the analysis is to obtain a dataframe like the one below:

dfFinal 

listProds            Ordens  NumProds
[prod1,prod2,prod3]     1       3
                        2
                        3
[prod1,prod3,prod5]     7       3
                        15
                        25
[prod5]                 8       1
                        3

So far the code is like this.

Setting as index the column 'Order' for the first pivot to be made

df1.index=df1['Ordem']
df3 = df1.assign(col=df1.groupby(level=0).Produto.cumcount()).pivot(columns='col', values='Produto')

With this pivot get the dataframe below.

df3 = 

col 0   1   2   3   4   5   6   7   8   9   ... 54  55  56  57  58  59  60  61  62  63
Ordem                                                                                   
10911KD YIZ12FF-A   YIZ12FF-A   YIIE2FF-A   YIR72FF-A   YIR72FF-A   YIR72FF-A   NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
124636  HYY32ZY-A   HYY32ZY-A   NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1719KD5 YI742FF-A   YI742FF-A   YI742FF-A   YI742FF-A   NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
22215KD YI762FF-A   YI762FF-A   YI762FF-A   YI762FF-A   YI762FF-A   YI762FF-A   YI6E2FF-A   YI6E2FF-A   YI6E2FF-A   NaN ... NaN NaN NaN NaN NaN

When I finish running the code, Nan values appear, and I need to remove them from the lines, so it does not influence the analysis I’m doing.

1 answer

Browser other questions tagged python pandas

You are not signed in. Login or sign up in order to post.

by SakuraFreak • **319** points · Answer 1 · 2019-11-05T14:51:22+00:00

Be as simple as possible with the code. It will help you for both debugging and readability of the code, in case someone else needs to analyze. I recommend reading the pandas library, that there are several methods that inhibit verbose.

import pandas as pd    

# arquivos .csv não especificam a variável
df = pd.read_csv('Sales.csv', dtype={
        "Linha" : int,
        "Quantidade" : int,
        })
# transformar em datetime para facilitar análises de data
df['Data'] = pd.to_datetime(df['Data'], format='%d/%m/%Y')
df.groupby(['Ordem', 'Produto']).first()