Add new data in an empty pandas dataframe

Question

Asked 6 years, 8 months ago

I am writing code that reads several CSV files, extracts some parameters from them, and assembles a new dataframe with pandas, but I am running into a problem with this construction.

My initial idea was to create an empty dataframe and, as I read each CSV, add the rows and columns I need.

For example, let's say df starts out empty. After reading my first CSV and adding it to df, I have:

import pandas as pd
df = pd.DataFrame(data=[[0, 10, 11]], index=['Ana'], columns=['01/05/2017', '01/05/2018', '01/05/2019'])

          '01/05/2017' '01/05/2018' '01/05/2019'
'Ana'      0            10           11

After reading the second CSV, my df would be:

          '01/05/2017' '01/05/2018' '01/05/2019' '10/06/2009'
'Ana'      0            10           11           nan
'Joao'     5            11           nan          5

And so on, so that after reading all the CSVs I end up with a df as long and complete as I need.
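The incremental build described above can be sketched with `DataFrame.combine_first`, which aligns on both the row labels and the column labels: new names and dates are added, and an existing cell is only filled when it is still NaN. This is a sketch under assumptions: the CSV layout (name in the first column, one column per date) is hypothetical, and the in-memory strings read through `io.StringIO` stand in for the real files.

```python
import io

import pandas as pd

# Hypothetical CSV contents (assumed layout: first column = name, other columns = dates).
# 'Joao' is deliberately split across three files.
csv_texts = [
    "name,01/05/2017,01/05/2018,01/05/2019\nAna,0,10,11\n",
    "name,10/06/2009\nJoao,5\n",
    "name,01/05/2017\nJoao,5\n",
    "name,01/05/2018\nJoao,11\n",
]

df = pd.DataFrame()  # start empty, as in the question
for text in csv_texts:
    part = pd.read_csv(io.StringIO(text), index_col=0)
    # combine_first takes the union of rows and columns; cells that already
    # hold a value in df are kept, and only NaN cells are filled from `part`,
    # so a name spread over several files stays in a single row.
    df = df.combine_first(part)

print(df)
```

With this data the three partial 'Joao' files collapse into one 'Joao' row instead of three. If two files disagree on the same cell, `combine_first` keeps the value that was already in df.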

I tried building a separate dataframe for each CSV and appending them, but it didn't work out as I wanted. One reason is that if the data for 'Joao' happen to be spread across more than one CSV, the df ends up like this:

          '01/05/2017' '01/05/2018' '01/05/2019' '10/06/2009'
'Ana'      0            10           11           nan
'Joao'     nan          nan          nan          5
'Joao'     5            nan          nan          nan
'Joao'     nan          11           nan          nan

That is not the format I want.

Is there any way to compose the information as desired?

1 answer

by Leonardo Borges • **171** points · Answer 1 · 2018-11-19T04:51:03+00:00

It is simple to solve. Assuming you have read all your files and inserted all the rows into the dataframe, just use:

df.groupby(df.index).sum()
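The line above assumes every CSV has already been stacked into one dataframe. A minimal sketch of that step, assuming the same hypothetical layout (name in the first column, one column per date) and using in-memory strings in place of real files, reads each file into its own frame and calls `pd.concat` once; `pd.concat` takes the union of the columns and fills the gaps with NaN.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the CSV files (first column = name, other columns = dates).
csv_texts = [
    "name,01/05/2017,01/05/2018,01/05/2019\nAna,0,10,11\n",
    "name,10/06/2009\nJoao,5\n",
    "name,01/05/2017,01/05/2019\nJoao,5,6\n",
    "name,01/05/2018\nJoao,11\n",
]

# One frame per file, concatenated once; missing columns become NaN.
frames = [pd.read_csv(io.StringIO(text), index_col=0) for text in csv_texts]
stacked = pd.concat(frames)

# Collapse the repeated names into one row each, as in the answer.
df = stacked.groupby(stacked.index).sum()
print(df)
```

Collecting the frames in a list and concatenating once avoids copying the accumulated data on every iteration, which is what happens when a dataframe is grown inside the loop.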

Example

import pandas as pd
import numpy as np

data = np.array([[0,10,11,np.nan],
                [np.nan,np.nan,np.nan,5],
                [5,np.nan,6,np.nan],
                [np.nan,11,np.nan,np.nan]])

df = pd.DataFrame(data, columns=['01/05/2017','01/05/2018','01/05/2019','10/06/2009'], index=['Ana','Joao','Joao','Joao'])

Dataframe:

        01/05/2017  01/05/2018  01/05/2019  10/06/2009
Ana     0.0         10.0        11.0        NaN
Joao    NaN         NaN         NaN         5.0
Joao    5.0         NaN         6.0         NaN
Joao    NaN         11.0        NaN         NaN

Applying groupby:

df.groupby(df.index).sum()

Output:

        01/05/2017  01/05/2018  01/05/2019  10/06/2009
Ana     0.0         10.0        11.0        0.0
Joao    5.0         11.0        6.0         5.0
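Note that Ana's `10/06/2009` cell became 0.0 rather than NaN: by default `sum` returns 0 for a group whose values are all NaN. If the NaN should survive, as in the question's expected table, two options are `sum(min_count=1)` (a group needs at least one non-NaN value to produce a number) or `first()` (the first non-NaN value per column, with no addition at all). A sketch on the same example data:

```python
import numpy as np
import pandas as pd

data = np.array([[0, 10, 11, np.nan],
                 [np.nan, np.nan, np.nan, 5],
                 [5, np.nan, 6, np.nan],
                 [np.nan, 11, np.nan, np.nan]])
df = pd.DataFrame(data,
                  columns=['01/05/2017', '01/05/2018', '01/05/2019', '10/06/2009'],
                  index=['Ana', 'Joao', 'Joao', 'Joao'])

# min_count=1: an all-NaN group yields NaN instead of 0.
summed = df.groupby(df.index).sum(min_count=1)

# first(): keeps the first non-NaN value of each column within each group.
firsts = df.groupby(df.index).first()

print(summed)
print(firsts)
```

The two agree here because each 'Joao' cell has at most one non-NaN value; if several files held values for the same cell, `sum` would add them while `first` would keep only the first one.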