Problem concatenating CSV files

I’m trying to concatenate one CSV file with another. My goal is to scrape data from an HTML page every day. My routine should read a CSV file called 'old data', which holds a previously saved DataFrame. Each time it runs, it should build a new, updated DataFrame and concatenate it with the old one. It should then drop the repeated rows so that only the new ones are added, and save the result as the new 'old data' file, ready for the next day's run. I’m using:

#a.to_csv('dado_antigo.csv')
b = pd.read_csv('dado_antigo.csv', 
                index_col='Data',
                parse_dates= ['Data'])
# concatenated file
c = pd.concat((b,a))
aa, bb = np.unique(c, return_index=True)
c = c.ix[bb]
c = pd.read_csv('dado_antigo.csv')

And I get this error:

IndexError: indices are out-of-bounds

How could I fix it? Thank you.

1 answer

As of pandas 0.20.1, there is a method called pandas.DataFrame.drop_duplicates (described in the pandas documentation) that can help you.

For example:

import pandas as pd

df1 = pd.DataFrame(data=[['1', '2'], ['3', '4'], ['1', '2']], columns=['A', 'B'])
df2 = pd.DataFrame(data=[['5', '6'], ['7', '8'], ['1', '2']], columns=['A', 'B'])

# Stack the two frames vertically
res = pd.concat([df1, df2], axis=0)

# Drop fully repeated rows, then renumber the index from 0
res = res.drop_duplicates().reset_index(drop=True)

The result in res should contain what you need.

Note: .reset_index(drop=True) is not strictly necessary, but I strongly recommend it. Without it, the result keeps the row labels from both original frames, so the index contains repeated labels (0, 1, 0, 1 in this example), and that can cause problems depending on what you want to do later.
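Applied to the daily routine in the question, the same idea could look roughly like the sketch below. The function name update_csv and the 'valor' column used in the usage note are made up for illustration; the sketch also assumes the date column is called 'Data' (as in the question) and that the CSV is written without the index, so 'Data' is an ordinary column. That matters because drop_duplicates compares column values only and ignores the index.

```python
import pandas as pd


def update_csv(new_data, path, date_col="Data"):
    """Merge new_data into the CSV at `path`, keeping each row only once.

    Hypothetical helper: assumes the file was saved with index=False and
    that both frames share the same columns.
    """
    # Read the previously saved data, parsing the date column
    old = pd.read_csv(path, parse_dates=[date_col])

    # Make sure the new data uses real datetimes too, so equal dates compare equal
    new_data = new_data.assign(**{date_col: pd.to_datetime(new_data[date_col])})

    # Stack old and new rows, drop exact repeats, and keep chronological order
    combined = pd.concat([old, new_data], ignore_index=True)
    combined = combined.drop_duplicates().sort_values(date_col).reset_index(drop=True)

    # Overwrite the file so the next run starts from the updated data
    combined.to_csv(path, index=False)
    return combined
```

For instance, update_csv(a, 'dado_antigo.csv') would replace the concat / np.unique / .ix steps, where a is the freshly scraped DataFrame. Running it twice with the same a should leave the file unchanged, since the second run only adds rows that are already present.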

I hope I’ve helped.
