0
I’m reading a CSV file with pandas on a dataframe, it turns out on line 10, all the data is going to the first column, this way:
How can I solve this problem and separate correctly? I need only the first number to be in the ID column.
The line in the csv file is like this:
9009101002,"Apple iPhone XS Smartphone 256GB 4G Screen 5.8"""Front 12MP camera 7MP iOS 12 Golden", 32 934,102,401
And it was automatically generated by the system.
To read the file, I use the following line of code:
df = pd.read_csv("prods_tab.csv", encoding='latin-1', sep=',')
Note: When opening the file in notepad, it is presented as follows:
ID. Forn.,Prod. DESC.,SKU.,GRP. MERC. 3,COD. MARC.
,,,,
302100012,GELADEIRA FROST FREE INVERTER IB53X ELECTROLUX 454 LITROS INOX,100312-,996,302
302100012,GELADEIRA FROST FREE DB84 ELECTROLUX 598 LITROS BRANCO,89 721 ?,,
302100012,Frigobar Electrolux RE80 79 Litros Classe A 110V Branco,?1920-- 63,996,302
,,,,
ID. Forn.,Prod. DESC.,SKU.,GRP. MERC. 3,COD. MARC.
302100012,Geladeira Electrolux SS72X Side by Side Frost Free 504 Litros 2 Portas Classe A 127V Inox,18228 5,996,302
,,,,
ID. Forn.,Prod. DESC.,SKU.,GRP. MERC. 3,COD. MARC.
,,,,
"9009101002,""Smartphone Apple iPhone XS 256GB 4G Tela 5,8"""""""" Câmera 12MP Frontal 7MP iOS 12 Dourado"", 32 934,102,401"
9030121093,SMARTPHONE SAMSUNG GALAXY NOTE 8 N950F 64GB 2CHIPS PRETO,4??349 5,102,607
320621093,BONECA MULTIKIDS BUSH BABY WORLD SHIMMIES BR106,4342I,766,481
320621093,Brinquedo Kit de Voley Disney Princesas Líder 759 ,3 1---24-,766,481
9030121093,SMARTPHONE SAMSUNG GALAXY A8+ A730 64GB 2CHIPS DOURADO, 1 92501 ,,607
,,,,
Note 2: I have already treated the other lines, only this is missing.
has a comma there do not have? on line 10, was placed by you or generated by the system?
– Rafael Rotiroti
How is your csv structured (mainly line 10)? Please edit the question and add this information.
– AlexCiuffa
Reinforcing what @Alexciuffa said, add the top 10-12 lines of your csv to the question.
– Sidon
@Rafaelrotiroti was generated by the system.
– Lucas Almeida
@Alexciuffa edited, added the information.
– Lucas Almeida
@Sidon edited.
– Lucas Almeida
Can you put the line you use read_csv() on? , it is better to see the parameters passed.
– Rafael
@Rafael ready. :)
– Lucas Almeida
With the data you passed could not replicate the error here. I assembled a file
prods_tab.csv
with the header and the line passed, I calleddf = pd.read_csv("prods_tab.csv", encoding='latin-1', sep=',')
and it worked normally. Try to make the file available with a cut of the data so that it is possible to replicate this error. There must be something else in that CSV that’s slipping through your eyes.– Rafael Barros
I did tests here too, the problem really is the CSV, there is a separation pattern, there are hours TAB or some comma, the ideal is to use only one made this adaptation to TAB’s and it worked, this CSV is private or some testing base only?
– Rafael
@Rafael he is part of a test I’m doing for trainee in a company. It could help me better as I make the adaptation to TAB’s?
– Lucas Almeida
I only replace the separations by TAB’s but if your dataset is too big it doesn’t pay off if there are too many different separations. But anyway the first image with the data in columns is related to the dataset or just a representation of the system, I ask because in excel spreadsheet has a different representation.
– Rafael
@Rafael then, the dataset is small. I removed the first image and put as it is appearing the dataframe, that image was confused, but it was a copy of the dataframe presented in the notebook jupyter. Anyway, I don’t know how to fix this and I need to deliver :(
– Lucas Almeida
So you’re using excel, go to the location of that file, right click on it, and click edit, will open in the notepad, then you’ll see how the data really are, copy that same snippet of your example above and put the question.
– Rafael
@Rafael I did this and I realized that the line that is in trouble is in quotes... Does it have something to do?
– Lucas Almeida
Yes, whenever you are going to solve a data analysis problem open them with the notebook, already put a better answer.
– Rafael