Remove quotes in lines from csv file

2

Given a particular file iris.csv:

"sepal_length,sepal_width,petal_length,petal_width,species"
"5.1,3.5,1.4,0.2,setosa"
"4.9,3,1.4,0.2,setosa"
"4.7,3.2,1.3,0.2,setosa"
"4.6,3.1,1.5,0.2,setosa"

I tried to load the file with the following code:

import os
import numpy as np 
filename = os.path.join('iris.csv')
arquivo = np.loadtxt(filename, delimiter=',', usecols=(0,1,2,3), skiprows=1)

It returned the error: could not convert string to float: '"5.1'

I tried to remove the double quotes with the code below, but the error persists:

input_fd = open('iris.csv', 'r')
output_fd = open('saida.csv', 'w')
for line in input_fd.readlines():
    line = ','.join(['%s'%field.strip() for field in line.split(';')])+'\n'
    output_fd.write(line)
input_fd.close()
output_fd.close()
iris = open('saida.csv', 'r')

So, how can I automate removing the quotation marks from the lines of the .csv file?

  • If your file really looks like this, with quotes around the whole line, the file is malformed and no ready-made function will be able to read it. See the answer below.

3 answers

2


If your file really looks like this, with quotes around the entire line, the file is malformed and no ready-made function will be able to read it. A correct CSV file contains only the values, delimited by a separator, with optional quotes delimiting the content of each cell. That is, a line could look like this:

"5.1", "3.5" , "1.4", "0.2", "setosa"

But as it stands, programs that read this file will tend to interpret each line as a single cell:

"5.1,3.5,1.4,0.2,setosa"

The ideal solution is to fix whatever generates this file so that it no longer puts quotation marks around each line.

If that is not possible, you could write something in more advanced Python that "pretends" to be the file but removes the '"' from each line and hands the already-cleaned line to the readers. But since this kind of file should only be a temporary nuisance until the generator is fixed, the simplest approach is to use a few basic lines of Python to write a second, tidied file and then read that one with the normal methods.

The strip method of strings is enough to remove quotes at the beginning and end of a string. But remember that when reading lines, the last character is always "\n", so we include that character in the call to strip.

Note that this code will break a file that is correctly quoting each cell:

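def arruma_aspas(nome_do_arquivo):
    # Write a cleaned copy next to the original and return its path.
    with open(nome_do_arquivo) as entrada, open(nome_do_arquivo + ".tmp", "wt") as saida:
        for linha in entrada:
            # Strip the surrounding quotes and the newline, then restore the newline.
            saida.write(linha.strip('"\n') + '\n')

    return nome_do_arquivo + ".tmp"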
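
The "pretend to be the file" idea mentioned earlier can be sketched with a generator, since np.loadtxt accepts a generator of strings in place of a filename. This is only a minimal sketch: the helper name linhas_sem_aspas is made up for illustration, and an in-memory io.StringIO stands in for open('iris.csv') so the example runs without the real file.

```python
import io
import numpy as np

def linhas_sem_aspas(linhas):
    """Yield each line without the surrounding double quotes and newline.

    Hypothetical helper, not part of NumPy.
    """
    for linha in linhas:
        yield linha.strip('"\n')

# Stand-in for open('iris.csv'), with the same layout as in the question.
entrada = io.StringIO(
    '"sepal_length,sepal_width,petal_length,petal_width,species"\n'
    '"5.1,3.5,1.4,0.2,setosa"\n'
    '"4.9,3,1.4,0.2,setosa"\n'
)

# loadtxt consumes the cleaned lines one by one; no temporary file is needed.
arquivo = np.loadtxt(linhas_sem_aspas(entrada), delimiter=',',
                     usecols=(0, 1, 2, 3), skiprows=1)
print(arquivo)
```

With the real file, replacing entrada by open('iris.csv') inside a with block should behave the same way. Like the function above, this would damage a file that correctly quotes each individual cell.
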
  • In fact, the lines are like this: "'5.1','3.5','1.4','0.2','setosa", which makes Python interpret the first and last elements as strings. When using loaded_csv = np.genfromtxt('iris.csv', delimiter=','), the output is [nan 3.5 1.4 0.2 nan]

  • That is why I suggested using Pandas. The preprocessing code above can be used to remove the quotes as well, but especially if you need the string data from the last column, you will need Pandas anyway.

1

  • The output for the first line of loaded_csv: [nan 3.5 1.4 0.2 nan]

  • Yes. NumPy works with homogeneous data arrays: when you ask for numbers, anything that is not a number is replaced by the value "NaN" ("not a number"). If you want a table with mixed columns, some text and some numeric, you should use Pandas, as indicated. Its fundamental object, the DataFrame, which is created by the read_csv function, can have columns with different data types, like a spreadsheet.
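
As a rough illustration of the mixed-column point, the sketch below strips the outer quotes and hands the result to pandas.read_csv. It assumes pandas is installed and uses an inline string as a stand-in for the contents of iris.csv; the variable names bruto and limpo are made up for the example.

```python
import io
import pandas as pd

# Stand-in for the contents of iris.csv, with whole lines wrapped in quotes.
bruto = (
    '"sepal_length,sepal_width,petal_length,petal_width,species"\n'
    '"5.1,3.5,1.4,0.2,setosa"\n'
    '"4.9,3,1.4,0.2,setosa"\n'
)

# Remove the quotes around each line before pandas ever sees the text.
limpo = "\n".join(linha.strip('"') for linha in bruto.splitlines())

# read_csv infers a type per column: numeric for the measurements,
# text for the species column.
df = pd.read_csv(io.StringIO(limpo))
print(df.dtypes)
```

Unlike the NumPy array, the species column keeps its text values instead of turning into nan.
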

-1

You can use .replace("\"", "")

Example:

palavra = "\"teste\""
print(palavra)                    # prints "teste" (with the quotes)
print(palavra.replace("\"", ""))  # prints teste
  • replace will not work in this case, because the code calls NumPy to read the lines and NumPy processes them directly. The asker's program has no access to the strings between the moment they are read and the moment NumPy tries to decode the data in them. (Sorry for the downvote; the idea would be correct if it weren't for that.)

  • I understand your point: the object is not a string at the stage where the manipulation would have to happen. I also looked up what a downvote is, so I have learned two things.
