Very good solution, @Carloscinelli. An alternative is to use the iterators package. The change_dot() function below reads the file one chunk of lines at a time (a single line by default), replaces every ',' with '.', and writes the result to a text file.
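
    library(iterators)

    # Read `file` in chunks of `chunk` lines, replace ',' with '.',
    # and write the converted lines to `saida`.
    change_dot <- function(file, saida = 'teste.txt', chunk = 1) {
      con1 <- file(file, 'r')
      con2 <- file(saida, open = 'w')
      # Close both connections on exit so the output is flushed to disk
      on.exit({ close(con1); close(con2) })
      linha <- 0
      it <- ireadLines(con1, n = chunk)
      # nextElem() raises a 'StopIteration' error once the input is exhausted;
      # tryCatch() returns that error object, which ends the loop below
      out <- tryCatch(expr = write(x = gsub(pattern = ',', replacement = '.', x = nextElem(it)), con2),
                      error = function(e) e)
      while (!inherits(out, "error")) {
        linha <- linha + 1
        print(paste('Wrote chunk', linha))
        out <- tryCatch(expr = write(x = gsub(pattern = ',', replacement = '.', x = nextElem(it)), con2, append = TRUE),
                        error = function(e) e)
      }
    }
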
    system.time(change_dot(file = 'AC2012.txt', saida = 'saida.csv'))
       user  system elapsed
      48.65    4.70   53.04

For the AC2012.txt file, the procedure took about 53 seconds of elapsed time on my machine (48.65 seconds of user CPU time).
Note that chunk can be set to values greater than 1. For example, with a chunk of 40000 lines I obtained the following times for Damico’s solution (change_ponto(), which calls readLines() and writeLines() directly) and for this solution:
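
    # Damico's approach: plain readLines()/writeLines() in blocks of 40000 lines
    change_ponto <- function() {
      outcon <- file("acre.txt", "w")   # opening with "w" creates the file
      incon <- file("AC2012.txt", "r")
      # Close both connections on exit so the output is flushed to disk
      on.exit({ close(incon); close(outcon) })
      while (length(one.line <- readLines(incon, 40000, encoding = "latin1")) > 0) {
        one.line <- gsub(',', '.', one.line)
        writeLines(one.line, outcon)
      }
    }
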
    system.time(change_ponto())
       user  system elapsed
       6.53    0.82    7.36

    system.time(change_dot(file = 'AC2012.txt', saida = 'teste4.csv', chunk = 40000))
       user  system elapsed
       6.71    3.12    9.92

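The gain from a larger chunk comes mostly from making far fewer read and write calls, since gsub() is vectorised over all the lines in a chunk. Here is a minimal base-R sketch of that effect, assuming a small temporary file in place of AC2012.txt. The helper convert_chunked() and the sample data are hypothetical names made up for illustration; they are not part of either solution above.

```r
# Hypothetical helper: convert decimal commas to dots, reading `chunk` lines per call.
# Returns the number of non-empty reads, to show how chunk size changes the call count.
convert_chunked <- function(infile, outfile, chunk = 1) {
  incon <- file(infile, "r")
  outcon <- file(outfile, "w")
  on.exit({ close(incon); close(outcon) })
  n_reads <- 0
  while (length(lines <- readLines(incon, n = chunk)) > 0) {
    writeLines(gsub(",", ".", lines, fixed = TRUE), outcon)
    n_reads <- n_reads + 1
  }
  n_reads
}

# Made-up sample data: 1000 ';'-separated lines with decimal commas
infile <- tempfile(fileext = ".txt")
writeLines(sprintf("%d;%d,5", 1:1000, 1:1000), infile)

out1 <- tempfile()
out2 <- tempfile()
reads_1   <- convert_chunked(infile, out1, chunk = 1)    # one read per line
reads_250 <- convert_chunked(infile, out2, chunk = 250)  # 250 lines per read

print(c(reads_1, reads_250))
print(identical(readLines(out1), readLines(out2)))
```

With 1000 input lines, chunk = 1 needs 1000 reads while chunk = 250 needs only 4, and both runs write the same output. The trade-off is memory: each chunk is held in RAM, so the chunk size should stay well below what the machine can hold.
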
Now, checking that the two output files have the same contents:

    teste2 <- read.csv("acre.txt", header = FALSE, sep = ";", stringsAsFactors = FALSE, row.names = NULL)
    teste4 <- read.csv("teste4.csv", header = FALSE, sep = ";", stringsAsFactors = FALSE, row.names = NULL)
    all.equal(teste2, teste4)
    [1] TRUE

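The all.equal() check compares the parsed data frames. A stricter, text-level comparison is also possible; below is a rough sketch in base R, assuming small temporary files standing in for acre.txt and teste4.csv. The helper same_lines() is a hypothetical name introduced here for illustration.

```r
# Hypothetical helper: TRUE if two text files contain exactly the same lines
same_lines <- function(path_a, path_b) {
  identical(readLines(path_a), readLines(path_b))
}

# Made-up temporary files standing in for the two converted outputs
a <- tempfile()
b <- tempfile()
d <- tempfile()
writeLines(c("1;2.5", "3;4.5"), a)
writeLines(c("1;2.5", "3;4.5"), b)
writeLines(c("1;2.5", "3;4,5"), d)   # one comma left unconverted

print(same_lines(a, b))
print(same_lines(a, d))

# md5sum() from the 'tools' package (shipped with R) compares the raw bytes instead
print(unname(tools::md5sum(a) == tools::md5sum(b)))
```

Note that same_lines() loads both files into memory, so for very large files the checksum comparison is probably the better option.
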
I wrote a post about the iterators package on my blog a while back: http://www.rmining.com.br/2015/09/07/preparacao-de-dados-parte-2/
Very good @Lucasmation!
– Flavio Barros