How to parallelize a sapply over a data frame

I can run the sapply without problems, but I can't get it to run in parallel. The original script has more than 9,000,000 rows, so it is not feasible to continue without parallelization.

dfteste<-data.frame(c(1,1,1),c(1,1,1),c(1,1,1))
apteste<-sapply(1:3,function (x) {paste(dfteste[x,], collapse="-")})

library(parallel)
cl<-makeCluster(4)
apteste<-parSapply(cl,1:3,function (x) {paste(dfteste[x,], collapse="-")}) # does not work
stopCluster(cl)

Thank you.

1 answer

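The problem is that the object dfteste exists only in your main R session, not in the 4 worker processes started by makeCluster(4). Each worker is a separate R process with its own, initially empty, global environment. You create dfteste in the current session, but the function that parSapply() sends to the workers looks for dfteste there and does not find it.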
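One way to confirm this is to ask each worker whether it can see dfteste. The following is a minimal sketch, assuming a freshly created cluster; the 2 workers and the name visible_on_workers are arbitrary choices for illustration:

```r
library(parallel)

# Small cluster just for the check (2 workers is an arbitrary choice)
cl <- makeCluster(2)

# dfteste is created in the main session only
dfteste <- data.frame(c(1, 1, 1), c(1, 1, 1), c(1, 1, 1))

# Evaluate exists("dfteste") on every worker and collect the answers
visible_on_workers <- unlist(clusterEvalQ(cl, exists("dfteste")))
print(visible_on_workers)
# [1] FALSE FALSE

stopCluster(cl)
```

Every worker answers FALSE because nothing has been sent to it yet.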

Possible solution: export the object dfteste to the worker environments created by makeCluster(), using the function clusterExport():

library(parallel)

cl <- makeCluster(4)
dfteste <- data.frame(c(1, 1, 1), c(1, 1, 1), c(1, 1, 1))
sapply(1:3, function (x) {paste(dfteste[x, ], collapse = "-")})
# [1] "1-1-1" "1-1-1" "1-1-1"

clusterExport(cl, "dfteste")
parSapply(cl, 1:3, function (x) {paste(dfteste[x, ], collapse = "-")}) # works
# [1] "1-1-1" "1-1-1" "1-1-1"

stopCluster(cl)
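Another option, shown here only as a sketch, is to hand the data frame to the function as an argument instead of exporting it by name. parSapply() forwards extra arguments to the function, so they are sent to the workers along with the call. The argument name df and the 2 workers are assumptions made for this example:

```r
library(parallel)

cl <- makeCluster(2)  # 2 workers is an arbitrary choice
dfteste <- data.frame(c(1, 1, 1), c(1, 1, 1), c(1, 1, 1))

# The function takes the data frame as a second argument (df),
# so it no longer relies on a global object existing on the workers
paste_row <- function(x, df) {
  paste(df[x, ], collapse = "-")
}

# Extra arguments after the function (here df = dfteste) are passed on to it
apteste <- parSapply(cl, 1:3, paste_row, df = dfteste)
print(apteste)
# [1] "1-1-1" "1-1-1" "1-1-1"

stopCluster(cl)
```

This keeps the function self-contained. In both approaches the whole data frame is still copied to every worker, which is worth keeping in mind with millions of rows.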
