Run a .GlobalEnv function in parallel processing

I need to run a function that is defined in .GlobalEnv in parallel, using the multidplyr package.

Using a simple example without parallel processing, it works as expected:

library(dplyr)
library(purrr)
library(multidplyr)

# add_a must exist in .GlobalEnv before the pipeline runs
add_a <- function(x) {
  paste0(x, "A")
}

data.frame(x = 1:10) %>%
  mutate(y = purrr::map(x, add_a))

But when I try to parallelize it, the function add_a is not recognized:

add_a <- function(x) {
  paste0(x, "A")
}

data.frame(x = 1:10) %>%
  partition() %>% 
  mutate(
    y = purrr::map(x, add_a)
  )

This returns the following error:

Error in checkForRemoteErrors(lapply(cl, recvResult)) : 
  10 nodes produced errors; first error: objeto 'add_a' não encontrado

(The Portuguese part of the message means "object 'add_a' not found".)

  • Dear @Italo Cegatta, this link might help you: https://github.com/hadley/multidplyr/issues/15

1 answer

You have to export the object add_a to each cluster node. Each node is a separate R process, so it does not see objects defined in your session's .GlobalEnv.

One way to do this is to create the cluster manually and add the function to each node.

For example:

library(dplyr)
library(purrr)
library(multidplyr)

add_a <- function(x) {
  paste0(x, "A")
}

cluster <- create_cluster() # create the cluster
cluster_assign_value(cluster, "add_a", add_a) # add the function add_a to each node

data.frame(x = 1:10) %>%
  partition(cluster = cluster) %>% # specify which cluster to use
  mutate(
    y = purrr::map(x, add_a))

Source: party_df [10 x 3]
Groups: PARTITION_ID
Shards: 7 [1--2 rows]

# S3: party_df
       x PARTITION_ID         y
   <int>        <dbl>    <list>
1      1            1 <chr [1]>
2      6            1 <chr [1]>
3      9            2 <chr [1]>
4      5            3 <chr [1]>
5      7            3 <chr [1]>
6      2            4 <chr [1]>
7      3            5 <chr [1]>
8     10            5 <chr [1]>
9      8            6 <chr [1]>
10     4            7 <chr [1]>
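
To see why the export is needed, here is a minimal sketch using the base parallel package instead of multidplyr (a swapped-in tool, used only because the error above is raised by parallel's checkForRemoteErrors). It assumes two local worker processes can be started; the names cl, found_before, found_after and result are just for illustration. It checks whether add_a exists on the workers before and after exporting it with clusterExport().

```r
library(parallel)

add_a <- function(x) {
  paste0(x, "A")
}

# Start two worker processes; each one is a fresh R session
# with its own, initially empty, global environment.
cl <- makeCluster(2)

# Ask every worker whether it can see add_a: not yet.
found_before <- unlist(clusterEvalQ(cl, exists("add_a")))

# Copy add_a from the current environment into each worker's global environment.
clusterExport(cl, "add_a", envir = environment())

# Now every worker has its own copy.
found_after <- unlist(clusterEvalQ(cl, exists("add_a")))

# The function can be called on the workers.
result <- parSapply(cl, 1:4, function(x) add_a(x))

stopCluster(cl)

print(found_before)
print(found_after)
print(result)
```

cluster_assign_value() in the answer above plays the same role as clusterExport() here. Later multidplyr releases appear to expose this step as new_cluster() and cluster_assign(), so check the documentation of your installed version.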
