Operations with very large lists

I have code that calculates the area of the intersection between two polygons, and I use lists to store the coordinates of the polygon vertices. There are many polygons, however, and the whole run takes about 6 hours on average. Do you know of any list operation that could speed this up?

My code

require(rgeos); require(sp) # "gpc.poly", intersect() and area.poly() come from rgeos
# simulate a list of polygons, each one a matrix of vertex coordinates
sim.polygons = function(objects, vertex){
  polygons = vector("list", objects)
  for(i in 1:objects) polygons[[i]] = matrix(runif(vertex*2), ncol = 2)
  polygons
}

teste = function(lista1, lista2, progress = F){
  # coerce every matrix of coordinates to a "gpc.poly" object
  lista1 = lapply(lista1, as, Class = "gpc.poly")
  lista2 = lapply(lista2, as, Class = "gpc.poly")
  res = matrix(0, nrow = length(lista2), ncol = length(lista1))
  for(k in 1:length(lista1)){
    for(l in 1:length(lista2)){
      res[l, k] = area.poly(intersect(lista1[[k]], lista2[[l]])) # bottleneck of the code
    }
    if(progress) print(k)
  }
  res
}
# example
a = sim.polygons(50, 3)  # in my problem objects = 144 and vertex = 3
b = sim.polygons(100, 3) # objects = 144^2 and vertex = 3

teste(a, b, T)

1 answer

I couldn't make your code any faster except by proposing a solution that runs in parallel.

teste2 <- function(lista1, lista2, progress = F){
  lista1 = lapply(lista1, as, Class = "gpc.poly")
  lista2 = lapply(lista2, as, Class = "gpc.poly")

  # laply() iterates over the lists and assembles the result matrix;
  # with .parallel = T each outer iteration is sent to a worker
  res <- plyr::laply(lista2, function(l2){
    plyr::laply(lista1, function(l1){
      area.poly(intersect(l1, l2)) # bottleneck of the code
    })
  }, .parallel = T)

  res
}

Note the argument .parallel = T. Next you need to register a parallel backend:

On Windows:

library(doSNOW)
library(foreach)
cl <- makeCluster(2)
registerDoSNOW(cl)

On Linux:

library(doMC)
registerDoMC(2)

Here 2 is the number of cores to use; your processor may have more.
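If you are not sure how many cores are available, you can ask R itself. Below is a minimal sketch, assuming the parallel and doParallel packages are installed (doParallel works on both Windows and Linux, so it can replace either backend above):

library(parallel)
library(doParallel)

n_cores <- detectCores()        # logical cores reported by the OS
cl <- makeCluster(n_cores - 1)  # leave one core free for the system
registerDoParallel(cl)          # backend picked up by plyr when .parallel = T

teste2(a, b, F)                 # runs the intersections on the workers

stopCluster(cl)                 # release the workers when finished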

a = sim.polygons(10, 3) # in my problem objects = 144 and vertex = 3
b = sim.polygons(20, 3) # objects = 144^2 and vertex = 3
microbenchmark::microbenchmark(
  v1 = teste(a,b,F),
  v2 = teste2(a,b,F),
  times = 5
)

Unit: milliseconds
 expr      min       lq     mean   median       uq       max neval
   v1 569.4241 629.3930 819.8292 833.3761 889.4672 1177.4855     5
   v2 445.0611 465.1625 548.7329 483.9004 598.9802  750.5603     5

With two cores the time does not drop that much, but if your computer has 4 cores the reduction may be significant.

The real problem is that the call area.poly(intersect(a, b)) is itself slow:

> a <- as(a[[1]], "gpc.poly") 
> b <- as(b[[1]], "gpc.poly")
> microbenchmark::microbenchmark(
+     area.poly(intersect(a , b)) 
+ )
Unit: milliseconds
                       expr    min      lq     mean median      uq    max neval
 area.poly(intersect(a, b)) 2.9008 2.97925 3.146169 3.0493 3.33235 4.0275   100

Note that in my example it is called 200 times (10 * 20 polygon pairs):

> 10*20*3.146169 
[1] 629.2338

which is close to the measured time. That is, manipulating the results is not adding much to the total execution time of the function.

> 144^3*3.146169/1000/60
[1] 156.5735

Even without collecting the results, the estimated time for your full problem (144 * 144^2 = 144^3 intersections) would be approx. 2.6 hours.
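That back-of-the-envelope estimate can be written as a small helper; a sketch, where the name est.hours is hypothetical and per_call_ms is the mean time per call reported by microbenchmark:

est.hours <- function(n1, n2, per_call_ms){
  n1 * n2 * per_call_ms / 1000 / 3600  # calls * ms per call, converted to hours
}
est.hours(144, 144^2, 3.146169) # about 2.6 hours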

  • I replaced my function with this one and the time dropped considerably, but I'll wait a while to see if someone can reduce the time even further. If not, the accepted answer is yours. Thank you!
