Optimization of R code

Asked

Viewed 143 times

3

I wonder if anyone has any suggestions so I can optimize the code below.

The idea I took from that website. You have a permutation of n cards, for example [2, 4, 1, 3] (where the 2 is the top card). In each round, the person must reverse the m first cards, where m is the top card. Rounds are repeated until the final permutation is [1, 2, 3, ..., n].

Exemplifying:

[2, 4, 1, 3]
[4, 2, 1, 3]
[3, 1, 2, 4]
[2, 1, 3, 4]
[1, 2, 3, 4]

I used the function perm library Deducer to obtain all possible permutations of a given vector.

#retorna a quantidade de mudanças necessárias para obter o vetor de 1 até n
topswops <- function(x){ 
  i <- 0
  while(x[1] != 1){
    primeira <- x[1]
    if(primeira == length(x)){
       x <- rev(x)
    } else{
      x <- c(x[primeira:1], x[(primeira+1):length(x)])
    }
    i <- i + 1
  }
  return(i)
}

library(Deducer)

inicio <- Sys.time()
resultado <- NULL
for(i in 1:10){
  A <- perm(1:i)
  A <- data.frame(A)
  A$flips <- apply(A, 1, topswops)
  resultado <- rbind(resultado, c(i, max(A$flips)))
}
fim <- Sys.time()
fim - inicio

The processing time was 3.74 minutes. I suspect the permutations part A <- perm(1:i) it takes the most time, but I couldn’t think of any other way.

1 answer

2


From what I’ve analyzed of your code, your suspicion is correct: the function perm is what is slowing down the code. I came to this conclusion by testing the function execution time topswops:

library(rbenchmark)
benchmark(topswops(sample(1:i, 10)), replications = 100000)
                       test replications elapsed relative user.self sys.self user.child sys.child
1 topswops(sample(1:i, 10))       100000   2.388        1     2.253    0.107          0         0

Repeat topswops 100,000 times in random vectors of size 10, with elements 1 to 10, took 2,388 seconds. That is, the bottleneck is not here.

That said, I looked for other ways to generate all possible permutations for a sequence of numbers. In addition to the Deducer, the packages combinat and gtools can also generate all permutations of a sequence of numbers. My test results were as follows::

library(Deducer)
inicio <- Sys.time()
resultado.Deducer <- NULL
for(i in 1:10){
  A <- perm(1:i)
  A <- data.frame(A)
  A$flips <- apply(A, 1, topswops)
  resultado.Deducer <- rbind(resultado.Deducer, c(i, max(A$flips)))
}
fim <- Sys.time()
fim - inicio
Time difference of 4.131159 mins

library(combinat)
inicio <- Sys.time()
resultado.combinat <- NULL
for(i in 1:10){
  A <- as.data.frame(matrix(unlist(permn(1:i)), ncol=i, byrow=TRUE))
  A$flips <- apply(A, 1, topswops)
  resultado.combinat <- rbind(resultado.combinat, c(i, max(A$flips)))
}
fim <- Sys.time()
fim - inicio
Time difference of 2.371664 mins

library(gtools)
inicio <- Sys.time()
resultado.gtools <- NULL
for(i in 1:10){
  A <- permutations(i, i, 1:i)
  A$flips <- apply(A, 1, topswops)
  resultado.gtools <- rbind(resultado.gtools, c(i, max(A$flips)))
}
Warning messages:
1: In A$flips <- apply(A, 1, topswops) : Coercing LHS to a list
2: In A$flips <- apply(A, 1, topswops) : Coercing LHS to a list
3: In A$flips <- apply(A, 1, topswops) : Coercing LHS to a list
4: In A$flips <- apply(A, 1, topswops) : Coercing LHS to a list
5: In A$flips <- apply(A, 1, topswops) : Coercing LHS to a list
6: In A$flips <- apply(A, 1, topswops) : Coercing LHS to a list
7: In A$flips <- apply(A, 1, topswops) : Coercing LHS to a list
8: In A$flips <- apply(A, 1, topswops) : Coercing LHS to a list
9: In A$flips <- apply(A, 1, topswops) : Coercing LHS to a list
10: In A$flips <- apply(A, 1, topswops) : Coercing LHS to a list
fim <- Sys.time()
fim - inicio
Time difference of 1.849558 mins

Therefore, one way to optimize the code is precisely by improving the generation of permutations. In my test, the function that best came out was gtools::permutations, taking 1.849558 minutes to run. This is equivalent to 44% of the time used by the original function.

It may be possible to further improve this performance by optimizing the generation of permutations. Since there are millions of replications, I believe that any improvement in permutation generation is something that counts very much in the end.

If the search is for the shortest execution time and not necessarily the best algorithm, it is also possible to decrease the execution time by parallelizing the code. But then I leave this exercise to the reader : )

  • The idea of parallelization is good, but since it was a joke code, I’m not gonna worry about it that much. I was able to find another package that reduced the runtime to 1.58 minutes - iterpc. I ran my initial code again and the time spent was 3.25 minutes. So the gain, it was over 50%.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.