Filling a data.table row by row in R

Viewed 1,224 times

8

Is there a way to fill a data.table in R row by row without using a 'for' loop, so that the value in each row of a column is the value of the previous row plus an increment that differs from row to row?

While trying to solve an urn exercise in R, I came across the following problem: I have a data.table with 10^6 rows in which one column starts at 50 and is 0 everywhere else, and another column contains -1 or 1, meaning subtract 1 from or add 1 to the previous value. After adding or subtracting, however, the value may not be less than zero or greater than 100.
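Without the bounds, the recurrence described above is just a cumulative sum, which R computes without a loop. A minimal sketch with made-up values (the names `start` and `steps` are illustrative, not from the exercise) shows this, and also why the bounds break it: the running total can leave the allowed range.

```r
# Illustrative values only: a small start so the lower bound is hit quickly.
start <- 2
steps <- c(-1, -1, -1, 1, 1)   # row-specific increments

# Unbounded recurrence x[i] = x[i-1] + steps[i-1], vectorized with cumsum().
unbounded <- c(start, start + cumsum(steps))
unbounded
# 2 1 0 -1 0 1

# The fourth value is below zero, so cumsum() alone cannot enforce the limits.
any(unbounded < 0)
```

Because the correction at a bound changes every later value, each row still depends on the corrected previous row, which is what makes the bounded version hard to vectorize directly.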

library(data.table)
library(foreach)

# Initial values:
N <- 100             # Total number of balls
nRetiradas <- 1e6    # Number of iterations (draws) in the problem
q <- 0.35            # Probability of transfer to the urn with n <= N/2
p <- 1.0 - q         # Probability of transfer to the urn with n > N/2

# The urnas table stores the number of balls in each urn at each draw
urnas <- data.table(A = rep(0, nRetiradas), B = rep(0, nRetiradas))

# Initial condition of urns A and B:
urnas$A[[1]] <- N/2
urnas$B[[1]] <- N/2

# Random choice: I let 1 represent a ball drawn from an urn
# with n > N/2, and -1 one drawn from an urn with n <= N/2. I generate an
# initial sequence for one of the urns and multiply it by -1 for the other urn
retiradas_A <- ifelse(runif(nRetiradas)<q,-1,1)
retiradas_B <- (-1)*retiradas_A

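# Each row depends on the previous one, so the rows are filled sequentially
foreach(i=2:nRetiradas) %do% {
  # Apply the draw if both urns stay non-negative (that is, within 0..N)
  if(min(urnas$A[[i-1]] + retiradas_A[i], urnas$B[[i-1]] + 
         retiradas_B[i]) >= 0){
    urnas$A[[i]] <- urnas$A[[i-1]] + retiradas_A[i]
    urnas$B[[i]] <- urnas$B[[i-1]] + retiradas_B[i]
  } else {
    # Otherwise move the ball in the opposite direction
    urnas$A[[i]] <- urnas$A[[i-1]] + retiradas_B[i]
    urnas$B[[i]] <- urnas$B[[i-1]] + retiradas_A[i]
  }
}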

The way I implemented it, each row of the data.table depends on the previous row, so I was only able to solve it with a loop, which is very time consuming. Going through 10^6 rows of the data.table is taking ages. Is there a more efficient solution? (Even if it involves completely changing my approach.)

  • Let me get this straight: you have a data frame with two columns (urn_a and urn_b) where each row corresponds to the state (number of balls) after each repetition of the following experiment: toss a coin and transfer a ball from urn_a to urn_b if it comes up heads (which happens with probability p). What should happen if the number of balls in an urn goes out of bounds? Should the simulation stop?

  • If the number of balls in one urn reaches 0, I remove a ball from the other urn instead. And if it reaches 100, I remove a ball instead of adding one. This was not specified in the statement, but I adopted this solution. An alternative would be to ignore draws until a removal comes up (in case the urn is full).
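The two boundary policies mentioned in this comment can be sketched as single-step functions. The function names `step_reflect` and `step_ignore` are hypothetical, chosen here for illustration; `a` is the current count in urn A, `s` is the proposed change (+1 or -1), and the total `N` is assumed to be 100 as in the question.

```r
N <- 100

# Policy 1: if the move would leave [0, N], move in the opposite direction.
step_reflect <- function(a, s) {
  nxt <- a + s
  if (nxt < 0 || nxt > N) a - s else nxt
}

# Policy 2: if the move would leave [0, N], ignore it and keep the count.
step_ignore <- function(a, s) {
  nxt <- a + s
  if (nxt < 0 || nxt > N) a else nxt
}

step_reflect(0, -1)    # 1: urn A is empty, so the ball moves the other way
step_ignore(0, -1)     # 0: the draw is skipped
step_reflect(50, 1)    # 51: both policies agree away from the bounds
```

The code in the question corresponds to the first policy; the second is the alternative the comment mentions.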

1 answer

3


I was able to speed up the execution a LOT by using a matrix instead of a data.table. (There was nothing in my program that justified the need for a data.table.)

Here is the code for the solution I found:

# Initial values:
N <- 100             # Total number of balls
nRetiradas <- 1e6    # Number of iterations (draws) in the problem
nDescarte <- 1e4     # Number of draws discarded at the start of the sequence
q <- 0.35            # Probability of transfer to the urn with n <= N/2
p <- 1.0 - q         # Probability of transfer to the urn with n > N/2

# The urnas matrix stores the number of balls in each urn at each draw
urnas <- matrix(data = NA, nrow = nRetiradas, ncol = 2)

# Initial condition of urns A and B:
urnas[1,1] <- N/2
urnas[1,2] <- N/2

# Random choice: I let 1 represent a ball drawn from an urn
# with n > N/2, and -1 one drawn from an urn with n <= N/2. I generate an
# initial sequence for one of the urns and multiply it by -1 for the other urn
retiradas_A <- ifelse(runif(nRetiradas)<q,-1,1)
retiradas_B <- (-1)*retiradas_A

tempoInicio <- Sys.time()

# foreach is a command for running a 'for' loop using parallelism across
# the multiple cores available in the hardware. A plain 'for' is used here,
# since each step depends on the previous one.
for(i in 2:nRetiradas) {
  # Apply the draw if both urns stay non-negative (that is, within 0..N)
  if(min(urnas[i-1, 1] + retiradas_A[i], urnas[i-1, 2] + 
         retiradas_B[i]) >= 0){
    urnas[i, 1] <- urnas[i-1, 1] + retiradas_A[i]
    urnas[i, 2] <- urnas[i-1, 2] + retiradas_B[i]
  } else {
    # Otherwise move the ball in the opposite direction
    urnas[i, 1] <- urnas[i-1, 1] + retiradas_B[i]
    urnas[i, 2] <- urnas[i-1, 2] + retiradas_A[i]
  }
}

I hope that at some point this can help those who come across a similar problem!
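Since the two urns always sum to N, it is enough to track urn A and derive B as N - A. As a sketch of the loop above written without an explicit 'for', base R's Reduce() with accumulate = TRUE can carry the previous state forward. The helper name `simulate_urn_a` is an assumption made for this example, and it is not necessarily faster than the matrix loop; it only shows the recurrence in functional form.

```r
simulate_urn_a <- function(steps, start, N) {
  # One step of the recurrence, with the same reflection rule as the loop above.
  one_step <- function(a, s) {
    nxt <- a + s
    if (nxt < 0 || nxt > N) a - s else nxt
  }
  # steps[1] is unused, matching the loop that starts at i = 2.
  unlist(Reduce(one_step, steps[-1], init = start, accumulate = TRUE))
}

# Usage with a shorter sequence than in the answer, to keep the example quick.
set.seed(1)
N <- 100
steps <- ifelse(runif(1000) < 0.35, -1, 1)
A <- simulate_urn_a(steps, N / 2, N)
B <- N - A
range(A)
```

Tracking a single vector instead of two matrix columns also halves the number of element assignments if the explicit loop is kept.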
