Good evening!
I join two datasets in RStudio using merge(). Would another way of joining them (e.g. left_join()) be faster? My tables have up to 8 million rows.
Thank you.
Ronaldo, how are you?
Check out this experiment comparing the merge() function with the inner_join() function from the dplyr package.
# Make the random results reproducible
set.seed(101)
# Generate two example datasets with 8,000,000 observations each
df1 <- data.frame(x = sample(seq(1,16000000,1),8000000),
y = sample(seq(1,16000000,1),8000000),
z = sample(seq(1,16000000,1),8000000))
df2 <- data.frame(x = sample(seq(1,16000000,1),8000000),
y = sample(seq(1,16000000,1),8000000),
z = sample(seq(1,16000000,1),8000000))
# Timing merge()
system.time(dfa <- merge(df1, df2, by = c("x", "y")))
# user system elapsed
# 115.911 2.563 122.016
# Timing inner_join()
library(dplyr)
system.time(dfb <- inner_join(df1, df2, by = c("x", "y")))
# user system elapsed
# 16.459 0.966 17.833
Note that on my machine merge() took about 122 seconds (elapsed time) to complete the operation, while inner_join() took only about 18 seconds.
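The benchmark above compares inner joins, which is what merge() does with its default all = FALSE (only rows present in both tables are kept). Since the question asks about left_join(), here is a minimal sketch on small made-up tables (the contents of df1 and df2 below are hypothetical) checking that merge(..., all.x = TRUE) and left_join() return the same rows. Once the results agree, the same system.time() pattern can be used to time both left-join variants on the real data.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy tables sharing the key columns x and y
df1 <- data.frame(x = c(1, 2, 3, 4), y = c(1, 1, 2, 2), z = c(10, 20, 30, 40))
df2 <- data.frame(x = c(2, 3, 5), y = c(1, 2, 2), w = c(200, 300, 500))

# Base R left join: all.x = TRUE keeps every row of df1, filling w with NA
m <- merge(df1, df2, by = c("x", "y"), all.x = TRUE)

# dplyr left join on the same keys
l <- left_join(df1, df2, by = c("x", "y"))

# merge() sorts by the key columns, left_join() keeps df1's row order,
# so put both in the same order and reset row names before comparing
m <- m[order(m$x, m$y), ]
l <- l[order(l$x, l$y), ]
rownames(m) <- NULL
rownames(l) <- NULL

all.equal(m, l)
```

The reordering step matters only for the comparison: merge() sorts on the by columns by default (sort = TRUE), so its rows can come back in a different order than left_join() returns them.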
sample(16000000, 8000000) is simpler and gives the same numbers. The only difference is that your version produces vectors of class numeric (61 Mb), while this one produces class integer (30.5 Mb). Try it, and test with identical() and all.equal().
– Rui Barradas
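The suggestion in the comment above can be checked on a smaller scale. This sketch uses hypothetical scaled-down sizes (n_pool and n_draw) so it runs instantly; with the same seed, both calls draw the same indices, so the values match, but seq(1, n, 1) yields doubles while sample(n, k) yields integers.

```r
# Hypothetical smaller sizes standing in for 16,000,000 and 8,000,000
n_pool <- 16000
n_draw <- 8000

set.seed(101)
a <- sample(seq(1, n_pool, 1), n_draw)  # seq() with a numeric step gives a double vector
set.seed(101)
b <- sample(n_pool, n_draw)             # a single number gives an integer vector

class(a)                  # "numeric"
class(b)                  # "integer"
identical(a, b)           # FALSE: same values, different storage type
isTRUE(all.equal(a, b))   # TRUE: the values themselves match

# Doubles take 8 bytes per element and integers 4, hence the 61 Mb vs 30.5 Mb
object.size(a)
object.size(b)
```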
@Rui thanks for the tip.
– Antonio C. da Silva Júnior
Thanks, Antonio. I’ll change my code and post the result here.
– Ronaldo Ribeiro