I have three data frames with different numbers of rows, and I would like to create a new data frame with 100 rows of random values drawn from them, based on three criteria:
A - Columns a and b hold 100 random values from data frame 1
B - The first 50 rows of columns c and d hold 50 random pairs of c1 and d1, that is, values that occur in the same row of data frame 2
C - The remaining 50 rows (51-100) of columns c and d hold another 50 random pairs, of c2 and d2, which occur in the same row of data frame 3
I tried a loop, but it does not work well. How could I fix it, or do this in a better way?
Here are the data, the script, and the expected result:
a <- c(4,6,7,3,2,5,6,9,6,5,8,6,7,8,9,7,6)
b <- c(40,60,70,30,20,NA,60,90,60,50,75,34,42,32,NA,45,29)
c1 <- c(1,2,3,4,5,6,7,8,9,10)
d1 <- c(10,9,8,7,6,5,4,3,2,1)
c2 <- c(11,12,13,14,15,16,17,18,19,20)
d2 <- c(20,19,18,17,16,15,14,13,12,11)
df1 <- data.frame(a,b)
df2 <- data.frame(c1,d1)
df3 <- data.frame(c2,d2)
#newdf (with 100 rows)
n <- 100
newdf <- data.frame(n=rep(1:n))
newdf$a <- NA
newdf$b <- NA
newdf$c <- NA
newdf$d<- NA
for (i in 1:50){
  newdf$a[i] <- sample(df1$a, 1, replace=TRUE) # random value
  newdf$b[i] <- sample(df1$b, 1, replace=TRUE) # random value
  newdf$c[i] <- sample(df2$c1, 1, replace=TRUE) # criterion B (c drawn separately from d, so not paired)
  newdf$d[i] <- sample(df2$d1, 1, replace=TRUE) # criterion B (d drawn separately from c, so not paired)
}
for (i in 51:100){
  newdf$a[i] <- sample(df1$a, 1, replace=TRUE) # random value
  newdf$b[i] <- sample(df1$b, 1, replace=TRUE) # random value
  newdf$c[i] <- sample(df3$c2, 1, replace=TRUE) # criterion C (c drawn separately from d, so not paired)
  newdf$d[i] <- sample(df3$d2, 1, replace=TRUE) # criterion C (d drawn separately from c, so not paired)
}
#Expected result
a b c d
7 60 1 10 # row 1
6 50 3 8
2 90 5 6 # row 50
.
.
.
2 90 11 20 # row 51
.
.
.
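
One possible way to meet the three criteria without a loop is to sample row indices rather than values, so that c and d are always taken from the same source row. This is only a sketch under some assumptions: sampling is done with replacement (df2 and df3 have just 10 rows each), a and b are drawn independently of each other (as the expected result suggests), and the names `half`, `i2`, `i3` and the `set.seed(1)` call are illustrative and not part of the original script.

```r
# Data from the question
df1 <- data.frame(a = c(4,6,7,3,2,5,6,9,6,5,8,6,7,8,9,7,6),
                  b = c(40,60,70,30,20,NA,60,90,60,50,75,34,42,32,NA,45,29))
df2 <- data.frame(c1 = 1:10,  d1 = 10:1)
df3 <- data.frame(c2 = 11:20, d2 = 20:11)

set.seed(1)   # assumption: fixed seed, only so the draw is reproducible
n    <- 100
half <- n / 2

# Draw row indices, so each (c, d) pair comes from a single source row
i2 <- sample(nrow(df2), half, replace = TRUE)  # rows of df2 for rows 1-50
i3 <- sample(nrow(df3), half, replace = TRUE)  # rows of df3 for rows 51-100

newdf <- data.frame(
  n = seq_len(n),
  a = sample(df1$a, n, replace = TRUE),  # independent random values
  b = sample(df1$b, n, replace = TRUE),  # independent random values; may include NA
  c = c(df2$c1[i2], df3$c2[i3]),         # paired with d through the shared index
  d = c(df2$d1[i2], df3$d2[i3])
)

head(newdf)
```

With these data, every pair in rows 1-50 satisfies `c + d == 11` and every pair in rows 51-100 satisfies `c + d == 31`, which gives a quick check that the pairing was preserved.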