Calculate the product of an operation between two dataframes conditionally

3

Suppose I have these two dataframes:

set.seed(123)
df1<-data.frame(rep=rep(1:4,each=360),parc=rep(1:40,each=36),trat=rep(sample(1:10),each=36),tree=rep(1:36,40),med=1,dap_prev=rnorm(1440, mean = 12))
df2<-data.frame(med=rep(1:18,each=10),trat=rep(sample(1:10)),b0=rnorm(180),b1=rnorm(180))

I need to retrieve from df2 the values of df2$b0 and df2$b1 for the rows where df1$med == df2$med and df1$trat == df2$trat. Then I need to create a new column in df1 whose value is df2$b0 + df2$b1 * df1$dap_prev.

I tried with this command below, but of course it didn’t work:

df1$ddap_cm <- df2$b0[df2$med == df1$med & df2$trat == df1$trat] + df2$b1[df1$med == df2$med & df1$trat == df2$trat] * df1$dap_prev

Any help is welcome. Thank you.

EDIT:

I ended up finding a very simple solution with dplyr:

library(dplyr)
df1 <- left_join(df1, df2, by = c("med", "trat")) # copies the columns df2$b0 and df2$b1 for the rows that meet the criteria
df1$ddap_cm <- df1$b0 + (df1$b1*df1$dap_prev)
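
For comparison, here is a minimal base R sketch of the same idea, assuming the df1 and df2 from the question. It first checks that each (med, trat) pair occurs only once in df2, since a repeated pair would duplicate rows of df1 in the join. The name joined is only illustrative.

```r
set.seed(123)
df1 <- data.frame(rep = rep(1:4, each = 360), parc = rep(1:40, each = 36),
                  trat = rep(sample(1:10), each = 36), tree = rep(1:36, 40),
                  med = 1, dap_prev = rnorm(1440, mean = 12))
df2 <- data.frame(med = rep(1:18, each = 10), trat = rep(sample(1:10)),
                  b0 = rnorm(180), b1 = rnorm(180))

# Each (med, trat) pair should appear only once in df2;
# otherwise the join would duplicate rows of df1.
stopifnot(!anyDuplicated(df2[, c("med", "trat")]))

# Base R left join: all.x = TRUE keeps every row of df1.
# Note that merge() may reorder the rows.
joined <- merge(df1, df2, by = c("med", "trat"), all.x = TRUE)
joined$ddap_cm <- joined$b0 + joined$b1 * joined$dap_prev

# The row count should be unchanged by the join
nrow(joined) == nrow(df1)
```

If the row order of df1 matters, left_join() may be the more convenient choice, since it keeps the rows of df1 in their original order.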
  • In the command you wrote there is a typo: df.1$trat, where I believe you meant df1$trat. Test it and see whether it does what you need!

  • As Flavio said, I think the problem was only in the typo itself

  • Why is df1$dap_prev not indexed by the logical conditions?

2 answers

1

A proposal using base R. I changed the data frames to simulate more realistic cases, where some data is not present in both data frames.

set.seed(123)
df1<-data.frame(rep=rep(1:4,each=360),parc=rep(1:40,each=36),trat=rep(sample(1:10),each=36),tree=rep(1:36,40),med=c(rep(1,1000),rep(4,400),rep(20,40)),dap_prev=rnorm(1440, mean = 12))
df2<-data.frame(med=rep(1:18,each=10),trat=rep(sample(1:15,10)),b0=rnorm(180),b1=rnorm(180))

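# Flag the rows of df2 whose med and trat values also occur in df1
df2$inddf1med <- 0
for (i in unique(df1$med)) df2$inddf1med[df2$med == i] <- 1
df2$inddf1trat <- 0
for (i in unique(df1$trat)) df2$inddf1trat[df2$trat == i] <- 1
sel <- df2[df2$inddf1med == 1 & df2$inddf1trat == 1, ]

# For each selected (med, trat) pair, fill in the matching rows of df1
df1$res <- NA
for (i in seq_len(nrow(sel))) {
  selc <- df1$med == sel$med[i] & df1$trat == sel$trat[i]
  df1$res[selc] <- sel$b0[i] + sel$b1[i] * df1$dap_prev[selc]
}
# Keep only the rows of df1 that found a match in df2
df1f <- df1[!is.na(df1$res), ]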
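
The loops above can probably be replaced by a single lookup with match(). This is only a sketch on small toy data (the values are made up, not taken from the answer), and it assumes that each (med, trat) pair occurs at most once in df2, because match() returns only the first position found. The names key1, key2 and idx are illustrative.

```r
# Toy data (hypothetical values)
df1 <- data.frame(med = c(1, 1, 4, 20), trat = c(2, 3, 2, 5),
                  dap_prev = c(10, 12, 11, 13))
df2 <- data.frame(med = c(1, 1, 4), trat = c(2, 7, 2),
                  b0 = c(0.5, 1, -1), b1 = c(2, 3, 0.5))

# Build one key per row from the two matching columns
key1 <- paste(df1$med, df1$trat)
key2 <- paste(df2$med, df2$trat)

# For each row of df1, the position of its key in df2 (NA when absent)
idx <- match(key1, key2)

# Rows without a match get NA, as in the loop version
df1$res <- df2$b0[idx] + df2$b1[idx] * df1$dap_prev
df1f <- df1[!is.na(df1$res), ]
```

Here only the first and third rows of df1 have a partner in df2, so df1f ends up with two rows.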

1

You must first create the new column, then you can assign the respective values of the calculation.

i1 <- df1$med == df2$med
i2 <- df1$trat == df2$trat

df1$ddap_cm <- NA
df1$ddap_cm <- df2$b0[i1 & i2] + df2$b1[i1 & i2] * df1$dap_prev[i1 & i2]

Note: if there are NA values in the original tables, you must use which(i1 & i2) to index the columns of interest.
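
Two details of this approach may be worth checking on small vectors first. The sketch below uses toy vectors (a, b, x and i are illustrative names) to show how == behaves when the two sides have different lengths, and what which() changes when the logical index contains NA.

```r
a <- c(1, 2, 3, 1, 2, 3)   # length 6
b <- c(1, 3)               # length 2, recycled to 1 3 1 3 1 3

# == compares element by element after recycling b;
# it does not look up each value of a in b
a == b      # TRUE FALSE FALSE FALSE FALSE TRUE

# %in% tests membership instead
a %in% b    # TRUE FALSE TRUE TRUE FALSE TRUE

# With NA in the data, the logical index contains NA,
# and indexing with it returns an NA element
x <- c(5, NA, 7)
i <- x > 6  # FALSE NA TRUE
x[i]        # NA 7

# which() keeps only the positions that are TRUE
x[which(i)] # 7
```

Since df1 has 1440 rows and df2 has 180, a comparison such as df1$med == df2$med recycles df2$med eight times without any warning, so the result is worth inspecting before relying on it.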

  • In the question, df1$dap_prev does not take the criteria i1 and i2 into account, as it does in your code

  • @Rafaelcunha I noticed, but it has to be indexed. Otherwise what happens, recycling? The vectors have to be the same length. Maybe it’s better to ask the OP about that.

  • I just commented based on the question; I didn’t stop to think about whether it should be taken into account or not

  • I ended up building a simpler solution with dplyr but that had the same effect. Thank you.

  • @Aníbaldebonineto you have to understand exactly what you need and how you will do it. If it was a typo, as pointed out earlier, the summary of your new variable has certain values, which differ if you consider the criteria i1 and i2 raised by Rui. And the summary is also different with this approach using the package dplyr

  • @Rafaelcunha in the code that I pasted here that dot was a typo; in my script there was that dot in the variable name. merge() correctly associated the B0 and B1 values with each observation. Thank you.
