Is it possible to match values from two data frames with different numbers of observations?

3

I have two data frames:

Sexo <- rep(1:2 , length.out = 51)
Estudo <- rep(1: 17, length.out = 51)
Salário <- c(runif(51, min=900, max=3000))

data1 <- data.frame(Sexo, Estudo, Salário)

data2 <-  data.frame(TaxaHomens = c(seq(0.1,0.99,length=17)), 
                TaxaMulheres = c(seq(0.2,0.99,length=17)), 
                Estudo = c(1:17))

In data1, the variable Sexo is coded 1 for men and 2 for women, Estudo is the person's years of schooling, and Salário is monthly earnings.

Is it possible to create a rate column (Taxa) in the first data frame (data1), filled with the rates from the second data frame according to Sexo and Estudo?

For example, every observation in data1 with Sexo = 1 and Estudo = 3 should get a rate of 0.211250 in the new column, and so on for the other combinations.

3 answers

4


Using the packages dplyr and tidyr:

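library(dplyr)
library(tidyr)
# Reshape data2 to long format: one Taxa column, with Sexo naming the source column
data2 <- data2 %>%
  gather(Sexo, Taxa, TaxaHomens:TaxaMulheres)
# Recode Sexo to match data1: 1 for men, 2 for women
data2$Sexo <- ifelse(data2$Sexo == "TaxaHomens", 1, 2)
# Join on both keys so each row of data1 receives its rate
left_join(data1, data2, by = c("Sexo", "Estudo"))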

What I did was reshape your data2 into long format, so that it has a single rate column (men and women in the same column) identified by Sexo and Estudo. The join then matches on both keys.
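To make the reshaping step concrete, here is a minimal base R sketch of the same idea, without dplyr or tidyr. The names data2_long and resultado, the ASCII column name Salario, and the set.seed() call are assumptions added for illustration. Note that merge() sorts the result by the key columns, so the row order differs from data1.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
data1 <- data.frame(Sexo = rep(1:2, length.out = 51),
                    Estudo = rep(1:17, length.out = 51),
                    Salario = runif(51, min = 900, max = 3000))
data2 <- data.frame(TaxaHomens = seq(0.1, 0.99, length.out = 17),
                    TaxaMulheres = seq(0.2, 0.99, length.out = 17),
                    Estudo = 1:17)

# Stack the two rate columns into one; Sexo records which column each rate came from
data2_long <- data.frame(Sexo = rep(1:2, each = nrow(data2)),
                         Estudo = rep(data2$Estudo, times = 2),
                         Taxa = c(data2$TaxaHomens, data2$TaxaMulheres))

# Left join on both keys; all.x = TRUE keeps every row of data1
resultado <- merge(data1, data2_long, by = c("Sexo", "Estudo"), all.x = TRUE)
head(resultado)
```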

0

Using base R:

Sexo <- rep(1:2 , length.out = 51)
Estudo <- rep(1: 17, length.out = 51)
Salario <- c(runif(51, min=900, max=3000))

data1 <- data.frame(Sexo, Estudo, Salario)

data2 <-  data.frame(TaxaHomens = c(seq(0.1,0.99,length=17)), 
                TaxaMulheres = c(seq(0.2,0.99,length=17)), 
                Estudo = c(1:17))


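# Copy Estudo and turn it into a factor with one level per year of study (1 to 17)
data1$Fator <- data1$Estudo
data1$Fator <- factor(data1$Fator)
# Relabel the levels with the men's rates (use data2$TaxaMulheres for the women's rates)
levels(data1$Fator) <- data2$TaxaHomens
# Convert the level labels back to numbers
data1$Fator <- as.numeric(as.character(data1$Fator))
head(data1)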

  Sexo Estudo   Salario    Fator
1    1      1 2287.2813 0.100000
2    2      2 2845.6058 0.155625
3    1      3 2606.5139 0.211250
4    2      4  911.2628 0.266875
5    1      5 2658.2753 0.322500
6    2      6 2082.1462 0.378125

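The key is to create a factor column with the same values as the Estudo column and replace its levels with the values of the column TaxaHomens or TaxaMulheres. Note that the code above uses only TaxaHomens, so every row receives the men's rate regardless of Sexo; the women's rows need the same step with TaxaMulheres. Important: this approach only works because data2 has no duplicated entries. It has exactly one row per Estudo value, in ascending order, so the 17 factor levels line up with the 17 rates.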
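One way to apply the lookup to both sexes in base R is sketched below, assuming Sexo only takes the values 1 and 2. It uses match() instead of the factor-level trick; the helper name linha, the column name Taxa, and the seed are illustrative choices, not part of the original answer.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
data1 <- data.frame(Sexo = rep(1:2, length.out = 51),
                    Estudo = rep(1:17, length.out = 51),
                    Salario = runif(51, min = 900, max = 3000))
data2 <- data.frame(TaxaHomens = seq(0.1, 0.99, length.out = 17),
                    TaxaMulheres = seq(0.2, 0.99, length.out = 17),
                    Estudo = 1:17)

# For each row of data1, find the row of data2 with the same Estudo value
linha <- match(data1$Estudo, data2$Estudo)

# Take the men's rate when Sexo == 1, otherwise the women's rate
data1$Taxa <- ifelse(data1$Sexo == 1,
                     data2$TaxaHomens[linha],
                     data2$TaxaMulheres[linha])
head(data1)
```

Because match() looks rows up by value, this version does not depend on data2 being sorted by Estudo.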

0

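There may be a more efficient way, but if you don't have many rows it won't be a problem:

# Create the column first so that row-by-row assignment has somewhere to write
data1$Taxa <- NA_real_
for (i in seq_len(nrow(data1))) {
  # Column 1 of data2 is TaxaHomens and column 2 is TaxaMulheres, so Sexo picks the column
  data1$Taxa[i] <- data2[which(data2$Estudo == data1$Estudo[i]), data1$Sexo[i]]
}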
  • When I use this solution on my data frame I get the following error: Error in $<-.data.frame(*tmp*, "Desemp", value = c(NA, NA, NA, 0.0559149359202274 : replacement has 4 rows, data has 362555. My data frame has more than 300,000 rows, so I think it doesn't work.
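For a large data frame, a vectorized lookup avoids the row-by-row loop altogether. The sketch below keeps the loop's idea of using Sexo as a column index, but does it in one step with matrix indexing. The names taxas and linha and the seed are assumptions for illustration, and it assumes Sexo is coded 1 or 2 to match the column order of taxas.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
data1 <- data.frame(Sexo = rep(1:2, length.out = 51),
                    Estudo = rep(1:17, length.out = 51),
                    Salario = runif(51, min = 900, max = 3000))
data2 <- data.frame(TaxaHomens = seq(0.1, 0.99, length.out = 17),
                    TaxaMulheres = seq(0.2, 0.99, length.out = 17),
                    Estudo = 1:17)

# Rates as a matrix: column 1 = men, column 2 = women
taxas <- as.matrix(data2[, c("TaxaHomens", "TaxaMulheres")])

# Row of data2 for each Estudo value in data1 (NA when there is no match)
linha <- match(data1$Estudo, data2$Estudo)

# A two-column index matrix selects one (row, column) cell per observation
data1$Taxa <- taxas[cbind(linha, data1$Sexo)]
head(data1)
```

Rows of data1 whose Estudo value does not appear in data2 get NA instead of raising an error.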
