Are there any R functions similar to excel PROCV?

Asked

Viewed 1,958 times

8

In my case I have two date frames.:

> head(Trecho)
         Xt       Yt Zt
1 -75.56468 1.642710  0
2 -74.56469 1.639634  0
3 -73.56469 1.636557  0
4 -72.56470 1.633480  0
5 -71.56470 1.630403  0
6 -70.56471 1.627326  0

    > head(TrechoSim)
        Xs        Ys Zs
1 -71.7856 -0.509196  0
2 -71.7856 -0.509196  0
3 -71.7856 -0.509196  0
4 -71.7856 -0.509196  0
5 -71.7856 -0.509196  0
6 -71.7856 -0.509196  0

The data frame Trecho has approximately 5 thousand lines and the TrechoSim has 20 thousand lines. Similar to PROCV from Excel, I need to fetch the nearest value where Xt = Xs (in excel I use TRUE, and returns the first value closest to Xt). There is no tolerance for this proximity. I need all data frame values Trecho with their respective value closer to TrechoSim. I tried to difference_inner_join but it returns values NA in some lines.

Grateful,

  • 1

    https://answall.com/a/124326/6036

  • @Danielfalbel this solution is for IDENTICAL searched values. In my case it is a similar value, because none will be identical, or if it happens will be few.

  • sorry, li rapido! Maybe you’re behind this: https://github.com/dgrtwo/fuzzyjoin

  • @Danielfalbel already tested this. He asks for a distance. And my distance varies. Temp <- difference_left_join(Excerpt, Trechosim, by = c(Xt="Xs") , max_dist = .1 ). And some values return NA.

  • How is the question of multiple matches for the objective of the problem? E.g., in the sample data, the value 5 of the database Trecho could be mapped to any of the six initial values of TrechoSim. It is acceptable that two entries of Trecho are associated with the same entry of TrechoSim?

  • @Eriksonk. Disregard these initial values, as it is a data collection of a simulated car, at this time it is stopped. I did in Excel and it worked with PROCV.

  • I think the simplest solution will be using the library fuzzyjoin. You can update the question including the code Voce used with difference_inner_join? It would also be good to update the question title to make it clear that you want to use Fuzzy match

Show 2 more comments

1 answer

4


I don’t have the original datasets or Excel installed to test the function PROCV, but I think the code below solved the problem.

The function procura calculate the difference, in absolute value, between a number and a vector and finds which position of the vector is closest to this number.

The code is not optimized, but I imagine it should run fairly fast on current computers. I tested the same code by increasing the sample sizes of simulated data to 5000 and 20000 and my code took less than 2 seconds to make all comparisons.

Trecho    <- data.frame(Xt=rnorm(5),  Yt=rnorm(5),  Zt=0)
TrechoSim <- data.frame(Xs=rnorm(20), Ys=rnorm(20), Zt=0)

procura <- function(x, y){
  return(which.min(abs(x-y)))
}

index <- 0

for (j in 1:length(Trecho$Xt)){
  index[j] <- procura(Trecho$Xt[j], TrechoSim$Xs)
}

Trecho
TrechoSim[index, ]
  • Marcus Nunes has how to insert the columns of TrechoSim[index,] their respective lines of Trecho? Creating a data frame Xt Yt Zt Xs Ys Zs

  • Try cbind(Trecho, TrechoSim[index, ]).

  • I did it and it worked Temp <- {cbind(Trecho, data.frame(TrechoSim[index, ], &#xA; row.names = seq_along(Trecho$Xt)), deparse.level = 0)}

Browser other questions tagged

You are not signed in. Login or sign up in order to post.