Given the following data frame:
df <- tibble::tribble(
~pass_id, ~km_ini, ~km_fin,
1L, 0.89, 2.39,
2L, 1.53, 3.03,
3L, 21.9, 23.4,
4L, 23.4, 24.9,
5L, 24, 25.5,
6L, 25.9, 27.4,
7L, 36.7, 38.2,
8L, 41.4, 42.9,
9L, 42.1, 43.6,
10L, 45.5, 47
)
Created on 2020-02-17 by the reprex package (v0.3.0)
I need a sample of 50 numbers that meet the following criteria for the data frame as a whole, not just for each row of it:
>= .750
<= 99.450
< km_ini - .750
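> km_fin + .750
The last two are alternatives that must hold for every row: each number has to fall outside [km_ini - .750, km_fin + .750] for all rows of df at the same time.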
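A minimal way to express these criteria against the whole data frame at once, sketched in R: each candidate is compared with every row using outer(). The helper name is_valid_km() and its default arguments (buffer, lower, upper) are hypothetical, chosen for illustration; the data are the question's df rebuilt as a base data.frame.

```r
# The question's data, as a base data.frame
df <- data.frame(
  pass_id = 1:10,
  km_ini  = c(0.89, 1.53, 21.9, 23.4, 24, 25.9, 36.7, 41.4, 42.1, 45.5),
  km_fin  = c(2.39, 3.03, 23.4, 24.9, 25.5, 27.4, 38.2, 42.9, 43.6, 47)
)

# Hypothetical helper: TRUE for each x that meets all four criteria
# against EVERY row of df, not just one row
is_valid_km <- function(x, df, buffer = 0.75, lower = 0.75, upper = 99.45) {
  # length(x) by nrow(df) logical matrix: TRUE where x lies inside the
  # buffered stretch [km_ini - buffer, km_fin + buffer] of that row
  inside <- outer(x, df$km_ini - buffer, ">=") &
    outer(x, df$km_fin + buffer, "<=")
  # valid = within [lower, upper] and inside no buffered stretch
  x >= lower & x <= upper & rowSums(inside) == 0
}

is_valid_km(c(0.5, 1, 3.5, 3.9, 30, 44.5, 99.5), df)
#> [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE
```

Here 1 falls in row 1's buffered stretch and 3.5 in row 2's, while 44.5 sits in the narrow gap between rows 9 and 10.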
What I’ve achieved so far is the easy part, the first two criteria, which I could handle directly in the draw with runif (no merit there). The problem is the other two: I then tried enframe followed by filter, without success.
P.S.: I don’t necessarily need the result as a data frame; a vector is fine.
library(tidyverse, verbose = F)
set.seed(42)
sort(runif(100000, 0, 99.450)) %>%
enframe(., "ID", "km") %>%
filter(km >= .750 & km <= 99.450 - .750)
#> # A tibble: 98,467 x 2
#> ID km
#> <int> <dbl>
#> 1 763 0.750
#> 2 764 0.751
#> 3 765 0.751
#> 4 766 0.753
#> 5 767 0.753
#> 6 768 0.754
#> 7 769 0.754
#> 8 770 0.755
#> 9 771 0.755
#> 10 772 0.757
#> # … with 98,457 more rows
Created on 2020-02-17 by the reprex package (v0.3.0)
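One way to make the runif() + filter() idea work is rejection sampling: draw candidates over the whole range and keep only those that pass the check against all rows of df at once, repeating until 50 are collected. This is a minimal base-R sketch; is_valid_km() and sample_km_rejection() are hypothetical names, and the bounds 0.75 and 99.45 are taken from the criteria list.

```r
set.seed(42)

# The question's data, as a base data.frame
df <- data.frame(
  pass_id = 1:10,
  km_ini  = c(0.89, 1.53, 21.9, 23.4, 24, 25.9, 36.7, 41.4, 42.1, 45.5),
  km_fin  = c(2.39, 3.03, 23.4, 24.9, 25.5, 27.4, 38.2, 42.9, 43.6, 47)
)

# Hypothetical helper: TRUE where x is outside every buffered stretch
# [km_ini - buffer, km_fin + buffer] and within [lower, upper]
is_valid_km <- function(x, df, buffer = 0.75, lower = 0.75, upper = 99.45) {
  inside <- outer(x, df$km_ini - buffer, ">=") &
    outer(x, df$km_fin + buffer, "<=")
  x >= lower & x <= upper & rowSums(inside) == 0
}

# Hypothetical sampler: draw candidates over [lower, upper], keep only the
# valid ones, and repeat until at least n have been collected
sample_km_rejection <- function(n, df, lower = 0.75, upper = 99.45) {
  kept <- numeric(0)
  while (length(kept) < n) {
    cand <- runif(2 * n, lower, upper)
    kept <- c(kept, cand[is_valid_km(cand, df, lower = lower, upper = upper)])
  }
  # first n accepted draws, sorted for readability
  sort(kept[seq_len(n)])
}

km <- sample_km_rejection(50, df)
```

Inside the question's pipeline, the same vectorised check could replace the filter() condition, e.g. filter(is_valid_km(km, df)), after which 50 of the remaining rows would be sampled (taking the first 50 of an already sorted vector would return only the smallest values). For this df about 80% of [0.75, 99.45] is admissible, so few draws are wasted; rejection would become slow only if most of the range were excluded.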
EDIT: An attempt to represent the problem visually
The final result needs to be a set of numbers that takes the entire data frame into account, not just each row separately. As an example, the following representation covers the first two rows:
In it:
- The black line indicates that I cannot have values smaller than .750.
- The blue line indicates where I can’t have records because of the stretch covered by km_ini and km_fin (arrows) of row 1, plus a buffer of 750 m on either side (between the arrows and the dots).
- The red line indicates where I can’t have records because of the stretch covered by km_ini and km_fin (arrows) of row 2, plus a buffer of 750 m on either side (between the arrows and the dots).
So, looking at these two rows alone, within the first 4000 meters the random set could only contain numbers from 3030 + 750 = 3780 m onward.
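The arithmetic for the first two rows can be checked directly in R; this small illustration uses only rows 1 and 2 of df.

```r
# Rows 1 and 2 of the question's df
km_ini <- c(0.89, 1.53)
km_fin <- c(2.39, 3.03)

# Buffered (excluded) stretches: [0.14, 3.14] and [0.78, 3.78]
lo <- km_ini - 0.75
hi <- km_fin + 0.75

# They overlap (0.78 <= 3.14), so together they exclude everything from
# 0.14 to 3.78 km; the lower limit 0.75 falls inside that, so within the
# first 4 km only values from max(hi) onward remain admissible
first_allowed <- max(hi)
c(first_allowed, 4)
#> [1] 3.78 4.00
```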
The question, then, is how to do this programmatically, so that all rows of the data frame are evaluated and none of the generated numbers falls inside any of the excluded ranges.
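A sketch of one programmatic approach, assuming the goal is values drawn uniformly over the admissible kilometres: buffer every row by 0.750, merge the overlapping stretches (as happens with rows 3 to 6 and rows 8 to 9), take the gaps inside [0.75, 99.45], and draw from those gaps in proportion to their lengths. No candidate is ever rejected. The function names allowed_stretches() and sample_km() are hypothetical.

```r
set.seed(42)

# The question's data, as a base data.frame
df <- data.frame(
  pass_id = 1:10,
  km_ini  = c(0.89, 1.53, 21.9, 23.4, 24, 25.9, 36.7, 41.4, 42.1, 45.5),
  km_fin  = c(2.39, 3.03, 23.4, 24.9, 25.5, 27.4, 38.2, 42.9, 43.6, 47)
)

# Hypothetical helper: admissible stretches = [lower, upper] minus the
# union of all buffered stretches [km_ini - buffer, km_fin + buffer]
allowed_stretches <- function(df, buffer = 0.75, lower = 0.75, upper = 99.45) {
  lo <- df$km_ini - buffer
  hi <- df$km_fin + buffer
  o <- order(lo)
  lo <- lo[o]
  hi <- hi[o]
  # Merge overlapping buffered stretches into disjoint blocks
  m_lo <- lo[1]
  m_hi <- hi[1]
  for (i in seq_along(lo)[-1]) {
    k <- length(m_hi)
    if (lo[i] <= m_hi[k]) {
      m_hi[k] <- max(m_hi[k], hi[i])   # overlaps the current block: extend it
    } else {
      m_lo <- c(m_lo, lo[i])           # disjoint: start a new block
      m_hi <- c(m_hi, hi[i])
    }
  }
  # Gaps between consecutive blocks, clipped to [lower, upper]
  from <- pmax(c(lower, m_hi), lower)
  to <- pmin(c(m_lo, upper), upper)
  keep <- to > from
  data.frame(from = from[keep], to = to[keep])
}

# Hypothetical sampler: pick a stretch with probability proportional to its
# length, then draw uniformly inside it (uniform over the admissible set)
sample_km <- function(n, stretches) {
  len <- stretches$to - stretches$from
  idx <- sample.int(nrow(stretches), n, replace = TRUE, prob = len)
  sort(runif(n, stretches$from[idx], stretches$to[idx]))
}

st <- allowed_stretches(df)
km <- sample_km(50, st)
```

For this df the admissible stretches are 3.78 to 21.15, 28.15 to 35.95, 38.95 to 40.65, 44.35 to 44.75 and 47.75 to 99.45 km (78.97 km in total), so the overlaps the question mentions are absorbed by the merge step.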
I really appreciate the help; maybe I didn’t ask the question in the best way (how could I improve it?). The result in the km column doesn’t work for me because it applies the criteria of some rows of the original df, but not of the whole df. Take, for example, row 8 of the result of your example. It shows a value of 2.54. However, 2.54 is not at least .750 below the next km_ini, which is 1.53, so this value doesn’t work for me. Even the value in row 7 (.413) doesn’t work, because it is not at least .750 below .89, nor is it greater than .750 (item 1 of the requirements list). – rdornas
@rdornas I understood that for each row the random value should be between km_ini - 0.750 and km_fin + 0.750, hence the limits used in runif. – Rui Barradas
@rdornas Indeed, it is not greater than 0.750 (condition 1). So in row 8 it should be between max(km_ini - 0.750, 0.750) and min(km_fin + 0.750, 3.14), shouldn’t it? Those range limits are 0.750 and 3.14. – Rui Barradas
I think your approach is on the right track. The point is that each number I need has to satisfy the conditions for the df as a whole. Think of it literally as stretches of a highway: none of the random numbers can fall within any interval between km_ini and km_fin (nor within 750 m before km_ini or 750 m after km_fin) anywhere in the data frame. Under these conditions some stretches overlap, which makes the problem harder to solve. I really appreciate the help and the discussion! – rdornas
@rdornas I think this is it. See now. – Rui Barradas
It’s amazing how something so simple can be so complex. Unfortunately we’re not there yet. For example, in row 1, .829 does not satisfy the condition < km_ini - .750. In fact, in the short excerpt of the result you sent, there is no result that is km_ini < km_ini - 750 or km_fin > km_fin + 750. In the end, note that I cannot have any number less than 2.39 (the first km_fin): in row 1, for example, .89 - .750 = .14, but condition 1 says the number has to be >= .750, and I can’t have numbers between km_ini and km_fin either. Sorry for all the trouble! – rdornas
@rdornas Row 1: km always has to be > .75 and km_ini - .75 = .14, so liminf = .75 and km_ini = .89. At the upper end, km must be < 99.45 and km_fin + .75 = 3.14, so limsup = 3.14 and km_fin = 2.39. So the random number km is either in [.75, .89] or in [2.39, 3.14]. That’s how I’m understanding the problem. Is that right? (Apparently not.) When you say that "there is no result that is km_ini < km_ini - 750 or km_fin > km_fin + 750", of course there isn’t: cancelling km_ini on both sides gives 0 < -.75, and the same goes for km_fin. – Rui Barradas
Rui, I edited the question again and added an illustration to try to show the problem more visually. Maybe it’s a little clearer now. If you have any questions, don’t hesitate to ask. – rdornas