The key here is to understand the output of the gather() function. Let's look at it in detail below.
df <- read.table(
text = "c1 c2 c3 x
2 4 5 0
3 5 2 0
6 7 8 0
1 2 5 1
2 5 6 1
3 3 3 1", header = TRUE)
library(tidyverse)
df %>% gather("id", "value", 1:3)
#> x id value
#> 1 0 c1 2
#> 2 0 c1 3
#> 3 0 c1 6
#> 4 1 c1 1
#> 5 1 c1 2
#> 6 1 c1 3
#> 7 0 c2 4
#> 8 0 c2 5
#> 9 0 c2 7
#> 10 1 c2 2
#> 11 1 c2 5
#> 12 1 c2 3
#> 13 0 c3 5
#> 14 0 c3 2
#> 15 0 c3 8
#> 16 1 c3 5
#> 17 1 c3 6
#> 18 1 c3 3
Note that the result has 3 columns. The x column keeps the values of x from the original data, value holds the values of c1, c2 and c3, and id identifies which original column each value came from in the long format. This gives the following plot:
df %>% gather("id", "value", 1:3) %>%
ggplot(., aes(x = x, y = value, colour = id)) +
geom_point()
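As a side note on the reshaping step used for this plot: gather() is marked as superseded in current tidyr releases, and pivot_longer() is the suggested replacement. The block below is a minimal sketch of the same reshaping, assuming a tidyr version that provides pivot_longer(); the object name long is only illustrative.

```r
library(tidyr)

df <- read.table(
text = "c1 c2 c3 x
2 4 5 0
3 5 2 0
6 7 8 0
1 2 5 1
2 5 6 1
3 3 3 1", header = TRUE)

# Stack c1, c2 and c3 into a single value column, with id naming the source column
long <- pivot_longer(df, cols = c1:c3, names_to = "id", values_to = "value")

# Same three columns and the same 18 rows as the gather() call;
# only the row order differs, since rows are emitted record by record
names(long)
nrow(long)
```

The plotting code does not depend on row order, so either reshaping should feed ggplot() equally well.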
In this plot there is not much to learn about how c1, c2 and c3 relate to x, because x takes only two values. One way to address this is to fit a curve for each value of id. Here I opted for a simple linear regression, and the visualization improved a little.
df %>% gather("id", "value", 1:3) %>%
ggplot(., aes(x = x, y = value, colour = id)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE)
#> `geom_smooth()` using formula 'y ~ x'
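To put numbers on the lines drawn above, the slopes can be extracted with lm() for each id. This is only a sketch: the names long, slopes and mean_diff are illustrative, and the per-group fit via split() and sapply() is one of several ways to do it. Because x takes only the values 0 and 1, each slope should equal the difference between the mean of value at x = 1 and at x = 0.

```r
library(tidyr)

df <- read.table(
text = "c1 c2 c3 x
2 4 5 0
3 5 2 0
6 7 8 0
1 2 5 1
2 5 6 1
3 3 3 1", header = TRUE)

long <- gather(df, "id", "value", 1:3)

# Fit value ~ x separately for each id and keep only the slope
slopes <- sapply(split(long, long$id), function(d) {
  coef(lm(value ~ x, data = d))[["x"]]
})

# Difference in group means (x = 1 minus x = 0), for comparison
mean_diff <- sapply(split(long, long$id), function(d) {
  mean(d$value[d$x == 1]) - mean(d$value[d$x == 0])
})

round(slopes, 3)
round(mean_diff, 3)
```

For this data all three slopes are negative (roughly -1.67 for c1, -2 for c2 and -0.33 for c3), which matches the downward lines in the plot. With only three points per group, these values are a description of the sample rather than strong evidence of a relationship.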
Another option is to split the id values into separate panels, which at least separates the groups of points and makes trends easier to see.
df %>% gather("id", "value", 1:3) %>%
ggplot(., aes(x = x, y = value)) +
geom_point() +
facet_wrap(~ id)
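The two ideas can also be combined, with one panel per id and a fitted line in each. The block below is a sketch under the same assumptions as the code above; passing formula = y ~ x explicitly is optional and only avoids the informational message from geom_smooth(). The names long and p are illustrative.

```r
library(tidyr)
library(ggplot2)

df <- read.table(
text = "c1 c2 c3 x
2 4 5 0
3 5 2 0
6 7 8 0
1 2 5 1
2 5 6 1
3 3 3 1", header = TRUE)

long <- gather(df, "id", "value", 1:3)

# One panel per id, each with its own points and linear fit
p <- ggplot(long, aes(x = x, y = value)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ id)

p
```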
Created on 2021-07-28 by the reprex package (v2.0.0)
I'll try it here. Thanks!
– Cézar Azevedo