Line plot with months on the x-axis in ggplot2 - R

I'm taking an R course, and one exercise asks for a line chart in ggplot2 of flight delays per airport. The x-axis should show the months (which are stored as numbers), but the plot labels the axis in steps of 2.5 (it even runs up to month 12.5!).

I tried converting the months to factors, but the axis still shows steps of 2.5. I could not find how to use "breaks" in the help pages.

Does anyone have any idea what the problem is and how to solve it?

library(tidyverse)
library(nycflights13)
voo <- flights

voo %>% group_by(month, origin) %>% 
  summarize(media_delay = mean(dep_delay, na.rm = T)) %>% 
  ggplot() +
  geom_line(aes(x = month, y = media_delay, group = origin, col = origin))

  • I tested it here: running voo$month <- as.factor(voo$month) before plotting worked.

  • William, thanks for the help! I had used factor(voo$month). It prints the values and levels in the console but does not convert the column to a factor (I didn't know that).
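
The difference described in these two comments comes down to assignment: factor() and as.factor() return a new vector and leave the original untouched unless the result is stored back. A minimal sketch, using a small made-up data frame rather than the flights data:

```r
# Hypothetical stand-in for the month column (not the real flights data)
df <- data.frame(month = c(1L, 2L, 3L, 12L))

# Calling factor() on its own only prints the result; df is unchanged
factor(df$month)
is.factor(df$month)   # still FALSE at this point

# Assigning the result back is what actually converts the column
df$month <- factor(df$month)
is.factor(df$month)   # TRUE after the assignment
levels(df$month)
```

The same applies to voo$month: without the `<-`, the conversion never reaches the data that ggplot2 sees.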

1 answer

In ggplot2, everything starts from your data. Your month column is numeric, so ggplot2 uses a continuous scale on the x-axis and picks its own breaks, which in this case come out in steps of 2.5.
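
To see where those steps come from, the default break calculation can be reproduced outside the plot. This sketch assumes that ggplot2's default continuous scale delegates to scales::extended_breaks(), which is my understanding of current versions and not something stated in this answer; the range 1 to 12 stands in for the month numbers.

```r
# Assumption: continuous scales choose default breaks via scales::extended_breaks()
month_range <- c(1, 12)

# Breaks the algorithm considers "nice" for this range; they need not be integers
default_breaks <- scales::extended_breaks()(month_range)
print(default_breaks)
```

The algorithm only knows it has numbers between 1 and 12; it has no idea they are months, so it has no reason to prefer whole numbers.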

There are several ways to fix this:

  1. Specify the breaks you want while keeping the scale continuous;
  2. Convert the data to a factor to force every value to appear;
  3. Convert the data to dates to use a date scale on the plot.

Option 2 is not shown here: although it can solve this specific problem, it can also introduce new ones (such as how to order the values), and it is not an adequate representation of the variable.

1. Breaks on the numeric scale

To do this, just pass the desired breaks to the breaks argument of scale_x_continuous().

voo %>% group_by(month, origin) %>% 
  summarize(media_delay = mean(dep_delay, na.rm = TRUE)) %>% 
  ggplot() +
  geom_line(aes(x = month, y = media_delay, group = origin, col = origin)) + 
  # One break per month instead of the default steps of 2.5
  scale_x_continuous(breaks = 1:12)

For more about scales in ggplot2, see this link.
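
If month names read better than numbers, the same scale can also take a labels argument. This is a sketch building on the code above; month.abb is base R's built-in vector of abbreviated English month names, and using it here is my addition, not part of the original answer.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Mean departure delay per month and origin airport
delays <- flights %>%
  group_by(month, origin) %>%
  summarize(media_delay = mean(dep_delay, na.rm = TRUE), .groups = "drop")

# One break per month, labelled "Jan", "Feb", ... instead of 1, 2, ...
p <- ggplot(delays) +
  geom_line(aes(x = month, y = media_delay, group = origin, col = origin)) +
  scale_x_continuous(breaks = 1:12, labels = month.abb)

p
```

The labels vector must have the same length as breaks, which is why 1:12 pairs naturally with the twelve entries of month.abb.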

2. Time scale

In this case we don't even need to define the scale: we change the variable and rely on ggplot2's defaults. Just convert the month column to a Date variable and keep the same plotting command from the question.

voo %>% 
  # Pad month and day with leading zeros
  mutate(month = formatC(month, width = 2, flag = "0"),
         day = formatC(day, width = 2, flag = "0"),
         # Build a Date from the year-month-day string, then round down to the first of the month
         month = lubridate::ymd(paste0(year, month, day)),
         month = lubridate::floor_date(month, "month")) %>% 
  group_by(month, origin) %>% 
  summarize(media_delay = mean(dep_delay, na.rm = TRUE)) %>% 
  ggplot() +
  geom_line(aes(x = month, y = media_delay, group = origin, col = origin))
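
The zero-padding step matters because pasting unpadded numbers can produce the same string for two different dates. A small base R illustration with made-up example dates (pad2 is a helper name used only in this sketch):

```r
# January 12 and November 2, 2013, pasted without padding, collide
paste0(2013L, 1L, 12L)   # "2013112"
paste0(2013L, 11L, 2L)   # "2013112"

# formatC() pads each part to two digits, so the strings become unambiguous
pad2 <- function(x) formatC(x, width = 2, flag = "0")
jan12 <- paste0(2013L, pad2(1L), pad2(12L))   # "20130112"
nov02 <- paste0(2013L, pad2(11L), pad2(2L))   # "20131102"

# The padded strings parse cleanly as year-month-day
as.Date(c(jan12, nov02), format = "%Y%m%d")
```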

  • Anyway, I always recommend converting the variable to a date in this kind of case.

  • Wonderful, Tomas! These solutions haven't been covered in the course (yet), but they are spot on! Could you just explain why you add zeros on the left and why you use

  • I wrote the answer in a bit of a hurry, but the short version is that ymd() expects dates like 20190301 for March 1, 2019, and since the days and months were integers, they would come out as 201931. paste0 glues the pieces of text together with no spaces between them.

  • Just a tip: you don't have to rely on ggplot2's defaults for the date scale. You can use scale_x_date() to customize it.

  • Sure! The goal was to simplify the answer.
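
Following up on the scale_x_date() tip, here is a sketch of a customized date axis. The date column is built with lubridate::make_date(), a shorter route than the paste0()/ymd() steps in the answer and my own substitution; the break spacing and label format are illustrative choices, not something the thread prescribes.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

delays_by_date <- flights %>%
  # First day of each flight's month as a Date (assumed shortcut, see above)
  mutate(month_start = lubridate::make_date(year, month, 1L)) %>%
  group_by(month_start, origin) %>%
  summarize(media_delay = mean(dep_delay, na.rm = TRUE), .groups = "drop")

p_date <- ggplot(delays_by_date) +
  geom_line(aes(x = month_start, y = media_delay, group = origin, col = origin)) +
  # One tick per month, labelled with the abbreviated month name
  scale_x_date(date_breaks = "1 month", date_labels = "%b")

p_date
```

The date_labels argument takes strftime-style codes, so "%b %Y" would show the year as well, which helps if the data spans more than one year.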
