How to make monthly, quarterly, semiannual, annual and decadal boxplots for a 30-year period?


I need to create several boxplots over different time intervals. I have a data frame with precipitation data for the city of Alagoas and need to compare it with another data frame with precipitation data for the city of Viçosa. I thought of making this comparison using boxplots for the following cases:

  1. Monthly boxplots for each year (1990-2019).

As done by colleague "Luizz", but adding the values of the second city for each year, so that for each year the boxplots of the city of Alagoas appear above and those of the city of Viçosa below.

  2. Quarterly boxplots for each year (1990-2019).

The plot would be in the same style as the one colleague "Luizz" made, but adding the values of the second city for each year, so that for each year the boxplots of the city of Alagoas appear above and those of the city of Viçosa below.

  3. Six-month boxplots for each year (1990-2019).

The plot would be in the same style as the one colleague "Luizz" made, but adding the values of the second city for each year, so that for each year the boxplots of the city of Alagoas appear above and those of the city of Viçosa below.

  4. Annual boxplots (1990-2019).

The plot would be in the same style as the one colleague "Luizz" made, but adding the values of the second city for each year, so that for each year the boxplots of the city of Alagoas appear above and those of the city of Viçosa below.

  5. Boxplots every 10 years (1990-2019).

The plot would be in the same style as the one colleague "Luizz" made, but adding the values of the second city for each decade, so that for each decade the boxplots of the city of Alagoas appear above and those of the city of Viçosa below.


The only way I know to make a boxplot is to select the entire precipitation column (10958 precipitation values for the 10957 days of the 30 years, 1990-2019), and even then separately for each city:

cidade1 <- read.csv("Alagoas.csv", header = T, skip = 0)
boxplot(cidade1$prec)
cidade2 <- read.csv("Vicosa.csv", header = T, skip = 0)
boxplot(cidade2$prec)

But I would like to learn how to select only certain dates from the data frame, build the boxplots for each city, and display the plots of the two cities together for visual comparison. Would that be possible?

Note: I could not understand how to use the dput(data) suggested by colleagues Rui Barradas and Carlos Eduardo Lagosta, even after researching it. I apologize for that, but I have just started programming in R and it has been a challenge for me. To describe at least minimally what I understood dput(data) would show: the data frame for the city of Alagoas is in .csv format and the prec values are in millimetres, arranged as follows:

Data        prec
01/01/1990  8,9
02/01/1990  32,7
03/01/1990  0,3
04/01/1990  3,9
05/01/1990  8,8
06/01/1990  29,2
07/01/1990  2,5
08/01/1990  0,6
09/01/1990  27,7
10/01/1990  11,2
11/01/1990  9,8
12/01/1990  0
13/01/1990  6,6
14/01/1990  0
15/01/1990  0
16/01/1990  0
17/01/1990  0
18/01/1990  0
19/01/1990  5,5
20/01/1990  0
21/01/1990  0
22/01/1990  0
23/01/1990  0
24/01/1990  6,2
25/01/1990  0
26/01/1990  0
27/01/1990  19,4
28/01/1990  15,1
29/01/1990  0,9
30/01/1990  16,4
31/01/1990  0
01/02/1990  0
02/02/1990  0
03/02/1990  0
04/02/1990  0
05/02/1990  0
06/02/1990  0
07/02/1990  0,1
08/02/1990  10,1
09/02/1990  0,1
10/02/1990  0,2
11/02/1990  1,3
12/02/1990  2,4
13/02/1990  2,6
14/02/1990  0
15/02/1990  0
16/02/1990  4,8
17/02/1990  0
18/02/1990  0,6
19/02/1990  7,1
20/02/1990  4,1
21/02/1990  0
22/02/1990  5
23/02/1990  0
24/02/1990  2,3
25/02/1990  6,3
26/02/1990  9,1
27/02/1990  25,7
28/02/1990  8,3
...
31/12/2019  0,5

For the city of Viçosa the layout is the same:

Data        prec 
01/01/1990  0
02/01/1990  14,2
03/01/1990  0
04/01/1990  1
05/01/1990  4,8
06/01/1990  0
07/01/1990  0,4
08/01/1990  0
09/01/1990  0
10/01/1990  0
11/01/1990  0
12/01/1990  0
13/01/1990  2,6
14/01/1990  0
15/01/1990  0
16/01/1990  0
17/01/1990  0
18/01/1990  0
19/01/1990  0
20/01/1990  0
21/01/1990  2,4
22/01/1990  1,8
23/01/1990  0
24/01/1990  0
25/01/1990  0
26/01/1990  0
27/01/1990  4
28/01/1990  0
29/01/1990  3,8
30/01/1990  11,8
31/01/1990  0
01/02/1990  0
02/02/1990  0
03/02/1990  0
04/02/1990  0
05/02/1990  0
06/02/1990  0
07/02/1990  0
08/02/1990  0
09/02/1990  90
10/02/1990  21,4
11/02/1990  0
12/02/1990  0
13/02/1990  0
14/02/1990  0
15/02/1990  0
16/02/1990  0
17/02/1990  0
18/02/1990  0
19/02/1990  0
20/02/1990  0
21/02/1990  0
22/02/1990  0
23/02/1990  0
24/02/1990  0
25/02/1990  0,2
26/02/1990  24
27/02/1990  9,6
28/02/1990  5,4
...
31/12/2019  1,2
  • 4

    Can you please edit the question to include the output of dput(dados) or, if the data set is too large, dput(head(dados, 20))? Note: dados is the name of the data set.

  • 3

    What exactly do you want? To select a specific month to plot? To compare the same month across several years? A chart with all months and years? In addition to a sample of your data using dput, include an example of the end result you want.

  • 1

    Bruna, it seems to me that your question has become too complex and asks for too many things. I suggest you focus on just one question and try to be more objective; see this link: https://answall.com/help/how-to-ask. Using dput() is very simple: just put the name of your dataframe inside the parentheses and run it. The result will look strange; paste that result here, formatted as code. There are examples of how to use dput() at this link: https://pt.meta.stackoverflow.com/questions/824/como-crea-um-exemplo-m%C3%Adnimo-reproduces%C3%Advel-em-r

  • 1

    Your dataframe has more than 10,000 rows. Maybe one option is to select only two years: dput(head(cidade1, 730)). It works more or less like this: you ask one specific thing per question. If you want many things, ask the simplest first. Once you have the answer, try to reproduce it, understand it, and do the more complex parts on your own. If you fail, ask for help again here in another question about the more complex part.

  • Luizz, thanks for the guidance. I saw the tutorials you gave me and I will reformulate my question soon!
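
As a concrete illustration of the dput() advice in the comments, here is a small sketch with a made-up three-row data frame. The values are copied from the sample in the question; the names `amostra` and `copia` are hypothetical and used only for this example.

```r
# Tiny stand-in for the real data frame (values taken from the question's sample)
amostra <- data.frame(Data = c("01/01/1990", "02/01/1990", "03/01/1990"),
                      prec = c(8.9, 32.7, 0.3))

# Prints R code that rebuilds the first rows of the object;
# this printed text is what gets pasted into the question
dput(head(amostra, 2))

# Anyone can recreate the data by evaluating the printed text
texto <- paste(capture.output(dput(amostra)), collapse = "")
copia <- eval(parse(text = texto))
all.equal(copia, amostra)
```

The point of dput() is that the pasted text carries the column types along with the values, which a printed table does not.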

1 answer

5


Since you did not provide the complete data, we can create something similar from the features you described. This dataframe will be called df:

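library(tidyverse)  # attaches ggplot2 and dplyr, among others

# One row per day from 1990 to 2019, with dates as dd/mm/yyyy strings
temp_date <- seq(from=as.Date("1990-01-01"), to=as.Date("2019-12-31"), by="day")
data <- format(temp_date, format = "%d/%m/%Y")
set.seed(1)
# Simulated values standing in for the real precipitation data
prec <- rnorm(length(temp_date), mean=1.5, sd=0.3)
df <- data.frame(data, prec)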

If your dates are not in Date format, it is useful to convert them; if they already are, you do not need to run the line below:

df$data <- lubridate::dmy(df$data)
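
The sample in the question uses decimal commas ("8,9") and day/month/year dates, so the real files will probably need `dec = ","` when they are read. A minimal sketch, assuming a semicolon-separated file: the actual separator is not shown in the question, so `sep` may need adjusting, and the inline text only stands in for Alagoas.csv.

```r
# Inline text standing in for the real file; the ";" separator is an assumption
txt <- "Data;prec
01/01/1990;8,9
02/01/1990;32,7
03/01/1990;0,3"

# dec = "," makes R read "8,9" as the number 8.9
cidade1 <- read.csv(text = txt, sep = ";", dec = ",", header = TRUE)

# Base-R alternative to lubridate::dmy() for dd/mm/yyyy strings
cidade1$Data <- as.Date(cidade1$Data, format = "%d/%m/%Y")

str(cidade1)
```

For a real file, replace `text = txt` with the file name. If prec comes out as character instead of numeric, the decimal separator was not recognized.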

Then we will create a column with only the months and another with only the years, to use them in the boxplots:

df$mes <- format(as.Date(df$data), "%m")
df$ano <- format(as.Date(df$data), "%Y")

Finally, we build the boxplots:

ggplot(df, aes(x=mes, y=prec)) + 
    geom_boxplot() +
    facet_wrap(~ano)


Well, this is one possible visualization. The result is very compressed, and other visualizations are possible. You can, for example, assign a different color to each month of the year:

ggplot(df, aes(x=mes, y=prec, fill=mes)) + 
    geom_boxplot() +
    facet_wrap(~ano)


Edited:

Well, you edited your question, requesting new boxplots for various periods. Above, I had already shown how to create separate variables for months and years. Now I am going to show you how to recode these variables and make a plot combining two of these periods. Just follow this template and adjust it for any other periods you may want.

Recoding variables to create periods:

df <- df %>% 
  mutate(
    trimestre = case_when(
      as.numeric(mes) <4 ~ "1",
      as.numeric(mes) <7 ~ "2",
      as.numeric(mes) <10 ~ "3",
      TRUE ~ "4"),
    semestre = case_when(
      as.numeric(mes) <7 ~ "1",
      TRUE ~ "2"),
    decada = case_when(
      as.numeric(ano) < 2000 ~ "1990 - 1999",
      as.numeric(ano) < 2010 ~ "2000 - 2009",
      TRUE ~ "2010 - 2019")) %>% 
   mutate(
      tri_ano = paste(trimestre, substr(ano, 4, 4), sep="/"),
      sem_ano = paste(semestre, substr(ano, 3, 4), sep="/")) 

Note in the code above that, to build the variable tri_ano, I took only the last digit of the year. This was necessary so that the quarters line up across the decades in the plot below. For the variable sem_ano, I took the last two digits, so that you can see the difference in the code. You can adjust these parameters as needed.
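
For reference, a quick check of what substr() returns for a four-digit year string, which is where the two labels differ. The three example years are illustrative only.

```r
ano <- c("1994", "2004", "2014")

# Last digit only: the same label in every decade
ultimo <- substr(ano, 4, 4)        # "4" "4" "4"

# Last two digits: a different label for each year
dois_ultimos <- substr(ano, 3, 4)  # "94" "04" "14"

# Labels as built in the mutate() call, here for quarter "2"
paste("2", ultimo, sep = "/")        # "2/4" "2/4" "2/4"
paste("2", dois_ultimos, sep = "/")  # "2/94" "2/04" "2/14"
```

Because the one-digit labels repeat in every decade, the panels created by facet_wrap(~decada) share the same x positions.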

Boxplot by quarter and decade:

ggplot(df, aes(x=tri_ano, y=prec, fill=trimestre)) + 
    geom_boxplot() +
    theme(
      legend.position = "none",
      plot.title = element_text(size=20, hjust = 0.5),
      axis.text.x = element_text(angle=-90)) +
    labs(title="Precipitação por trimestre e década em Alagoas",
        x ="trimestre/ano",
        y = "precipitação") +
    facet_wrap(~decada, nrow=3)


You can see above that I indicated in the title that this plot is for the city of Alagoas. That is, you will use the Alagoas data with this title; note where Alagoas is written in the code above. To make the same plot for Viçosa, just use the Viçosa data and change the title accordingly. Then position the plots wherever you want in your work.
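
If you would rather have both cities in a single figure, with Alagoas above and Viçosa below as described in the question, one option is to stack the two data frames with a city column and facet on it. This is a sketch of an alternative, not part of the plots above: the simulated values, the `cidade` column and the helper `simula()` are assumptions made up for illustration.

```r
library(ggplot2)

# Hypothetical helper: simulated daily data standing in for one city's file
simula <- function(cidade, seed) {
  set.seed(seed)
  data <- seq(as.Date("1990-01-01"), as.Date("2019-12-31"), by = "day")
  data.frame(data = data,
             prec = rexp(length(data), rate = 0.2),  # made-up values
             cidade = cidade)
}

# Stack both cities; the factor levels put Alagoas in the top panel
ambas <- rbind(simula("Alagoas", 1), simula("Viçosa", 2))
ambas$cidade <- factor(ambas$cidade, levels = c("Alagoas", "Viçosa"))
ambas$ano <- format(ambas$data, "%Y")

# Annual boxplots, one panel per city, stacked vertically
p <- ggplot(ambas, aes(x = ano, y = prec)) +
  geom_boxplot() +
  facet_wrap(~cidade, ncol = 1) +
  theme(axis.text.x = element_text(angle = -90)) +
  labs(x = "ano", y = "precipitação")
print(p)
```

With the real data, each city's data frame would get its own `cidade` value before rbind(). Putting another period column on the x axis, created the same way as mes, trimestre, semestre or decada, gives the other comparisons.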
