Stacked Bar Graph - Labels and Sort - GGPLOT

I am building a graph indicating the population of the Brazilian states, organized by regions, according to the code below:

State <- c("Rondônia", "Acre", "Amazonas", "Roraima", "Pará", "Amapá", "Tocantins",
           "Maranhão", "Piauí", "Ceará", "Rio Grande do Norte", "Paraíba", "Pernambuco", "Alagoas", "Sergipe", "Bahia",
           "Minas Gerais", "Espírito Santo", "Rio de Janeiro", "São Paulo",
           "Paraná", "Santa Catarina", "Rio Grande do Sul",
           "Mato Grosso do Sul", "Mato Grosso", "Goiás", "Distrito Federal"   )

Population <- c(1805788, 829619, 4063614, 522636, 8366628, 797722, 1550194,
                7000229, 3219257, 9020460, 3507003, 4025558, 9473266, 3375823, 2288116, 15344447,
                21119536, 4016356, 16718956, 45094866,
                11320892, 7001161, 11322895,
                2713147, 3344544, 6778772, 3039444)

Region <- c(rep("Região Norte", 7),
            rep("Região Nordeste", 9),
            rep("Região Sudeste", 4),
            rep("Região Sul", 3),
            rep("Região Centro-Oeste", 4))

dfPop <- data.frame(State, Population, Region)

library(ggplot2)

ggplot(data=dfPop, 
       aes(x=Region, weight=Population / 1E+6)) +   # weight: value summed by geom_bar()
  geom_bar(aes(fill=State), color="Black") +
  geom_text(aes(x=Region, y=Population / 1E+6, group=State, label=State),
            position = position_stack(vjust = 0.5), size=3.3) +
  guides(fill="none") +                             # hide the fill legend
  xlab("Região do Brasil") + ylab("Milhões de habitantes")

The resulting graph is as follows:

[Figure: População dos Estados Brasileiros]

I have two problems that I have not been able to solve:

1. Hide label of states with less than 3 million inhabitants

To make the graph clearer, I want to hide the labels of states with fewer than 3 million inhabitants. I found a tip that filters the data.frame directly inside geom_text to drop these states, as follows:

ggplot(data=dfPop, 
       aes(x=Region, weight=Population / 1E+6)) +
  geom_bar(aes(fill=State), color="Black") +
  # only states above 3 million are passed to the label layer
  geom_text(data=dfPop[dfPop$Population > 3E+6,],
            aes(x=Region, y=Population / 1E+6, group=State, label=State),
            position = position_stack(vjust = 0.5), size=3.3) +
  guides(fill="none") +
  xlab("Região do Brasil") + ylab("Milhões de habitantes")

[Figure: População dos Estados Brasileiros, with labels only for the filtered states]

However, as you can see, all other labels have been misplaced. How could I hide the desired labels without displacing the others?

2. Order stacking based on population of states (most populous below)

As an alternative to solving problem 1, I tried to sort each stack by state population, so that I could hide the names of the states at the top of the stack. However, even after sorting the input data.frame, I could not get this visual ordering in ggplot. Can anyone help me?

Thanks for the support!

1 answer

When working with ggplot2, I prefer to do all data transformations outside the plotting command itself. This is a personal opinion: I think the code ends up more organized and easier to understand. That said, here is my solution to your problem.

First, I will turn the column dfPop$State into an ordered factor, ordered by the population of each state. This keeps the bars stacked the way you want:

dfPop$State <- factor(dfPop$State, levels=dfPop$State[order(dfPop$Population)], 
  ordered=TRUE)

Notice the result obtained:

dfPop$State
 [1] Rondônia            Acre                Amazonas           
 [4] Roraima             Pará                Amapá              
 [7] Tocantins           Maranhão            Piauí              
[10] Ceará               Rio Grande do Norte Paraíba            
[13] Pernambuco          Alagoas             Sergipe            
[16] Bahia               Minas Gerais        Espírito Santo     
[19] Rio de Janeiro      São Paulo           Paraná             
[22] Santa Catarina      Rio Grande do Sul   Mato Grosso do Sul 
[25] Mato Grosso         Goiás               Distrito Federal   
27 Levels: Roraima < Amapá < Acre < Tocantins < Rondônia < ... < São Paulo
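
For comparison, the same level order can probably also be obtained with reorder() from base R, which sorts factor levels by a summary of another variable (the mean by default). The sketch below uses only the seven Região Norte rows from the question to stay short; the names north, by_order and by_reorder are introduced here for illustration.

```r
# Subset of the question's data (Região Norte only), to keep the sketch short.
north <- data.frame(
  State = c("Rondônia", "Acre", "Amazonas", "Roraima", "Pará", "Amapá", "Tocantins"),
  Population = c(1805788, 829619, 4063614, 522636, 8366628, 797722, 1550194),
  stringsAsFactors = FALSE
)

# Explicit levels sorted by population, as in the factor() call above.
by_order <- factor(north$State,
                   levels = north$State[order(north$Population)],
                   ordered = TRUE)

# Alternative: reorder() sorts the levels by Population. There is one row
# per state, so the default mean is simply that state's population.
by_reorder <- reorder(north$State, north$Population)

levels(by_order)
levels(by_reorder)
```

Both calls should print the same level order, from the least to the most populous state. Note that reorder() returns an unordered factor here, which is enough for ggplot2 to use the level order.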

Now the levels of the column State are no longer sorted alphabetically, but by the size of each state’s population. Next, I will create a new column, StateNamePlot, which is character rather than factor. It serves only to put the labels on the graph. Note that I set this column to NA for every state whose population is less than three million:

dfPop$StateNamePlot <- as.character(dfPop$State)
dfPop$StateNamePlot[which(dfPop$Population < 3e6)] <- NA
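
Why blanking the label works while filtering the rows does not: as far as I understand, position_stack() computes each label's height from the rows present in that layer, so rows removed beforehand no longer contribute to the running total, whereas rows with an NA label are still stacked and only dropped when the text is drawn. The sketch below reproduces that arithmetic by hand with base R for the Região Norte rows. The helper stack_mid() and the stacking order (rows stacked in the order given) are assumptions made for illustration, not what ggplot2 runs internally.

```r
# Região Norte rows from the question, populations in millions.
north <- data.frame(
  State = c("Rondônia", "Acre", "Amazonas", "Roraima", "Pará", "Amapá", "Tocantins"),
  Population = c(1805788, 829619, 4063614, 522636, 8366628, 797722, 1550194) / 1e6,
  stringsAsFactors = FALSE
)

# Hypothetical helper: midpoint of each segment when the values are
# stacked one after another in the order given.
stack_mid <- function(pop) cumsum(pop) - pop / 2

keep <- north$Population > 3   # states that should keep their label

# Filtering first: the small states drop out of the running total,
# so the remaining labels slide down.
mid_filtered <- stack_mid(north$Population[keep])

# Stacking every row, then discarding the unwanted labels:
# the remaining labels stay where their bar segments are.
mid_na <- stack_mid(north$Population)[keep]

data.frame(State = north$State[keep], mid_filtered, mid_na)
```

The two columns of midpoints differ whenever a hidden state sits below a labelled one in the stack, which is the displacement seen in the question.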

Now just create the chart according to this new dfPop, changing only some details of your original chart.

ggplot(data=dfPop, aes(x=Region, weight=Population/1E+6)) +
  geom_bar(aes(fill=State), color="Black") +
  # labels come from StateNamePlot; NA entries are not drawn
  geom_text(aes(x=Region, y=Population / 1E+6, group=State, 
    label=StateNamePlot), position = position_stack(vjust = 0.5),
    size=3.3) +
  guides(fill="none") +
  xlab("Região do Brasil") + 
  ylab("Milhões de habitantes")
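
With NA labels, geom_text() normally warns that it removed rows containing missing values. A minimal self-contained sketch of one way to silence that, assuming ggplot2 is installed: pass na.rm = TRUE to geom_text(). For brevity it uses only the Região Norte rows and swaps in geom_col() with a y aesthetic instead of geom_bar() with weight; the names north, Label and p are introduced here for illustration.

```r
library(ggplot2)

# Região Norte rows from the question.
north <- data.frame(
  State = c("Rondônia", "Acre", "Amazonas", "Roraima", "Pará", "Amapá", "Tocantins"),
  Population = c(1805788, 829619, 4063614, 522636, 8366628, 797722, 1550194),
  Region = "Região Norte",
  stringsAsFactors = FALSE
)

# Levels ordered by population, so the most populous state is at the bottom.
north$State <- factor(north$State,
                      levels = north$State[order(north$Population)],
                      ordered = TRUE)

# Label column: NA for states below three million inhabitants.
north$Label <- as.character(north$State)
north$Label[north$Population < 3e6] <- NA

p <- ggplot(north, aes(x = Region, y = Population / 1e6)) +
  geom_col(aes(fill = State), color = "black") +
  geom_text(aes(group = State, label = Label),
            position = position_stack(vjust = 0.5),
            size = 3.3,
            na.rm = TRUE) +   # drop NA labels without a warning
  guides(fill = "none") +
  xlab("Região do Brasil") +
  ylab("Milhões de habitantes")

p
```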
