Filtering in dplyr by the maximum value of a variable in the gapminder dataset

I am filtering the gapminder data frame, and I get an empty data frame when I filter on the variable gdpPercap:

library(gapminder) # version 0.2.0

library(dplyr)  # version 0.7.2

gapminder %>%
  filter(year == 2007, gdpPercap==max(gdpPercap)) 

# A tibble: 0 x 6

# ... with 6 variables: country <fctr>, continent <fctr>, year <int>, lifeExp <dbl>, pop <int>, gdpPercap <dbl>

If I filter on a different variable, I get the expected result:

gapminder %>%
  filter(year == 2007, pop==max(pop)) 

# A tibble: 1 x 6

# country continent  year lifeExp   pop      gdpPercap

#1   China      Asia  2007  72.961 1318683096  4959.115

Is this a bug in dplyr? I am using RStudio (version 1.0.143) and MRO (3.3.3).

1 answer

The result is correct. The command

gapminder %>%
  filter(year == 2007, gdpPercap==max(gdpPercap)) 

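returns all rows of the gapminder data frame that are from the year 2007 and whose gdpPercap equals the maximum value of gdpPercap. Both conditions are evaluated on the whole data frame, so max(gdpPercap) is the maximum across all years, not just 2007. It turns out that no row meets both conditions. The maximum of gdpPercap in each year shows this: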

gapminder %>%
  group_by(year) %>% 
  summarise(max(gdpPercap))
# A tibble: 12 x 2
    year `max(gdpPercap)`
   <int>            <dbl>
 1  1952        108382.35
 2  1957        113523.13
 3  1962         95458.11
 4  1967         80894.88
 5  1972        109347.87
 6  1977         59265.48
 7  1982         33693.18
 8  1987         31540.97
 9  1992         34932.92
10  1997         41283.16
11  2002         44683.98
12  2007         49357.19

The overall maximum of gdpPercap occurred in 1957, so no row from 2007 can be equal to it. Note that the same command with the year 1957 returns a non-empty result:

gapminder %>%
  filter(year == 1957, gdpPercap==max(gdpPercap))
# A tibble: 1 x 6
  country continent  year lifeExp    pop gdpPercap
   <fctr>    <fctr> <int>   <dbl>  <int>     <dbl>
1  Kuwait      Asia  1957  58.033 212846  113523.1
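
The same behaviour can be reproduced on a small example. This is only a minimal sketch: the data frame toy and its columns are hypothetical and are not taken from gapminder.

```r
library(dplyr)

# Hypothetical toy data, not part of gapminder
toy <- data.frame(
  year  = c(1957, 1957, 2007, 2007),
  value = c(100, 90, 40, 50)
)

# Both conditions are evaluated on all four rows, so max(value) is 100.
# No 2007 row has value 100, and the result is empty.
both_at_once <- toy %>%
  filter(year == 2007, value == max(value))

# Same command with the year that actually holds the overall maximum
in_1957 <- toy %>%
  filter(year == 1957, value == max(value))

nrow(both_at_once)
nrow(in_1957)
```

Here both_at_once has no rows, while in_1957 has exactly one, which mirrors the 2007 and 1957 results above.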

If your goal is to find the country with the highest gdpPercap in 2007, group the data by year first, so that max(gdpPercap) is computed within each year:

gapminder %>%
  group_by(year) %>%
  filter(year==2007, gdpPercap==max(gdpPercap))
# A tibble: 1 x 6
# Groups:   year [1]
  country continent  year lifeExp     pop gdpPercap
   <fctr>    <fctr> <int>   <dbl>   <int>     <dbl>
1  Norway    Europe  2007  80.196 4627926  49357.19
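
As a possible alternative to grouping, the two conditions can be split into two consecutive filter() calls, so that the maximum is computed only on the rows that survive the first filter. The sketch below uses a hypothetical toy data frame (toy, year and value are illustrative names), not gapminder itself.

```r
library(dplyr)

# Hypothetical toy data, not part of gapminder
toy <- data.frame(
  year  = c(1957, 1957, 2007, 2007),
  value = c(100, 90, 40, 50)
)

# Step 1 keeps only the 2007 rows.
# Step 2 computes max(value) on those rows only (50 here).
top_2007 <- toy %>%
  filter(year == 2007) %>%
  filter(value == max(value))

top_2007
```

Applied to the question, gapminder %>% filter(year == 2007) %>% filter(gdpPercap == max(gdpPercap)) should return the same Norway row as the grouped version, without leaving a grouping on the result.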
