Select data frame rows based on specific conditions

3

I am working with election data for mayoral races in Brazil and would like to keep only the rows for municipalities that meet two conditions:

  • Municipalities where exactly two candidates ran, and
  • Municipalities where those two candidates were a man and a woman

My data frame looks like this:

Candidato <- c('Alberto', 'Alessandra', 'Cassio', 'Roberta', 'Denis', 'Flavia', 'Jefferson', 'Henrique', 'Paulo')
Municipio <- c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'D', 'D')
Genero <- c('M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M')
dados <- data.frame(Candidato, Municipio, Genero)

Given the conditions above, the result I want is:

Candidato     Municipio     Genero 
Alberto       A               M
Alessandra    A               F
Flavia        C               F
Jefferson     C               M

2 answers

1

Phil, the code below solves your problem:

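'%ni%' <- Negate('%in%')  # "not in" operator
cidades <- unique(dados$Municipio)
for (cidade in cidades) {
  # Genders of the candidates running in this municipality
  generos <- dados$Genero[dados$Municipio == cidade]
  # Drop the municipality unless it has exactly one man and one woman
  if ('M' %ni% generos || 'F' %ni% generos || length(generos) > 2) {
    dados <- dados[dados$Municipio != cidade, ]
  }
}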

Running this code on the question's data frame gave me:

   Candidato Municipio Genero
1    Alberto         A      M
2 Alessandra         A      F
3     Flavia         C      F
4  Jefferson         C      M
  • Hey, buddy. The code didn't work :( It just returned the same data frame. I appreciate your contribution anyway.

  • I updated my answer with the data frame that the code returned. The question's data frame has 4 different municipalities. My code excludes the candidates from municipality D, where both candidates are men. Can you run the code again?

  • I had forgotten the restriction to only 2 candidates; I have edited the answer.

1

A tidyverse solution:

library(tidyverse)

dados %>% 
  group_by(Municipio) %>% 
  # var_1: number of candidates; var_2: number of distinct genders
  mutate(var_1 = n()) %>% 
  mutate(var_2 = n_distinct(Genero)) %>% 
  filter(var_1 == 2 & var_2 == 2) %>% 
  select(-c(var_1, var_2))

Output:

# A tibble: 4 x 3
# Groups:   Municipio [2]
  Candidato  Municipio Genero
  <fct>      <fct>     <fct> 
1 Alberto    A         M     
2 Alessandra A         F     
3 Flavia     C         F     
4 Jefferson  C         M  

Where:

  • group_by(Municipio) groups the rows by Municipio (there are 4 municipalities);

  • mutate(var_1 = n()) stores the number of rows (candidates) in each municipality;

  • mutate(var_2 = n_distinct(Genero)) stores the number of distinct values of Genero in each municipality (2 when both F and M are present);

  • filter(var_1 == 2 & var_2 == 2) keeps only the municipalities that meet both conditions: exactly two candidates ran (var_1 == 2), and those two candidates were a man and a woman (var_2 == 2);

  • select(...) drops the helper columns var_1 and var_2 from the result.
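To see what the two helper counts look like before filtering, here is a minimal base R sketch. It is not part of the original answer: the names n_candidatos, n_generos, resumo, validos and resultado are made up for illustration, and the data frame is rebuilt with stringsAsFactors = FALSE and without the stray leading space in 'Cassio'.

```r
# Sample data from the question (assumption: strings kept as character)
dados <- data.frame(
  Candidato = c("Alberto", "Alessandra", "Cassio", "Roberta", "Denis",
                "Flavia", "Jefferson", "Henrique", "Paulo"),
  Municipio = c("A", "A", "B", "B", "B", "C", "C", "D", "D"),
  Genero    = c("M", "F", "M", "F", "M", "F", "M", "M", "M"),
  stringsAsFactors = FALSE
)

# Per-municipality counts, playing the role of var_1 and var_2
n_candidatos <- tapply(dados$Candidato, dados$Municipio, length)
n_generos    <- tapply(dados$Genero, dados$Municipio,
                       function(g) length(unique(g)))

# One row per municipality, so both counts can be inspected side by side
resumo <- data.frame(
  Municipio    = names(n_candidatos),
  n_candidatos = as.vector(n_candidatos),
  n_generos    = as.vector(n_generos),
  stringsAsFactors = FALSE
)
print(resumo)

# Municipalities with exactly two candidates and two distinct genders
validos <- resumo$Municipio[resumo$n_candidatos == 2 & resumo$n_generos == 2]

# Keep only the candidates from those municipalities
resultado <- dados[dados$Municipio %in% validos, ]
print(resultado)
```

In this sketch, municipality B fails the first condition (three candidates) and municipality D fails the second (only one gender), so only A and C remain.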
