How to relate a column to a dictionary in R?

library(tidyverse)

I have a dataset with article titles in one column and their respective authors in another. Here is one row of it:

df<-tibble(
  titulo= "A URBANIZAÇÃO NEOLIBERAL",
  autores= "CLAUDIO; DIANA; MILENA")

df

# A tibble: 1 x 2
  titulo                   autores               
  <chr>                    <chr>                 
1 A URBANIZAÇÃO NEOLIBERAL CLAUDIO; DIANA; MILENA

Then I split that row into one row per author using separate_rows():

df2<-df %>% 
  separate_rows(autores, sep = "; ")

df2

# A tibble: 3 x 2
  titulo                   autores
  <chr>                    <chr>  
1 A URBANIZAÇÃO NEOLIBERAL CLAUDIO
2 A URBANIZAÇÃO NEOLIBERAL DIANA  
3 A URBANIZAÇÃO NEOLIBERAL MILENA 

Now I want each author to be labelled with their status: "docente" (teacher), "discente" (student) or "egresso" (graduate). I have a spreadsheet with all of this detailed; here is an excerpt:

corpo_programa <- tibble(nome = c("FULANO", "BELTRANO", "CLAUDIO", "MILENA", "DIANA"),
                    situacao = c("docente", "docente", "docente", "discente", "egresso"))

corpo_programa

# A tibble: 5 x 2
  nome     situacao
  <chr>    <chr>   
1 FULANO   docente 
2 BELTRANO docente 
3 CLAUDIO  docente 
4 MILENA   discente
5 DIANA    egresso 

What I want is to use this spreadsheet as a kind of dictionary: look up each value of the autores column in it and generate a new column with the status of each author.

I figured I could do this with setNames():

dicionario<- setNames(corpo_programa$nome, corpo_programa$situacao)

Then I used mutate():

df2 %>% 
  mutate(condicao = dicionario[autores])

# A tibble: 3 x 3
  titulo                   autores condicao
  <chr>                    <chr>   <chr>   
1 A URBANIZAÇÃO NEOLIBERAL CLAUDIO NA      
2 A URBANIZAÇÃO NEOLIBERAL DIANA   NA      
3 A URBANIZAÇÃO NEOLIBERAL MILENA  NA     

The newly created condicao column is filled with NA, when I expected "docente", "egresso" and "discente".

2 answers

The problem is that you swapped the values and the names when you created dicionario. setNames(object, nm) takes the values first and the names second, so it should be:

dicionario <- setNames(corpo_programa$situacao, corpo_programa$nome)
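To see why the order matters, here is a minimal sketch with the data from the question, using base R only. Indexing a named vector with a character vector looks elements up by name, so the names have to be the author names. The unname() call and the object name errado are additions for illustration, not part of the original code.

```r
# Data from the question, as a plain data frame
corpo_programa <- data.frame(
  nome     = c("FULANO", "BELTRANO", "CLAUDIO", "MILENA", "DIANA"),
  situacao = c("docente", "docente", "docente", "discente", "egresso"),
  stringsAsFactors = FALSE
)
autores <- c("CLAUDIO", "DIANA", "MILENA")

# Values first, names second: the names are what gets looked up
dicionario <- setNames(corpo_programa$situacao, corpo_programa$nome)
condicao <- unname(dicionario[autores])
condicao
# [1] "docente"  "egresso"  "discente"

# Reversed version from the question: the names are "docente", "discente", ...
# so no author name is ever found and every lookup returns NA
errado <- setNames(corpo_programa$nome, corpo_programa$situacao)
unname(errado[autores])
# [1] NA NA NA
```

Inside mutate(), the same lookup would be condicao = unname(dicionario[autores]).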


After the work you have already done, the easiest way is probably merge(). Keep in mind that the columns to be matched have different names, hence by.x and by.y.

merge(df2, corpo_programa, by.x = "autores", by.y = "nome")
#  autores                   titulo situacao
#1 CLAUDIO A URBANIZAÇÃO NEOLIBERAL  docente
#2   DIANA A URBANIZAÇÃO NEOLIBERAL  egresso
#3  MILENA A URBANIZAÇÃO NEOLIBERAL discente

Another solution is to do everything, both splitting the authors column and the join, in a single pipe:

df %>%
  separate_rows(autores, sep = "; ") %>%
  inner_join(corpo_programa, by = c("autores" = "nome"))
# A tibble: 3 x 3
#  titulo                   autores situacao
#  <chr>                    <chr>   <chr>   
#1 A URBANIZAÇÃO NEOLIBERAL CLAUDIO docente 
#2 A URBANIZAÇÃO NEOLIBERAL DIANA   egresso 
#3 A URBANIZAÇÃO NEOLIBERAL MILENA  discente
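One difference worth noting: inner_join() (like merge() with its default arguments) drops authors that do not appear in corpo_programa, whereas the named-vector lookup returns NA for them. If the unmatched rows should be kept, left_join() is a possible alternative. A sketch, assuming the tidyverse is installed; the author "SICRANO" is a made-up name, absent from the spreadsheet, used only to show the behaviour:

```r
library(tidyverse)

df <- tibble(
  titulo  = "A URBANIZAÇÃO NEOLIBERAL",
  autores = "CLAUDIO; SICRANO; MILENA"  # "SICRANO" is hypothetical, not in corpo_programa
)
corpo_programa <- tibble(
  nome     = c("FULANO", "BELTRANO", "CLAUDIO", "MILENA", "DIANA"),
  situacao = c("docente", "docente", "docente", "discente", "egresso")
)

# left_join() keeps every row of the left table; unmatched authors get NA
resultado <- df %>%
  separate_rows(autores, sep = "; ") %>%
  left_join(corpo_programa, by = c("autores" = "nome"))

resultado$situacao
# the second element (SICRANO) is NA; the others are "docente" and "discente"
```

With inner_join() in place of left_join(), the SICRANO row would be dropped and the result would have two rows instead of three.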
