How to group by text [R]

-3

I have a column in my dataset that contains several variants of 5 options. I want to group the rows by the text they have in common. For example:

coluna1 
lapis vermelho grande
lapis azul grande
lapis verde pequeno
lapis vermelho pequeno

I want to create a new column, keeping the original, that groups the rows by the text they share:

    coluna1                  coluna2
lapis vermelho grande       caixa grande
lapis azul grande           caixa grande
lapis verde pequeno         caixa pequeno
lapis vermelho pequeno      caixa pequeno

I thought of writing a case-when or an if/else, but I couldn't work out the logic to capture a word inside the string. Does anyone have a suggestion?

2 answers

3

With a regex, this can be done in one line of code.

df$coluna2 <- sub(".*\\b([^[:space:]]+$)", "\\1", df$coluna1)

Explanation of the regex.

  1. The capture group ([^[:space:]]+$) negates (^) the space character class and repeats it at least once, so it matches one or more non-space characters. The group runs to the end ($) of the string.
  2. The capture group is preceded by a word boundary, \\b.
  3. Anything can come before the \\b; that part is matched by .* and discarded.

The whole string is replaced by the captured group alone, "\\1", which is its last word.
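
As a small worked sketch of the explanation above: the regex on its own returns only the last word ("grande" or "pequeno"), so to get the labels shown in the question one option is to prefix it with paste(). The sample data frame is rebuilt from the question, and the intermediate name `ultima` is only illustrative.

```r
# Sample data mirroring the question (the data frame name `df` is assumed).
df <- data.frame(
  coluna1 = c("lapis vermelho grande", "lapis azul grande",
              "lapis verde pequeno", "lapis vermelho pequeno"),
  stringsAsFactors = FALSE
)

# Keep only the last word of each string (`ultima` is a hypothetical name).
ultima <- sub(".*\\b([^[:space:]]+$)", "\\1", df$coluna1)

# Prefix the last word with "caixa" to build the label from the question.
df$coluna2 <- paste("caixa", ultima)
df
```

This assumes the grouping word is always the last word of the string; if it can appear elsewhere, a pattern search is a better fit than this positional regex.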

  • Is there any regex material you would recommend for this kind of solution? Thanks for the answer.

  • Can I adapt this code to pick up words in other positions? And could the code identify a specific word? For example: I give it 'red' and it returns every row that contains red, regardless of position?

  • @whatshallwedon0w No, this regular expression only finds the last word. To extract a word, for example 'red', you can do stringr::str_extract(df$coluna1, 'vermelho').

  • @whatshallwedon0w Regarding regex material, see the base R documentation (?regex) or the documentation of the stringr package.
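
To illustrate the follow-up in the comments, here is a minimal base R sketch that flags every row containing a given word regardless of its position. It uses grepl() instead of stringr::str_extract(): grepl() returns TRUE/FALSE per element, while str_extract() returns the matched text or NA. The vector is copied from the question and `tem_vermelho` is a hypothetical name.

```r
coluna1 <- c("lapis vermelho grande", "lapis azul grande",
             "lapis verde pequeno", "lapis vermelho pequeno")

# TRUE for each element that contains the whole word "vermelho",
# wherever it appears in the string.
tem_vermelho <- grepl("\\bvermelho\\b", coluna1)

# Keep only the matching elements.
coluna1[tem_vermelho]
```

The \\b on both sides keeps the match to the whole word, so a longer word that merely contains "vermelho" would not be picked up.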

0

A tidyverse solution:

library(tidyverse)

# Compare the column itself (not the whole data frame `.`) against each value
df %>% 
  mutate(coluna2 = case_when(
    coluna1 == 'lapis vermelho grande' ~ "caixa grande", 
    coluna1 == 'lapis azul grande' ~ "caixa grande", 
    coluna1 == 'lapis verde pequeno' ~ "caixa pequeno", 
    coluna1 == 'lapis vermelho pequeno' ~ "caixa pequeno"))

#                 coluna1       coluna2
#1  lapis vermelho grande  caixa grande
#2      lapis azul grande  caixa grande
#3    lapis verde pequeno caixa pequeno
#4 lapis vermelho pequeno caixa pequeno
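
The answer above spells out every string explicitly, which gets long when the column has many variants. A possible variation, sketched here under the assumption that the size word always ends the string, matches on a pattern inside case_when() instead. The data frame is rebuilt from the question and `resultado` is a hypothetical name.

```r
library(dplyr)

# Sample data mirroring the question (the data frame name `df` is assumed).
df <- data.frame(
  coluna1 = c("lapis vermelho grande", "lapis azul grande",
              "lapis verde pequeno", "lapis vermelho pequeno"),
  stringsAsFactors = FALSE
)

resultado <- df %>%
  mutate(coluna2 = case_when(
    grepl("grande$", coluna1)  ~ "caixa grande",   # string ends in "grande"
    grepl("pequeno$", coluna1) ~ "caixa pequeno",  # string ends in "pequeno"
    TRUE                       ~ NA_character_     # anything else stays NA
  ))
resultado
```

Rows that match neither pattern get NA, which makes unexpected variants easy to spot.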
