Hmm, I’ve been in your shoes. I’d say you have two options:
- based on an analysis of the terms used, create a function that normalizes them, along the lines of SQL's CASE WHEN, or R's if_else or dplyr::case_when;
 
- based on string similarity, I would try to map the terms that have more than a certain number of characters and more than x% similarity to the same term. In R, the stringdist package does this.
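A minimal sketch of option 2, for illustration only. It uses base R's adist() (Levenshtein distance, from the utils package) instead of stringdist so that it runs without extra packages; stringdist::stringsim() would give a comparable normalized score. The helper name group_similar, the 0.75 cutoff and the 5-character minimum are hypothetical choices. Terms whose similarity passes the cutoff are chained together with single-linkage clustering, and each group is labeled with its most frequent spelling.

```r
# Hypothetical helper (not from any package): maps each reason to the most
# frequent spelling among the reasons that are similar enough to it.
group_similar <- function(reasons, min_sim = 0.75, min_chars = 5) {
  norm <- tolower(trimws(reasons))     # ignore case and stray spaces
  counts <- table(norm)                # how often each spelling occurs
  terms <- names(counts)
  rep_of <- setNames(terms, terms)     # default: every term labels itself
  # Very short terms are too ambiguous to merge, so they keep their own label
  long <- terms[nchar(terms) >= min_chars]
  if (length(long) >= 2) {
    d <- adist(long)                                   # Levenshtein distances
    max_len <- outer(nchar(long), nchar(long), pmax)   # longer length per pair
    sim <- 1 - d / max_len                             # similarity in [0, 1]
    # Single linkage: terms chained by similarity >= min_sim share a group
    hc <- hclust(as.dist(1 - sim), method = "single")
    groups <- cutree(hc, h = 1 - min_sim)
    for (g in unique(groups)) {
      members <- long[groups == g]
      # Label the whole group with its most frequent spelling
      rep_of[members] <- members[which.max(counts[members])]
    }
  }
  unname(rep_of[norm])
}

motivos <- c("danos morais", "Danos Morais", "danos morais ", "danos moraes",
             "dano material", "danos materiais", "danos materiais", "dano x")
group_similar(motivos)
```

With this data the first four entries should map to "danos morais", the next three to "danos materiais", and "dano x" stays alone. The cutoff is fragile, though: "danos morais" and "danos materiais" already score about 0.73, just under 0.75, which is one reason option 1 is the safer route.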
 
Option 1 seems more "correct" to me in terms of the criteria for classifying the reasons, so I’ll say a little more about how I would do it.
How I would do option 1
Since this is essentially a term-count analysis, I would use the tidytext package to count these words, get the most common bigrams and trigrams (2-grams and 3-grams), and try to merge the most similar ones (and the ones that are equivalent, according to your domain knowledge). I think that would solve a large part of the problem. The very granular cases I would put in an 'other' category, with no guilt at all.
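A minimal sketch of that counting step with tidytext, assuming the reasons sit in a column called motivo; the helper count_ngrams and the sample data are hypothetical. unnest_tokens() lowercases the text by default and splits it into n-grams; texts with fewer than n words may come back as NA, so those rows are filtered out.

```r
library(dplyr)
library(tidytext)

# Hypothetical helper: most frequent n-grams in the `motivo` column
count_ngrams <- function(df, n) {
  df %>%
    unnest_tokens(ngram, motivo, token = "ngrams", n = n) %>%  # one row per n-gram
    filter(!is.na(ngram)) %>%   # drop texts shorter than n words
    count(ngram, sort = TRUE)   # frequency table, most common first
}

motivos <- tibble(motivo = c("danos morais", "Danos morais", "dano moral",
                             "danos materiais", "pedido de danos morais"))
count_ngrams(motivos, 2)  # bigrams
count_ngrams(motivos, 3)  # trigrams
```

The top of each table is where the candidates for merging show up; here "danos morais" appears three times once case is ignored.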
What the code would look like
I’m going to leave a snippet of how I would write this code, using the stringr, dplyr, and purrr packages.
library(dplyr)
library(purrr)
library(stringr)
## test data
toy_df <- tibble(motivo = c('danos morais', 'dano moral', 'danos materiais', 'dano x'))
## helper to make creating vectors easier: 'a, b' -> c('a', 'b')
make_vector <- function(string) stringr::str_split(string, pattern = ', ') %>% purrr::as_vector()
## the transformation itself
toy_df %>%
  mutate(classe = case_when(
    motivo %in% make_vector('dano moral, danos morais, danos whatever') ~ 'dano_moral',
    motivo %in% make_vector('danos materiais, dano material') ~ 'dano_material',
    TRUE ~ 'outros'
  ))
I prefer dplyr::case_when because it is vectorized, so it copes well with large volumes of data. Its syntax, although strange at first, is more practical than if_else, where you’d end up nesting one if_else inside another for every case.
The make_vector function is only there to make creating string vectors easier, with a cleaner syntax. In a perfect world, the vectors used to put the reasons into the same classes would come out of an analysis of the reasons themselves, or something like that.
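To make the comparison concrete, here is the same classification written with nested dplyr::if_else() calls next to the case_when() version, with the vectors written out by hand (the names moral and material are just for illustration). Both produce the same column, but every extra class adds another level of nesting to the if_else version.

```r
library(dplyr)

toy_df <- tibble(motivo = c('danos morais', 'dano moral', 'danos materiais', 'dano x'))
moral    <- c('dano moral', 'danos morais')
material <- c('danos materiais', 'dano material')

# Nested if_else: one more level of nesting for every new class
with_if_else <- toy_df %>%
  mutate(classe = if_else(motivo %in% moral, 'dano_moral',
                          if_else(motivo %in% material, 'dano_material', 'outros')))

# case_when: one flat line per class, with TRUE as the fallback
with_case_when <- toy_df %>%
  mutate(classe = case_when(
    motivo %in% moral    ~ 'dano_moral',
    motivo %in% material ~ 'dano_material',
    TRUE                 ~ 'outros'
  ))
```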
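One way to get there, sketched under the assumption that the mapping is kept as data: swap the hand-written vectors for a lookup table (a join instead of case_when) that could be read from a CSV produced by the term analysis. The table name lookup and its contents are hypothetical. Reasons without a match fall back to 'outros'; each motivo must appear only once in the lookup, otherwise left_join() duplicates rows.

```r
library(dplyr)

# Hypothetical mapping; in practice it could be read from a reviewed CSV
lookup <- tibble(
  motivo = c('dano moral', 'danos morais', 'danos materiais', 'dano material'),
  classe = c('dano_moral', 'dano_moral', 'dano_material', 'dano_material')
)

toy_df <- tibble(motivo = c('danos morais', 'dano moral', 'danos materiais', 'dano x'))

classificado <- toy_df %>%
  left_join(lookup, by = 'motivo') %>%          # adds `classe` where the term is known
  mutate(classe = coalesce(classe, 'outros'))   # unknown terms go to 'outros'
```

Adding a new spelling then means adding a row to the table rather than editing code.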
We need a more complete example of the data. If your data set is called dados, please post the output of dput(head(dados, 30)), or something representative of the data. – Rui Barradas
Create another column by searching your base column with the grep family of functions, e.g. grepl("dano moral|danos morais", column, ignore.case = TRUE). – Daniel Ikenaga
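The suggestion in the last comment can be sketched in base R. grepl() is used rather than grep() because it returns TRUE/FALSE for every row, which is what a new column needs (grep() returns only the indices of the matches). The regular expression is an assumption that covers the singular and plural forms; the sample data is made up.

```r
motivos <- data.frame(motivo = c('Danos morais', 'dano moral', 'danos materiais',
                                 'pedido de danos morais', 'dano x'))

# TRUE when the text mentions "dano(s) moral/morais", ignoring case
motivos$dano_moral <- grepl('danos? mora(l|is)', motivos$motivo, ignore.case = TRUE)
```

Because grepl() matches anywhere in the string, longer texts such as 'pedido de danos morais' are flagged too.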