How to measure the number of words between two specific words in a string in R


Hello, everyone!

I'm working on a function in R that measures the number of words between two specific words. I'm calling the function worDistance. It works as follows: given a string t, you pass two arguments, palavra1 and palavra2, and it returns the number of words between word 1 and word 2. For example, given:

t <- "bom dia posso ajudar nao viu zunkz sabe tava pagar"

worDistance("bom","ajudar") # it returns 2

Note that the function reads the string t from left to right. When I invert the word order to

worDistance("ajudar","bom")

it returns 0 instead of returning 2 again. How can I resolve this?

I’ll put the function structure below:

worDistance <- function(palavra1, palavra2, direcao = 1) {

  ### Legend
  # The function returns -1 when one of the input words does not exist in the string t
  # The function returns -2 when neither of the input words exists in the string t

  if (direcao == 1) {

    # 1 = left to right

    total_palavras <- sapply(strsplit(t, " "), length)

    # strip everything up to palavra1 and everything from palavra2 onward
    a <- gsub(paste0('^.*', palavra1, '\\s*|\\s*', palavra2, '.*$'), '', t)

    b <- sapply(strsplit(a, " "), length)

    if (b == total_palavras) {
      return(-2)
    } else if (b == total_palavras - 1) {
      return(-1)
    } else {
      return(b)
    }

  }

}

1 answer



One possibility is to use the %in% operator to find the positions of palavra1 and palavra2 and then calculate the distance between the two:

t <- "bom dia posso ajudar nao viu zunkz sabe nao tava pagar"
frase <- unlist(strsplit(t, " "))
palavras <- c('dia', 'zunkz')

# positions of the words in the sentence
pos <- which(frase %in% palavras)
pos
# [1] 2 7

# compute the distance
diff(pos) - 1
# [1] 4

Note that even if the words are given in a different order, which() still returns the positions in sentence order, so the distance is calculated the same way:

palavras <- c('zunkz', 'dia')
which(frase %in% palavras) # same positions as before
# [1] 2 7
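
The idea above could be wrapped into a function along the following lines. This is only a minimal sketch: the name word_distance and the choice to pass the string as an argument are assumptions for illustration, it uses match() (which %in% is built on) so that only the first occurrence of each word counts, and the -1 / -2 return codes follow the legend in the question.

```r
# Hypothetical helper: word_distance() is an illustrative name, not from the original post.
word_distance <- function(t, palavra1, palavra2) {
  # split the sentence into words on single spaces, as in the example above
  frase <- unlist(strsplit(t, " "))

  # match() returns the first position of each word, or NA if it is absent
  pos <- match(c(palavra1, palavra2), frase)

  faltando <- sum(is.na(pos))
  if (faltando == 2) return(-2)  # neither word is in the sentence
  if (faltando == 1) return(-1)  # exactly one word is missing

  # abs() makes the result independent of the argument order
  abs(diff(pos)) - 1
}

t <- "bom dia posso ajudar nao viu zunkz sabe tava pagar"
word_distance(t, "bom", "ajudar")
# [1] 2
word_distance(t, "ajudar", "bom")
# [1] 2
```

One caveat of this sketch: if both arguments are the same word, the result is -1, which collides with the "one word missing" code, so that case would need separate handling.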

You will need to adjust the function to handle possible repeated words, but that is a subject for another question.
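
To illustrate why repeated words need care, here is a short sketch using the same sentence, where "nao" appears twice. The "closest pair of occurrences" rule in the second half is just one possible convention, assumed here for illustration. It does not handle the case where a word is missing.

```r
t <- "bom dia posso ajudar nao viu zunkz sabe nao tava pagar"
frase <- unlist(strsplit(t, " "))

# "nao" occurs twice, so which() returns three positions and diff() two gaps
pos <- which(frase %in% c("nao", "zunkz"))
pos
# [1] 5 7 9
diff(pos) - 1
# [1] 1 1

# One possible convention (an assumption): use the closest pair of occurrences
pos1 <- which(frase == "nao")
pos2 <- which(frase == "zunkz")
menor <- min(abs(outer(pos1, pos2, "-"))) - 1
menor
# [1] 1
```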
