Compare string with array to return the most compatible item

Asked

Viewed 985 times

10

How to compare to string with the array and return whatever has more compatible words?

Example:

string = "uma frase qualquer aqui"
array = ["frase qualquer", "uma qualquer aqui", "nada compatível"]

Returns the second item of the array for having 3 compatible words


I tried to use includes() but it returns the first item by having at least 1 word compatible

//string a ser comparada com o array
let str = "uma frase qualquer aqui";
let strArr = str.split(" ");
//ao comparar com o array deve retornar o segundo item, pois tem mais palavras compativeis
let arr = ["frase qualquer", "uma qualquer aqui", "nada compatível"];

function maisCompativel(){
  for(let i = 0; i < strArr.length; i++){
    if(arr.includes(strArr[i])){
      //retornar o item mais compativel / mais parecido
    }
  }
}

4 answers

13


The problem is that you need to split each element of arr in a word array. For example:

["frase qualquer", "uma qualquer aqui", "nada compatível"]

It has to be turned into:

[
  ["frase", "qualquer"],
  ["uma", "qualquer", "aqui"],
  ["nada", "compatível"]
]

So you can compare every word of this array of arrays individually.


As a first attempt, you can arrive at a solution more or less like this:

function mostCompatible(string, array) {
  const words = string.split(' ');

  // O `array` atualmente contém diversas strings.
  // Fazendo o `map` abaixo, fazemos que cada string desse array
  // seja dividida pelo espaço. Desse modo, teremos um array de arrays.
  const subArrays = array.map((subString) => subString.split(' '));

  // Na variável a seguir, vamos armazenar o índice referente
  // ao elemento que tiver mais "palavras compatíveis", e o
  // número de palavras compatíveis encontradas:
  let current = {
    index: null,
    compatibleWordsCount: 0
  };

  // Agora vamos verificar qual elemento de `subArrays` tem
  // mais "palavras compatíveis". Estou utilizando o `entries`
  // como forma de obter o índice e o valor no laço `for..of`.
  for (const [index, subArray] of subArrays.entries()) {
    // Vamos obter a quantidade de "elementos compatíveis".
    // Note que a seguinte computação possui complexidade `O(n^2)`.
    const compatibleWordsCount = words.filter((word) => subArray.includes(word))
      .length;

    // Caso a quantidade de elementos compatíveis seja maior que o da iteração
    // anterior, vamos sobrescrever o objeto `current`.
    if (compatibleWordsCount > current.compatibleWordsCount) {
      current = { index, compatibleWordsCount };
    }
  }

  // Retornar o elemento de `array` correspondente ao índice do `subArray` com
  // mais "elementos compatíveis" encontrados.
  return array[current.index];
}

console.log(
  mostCompatible('uma frase qualquer aqui', [
    'frase qualquer',
    'uma qualquer aqui',
    'nada compatível'
  ])
); // Deve logar `'uma qualquer aqui'`.

The problem is that it is not very performative, having complexity of approximately O(n3), since we are using a includes within a filter, that is inside a for, and each of these steps has complexity O(n). But frankly, you only need to take this into account if you are going to take a significant number of data to that function.

A slightly more efficient implementation could make use of a map to perform the searches. Thus, you reduce the complexity of each search from O(n) to O(1). Something like this:

function createCountMap(arr) {
  const map = Object.create(null);
  for (const item of arr) {
    map[item] = (map[item] || 0) + 1;
  }
  return map;
}

function mostCompatible(string, array) {
  // Criamos um mapa com cada palavra a ser procurda.
  // Assim, a busca terá complexidade O(1) ao invés de O(n).
  const wordsMap = createCountMap(string.split(' '));

  let result = {
    index: null,
    compatibleWordsCount: 0
  };

  for (let index = 0; index < array.length; index++) {
    const currentString = array[index];
    let compatibleWordsCount = 0;

    for (const word of currentString.split(' ')) {
      if (wordsMap[word]) {
        compatibleWordsCount++;
      }
    }

    if (compatibleWordsCount > result.compatibleWordsCount) {
      result = { index, compatibleWordsCount };
    }
  }

  return array[result.index];
}

console.log(
  mostCompatible('uma frase qualquer aqui', [
    'frase qualquer',
    'uma qualquer aqui',
    'nada compatível'
  ])
); // Deve logar `'uma qualquer aqui'`.

To learn more about this notation - O(n) - I used, read here.

  • But if the array you are looking for is fixed, you can save the result of map and it will only be used once, or even no

6

I choose here to show another approach to the same problem, and taking advantage of almost all the code you already had.

The difference is that it uses an auxiliary counting vector for each sentence, which counts how many words it has in common. This vector is being filled in as you go through the input phrase per word, as you already have in your code.

Example:

//string a ser comparada com o array
let str = "uma frase qualquer aqui";
let strArr = str.split(" ");
//ao comparar com o array deve retornar o segundo item, pois tem mais palavras compativeis
let arr = ["frase qualquer", "uma qualquer aqui", "nada compatível"];

function maisCompativel(){
  //Vetor de contagens com tamanho de arr e preenchido com zeros
  const contagens = new Array(arr.length).fill(0); 
  
  for(let i = 0; i < strArr.length; i++){
  
    //loop para percorrer as frases 
    for (let j = 0; j < arr.length; j++){ 
      if (arr[j].includes(strArr[i])){
        contagens[j]++; //contabilizar a palavra já que existe
      }
    }
  }
  
  let maiorContagem = Math.max(...contagens);
  let posicaoMaior = contagens.indexOf(maiorContagem);
  return arr[posicaoMaior];
}

console.log(maisCompativel());

If you compare my code to yours, you see that in the course of words there is only one more for. There’s also logic within the if for the accounting, which was missing.

  • I’m not sure, but I think if can be simplified to contagens[j] += arr[j].includes(strArr[i]) which will add 0 or 1 respectively to true or false

  • @Costamilam Indeed can, but I prefer so that it is more readable. Anyway thank you for the comment.

6

One way is by using two .forEach(): the first in the array with the phrases and the second nested in the string array. Then it tells what has more occurrences and compares with a variable that stores the highest occurrence number, and another variable to store in which index of the array had more occurrences:

let str = "uma frase qualquer aqui";
let strArr = str.split(" ");
let arr = ["frase qualquer", "uma qualquer aqui", "nada compatível"];

function maisCompativel(str){
   let indice,
       maior = 0;
   arr.forEach((a,i)=>{
      let conta = 0;
      strArr.forEach(b=>{
         if(a.includes(b)) conta++;
      });

      if(conta > maior){
         indice = i;
         maior = conta;
      }
  });
  return `O que tem mais ocorrência é o índice [${indice}] com ${maior} palavras: "${arr[indice]}"`;
}

console.log(maisCompativel());

-2

You can use a similar PHP text implementation in JS Example

Ai loops and returns the item with the highest score.

//string a ser comparada com o array
let str = "uma frase qualquer aqui";
//ao comparar com o array deve retornar o segundo item, pois tem mais palavras compativeis
let arr = ["frase qualquer", "uma qualquer aqui", "nada compatível"];

var max = 0;
var id = 0;

function maisCompativel() {
  for (let i = 0; i < arr.length; i++) {
    value = similar_text(str, arr[i]);
    if (value > max) {
      max = value
      id = i;
    }

  }
  return arr[id];
}

function similar_text(first, second, percent) { // eslint-disable-line camelcase
  //  discuss at: https://locutus.io/php/similar_text/
  // original by: Rafał Kukawski (https://blog.kukawski.pl)
  // bugfixed by: Chris McMacken
  // bugfixed by: Jarkko Rantavuori original by findings in stackoverflow (https://stackoverflow.com/questions/14136349/how-does-similar-text-work)
  // improved by: Markus Padourek (taken from https://www.kevinhq.com/2012/06/php-similartext-function-in-javascript_16.html)
  //   example 1: similar_text('Hello World!', 'Hello locutus!')
  //   returns 1: 8
  //   example 2: similar_text('Hello World!', null)
  //   returns 2: 0

  if (first === null ||
    second === null ||
    typeof first === 'undefined' ||
    typeof second === 'undefined') {
    return 0
  }

  first += ''
  second += ''

  var pos1 = 0
  var pos2 = 0
  var max = 0
  var firstLength = first.length
  var secondLength = second.length
  var p
  var q
  var l
  var sum

  for (p = 0; p < firstLength; p++) {
    for (q = 0; q < secondLength; q++) {
      for (l = 0;
        (p + l < firstLength) && (q + l < secondLength) && (first.charAt(p + l) === second.charAt(q + l)); l++) { // eslint-disable-line max-len
        // @todo: ^-- break up this crazy for loop and put the logic in its body
      }
      if (l > max) {
        max = l
        pos1 = p
        pos2 = q
      }
    }
  }

  sum = max

  if (sum) {
    if (pos1 && pos2) {
      sum += similar_text(first.substr(0, pos1), second.substr(0, pos2))
    }

    if ((pos1 + max < firstLength) && (pos2 + max < secondLength)) {
      sum += similar_text(
        first.substr(pos1 + max, firstLength - pos1 - max),
        second.substr(pos2 + max,
          secondLength - pos2 - max))
    }
  }

  if (!percent) {
    return sum
  }

  return (sum * 200) / (firstLength + secondLength)
}

console.log(maisCompativel());

  • 1

    While this link may answer the question, it is best to include the essential parts of the answer here and provide the link for reference. Replies per link only can be invalidated if the page with the link is changed. - Of Revision

  • @Daviaragao Added code to answer. However I do not understand why if I answer the algorithm is a valid answer, but if I put an example link it is invalid.

  • 3

    I suggest you read the FAQ: We want answers that contain only links?

  • @hkotsubo That’s why my answer was what he should do. "Use an implementation of similar_text" + "Loop with the score check". My answer was never intended to post the code or to do all the work. If you always get things in hand you will never learn. The similar_text function is well diffused and with a google by "similar_text source code" returns its implementation. I passed a link to help, as a complement to the reply. The mentioned FAQ says that this is great and should be used like this. However, apparently, what is spoken in the FAQ is valid only if it is not a code

  • 1

    There is a difference between giving the code a kiss and giving the code explaining how and why it works. The first form is not well accepted because in fact the person does not learn, the second form is preferable because there the answer is complete, without relying on external links (by the way, this is the main point of not putting only a link: that all the information is in the Stack Overflow itself). If you’re just going to say "use the X algorithm of this link" and nothing else, do it in the comments. If you’re going to answer, do it as completely as possible.

  • @hkotsubo And that would not be different from the [Sopt Survival Guide] (https://pt.meta.stackoverflow.com/questions/8045/guia-survivor%C3%aancia-do-stack-overflow-em-inglés%C3%aa) ? só não podemos dar uma solução frívola para resolver seu problema específico. , Não fazemos o serviço para você and especially the part Se não vai ler e nem seguir os links, boa sorte! Só não reclame de as pessoas não ajudarem.. If the information has to be here and not on a link, it means that the person does not need to follow the links. This section of the manual would not be meaningless?

  • The idea of "we don’t do the job for you" is when the person just asks "I want a program that does X" and nothing else. It is not the case of this question, because it was placed the code that the person tried to do, explained the result that obtained and which should be the correct one. The person showed effort, was part of the service and wants help at a specific point, and the answers gave the correct code. It’s different from when the person just asks for the code ready but doesn’t even show what tried, in this case we don’t do the service for her.

  • As for "If you are not going to read or follow the links", this refers to the links of the Guide itself, which has complementary information about the operation of the site

  • @hkotsubo Thanks for the information. I had read the Faqs but had understood otherwise. :)

Show 4 more comments

Browser other questions tagged

You are not signed in. Login or sign up in order to post.