Looking for Keywords in Elasticsearch

Viewed 398 times

4

I’m indexing objects like these:

[{
  nome: "bom_atendimento",
  chaves: ["bem atendido", "atendimento bom"]
},
{
  nome: "ruim_atendimento",
  chaves: ["pessimo atendimento", "atendimento ruim"]
}]

I need these keys (chaves) to be detected in a text.

Examples of input text and the output I need:

1 - "This service was bad"

Output:

    {
      nome: "ruim_atendimento",
      chaves: ["pessimo atendimento", "atendimento ruim"]
    }

2 - "Today I was well attended"

Output:

    {
      nome: "bom_atendimento",
      chaves: ["bem atendido", "atendimento bom"]
    }

How I’m indexing:

{
      "settings": {
        "analysis": {
          "analyzer": {
            "custom_keyword_analyzer": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "asciifolding",
                "lowercase",
                "custom_stopwords",
                "custom_stemmer"
              ]
            },
            "custom_shingle_analyzer": {
              "type": "custom",
              "tokenizer": "whitespace",
              "filter": [
                "custom_stopwords",
                "custom_stemmer",
                "asciifolding",
                "lowercase",
                "custom_shingle"
              ]
            }
          },
          "filter": {
            "custom_stemmer": {
              "type": "stemmer",
              "name": "brazilian"
            },
            "custom_stopwords": {
              "type": "stop",
              "stopwords": [
                "a",
                "as",
                "o",
                "os",
                "fui"
              ],
              "ignore_case": true
            },
            "custom_shingle": {
              "type": "shingle",
              "min_shingle_size": 2,
              "output_unigrams": false,
              "output_unigrams_if_no_shingles": true
            }
          }
        }
      },
      "mappings": {
        "meutipo": {
          "properties": {
            "nome": { "type": "keyword" },
            "chaves": {
              "type": "text",
              "analyzer": "custom_keyword_analyzer",
              "search_analyzer": "custom_shingle_analyzer"
            }
          }
        }
      }
    }

My query:

    {
      index: 'meuindex',
      type: 'meutipo',
      body: {
        "query": {
          "match": { "chaves": "texto" }
        }
      }
    }

I’m not getting any results. When I remove the custom_shingle filter from custom_shingle_analyzer, any text that contains "atendimento" returns both ES documents.
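One possible reading of this behaviour, sketched in plain JavaScript: the index analyzer emits single-word tokens, while the search analyzer (with output_unigrams set to false) emits only two-word shingles, so the two sets never overlap. The `shingles` helper below is hypothetical, not an Elasticsearch API, and the sketch ignores stemming and stopword handling.

```javascript
// Simplified stand-in for a shingle filter with min_shingle_size 2,
// the default max_shingle_size of 2, and output_unigrams false.
// Hypothetical helper for illustration; not an Elasticsearch API.
function shingles(tokens, size = 2) {
  const out = [];
  for (let i = 0; i + size <= tokens.length; i++) {
    out.push(tokens.slice(i, i + size).join(" "));
  }
  return out;
}

// Index side (custom_keyword_analyzer): single-word tokens only.
const indexedTokens = new Set(
  ["bem atendido", "atendimento bom"].flatMap((k) => k.split(" "))
);

// Search side (custom_shingle_analyzer): two-word tokens only.
const searchTokens = shingles("hoje eu fui bem atendido".split(" "));

// A two-word search token can never equal a single-word indexed token.
const overlap = searchTokens.filter((t) => indexedTokens.has(t));

console.log(searchTokens); // "hoje eu", "eu fui", "fui bem", "bem atendido"
console.log(overlap);      // empty array
```

Under these assumptions the match query has nothing to match against, which is consistent with the empty result set described above.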

I need it to return results only if the text contains at least one expression exactly equal to one of the chaves indexed in ES. In my example:

To get the result:

    {
      nome: "bom_atendimento",
      chaves: ["bem atendido", "atendimento bom"]
    }

the text should contain "bem atendido" or "atendimento bom".

What’s the best way to do this? Using synonyms and shingles?

Elasticsearch version: 5.1.2

  • I don’t quite understand. Could you describe in more detail how the data is indexed in ES, with at least 3 examples of what the query (input) would be and what the answer (output) should be?

  • @Tommelo I edited my question. Is it clearer now?

1 answer

0

You can get this result using a Function Score Query.

Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html

Assuming you are using elasticsearch-js as your Elasticsearch client:

1 - The first step is to query by terms, since an exact phrase may never match in your query:

var terms = "Hoje eu fui bem atendido".split(" ");

es.search({
  index: 'seu-index',
  type: 'seu tipo',
  body: {
    query: {
      terms: { chaves: terms }
    }
  }
}, function(error, result) {
  console.log(JSON.stringify(result, null, 2));
});

Even with the terms query, ES will return all keys, because both contain common terms ("atendimento", etc.).
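Note that a terms query does not run its input through an analyzer, so a raw split on spaces sends "Hoje" with its capital letter, which cannot equal a lowercased indexed token. A minimal sketch of normalizing the terms on the client first, assuming only lowercasing and accent folding are needed; `toTerms` is a hypothetical helper and does not reproduce stemming or stopword removal:

```javascript
// Hypothetical helper: approximates lowercase + asciifolding on the client.
// It does not reproduce the stemmer or the stopword filter.
function toTerms(text) {
  return text
    .normalize("NFD")                 // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase()
    .split(/[^a-z0-9]+/)              // split on anything that is not a letter or digit
    .filter(Boolean);                 // discard empty strings
}

const terms = toTerms("Hoje eu fui bem atendido!");
console.log(terms); // "hoje", "eu", "fui", "bem", "atendido"
```

The resulting array can be passed as the `terms` value in the query above. If the field is indexed with a stemmer, the indexed tokens may still differ from these terms.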

2 - Add a Function Score to filter the keys:

Note that in the scores of the first query, the key "bom_atendimento" scored higher than "ruim_atendimento", so we can define a minimum score.

var terms = "Hoje eu fui bem atendido".split(" ");

es.search({
  index: 'seu-index',
  type: 'seu tipo',
  body: {
    query: {
      function_score: {
        query: { terms: { chaves: terms } },
        min_score: 0.7
      }
    }
  }
}, function(error, result) {
  console.log(JSON.stringify(result, null, 2));
});

Thus only the key that reaches the minimum score is returned.

Tip: why do you need to keep the term "atendimento" in the key?

Keeping only the terms that are relevant to the search would make it much simpler, e.g.:

    {
      nome: "ruim_atendimento",
      chaves: ["pessimo", "ruim", "horrivel"]
    }

  • I understand the tip, but in some cases I still need tokens with 2 or more words.

  • I get it. I’ll try to match the exact text against your key list. So the scenario I described using terms and function_score doesn’t suit you, right?

  • In some cases it doesn’t suit me. Another example:

    [{
      nome: "produto_x",
      chaves: ["nomeproduto x"]
    },
    {
      nome: "produto_x_2",
      chaves: ["nomeproduto x 2"]
    },
    {
      nome: "produto y",
      chaves: ["nomeproduto y"]
    }]

    With the search "See more about product name x", it is acceptable in my case for "produto_x" and "produto_x_2" to appear in the results. With score-based filtering, however, "produto y" also appears, because the keys are very similar. That’s why I need to find, in a text, expressions exactly equal to the registered chaves (at least this is the only way I can think of).
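The requirement in this last comment can be illustrated outside Elasticsearch. The sketch below is a hypothetical client-side check, not an ES query: a document matches only when one of its chaves occurs in the text as a contiguous run of whole words. The helper names and the sample sentence are assumptions made for the example.

```javascript
// Hypothetical client-side check, independent of Elasticsearch.
function words(s) {
  return s.toLowerCase().split(/\s+/).filter(Boolean);
}

// True when phraseWords appears in textWords as a contiguous sequence.
function containsPhrase(textWords, phraseWords) {
  if (phraseWords.length === 0) return false;
  for (let i = 0; i + phraseWords.length <= textWords.length; i++) {
    if (phraseWords.every((w, j) => textWords[i + j] === w)) return true;
  }
  return false;
}

// Returns the nome of every document with at least one chave found in the text.
function matchDocs(text, docs) {
  const textWords = words(text);
  return docs
    .filter((d) => d.chaves.some((c) => containsPhrase(textWords, words(c))))
    .map((d) => d.nome);
}

const docs = [
  { nome: "produto_x", chaves: ["nomeproduto x"] },
  { nome: "produto_x_2", chaves: ["nomeproduto x 2"] },
  { nome: "produto y", chaves: ["nomeproduto y"] },
];

// The sample sentence is an assumption made for this sketch.
console.log(matchDocs("veja mais sobre nomeproduto x 2", docs));
// "produto_x" and "produto_x_2" match; "produto y" does not
```

Because the comparison is on whole phrases rather than on shared terms, "produto y" is excluded even though its key shares the word "nomeproduto" with the others.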
