How to include minimum_should_match in an Elasticsearch DSL query?

I’m trying to use minimum_should_match as shown in the documentation:

q = Q('bool',
    must=[Q('match', title='python')],
    should=[Q(...), Q(...)],
    minimum_should_match=1
)
s = Search().query(q)

But when I include it in my query it returns nothing.

termo = 'Cartucho p/caneta tinteiro preto 301218 Pelikan CX 6 UN'

q = Q('bool', must=[Q('match', TermoBusca=termo)], minimum_should_match=1)
s = Search(using=es, index='produtos').query(q)
print(s.to_dict())

produtosEncontrados: list = s.execute()
for produto in produtosEncontrados:
    print(produto.Descricao)

If I remove the parameter (minimum_should_match=1), the query returns results normally:

    termo = 'Cartucho p/caneta tinteiro preto 301218 Pelikan CX 6 UN'

    q = Q('bool', must=[Q('match', TermoBusca=termo)])
    s = Search(using=es, index='produtos').query(q)
    print(s.to_dict())

    produtosEncontrados: list = s.execute()
    for produto in produtosEncontrados:
        print(produto.Descricao)

Output:

Generated query:

{'query': {'match': {'TermoBusca': 'Cartucho p/caneta tinteiro preto 301218 Pelikan CX 6 UN'}}}

Output:

  • Cartridge p/pen ink cartridge black 301218 Pelikan CX 6 UN
  • Cartridge p/pen royal blue 301176 Pelikan CX 6 UN
  • Ink p/pen 30ml black 301051 Pelikan EN 1 UN
  • Ink p/pen 30ml dark blue 301028 Pelikan EN 1 UN
  • Cartridge p/pen blue cartridge CA32005A Crown BT 3 UN
  • Cartridge p/pen black cartridge CA32005P Crown BT 3 UN
  • Capricci silver YW32616S Crown CX 1 UN source pen
  • Capricci black ink pen YW32616P Crown CX 1 UN
  • Neutral Regent Source Pen YW39422N Crown CX 1 UN
  • Cartridge p/Brother black LC3039BK Brother CX 1 UN

I’ve looked everywhere and I don’t understand what I’m doing wrong. My intention is to filter on the field TermoBusca so that 100% of the terms in the search string must match, as in the query below, but using only the DSL.

    termo = 'Cartucho p/caneta tinteiro preto 301218 Pelikan CX 6 UN'
    query_body = {
            "sort" : [
                { "Prioridade" : {"order" : "desc"}},
                { "CurvaABC" : {"order" : "desc"}},
                "_score"
            ],
            "query": {
                "match": {
                    "TermoBusca": {
                        "query": termo,
                        "minimum_should_match": "100%"
                    }
                }
            },
        "size": 60
    }

    result = es.search(index="produtos", body=query_body)
    print()

    for hit in result['hits']['hits']:
        print(hit['_source']['TermoBusca'])
    self.assertEqual(56, len(result['hits']['hits']))

Output:

  • Cartridge p/pen ink cartridge black 301218 Pelikan CX 6 UN

  • Have you seen this post?

  • @Paulomarques, I had not seen it, but it also uses raw Elasticsearch queries rather than Elasticsearch DSL.

1 answer

The parameter minimum_should_match is meant to be used together with optional clauses of type should. With minimum_should_match=1, an item is returned only if at least one of the should clauses is true. Your query has no should clauses; it has a single must clause. By definition, every returned item must satisfy a must clause, so your query with must already filters on the field TermoBusca in 100% of cases (minimum_should_match is not necessary).
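The interaction between must, should, and minimum_should_match described above can be sketched with a small pure-Python model. This is only an illustration under simplifying assumptions, not Elasticsearch's implementation: `bool_matches`, `title_has`, and the sample documents are hypothetical names and data, and clauses are modelled as plain predicates.

```python
from typing import Callable, Dict, List, Optional, Sequence

Doc = Dict[str, str]
Clause = Callable[[Doc], bool]


def bool_matches(
    doc: Doc,
    must: Sequence[Clause] = (),
    should: Sequence[Clause] = (),
    minimum_should_match: Optional[int] = None,
) -> bool:
    """Simplified model of a bool query (hypothetical helper, not the real engine)."""
    # Every must clause has to be satisfied.
    if not all(clause(doc) for clause in must):
        return False
    if minimum_should_match is None:
        # Modelled default: should clauses are only required when there is no must.
        minimum_should_match = 1 if (should and not must) else 0
    satisfied = sum(1 for clause in should if clause(doc))
    # With zero should clauses, any minimum >= 1 can never be reached.
    return satisfied >= minimum_should_match


def title_has(word: str) -> Clause:
    """Build a predicate that checks for a whole word in the title (hypothetical)."""
    return lambda doc: word in doc["title"].split()


# Made-up sample documents, for illustration only.
docs: List[Doc] = [
    {"title": "python tips"},
    {"title": "python and elasticsearch"},
]


def run(must=(), should=(), minimum_should_match=None) -> List[str]:
    return [d["title"] for d in docs
            if bool_matches(d, must, should, minimum_should_match)]


only_must = run(must=[title_has("python")])
must_with_minimum = run(must=[title_has("python")], minimum_should_match=1)
must_and_should = run(
    must=[title_has("python")],
    should=[title_has("elasticsearch")],
    minimum_should_match=1,
)
```

In this model, a must clause alone returns both documents, adding minimum_should_match=1 without any should clause returns none, and adding a should clause makes the minimum satisfiable again.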

In older versions of Elasticsearch (before version 5), the parameter minimum_should_match was ignored when there were no should clauses. This behavior has changed in modern versions of Elasticsearch: a query whose minimum_should_match is greater than the number of should clauses now returns no items.

This change affected some systems that built should clauses dynamically.

I had to deal with this problem in the past, after an Elasticsearch upgrade broke a system’s search functionality. In that case, searches stopped returning results whenever no optional filter was specified. The solution was to change the logic that built the search so that it only included the parameter minimum_should_match when the number of should clauses is greater than 1. More details about this problem can be found in the following GitHub issue (in English):
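A minimal sketch of that kind of guard, building the raw query body as a plain dictionary. The helper name `build_bool_query` is hypothetical, and the exact threshold is an assumption: here the parameter is attached whenever at least one should clause exists, which may differ from the rule used in the system described above.

```python
from typing import Any, Dict, List, Sequence

Query = Dict[str, Any]


def build_bool_query(
    must: Sequence[Query],
    should: Sequence[Query],
    minimum_should_match: int = 1,
) -> Query:
    """Build a bool query body, adding minimum_should_match only when it can be met.

    Hypothetical helper: the guard condition is an assumption for illustration.
    """
    body: Dict[str, Any] = {}
    if must:
        body["must"] = list(must)
    if should:
        body["should"] = list(should)
        # Only meaningful when there are optional clauses to count.
        body["minimum_should_match"] = minimum_should_match
    return {"bool": body}


required: List[Query] = [{"match": {"TermoBusca": "cartucho"}}]
optional: List[Query] = [{"match": {"TermoBusca": "preto"}}]

without_optional = build_bool_query(required, [])
with_optional = build_bool_query(required, optional)
```

As far as I know, a single-key dictionary like this can be passed to `Q(...)` in elasticsearch-dsl, or sent as the `query` part of a raw search body, but that is worth checking against the version in use.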


Update:

The sample data posted by Marconcilio drew my attention to the fact that, in the raw query, the parameter minimum_should_match sits inside the inner match query. The behavior the OP wants is for the query to return only items where TermoBusca contains 100% of the terms in the string termo. In the Python DSL, we can rebuild that inner query with a dict:

    q = Q('match', TermoBusca={'query': termo, 'minimum_should_match': '100%'})

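To see why a percentage inside the match query changes the results, here is a rough pure-Python model of term matching. It rests on loud assumptions: tokens are split on whitespace and lowercased (a real analyzer may tokenize differently), only positive percentages are handled, the percentage is rounded down, and the second product description is made up. `required_matches` and `matches` are hypothetical helpers.

```python
from typing import Optional


def required_matches(num_terms: int, spec: str) -> int:
    """Turn a positive percentage such as '100%' into a term count, rounded down."""
    percent = int(spec.rstrip("%"))
    return (num_terms * percent) // 100


def matches(query: str, text: str, minimum_should_match: Optional[str] = None) -> bool:
    """Simplified match query: count how many query terms appear in the text."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    if minimum_should_match is None:
        # Modelled default: any single shared term is enough.
        needed = 1
    else:
        needed = required_matches(len(query_terms), minimum_should_match)
    return len(query_terms & text_terms) >= needed


termo = 'Cartucho p/caneta tinteiro preto 301218 Pelikan CX 6 UN'
exact = 'Cartucho p/caneta tinteiro preto 301218 Pelikan CX 6 UN'
# Hypothetical second product, sharing 7 of the 9 query terms.
similar = 'Cartucho p/caneta tinteiro azul royal 301176 Pelikan CX 6 UN'

default_hits = [t for t in (exact, similar) if matches(termo, t)]
strict_hits = [t for t in (exact, similar) if matches(termo, t, '100%')]
```

Under this model, the default behaves like the query without the parameter and returns both products, while '100%' keeps only the description that contains every term.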
  • Do you have an example of how to build the query with Elasticsearch DSL? Using "minimum_should_match": "100%", as in the query I built, a search for paper sulfite chamex A4 75g returns only the products that contain all of these words.

  • minimum_should_match=1 was just an example taken from the documentation; in my query I want it to be 100%.

  • Doesn’t your third example (with must and without minimum_should_match) do exactly what you need? minimum_should_match='100%' works if your query has one or more should clauses (as in your first example, taken from the documentation). To make the second example work, just change must to should, but that doesn’t make much sense.

  • I tried many different ways and couldn’t get it to work.

  • Hi Marconcilio, it is not clear to me what the problem is. In your question you say your third example (DSL, without minimum_should_match) returns the data, right? What I’m saying is that your example is correct and that minimum_should_match is unnecessary. If the data does not match the fourth example (the raw query), it would be worth updating the question with an MCVE plus sample data, showing which items the DSL returns in excess of, or short of, the raw query.

  • I think the issue cited in the post was resolved in 7.5.2.

  • @Anthonyaccioly, I updated the question with my output and the query generated in the 3rd example. The 3rd example returns 10 items, while the 4th returns only the item that matches 100% of the search term. In this case, it is the only such item in my products index.

  • Ah, how interesting. I hadn’t realized that minimum_should_match is at the same level as the query term. I don’t have an environment to test in right now, but have you tried it using a dictionary? E.g., q = Q('match', TermoBusca={'query': termo, 'minimum_should_match': '100%'})

  • @Anthonyaccioly, I had not tried it that way. It worked here, thanks!
