How to sort with Elasticsearch while maintaining search relevance?

I’m trying to search for a term in Elasticsearch with Python and sort by one of the fields, but sorting changes the results so that they are no longer the ones I want.

For example, when I search without sorting, it returns the most relevant results.

TermoBusca:  Cartucho toner p/Samsung preto MLT-D205E Samsung CX 1 UN             === CurvaABC:  0.0
TermoBusca:  Cartucho toner p/Samsung ciano CLT-C404S Samsung CX 1 UN             === CurvaABC:  519.8
TermoBusca:  Cartucho toner p/Samsung amarelo CLT-Y406S 4HZ00A Samsung CX 1 UN    === CurvaABC:  1379.4
TermoBusca:  Cartucho toner p/Samsung preto CLT-K406S 4HY99A Samsung CX 1 UN      === CurvaABC:  2396.9
TermoBusca:  Cartucho toner p/Samsung preto MLT-D103L SU722A Samsung CX 1 UN      === CurvaABC:  0.0
TermoBusca:  Cartucho toner p/Samsung preto MLT-D116S-SIU 4HY95A Samsung CX 1 UN  === CurvaABC:  793.6
TermoBusca:  Cartucho toner p/Samsung preto MLT-D101S-SI 4HY94A Samsung CX 1 UN   === CurvaABC:  3112.0
TermoBusca:  Cartucho toner p/Samsung preto MLT-D104S-SI 4HY98A Samsung CX 1 UN   === CurvaABC:  2206.2
TermoBusca:  Cartucho toner p/HP preto 508X CF360XC HP CX 1 UN                    === CurvaABC:  0.0
TermoBusca:  Cartucho toner p/HP ciano 508X CF361XC HP CX 1 UN                    === CurvaABC:  0.0

But when I sort by my field, it returns the list ordered by that field alone.

TermoBusca:  Cartucho HP 662XL preto Original (CZ105AB) Para HP DeskJet 2516, 3516, 3546, 2546, 1516, 4646, 2646                                  === CurvaABC:  1450084.17
TermoBusca:  Cartucho HP 662 preto Original (CZ103AB) Para HP DeskJet 2516, 3516, 3546, 2546, 1516, 4646, 2646                                    === CurvaABC:  1024852.22
TermoBusca:  Cartucho HP 662 Colorido Original (CZ104AB) Para HP DeskJet 2516, 3516, 3546, 2546, 1516, 4646, 2646                                 === CurvaABC:  692714.72
TermoBusca:  Notebook NP550XCJ-KT1BR, Processador Core i3 (10ª geração) de 2.1ghz, 4gb de Memória, 1tb de Armazenamento, Tela de 15,6" - Samsung  === CurvaABC:  422120.67
TermoBusca:  Cartucho HP 662XL Colorido Original (CZ106AB) Para HP DeskJet 2516, 3516, 3546, 2546, 1516, 4646, 2646                               === CurvaABC:  373512.98
TermoBusca:  Cartucho HP 954XL Preto Original (L0S71AB) Para HP Deskjet 7720, 7740, 8210, 8710, 8720                                              === CurvaABC:  208120.37
TermoBusca:  Cartucho HP 670XL preto Original (CZ117AB) Para HP Deskjet 4615, 4625,  5525                                                         === CurvaABC:  161593.02
TermoBusca:  Cartucho HP 901 Preto Original (CC653AB) Para HP Officejet J4660, J4524, J4624, 4500                                                 === CurvaABC:  159048.76
TermoBusca:  Cartucho HP 60B preto everyday 4,5ml CC636WB HP CX 1 UN                                                                              === CurvaABC:  128542.25
TermoBusca:  Notebook NP550XCJ-KO1BR, Processador Dual Core de 1.9ghz, 4gb de Memória, 500gb de Armazenamento, Tela de 15,6" - Samsung            === CurvaABC:  121756.93

What I want is for the results to be selected by relevance and then ordered by the field, as follows.

TermoBusca:  Cartucho toner p/Samsung preto MLT-D101S-SI 4HY94A Samsung CX 1 UN   === CurvaABC:  3112.0
TermoBusca:  Cartucho toner p/Samsung preto CLT-K406S 4HY99A Samsung CX 1 UN      === CurvaABC:  2396.9
TermoBusca:  Cartucho toner p/Samsung preto MLT-D104S-SI 4HY98A Samsung CX 1 UN   === CurvaABC:  2206.2
TermoBusca:  Cartucho toner p/Samsung amarelo CLT-Y406S 4HZ00A Samsung CX 1 UN    === CurvaABC:  1379.4
TermoBusca:  Cartucho toner p/Samsung preto MLT-D116S-SIU 4HY95A Samsung CX 1 UN  === CurvaABC:  793.6
TermoBusca:  Cartucho toner p/Samsung ciano CLT-C404S Samsung CX 1 UN             === CurvaABC:  519.8
TermoBusca:  Cartucho toner p/Samsung preto MLT-D205E Samsung CX 1 UN             === CurvaABC:  0.0
TermoBusca:  Cartucho toner p/Samsung preto MLT-D103L SU722A Samsung CX 1 UN      === CurvaABC:  0.0
TermoBusca:  Cartucho toner p/HP preto 508X CF360XC HP CX 1 UN                    === CurvaABC:  0.0
TermoBusca:  Cartucho toner p/HP ciano 508X CF361XC HP CX 1 UN                    === CurvaABC:  0.0

This is my query.

from elasticsearch_dsl import Search
from config import es  # local module that provides the Elasticsearch client


termo = 'Cartucho toner p/Samsung'
data_input: dict = {"CurvaABC": "desc"}
# Build the sort arguments, e.g. [{'CurvaABC': {'order': 'desc'}}]
args = [{k: {'order': v}} for k, v in data_input.items()]

print('')
print('Sem ordem ')
print('')

# Match query only: hits come back ranked by relevance
s = Search(using=es, index="produto").query("match", TermoBusca=termo).extra(size=10)
buscaproduto: list = s.execute()
for produto in buscaproduto:
    print('TermoBusca: ', produto.TermoBusca, ' === CurvaABC: ', produto.CurvaABC)

print('')
print('Com order ')
print('')

# Same query with an explicit sort: hits come back ordered by CurvaABC
s = Search(using=es, index="produto").query("match", TermoBusca=termo).sort(*args).extra(size=10)
buscaproduto: list = s.execute()
for produto in buscaproduto:
    print('TermoBusca: ', produto.TermoBusca, ' === CurvaABC: ', produto.CurvaABC)

1 answer

It would be easier to answer if you had shown what Elasticsearch returns, but I believe that modifying the line below will give you the result you want.

buscaproduto: list = s.execute()

to something like

buscaproduto: list = sorted(s.execute(), key=lambda x: x[1])

The sorted function sorts the list returned by s.execute(), here using the second item of each element as the key.

See the example below:

>>> minha_lista = [(2, 2), (3, 4), (4, 1), (1, 3)]
>>>
>>> lista_ordenada = sorted(minha_lista, key=lambda x: x[1])
>>>
>>> print(f'Lista ordenada: {lista_ordenada}')
Lista ordenada: [(4, 1), (2, 2), (1, 3), (3, 4)]
>>>
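Applied to the data from the question, the same idea might look like the sketch below. The SimpleNamespace objects are only stand-ins I am assuming for the hits that s.execute() yields (the question's print loop reads their fields as attributes), and the values are copied from the unsorted output above. The key reads the CurvaABC attribute instead of x[1], and reverse=True gives descending order.

```python
from types import SimpleNamespace

# Stand-ins for hits in relevance order (values copied from the question).
hits = [
    SimpleNamespace(TermoBusca="Cartucho toner p/Samsung preto MLT-D205E Samsung CX 1 UN", CurvaABC=0.0),
    SimpleNamespace(TermoBusca="Cartucho toner p/Samsung ciano CLT-C404S Samsung CX 1 UN", CurvaABC=519.8),
    SimpleNamespace(TermoBusca="Cartucho toner p/Samsung amarelo CLT-Y406S 4HZ00A Samsung CX 1 UN", CurvaABC=1379.4),
    SimpleNamespace(TermoBusca="Cartucho toner p/Samsung preto CLT-K406S 4HY99A Samsung CX 1 UN", CurvaABC=2396.9),
    SimpleNamespace(TermoBusca="Cartucho toner p/Samsung preto MLT-D103L SU722A Samsung CX 1 UN", CurvaABC=0.0),
]

# Re-order the relevance-ranked page by the field, highest value first.
ordenados = sorted(hits, key=lambda hit: hit.CurvaABC, reverse=True)

for produto in ordenados:
    print('TermoBusca: ', produto.TermoBusca, ' === CurvaABC: ', produto.CurvaABC)
```

Because sorted is stable, hits with the same CurvaABC (the two 0.0 entries here) keep their original relevance order. Note that this only re-orders the page that was fetched (size=10 in the question); it does not change which documents Elasticsearch returns.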

I hope I’ve helped.

  • This gave me a good idea of how to do it, thank you! I just had to change the key function (key=lambda x: x[1]) to use the field I wanted to sort by.

  • Glad you solved it. The lambda function receives one parameter, which in this case I called x. It can have any name.
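As a small illustration of that last point, the sketch below sorts the same items with two differently named lambda parameters and with operator.attrgetter from the standard library. The SimpleNamespace items are hypothetical stand-ins with a CurvaABC attribute, not real Elasticsearch hits.

```python
from operator import attrgetter
from types import SimpleNamespace

# Hypothetical items exposing the field as an attribute.
itens = [
    SimpleNamespace(CurvaABC=519.8),
    SimpleNamespace(CurvaABC=3112.0),
    SimpleNamespace(CurvaABC=0.0),
]

# The parameter name is arbitrary: both lambdas define the same key.
por_x = sorted(itens, key=lambda x: x.CurvaABC)
por_produto = sorted(itens, key=lambda produto: produto.CurvaABC)

# attrgetter builds an equivalent key function without a lambda.
por_attrgetter = sorted(itens, key=attrgetter("CurvaABC"))

print([item.CurvaABC for item in por_x])
```

All three calls produce the same ascending order; which spelling to use is a matter of taste.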
