How to sort a Django query while ignoring accents?

I’m returning the query Carro.objects.all().order_by(Lower('marca')), but the ordering does not handle names that begin with an accented letter: those results appear at the end of the list. Is there a function that can ignore accents when sorting the results? Something like the unaccent lookup that is available for filtering in Django 1.8.

1 answer

I can’t speak for recent versions of Django, but in the last system I developed (I think it was on 1.4 or 1.5) I found nothing built in, and I ended up using this workaround:

  1. Set the application's locale. The way to do this differs slightly between Unix/Linux and Windows, so I ended up with the following code:

    import locale

    locale_set_correctly = False
    # Try the Unix/Linux name, then the Windows name, then the system default ("")
    for locale_name in ("pt_BR.UTF-8", "Portuguese_Brazil.1252", ""):
        try:
            locale.setlocale(locale.LC_ALL, locale_name)
            locale_set_correctly = True
            break
        except locale.Error:
            # This name is not supported on this system; try the next one
            continue
    
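  2. Read the records from the database and then sort them in Python. If you intend to use all of them, no problem. But if you want, say, only the first 100 in alphabetical order, then unfortunately this option is not for you, because every record has to be loaded before the sort can happen.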

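    import functools

    def locale_sort(result, field):
        """Sort the list `result` in place by `field`, using the locale's collation."""
        if locale_set_correctly:
            def collation(a, b):
                # Objects that have the field come before objects that lack it
                if hasattr(a, field):
                    if hasattr(b, field):
                        return locale.strcoll(getattr(a, field), getattr(b, field))
                    return -1
                elif hasattr(b, field):
                    return 1
                # Neither object has the field: fall back to primary-key order
                return -1 if a.pk < b.pk else 1 if b.pk < a.pk else 0
            # cmp_to_key works on Python 2.7 and 3; a positional cmp function is Python 2 only
            result.sort(key=functools.cmp_to_key(collation))
        return result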
    

    Use it like this (sorting the "dumb" way first, in case the locale could not be set correctly):

    resultado = locale_sort(list(Carro.objects.all().order_by(Lower('marca'))), 'marca')
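To make the limitation in step 2 concrete, here is a minimal sketch of "sort everything, then keep the first N". The Car class and the first_n_sorted helper are hypothetical stand-ins I made up for illustration (a real project would pass list(Carro.objects.all())), and the snippet uses whatever locale is currently active rather than setting one.

```python
import functools
import locale


class Car:
    """Hypothetical stand-in for a Django model instance."""

    def __init__(self, pk, marca):
        self.pk = pk
        self.marca = marca


def first_n_sorted(rows, field, n):
    """Sort all rows by `field` with locale.strcoll, then keep the first n."""
    def compare(a, b):
        return locale.strcoll(getattr(a, field), getattr(b, field))

    # The whole list has to be in memory and sorted before it can be sliced
    ordered = sorted(rows, key=functools.cmp_to_key(compare))
    return ordered[:n]


# In a real project `rows` would be list(Carro.objects.all())
rows = [Car(1, "volvo"), Car(2, "audi"), Car(3, "fiat")]
top_two = first_n_sorted(rows, "marca", 2)
print([car.marca for car in top_two])
```

The slice only happens after every row has been fetched and compared, so slicing the queryset itself (for example [:100]) before sorting would pick the wrong rows.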
    

locale.strcoll sorts so that all variations of the same letter (uppercase, lowercase, accented, unaccented) stay together in the ordering. Only when the rest of the string is identical and just that letter differs does it fall back to the order lowercase without accent < uppercase without accent < lowercase with accent < uppercase with accent. Example (Python 2):

>>> sorted([u"Alberto", u"Álvaro", u"avião", u"águia"], cmp=locale.strcoll)
[u'\xe1guia', u'Alberto', u'\xc1lvaro', u'avi\xe3o']
>>> sorted([u"A", u"Á", u"a", u"á"], cmp=locale.strcoll)
[u'a', u'A', u'\xe1', u'\xc1']
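If the locale cannot be relied on, a different technique (not the one used in this answer) is to build a sort key that strips accents with the standard unicodedata module. This is only a rough sketch for Python 3; strip_accents and accent_insensitive_key are names I made up for illustration.

```python
import unicodedata


def strip_accents(text):
    """Return `text` with combining accent marks removed."""
    # NFKD splits an accented letter into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_key(text):
    """Sort key: accent-free, case-folded text first, original text as tie-break."""
    return (strip_accents(text).casefold(), text)


names = ["Alberto", "Álvaro", "avião", "águia"]
ordered = sorted(names, key=accent_insensitive_key)
print(ordered)  # ['águia', 'Alberto', 'Álvaro', 'avião']
```

For these four names the result matches the locale.strcoll example, but ties between variants of the same letter are broken by plain code point order, not by the locale's rules, so the two approaches are not interchangeable in general.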

Note: this is a very "robust" solution, which I have been using in practice. If you are sure that the locale will be supported, and that all your objects always have the attribute marca, you can simplify this code to:

from functools import cmp_to_key  # available on Python 2.7 and 3
resultado = sorted(Carro.objects.all(), key=cmp_to_key(lambda a, b: locale.strcoll(a.marca, b.marca)))
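On Python 3, where a key function is the usual way to customize a sort, the same idea can probably be written with locale.strxfrm, which transforms a string so that ordinary comparison follows the current locale. A small sketch, assuming the locale has already been set elsewhere; the Car class and sort_by_brand are hypothetical stand-ins for the real model and query.

```python
import locale


class Car:
    """Hypothetical stand-in for a Django model instance."""

    def __init__(self, marca):
        self.marca = marca


def sort_by_brand(cars):
    """Return the cars ordered by `marca` under the current locale."""
    # strxfrm is applied once per object instead of once per comparison
    return sorted(cars, key=lambda car: locale.strxfrm(car.marca))


cars = [Car("renault"), Car("citroen"), Car("peugeot")]
ordered_brands = [car.marca for car in sort_by_brand(cars)]
print(ordered_brands)
```

With the real model this would be called as sort_by_brand(Carro.objects.all()), since sorted accepts any iterable.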
