Problem with sorting in PostgreSQL

1

I’m having a problem sorting columns in PostgreSQL even though the database is correctly configured for UTF-8.

The PostgreSQL version is 9.3, installed on Mac OS X Mavericks 10.9.5 (this problem also occurred on previous versions of the system).

[image]

When I sort by the 'name' column in phpPgAdmin, in Django or from the terminal, PostgreSQL does not handle uppercase, lowercase and accented characters correctly.

This is how the sorted table comes out:

[image]

As you can see, lowercase characters come after uppercase characters, and accented characters come last.

I want the ordering to be correct regardless of accents or letter case. Does anyone know how to solve this?
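
For readers without the screenshot: the order described above (uppercase first, then lowercase, then accented letters) is what a plain code point comparison produces, which is consistent with the database falling back to a byte-wise collation. A minimal Python sketch of that behavior, using made-up sample names since the original data is not shown:

```python
import unicodedata

# Hypothetical sample data; the names in the original screenshot are not known.
# NFC normalization ensures each accented letter is a single code point.
names = [
    unicodedata.normalize("NFC", n)
    for n in ["ana", "Bruno", "Álvaro", "carlos", "Ana", "élcio", "Zeca"]
]

# Plain code point order: uppercase ASCII, then lowercase ASCII,
# then accented letters, matching the symptom described in the question.
codepoint_order = sorted(names)
print(", ".join(codepoint_order))
# Ana, Bruno, Zeca, ana, carlos, Álvaro, élcio
```

This only reproduces the symptom; it does not prove which collation the server is actually using.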

  • 1

    It’s not exactly a solution, but try sorting by lower(nome) (which is basically what a CITEXT column does automatically).

  • Have you tried specifying the collation explicitly in the query? Something like select * from tabela order by nome collate "pt_BR"; or maybe that combined with lower(nome). I don't know whether it will work, because I have read that a sort order that both respects accents and ignores capitalization is problematic, but that's my suggestion...

  • @Bacco the problem is that this alone is not enough; I really wanted to change some PostgreSQL setting so that results come back the way I want without needing this.

  • @mgibsonbr the last time I had this problem I was forced to create a new column for each field that needed sorting and filtering, with these new columns processed to contain no accents or uppercase characters; I then sort by the processed column but display the original one. Given the system resources and the number of tables, the kind of query you suggested is not viable. The way out will be to fall back on what I used to do.

  • For instance, which collation is the name column using?

  • Use citext; see http://simplesideias.com.br/usando-campos-case-insensitive-no-postgresql

  • @gmsantos the table is generated by Django, and the database is set to lc_collate pt_BR.UTF-8

  • @Bacco but just converting to lowercase is not enough; I also need lookups to ignore accents, so I am forced to keep two columns, 'name' and 'sort_name': the first stores the original text and the second stores the text without accents and in lowercase. This works perfectly: searching for a word without accents also returns the rows that have them. The only problem is the duplicated content, which exists only to make sorting and searching work correctly.

  • @Bacco keep in mind that I did not have this problem with MySQL. The issue is that it is impossible to change table by table, since the tables are created dynamically and can be recreated or changed at any time, not to mention when tests are run, over which I have no control.

  • @Orion I posted a test on SQL Fiddle with the collation on 9.3, and the order apparently came out correct.
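
The 'name' / 'sort_name' workaround described in the comments above needs some way to derive the accent-free, lowercase value. A minimal sketch of one way to do that in Python (the function name and sample data are illustrative, not taken from the asker's code):

```python
import unicodedata


def sort_name(text: str) -> str:
    """Return a lowercase, accent-free copy of text for sorting and searching."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.casefold()


# Hypothetical sample data.
names = ["ana", "Bruno", "Álvaro", "carlos", "Ana", "élcio", "Zeca"]
print(", ".join(sorted(names, key=sort_name)))
# Álvaro, ana, Ana, Bruno, carlos, élcio, Zeca
```

In a Django project the result could be stored in the extra column whenever the record is saved, which keeps the duplication the asker mentions but makes the derivation automatic.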

4 answers

3

PostgreSQL uses the collations provided by the operating system. Check which ones are available:

$ locale -a

On Fedora Linux the "pt_BR" collation works correctly. Try other collations in your query. Note that only collations for the UTF-8 encoding will work in a UTF-8 database:

select * 
from t 
order by nome collate "pt_PT" -- or "en_US"

If no collation works you can still, for testing purposes, create a database with the LATIN1 (ISO-8859-1) encoding and try the collations again.

Your biggest problem is that OS X should not be used on a production server. If you get the chance, try a Linux distribution or suggest one to whoever is in charge.

  • The same problem remains; it must be some PostgreSQL-related configuration in OS X that is causing this.

2

Ideally, specify the encoding and the collation when creating the database, so that future tables behave the way you want:

CREATE DATABASE name ENCODING 'UTF8' LC_COLLATE 'pt_BR';

A possible alternative is to specify the collation in the column itself:

CREATE TABLE alfabetica (
    id SERIAL PRIMARY KEY,
    nome TEXT COLLATE "pt_BR"
);

See an example working on SQL Fiddle.

  • No use. I did exactly what you said and the problem continues; it must be some PostgreSQL configuration on Mac OS X.

  • 2

    Too bad I can’t help; I can’t even test it here. Let’s hope someone with experience in both PG and Mac posts an alternative. I think I’ll leave the answer here anyway, as a reference for users on other operating systems. If you find anything, post your own answer, or leave feedback if you can.

0

I’m not experienced enough with PostgreSQL to say there’s nothing wrong with your database, but I’m experienced enough with Mac to say there’s a problem with OS X.

By default, the locale options on the Mac are just symbolic links to other options.

In my version (Sierra), for example, the LC_COLLATE of "pt_BR.UTF-8" points to the LC_COLLATE of "la_LN.US-ASCII".

This is a huge problem (not to say nonsense) in several respects, but in practice it means that any library that depends on the system's collation will behave incorrectly.

I agree with @Clodoaldo that the best option is to take OS X out of production.
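
To check on your own machine where a locale's collation really points, something like the following Python sketch can help. It assumes the locale definitions live under /usr/share/locale, one directory per locale with an LC_COLLATE file inside; that path and the helper name collate_target are assumptions for illustration and may not match every OS X version:

```python
import os


def collate_target(locale_name: str, root: str = "/usr/share/locale") -> str:
    """Return the real file behind a locale's LC_COLLATE, following symlinks."""
    return os.path.realpath(os.path.join(root, locale_name, "LC_COLLATE"))


if __name__ == "__main__":
    # If this prints a path inside another locale's directory,
    # the locale is borrowing that locale's collation rules.
    print(collate_target("pt_BR.UTF-8"))
```

If the file does not exist, the function simply returns the path it was given, so an unchanged path is not proof that the locale has its own collation.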

-1

A late answer, but it may still be useful to someone:

The only solution that worked for me was to use the LATIN1 encoding instead of UTF-8, with the pt_BR.ISO8859-1 collation that was already installed natively on my Mac (macOS Catalina 10.15.5).

Here is the command I used:

create database test encoding "LATIN1" lc_collate "pt_BR.ISO8859-1" lc_ctype "pt_BR.ISO8859-1" template template0;

With this, the ordering worked respecting the accented characters.

Remarks:

- The LATIN1 encoding is more limited than UTF-8. For example, text fields cannot hold characters outside the Latin languages (such as Greek characters or some special characters), emoticons, etc. These are supported in UTF-8. Depending on what your database needs, though, that may not matter and LATIN1 will be enough.

- Be careful when restoring a backup of a database that used UTF-8 into a new database created in LATIN1. This can leave the accented letters all scrambled. In my case that didn’t happen, though; it worked fine.
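
Both remarks can be illustrated outside the database. The sketch below uses Python's own "latin-1" codec (ISO 8859-1) as a stand-in for the LATIN1 encoding; the helper name fits_latin1 is made up for the example:

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of text can be stored as ISO 8859-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


portuguese = "a\u00e7\u00e3o"    # "ação"
greek = "\u03b1\u03b2\u03b3"     # Greek alpha, beta, gamma

print(fits_latin1(portuguese))   # True
print(fits_latin1(greek))        # False

# UTF-8 bytes read back as if they were LATIN1: each accented letter
# turns into two unrelated characters (the "scrambled" accents).
garbled = portuguese.encode("utf-8").decode("latin-1")
print(garbled)                   # aÃ§Ã£o
```

A quick pass like this over a sample of the existing data is a cheap way to find out whether anything would be lost before committing to LATIN1.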
