Improve search performance when no record is found

Viewed 374 times

I have a database table with 9,551,011 rows containing addresses from all over the country.

When I query a given place and that address exists in the database, I get a result within two seconds. The problem is that the database backs an application where users upload a file with many addresses, and many of those addresses are not in the database. When an address does not exist, the search takes FAR TOO LONG. Is there any way to improve this? I've tried some classic PostgreSQL performance measures, such as VACUUM, but without success so far.

Columns in the table: id, gid, nome, num_inicio_esquerda, num_fim_esquerda, num_inicio_direita, num_fim_direita, cep_esquerda, cep_direita, bairro_esquerda, bairro_direita, nivel_detalhamento, estado_nome, estado_sigla, cidade, latitude_inicio, latitude_fim, longitude_inicio, longitude_fim.

  • Index on the nome column.

Is it possible to keep the database from being slow to answer when no record is found?

Example of a query run against the enderecos table:

select
      gid,
      bairro_esquerda,
      cep_direita,
      cep_esquerda,
      cidade,
      estado_nome,
      estado_sigla,
      latitude_fim,
      latitude_inicio,
      longitude_fim,
      longitude_inicio,
      nivel_detalhamento,
      nome,
      num_fim_direita,
      num_fim_esquerda,
      num_inicio_direita,
      num_inicio_esquerda 
  from
      enderecos
  where
      nome like 'avenida treze de maio'
      and (
          estado_sigla like 'CE'
      ) 
      and (
          lower(unaccent(cidade))='fortaleza'
      ) 
      and (
          cast(num_inicio_esquerda as integer)<=1116 
          and cast(num_fim_esquerda as integer)>=1116 
          or cast(num_fim_esquerda as integer)<=1116 
          and cast(num_inicio_esquerda as integer)>=1116 
          or cast(num_inicio_direita as integer)<=1116 
          and cast(num_fim_direita as integer)>=1116 
          or cast(num_fim_direita as integer)<=1116 
          and cast(num_inicio_direita as integer)>=1116
      ) limit 1
  • It is possible, but helping requires knowing the problem in detail. Usually the solution is to create an index. See http://answall.com/q/35088/101, http://answall.com/q/32052/101 and http://answall.com/q/55118/101

  • @Naldson Chagas: It depends a lot on the query you are running against the database. Could you post an example?

  • Guys, I got a significant improvement by adding indexes on three more columns (name, city, state). @Lacobus: I will post an example of a query.

1 answer

Using LIKE for text searches is not recommended on tables with a large volume of data: unless the pattern is left-anchored and a suitable index exists, PostgreSQL has to examine the rows one by one.

A better fit is a PostgreSQL feature called Full Text Search (FTS).

Here is a step-by-step guide to using it to improve the performance of your query:
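Conceptually, to_tsvector reduces a text to a set of lexemes, and a query such as 'avenida&treze&de&maio' matches when every term is present, in any order. The toy Python model below is only an approximation under stated assumptions: it lowercases and drops a small, hypothetical stop-word list, whereas the real 'portuguese' configuration uses its own stop words and also stems words. It does illustrate one practical consequence: FTS matches sets of words, not the exact string that nome like 'avenida treze de maio' compares.

```python
# Hypothetical, tiny stop-word list; PostgreSQL's 'portuguese' configuration ships its own.
STOP_WORDS = {"de", "da", "do", "das", "dos"}


def toy_tsvector(text):
    """Rough stand-in for to_tsvector: lowercased words minus stop words (no stemming)."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}


def toy_matches(text, and_query):
    """Rough stand-in for `vector @@ to_tsquery('a&b&c')`:
    every non-stop-word term must be present, in any order."""
    terms = {t.strip().lower() for t in and_query.split("&")} - STOP_WORDS
    return terms <= toy_tsvector(text)


print(toy_matches("Avenida Treze de Maio", "avenida&treze&de&maio"))        # True
print(toy_matches("Rua Treze de Maio", "avenida&treze&de&maio"))            # False
print(toy_matches("Maio Treze Avenida", "avenida&treze&de&maio"))           # True: order is ignored
print(toy_matches("Avenida Treze de Maio Norte", "avenida&treze&de&maio"))  # True: extra words allowed
```

If the uploaded file must only match the exact street name, the FTS condition can narrow the candidates and an equality check on nome can then confirm the match.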

1 - Create an auxiliary column sv_nome of type tsvector on the enderecos table:

ALTER TABLE enderecos ADD COLUMN sv_nome tsvector;

2 - Create a TRIGGER, fired on every INSERT and UPDATE on the enderecos table, that keeps the auxiliary column in sync with nome:

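CREATE FUNCTION fc_enderecos() RETURNS TRIGGER AS $$
BEGIN
    -- On INSERT, always compute the search vector from nome.
    IF TG_OP = 'INSERT' THEN
        NEW.sv_nome := to_tsvector('portuguese', COALESCE(NEW.nome, ''));
    END IF;

    -- On UPDATE, recompute only when nome actually changed.
    -- IS DISTINCT FROM is NULL-safe, unlike <>.
    IF TG_OP = 'UPDATE' THEN
        IF NEW.nome IS DISTINCT FROM OLD.nome THEN
            NEW.sv_nome := to_tsvector('portuguese', COALESCE(NEW.nome, ''));
        END IF;
    END IF;

    RETURN NEW;
END;
$$
LANGUAGE plpgsql;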

CREATE TRIGGER trg_enderecos BEFORE INSERT OR UPDATE ON enderecos FOR EACH ROW EXECUTE PROCEDURE fc_enderecos();
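The UPDATE branch only recomputes sv_nome when nome changed, and the comparison used for that test matters if nome can be NULL. In SQL, NULL <> 'x' yields NULL, and PL/pgSQL's IF treats NULL as false, so changing a name from NULL to a value would leave sv_nome stale with a plain <>. IS DISTINCT FROM treats NULL as an ordinary comparable value. The small Python model below uses None for SQL NULL; the function names are hypothetical and only mimic the SQL semantics.

```python
def sql_not_equal(a, b):
    """SQL `a <> b` under three-valued logic: NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a != b


def sql_is_distinct_from(a, b):
    """SQL `a IS DISTINCT FROM b`: never NULL; two NULLs are not distinct."""
    if a is None and b is None:
        return False
    if a is None or b is None:
        return True
    return a != b


def plpgsql_if(condition):
    """PL/pgSQL IF runs its branch only when the condition is true; NULL counts as false."""
    return condition is True


old_nome, new_nome = None, "avenida treze de maio"
print(plpgsql_if(sql_not_equal(new_nome, old_nome)))         # False: sv_nome would not be refreshed
print(plpgsql_if(sql_is_distinct_from(new_nome, old_nome)))  # True: sv_nome is recomputed
```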

3 - Create an index on the auxiliary column created in step 1:

-- Index the tsvector column itself, so that "sv_nome @@ to_tsquery(...)" can use it.
CREATE INDEX idx_enderecos_nome ON enderecos USING gin (sv_nome);
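A GIN index is an inverted index: conceptually it maps each lexeme to the rows that contain it, so a lookup only touches the entries for the query's terms. This is also why a search with no result can be fast: if a required term has no entry, the answer is empty without visiting the table. The Python below is a conceptual sketch only (real GIN indexes keep their keys in an on-disk B-tree with posting lists); build_inverted_index and lookup_all are hypothetical names, and the tokenization is a simple lowercase-and-split rather than the 'portuguese' configuration.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each lowercase word to the set of row ids whose text contains it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for word in text.lower().split():
            index[word].add(row_id)
    return index


def lookup_all(index, terms):
    """Return ids of rows containing every term (AND), like 'a & b & c'."""
    result = None
    # Visit the rarest terms first so the candidate set shrinks quickly.
    for term in sorted(terms, key=lambda t: len(index.get(t, ()))):
        postings = index.get(term)
        if not postings:
            return set()  # a term that appears nowhere: empty answer, no row is visited
        result = set(postings) if result is None else result & postings
        if not result:
            return set()
    return result or set()


rows = {
    1: "avenida treze de maio",
    2: "rua treze de maio",
    3: "avenida santos dumont",
}
idx = build_inverted_index(rows)
print(lookup_all(idx, ["avenida", "treze", "maio"]))  # {1}
print(lookup_all(idx, ["avenida", "inexistente"]))    # set()
```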

4 - Populate the new column for the rows already in the table:

UPDATE enderecos SET sv_nome = to_tsvector('portuguese', COALESCE(nome, ''));

5 - Finally, your query can be written as follows:

SELECT
    gid,
    bairro_esquerda,
    cep_direita,
    cep_esquerda,
    cidade,
    estado_nome,
    estado_sigla,
    latitude_fim,
    latitude_inicio,
    longitude_fim,
    longitude_inicio,
    nivel_detalhamento,
    nome,
    num_fim_direita,
    num_fim_esquerda,
    num_inicio_direita,
    num_inicio_esquerda 
FROM
    enderecos
WHERE
    -- FTS match on the indexed column (replaces "nome like ...").
    sv_nome @@ to_tsquery('portuguese', 'avenida&treze&de&maio')
    AND estado_sigla = 'CE'
    AND lower(unaccent(cidade)) = 'fortaleza'
    AND (
        cast(num_inicio_esquerda as integer) <= 1116
        AND cast(num_fim_esquerda as integer) >= 1116
        OR cast(num_fim_esquerda as integer) <= 1116
        AND cast(num_inicio_esquerda as integer) >= 1116
        OR cast(num_inicio_direita as integer) <= 1116
        AND cast(num_fim_direita as integer) >= 1116
        OR cast(num_fim_direita as integer) <= 1116
        AND cast(num_inicio_direita as integer) >= 1116
    )
LIMIT 1
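The eight comparisons in the last parenthesized condition test, for each side of the street, whether 1116 lies between the start and end numbers, whichever of the two is larger (AND binds tighter than OR, so the pairs group as intended). For non-NULL values, each side reduces to min(start, end) <= n <= max(start, end), which PostgreSQL could express with LEAST and GREATEST over the casted columns. The Python sketch below uses hypothetical function names and exhaustively checks that the four-branch form and the min/max form agree on small integers, including reversed ranges.

```python
from itertools import product


def original_predicate(ini_esq, fim_esq, ini_dir, fim_dir, n):
    """Direct transcription of the query's OR of four AND pairs."""
    return (
        (ini_esq <= n and fim_esq >= n)
        or (fim_esq <= n and ini_esq >= n)
        or (ini_dir <= n and fim_dir >= n)
        or (fim_dir <= n and ini_dir >= n)
    )


def simplified_predicate(ini_esq, fim_esq, ini_dir, fim_dir, n):
    """Same test expressed as 'n lies within the range of either side'."""
    return (min(ini_esq, fim_esq) <= n <= max(ini_esq, fim_esq)
            or min(ini_dir, fim_dir) <= n <= max(ini_dir, fim_dir))


# Compare both forms on every combination of small values (6**5 cases).
values = range(0, 6)
assert all(
    original_predicate(*args) == simplified_predicate(*args)
    for args in product(values, repeat=5)
)
print(simplified_predicate(1100, 1200, 1101, 1199, 1116))  # True
print(simplified_predicate(1200, 1100, 0, 0, 1116))        # True: reversed left-side range
```

NULL bounds are deliberately left out: the original comparisons yield NULL for them, while PostgreSQL's LEAST and GREATEST ignore NULL arguments, so the two SQL forms can differ on such rows.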

I hope I helped!
