Django/Python primary key generator with low collision risk and good performance

GENERATING APPLICATION-WIDE UNIQUE PRIMARY KEYS

I would like tips on how to generate primary keys that have a low risk of collision and do not degrade the performance of PostgreSQL operations.

The problem:

Suppose I have three models: Customer, Product and User. In a given application I would like to combine all these models in the same data table and be able to search by primary key, but for that every record must have a unique identifier. I had already thought of using UUIDs, but they are 128 bits long, and that ends up hurting PostgreSQL's performance. That's why I'm looking for an idea that's better than the one I already had. I don't need a universally unique identifier, only one that is unique within the application.

I thought of using the following, which has a low probability of collision:

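import uuid
import base64

def get_pk():
    # Base64-encode the 16 bytes of a random UUID4 and drop the "==" padding,
    # producing a 22-character string.
    return base64.b64encode(uuid.uuid4().bytes).decode("ascii").rstrip("=")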
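Note that the string produced by get_pk() is just the UUID's 16 random bytes re-encoded as 22 Base64 characters, so it carries the same 128 bits as the UUID, only spelled out in more bytes. A minimal check, assuming the padding-stripping variant above (pk_to_uuid is a hypothetical helper added for illustration), decodes the string back to the original UUID:

```python
import base64
import uuid


def get_pk() -> str:
    # Same idea as in the question: Base64-encode the 16 bytes of a UUID4
    # and strip the two "=" padding characters.
    return base64.b64encode(uuid.uuid4().bytes).decode("ascii").rstrip("=")


def pk_to_uuid(pk: str) -> uuid.UUID:
    # Hypothetical inverse: restore the "==" padding and decode back to 16 bytes.
    return uuid.UUID(bytes=base64.b64decode(pk + "=="))


pk = get_pk()
print(len(pk))          # 22
print(pk_to_uuid(pk))   # the original version-4 UUID, i.e. still 128 bits
```

Because the encoding is reversible, it cannot be smaller than the raw 16 bytes; it only changes the representation.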

Thanks in advance for any tips!

  • Which version of PostgreSQL are you using?

  • Does the SERIAL pseudo-type meet your requirement?

  • @Danizavtz I use version 12

  • @Augustovasques It doesn't, because I already use SERIAL in each table separately. When I combine the tables, collisions occur. Of course I could define different value ranges for each table, but I'm looking for a more Pythonic way to solve the problem.

  • See if this example helps: http://sqlfiddle.com/#!17/163f5/2 It uses a single sequence to create the keys, so every insert generates a new key regardless of the table.

  • Good alternative, @Augustovasques. I'll evaluate the options with regard to safety, performance and likelihood of collisions. Thank you very much for your help.
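The range idea from the comments can also be done without reserving fixed ranges: combine each table's own SERIAL value with a small per-table tag. The sketch below is a hypothetical illustration (the table names, TABLE_TAGS, TAG_BITS, combined_key and split_key are assumptions, not part of the question). It first shows how separate per-table counters collide once rows are mixed, then packs the tag into the low bits of the key so the combined keys stay unique:

```python
from __future__ import annotations

from itertools import count

# Hypothetical tag per model; 2 bits are enough for up to 4 tables.
TABLE_TAGS = {"customer": 0, "product": 1, "user": 2}
TAG_BITS = 2


def combined_key(table: str, serial_id: int) -> int:
    # Shift the per-table id left and put the table tag in the low bits,
    # so each (table, serial_id) pair maps to exactly one integer.
    return (serial_id << TAG_BITS) | TABLE_TAGS[table]


def split_key(key: int) -> tuple[str, int]:
    # Inverse of combined_key: recover the table name and the per-table id.
    tag = key & ((1 << TAG_BITS) - 1)
    table = next(name for name, t in TABLE_TAGS.items() if t == tag)
    return table, key >> TAG_BITS


# Simulate three tables, each with its own SERIAL-like counter starting at 1.
counters = {name: count(1) for name in TABLE_TAGS}
rows = [(name, next(counters[name])) for name in TABLE_TAGS for _ in range(3)]

raw_ids = [serial_id for _, serial_id in rows]
keys = [combined_key(name, serial_id) for name, serial_id in rows]

print(len(raw_ids) == len(set(raw_ids)))  # False: plain ids repeat across tables
print(len(keys) == len(set(keys)))        # True: tagged keys are unique
print(split_key(keys[4]))                 # ('product', 2)
```

Compared with fixed ranges, this avoids guessing range sizes up front, at the cost of a couple of bits of the integer range; a bigint column leaves plenty of room.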

2 answers

An alternative is to solve the problem in SQL: define a new sequence generator with CREATE SEQUENCE and share it among the tables. Each insert, whichever table it targets, then gets a new unique primary key from nextval(), which avoids key collisions.

-- Create a new sequence generator starting at 1.
CREATE SEQUENCE chave_geral START 1;

CREATE TABLE Cliente (
    /* On each insert into the Cliente table, the chave_geral sequence generator
    is incremented and its value is used for the pKey column. */
    pKey integer DEFAULT nextval('chave_geral') NOT NULL,
    nome CHARACTER(10)
);

CREATE TABLE Produto (
    /* On each insert into the Produto table, the chave_geral sequence generator
    is incremented and its value is used for the pKey column. */
    pKey integer DEFAULT nextval('chave_geral') NOT NULL,
    nome CHARACTER(10)
);

CREATE TABLE Usuario (
    /* On each insert into the Usuario table, the chave_geral sequence generator
    is incremented and its value is used for the pKey column. */
    pKey integer DEFAULT nextval('chave_geral') NOT NULL,
    nome CHARACTER(10)
);

INSERT INTO Produto(nome) VALUES('Produto1');
INSERT INTO Usuario(nome) VALUES('Usuário1');
INSERT INTO Cliente(nome) VALUES('Cliente1');

SELECT C.pkey AS ChaveCliente,
       P.pkey AS ChaveProduto,
       U.pkey AS ChaveUsuário
FROM Cliente C, Produto P, Usuario U;

Test the example in SQL Fiddle.
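To see why a shared sequence avoids collisions, the same idea can be mimicked in plain Python: every table draws from one counter, so no two rows anywhere receive the same key. This is only an in-memory analogy of nextval('chave_geral') (the SharedSequence class is hypothetical); in a real application PostgreSQL's sequence handles persistence and concurrency across connections.

```python
import threading
from itertools import count


class SharedSequence:
    """Hypothetical in-memory stand-in for a sequence shared by several tables."""

    def __init__(self, start: int = 1) -> None:
        self._counter = count(start)
        self._lock = threading.Lock()

    def nextval(self) -> int:
        # The lock keeps concurrent callers from receiving the same value.
        with self._lock:
            return next(self._counter)


chave_geral = SharedSequence(start=1)
tables = {"Cliente": [], "Produto": [], "Usuario": []}

# Insert in the same order as the SQL example: Produto, Usuario, Cliente.
for name in ("Produto", "Usuario", "Cliente"):
    tables[name].append(chave_geral.nextval())

print(tables)  # {'Cliente': [3], 'Produto': [1], 'Usuario': [2]}
```

The keys match what the SQL SELECT returns for ChaveCliente, ChaveProduto and ChaveUsuário.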

The performance of this approach is equivalent to using the SERIAL pseudo-type:

CREATE TABLE tablename (
    colname SERIAL
);

because SERIAL is PostgreSQL-specific syntactic sugar that creates a sequence generator and assigns it to the specified column. The declaration above is equivalent to:

CREATE SEQUENCE tablename_colname_seq AS integer;
CREATE TABLE tablename (
    colname integer NOT NULL DEFAULT nextval('tablename_colname_seq')
);
ALTER SEQUENCE tablename_colname_seq OWNED BY tablename.colname;

Hashing a UUID generates a much smaller key with a very high probability of not colliding:

import uuid
import base64

def get_pk():
    pk = hash(uuid.uuid4())
    return pk

Output example:

>>> print(get_pk())
749691330465060769
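Where the smaller key comes from: in CPython, uuid.UUID.__hash__ returns hash(self.int), and hashing a non-negative integer reduces it modulo the prime exposed as sys.hash_info.modulus (2**61 - 1 on 64-bit builds). The 122 random bits of a UUID4 are therefore folded into roughly 61 bits. A quick check, assuming a 64-bit CPython:

```python
import sys
import uuid

u = uuid.uuid4()
modulus = sys.hash_info.modulus  # 2**61 - 1 on 64-bit CPython

# UUID hashing delegates to the hash of its 128-bit integer value...
print(hash(u) == hash(u.int))      # True
# ...which, for a non-negative int, is the value reduced modulo the prime.
print(hash(u) == u.int % modulus)  # True
# The result is non-negative and fits in a PostgreSQL bigint.
print(0 <= hash(u) < 2**63)        # True
```

Other Python implementations may hash differently, so keys generated this way are tied to CPython's hashing scheme.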

Note: size difference of the Python objects in memory

>>> import sys
>>> sys.getsizeof(get_pk())
32
>>> sys.getsizeof(uuid.uuid4())
56
>>> sys.getsizeof(str(base64.b64encode(uuid.uuid4().bytes))[2:-3])
71
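sys.getsizeof measures Python objects, not how a key is stored in PostgreSQL (where a uuid column takes 16 bytes and a bigint 8). To weigh the collision risk instead, the usual birthday-bound approximation p ≈ 1 − exp(−n²/(2N)) can be applied, with N the number of possible keys: about 2**61 for the hashed key and 2**122 for the random bits of a UUID4. A rough sketch (the key counts are arbitrary examples, and collision_probability is a hypothetical helper):

```python
import math


def collision_probability(n_keys: int, key_space: int) -> float:
    # Birthday-bound approximation: p ≈ 1 - exp(-n^2 / (2N)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-(n_keys ** 2) / (2 * key_space))


HASHED_UUID_SPACE = 2 ** 61 - 1  # range of hash(uuid4()) on 64-bit CPython
UUID4_SPACE = 2 ** 122           # random bits in a version-4 UUID

for n in (10 ** 6, 10 ** 8, 10 ** 9):
    hashed = collision_probability(n, HASHED_UUID_SPACE)
    full = collision_probability(n, UUID4_SPACE)
    print(f"{n:>13,} keys: hashed ~{hashed:.2e}, uuid4 ~{full:.2e}")
```

By this estimate, about 10**9 hashed keys give roughly a 20% chance of at least one collision, whereas the UUID4 figure stays far below 1e-18.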

  • Thanks for the tip, Paulo Marques! I'll wait to see if anyone has a better solution, but this method is quite interesting. My problem is simple to solve, but it doesn't hurt to look for new solutions.

  • To those who downvoted: could you tell me why, so I can improve the answer?
