Calculate how many times the values are repeated inside a dictionary KEY - PYTHON

0

I have a dictionary built from the result of a database SELECT, and I need to generate a metric from that dictionary.

Dictionary

# build the dictionary from the database values
lds_data = {}
for lds_item in db_result:
    lds_data.update({lds_item[1] : {'code_client' : lds_item[0], 'city' : lds_item[2]}})

Dictionary output:

u'BRASIL_ALIMEN': {'code_client': u'BRA', 'city': u'SAO PAULO'},
u'BRASIL_CARROS': {'code_client': u'BRC', 'city': u'PARANA'}

Example of metric:

code_client: BRA appears 1x within the dictionary

In short:

I need to count how many times each value under the KEY = *code_client* is repeated.

I tried to do it this way:

ct = {}
for key in lds_data:
    ct['code_client'] = len(lds_data[key]['code_client'])
  • Have you tried using a Counter to do the counting?

  • This is the only thing I managed to do: print(len(lds_data.keys()))

  • Take a look at this documentation: https://docs.python.org/2/library/collections.html#collections.Counter

  • for item in lds_data: clients = Counter(lds_data[item]['code_client']) print(clients) I already tested this, but to no avail...

  • If you understand English, take a look at this question: https://stackoverflow.com/questions/17705829/count-repeated-keys-in-a-dict

  • I’ll take a look, thanks!

  • 1

    Luis, judging by your last questions here on the site, I think you need to study SQL. You can get the repetition count directly from the database, which is much more performant. Unless you're learning Python and are avoiding SQL on purpose.

  • Unfortunately I can't use SQL; I have to do it in Python on the back end... It would be much faster to do it directly in the database, but unfortunately I can't. Thanks for the tip, though!

  • But where does db_result come from?

  • A DB2 database

  • 2

    That's not what I meant... db_result is the result of a query that Python runs against a database... Can't you modify that query?

  • Ah ok, sorry, I hadn't understood. I can't run a different query from the one I was given; I can only work on the script.

  • @Luisv. I answered the question with 3 ways to count occurrences in an iterable. If you have any questions, just ask.


2 answers

3


To count the code_client values of your records using Python you can use:

  1. the class collections.Counter
  2. a plain dictionary
  3. the class collections.defaultdict
  4. ...some other solution I don't know of...

For the following examples, I will use a sequence of tuples to simulate the result of a SELECT in the database. The fictitious data is:

dados = (
    ('BRA', 'BRASIL_ALIMEN', 'SAO_PAULO'),
    ('BRA', 'BRASIL_CARROS', 'PARANA'),
    ('BRA', 'BRASIL_NAVIOS', 'PARAIBA'),
    ('CAN', 'CANADA_ALIMEN', 'ALBERTA'),
    ('USA', 'USA_CARROS', 'MASSACHUSSETS'),
    ('USA', 'USA_NAVIOS', 'CALIFORNIA'),
    ('UK', 'UK_NAVIOS', 'YORK'),
)

In the following examples, I will count the occurrences of the first element of each tuple, such as BRA, CAN, etc.


Necessary knowledge

In the examples in this answer I use list comprehensions* and iterable unpacking (see PEP 3132 for more information).

But for the sake of clarity, here is a brief demonstration of how I use them in the answers below:

# create a regular list
lista = [1, 2, 3, 4, 5]

# Use a list comprehension to create another list
list_comprehension = [-x for x in lista]
# list_comprehension = [-1, -2, -3, -4, -5]

# Use iterable unpacking to "break the list into pieces"
um, dois, *restante = lista
# um = 1
# dois = 2
# restante = [3, 4, 5]

With this information, I believe the code below will be easy to follow.

* Strictly speaking, the examples use generator expressions, but generator expressions, dict comprehensions and their variations are much easier to understand once you understand list comprehensions.
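To make the footnote concrete, here is a minimal sketch of the difference between a list comprehension and a generator expression. The variable names are only illustrative.

```python
lista = [1, 2, 3, 4, 5]

# A list comprehension builds the whole list immediately
as_list = [-x for x in lista]

# A generator expression produces the values lazily, one at a time
as_generator = (-x for x in lista)

first = next(as_generator)       # consumes the first value
remaining = list(as_generator)   # consumes whatever is left
```

When a generator expression is the only argument of a call, as in Counter(x for x in ...), the extra pair of parentheses can be omitted.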


1. Using collections.Counter (#Docs)

The class Counter is a subclass of dict, the standard Python dictionary, that serves as a counter for hashable objects.

We can create a counter from any iterable, such as a list, tuple or, for example, a string:

from collections import Counter

contador = Counter("abracadabra")
# Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})

That way we can make Counter count only the first element of each tuple with the following code:

from collections import Counter

contador = Counter(cod_cliente for cod_cliente, *_ in dados)
# contador = Counter({'BRA': 3, 'USA': 2, 'CAN': 1, 'UK': 1})

Recall that:

[cod_cliente for cod_cliente, *_ in dados]
# ['BRA', 'BRA', 'BRA', 'CAN', 'USA', 'USA', 'UK']
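The same idea should carry over to the dictionary from the question. Below is a minimal sketch, assuming lds_data has the shape shown there; the third entry is made up here so that one code repeats.

```python
from collections import Counter

# Sample data in the shape shown in the question (third entry is hypothetical)
lds_data = {
    'BRASIL_ALIMEN': {'code_client': 'BRA', 'city': 'SAO PAULO'},
    'BRASIL_CARROS': {'code_client': 'BRC', 'city': 'PARANA'},
    'BRASIL_NAVIOS': {'code_client': 'BRA', 'city': 'PARAIBA'},
}

# Feed Counter one code_client per record, not a single string
contador = Counter(item['code_client'] for item in lds_data.values())

# most_common() yields (value, count) pairs, highest count first
for code_client, total in contador.most_common():
    print(f"code_client: {code_client} appears {total}x within the dictionary")
```

Note that passing a single string such as 'BRA' to Counter counts its characters, which is why the generator over all records is needed.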

2. Using dict (#Docs)

We can use a plain dictionary to accumulate the number of occurrences as we iterate over dados.

For this we only have to handle the case where the key does not yet exist in the dictionary, because if we try to read a non-existent key, dict.__getitem__ raises the exception KeyError (a plain dict does not define __missing__; only subclasses can provide it). Examples:

dicionario = {'teste': 10}

# 1) Update an existing key (OK)
dicionario['teste'] = dicionario['teste'] + 1
# dicionario = {'teste': 11}

# 2) Update a non-existent key (error)
dicionario["outra-chave"] = dicionario["outra-chave"] + 1
#                           ^^^^^^^^^^^^^^^^^^^^^^^^^
# KeyError: 'outra-chave' does not exist in 'dicionario'

# 3) Test before using the key (OK)
if 'outra-chave' not in dicionario:
    dicionario["outra-chave"] = 0

dicionario["outra-chave"] += 1
# dicionario = {'teste': 11, 'outra-chave': 1}

You could also catch the exception KeyError to handle these cases. Example:

dicionario = {}
try:
    dicionario['teste'] += 1
except KeyError:
    dicionario['teste'] = 1

However, dictionaries have the method get, called as dict.get(key, default), where key is the dictionary key you want to read and default is the value returned if that key does not exist.

In our case, we want to add 1 to the current value of the key, but if the key does not exist we want the current value to be treated as 0. See it in practice:

dicionario = {}

dicionario['teste'] += 1
# KeyError
dicionario['teste'] = dicionario['teste'] + 1
# KeyError

dicionario['teste'] = dicionario.get('teste', 0) + 1
# dicionario = {'teste': 1}

Thus, if the key does not yet exist, it creates the new key with the appropriate value.

The final code would be:

contador = {}

for code_client, *_ in dados:
    contador[code_client] = contador.get(code_client, 0) + 1

# contador = {'BRA': 3, 'CAN': 1, 'USA': 2, 'UK': 1}

3. Using defaultdict (#Docs)

Just like the collections.Counter mentioned earlier, defaultdict is also a subclass of dict.

The class defaultdict has an attribute default_factory, which must be a callable object or None.

By default, when a non-existent key is accessed, dict.__getitem__ raises the exception KeyError. If a subclass of dict defines the method __missing__, dict.__getitem__ calls it instead. defaultdict defines __missing__ to invoke defaultdict.default_factory, store its return value under the missing key, and return it.

A summary:

dicionario = {}
valor = dicionario['teste']
# 1. calls dicionario.__getitem__('teste')
# 2. the key does not exist, and a plain dict has no __missing__ method
# 3. dict.__getitem__ raises a KeyError
# KeyError

Now defaultdict:

from collections import defaultdict

# function that will be the defaultdict's 'default_factory'
def valor_padrao():
    return "Valor padrão"

dicionario = defaultdict(valor_padrao)
valor = dicionario['teste']
# 1. calls dicionario.__getitem__('teste')
# 2. the key does not exist, so it calls dicionario.__missing__('teste')
# 3. defaultdict.default_factory is callable, so __missing__ calls it
# 4. dicionario['teste'] = dicionario.default_factory()
# 5. valor = dicionario['teste']
# valor = 'Valor padrão'

If defaultdict.default_factory is None, defaultdict behaves the same way as dict and raises a KeyError for non-existent keys.
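To illustrate the __missing__ mechanism, here is a rough sketch of a dict subclass that behaves like defaultdict(int) for counting. ZeroDict is a hypothetical name used only for this illustration; it is not part of the standard library.

```python
class ZeroDict(dict):
    """Hypothetical dict subclass that mimics defaultdict(int)."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is absent
        self[key] = 0
        return 0


contador = ZeroDict()
contador['BRA'] += 1   # __missing__ supplies 0, then 1 is stored
contador['BRA'] += 1   # the key now exists, so __missing__ is not called

# A plain dict does not define this method at all
plain_has_hook = hasattr(dict, '__missing__')
```

In practice defaultdict already does this, so a custom subclass like the one above is only useful for understanding how it works.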

For our final code, just create a function that returns zero and use it as default_factory. For our convenience the function int, if invoked without parameters, returns zero. Then the final code using defaultdict would be:

contador = defaultdict(int)

for code_client, *_ in dados:
    contador[code_client] += 1

# contador = defaultdict(<class 'int'>, {'BRA': 3, 'CAN': 1, 'USA': 2, 'UK': 1})
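As a sanity check, the sketch below runs the three approaches over the same fictitious data and compares the results. The names with_counter, with_dict and with_defaultdict are just illustrative.

```python
from collections import Counter, defaultdict

dados = (
    ('BRA', 'BRASIL_ALIMEN', 'SAO_PAULO'),
    ('BRA', 'BRASIL_CARROS', 'PARANA'),
    ('BRA', 'BRASIL_NAVIOS', 'PARAIBA'),
    ('CAN', 'CANADA_ALIMEN', 'ALBERTA'),
    ('USA', 'USA_CARROS', 'MASSACHUSSETS'),
    ('USA', 'USA_NAVIOS', 'CALIFORNIA'),
    ('UK', 'UK_NAVIOS', 'YORK'),
)

# 1. collections.Counter
with_counter = Counter(cod for cod, *_ in dados)

# 2. plain dict with dict.get
with_dict = {}
for cod, *_ in dados:
    with_dict[cod] = with_dict.get(cod, 0) + 1

# 3. collections.defaultdict
with_defaultdict = defaultdict(int)
for cod, *_ in dados:
    with_defaultdict[cod] += 1

# Convert to plain dicts so the comparison looks only at keys and counts
same = dict(with_counter) == with_dict == dict(with_defaultdict)
```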

These are 3 ways to count the repetitions in the data you receive from your database. A note for future visitors, though: using GROUP BY and COUNT in the query itself is much more performant.

I created this Repl.it with the 3 examples running, in case anyone is interested.
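For visitors who are able to change the query, the counting could look roughly like the sketch below. It uses an in-memory SQLite database as a stand-in for the real one (the question mentions DB2), and the table and column names (clients, code_client, name, city) are assumptions made for this example.

```python
import sqlite3

# In-memory SQLite database standing in for the real one; the table and
# column names are hypothetical
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE clients (code_client TEXT, name TEXT, city TEXT)')
conn.executemany(
    'INSERT INTO clients VALUES (?, ?, ?)',
    [
        ('BRA', 'BRASIL_ALIMEN', 'SAO_PAULO'),
        ('BRA', 'BRASIL_CARROS', 'PARANA'),
        ('CAN', 'CANADA_ALIMEN', 'ALBERTA'),
    ],
)

# The database does the counting; Python only receives (code, total) pairs
rows = conn.execute(
    'SELECT code_client, COUNT(*) FROM clients '
    'GROUP BY code_client ORDER BY code_client'
).fetchall()
contador = dict(rows)
conn.close()
```

The exact SQL dialect and connection code will differ for other databases, but GROUP BY with COUNT(*) is standard SQL.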

0

I ended up asking on the English-language site; the solution is below:

Solution

ct = {}
for _, value in lds_data.items():
    if value['code_client'] in ct:
        ct[value['code_client']] += 1
    else:
        ct[value['code_client']] = 1
  • 1

    If you’re going to ignore the dictionary keys, you could just do for value in lds_data.values()
