Averaging values per id in a list of dictionaries

I have the following list of dictionaries in Python 3.7:

a = [
    {'linha': 0,  'porcentagem': 1.0,   'id': 3,  'nome': 'bruno'},
    {'linha': 8,  'porcentagem': 1.0,   'id': 7,  'nome': 'teste'},
    {'linha': 12, 'porcentagem': 1.0,   'id': 8,  'nome': 'testerino'},
    {'linha': 18, 'porcentagem': 1.0,   'id': 9,  'nome': 'joão'}, 
    {'linha': 7,  'porcentagem': 0.624, 'id': 3,  'nome': 'bruno'},
    {'linha': 23, 'porcentagem': 0.624, 'id': 10, 'nome': 'mais um teste'},
    {'linha': 2,  'porcentagem': 0.439, 'id': 3,  'nome': 'bruno'},
    {'linha': 10, 'porcentagem': 0.439, 'id': 7,  'nome': 'teste'},
    {'linha': 13, 'porcentagem': 0.439, 'id': 8,  'nome': 'testerino'},
    {'linha': 19, 'porcentagem': 0.439, 'id': 9,  'nome': 'joão'},
    {'linha': 1,  'porcentagem': 0.418, 'id': 3,  'nome': 'bruno'},
    {'linha': 9,  'porcentagem': 0.418, 'id': 7,  'nome': 'teste'},
    {'linha': 15, 'porcentagem': 0.418, 'id': 8,  'nome': 'testerino'},
    {'linha': 20, 'porcentagem': 0.418, 'id': 9,  'nome': 'joão'},
    {'linha': 5,  'porcentagem': 0.294, 'id': 3,  'nome': 'bruno'},
    {'linha': 17, 'porcentagem': 0.294, 'id': 8,  'nome': 'testerino'},
    {'linha': 6,  'porcentagem': 0.277, 'id': 3,  'nome': 'bruno'},
    {'linha': 22, 'porcentagem': 0.277, 'id': 9,  'nome': 'joão'}
]

I would like to get the following output:

[
    {'linha': 6,  'porcentagem': 0.509, 'id': 3,  'nome': 'bruno'},
    {'linha': 9,  'porcentagem': 0.619, 'id': 7,  'nome': 'teste'},
    {'linha': 17, 'porcentagem': 0.537, 'id': 8,  'nome': 'testerino'},
    {'linha': 22, 'porcentagem': 0.534, 'id': 9,  'nome': 'joão'},
    {'linha': 23, 'porcentagem': 0.624, 'id': 10, 'nome': 'mais um teste'}
]

Note that linha is not relevant here. What I really need is id, nome and porcentagem.

I got a similar result with the following code:

b = list({r['id']: r for r in a}.values())

However, this simply keeps the last occurrence of each id. What I actually need is the average of all the porcentagem values for each id. For example, the values for bruno (id 3) are [1.0, 0.624, 0.439, 0.418, 0.294, 0.277], whose average is 0.509, and that is the value that should appear in the new list. In other words, I want to collapse the repeated elements into one entry per id that carries the average.
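As a quick check of the arithmetic above, here is a minimal sketch using only the standard library. The names `bruno_values` and `expected` are illustrative only.

```python
from statistics import mean

# porcentagem values of every row with id 3 (bruno), copied from the list above
bruno_values = [1.0, 0.624, 0.439, 0.418, 0.294, 0.277]

# Average of all occurrences, rounded to three decimals as in the desired output
expected = round(mean(bruno_values), 3)
print(expected)  # 0.509
```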

1 answer

This problem is easy to solve with pandas, a Python library for working with DataFrames.

import pandas as pd


a = [
    {'linha': 0,  'porcentagem': 1.0,   'id': 3,  'nome': 'bruno'},
    {'linha': 8,  'porcentagem': 1.0,   'id': 7,  'nome': 'teste'},
    ...
    {'linha': 22, 'porcentagem': 0.277, 'id': 9,  'nome': 'joão'}
]

# Convert the data in `a` into a DataFrame
df = pd.DataFrame(data=a)

>>> print(df)
    id  linha           nome  porcentagem
0    3      0          bruno        1.000
1    7      8          teste        1.000
...
17   9     22           joão        0.277

Then group the data by the nome and id columns (groupby) and aggregate the porcentagem column with its mean:

df_gb = df.groupby(['nome', 'id']).agg({'porcentagem':'mean'})

>>> print(df_gb)
                  porcentagem
nome          id             
bruno         3      0.508667
joão          9      0.533500
mais um teste 10     0.624000
teste         7      0.619000
testerino     8      0.537750
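If pulling in pandas is not an option, the same grouping can probably be done with the standard library alone. The sketch below is an alternative to the approach above, not part of it. The helper name `average_by_id`, the `ndigits` parameter and the rounding to three decimals are assumptions made for illustration, and it assumes every row with the same id carries the same nome.

```python
from collections import defaultdict
from statistics import mean


def average_by_id(rows, ndigits=3):
    """Return one dict per id with the mean of its 'porcentagem' values.

    Hypothetical helper: assumes rows sharing an id also share a nome.
    """
    values = defaultdict(list)  # id -> every porcentagem seen for that id
    names = {}                  # id -> nome (first occurrence wins)
    for row in rows:
        values[row['id']].append(row['porcentagem'])
        names.setdefault(row['id'], row['nome'])
    return [
        {'id': key, 'nome': names[key], 'porcentagem': round(mean(vals), ndigits)}
        for key, vals in values.items()
    ]


# Shortened sample: only the rows for ids 7 and 10 from the question
sample = [
    {'linha': 8,  'porcentagem': 1.0,   'id': 7,  'nome': 'teste'},
    {'linha': 23, 'porcentagem': 0.624, 'id': 10, 'nome': 'mais um teste'},
    {'linha': 10, 'porcentagem': 0.439, 'id': 7,  'nome': 'teste'},
    {'linha': 9,  'porcentagem': 0.418, 'id': 7,  'nome': 'teste'},
]
result = average_by_id(sample)
print(result)
```

The output keeps the order in which each id first appears in the input, whereas the groupby result above is sorted by nome.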
  • How do I convert these values back to a list of dictionaries? I used new_df_gb = df_gb.to_dict(), but it creates a single dictionary with the key porcentagem, and inside it another dictionary whose keys are tuples of nome and id and whose values are the percentages. This has already helped me move forward, but I wonder if there is a way to convert it directly.

  • @bruno101, to get a list of dictionaries like the one in your question, starting from the example in the answer, you can simply add this at the end: df_gb.reset_index().round({'porcentagem': 3}).to_dict(orient='records'), and you get [{'nome': 'bruno', 'porcentagem': 0.509, 'id': 3}, {'nome': 'joão', 'porcentagem': 0.534, 'id': 9}, {'nome': 'mais um teste', 'porcentagem': 0.624, 'id': 10}, {'nome': 'teste', 'porcentagem': 0.619, 'id': 7}, {'nome': 'testerino', 'porcentagem': 0.538, 'id': 8}]

  • @Miguel and @Alexciuffa, it worked great. I had heard of pandas but had barely used it. Thanks for the help, I'll look into this library more closely.
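For anyone who ends up with the nested result of a plain to_dict() call, as described in the first comment, it can also be flattened by hand. This is only a minimal sketch: the `nested` literal is typed out from the printed output in the answer rather than produced by pandas, and the name `records` is illustrative.

```python
# Shape described in the first comment: one key per column, and under it
# a dict keyed by (nome, id) tuples. Values copied from the answer's output.
nested = {
    'porcentagem': {
        ('bruno', 3): 0.508667,
        ('joão', 9): 0.5335,
        ('mais um teste', 10): 0.624,
        ('teste', 7): 0.619,
        ('testerino', 8): 0.53775,
    }
}

# Unpack each (nome, id) key into its own fields to get a list of dicts
records = [
    {'nome': nome, 'id': id_, 'porcentagem': round(value, 3)}
    for (nome, id_), value in nested['porcentagem'].items()
]
print(records[0])  # {'nome': 'bruno', 'id': 3, 'porcentagem': 0.509}
```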
