3
I have the following file on CSV (12 Millions of records):
UF Municipio Cod NIS Valor Data
MA IMPERATRIZ 803 16361947271 45.00 01/01/2011
MA IMPERATRIZ 803 74629273937 15.00 01/01/2011
BA RUY BARBOSA 3845 16481166579 50.00 01/02/2011
BA RUY BARBOSA 3845 16481166579 50.00 01/03/2011
MG IPATINGA 653 73639474937 10.00 01/03/2011
MG IPATINGA 653 83733638376 20.00 01/03/2011
MG IPATINGA 653 52648747648 25.00 01/03/2011
...
I need to group the data by Date, UF and Municipality, calculating the amount of NIS and adding up the values. That is, for each group of Date, UF and Municipality, need to count the amount of NIS and add the values. For the above data, the desired result would be:
Data UF Municipio Quant. Valor
01/01/2011 MA IMPERATRIZ 002 60.00
01/02/2011 BA RUY BARBOSA 001 50.00
01/03/2011 BA RUY BARBOSA 001 50.00
01/03/2011 MG IPATINGA 003 55.00
...
This result should generate a new file CSV.
To add or count the values I use the codes below (which work):
Conta_NIS = csvPanda.groupby(['Data', 'UF', 'Municipio']).NIS.count()
Soma_Valor = csvPanda.groupby(['Data', 'UF', 'Municipio']).Valor.sum()
But how to include the two aggregations (count
and sum
) in the same output to export to a new file CSV?
Very grateful to all!
would serve in awk or perl?
– JJoao