I'm not sure what the format really looks like in hd5 (I researched it but couldn't figure it out). If it looks like what you posted, then instead of .split(',') as I do in the examples below, use .split('    ') (4 spaces). The csv format I used for testing is:
2016-01-01 00:00:00, NaN
2016-01-01 01:00:00, 22.445700
2016-01-01 02:00:00, 22.388300
2016-01-01 03:00:00, 22.400000
2016-01-01 04:00:00, NaN
2016-01-01 05:00:00, 22.133900
2016-01-01 06:00:00, 21.948999
2016-01-01 07:00:00, 21.787901
...
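If the data really is separated by four spaces rather than a comma, only the split step changes. A minimal sketch, assuming lines shaped like the made-up sample below (the four-space layout is a guess at your format, and `parse_line` is just an illustrative helper name):

```python
# Hypothetical sample: same kind of rows as above, but separated by four spaces
lines = [
    '2016-01-01 00:00:00    NaN',
    '2016-01-01 01:00:00    22.445700',
]

def parse_line(line, sep='    '):
    """Split one line into (timestamp, value) on the given separator."""
    timestamp, value = line.split(sep)
    return timestamp.strip(), value.strip()

dados = [parse_line(l) for l in lines]
print(dados)
```

The rest of the examples work unchanged once `dados` holds these (timestamp, value) pairs.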
With groupby you can do it like this:
from itertools import groupby

with open('tests.csv', 'r') as f:
    dados = [(l.split(',')[0], l.split(',')[1].strip()) for l in f]
print(dados) # [('2016-01-01 00:00:00', 'NaN'), ('2016-01-01 01:00:00', '22.445700'), ('2016-01-01 02:00:00', '22.388300'), ('2016-01-01 03:00:00', '22.400000'), ...]

# important: groupby only groups consecutive items, so sort by hour first
dados_sort = sorted((k.split()[1], v) for k, v in dados)
for hora, group in groupby(dados_sort, key=lambda x: x[0]):
    n_nan = sum(1 for k, v in group if v == 'NaN')  # count only the NaN entries
    if n_nan:
        print('There are {} NaN at time {}'.format(n_nan, hora))
Program output for the data you provided:
There are 2 NaN at time 00:00:00
There are 2 NaN at time 04:00:00
There are 1 NaN at time 09:00:00
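The sort before groupby matters because itertools.groupby only merges consecutive items with the same key. A small sketch with made-up hours (not your data) that shows the difference:

```python
from itertools import groupby

# Made-up hours, deliberately unsorted
horas = ['00:00:00', '04:00:00', '00:00:00']

# Without sorting, the two '00:00:00' entries end up in separate groups
unsorted_groups = [(k, len(list(g))) for k, g in groupby(horas)]

# After sorting, equal keys are adjacent and form a single group
sorted_groups = [(k, len(list(g))) for k, g in groupby(sorted(horas))]

print(unsorted_groups)
print(sorted_groups)
```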
But honestly, I would not use groupby in this case (unless I really had to). I would do this instead:
from collections import Counter

dados = {}
with open('tests.csv', 'r') as f:
    for l in f:
        hora, val = l.split(',') # timestamp and temperature; in your case you should already have this split per line
        dados.setdefault(val.strip(), []).append(hora.split(' ')[1])
print(dados) # {'22.388300': ['02:00:00'], '23.810600': ['03:00:00'], '21.610300': ['08:00:00'], '22.400000': ['03:00:00'], '21.948999': ['06:00:00'], 'NaN': ['00:00:00', '04:00:00', '09:00:00', '00:00:00', '04:00:00'], '22.910700': ['02:00:00'], '22.445700': ['01:00:00'], '21.787901': ['07:00:00'], '22.133900': ['05:00:00'], '21.310800': ['01:00:00']}
print(Counter(dados['NaN'])) # Counter({'00:00:00': 2, '04:00:00': 2, '09:00:00': 1})
Or, if you don't need to store all the values, you can simply do:
from collections import Counter

list_NaN = []
with open('tests.csv', 'r') as f:
    for l in f:
        hora, val = l.split(',')
        if val.strip() == 'NaN':  # keep only the hour of each NaN reading
            list_NaN.append(hora.split(' ')[1])
print(Counter(list_NaN)) # Counter({'00:00:00': 2, '04:00:00': 2, '09:00:00': 1})
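If the missing values are not always spelled exactly 'NaN' (for example 'nan'), one option is to convert each value to float and test it with math.isnan instead of comparing strings. A minimal sketch, assuming every value column parses as a float; the io.StringIO sample is a stand-in for the real file and `nan_por_hora` is just an illustrative name:

```python
import io
import math
from collections import Counter

# Stand-in for the file: a few made-up rows in the same format as tests.csv
sample = io.StringIO(
    '2016-01-01 00:00:00, NaN\n'
    '2016-01-01 01:00:00, 22.445700\n'
    '2016-01-02 00:00:00, nan\n'
)

nan_por_hora = Counter()
for l in sample:
    hora, val = l.split(',')
    # float() ignores surrounding whitespace and accepts 'NaN' in any letter case
    if math.isnan(float(val)):
        nan_por_hora[hora.split(' ')[1]] += 1

print(nan_por_hora)
```

This would raise ValueError on a value that is not numeric at all, so it only fits data where the column is known to be numbers or NaN.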
Can you put the file somewhere I can download it and test? I'm not familiar with hd5, but I think I can help you with groupby
– Miguel
It could be .csv, that's no problem. The problem is applying this to a df.
– Lucas Fagundes
Okay, I’m gonna go with csv and see if I can help you
– Miguel