I would like to suggest another approach. If the numbers are integer or with a few decimal places, many of them can be repeated, then we can count the number of occurrences of each number one with a dictionary. So we don’t use so much memory.
1) We read the file line by line and make a dictionary with the occurrences while maintaining a list with the ordered keys (we will use in the future)
Function that puts a number in a list at the ordered position:
def adiciona_na_ordem(lista, tamanho_lista, numero):
for i in range(0, tamanho_lista):
if numero < lista[i]:
break
else:
i+=1
return lista[:i]+[numero]+lista[i:]
We read line by line:
entrada = open('dados.txt', 'r')
primeiro_valor = int(entrada.readline())
ocorrencias = {primeiro_valor:1}
lista_chaves_ordenadas = [primeiro_valor]
num_linhas = 1
for linha in entrada:
numero = int(linha)
if numero in ocorrencias: #Se o numero ja esta no dicionario
ocorrencias[numero] += 1
else: #Se não está, adiciono em ocorrencias
ocorrencias[numero] = 1
#OU ocorrencias.update({numero:1})
lista_chaves_ordenadas = adiciona_na_ordem(lista_chaves_ordenadas, len(lista_chaves_ordenadas), numero)
num_linhas += 1
entrada.close()
Example of what the variables would look like:
para uma entrada:
7
7
1
4
4
5
6
7
>>> print (ocorrencias)
{7: 3, 1: 1, 4: 2, 5: 1, 6: 1}
>>> print (lista_chaves_ordenadas)
[1, 4, 5, 6, 7]
2) Now we calculate the median by going through the dictionary in order of ordered keys until the sum of occurrences reaches half
if num_linhas % 2 == 1: #Buscamos o elemento central
num_elementos = 0
for key in lista_chaves_ordenadas:
num_elementos += ocorrencias[key]
if num_elementos >= num_linhas/2:
print (key)
break
else: #media dos dois elementos centrais
num_elementos = 0
mediana = None
for key in lista_chaves_ordenadas:
num_elementos += ocorrencias[key]
#Prox 2 ifs se os valores medianos forem chaves diferentes, ex: 5 e 6
if num_elementos == num_linhas/2:
mediana = key
if num_elementos > num_linhas/2 and mediana != None:
mediana += key
print(mediana/2)
break
#Prox if se os valores medianos forem a mesma chave, ex: 6 e 6
if num_elementos > num_linhas/2 and mediana == None:
mediana = key
print(key)
break
Edit:
If you use pytho2.7, I think the dictionary is already sorted, so you don’t need the sorted keys_list, just do for key, value in ocorrencias.iteritems():
instead of for key in lista_chaves_ordenadas:
. But be careful with the divisions. Do /2.0
not to round to the whole
1.5mi is 1500 or 1.5 million?
– Woss
There are approximately 1.5 million lines.
– Daniel Augusto
Daniel, I took the liberty of editing your question and making it a little more direct. I could confirm for myself that I didn’t mistake the interpretation and changed the question?
– Woss
@Andersoncarloswoss did not miss the performance and got much better. Thank you!
– Daniel Augusto
Behold this code
– danieltakeshi