Finding the max/min of a list in Python

3

Hello, I need to write a Python program that reads a .csv text file containing literally 5 million numbers and tells me which is the smallest and which is the largest number in it. Now, the problems:

I used this code to open the list in python:

import csv

lista = open('lista.csv', 'r')
reader = csv.reader(lista)

for linha in reader:
    print(linha)

That works fine, but to show the largest and the smallest values, I wrote this:

import csv

lista = open('lista.csv', 'r')
reader = csv.reader(lista)

for linha in reader:
    print(linha)

    menor = min(linha)
    maior = max(linha)

    print(menor, maior)

The code runs, but the result is wrong: it reports the smallest value as null (empty) and the largest value as -83422495.2710933.

We have already tried splitting it up (one program for the largest number and one for the smallest), to no avail. We have also tried removing the 'for' loop, and that does not work either...

I would like to know whether there is another way to do this or whether we are missing something. Thank you very much.
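A possible explanation for the symptom, sketched under assumptions: `csv.reader` yields every field as a string, so `min()` and `max()` compare text character by character rather than numerically, and an empty field (for example after a trailing comma) sorts before everything else. The two-row sample and the helper names `text_min_max` and `numeric_min_max` below are hypothetical, and the real file may behave differently.

```python
import csv
import io

# Hypothetical sample; the trailing comma on the second row is an assumption.
sample = "10.5, -3.0, 9.25\n-200.0, 7.5,\n"


def text_min_max(row):
    """min/max on the raw strings, as in the question's loop."""
    return min(row), max(row)


def numeric_min_max(row):
    """Convert non-empty fields to float first, then compare numbers."""
    values = [float(field) for field in row if field.strip()]
    return min(values), max(values)


rows = list(csv.reader(io.StringIO(sample)))
for row in rows:
    # The text comparison picks '' as the minimum of the second row and
    # '-200.0' as its maximum, because '-' sorts after the leading space.
    print(row, "text:", text_min_max(row), "numeric:", numeric_min_max(row))
```

Note also that the loop in the question computes the minimum and maximum of each row separately, not of the whole file.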

  • 1

    I think the only thing missing is a sample of the CSV to see how the values are arranged: is it one value per line across 5 million lines, or 5 million numbers on a single line?

  • There are 9 or 10 columns and 5 million rows. I believe the program is reading all the numbers.

2 answers

1

You can use pandas. I created an example with only 39 random numbers (3 rows of 13 columns) to simulate your CSV, read this 'file' into a pandas DataFrame, and then print the minimum and maximum values.

import io
import pandas as pd

# Simulating the CSV
lista = '''
6848, 8453, 6877, 3508, 2071, 1962, 7274, 4901, 9369, 3498, 2138, 2504, 9948
6543, 7021, 260, 2392, 648, 9947, 6759, 3553, 3437, 4121, 2637, 8067, 9421 
6609, 5229, 1872, 2288, 8448, 9701, 1256, 4489, 7549, 2844, 4561, 3291, 5472 
'''

# Reading the CSV
df = pd.read_csv(io.StringIO(lista), header=None)

# Printing the result
print('Maximum value:', df.values.max())
print('Minimum value:', df.values.min())

Output:

Maximum value: 9948
Minimum value: 260

Notes:
1. I assumed that your CSV has no header row for the columns; if it has one, remove header=None from the CSV read command.
2. Other functions such as sum(), mean(), etc. are available in the same way if you want or need them.

Edit
To test whether the amount of data could be the problem, I created an example that builds a DataFrame with 6 million numbers taken from a randomly generated numpy array, then prints the minimum and maximum values and the mean of all values.

import numpy as np
import pandas as pd
qt = 6000000

# 1,000,000 rows x 6 columns of random integers in [0, qt)
df = pd.DataFrame(np.random.randint(0,qt,size=(1000000,6)))

print('Maximum value:', df.values.max())
print('Minimum value:', df.values.min())
print('Mean of the values:', df.values.mean())

Output:

Maximum value: 5999999
Minimum value: 2
Mean of the values: 2999119.789572667

Even without pandas, it is possible to extract the maximum and minimum values directly from a "pure" Python list with 5 million items:

lista = list(range(0, 5000000))
print('Maximum:', max(lista))
print('Minimum:', min(lista))

Output:

Maximum: 4999999
Minimum: 0
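
The pure-Python idea above can also be applied to a CSV file without loading everything into a list first. The following is only a minimal sketch using the standard `csv` module: the function name `csv_min_max` and the sample values are made up for illustration, and it assumes every non-empty field is a plain number (a header row or other text would make `float()` raise `ValueError`).

```python
import csv
import io


def csv_min_max(lines):
    """Return (smallest, largest) over every numeric field in a CSV stream."""
    smallest = largest = None
    for row in csv.reader(lines):
        for field in row:
            field = field.strip()
            if not field:  # skip empty fields, e.g. after a trailing comma
                continue
            value = float(field)  # compare numbers, not strings
            if smallest is None or value < smallest:
                smallest = value
            if largest is None or value > largest:
                largest = value
    return smallest, largest


# Small in-memory stand-in for the real file (values are made up).
sample = io.StringIO("4.5, -48.25, 30.0\n-22.0, 83.5,\n")
result = csv_min_max(sample)
print(result)  # (-48.25, 83.5)
```

For a real file, the same function could be called as `csv_min_max(open('lista.csv', newline=''))`, ideally inside a `with` block. It keeps only two running values in memory, so the file size should matter little.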
  • Thank you so much. A friend and I are working on it; we will think it over and come back to say whether it worked =)

  • The "way" is to reproduce my code in full; if it doesn't work, there is probably some problem with your CSV.

  • The CSV works fine; I believe my code is failing because of the quantity of numbers and the file size.

  • I find it unlikely that this is the reason. I have worked on a scientific project (image recognition) with far more data than that using pandas, with no problems. I will edit my answer to give an example with 5 million values.

  • Here are the first lines of the file: 4034502.7276325077, -48332085.46707194, 3039517.323714033, 34621956.41685204, 83558920.69945103, 60675521.97461882, -50771435.78885216, -18648336.730533928, 79937747.62000978, 79916178.6943368 -22290443.041666657, -68526627.05718894, 14971865.045767307, 44530748.08120343, -56745411.08461867, 80380542.97220108, -79189767.50952648, -5623825.827204466, 73029838.43669158,

  • Would saving the file as text help with the columns and rows?

  • That is not necessary. I ran a test here: I created a CSV (using pandas, :-)) with 10 million numbers in 5 columns, read it with pandas, and extracted the minimum and the maximum without problems. Have you tested my code? Did you have any problems?

  • As I said earlier, I'm working with a friend; we'll look at it tonight. Let's see if we can manage with your code. I hope so, haha; we are already discouraged by so many errors.

  • I tried your code, but I need to install the pandas library and I cannot manage it at all, either with pip or with Anaconda... disappointed.

  • Which version of Python and Anaconda? What error occurs?

0


The problem was solved like this:

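import re

# Optional minus sign, digits, optional decimal part
pattern = re.compile(r"-?\d+\.?\d*")

# Read the whole file and extract every number-like token
with open("C:/Users/lab2d/Downloads/lista2.txt") as f:
    numeros = pattern.findall(f.read())

# Convert the matched strings to floats so they compare numerically
numeros = [float(i) for i in numeros]

if numeros:
    print("Largest value:", max(numeros))
    print("Smallest value:", min(numeros))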
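
The solution above reads the whole file into memory and builds a list of every number. As a hedged alternative, the same pattern could be applied one line at a time while tracking only the running extremes. This is a sketch, not the poster's method: the function name `min_max_by_line` and the sample values are hypothetical, and the pattern still would not handle exponent notation such as `1e-5` (it would split it into two numbers).

```python
import io
import re

# Same pattern as in the answer above.
pattern = re.compile(r"-?\d+\.?\d*")


def min_max_by_line(lines):
    """Return (smallest, largest) over all numbers found, or None if none."""
    smallest, largest = float("inf"), float("-inf")
    found = False
    for line in lines:
        for token in pattern.findall(line):
            value = float(token)
            found = True
            smallest = min(smallest, value)
            largest = max(largest, value)
    return (smallest, largest) if found else None


# In-memory stand-in for the real file (values are made up).
sample = io.StringIO("4034502.72, -48332085.46\n-22290443.04, 79937747.62,\n")
result = min_max_by_line(sample)
print(result)
```

An open file object can be passed in place of `sample`, since iterating over a file yields its lines.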
