1
Hello, my friends!
Currently I do a scientific research in the field of AI, Machine Learning - Classification, using the Python language.
In view of this, I have some data to develop an algorithm, which will be used to train and test my machine. But there are more than 300,000 and, because I’m allocating a lot of data in a single variable, my development environment suffers memory error.
My strategy is to break up this Dataframe and set a limit, but I looked and I can’t find anything. The goal is to take this 300,000 data and limit it to 5,000 or 10,000. So I get a good amount to train my machine without suffering memory error.
Do you have any idea how I do it?
Use the parameter nrows to limit the number of lines to be read:
read_csv(..., nrows=5000)– Augusto Vasques
Augusto, I found this very interesting. But is there any way I can set this limit within the variable? Instead of just reading? 'Cause I need you to cross that line that I set...
– Bianca Viana
Partially? Do you want to pick up random lines? If this is the case you will have to put a minimum sample of data and a [mcve] because depending on the format of the table is a process.
– Augusto Vasques
No kkkkkk was supposed to create another paragraph, but I ended up giving enter, I’m sorry... I edited, can reconsider the question, please?
– Bianca Viana
I need to see what you’re doing. I’m pretty sure you want a slice of the data, but I can’t say or indicate the best way without seeing the code.
– Augusto Vasques
Augusto, everything worked out! I had put the wrong number. Thank you so much for your help and time, you served me and saved me too much!
– Bianca Viana