How to split columns/data with a specific limit?

Hello, my friends!

I'm currently doing scientific research in the field of AI and machine learning (classification), using Python.

With that in mind, I have a dataset to develop an algorithm that will be used to train and test my model. But it has more than 300,000 rows, and because I'm loading all of that data into a single variable, my development environment runs into a memory error.

My strategy is to break this DataFrame up and set a limit, but I've searched and can't find anything. The goal is to take these 300,000 rows and limit them to 5,000 or 10,000, so that I have a good amount to train my model without running into a memory error.

Do you have any idea how I could do this?

  • Use the nrows parameter to limit the number of rows to be read: read_csv(..., nrows=5000) (see the sketch after these comments).

  • Augusto, I found this very interesting. But is there any way to apply this limit to the variable itself, instead of only when reading? Because I need it not to go past the limit I set...

  • Partially? Do you want to pick random rows? If that's the case, you'll need to post a minimal sample of the data and a [mcve], because depending on the table's format it can be quite a process.

  • No, haha, I meant to start another paragraph but hit enter by mistake, sorry... I've edited it; could you take another look at the question, please?

  • I need to see what you're doing. I'm pretty sure you want a slice of the data, but I can't say for sure or suggest the best way without seeing the code.

  • Augusto, everything worked out! I had typed the wrong number. Thank you so much for your help and time; you really helped me out and saved me!

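A minimal sketch of the nrows suggestion from the comments above; the file path './arquivo.csv' is taken from the answer below, and the variable name dados is only illustrative:

import pandas as pd

# Load only the first 5000 rows of the file instead of all 300,000;
# only those 5000 rows are ever held in memory.
dados = pd.read_csv('./arquivo.csv', nrows=5000)

print(len(dados))  # at most 5000 rows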

1 answer

You can set a value for the chunksize parameter:

import pandas as pd

# chunk size (number of rows per chunk)
tamanho = 5000

# read the CSV in chunks of `tamanho` rows instead of loading it all at once
for fatia in pd.read_csv('./arquivo.csv', chunksize=tamanho):
    # your code here; each `fatia` is a DataFrame with up to `tamanho` rows
    print(fatia.shape)
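
As a follow-up sketch (not part of the original answer): if the goal is simply a 5,000-row training sample, you can take just the first chunk and stop reading. The names reader and treino are only illustrative.

import pandas as pd

tamanho = 5000

# The reader yields one DataFrame of up to `tamanho` rows at a time.
reader = pd.read_csv('./arquivo.csv', chunksize=tamanho)

# Take only the first chunk as the training set and stop reading,
# so at most `tamanho` rows are ever held in memory.
treino = next(reader)
print(treino.shape)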
  • I also think that's what she wants, but without an MCVE I didn't want to risk it. Take my +1 for having the courage to code blind.

  • @Augustovasques haha, I only went for it because it was 'hardly' any work. Thanks for the upvote!

  • That's right, folks! I actually didn't know about this chunksize alternative and found it very interesting... but Augusto's help was very useful too! Thank you both, and forgive me for not being clearer :)
