How to split columns/data with a specific limit?

Hello, my friends!

I'm currently doing scientific research in the field of AI and machine learning (classification), using Python.

With that in mind, I have a dataset to develop an algorithm that will be used to train and test my model. But it has more than 300,000 rows, and because I'm loading all of that data into a single variable, my development environment runs into a memory error.

My strategy is to break this DataFrame up and set a limit, but I've searched and can't find anything. The goal is to take these 300,000 rows and limit them to 5,000 or 10,000, so that I have a good amount to train my model without running into a memory error.

Do you have any idea how I could do this?

  • Use the nrows parameter to limit the number of rows to be read: read_csv(..., nrows=5000) (see the sketch after these comments).

  • Augusto, I found this very interesting. But is there any way to apply this limit to the variable itself, instead of only when reading? Because I need it not to go past the limit I set...

  • Partially? Do you want to pick random rows? If that's the case, you'll need to post a minimal sample of the data and a [mcve], because depending on the table's format it can be quite a process.

  • No, haha, I meant to start another paragraph but hit enter by mistake, sorry... I've edited it; could you take another look at the question, please?

  • I need to see what you're doing. I'm pretty sure you want a slice of the data, but I can't say for sure or suggest the best way without seeing the code.

  • Augusto, everything worked out! I had typed the wrong number. Thank you so much for your help and time; you really helped me out and saved me!

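A minimal sketch of the nrows suggestion from the comments above; the file path './arquivo.csv' is taken from the answer below, and the variable name dados is only illustrative:

import pandas as pd

# Load only the first 5000 rows of the file instead of all 300,000;
# only those 5000 rows are ever held in memory.
dados = pd.read_csv('./arquivo.csv', nrows=5000)

print(len(dados))  # at most 5000 rows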

1 answer

You can set a value for the chunksize parameter:

import pandas as pd

# chunk size (number of rows per chunk)
tamanho = 5000

# read the CSV in chunks of `tamanho` rows instead of loading it all at once
for fatia in pd.read_csv('./arquivo.csv', chunksize=tamanho):
    # your code here; each `fatia` is a DataFrame with up to `tamanho` rows
    print(fatia.shape)
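
As a follow-up sketch (not part of the original answer): if the goal is simply a 5,000-row training sample, you can take just the first chunk and stop reading. The names reader and treino are only illustrative.

import pandas as pd

tamanho = 5000

# The reader yields one DataFrame of up to `tamanho` rows at a time.
reader = pd.read_csv('./arquivo.csv', chunksize=tamanho)

# Take only the first chunk as the training set and stop reading,
# so at most `tamanho` rows are ever held in memory.
treino = next(reader)
print(treino.shape)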
  • I also think that's what she wants, but without an MCVE I didn't want to risk it. Take my +1 for having the courage to code blind.

  • @Augustovasques haha, I only went for it because it was 'hardly' any work. Thanks for the upvote!

  • That's right, folks! I actually didn't know about this chunksize alternative and found it very interesting... but Augusto's help was very useful too! Thank you both, and forgive me for not being clearer :)
