How do I reduce a 3 GB CSV file so I can import it with pandas in Google Colab?

Colab shows this error:

"Session failed after using all available RAM. If you have interest in accessing execution environments with more RAM, check out Colab Pro."

I want to reduce the CSV file so that this problem does not occur during import. What should I do? The file is very large (3 GB), so the RAM available on Google Colab is not enough to import it. The file is available at https://download.inep.gov.br/microdados/microdados_enem_2019.zip (download it only if needed). This is the path on my Drive to the file I am referring to: /content/drive/MyDrive/Microdados_Enem_2019/DADOS/MICRODADOS_ENEM_2019.csv. I would really appreciate help shrinking the CSV file so that it fits in Google Colab. I simply need to reduce the number of lines: the file has about 5 million lines, and it would be easier to import into Colab if it had half a million or fewer.

from google.colab import drive
drive.mount('/content/drive')

import pandas as pd

# The error happens here, when importing the CSV file; I just want to shrink it so that it can be imported
microdados = pd.read_csv('/content/drive/MyDrive/Microdados_Enem_2019/DADOS/MICRODADOS_ENEM_2019.csv',sep=";", encoding="ISO-8859-1")
microdados.head()
  • The message ends with "check out Colab Pro"... It means you have to pay for a service to get more memory. If you don't have the budget for that, my suggestion is to install Jupyter on your own machine. I don't believe splitting the file is the solution, because you would have to load all of it anyway, right?

  • Did you try to open the file with a text editor, delete a few lines and save? I don’t understand what your question has to do with programming.

  • If you check the pandas documentation, you will also see that the read_csv function accepts an nrows parameter, which specifies how many lines of the file you want to read.

  • That is what I wanted, @fernandosavio, thank you very much! I did not know about the nrows parameter of read_csv.
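
For reference, a minimal sketch of the nrows suggestion from the comments above. It reuses the separator and encoding from the question; the helper name read_first_rows and the two sample columns are hypothetical, and a small in-memory sample stands in for the real 3 GB file so the snippet runs anywhere.

```python
import io

import pandas as pd


def read_first_rows(source, nrows=500_000):
    """Load only the first `nrows` data rows of a semicolon-separated CSV."""
    return pd.read_csv(source, sep=";", encoding="ISO-8859-1", nrows=nrows)


# Tiny in-memory stand-in for the real file (hypothetical columns).
sample = io.StringIO("ID;SCORE\n1;500.0\n2;610.5\n3;720.1\n")
first_two = read_first_rows(sample, nrows=2)
print(len(first_two))  # prints 2
```

On Colab, the same helper would presumably be called with the path from the question instead of the sample, for example read_first_rows('/content/drive/MyDrive/Microdados_Enem_2019/DADOS/MICRODADOS_ENEM_2019.csv'). Because pandas stops reading after nrows rows, the whole file never has to fit in memory.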

1 answer

I suggest you download an application that splits a giant CSV file into small pieces, that is, into several smaller CSV files, each with a number of lines that you define.

I suggest the app Split CSV File. With this application, you specify how many lines you want in each smaller file. In your case, just set the "Line Count" field to the number of lines you want and click the "Split File" button.

Screenshot: the simple Split CSV File application

It is a simple application: you just run it, with no installation needed, and I have already used it several times. If you want to see other app options, see this site on how to split a CSV file.
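
If you would rather not download an app, the same kind of split can be sketched in plain Python. This is not how Split CSV File works internally, just an illustration of the idea. The function name split_csv and the part_001.csv naming scheme are hypothetical, and the sketch assumes that no field contains a line break inside quotes.

```python
from pathlib import Path


def split_csv(src, out_dir, lines_per_file, encoding="ISO-8859-1"):
    """Split `src` into numbered CSV parts with at most `lines_per_file`
    data lines each, repeating the header line at the top of every part."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    out = None
    with open(src, encoding=encoding, newline="") as f:
        header = f.readline()
        for i, line in enumerate(f):
            # Start a new part every `lines_per_file` data lines.
            if i % lines_per_file == 0:
                if out is not None:
                    out.close()
                part_path = out_dir / f"part_{len(parts) + 1:03d}.csv"
                out = open(part_path, "w", encoding=encoding, newline="")
                out.write(header)
                parts.append(part_path)
            out.write(line)
    if out is not None:
        out.close()
    return parts
```

The file is streamed one line at a time, so memory use stays small regardless of the input size. A call such as split_csv("MICRODADOS_ENEM_2019.csv", "parts", 100_000) would write files of 100,000 data lines each.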

Particular case

To solve your particular case, I downloaded your file and divided it into 100,000-line files. Since you said you would like to work with 500,000 lines or fewer, I am providing a zip file with 5 CSV files of 100,000 lines each. Just click this link: Microdados_enem_2019_500_mil_lines.zip
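
Once the zip is extracted, the pieces could be loaded back into a single DataFrame roughly as follows. This is only a sketch: the helper name load_parts and the glob pattern in the usage note are hypothetical, and it assumes every piece starts with the header line, which the answer does not state.

```python
import glob

import pandas as pd


def load_parts(pattern, sep=";", encoding="ISO-8859-1"):
    """Read every CSV matching `pattern` and stack them into one DataFrame."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    frames = [pd.read_csv(p, sep=sep, encoding=encoding) for p in paths]
    # ignore_index renumbers the rows 0..n-1 across all parts.
    return pd.concat(frames, ignore_index=True)
```

For example, load_parts("/content/parts/*.csv") would combine all extracted files in a hypothetical /content/parts folder. To use fewer lines, load fewer files.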

  • Wow, thank you so much!
