Select columns from a base without having to read the whole file

Asked

Viewed 64 times

2

I have a file in . csv VERY large that I can’t read in R. I tried so:

dados19 <- read.csv("microdados_censo_superior_2019/dados/SUP_ALUNO_2019.CSV", sep="|", dec=".")

It is the census of higher education 2019 can be found at https://www.gov.br/inep/pt-br/acesso-a-informacao/dados-abertos/microdados/censo-da-educacao-superior

However, I don’t need the whole file, only a few columns (the columns I need are: CO_CURSO, TP_SEXO, TP_COR_RACA and IN_MATRICULA). Would it be possible to upload/select only the columns I want? Without having to read the whole file?

2 answers

2

Complementing the reply from @lmonferrari. As the microdata files are very large, I will use the example mtcars set:

write.csv(mtcars[1:3,], "exemplo.csv")

data table.

The function fread package data table. has the option select to specify the columns desired by names or numbers.

library(data.table)

> fread("exemplo.csv", select = c("disp", "wt", "gear"))
   disp    wt gear
1:  160 2.620    4
2:  160 2.875    4
3:  108 2.320    4

Is faster than read.table and does a good job automatically determining separator and decimal symbol (but can be specified manually).

sqldf

Matter and deal data frames. as a database, with support for SQL statements. In case of function to read files, this can be used not only to select columns but also rows by condition.

library(sqldf)

> read.csv.sql("exemplo.csv", sql = "select disp,wt,gear from file where carb > 2")
  disp    wt gear
1  160 2.620    4
2  160 2.875    4
  • Thank you so much! I will look at all these functions.

1


dados19 <- read.csv('./SUP_ALUNO_2019.CSV', sep = '|', dec = '.',
               colClasses = c('NULL','NULL','NULL','NULL','integer',
                              'NULL','NULL','NULL','NULL','NULL',
                              'NULL','NULL','NULL','NULL','integer',
                              'integer','NULL','NULL','NULL','NULL',
                              'NULL','NULL','NULL','NULL','NULL',
                              'NULL','NULL','NULL','NULL','NULL',
                              'NULL','NULL','NULL','NULL','NULL',
                              'NULL','NULL','NULL','NULL','NULL',
                              'NULL','NULL','NULL','NULL','NULL',
                              'NULL','NULL','NULL','NULL','NULL',
                              'NULL','NULL','NULL','NULL','NULL',
                              'NULL','NULL','NULL','NULL','NULL',
                              'NULL','NULL','NULL','NULL','NULL',
                              'NULL','NULL','NULL','NULL','NULL',
                              'NULL','NULL','NULL','NULL','NULL',
                              'NULL','NULL','NULL','NULL','NULL',
                              'NULL','NULL','NULL','NULL','NULL',
                              'NULL','NULL','NULL','NULL','NULL',
                              'NULL','NULL','NULL','NULL','NULL',
                              'NULL','NULL','NULL','NULL','integer',
                              'NULL','NULL','NULL','NULL','NULL'))

By placing NULL in colClasses you do not load the column(skips the column). Each value of this within the array indicates a variable, note that the variables that are maintained are integer.

  • It worked super well! Thank you so much! I’m a beginner and I’ve been trying for a long time.

  • @Alexsandragomes, good morning! For nothing, I’m glad it worked, as you are beginner always post your questions on the site that people try to help. Hug!

  • lmonferrari, went to change the variable 11 (CO_CINE_ROTULO) by putting 'integer' in it and shows the following error: Error in scan(file = file, what = what, Sep = Sep, quote = quote, Dec = Dec, : scan() expected 'an integer', got '0421D01'. Do you have any idea how to solve?

  • @Alexsandragomes, good afternoon! This column consists of numbers and characters, you must pass the column data type as character. Hug!

  • @Alexsandragomes, good night! You typed NUL where the right is NULL, That’s why you’re making this mistake. Hug!

Browser other questions tagged

You are not signed in. Login or sign up in order to post.