How to import a large .txt file that contains several data.frames with different columns?

1

Hello, how are you?

I need to import a large database that is in .txt format and split into 20 parts of approximately 5 GB each.

Within that database there are three data.frames, each with a different number of columns.

The columns are fixed-width.

I’m trying to use data.table and read_fwf(), but I’m having trouble separating the columns because of the three different data.frames.

The "RECORD TYPE" column identifies which data.frame each row belongs to.

The database is this one (the layout is available there too):

http://receita.economia.gov.br/orientacao/tributaria/cadastros/cadastro-nacional-de-pessoas-juridicas-cnpj/dados-publicos-cnpj

Does anybody have any tips? Thank you very much in advance!


ps¹: I tried the qsacnpj package, installed with devtools::install_github("georgevbsantiago/qsacnpj"), using:

    qsacnpj::gerar_bd_cnpj(path_arquivos_txt = "C:/Users/Downloads/",
                           localizar_cnpj = "NAO",
                           n_lines = 10000,
                           armazenar = "csv")

But it takes too long! When I tried n_lines = 100000, the computer froze on the sixth file.

ps²: my computer does not have much memory.

  • Are you sure it’s not a limitation of the available RAM?

  • As for the slowness, I think that’s exactly it! I would like tips to get around this problem. That’s why I tried data.table, but I can’t get it to work.

  • @Tomásbarcellos when I say that there are three data.frames in that database, I mean they are "mixed" in the same file. Hence the difficulty in separating them.

1 answer

2

If your computer doesn’t have enough RAM to load this data, I would load it into a DBMS such as PostgreSQL (even though the files are fixed-width) and pull the data from R using a package that talks to PostgreSQL. This way, the data is stored on your machine and you can pull subsets of it to analyze in R.
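A minimal sketch of how such a load could look, reading each file in chunks so that only a few thousand lines are in memory at a time, and routing each line to a table according to its record type. Everything specific here is an assumption for illustration: the record type is taken to be the first character of each line, the field positions in `layouts` are invented (the real ones come from the official layout), `parse_lines()` and `load_file()` are hypothetical helpers, and an in-memory SQLite database stands in for PostgreSQL so the example runs without a server.

```r
library(DBI)

# HYPOTHETICAL layouts: one per record type, with invented field positions.
# Replace them with the positions from the official layout document.
layouts <- list(
  "1" = data.frame(name = c("type", "id", "name"),
                   start = c(1, 2, 6), end = c(1, 5, 15)),
  "2" = data.frame(name = c("type", "id", "partner"),
                   start = c(1, 2, 6), end = c(1, 5, 12))
)

# Cut fixed-width lines into columns according to one layout.
parse_lines <- function(lines, layout) {
  cols <- lapply(seq_len(nrow(layout)), function(i) {
    trimws(substr(lines, layout$start[i], layout$end[i]))
  })
  names(cols) <- layout$name
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Read a file chunk by chunk and append each record type to its own table.
load_file <- function(path, con, layouts, chunk_size = 10000) {
  input <- file(path, open = "r")
  on.exit(close(input))
  repeat {
    lines <- readLines(input, n = chunk_size)
    if (length(lines) == 0) break
    types <- substr(lines, 1, 1)  # assumed position of the record type
    for (type in intersect(names(layouts), unique(types))) {
      df <- parse_lines(lines[types == type], layouts[[type]])
      # append = TRUE creates the table on the first chunk, then adds rows
      dbWriteTable(con, paste0("record_type_", type), df, append = TRUE)
    }
  }
}

# Tiny demonstration with made-up data
path <- tempfile(fileext = ".txt")
writeLines(c("10001ACME      ", "20001JOHN   ", "10002GLOBEX    "), path)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
load_file(path, con, layouts, chunk_size = 2)
tables <- dbListTables(con)
type1 <- dbReadTable(con, "record_type_1")
dbDisconnect(con)
```

For a real PostgreSQL server the connection line would become something like `dbConnect(RPostgres::Postgres(), dbname = ...)`, and `load_file()` would be called once for each of the 20 files.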

At worst, you can run your analysis on subsets of the data and produce a summary at the end that consolidates the results for the full dataset. You can also, of course, do various operations directly in the database through SQL.
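As a rough illustration of both ideas, the sketch below aggregates inside the database and pulls only one subset into R. The `companies` table and its `state` and `capital` columns are made up for the example, and SQLite is again used only as a stand-in for PostgreSQL.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# HYPOTHETICAL table standing in for one of the loaded record types
dbWriteTable(con, "companies",
             data.frame(state = c("SP", "SP", "RJ"),
                        capital = c(100, 300, 50)))

# Let the database do the aggregation; only the small result reaches R.
summary_by_state <- dbGetQuery(con, "
  SELECT state, COUNT(*) AS n, AVG(capital) AS mean_capital
  FROM companies
  GROUP BY state
  ORDER BY state
")

# Pull a single subset for analysis in R.
# SQLite uses `?` placeholders; with RPostgres the placeholder would be `$1`.
sp <- dbGetQuery(con, "SELECT * FROM companies WHERE state = ?",
                 params = list("SP"))

dbDisconnect(con)
```

Here `summary_by_state` has one row per state and `sp` holds only the rows for the chosen state, so memory use in R depends on the size of the result rather than the size of the full dataset.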

  • I’ll try to do that! Thank you!
