Join two data frames using the sqldf function

I’m trying to join two tables on a shared species id, but I can’t get it to work.

These are the steps I followed:

Importing the file corrected by TNRS into R:

library(data.table) # provides fread
hp <- fread('tnrs_final.txt', header = TRUE)

Now linking the tables using the unique id of the species:

library(sqldf) # package

juncao_dfs <- sqldf('select * from df_total left join hp on df_total.id_species = hp.user_id') # df_total is the other df I want to join with hp.

The following message appears on the console:

Error: cannot allocate vector of size 9.9 Mb

How do I get around this? Are there other packages or functions that do this?

  • This error happens when your program needs more RAM than is available on your computer. It might be just that, in which case you’ll need to work out of memory using a database such as SQLite (via the RSQLite package). It may also be a mistake in the join itself: a left join can duplicate many rows if the two data frames contain many rows with the same user_id.

  • That may be so, but the size of the vector that cannot be allocated is very small. @Bruno Umbelino, can’t you provide at least part of the tables? dput(hp) and dput(df_total) can help us understand the problem.

  • If they’re too big, you can send something like dput(head(hp, 100))

  • @Tomásbarcellos the size it reports is arbitrary. It is not the total amount of memory needed; it is only the size of the last allocation that failed.

  • Of course, but if allocating a 10 Mb vector failed, it is because RAM usage was already at its limit. In my experience this is most often caused by a semantic error in the code, although it can also appear in genuinely borderline situations. Hence the dput() would help :)
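
The duplicate-key hypothesis raised in the comments can be checked before running the join at all. The following is only a minimal sketch in base R: the two small data frames are made-up stand-ins for df_total and hp (the real tables were never posted), and only the column names id_species and user_id come from the question. It counts how often each key occurs on each side, predicts the number of rows the left join would return, and then runs the same join with merge() as one possible alternative to sqldf.

```r
# Toy stand-ins for the real tables (hypothetical data, not from the question).
df_total <- data.frame(id_species = c(1, 2, 2, 3),
                       plot = c("a", "b", "c", "d"))
hp <- data.frame(user_id = c(1, 2, 2, 2, 4),
                 name = c("sp1", "sp2", "sp2b", "sp2c", "sp4"))

# How many times does each key occur on each side?
left_counts  <- table(df_total$id_species)
right_counts <- table(hp$user_id)

# Keys repeated in hp: each one multiplies the matching rows of df_total.
dup_keys <- names(right_counts)[right_counts > 1]

# Predicted row count of the left join, computed without running it:
# each shared key contributes n_left * n_right rows,
# and each left row with no match contributes exactly one row.
shared <- intersect(names(left_counts), names(right_counts))
matched_rows <- sum(as.numeric(left_counts[shared]) *
                    as.numeric(right_counts[shared]))
unmatched_rows <- sum(!(as.character(df_total$id_species) %in%
                        names(right_counts)))
expected_rows <- matched_rows + unmatched_rows

# The same left join in base R, to compare against the prediction.
joined <- merge(df_total, hp,
                by.x = "id_species", by.y = "user_id", all.x = TRUE)

print(dup_keys)
print(expected_rows)
print(nrow(joined))
```

If expected_rows turns out to be far larger than nrow(df_total) on the real data, the memory error is probably a symptom of repeated keys rather than of the tables being too big, and deduplicating hp on user_id before joining would be the first thing to try.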

No answers
