Is there a way to open a direct SQL table in a data.table, without doing the SQL path > data.frame > data.table?

Asked

Viewed 480 times

8

I want to open a direct SQL table in a date.table. When I do a query with dbGetQuery, what I get is a data.frame. I know I can then turn that data.frame into a data.table easily. But I would like to skip this step -- which on certain occasions may not be computationally efficient.

Below a minimum reproducible code:

    library(sqldf)
    library(data.table)

    A <- data.frame(var1 = 1:10, var2=55:64)
    drv <- dbDriver("SQLite") 
    con <- dbConnect(drv, "basequalquer.db") 
    dbWriteTable(con, "TabelaA", A) 

On the command line below, I’m applying the command data.table on the outcome of dbGetQuery (that comes as data.frame). That’s exactly what I would NOT like to do...

    data.table(dbGetQuery(con, "Select * from TabelaA"))
        row_names var1 var2
     1:         1    1   55
     2:         2    2   56
     3:         3    3   57
     4:         4    4   58
     5:         5    5   59
     6:         6    6   60
     7:         7    7   61
     8:         8    8   62
     9:         9    9   63
    10:        10   10   64

Suggestions?

  • 1

    This Stack Overflow response can help: http://stackoverflow.com/a/20346697/3297472 I think using this function setDT Data.frame to data.table conversion is efficient, but I haven’t tested.

1 answer

7


Sniffing around the code descrobri that the secret is in the function fetch, which is an implementation made in the package RSQLite, for objects of the type SQLiteResult (class S4), which implements the generic method of the same name defined in the package DBI. In your code, the function dbGetQuery uses fetch. So first I’m going to reproduce your code by isolating the function fetch:

library(DBI)
library(RSQLite)

A <- data.frame(var1 = 1:10, var2=55:64)
drv <- dbDriver("SQLite") 
con <- dbConnect(drv, "basequalquer.db") 
dbWriteTable(con, "TabelaA", A) 
res <- dbSendQuery(con, "Select * from TabelaA")
fetch(res)

Looking at the source code package RSQLite, we can see that fetch calls the function sqliteFetch, copied below, which in turn calls a C algorithm, which does all the dirty work and returns NULL or a list.

sqliteFetch <- function(res, n = 0) {  
  check_valid(res)

  # Returns NULL, or a list
  rel <- .Call(rsqlite_query_fetch, res@Id, nrec = as.integer(n))
  if (is.null(rel)) return(data.frame())

  attr(rel, "row.names") <- .set_row_names(length(rel[[1]]))
  attr(rel, "class") <- "data.frame"
  rel
}

Note that the result rel becomes a data.frame after that. You could even try to turn this list into data.table, but that wouldn’t be much different from turning a data.frame in a data.table.

So, unless someone has already implemented an alternative solution, to solve your problem you would have to reimplement the C function to return an object of the type data.table (which is basically a data.frame with a pointer as an attribute). Also, you would need to change the generic definition of fetch, who expects as a result a data.frame (see here).

I personally don’t think it would be worth it, because the gain in efficiency wouldn’t be so significant, but the effort would be.

By the way, I suggest you consider using the fantastic package dplyr, which has functions for efficient database extraction, using "lazy evaluation" (basically, you can use the functions mutate, filter, select, etc. of the vocabulary of dplyr and the package magically transforms these operations into SQL before to download the data, which can result in a huge gain in data transfer). In addition, the package is meant to be intuitive and easy to use. See here.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.