RAM drive with video card RAM


I use the R language for heavy matrix calculations. I’m using the GPU for performance gains, which is fantastic indeed.

However, I would like to take it a step further and load the 2 GB data matrix directly into the video card’s RAM.

Or maybe create a RAM drive backed by the graphics card’s memory, which is GDDR5.

Would that be possible?

  • If it were just about the RAM drive, I don’t know if it would be strictly a programming question, but once it comes to dumping the data matrix there, the question gets really interesting.

  • R already runs and loads everything in RAM; with certain packages it could be extended to page out to disk. Loading a matrix of these dimensions takes ages, because it reads from disk and dumps the data into a variable.

  • The intention here would still be to load the data into a variable, but using the video card’s RAM instead. That’s because the subsequent commands (matrix inversion, etc.) will run on the GPU.

  • The advantage would be not using the PC’s RAM at all and sending the data straight to the GPU, leaving only the execution commands.

  • This feature does not need to exist in R; if it exists in C, I can call the code from R.

  • “This feature does not need to exist in R; if it exists in C, I can call the code from R.” It may be worth editing the question and adding the C tag; that would greatly increase the number of views the question gets.


1 answer

Yes, this is possible in R. In fact, parallel computing has been around since R’s early days; just take a look at the High-Performance and Parallel Computing with R task view on CRAN.

And yes, you can write the code in C and call it from R if you prefer, as in this example:

#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>
/* This function is written for R to compute 1D FFT.
   n - [IN] the number of complex we want to compute
   inverse - [IN] set to 1 if use inverse mode
   h_idata_re - [IN] input data from host (R, real part)
   h_idata_im - [IN] input data from host (R, imaginary part)
   h_odata_re - [OUT] results (real) allocated by caller
   h_odata_im - [OUT] results (imaginary) allocated by caller
*/
extern "C"
void cufft(int *n, int *inverse, double *h_idata_re,
           double *h_idata_im, double *h_odata_re, double *h_odata_im)
{
  cufftHandle plan;
  cufftDoubleComplex *d_data, *h_data;
  cudaMalloc((void**)&d_data, sizeof(cufftDoubleComplex)*(*n));
  h_data = (cufftDoubleComplex *) malloc(sizeof(cufftDoubleComplex) * (*n));

  // Convert data to cufftDoubleComplex type
  for(int i=0; i< *n; i++) {
    h_data[i].x = h_idata_re[i];
    h_data[i].y = h_idata_im[i];
  }

  cudaMemcpy(d_data, h_data, sizeof(cufftDoubleComplex) * (*n), 
             cudaMemcpyHostToDevice);
  // Use the CUFFT plan to transform the signal in place.
  cufftPlan1d(&plan, *n, CUFFT_Z2Z, 1);
  if (!*inverse ) {
    cufftExecZ2Z(plan, d_data, d_data, CUFFT_FORWARD);
  } else {
    cufftExecZ2Z(plan, d_data, d_data, CUFFT_INVERSE);
  }

  cudaMemcpy(h_data, d_data, sizeof(cufftDoubleComplex) * (*n), 
  cudaMemcpyDeviceToHost);
  // split cufftDoubleComplex to double array
  for(int i=0; i<*n; i++) {
    h_odata_re[i] = h_data[i].x;
    h_odata_im[i] = h_data[i].y;
  }

  // Destroy the CUFFT plan and free memory.
  cufftDestroy(plan);
  cudaFree(d_data);
  free(h_data);
}

Then the wrapper in R:

cufft1D <- function(x, inverse=FALSE)
{
  if(!is.loaded("cufft")) {
    dyn.load("cufft.so")
  }
  n <- length(x)
  rst <- .C("cufft",
            as.integer(n),
            as.integer(inverse),
            as.double(Re(x)),
            as.double(Im(x)),
            re = double(length = n),
            im = double(length = n))
  rst <- complex(real = rst[["re"]], imaginary = rst[["im"]])
  return(rst)
}

It’s not that simple — there are some configuration details and library-related issues to sort out — but the code above should give you the idea. This link has a nice tutorial and is also the source of these functions.

P.S.: a 2 GB matrix is not that big or heavy if you use the right techniques in your algorithm.
