Yes, this is possible in R. In fact, parallel computing has been part of R since its early days; just take a look at the High-Performance and Parallel Computing with R task view on CRAN, the link should save you some time.
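Just to illustrate the point, here is a minimal sketch with the base parallel package (ships with R, CPU only; the toy workload is my own example, not from the question):

library(parallel)

# Spread a CPU-bound toy task over the available cores
cl <- makeCluster(max(1, detectCores() - 1))   # leave one core free
res <- parLapply(cl, 1:8, function(i) sum(rnorm(1e6)))
stopCluster(cl)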
Yes, you can write the code in C and call it from R if you prefer, as in this example:
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cufft.h>
/* This function is written for R to compute 1D FFT.
n - [IN] the number of complex values to transform
inverse - [IN] set to 1 if use inverse mode
h_idata_re - [IN] input data from host (R, real part)
h_idata_im - [IN] input data from host (R, imaginary part)
h_odata_re - [OUT] results (real) allocated by caller
h_odata_im - [OUT] results (imaginary) allocated by caller
*/
extern "C"
void cufft(int *n, int *inverse, double *h_idata_re,
double *h_idata_im, double *h_odata_re, double *h_odata_im)
{
cufftHandle plan;
cufftDoubleComplex *d_data, *h_data;
cudaMalloc((void**)&d_data, sizeof(cufftDoubleComplex)*(*n));
h_data = (cufftDoubleComplex *) malloc(sizeof(cufftDoubleComplex) * (*n));
// Convert data to cufftDoubleComplex type
for(int i=0; i< *n; i++) {
h_data[i].x = h_idata_re[i];
h_data[i].y = h_idata_im[i];
}
cudaMemcpy(d_data, h_data, sizeof(cufftDoubleComplex) * (*n),
cudaMemcpyHostToDevice);
// Use the CUFFT plan to transform the signal in place.
cufftPlan1d(&plan, *n, CUFFT_Z2Z, 1);
if (!*inverse ) {
cufftExecZ2Z(plan, d_data, d_data, CUFFT_FORWARD);
} else {
cufftExecZ2Z(plan, d_data, d_data, CUFFT_INVERSE);
}
cudaMemcpy(h_data, d_data, sizeof(cufftDoubleComplex) * (*n),
cudaMemcpyDeviceToHost);
// split cufftDoubleComplex to double array
for(int i=0; i<*n; i++) {
h_odata_re[i] = h_data[i].x;
h_odata_im[i] = h_data[i].y;
}
// Destroy the CUFFT plan and free memory.
cufftDestroy(plan);
cudaFree(d_data);
free(h_data);
}
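To turn this into a shared object that R can load, the build step could look roughly like the line below (an assumption on my part: it requires the CUDA toolkit with nvcc on the PATH and links against the cuFFT library; cufft.cu is just the name I gave the source file above):

# Assumes the CUDA source above was saved as cufft.cu and nvcc is on the PATH
system("nvcc -O3 --shared -Xcompiler -fPIC -o cufft.so cufft.cu -lcufft")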
Then you just wrap it in R:
cufft1D <- function(x, inverse=FALSE)
{
  # Load the shared library on first use
  if(!is.loaded("cufft")) {
    dyn.load("cufft.so")
  }
  n <- length(x)
  # Pass the real and imaginary parts separately to the C wrapper
  rst <- .C("cufft",
            as.integer(n),
            as.integer(inverse),
            as.double(Re(x)),
            as.double(Im(x)),
            re=double(length=n),
            im=double(length=n))
  rst <- complex(real = rst[["re"]], imaginary = rst[["im"]])
  return(rst)
}
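Assuming cufft.so was built as above and a CUDA-capable GPU is available, a quick sanity check against R's own fft() could look like this (the input data is just a made-up example):

z <- complex(real = rnorm(1024), imaginary = rnorm(1024))
gpu <- cufft1D(z)        # forward FFT on the GPU
cpu <- fft(z)            # reference result from base R
max(Mod(gpu - cpu))      # should be close to zero
# Like fft(), the inverse transform is not normalized, so
# cufft1D(cufft1D(z), inverse = TRUE) / length(z) recovers z.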
It is not quite that simple: there are some configuration details and library issues to deal with, but the code above is enough to give you the idea. This link has a nice tutorial and is also the source I took the functions from.
P.S.: a 2 GB matrix is not that big or heavy if you use the right structures in your algorithm.
If it were just a RAM drive, I don't know whether it would really be a scheduling question, but when it comes to dumping the data matrix there, the question gets really interesting.
– Bacco
R already runs and loads everything in RAM; with certain packages you could extend it to use the disk for paging. Loading a matrix of those dimensions takes forever, because it reads from disk and puts the data into a variable.
– Araponga Brots
The intention here would be to put it into a variable, but the RAM in question would be the video card's memory, because the other commands, matrix inversion and so on, will run on the GPU.
– Araponga Brots
The advantage here would be to skip the PC's RAM and send the data straight to the GPU; all that is missing are the execution commands.
– Araponga Brots
This feature does not need to exist in R; if it exists in C, I can call the code from R.
– Araponga Brots
"this feature does not need to exist in R, if it exists in C I can call the code from R" It may be interesting to edit the question and put C tag, will greatly increase the number of views of the question.
– Molx