Recursion in CUDA and OpenCL


Is it possible to work with recursion in parallel when using the graphics card as a parallelism (multithreading) instrument with CUDA and OpenCL? If so, how are synchronization and the passing of values handled? If not, how can this recursion be simulated, since many operations on data structures such as trees of all kinds use recursion to retrieve their data?

CUDA is an extension to the C/C++ programming language.

OpenCL is a framework for writing programs that run on heterogeneous platforms, including a language (based on C99) for writing kernels (functions that run on OpenCL devices).

  • Why the downvote?

  • Unfortunately this is not my strong suit, and I don't know exactly what you mean by "parallelism" and "recursion". However, this seems to be something lower level; would it be C/C++?

  • @Guilhermenascimento Yes, parallelism would be multithreaded.

  • Sometimes terms in Portuguese can sound confusing to people. The question seems good, but I would revise these details while trying not to change much beyond that. :) +1 for explaining the terms to me.

2 answers

SIMD (Single Instruction, Multiple Data) technologies usually lack the asynchronous task capabilities required by recursive functions.

Typical processors have a reasonable amount of working memory (registers and cache) for a small number of CPUs (1, 2, 4 or 8), and these CPUs act independently, running separate instructions simultaneously (MIMD, Multiple Instruction, Multiple Data). Typical processors can build deep stacks of call data, allowing many recursive invocations of the same function. For this, they need complex, sophisticated instruction pointers capable of storing multiple return addresses.

SIMD co-processors, such as GPGPUs, give up such complexity and sophistication in favor of a larger number of logic and arithmetic units. Imagine a device that let a single person press the same operation button on several calculators at once, even though a different value could be entered on each one. The same operation is performed simultaneously on several values. The price of this acceleration is a much simpler instruction fetch and decode system, which lacks the ability to manage the multiple return pointers required for recursion.

However, it is possible to use GPGPUs in parallel within recursive functions, as long as the termination test (the if clause) and the other control operations of the recursive function run on the CPU, while the internal calculations of each iteration are performed on the GPGPU, completely unaware of the control flow of the main program. The pseudo-code below illustrates this.

function calcularDados( var dados, var localidade[])    //GPGPU
    return calcular(localidade, dados)                  //GPGPU

function souRecursiva(var dados)            //CPU
    if dados != condição_final then         //CPU
        GPGPU <<dados, calcularDados>> dadosResposta    //GPGPU
        return souRecursiva(dadosResposta)  //CPU
    else                                    //CPU
        return dados                        //CPU


CUDA, in its more recent versions, supports recursion in a very limited way. Basically, a kernel marked as __global__ can launch another __global__ kernel, which will run in another block, and wait until it finishes, but with a limit of roughly 24 nesting levels. Memory is only coherent between adjacent levels, but this can be worked around by passing the needed data as a parameter. Here is a slide from NVIDIA about it. Note that, depending on the problem being solved, recursion can often be replaced by a loop plus a rule for moving from one element to the next.
