Why is offloading to the GPU a good idea?

I follow some forums on the Internet and I've noticed that recently there has been a lot of talk about offloading tasks to the GPU.

Mozilla has implemented a new engine for its browser, called Servo. At its heart, it offloads everything it can to the GPU.

As far as I understand, the GPU is nothing more than an additional processor dedicated to graphics rendering.

  • How can the GPU assist in processing?
  • This offloading usually happens through libraries such as OpenMP. Where exactly do these libraries help?
  • Is there any overhead in transferring this processing to the GPU? Is it worth paying that overhead to process on the GPU?
  • Interestingly, Servo is written in Rust :)

  • Yes, I'm aware it's written in Rust. One criticism aimed at Mozilla is exactly this: people say Servo is fast because of Rust, when in fact it's because of the offloading.

  • There's certainly some of that marketing involved.

1 answer

Basically: the more processing power, the better. Simple as that. You don't stop using the CPU in order to use the GPU; you use both. A number of historical and market phenomena have contributed to an incredible advance in the processing power of GPUs and in the demand for the tasks GPUs were designed for.

As far as I understand, the GPU is nothing more than an additional processor dedicated to graphics rendering.

No. GPUs were designed with graphics in mind, because the intricate mathematical functions involved in perspective and texture interpolation were very demanding for general-purpose processors. Over time, those operations became simple relative to the computational power of CPUs, but the number of such operations grew enormously (driven especially by the increasing resolution of games), so GPUs evolved toward large parallel processing arrays: it is common in a game to apply the same operation to millions of screen pixels, hundreds of times per second. This massive parallelism started drawing attention for other tasks, and hacks appeared that encoded physics problems (for example) as texture pixels. Manufacturers noticed this and embraced generic processing (the introduction of programmable shaders being a big milestone), and today the GPU is really a bundle of cores that can also be used for graphics.

How can the GPU assist in processing?

The GPU has hundreds of cores (the GTX 960 has 1024 processing cores), far beyond any home PC's CPU. These cores are highly specialized in unconditional, sequential operations, i.e., processing data without branches in the execution flow, especially when acting on large memory regions (where the parallelism of the cores can be best exploited), which is the common scenario when manipulating large volumes of media data.
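
To make "unconditional, element-wise work" concrete, here is a minimal C sketch (the function name and the scenario are mine, purely illustrative): the same operation is applied to every element independently of all the others, which is exactly the access pattern that maps well onto thousands of GPU cores.

    #include <stddef.h>

    /* Brighten an 8-bit grayscale image: one identical operation per
       pixel, with no dependency between pixels -- each GPU core could
       take one pixel (or a small slice) with no coordination needed. */
    void brighten(unsigned char *pixels, size_t n, int delta) {
        for (size_t i = 0; i < n; i++) {
            int v = pixels[i] + delta;
            pixels[i] = (unsigned char)(v > 255 ? 255 : v); /* clamp */
        }
    }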

This offloading usually happens through libraries such as OpenMP. Where exactly do these libraries help?

Although GPUs have great computational power, they are still peripherals controlled by the CPU. These libraries provide the communication routines to send commands and data to computational peripherals (among them, the GPU). They link the program to the drivers and expose the functions needed for generic processing, in the same way that graphics libraries expose the functionality of OpenGL, for example.
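
As an illustration, OpenMP 4.x added "target" directives for exactly this: a pragma asks the runtime (through the device driver) to run a region on the accelerator and describes which data to copy. A minimal sketch; it assumes a compiler built with offloading support (e.g. clang with -fopenmp -fopenmp-targets=..., or a GCC configured for offloading), and it silently falls back to running on the host otherwise:

    #include <stdio.h>

    #define N 1024

    int main(void) {
        float a[N];
        for (int i = 0; i < N; i++) a[i] = (float)i;

        /* "target" offloads the loop to the GPU (if one is available);
           "map" tells the runtime to copy a[] to the device and back. */
        #pragma omp target teams distribute parallel for map(tofrom: a[0:N])
        for (int i = 0; i < N; i++)
            a[i] += 5.0f;

        printf("a[0] = %.1f, a[%d] = %.1f\n", a[0], N - 1, a[N - 1]);
        return 0;
    }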

Is there any overhead in transferring this processing to the GPU? Is it worth paying that overhead to process on the GPU?

Yes, and yes (in well-designed programs). All communication between peripherals counts as overhead; even communication between cores of the same CPU has overhead. Data needs to be formatted into the right type and protocol before the transfer can proceed. Recent advances in bus and RAM technologies have reduced the associated delay, enabling real-time use, and have also increased the amount of data transmitted per transfer, so the preparation overhead is offset by the larger volume of results delivered (the technical terms for the preparation delay and the amount of result per unit of time are latency and throughput).
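
If you want to see that overhead yourself, OpenMP's explicit data directives let you time the copies separately from the computation. A rough sketch (same offloading-compiler assumption as above; the absolute numbers will vary wildly by hardware):

    #include <stdio.h>
    #include <omp.h>

    #define N (1 << 20)

    static float a[N];

    int main(void) {
        for (int i = 0; i < N; i++) a[i] = 1.0f;

        double t0 = omp_get_wtime();
        #pragma omp target enter data map(to: a[0:N])   /* upload */
        double t1 = omp_get_wtime();

        #pragma omp target teams distribute parallel for
        for (int i = 0; i < N; i++) a[i] *= 2.0f;       /* compute */
        double t2 = omp_get_wtime();

        #pragma omp target exit data map(from: a[0:N])  /* download */
        double t3 = omp_get_wtime();

        printf("upload %.3f ms, compute %.3f ms, download %.3f ms\n",
               (t1 - t0) * 1e3, (t2 - t1) * 1e3, (t3 - t2) * 1e3);
        return 0;
    }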


An illustrative example (taking some liberties):

Let’s say you have an array of 1024 numbers, and you want to perform a series of operations on them, like adding a constant.

With an 8-core CPU, each core is responsible for processing 128 numbers, while on the GPU, with its 1024 cores, each core is responsible for a single number. Assume the GPU runs at half the frequency and each operation takes 1 clock cycle: each CPU core needs 128 cycles, while each GPU core needs 1 GPU cycle, which is worth 2 CPU cycles at half the frequency. That gives 128 / 2 = 64, a 64-fold advantage in processing speed.

The data has to be transmitted to the GPU, so for such a simple operation the speedup may not pay off. But say you want to add a number, then multiply, compare against another array, take a cosine... a whole series of commands. Once the data is on the GPU, the CPU only needs to send the commands (as with the shaders used in games) and is free for the tasks the GPU is not specialized in, such as checking user input, communicating with other peripherals, dealing with other programs, etc., and then fetch the result.
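
A sketch of that pattern in OpenMP (the function is hypothetical, with the same offloading-compiler assumption as above, plus a toolchain that provides cosf on the device): a single "target data" region keeps the arrays resident on the GPU while several kernels run over them, so only the first and last transfers are paid.

    #include <math.h>

    #define N 1024

    /* Hypothetical pipeline: one upload, three kernels, one download.
       The data stays on the GPU between commands. */
    void process(float *a, const float *b) {
        #pragma omp target data map(tofrom: a[0:N]) map(to: b[0:N])
        {
            #pragma omp target teams distribute parallel for
            for (int i = 0; i < N; i++) a[i] += 5.0f;        /* add constant */

            #pragma omp target teams distribute parallel for
            for (int i = 0; i < N; i++) a[i] *= 2.0f;        /* multiply */

            #pragma omp target teams distribute parallel for
            for (int i = 0; i < N; i++)
                a[i] = (cosf(a[i]) > b[i]) ? 1.0f : 0.0f;    /* cosine + compare */
        }
    }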
