What differentiates an FPGA from a CPU?

I was looking at some publications on the BLAKE cryptographic algorithm, which was one of the finalists in the SHA-3 competition, whose winner was Keccak.

In particular, an excerpt right at the beginning of the book "The Hash Function BLAKE" says:

Keccak offers acceptable performance in software, and excellent performance in hardware.

Source 1, Source 2 (original by NIST)

NIST could just as easily have stated that BLAKE offers excellent performance in software and acceptable performance in hardware; nowhere did NIST suggest that hardware is more important than software.

Source

My goal was to understand how something can be fast in hardware and slow in software, and especially the other way around. Every source I found talks about FPGAs (and also ASICs), and they also appear in NIST's report on the SHA-3 competition:

3.2 Performance

NIST was fortunate to have a great depth of performance data on the five finalists that could also be compared with the performance data of the SHA-2 algorithms. This data included software implementations on many different kinds of Central Processing Units (CPUs), and hardware implementations in both Field Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs). All this data made simple comparisons very difficult; most algorithms excelled on some platforms and lagged on others. However, a few patterns emerged from the performance data, which affected NIST's decision.


So there are CPUs, FPGAs and ASICs. This has also come up in other answers, but the only mention I found on this site was: "An attacker with GPU or FPGA may want to do this, but will have difficulty."


What is the difference between running on a CPU and running on an FPGA? How is it possible for something to be faster in software, on a CPU, than on an FPGA? What would make it difficult for an FPGA to be as fast as a CPU?

  • I think this may be out of scope, but I would like to see it accepted and I'm interested in an answer; it has my upvote.

  • I'm not very familiar with encryption algorithms, but I think an implementation can be faster in software when the computation is essentially sequential. An FPGA works naturally with parallelism, while a CPU is designed to execute a sequence of instructions.

1 answer

The main difference between a CPU (Central Processing Unit) and an FPGA (Field Programmable Gate Array) is how flexibly the hardware itself can be arranged.

A general-purpose processor (CPU) executes whatever instructions it is given, under the control of the operating system, and it is built to understand the general-purpose machine code defined by its particular architecture. An FPGA does not need a universal instruction interpreter: it can be reconfigured for a specific purpose, so that it processes only one algorithm, one type of instruction, and so on.
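As a rough software analogy (an assumption for illustration, not a model of real silicon), the contrast between a universal instruction interpreter and a circuit shaped for one job looks like this in Python. The hypothetical `run_program` has to fetch and decode every instruction before doing any useful work, while the hypothetical `dedicated_add_xor` performs only the one computation it was built for.

```python
# Toy analogy only (assumption): a real CPU decodes instructions in hardware,
# and an FPGA is configured from a hardware description, not from Python.

MASK32 = 0xFFFFFFFF  # keep results within a 32-bit word


def run_program(program, regs):
    """General-purpose toy machine: fetch, decode, then execute each instruction."""
    regs = dict(regs)  # work on a copy of the register file
    pc = 0
    steps = 0
    while pc < len(program):
        op, dst, a, b = program[pc]  # fetch
        if op == "add":              # decode, then execute
            regs[dst] = (regs[a] + regs[b]) & MASK32
        elif op == "xor":
            regs[dst] = regs[a] ^ regs[b]
        else:
            raise ValueError(f"unknown opcode: {op!r}")
        pc += 1
        steps += 1
    return regs, steps


def dedicated_add_xor(x, y):
    """Hypothetical 'dedicated circuit': one fixed computation, nothing to decode."""
    return ((x + y) & MASK32) ^ y


program = [
    ("add", "r2", "r0", "r1"),  # r2 = r0 + r1
    ("xor", "r2", "r2", "r1"),  # r2 = r2 ^ r1
]
regs, steps = run_program(program, {"r0": 5, "r1": 3, "r2": 0})
print(regs["r2"], steps)        # 11 2
print(dedicated_add_xor(5, 3))  # 11
```

In a real CPU the decoding is done by hardware and is heavily optimized, so the overhead is far smaller than in this toy, but the general-purpose machinery is still involved in every instruction.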

When we talk about computing a hash in software, remember that the instructions of that algorithm always go through the CPU, so the result depends on the CPU's speed, the operating system, and everything else that is also using the CPU for other routine tasks.

Unlike software, hardware dedicated to that algorithm (in this case an FPGA) may perform better than a CPU, since the FPGA does only that kind of processing: it expects only one type of input, and produces only one type of output.

An FPGA chip can consist of many processing blocks, which are configured by the user. It can therefore be set up to perform many uniform operations side by side in a short period of time. In most cases, a well-configured FPGA is faster than a conventional processor.
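Parallel blocks only help when the operations do not depend on each other. A minimal Python sketch, under idealized assumptions (unlimited parallel hardware, one time unit per operation, no routing, clock or area constraints), compares the number of operations with the longest dependency chain, the critical path. The first is roughly what a machine doing one operation at a time pays; the second is the best any amount of parallel hardware can do. The function name `critical_path` is hypothetical, and `graphlib` requires Python 3.9 or later.

```python
# Idealized model (assumption): unlimited parallel hardware, one time unit per
# operation, and no routing delay, clock limits, or area cost.
from graphlib import TopologicalSorter  # Python 3.9+


def critical_path(ops):
    """ops maps each operation name to the names it depends on (acyclic; every
    dependency must also be a key). Returns the longest dependency chain."""
    depth = {}
    for name in TopologicalSorter(ops).static_order():
        # An operation can start only after all of its inputs are ready.
        depth[name] = 1 + max((depth[d] for d in ops[name]), default=0)
    return max(depth.values())


n = 16
# 16 operations with no dependencies between them (e.g. separate word updates)
independent = {f"x{i}": [] for i in range(n)}
# 16 operations where each one needs the result of the previous one
chain = {f"c{i}": ([f"c{i - 1}"] if i else []) for i in range(n)}

print(len(independent), critical_path(independent))  # 16 1
print(len(chain), critical_path(chain))              # 16 16
```

Sixteen independent operations could finish in a single step on parallel hardware, while a chain of sixteen dependent operations takes sixteen steps no matter how many blocks the FPGA has.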

In cryptography, dedicating an FPGA to a given algorithm can greatly increase its performance (when it is well configured, and depending on the quality of the chip), because the chip is configured specifically to carry out the operations of that algorithm. FPGA designs are usually written in a hardware description language such as VHDL (or Verilog).

Here is an AES implementation written in VHDL. You can take this code and load it onto your own FPGA.

tl;dr

In summary, an FPGA is a processor dedicated to one (or more) algorithms defined by the user. Because of this, it processes information, such as hashing data, faster than a conventional CPU.

FPGAs are moldable and flexible. A CPU is universal: it handles any machine-code instruction and, because it has to support everything, it has lower performance than an FPGA when both run an implementation dedicated to the same algorithm.

What’s the difference between running on a CPU and running on an FPGA?

As explained above.

How can something be faster in software, CPU, than in an FPGA?

A poor FPGA implementation may be slower than a CPU in certain cases. Here, it is worth considering:

  • the processor you are comparing against (clock speed, quality, cache size);
  • the quality and size of the FPGA chip;
  • how the algorithm was implemented in software (CPU);
  • how the algorithm was implemented in hardware (FPGA).

In that sense, it is a matter of environment and development. It's like asking how an algorithm written in language X can be better than the same algorithm written in language Y.

What would be the difficulties of an FPGA being as fast as the CPU?

FPGAs and CPUs serve different purposes. The CPU is meant to be central: all instructions pass through it. The FPGA is meant to serve a single purpose, defined by the user. This also comes down to the environment each is used in and the reason for using it.


More information and benchmarks:

  • And then you go from FPGA to ASIC and gain even more performance :D

  • I didn't get to research ASICs, @Andersoncarloswoss. I'm going to read about them :)

  • The FPGA is, by definition, (re)programmable, and that architecture costs it response time. Routing is produced by synthesizing the VHDL and can be optimized, whereas an ASIC is a fixed circuit built for its purpose. When performance is very critical, the usual process is to use an FPGA during development, because it is reprogrammable, and once it achieves satisfactory results, the circuit masks are generated to produce an ASIC. Processors themselves are produced this way today (Intel, at least).

  • So it is good to use both for certain kinds of development. Since an ASIC cannot be changed after fabrication, one can program and test VHDL designs on an FPGA before committing them to an ASIC. Then, once the design is fabricated as an ASIC, you have an algorithm running on a super-fast dedicated chip. ;D

  • Exactly that.

  • I gave the +1, but there is still "something" that makes the FPGA "slow". BLAKE, unlike Keccak, is based on ARX (Addition/Rotation/XOR), and the closest thing to an answer I found is: "The time needed for a modular addition of the natural word size of a general-purpose computer is often as fast as bitwise logic operations. In hardware, this is not the case: word-length bitwise logic is faster than addition with its carry propagation delays. Therefore, the fastest hash function on a general-purpose computer is not necessarily the fastest hash function in hardware."

  • The comparison used to decide "slow" or "fast" looked at "throughput/area" and "energy consumption per message bit". One factor is that Keccak's construction can be parallelized, which benefits the FPGA, whereas BLAKE's cannot. What I still haven't found clearly is what benefits an FPGA and what benefits a CPU. I think an FPGA will be faster than a CPU when comparing a single algorithm. But when comparing several (such as BLAKE and Keccak), one of them performs better on an FPGA (Keccak) and the other (BLAKE) performs better on a CPU. The question is: why does this happen?!

  • @Inkeliz this might help clear up your doubt. I won't try to explain what I understood, because my understanding is very superficial, but I hope it has what you are looking for: https://crypto.stackexchange.com/questions/31674/what-advantages-does-keccak-sha-3-have-over-blake2

  • @Inkeliz that seems to be what I commented there in 2017. What favors the FPGA is precisely the parallelism of operations. It does not work as well for sequential operations, because guaranteeing the correct order costs a great deal of hardware.

  • During my master's degree I developed a fiber-optic communication modem on an FPGA, and part of the receiver computed the Fourier transform of the input signal. I implemented the sequential algorithm for this and saw a huge drop in performance, because I had to design the routes and manage the propagation time of each signal within the FPGA to guarantee the order of operations. This results in "abusive" use of flip-flops and multiplexers, which greatly increases the final response time, precisely because of the propagation and response times of these components.

  • @Inkeliz a likely reason for an algorithm to behave better on a CPU than on an FPGA is that the two are completely different architectures, and an algorithm may have been designed specifically for one of them. It will then be optimized for that environment and not for the other.
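The carry-propagation point quoted in the comments above can be made concrete with a gate-level ripple-carry adder. This Python sketch assumes every 2-input gate costs one unit of delay, which is a simplification: FPGAs and ASICs use faster adder structures and dedicated carry chains, so the real gap is smaller. Even so, bit i of the sum cannot settle until the carry from bit i-1 arrives, whereas every bit of a word-wide XOR is a single independent gate. The names `ripple_carry_add` and `bitwise_xor` are illustrative.

```python
# Simplified delay model (assumption): every 2-input gate costs one unit of
# delay. Real FPGAs/ASICs use dedicated carry chains and faster adders.

WIDTH = 32


def ripple_carry_add(x, y, width=WIDTH):
    """Add two unsigned words (each < 2**width) with a gate-level ripple-carry
    adder. Returns (sum mod 2**width, logic depth of the slowest output bit)."""
    carry, carry_depth = 0, 0  # the carry into bit 0 is a constant
    result, worst = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        p = a ^ b                # propagate: one XOR gate (depth 1)
        g = a & b                # generate: one AND gate (depth 1)
        s = p ^ carry            # the sum bit waits for the incoming carry
        s_depth = max(1, carry_depth) + 1
        t = carry & p
        t_depth = max(1, carry_depth) + 1
        carry = g | t            # carry out: an OR gate after g and t
        carry_depth = max(1, t_depth) + 1
        result |= s << i
        worst = max(worst, s_depth)
    return result, worst


def bitwise_xor(x, y, width=WIDTH):
    """Word-wide XOR: each output bit is one independent gate (depth 1)."""
    return (x ^ y) & ((1 << width) - 1), 1


print(ripple_carry_add(0xFFFFFFFF, 1))  # (0, 64)
print(bitwise_xor(0xFFFFFFFF, 1))       # (4294967294, 1)
```

On a CPU, integer addition and XOR are both typically single instructions with similar cost, which is why ARX designs such as BLAKE do well in software. Rotation by a fixed amount, the R in ARX, costs nothing extra in dedicated hardware, because it is just wiring.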
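The parallelism argument in the comments above can also be checked by counting which input bits are able to influence each output bit. The sketch below uses the chi step of Keccak, out[x] = a[x] XOR (NOT a[x+1] AND a[x+2]) on each row of five 64-bit lanes, and compares it with a plain 32-bit modular addition of the kind ARX designs such as BLAKE rely on. It tracks structural dependencies only, not timing; chi is just one of Keccak's five steps (theta mixes more bits, though still with XOR only), and the addition stands in for BLAKE's round function rather than reproducing it. The function names are hypothetical.

```python
# Structural dependencies only (assumption): which input bits *can* affect each
# output bit. No timing model, no full Keccak round, no full BLAKE round.


def chi_row(row, lane_bits=64):
    """Keccak chi on one row of five lanes: out[x] = a[x] ^ (~a[x+1] & a[x+2])."""
    mask = (1 << lane_bits) - 1
    return [row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5] & mask) for x in range(5)]


def chi_row_dependencies(lane_bits=64):
    """Output bit (x, z) of chi reads bit z of lanes x, x+1 and x+2 only."""
    return {
        (x, z): {(x, z), ((x + 1) % 5, z), ((x + 2) % 5, z)}
        for x in range(5)
        for z in range(lane_bits)
    }


def add_dependencies(width=32):
    """s = a + b mod 2**width: bit i depends on bits 0..i of both operands,
    because the carry into bit i is built from every lower bit."""
    deps = {}
    carry = set()  # the carry into bit 0 depends on no input bit
    for i in range(width):
        here = {("a", i), ("b", i)}
        deps[i] = here | carry
        carry = carry | here  # carry out of bit i
    return deps


print(chi_row([0, 0, 1, 0, 0]))                               # [1, 0, 1, 0, 0]
print(max(len(d) for d in chi_row_dependencies().values()))  # 3
print(len(add_dependencies()[31]))                            # 64
```

Each of the 320 chi output bits in a row (1600 across the whole Keccak-f[1600] state) depends on only 3 input bits, so all of them can be computed by small identical circuits at the same time, while the top bit of a 32-bit sum depends on all 64 input bits through the carry chain.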
