Going deeper into @Maniero's answer, on the matter of complexity.
Asymptotic complexity says a lot, but it is not everything. Asymptotic complexity depends on variables: it is expressed in terms of the input. So its value can be "fake" depending on which variable you measure against.
Let's take the following loop, which reads the elements of a cube:
elementos = d*d*d;
for (i = 0; i < elementos; i++) {
    x = i % d;        // column
    y = (i/d) % d;    // row (integer division)
    z = (i/d)/d;      // layer (integer division)
    imprime(cubo[x][y][z]);
}
So your algorithm runs in linear time, right? Well, it is linear in the variable elementos, but since what you receive is a cube, it is much more likely that your real concern is d^3.
Here, an algorithm that is quadratic in the number of elements (such as a simple sorting algorithm) would make life much more complicated for a program traversing the data cube, since in terms of the cube side it would be O(d^6).
What happened here is that the complexity was "falsified" to look linear. Some people end up doing similar things to reduce the "apparent complexity" of an operation.
In this case, @Maniero showed in his answer that for every element you traverse, you do O(1) operations. This means that over a set of n elements it is a linear algorithm. However, looking from the perspective of the data stored in an information cube, the size n is itself O(d^3).
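To make that explicit, here is a minimal sketch in PHP of the same traversal written with the cube side $d as the explicit variable ($cubo and its sample values are hypothetical). The work per element is still O(1), but the O(d^3) total cost is now visible in the loop structure itself:
$d = 3;
// build a small d×d×d cube with sample values (hypothetical data)
$cubo = [];
for ($x = 0; $x < $d; $x++)
    for ($y = 0; $y < $d; $y++)
        for ($z = 0; $z < $d; $z++)
            $cubo[$x][$y][$z] = $x * $d * $d + $y * $d + $z;

// same O(1) work per element as the single loop above,
// but the d*d*d iteration count is explicit
for ($x = 0; $x < $d; $x++)
    for ($y = 0; $y < $d; $y++)
        for ($z = 0; $z < $d; $z++)
            echo $cubo[$x][$y][$z], "\n";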
To get an idea of how looking at a different variable affects the stated complexity, look at this example:
add two numbers N and M (in an arbitrary base), each with n digits
The sum has complexity O(n), but also O(log N). (Source: https://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations)
This is because the number of digits needed to represent a number is the logarithm of that number in the base used, so both statements describe the same cost. It is much more honest and visible to say that the cost of the sum is linear in the number of digits.
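As a concrete illustration, here is a minimal sketch in PHP of schoolbook addition over digit arrays (the soma_digitos helper and its representation, least-significant digit first, are assumptions of this sketch). The loop runs once per digit, so the cost is O(n) in the number of digits, which is O(log N) in the magnitude of the numbers:
function soma_digitos(array $a, array $b, int $base = 10): array {
    $n = max(count($a), count($b));
    $resultado = [];
    $vai_um = 0; // carry
    for ($i = 0; $i < $n; $i++) {  // one step per digit: O(n)
        $s = ($a[$i] ?? 0) + ($b[$i] ?? 0) + $vai_um;
        $resultado[] = $s % $base;
        $vai_um = intdiv($s, $base);
    }
    if ($vai_um > 0) {
        $resultado[] = $vai_um;
    }
    return $resultado;
}

// 473 + 58 = 531; digits stored least-significant first
print_r(soma_digitos([3, 7, 4], [8, 5])); // digits 1, 3, 5 → 531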
You can make considerable gains by exploiting spatial locality. I don't know how your matrix was implemented, but if it was implemented as a jagged array (an array in which each element holds a vector, each of whose elements in turn holds a vector of numbers, each with its own individual size), it is expected not to be contiguous. If the data is not contiguous, it may be scattered randomly through memory, and that is not good.
Why is it not good? Because you get weaker guarantees about the spatial locality of your variables. As an optimization, the processor uses caches at several levels; usually the cache receives the most recently used data (temporal locality) and its neighbors (spatial locality), since it is very likely that a nearby memory region will be used next. And what do you possibly not have with jagged arrays? Nearby memory regions between the data.
To try to get a significant gain from this, we could iterate over the matrix as follows:
for ($i = 0; $i < 10; $i++) {
    for ($j = 0; $j < 20; $j++) {
        for ($p = 0; $p < 40; $p++) {
            echo $vaar[$i*20*40 + $j*40 + $p];
        }
    }
}
Or else:
for ($i = 0; $i < 10*20*40; $i++) {
    echo $vaar[$i];
}
What we do here is linearize the matrix. Asymptotically this iteration remains cubic, since it is impossible to visit all the elements of a cube any faster. However, we are now dealing with consecutive data, data that will be contiguous in memory.
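For reference, a minimal sketch of that linearization step, assuming the original 10×20×40 jagged array is called $matriz (a hypothetical name): copy it once into a flat array so that later traversals touch consecutive indices:
$ni = 10; $nj = 20; $np = 40;
$vaar = [];
for ($i = 0; $i < $ni; $i++)
    for ($j = 0; $j < $nj; $j++)
        for ($p = 0; $p < $np; $p++)
            // $matriz is the original jagged array (hypothetical name)
            $vaar[$i * $nj * $np + $j * $np + $p] = $matriz[$i][$j][$p];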
Since the data is contiguous, when fetching into the L1 cache the value at, say, position 10 of the vector, the processor can also bring into the cache the values between positions 5 and 14 (inclusive). Thus the number of accesses to main memory decreases, and with fewer accesses to main memory we run faster. Not to mention that the number of dereferences is lower.
In a jagged array, every lookup of a matrix index is a dereference, and a dereference means loading the variable and asking the processor to fetch the memory region it points to.
For example, to access $vaar[$i][$j][$p]:
- take the address indicated by $vaar (load variable into memory)
- take the value of $i (load variable into memory)
- advance $i slots from $vaar
- load the value at the address obtained by that sum (address dereference)
- take the value of $j (load variable into memory)
- advance $j slots from the previously loaded address
- load the value at the address obtained by this new sum (address dereference)
- take the value of $p (load variable into memory)
- advance $p slots from the loaded address
- load the value at the address obtained by that new sum (address dereference)
Since these variables are all in the same scope, it is very likely that, by temporal locality, loading them does not involve accessing main memory. Even so, for each distinct $i or $j it will be necessary to load the new inner vector being referenced, and those vectors are likely scattered through memory.
Compare with the simplest iteration over the linearized matrix, $vaar[$i]:
- take the address indicated by $vaar (load variable into memory)
- take the value of $i (load variable into memory)
- advance $i slots from $vaar
- load the value at the address obtained by that sum (address dereference)
Done: with only a single dereference, the value is now loaded. And since we are dealing with a contiguous block of data, the next number will be nearby and can therefore be operated on quickly.
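If you want to check the effect yourself, here is a minimal sketch of a micro-benchmark (the sample data is hypothetical; actual numbers vary by PHP version and hardware, and PHP arrays are hash tables internally, so the gain is smaller than in a language with raw contiguous arrays):
$ni = 10; $nj = 20; $np = 40;
// build a jagged array and its linearized copy with sample values
$jagged = [];
$flat = [];
for ($i = 0; $i < $ni; $i++)
    for ($j = 0; $j < $nj; $j++)
        for ($p = 0; $p < $np; $p++) {
            $jagged[$i][$j][$p] = $i + $j + $p;
            $flat[$i * $nj * $np + $j * $np + $p] = $i + $j + $p;
        }

$soma = 0;
$t0 = microtime(true);
for ($i = 0; $i < $ni; $i++)          // three dereferences per element
    for ($j = 0; $j < $nj; $j++)
        for ($p = 0; $p < $np; $p++)
            $soma += $jagged[$i][$j][$p];
$t1 = microtime(true);
for ($i = 0; $i < $ni * $nj * $np; $i++)  // one dereference per element
    $soma += $flat[$i];
$t2 = microtime(true);
printf("jagged: %.6fs, linear: %.6fs\n", $t1 - $t0, $t2 - $t1);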
If you are going to do something like this, remember:
- to access position ($i, $j, $p) of the cube, use the formula $vaar[$i*20*40 + $j*40 + $p] (a helper like the sketch after this list can hide the arithmetic);
- will your maintenance team find this code easy to read?
- if not, perhaps the performance gain does not compensate for the code's possible new "illegibility".
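On that readability point, a minimal sketch of a hypothetical helper (indice is an invented name; $vaar is the linearized matrix from above) that keeps the index arithmetic in one place:
// dimensions 10×20×40, as in the loops above (assumed)
function indice(int $i, int $j, int $p): int {
    return $i * 20 * 40 + $j * 40 + $p;
}

echo $vaar[indice(2, 5, 7)]; // same element as $vaar[2][5][7] in the jagged layout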
It depends on what you want: if you want to fetch one specific item, you could simplify a lot; if you want to display everything, there is nothing else to be done. Performance is always relative; usually, if you have no operation inside each loop other than displaying, it will be fast, even with 3 fors.
– Guilherme Nascimento
If your variable is three-dimensional and you want to print all the values, that is how it is done. But the loops are short; are you having some performance problem?
– bfavaretto
Actually I need to supplement my question, then, because I only gave a micro-scale example; in my real problem there are 4 fors, and each one does something with the values of the vectors.
– Julio Henrique
Well, the performance of this will depend on the size of the vector and on the operations you do with each item. But if you need to access all the items, you really do need to nest the fors; there is no other way.
– bfavaretto