Replace in large string or split loop

Viewed 253 times

2

I have a multidimensional array of routes and need to convert some placeholder elements such as (alpha), (int), (id), etc. Since it is an array, at the moment I use a loop to do the replacements.

I thought about another approach without the loop: working on a string so that one general replace does everything. First convert the array to a string with serialize, then apply the replace, then get the array back with unserialize.

Initially I had a foreach with separate replaces, as in the example below.

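foreach( $array as $line )
{
    // Assumes the route string is in $line[0]; adjust the index to your data.
    $lineA = str_replace( ',' , '&' , $line[0] );
    // A bare '(int)' is parsed as the regex "int" with ( ) as delimiters, so escape the parentheses.
    $lineB = preg_replace( array( '/\(int\)/' , '/\(alpha\)/' ) , array( '([0-9]+)' , '([a-zA-Z]+)' ) , $line[0] );
}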

With the new approach I call serialize and unserialize and can use a single replace.

$line = serialize( $array );
$line = str_replace( array( '(int)' , '(alpha)' , ',' ) , array( '([0-9]+)' , '([a-zA-Z]+)' , '&' ) , $line );
$line = unserialize( $line );

After serialize, the replace runs on one fairly large string, and then unserialize is applied.

I don’t know the limits of str_replace. Which is more advantageous: a loop over small strings or a single replace on a large string?

This is not a question about benchmarks. I only want to know the advantages and disadvantages of each case, and where one applies better than the other.

  • 1

    Post the code so the tests can be run. Only then can you get a correct answer. If you want, you can run the tests with Xdebug and WinCacheGrind.

  • I did think about running a benchmark, but the PHP configuration and the machine end up influencing the results.

  • Then all the answers will be based on each person's opinion.

  • Anyway, if you test on the same machine, the machine is no longer a variable.

  • Obviously it’s not based on opinion. It’s about behavior and function performance. I don’t know the limitations and advantages of using replace in a loop versus on a large string.

  • One last point: how can you know behavior and performance without testing and measuring?

  • There is a function for every need: str_replace for one case, preg_replace for another, and so on. Each one has advantages and disadvantages in behavior, and I don’t know them. I’m waiting for someone who can say that preg_replace does not cope well with a large string, or that it has an advantage over a loop with 3 replaces.

  • 1

    I do not believe it is based on opinion. Someone may have already run that test, or may run it in order to answer. Either way, a snippet of code is always welcome.

2 answers

2


I'll try to answer this logically. I have a principle that helps me solve problems like this one. In this case it would be:

  • strings -> string functions, arrays -> array functions

The serialize function exists to preserve data types when we want to save data to a text file or a database. When we need the data again, we read it back and call unserialize so that it returns to its original state. Put another way: to store an array in the database, we call serialize, which turns the array into a string, and store that string. When we need the array again, we fetch the string from the database and call unserialize to get the array in the same state it had before saving. This leads us to:

1) Calling serialize, manipulating the string and then calling unserialize can break data integrity and give unexpected results when we unserialize.

2) The serialize function runs a series of loops to turn the array into a string.

3) The unserialize function runs a series of loops to turn the string back into an array.
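To make point 1 concrete, here is a minimal sketch. The $routes sample is hypothetical, since the question does not show the real array. The serialized format records the byte length of every string, so replacing a placeholder with a pattern of a different length leaves that length stale.

```php
<?php
// Hypothetical sample data; the real route array is not shown in the question.
$routes = array( array( '/user/(int)' , 'id,name' ) );

$serialized = serialize( $routes );
// serialize() records each string's byte length, e.g. s:11:"/user/(int)"

$replaced = str_replace( '(int)' , '([0-9]+)' , $serialized );
// The payload is now 14 bytes long, but the prefix still says s:11

// unserialize() rejects the mismatch and returns false.
// '@' only hides the notice/warning it raises on failure.
$result = @unserialize( $replaced );

var_dump( $result ); // bool(false)
```

A replacement of identical length, such as ',' to '&', would survive the round trip, but any replacement that changes the length does not unless the length prefixes are rewritten too.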

Conclusion:

For the reasons above, using serialize is not a good idea, as it will at least roughly double the number of loops.
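Following the "arrays -> array functions" principle, one possible sketch replaces directly on the array with array_walk_recursive, which visits every leaf of a nested array. The $routes data and the variable names are assumptions made for illustration, not the asker's actual code.

```php
<?php
// Hypothetical routes; the question does not show the real array.
$routes = array(
    array( '/user/(int)' , 'id,page' ),
    array( '/tag/(alpha)' , 'name' ),
);

$search  = array( '(int)' , '(alpha)' , ',' );
$replace = array( '([0-9]+)' , '([a-zA-Z]+)' , '&' );

// Visit every leaf of the nested array and replace in place (note the reference).
array_walk_recursive( $routes , function ( &$value ) use ( $search , $replace ) {
    if ( is_string( $value ) ) {
        $value = str_replace( $search , $replace , $value );
    }
} );

print_r( $routes );
```

This keeps a single str_replace call per element and never converts the array to a string, so there is no format to corrupt.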

  • Great, I really liked it. About breaking the integrity, I don’t think it happens, because they’re exact elements. The explanation you gave was exactly what I wanted to know. I think one outer loop is better than the loops inside serialize and unserialize. 100% technical :)

1

I found this question very interesting, because in this area of programming the details matter a great deal, especially for those who deal with large amounts of data. In that sense I would like to try to add something relevant to it.

The question arises:

I don’t know the limits of str_replace. Which is more advantageous: a loop over small strings or a single replace on a large string?

For that, I have an answer from earlier research:

Most programming languages limit the number of characters that can be stored in a string, but PHP does not. This does not mean that you can store unlimited data in a PHP string: while there is no limit on the length of the string itself, a limit is imposed on the overall amount of memory that a PHP script can use. This limit is expressed in bytes and can be changed by editing the memory_limit directive in the configuration file "php.ini".
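As a small sketch of how to inspect that limit from a script, the snippet below reads the memory_limit setting and measures how much memory a large string costs. The 1 MiB size is an arbitrary choice for illustration.

```php
<?php
// memory_limit is the php.ini directive behind this cap; "-1" means no limit.
$limit = ini_get( 'memory_limit' );

$before = memory_get_usage();
$big    = str_repeat( 'x' , 1024 * 1024 ); // a 1 MiB string
$after  = memory_get_usage();

echo "memory_limit: {$limit}\n";
echo 'string length: ' . strlen( $big ) . " bytes\n";
echo 'script memory grew by about ' . ( $after - $before ) . " bytes\n";
```

The same measurement around a serialize call on the real array would show how much extra memory the string copy needs.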

Having said that, and looking at your question, I noticed that you did not consider json_encode and json_decode. I suggest them as a variant worth taking into account.

At first glance I see three paths. The most obvious option for serializing data is PHP's serialize and unserialize. A little less popular are json_encode and json_decode. There is also a third option: a third-party extension called igbinary that you can install on your server. It performs very well, but it has requirements that not every environment can meet. Combined with memcached, it is excellent.

When serializing data, we are always concerned about the size of the result, but also about the time the serialization takes.

However, there are documents and published tests that show performance gains for json_encode and json_decode. In my experience I would say the following:

If your application reads more than it writes, igbinary is the winner, since it unserializes data faster than the other functions. If you are more focused on storing data, json_encode is the right choice, as its serialized result is smaller and is produced faster.

I hope this has helped clarify things.
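A rough way to check the size claim on your own data is to compare the length of both encodings. This is only a sketch with hypothetical $routes data; igbinary is left out because it is a separate extension that may not be installed.

```php
<?php
// Hypothetical routes; substitute your real array.
$routes = array(
    array( '/user/(int)' , 'id,page' ),
    array( '/tag/(alpha)' , 'name' ),
);

$asPhp  = serialize( $routes );
$asJson = json_encode( $routes );

echo 'serialize:   ' . strlen( $asPhp ) . " bytes\n";
echo 'json_encode: ' . strlen( $asJson ) . " bytes\n";

// Round trip: passing true makes json_decode return arrays instead of objects.
$back = json_decode( $asJson , true );
var_dump( $back === $routes );
```

For plain arrays of strings like this one the JSON form is shorter, because it carries no type or length prefixes. JSON cannot restore PHP objects to their original classes, so the two formats are not interchangeable in every case.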

  • Interesting. It is also possible to convert to a string with http_build_query and get the array back with parse_str. Anyway, as @Manuel Gerardo Pereira said, I believe that converting an array into a string and reversing the process can be more expensive than the loop itself. But it’s always good to have other options. Thanks for the contribution, I didn’t know igbinary. +1

  • 1

    You are absolutely right, because looping directly over the array saves two "layers" of conversion. I centered my answer on serialization because it was a new way of looking at the subject given what was asked, and in that way I tried to enrich the answer.
