Problem
There is a problem that one of the creators of this site (SO) calls Shlemiel the Painter’s Algorithm.
A painter paints the stretch of a highway. He starts very well, with high productivity. But every day he produces less, until after a few iterations his work becomes unviable.
This is because it keeps the paint can in a fixed place, so it paints a portion of the strip and has to return to the starting point to wet the brush. Each day he is further away from the can and takes longer on the path than in painting.
Imagine this pattern growing dozens, hundreds or thousands of times. Quickly makes it impossible.
This is what usually happens with concatenations of data collections, especially string. As it grows and does not fit into the existing space for the previous version, a new allocation needs to be made to support the full size of the new version. And it has to dislocate the previous one that turned to garbage. And it fragments the memory. All this is time cost. In some languages the situation is worse since a change that does not even exceed the current size limit already causes a relocation (for a good cause). This is the case of Python.
Solution
The correct way is to figure out the final size, or at least an approximation of it and allocate everything you need, you can put the texts in that allocated area. Obviously it needs to be done in a structure that accepts the text be changed, which is not the case of the type string
that is immutable, that is, any change generates a new object.
The join()
does exactly what I described. He discovers the total size needed - taking the sizes of all strings that will be concatenated - allocates all the space needed and throws the texts into this space that is not yet a string. In the end he turns it into string. Here the cost is equivalent to the total size of the text which is much shorter than walking all over again in each concatenation.
Note that for few concatenations, typically up to 4, concatenation may perform even better than join()
. Of course, in such a small volume no matter which is faster.
Alternatives
Of course the join()
is not the only way to do it. You can do it manually if you need something a little more complex than the join()
does not answer. Maybe using a bytearray or a standard list that are mutable (help, but not great because it may need new allocations, although minimized, does not need in each change, depends on the skill of the programmer).
To python page also shows how to use %s
to obtain similar result. Formatting occurs in a function that manipulates everything in another data structure and only at the end that is generated the string final.
Some people like to use the guy StringIO
who takes care of it.
I answered this in more detail for other languages like Java. And also C#.