I decided to answer because one of the answers is right in essence but gives the wrong reason for the slowness.
How a string is represented
The idea that a string in Java (the same goes for C# and several other languages) is internally represented by an array of chars is correct, and each char typically takes 2 bytes (which individually do not necessarily represent a character, but that is another matter). See hkotsubo's comment below: newer versions have an optimization that can use only 1 byte per character.
Immutability
Initially an array of fixed size is allocated to hold the text of the string. It turns out that a string is immutable: every time you change any part of it, another memory allocation has to occur. Making 100,000 memory allocations is far from cheap. Worse, it is not just the allocation: all the elements of the array have to be copied to the newly allocated array, with the change applied in the middle of this process.
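A minimal sketch of what immutability means in practice. The class and method names are made up for illustration; only String.replace comes from the standard library.

```java
class ImmutabilityDemo {
    // Returns true when "changing" a String leaves the original untouched
    // and hands back a different object holding the modified text.
    static boolean originalUnchanged() {
        String original = "abc";
        String changed = original.replace('b', 'x'); // allocates a new String
        return original.equals("abc")
                && changed.equals("axc")
                && original != changed; // identity check: two distinct objects
    }

    public static void main(String[] args) {
        System.out.println(originalUnchanged());
    }
}
```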
Too many allocations
This becomes especially problematic when you grow the string successively. This is called Shlemiel the Painter's Algorithm: a painter is painting the dividing line down the middle of a highway. He starts out very well, with high productivity, but every day he produces less, until the work becomes unfeasible. This is because he keeps the paint can in one place, so he paints a stretch of the line and has to walk back to the starting point to wet the brush. Each day he is further from the can and spends more time walking than painting.
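The painter's walk can be put into numbers with a rough model. The sketch below assumes that every concatenation writes the whole new content into a fresh array; it counts those characters rather than measuring time, and the names are hypothetical.

```java
class ShlemielDemo {
    // Counts the characters written when a string grows one char at a time,
    // assuming each concatenation copies the old content plus the new char.
    static long charsCopiedByConcatenation(int n) {
        long copied = 0;
        String s = "";
        for (int i = 0; i < n; i++) {
            s = s + "x";          // a new String is created on every pass
            copied += s.length(); // model: the whole new content is written
        }
        return copied;
    }

    public static void main(String[] args) {
        // Grows as n * (n + 1) / 2, so doubling n roughly quadruples the work.
        System.out.println(charsCopiedByConcatenation(1_000));
        System.out.println(charsCopiedByConcatenation(2_000));
    }
}
```

Under this model the total is 1 + 2 + ... + n, which is why the walk back to the can ends up dominating the work.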
Is the Garbage Collector the problem?
Of course, a huge volume of allocations that soon afterwards have no live references lets the garbage collector collect them. This can add further to the slowness. But note that the garbage collector collects in bulk, so the collection does not take that much time. And it usually only happens when memory is actually running short. In a simple program it is possible that no collection takes place before the program finishes.
So the GC is not the reason for the slowness; the allocations and the memory copies are. 100,000 allocations with copying cost far more than the whole GC algorithm freeing the memory, since it is optimized to do that in one go on each run using a general collection method.
Using StringBuilder
StringBuilder solves this by pre-allocating memory and allowing the content to be modified. That is, it avoids many allocations when you just want to change part of the string, and especially when the string grows successively.
It already holds an allocated block of memory, which greatly reduces the number of allocations needed. The most common algorithm for choosing the size is that, whenever the new text does not fit in the current allocation, a new one twice the size of the previous one is created. This greatly reduces the number of allocations and, consequently, the copying of data.
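How rarely the buffer has to grow can be observed through StringBuilder.capacity(). The exact growth policy is an implementation detail of the JDK in use, so this sketch only counts how often the capacity changes and makes no claim about the factor. The class and method names are hypothetical.

```java
class GrowthDemo {
    // Counts how many times the capacity changes while appending n chars
    // one at a time to a default-sized StringBuilder.
    static int reallocations(int n) {
        StringBuilder sb = new StringBuilder();
        int capacity = sb.capacity();
        int count = 0;
        for (int i = 0; i < n; i++) {
            sb.append('x');
            if (sb.capacity() != capacity) { // the internal array was replaced
                capacity = sb.capacity();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // With a roughly doubling policy this is a few dozen at most.
        System.out.println(reallocations(100_000));
    }
}
```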
Additionally, you can set the initial size of the array used by StringBuilder through the constructor StringBuilder(int capacity) when you already have an idea of how large the string will be. You can also enlarge the allocation in the middle of the process with the method ensureCapacity(int minimumCapacity), where appropriate.
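A short sketch of both options, using the public StringBuilder(int capacity) constructor and the ensureCapacity(int minimumCapacity) method. The helper names are invented for the example.

```java
class CapacityDemo {
    // Appends n chars to a builder pre-sized to n and reports whether the
    // capacity stayed the same, i.e. no further allocation was needed.
    static boolean capacityStable(int n) {
        StringBuilder sb = new StringBuilder(n);
        int initial = sb.capacity();
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.capacity() == initial;
    }

    // Enlarges a default-sized builder later on and returns its new capacity,
    // which is at least the requested minimum.
    static int grownCapacity(int minimum) {
        StringBuilder sb = new StringBuilder();
        sb.ensureCapacity(minimum);
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(capacityStable(1_000));
        System.out.println(grownCapacity(500) >= 500);
    }
}
```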
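Roughly, counting allocations, repeated concatenation with String is O(n) while StringBuilder is O(log n). Counting characters copied, that is O(n²) in total for String against O(n) for StringBuilder.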
Because it is a mutable, pre-allocated structure, the road painter's problem does not occur. The can is always close to the painter.
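Side by side, the two approaches look like this. It is only a sketch with hypothetical method names: both produce the same text, but the first allocates a new String on every pass while the second appends to a single buffer.

```java
class JoinDemo {
    // Shlemiel's way: each += creates a new String and copies the old content.
    static String withConcatenation(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // The can stays next to the painter: one mutable buffer, one final String.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withConcatenation(10).equals(withBuilder(10)));
    }
}
```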
There are other techniques to avoid extra allocations, but the best known and most universal is StringBuilder. I don't know if Java does any optimization, but C#, for example, can optimize code such as string texto = "abc" + "def" + "ghi" + "jkl";. Even string texto = str1 + str2 + str3 + str4; can be optimized by the compiler into string texto = string.Concat(str1, str2, str3, str4);, where the allocation occurs only once.
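On the Java side, marcus's comment below confirms that concatenations of constants are folded at compile time. A small sketch makes this visible by comparing references with ==, which is done here on purpose and is not how strings should normally be compared. The class and method names are hypothetical.

```java
class ConstantFoldingDemo {
    // "abc" + "def" is a compile-time constant, so it is the very same
    // interned object as the literal "abcdef".
    static boolean literalsFoldedAtCompileTime() {
        String folded = "abc" + "def";
        return folded == "abcdef";
    }

    // With non-final variables the concatenation happens at run time and
    // yields a new object with equal content.
    static boolean variablesConcatenatedAtRuntime() {
        String a = "abc";
        String b = "def";
        String joined = a + b;
        return joined.equals("abcdef") && joined != "abcdef";
    }

    public static void main(String[] args) {
        System.out.println(literalsFoldedAtCompileTime());
        System.out.println(variablesConcatenatedAtRuntime());
    }
}
```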
Choosing the Best Data Structure
By default we use the normal string, and switch to StringBuilder in the cases where it becomes useful. You may be tempted to always use it, but it has its drawbacks (already mentioned by Math and utluiz). We should not optimize prematurely.
Design patterns are created to make recurring work easier. We have "always" adopted design patterns. Some design patterns are just recommendations, examples to be followed. Others turn into a library, as is the case with StringBuilder. Others are so useful that they become language constructs.
As this does not seem to be clear to everyone, I answered another existing question on the subject.
I put this on GitHub for future reference.
"I don’t know if Java does any optimization". To complete the answer and settle the question: yes, Java optimizes the "obvious" concatenations made with the + operator. If all the strings are constants, they are concatenated at compile time. If variables are involved, the compiler generates code that itself uses StringBuilder for each concatenation expression with +. The advantage of the programmer creating the StringBuilder manually shows up in loops or in any concatenation that spans separate expressions (the compiler would generate a new StringBuilder each time, which is costly).
– marcus
"and this has 2 bytes always" - from Java 9, no longer "always": https://www.baeldung.com/java-9-compact-string#java
– hkotsubo