String and its efficiency

Asked

Viewed 1,131 times

29

Doubt

I would like to know what design patterns or other "nerdy" things have been applied to this class StringBuffer. Why does String create new objects in a concatenation? As far as I know, String is also an Array of characters (correct me if I’m wrong).


I asked a similar question and I got a good answer, Stringbuffer and Stringbuilder in this question I understood the need for additional spaces in the "Array".


Bass will see code and performance tests, if you program in Java should put in practice.


Test Code


int loop = 1*100000;

        StringBuffer sb = new StringBuffer();
        long time = System.currentTimeMillis();

        for (int i = 0; i < loop; i++)
            sb.append("a");

        System.out.println("Total execuções: " + loop);
        System.out.println("Tempo total de excecução com StringBuffer: "
                + (System.currentTimeMillis() - time) + "ms.");

        String s = "";
        time = System.currentTimeMillis();
        for (int i = 0; i < loop; i++)
            s += "a";

        System.out.println("Total de execuções: " + loop);
        System.out.println("Tempo total de excecução concatenando strings: "
                + (System.currentTimeMillis() - time) + "ms.");

Test results:


Attempt 1

Total executions: 100000.
Total Stringbuffer Exception Time: 18ms.
Total number of executions: 100000.
Total exception time concatenating strings: 10556ms.


Attempt 2

Total executions: 200000.
Total Stringbuffer Exception Time: 23ms.
Total number of executions: 200000.
Total exception time concatenating strings: 44216ms.


Attempt 3

Total executions: 300000.
Total Stringbuffer Exception Time: 30ms.
Total number of executions: 300000.
Total exception time concatenating strings: 94886ms.

3 answers

25

I decided to answer because one of the answers is right in its essence but gives a wrong reason for slowness.

How is a string

Is correct the idea that a string in Java (the same goes for C# and several other languages) internally is represented by a array of chars, and this typically has 2 bytes (which individually do not necessarily represent characters, but this is another matter). See comment below on hkotsubo that there is optimization in newer versions that can occupy only 1 byte.

Immutability

Initially one array, that has fixed size, is allocated to save text from string. It turns out that a string is immutable. Every time you’ll change any part of the string, need to occur another memory allocation. Imagine that Making 100,000 memory allocations is nothing cheap. And worse, it’s not just the allocation that takes place, all the elements of the array need to be copied to the new array allocated, with due change occurring in the middle of this process.

Too many allocations

This becomes especially problematic when you are increasing the size of the string successively. This is called Shlemiel the Painter’s Algorithm. Where a painter goes painting the track of overtaking a highway. It starts very well, with high productivity. But every day he produces less, until his work becomes unfeasible. This is because it keeps the paint can in place, so it paints a portion of the strip and has to return to the starting point to wet the brush. Each day he is further away from the can and takes longer on the path than in painting.

Mostra que cada vez que anda fica mais longe

The problem is the Garbage Collector?

Of course a huge volume of allocations that soon after no longer have live references allow the Garbage Collector can collect them. This is a possibility that can further increase the slowness. But note that the Garbage colletor does the collection at once, so this collection does not consume that much time. And this usually only occurs when you’re really lacking in memory. It is possible that no collection takes place until its completion, in a simple program.

So the GC is not the reason for the slowness but rather the allocations and copies in memory. 100,000 copy allocations cost absurdly more than the entire GC algorithm freeing up memory, as it is optimized to do this at once on each call using one general method of collection.

Using the Stringbuilder

The Stringbuilder solve this making memory pre-allocations and allowing the content to be changeable. That is, it avoids many allocations when you are just wanting to change a portion of the string and especially when it is increasing in size successively.

It already has an allocated memory portion greatly minimizing the amount of necessary allocations. The most common algorithm used to determine the size is that every time the new text does not fit in the current allocation, it creates a new one twice the size of the previous one. This already greatly minimizes the amount of allocations and consequently the data copy.

Additionally you can determine the initial size of the array used by StringBuilder through the builder StringBuilder(int capacity) when it already has a sense of the size that the string will. Obviously you can also change the size of the allocation in the middle of the process with the method expandCapacity(int minimumCapacity), where appropriate.

Roughly string is an algorithm O(n), the StringBuilder is O(log n).

For being a changeable and pre-allocated structure, the problem of the road painter does not occur. The can is always close to the painter.

There are other techniques to avoid extra allocations, but the best known and most universal is the StringBuilder. I don’t know if Java does some optimization, but C#, for example, can optimize a code string texto = "abc" + "def" + "ghi" + "jkl";. Even a string texto = str1 + str2 + str3 + str4; can be optimized by the compiler as string texto = string.concat(str1, str2, str3, str3); where allocation will occur only once.

Choosing the Best Data Structure

By default we use the string normal, and in cases of StringBuilder becomes useful. You may want to always use it, but it has its evils (already cited by Math and utluiz). We should not make premature optimization.

Design patterns are created to facilitate the work of something that repeats frequently. Since "always" we have adopted design standards. Some project patterns are just recommendations, examples that should be followed. Others turn into a library, as is the case with StringBuilder. Others are so useful that they become language constructs.

As this does not seem clear to everyone, I replied another existing question on the subject.

I put in the Github for future reference.

  • 1

    "I don’t know if Java does any optimization". To complete the answer and answer any questions: yes, Java optimizes the "obvious" concatenations made with the +operator. If all strings are constant, they are concatenated at compile time. If variables exist, the compiler generates a code that uses Stringbuilder itself for each concatenation expression with +. The advantage of the programmer creating Stringbuilder manually is in loops or any concatenation that occurs in separate expressions (the compiler would generate a new Stringbuilder each time, which is costly).

  • "and this has 2 bytes always" - from Java 9, no longer "always": https://www.baeldung.com/java-9-compact-string#java

21


Both String and Stringbuilder internally work with character array, just look at their class:

String.java

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];

Abstractstringbuilder.java (Stringbuilder and Stringbuffer superclass)

abstract class AbstractStringBuilder implements Appendable, CharSequence {
    /**
     * The value is used for character storage.
     */
    char[] value;

    /**
     * The count is the number of characters used.
     */
    int count;

Stringbuilder’s efficiency is precisely because of the spaces that it leaves in reserve in its vector, this space allows that in most cases more content is placed inside it without exceeding its capacity, when the size of the vector becomes not sufficient it increases to be able to hold this new value. According to the code:

Abstractstringbuilder.java

private void ensureCapacityInternal(int minimumCapacity) {
    // overflow-conscious code
    if (minimumCapacity - value.length > 0)
        expandCapacity(minimumCapacity);
}

void expandCapacity(int minimumCapacity) {
    int newCapacity = value.length * 2 + 2;
    if (newCapacity - minimumCapacity < 0)
        newCapacity = minimumCapacity;
    if (newCapacity < 0) {
        if (minimumCapacity < 0) // overflow
            throw new OutOfMemoryError();
        newCapacity = Integer.MAX_VALUE;
    }
    value = Arrays.copyOf(value, newCapacity);
}

we see that the size of the new vector is 2*(tamanho do vetor anterior)+2, If still this size is not enough it sets its new size as the minimum needed to store the new value. Then the content of the old vector is copied to the new vector.

Answering your questions:

I would like to know what design patterns or other "nerdy" things have been applied to this class

Design patterns usually refer to the class structure, i.e., there is nothing related to the fact that Stringbuilder expands as needed. Any project pattern application at Stringbuilder and superclasses are not relevant to the subject at hand

Why String Creates New Objects in a Concatenation?

Because the String is immutable, so its purpose is not to expand with each concatenation, so there is Stringbuilder.

There are some benefits to being immutable by looking at this question Why do we need immutable class? I transcribe some here an advantage in being immutable:

  • Thread-safe: Immutable objects can be safely shared across multiple threads.

  • Simplicity: each class has only one state.

  • Robustness: if you pass an immutable object to a method you do not run the risk that this method changes the value of your object. For just as you pass primitives to a method and know that these primitives cannot have their value changed the String behaves in the same way, so do the Wrappers classes of the primitives: Integer, Double, Float, Boolean, etc.

16

The hierarchy

The classes java.lang.StringBuffer and java.lang.StringBuilder extend the abstract class java.lang.AbstractStringBuilder and have exactly the same interface.

Analyzing the code of the abstract class AbstractStringBuilder, we see that it has two main attributes:

char value[];
int count;

There are stored the characters and the actual size of the same.

About the buffer

Everyone knows that String is a type that stores a character set. In Java, a String is immutable, that is, it cannot have modified content, as well as its size.

The problem with this is that to create new Strings, for example, by concatenating two or more Strings, a new String as their total size must be created in memory. If several of these operations are executed in sequence, the JVM will need to allocate new memory blocks and run Garbage Collector to de-locate what is not used at all times. This is very "costly" in terms of performance.

So the StringBuffer arises to solve the problem. A StringBuffer is nothing more than a String with a Buffer, that is, a space reserved for new characters that can be modified and makes it unnecessary, to some extent, to allocate more memory at all times.

So, if someone wants to add 100 Strings, each with approximately 10 characters, instead of making 99 String concatenations, just create a Stringbuffer with a 1000-character buffer and add all the Strings there, without overhead allocation and collection.

The initial size

In many situations it is important to set the initial buffer size to an average buffer size.

In doing new StringBuffer() or new StringBuilder(), that is, without setting an initial size, we are subusing the class. The initial buffer capacity is only 16 characters:

public StringBuilder() {
    super(16);
}

This means that if we add more than 16 characters, a new buffer will have to be allocated.

Increasing the size

But it’s not the end of the world. The expansion system is somewhat "clever".

If you overflow buffer capacity, how much will it allocate to more? Will it at least double!

See the calculation of the new buffer capacity in the method expandCapacity class AbstractStringBuilder:

void expandCapacity(int minimumCapacity) {
    int newCapacity = (value.length + 1) * 2;
    if (newCapacity < 0) {
        newCapacity = Integer.MAX_VALUE;
    } else if (minimumCapacity > newCapacity) {
        newCapacity = minimumCapacity;
    }
    value = Arrays.copyOf(value, newCapacity);
}

StringBuffer or StringBuilder?

Ohe o Javadoc (links are at the beginning of the answer) and note that both have exactly the same interface. What’s the difference?

We say that the class StringBuffer is synchronized, while the class StringBuilder is not synchronized.

This means that the former can be used concurrently by multiple threads, but will also have reduced performance by synchronization and locks, while the latter must be reserved for one-thread exclusive use, while maximizing performance.


For more details, see my another answer here in the OS.

  • 1

    The three replies @Math, bigown and utluiz, helped me understand in more detail. And I hope you help others.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.