How to "clone" an InputStream?


Asked 3 years, 4 months ago

Viewed 213 times

5

I often need to read an InputStream more than once, for example to pass the stream’s content to multiple methods.

void readStream(InputStream input) throws Exception {
    var result1 = doSomething(input);
    var result2 = doSomethingElse(input);
    // Do something with the results
}

The downstream calls are not under my control. They consume and close input, so I can’t simply mark and reset the original stream.

Opening a new InputStream is also not convenient in my case, because input is connected to a slow and limited source.

Is there any way to "clone" input so that I can pass different streams to the downstream methods?

That is, I would like to do something like the code below (which doesn’t work):

void readStream(InputStream input) throws Exception {
    var clonedInput = ((InputStream) input.clone());
    var result1 = doSomething(input);
    var result2 = doSomethingElse(clonedInput);
    // Do something with the results
}

Is it possible to do something like this? If not, what are the best alternatives?

Attribution: Adapted from the question "How to Clone an InputStream?" asked by user Renato Dinhani on Stack Overflow.

1

+1 Very cool! Just one question. I even thought about asking on Meta, but since you have done it: is posting a question/answer pair welcome? I have a case where I could not find an answer on a specific subject here on Sopt, and I had to go through the documentation and Soen to work out my solution. In that case, would it be good for me to post a question/answer pair?

– Cmte Cardeal

2021/02/17 at 22:12
I don’t know (yet) whether there is anything on Meta about this...

– Cmte Cardeal

2021/02/17 at 22:13
3

@Cmtecardeal Yes, answering your own question is perfectly normal and within the rules - I have done that myself

– hkotsubo

2021/02/17 at 22:32

2 answers

7

Since it was said that the methods doSomething and doSomethingElse consume the whole stream, I understand that both need all of its content.

Therefore, one alternative is - as stated in another answer - to read the entire content of the InputStream and store it in a byte array. The other answer gives a solution for Java 9 (with transferTo), but in earlier versions you would have to read the content manually and write it to a ByteArrayOutputStream:

void readStream(InputStream input) throws Exception {
    // copy the whole stream into memory
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] buffer = new byte[1024];
    int len;
    while ((len = input.read(buffer)) != -1) { // read returns -1 at end of stream
        baos.write(buffer, 0, len);
    }
    baos.flush();

    // each method receives its own stream over the same data
    var result1 = doSomething(new ByteArrayInputStream(baos.toByteArray()));
    var result2 = doSomethingElse(new ByteArrayInputStream(baos.toByteArray()));
    // etc...
}

This gives me a byte array (in the case above, the array returned by baos.toByteArray()), which I can then pass to the methods wrapped in a ByteArrayInputStream. So every call receives a new stream, always containing the same data.

One possible improvement - if you already know, or can estimate, the approximate size of the data - is to create the ByteArrayOutputStream with a suitable initial size (something like new ByteArrayOutputStream(tamanhoDosDados), where tamanhoDosDados is the data size). This avoids the reallocations made during writing: the documentation says the default initial capacity is 32 (that is, only 32 bytes), so passing a size closer to the real one minimizes the number of times the internal array is reallocated.
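A minimal sketch of the pre-sizing idea, assuming the caller can supply a rough size estimate. The class name BufferedCopies and the parameter expectedSize are illustrative and not part of the original answer; the array is captured once, so every new stream shares the same backing bytes instead of copying them again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the original answer.
class BufferedCopies {
    private final byte[] data;

    // expectedSize is only a hint (an assumption of this sketch);
    // the internal buffer still grows if the estimate is too small.
    BufferedCopies(InputStream input, int expectedSize) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(Math.max(expectedSize, 32));
        byte[] buffer = new byte[1024];
        int len;
        while ((len = input.read(buffer)) != -1) {
            baos.write(buffer, 0, len);
        }
        this.data = baos.toByteArray(); // copied out of the buffer only once
    }

    // Each call returns an independent stream over the same bytes.
    InputStream newStream() {
        return new ByteArrayInputStream(data);
    }

    int size() {
        return data.length;
    }
}
```

With this, each downstream call would receive copies.newStream(), and closing one stream has no effect on the others.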

Remember that this only applies if you have enough memory to store all the data from the stream.

But if there is not enough memory, you need to read the stream more than once, and you want to prevent it from being closed (and do not want to reopen it), you could resort to a hack: create a stream that, when closed, is actually reset. Something like this:

// bad solution, see the considerations further below
public class ResetOnCloseInputStream extends InputStream {
    private final InputStream in;
    public ResetOnCloseInputStream(InputStream in) {
        this.in = in;
        if (in.markSupported()) {
            // mark the starting position so that reset() can return to it
            in.mark(Integer.MAX_VALUE);
        }
    }

    @Override
    public int read() throws IOException {
        return in.read();
    }

    // hack: closing actually resets the stream to the marked position
    @Override
    public void close() throws IOException {
        if (in.markSupported()) {
            in.reset();
        }
    }
}

...
ResetOnCloseInputStream resetIn = new ResetOnCloseInputStream(input);
var result1 = doSomething(resetIn);
var result2 = doSomethingElse(resetIn);

But there are some problems. Besides being a hack that (in my opinion) distorts the purpose of the close method, violating both its semantics and the Principle of Least Astonishment, not every stream supports the mark and reset operations. So this approach is not guaranteed to work, and I leave it here only as a curiosity.

If there is not enough memory to hold the byte array, another option is to write the contents of input to a temporary file and, at each call of the methods that consume the data, create a new stream that reads from this file.

From Java 7 on, you can use the java.nio.file package:

InputStream input = // original input

// save the input data to a temporary file
Path tempFile = Files.createTempFile("prefix", "suffix"); // prefix and suffix may be null (it makes no difference, the file is temporary and Java generates a "unique" name anyway...)
Files.copy(input, tempFile, StandardCopyOption.REPLACE_EXISTING);

// use the temporary file (create a new stream for each method)
doSomething(Files.newInputStream(tempFile, StandardOpenOption.READ));
doSomethingElse(Files.newInputStream(tempFile, StandardOpenOption.READ));

// optional: delete the file once processing is finished
Files.delete(tempFile);
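A possible variation on the temporary-file approach, assuming Java 7 or later: wrapping the work in try/finally so the file is removed even if one of the consumers throws. The names TempFileReplay, StreamConsumer and replay are hypothetical and only serve this sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the original answer.
class TempFileReplay {
    // Illustrative callback type for "a method that consumes a stream".
    interface StreamConsumer {
        void accept(InputStream in) throws IOException;
    }

    // Spools 'input' to a temporary file once, then hands every consumer
    // a fresh stream that starts at the beginning of that file.
    static void replay(InputStream input, StreamConsumer... consumers) throws IOException {
        Path tempFile = Files.createTempFile("replay", ".tmp");
        try {
            Files.copy(input, tempFile, StandardCopyOption.REPLACE_EXISTING);
            for (StreamConsumer consumer : consumers) {
                // try-with-resources closes the stream even if the consumer does not
                try (InputStream in = Files.newInputStream(tempFile)) {
                    consumer.accept(in);
                }
            }
        } finally {
            Files.deleteIfExists(tempFile); // cleanup runs even when a consumer fails
        }
    }
}
```

Only the copy buffer lives in memory here, so the cost is disk I/O rather than heap space.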

For Java < 7, use plain java.io:

// copy the input data to a temporary file
File tempFile = File.createTempFile("prefix", "suffix");
OutputStream out = new FileOutputStream(tempFile);
try {
    byte[] buffer = new byte[1024];
    int read;
    while ((read = input.read(buffer)) != -1) {
        out.write(buffer, 0, read);
    }
} catch (IOException e) {
    // handle errors, etc.
} finally {
    out.close(); // try-with-resources is only available from Java 7 on
}

// use the temporary file (create a new stream for each method)
doSomething(new FileInputStream(tempFile));
doSomethingElse(new FileInputStream(tempFile));

// optional: delete the file once processing is finished
tempFile.delete();

Finally, if there is no way to keep the data in memory or to create the temporary file, the only option left is what you seem to want to avoid: reopen the stream and read it again.

There is also the option of using an external library, as already suggested in the other answer, but it has some caveats:

About Apache Commons

Since it was implied that the methods that process the stream read all of its data, CloseShieldInputStream - at least in the tests I ran - does not seem like a good solution:

InputStream input = // the original input

CloseShieldInputStream closeShieldForInput = new CloseShieldInputStream(input);
var result1 = doSomething(closeShieldForInput); // read all the data from input
var result2 = doSomethingElse(closeShieldForInput); // read nothing

In my tests, I wrote the methods doSomething and doSomethingElse so that they read all the data from the stream and close it at the end (basically, "while (read) etc and close"). As a result, only doSomething read the data from input, and doSomethingElse could not read anything.

This happens because all CloseShieldInputStream does is prevent the InputStream it wraps from being closed. If we look at the source code, we see that the close method simply replaces the wrapped InputStream with a ClosedInputStream (and that class, in turn, has a read method that always returns -1 - that is, in practice it behaves like an "empty" stream, with no data).

That’s why, after a CloseShieldInputStream is closed, the original input is not closed, but any attempt to read from the shield afterwards returns no data. So I understand that this does not suit your case. The documentation says the following:

This class is typically used in cases where an input stream needs to be passed to a component that wants to explicitly close the stream even if more input would still be available to other components.

That is, it is useful when the stream gets closed before all its data has been read, but you do not want the underlying stream to be closed and you want to continue reading from where you left off.
For example, I created a file containing the text abc and ran this test:

InputStream input = new FileInputStream("file.txt"); // file containing "abc"

CloseShieldInputStream closeShieldForInput = new CloseShieldInputStream(input);
System.out.println(closeShieldForInput.read()); // 97 <- read the letter "a"
System.out.println(closeShieldForInput.read()); // 98 <- read the letter "b"

// does not close the original input, but sets the internal stream to a ClosedInputStream
closeShieldForInput.close();

System.out.println(closeShieldForInput.read()); // -1 <- the read is performed on the ClosedInputStream
System.out.println(closeShieldForInput.read()); // -1 <- the read is performed on the ClosedInputStream
// reading from the original input continues from where it stopped (since it was not closed)
System.out.println(input.read()); // 99 <- read the letter "c"

Note that after the CloseShieldInputStream is closed, calls to read return -1. But if I read directly from the original input, it is still open and the data is read correctly, from where it had left off.

So this is not a good solution if you need to consume the whole stream several times. You could try resetting the stream before reusing it, but that runs into the problem already mentioned above (not all streams support reset). Besides, after the first reset you would have to use the original input instead of the CloseShieldInputStream (or else create another one).
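The close-shield behavior can be imitated with java.io alone, which may help to see why a second full read returns nothing. This is a simplified sketch, not the Apache Commons source; SimpleCloseShield is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Simplified imitation of the behavior described above; NOT the library's code.
class SimpleCloseShield extends FilterInputStream {
    SimpleCloseShield(InputStream in) {
        super(in);
    }

    @Override
    public void close() throws IOException {
        // Leave the wrapped stream open and swap in an empty stream,
        // so any later read through this wrapper returns -1.
        in = new ByteArrayInputStream(new byte[0]);
    }
}
```

After close(), the wrapper is permanently "empty", while the wrapped stream keeps its position and can still be read directly.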

As for TeeInputStream, it actually works, in conjunction with PipedInputStream and PipedOutputStream:

InputStream input = // the original input

PipedInputStream in = new PipedInputStream();
TeeInputStream tee = new TeeInputStream(input, new PipedOutputStream(in));
var result1 = doSomething(tee);
var result2 = doSomethingElse(in);

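But remember that a copy of the data is being made internally: when you read from the TeeInputStream, it sends the data to the PipedOutputStream, which in turn sends it to the PipedInputStream, which keeps a copy (that is, if memory is limited, this solution would not apply either). Note also that the PipedInputStream buffer is bounded (1024 bytes by default) and its documentation warns that using both ends of a pipe from a single thread may deadlock, so for larger inputs the two consumers should run in different threads.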
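Because a PipedInputStream has a bounded buffer and the Javadoc advises against using both ends of a pipe from one thread, one way to arrange the tee is to run the second consumer in its own thread. The sketch below uses only the JDK: CopyingInputStream is a hypothetical stand-in for TeeInputStream, and countBytes is an illustrative consumer that reads everything and closes the stream.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PipeTeeSketch {
    // Hypothetical stand-in for TeeInputStream: copies every byte read to 'branch'.
    static class CopyingInputStream extends FilterInputStream {
        private final OutputStream branch;

        CopyingInputStream(InputStream in, OutputStream branch) {
            super(in);
            this.branch = branch;
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b != -1) branch.write(b);
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) branch.write(buf, off, n);
            return n;
        }

        @Override
        public void close() throws IOException {
            try {
                super.close();
            } finally {
                branch.close(); // signals end of stream to the pipe's reader
            }
        }
    }

    // Illustrative consumer: reads everything, then closes the stream.
    static int countBytes(InputStream in) throws IOException {
        try (InputStream s = in) {
            int total = 0;
            byte[] buf = new byte[256];
            int n;
            while ((n = s.read(buf)) != -1) total += n;
            return total;
        }
    }

    static int[] consumeTwice(InputStream input) throws Exception {
        PipedInputStream pipeIn = new PipedInputStream();
        InputStream tee = new CopyingInputStream(input, new PipedOutputStream(pipeIn));
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // The pipe's reader runs in another thread, so a full pipe
            // buffer cannot block the writer forever.
            Callable<Integer> reader = () -> countBytes(pipeIn);
            Future<Integer> second = pool.submit(reader);
            int first = countBytes(tee);
            return new int[] { first, second.get() };
        } finally {
            pool.shutdown();
        }
    }
}
```

The data is still read from the source only once, but both consumers must be able to run concurrently for this arrangement to work.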

And there is another limitation: if you want to read the data again (for the third time), it will not be possible:

InputStream input = // the original input

PipedInputStream in = new PipedInputStream();
TeeInputStream tee = new TeeInputStream(input, new PipedOutputStream(in));
var result1 = doSomething(tee);
var result2 = doSomethingElse(in);

var result3 = doAnotherThing(tee); // error: tee is closed
var result4 = doOneMoreThing(in); // error: in is closed
var result5 = doJustOneMoreThing(input); // error: input is closed

This is because the TeeInputStream, the PipedInputStream, and the original input will all be closed.

Remember that this does not happen with the ByteArrayInputStream and temporary file solutions: with them, we can create as many streams as necessary. So those remain - in my opinion - the best options: if there is enough memory to hold the byte array, ByteArrayInputStream is preferable; otherwise, prefer creating the temporary file.

1

Excellent as always, hkotsubo. You and Brenno brought everything that was in the original Soen question and a little more. I will mark an answer as accepted in a few days. I will leave it open a little longer to draw attention and give anyone else a chance to answer.

– Anthony Accioly

2021/02/18 at 13:59
1

@Anthonyaccioly I added one more option (which I noticed was not in the original Soen question). It’s no better than keeping the data in a byte array + ByteArrayInputStream, but I think it’s a good "middle ground" in case memory is limited

– hkotsubo

2021/02/19 at 13:04


by Brenno Serrato • **136** points · Answer 1 · 2021-02-18T02:20:14+00:00

Java 9

In this situation, where you do not want to close the input, it is possible to use InputStream.transferTo and keep the data in a ByteArrayOutputStream.

This lets you read the data once and store it in another variable, creating an intermediary for the bytes. It is important to remember that this approach requires enough memory to hold all the data (and its copies).

An example:

ByteArrayOutputStream foo = new ByteArrayOutputStream();
input.transferTo(foo);
InputStream barOne = new ByteArrayInputStream(foo.toByteArray());
InputStream barTwo = new ByteArrayInputStream(foo.toByteArray());
InputStream barThree = new ByteArrayInputStream(foo.toByteArray());
// As many as your available memory allows
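As a related sketch, also assuming Java 9 or later: InputStream.readAllBytes() returns the content directly as a byte[], which avoids the intermediate ByteArrayOutputStream and the extra array copy made by each toByteArray() call. The names ReadOnce and replayable are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;

// Hypothetical helper, not part of the original answer.
class ReadOnce {
    // Reads 'input' fully and returns a factory of independent
    // streams that all share the same byte array.
    static Supplier<InputStream> replayable(InputStream input) throws IOException {
        byte[] data = input.readAllBytes(); // available since Java 9
        return () -> new ByteArrayInputStream(data);
    }
}
```

Each call to get() on the returned supplier produces a new stream positioned at the start of the data; the memory requirement is the same as in the example above.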

Apache Commons

A similar solution can also be achieved using the TeeInputStream from Apache Commons together with PipedInputStream and PipedOutputStream.

The TeeInputStream serves as a proxy for reading the InputStream, and the PipedInputStream is normally read in a different thread from the one writing to the pipe.

void readStream(InputStream input) throws Exception {
    // Create the PipedInputStream that will receive a copy of the input
    PipedInputStream pipInput = new PipedInputStream();
    // Create the TeeInputStream: everything read from input is also written to the pipe
    TeeInputStream teeInput = new TeeInputStream(input, new PipedOutputStream(pipInput));

    // Now you have teeInput and pipInput to work with your inputStream

}

As a further countermeasure, you can use CloseShieldInputStream to prevent the downstream methods from closing the original inputStream, as in the example:

void readStream(InputStream input) throws Exception {
    CloseShieldInputStream closeShieldForInput = new CloseShieldInputStream(input);
    var result1 = doSomething(closeShieldForInput);
    var result2 = doSomethingElse(closeShieldForInput);
    // In the end, your inputStream can still be read
}