How to merge multiple text files into one?

Viewed 2,855 times

9

Does anyone know how to select all text files from the same directory and join the information of all of them in just one final text file?

Example: In folder X, I have files 1.txt, 2.txt and 3.txt. I need to merge the contents of all into one text file.

I tried this code, which compiles, but when it runs it throws an IndexOutOfRangeException.

string[] stringArray = Directory.GetFiles(@"C:\InventX", "*.txt");
System.Text.StringBuilder stringBuilder = new System.Text.StringBuilder();
for (int i = 0; i <= stringArray.Count(); i++)
{
    stringBuilder.Append(System.IO.File.ReadAllText(stringArray[i]));
}
string bulidOutput = stringBuilder.ToString();
string newFilePath = @"C:\Lala.txt";
System.IO.File.WriteAllText(newFilePath, bulidOutput);

4 answers

10

The error in your code is due to this condition:

for (int i = 0; i <= stringArray.Count(); i++)

should be

for (int i = 0; i < stringArray.Count(); i++)

As written, the last iteration runs with i == stringArray.Count(). Since arrays are zero-indexed, the last valid index is Count() - 1, so accessing stringArray[i] throws an IndexOutOfRangeException.
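For reference, here is a minimal sketch of the asker's snippet with the corrected bound. The class name SimpleMerge, the method name MergeAll and its parameters are hypothetical, chosen only for illustration; the sort is an addition, since Directory.GetFiles does not guarantee the order of the returned names.

```csharp
using System;
using System.IO;
using System.Text;

public static class SimpleMerge // hypothetical name, for illustration only
{
    // Reads every file matching the pattern and writes the concatenation to outputPath.
    // Assumes outputPath is not itself one of the matched files.
    public static void MergeAll(string sourceDirectory, string pattern, string outputPath)
    {
        string[] files = Directory.GetFiles(sourceDirectory, pattern);

        // GetFiles does not guarantee an order, so sort for a deterministic result.
        Array.Sort(files, StringComparer.Ordinal);

        var builder = new StringBuilder();

        // i < files.Length: the last valid index is files.Length - 1.
        for (int i = 0; i < files.Length; i++)
        {
            builder.Append(File.ReadAllText(files[i]));
        }

        File.WriteAllText(outputPath, builder.ToString());
    }
}
```

A call such as SimpleMerge.MergeAll(@"C:\InventX", "*.txt", @"C:\Lala.txt") would reproduce the scenario from the question. A foreach loop over the array would avoid the index altogether.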

In addition, an efficient way to join the files is to read them in chunks and write each chunk as soon as it is read. You can change the buffer size and compare the performance gains or losses to see which value best fits your scenario.

public void UnirFicheiros(string directorio, string filtro, string ficheiroUnido)
{
    // Fail early if the source directory does not exist.
    if (!Directory.Exists(directorio))
        throw new DirectoryNotFoundException();

    const int bufferSize = 1 * 1024;
    using (var outputFile = File.Create(Path.Combine(directorio, ficheiroUnido)))
    {
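        foreach (string file in Directory.GetFiles(directorio, filtro))
        {
            // Skip the output file itself in case it matches the filter;
            // it is already open for writing and cannot be read.
            if (string.Equals(file, Path.Combine(directorio, ficheiroUnido), System.StringComparison.OrdinalIgnoreCase))
                continue;

            using (var inputFile = File.OpenRead(file))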
            {
                var buffer = new byte[bufferSize];
                int bytesRead;
                while ((bytesRead = inputFile.Read(buffer, 0, buffer.Length)) > 0)
                {
                    outputFile.Write(buffer, 0, bytesRead);
                }
            }
        }
    }
}

9

Here’s a simple example:

static void Main(string[] args)
{
    string diretorio = @"C:\teste";

    String[] listaDeArquivos = Directory.GetFiles(diretorio);

    if (listaDeArquivos.Length > 0)
    {
        string caminhoArquivoDestino = @"C:\teste\saida.txt";

        FileStream arquivoDestino = File.Open(caminhoArquivoDestino, FileMode.OpenOrCreate);
        arquivoDestino.Close();

        List<String> linhasDestino = new List<string>();

        foreach (String caminhoArquivo in listaDeArquivos)
        {
            linhasDestino.AddRange(File.ReadAllLines(caminhoArquivo));
        }

        File.WriteAllLines(caminhoArquivoDestino, linhasDestino.ToArray());
    }

}

Experiment with these methods and adapt them to your needs.

8

Since the approaches shown so far do not seem ideal, I decided to write a compilable example that solves the problem in a generic way.

using System;
using System.IO;
using Util.IO;

public class MergeFiles {
    public static void Main(string[] args) {
        int bufferSize;
        FileUtil.MergeTextFiles(args[0], args[1], args[2], (int.TryParse(args[3], out bufferSize) ? bufferSize : 0));
    }
}

namespace Util.IO {
    public static class FileUtil {
        public static void MergeTextFiles(string targetFileName, string sourcePath, string searchPattern = "*.*", int bufferSize = 0) {
            if (string.IsNullOrEmpty(sourcePath)) {
                sourcePath = Directory.GetCurrentDirectory();
            }
            if (sourcePath.IndexOfAny(System.IO.Path.GetInvalidPathChars()) != -1) {
                throw new ArgumentException("The specified source directory contains invalid characters", "sourcePath");
            }
            if (string.IsNullOrEmpty(targetFileName)) {
                throw new ArgumentException("The target file name must be specified", "targetFileName");
            }
            if (targetFileName.IndexOfAny(System.IO.Path.GetInvalidFileNameChars()) != -1) {
                throw new ArgumentException("The target file name contains invalid characters", "targetFileName");
            }
            var targetFullFileName = Path.Combine(sourcePath, targetFileName);
            if (bufferSize == 0) {
                File.Delete(targetFullFileName);
                foreach (var file in Directory.GetFiles(sourcePath, searchPattern)) {
                    if (file != targetFullFileName) {
                        File.AppendAllText(targetFullFileName, File.ReadAllText(file));
                    }
                }
            } else {
                using (var targetFile = File.Create(targetFullFileName, bufferSize)) {
                    foreach (var file in Directory.GetFiles(sourcePath, searchPattern)) {
                        if (file != targetFullFileName) {
                            using (var sourceFile = File.OpenRead(file))    {
                                var buffer = new byte[bufferSize];
                                int bytesRead;
                                while ((bytesRead = sourceFile.Read(buffer, 0, buffer.Length)) > 0) {
                                    targetFile.Write(buffer, 0, bytesRead);
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

I put it on GitHub for future reference.

In newer versions of the language, this code can be shortened.

The Main() method is there only to make a quick test easier; it is not production-ready. The MergeTextFiles() method is quite reasonable to use. It's not 100%: I didn't write unit tests for it, I didn't document it, and I didn't think of every possible situation, but it's a good start.

You can choose a buffer size if you want finer control over how the copy is done. If you think you will never need this, you can remove that part of the method. It does no harm to leave it in, though, since the default is to copy each file in full, following the current .NET implementation.

Possible improvements

Some improvements can be made to make it more generic or to add features. You could, for example, add a last parameter extraNewLineOptions extraNewLineOption = extraNewLineOptions.NoExtraNewLine and an enumeration enum extraNewLineOptions { NoExtraNewLine, SelectiveExtraNewLine, AlwaysExtraNewLine }, to allow an extra line break to be placed at the end of each file so that the text of consecutive files does not run together.

This may be useful, but in most cases it is not necessary, so it would be disabled by default. I leave the implementation to each reader's creativity, especially SelectiveExtraNewLine, which would add a line break only if the file does not already end with one and is not so trivial to implement. It is also possible to create an overload to make the parameters easier to use.
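As one possible reading of the suggestion above, here is a minimal text-based sketch of the three options. It is not the answer's code: the class NewLineMerge is hypothetical, the enumeration name is capitalised here, the whole file is read into memory, only "\n" is treated as a trailing line break, and the path comparison assumes a case-insensitive file system such as Windows.

```csharp
using System;
using System.IO;

public enum ExtraNewLineOptions { NoExtraNewLine, SelectiveExtraNewLine, AlwaysExtraNewLine }

public static class NewLineMerge // hypothetical helper, not part of the answer's code
{
    public static void MergeTextFiles(string targetFullFileName, string sourcePath, string searchPattern,
        ExtraNewLineOptions extraNewLineOption = ExtraNewLineOptions.NoExtraNewLine)
    {
        string targetFullPath = Path.GetFullPath(targetFullFileName);
        string[] files = Directory.GetFiles(sourcePath, searchPattern);
        Array.Sort(files, StringComparer.Ordinal); // GetFiles does not guarantee an order

        using (var writer = new StreamWriter(targetFullPath, false))
        {
            foreach (string file in files)
            {
                // Never read the file that is being written
                // (assumption: case-insensitive paths, as on Windows).
                if (string.Equals(Path.GetFullPath(file), targetFullPath, StringComparison.OrdinalIgnoreCase))
                    continue;

                string text = File.ReadAllText(file);
                writer.Write(text);

                // Selective: add a break only when a non-empty file lacks a trailing "\n".
                bool missingBreak = text.Length > 0 && !text.EndsWith("\n", StringComparison.Ordinal);
                bool addBreak =
                    extraNewLineOption == ExtraNewLineOptions.AlwaysExtraNewLine ||
                    (extraNewLineOption == ExtraNewLineOptions.SelectiveExtraNewLine && missingBreak);

                if (addBreak)
                    writer.Write(Environment.NewLine);
            }
        }
    }
}
```

A buffered, byte-level version would instead have to remember the last byte read from each file, which is part of why the selective case is less trivial than it looks.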

Another improvement is to allow the copy to be done asynchronously. Very useful if you have large volumes of files.
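A rough sketch of what an asynchronous variant might look like, assuming .NET 4.5 or later (for async/await and Stream.CopyToAsync). The class AsyncMerge, the method MergeFilesAsync and the default buffer size are hypothetical and not part of the answer's code; the path comparison again assumes a case-insensitive file system.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class AsyncMerge // hypothetical helper, for illustration only
{
    // Copies every matching file into the target, one after another, without blocking the caller.
    public static async Task MergeFilesAsync(string targetFullFileName, string sourcePath,
        string searchPattern, int bufferSize = 81920)
    {
        string targetFullPath = Path.GetFullPath(targetFullFileName);
        string[] files = Directory.GetFiles(sourcePath, searchPattern);
        Array.Sort(files, StringComparer.Ordinal); // GetFiles does not guarantee an order

        // useAsync: true asks the OS for asynchronous I/O where it is supported.
        using (var target = new FileStream(targetFullPath, FileMode.Create, FileAccess.Write,
            FileShare.None, bufferSize, useAsync: true))
        {
            foreach (string file in files)
            {
                // Skip the target itself if it matches the pattern.
                if (string.Equals(Path.GetFullPath(file), targetFullPath, StringComparison.OrdinalIgnoreCase))
                    continue;

                using (var source = new FileStream(file, FileMode.Open, FileAccess.Read,
                    FileShare.Read, bufferSize, useAsync: true))
                {
                    await source.CopyToAsync(target, bufferSize).ConfigureAwait(false);
                }
            }
        }
    }
}
```

The files are still copied sequentially so that their order in the output is preserved; the gain is that the calling thread is free while the I/O is in progress.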

The method could also be broken into smaller pieces.

Depending on the version of .NET

I used only features that run on virtually any version of .NET. If the code is guaranteed to run only on newer versions, the parameter checks can be replaced with Contract.Requires(). It is even possible to remove them altogether, since the methods being called perform the same checks. Of course, you would then lose precise information about where the error originated.

Unfortunately, there is no public method to check the validity of the wildcard pattern in advance. If necessary, it can be checked the way it is implemented in the .NET sources (and possibly in the Mono sources as well, and in .NET Core).

If you have C# 6 (through Roslyn), some improvements can be made.

You could add using static Util.IO.FileUtil; and then call the method directly: MergeTextFiles("combo.txt", ".", "*.txt").

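In addition, the declaration int bufferSize; in the Main() method can be made inline in the TryParse() call: int.TryParse(args[3], out var bufferSize). Note that this syntax shipped as out variables in C# 7, not in C# 6. The inline form for the loop, while ((int bytesRead = sourceFile.Read(buffer, 0, buffer.Length)) > 0) {, relied on the declaration expressions proposal, which was dropped before C# 6 was released, so int bytesRead; still has to be declared before the while.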

See the C# 6 example on ideone and on .NET Fiddle. I also put it on GitHub for future reference.

4

With StreamWriter

String[] arquivos = Directory.GetFiles(@".\Txts", "*.txt");
StreamWriter strWriter = new StreamWriter(".\\Final.txt");
foreach (String arquivo in arquivos)
{
    strWriter.WriteLine(File.ReadAllText(arquivo));
}
strWriter.Flush();
strWriter.Dispose();
