How to check if the txt file has a blank space on the last line

Viewed 1,225 times

0

I have a script that finds all the .txt files in a folder and then joins them into a single file.

The problem is that some files have a "\n" on the last line, so the next line does not end up directly below the previous one, which causes errors when I import the final txt.

Would it be possible to check whether the last line of a file has a "\n" and, if so, delete it, and if not, add a "\n"?

My records are in that format:

00000011098720150131379000100011
00000021098720150131379000400011
00000021098720150131379000400011

Here is the code:

import os
import glob

found = False
source_folder = None

while not found:
  source_folder = str(input("Adicione o diretório com os arquivos."))
  print(source_folder)
  if not os.path.isdir(source_folder):
    print(source_folder, 'A pasta não foi encontrada.')
  else:
    print("Pasta encontrada! ")
    found = True

os.chdir(source_folder)

read_files = glob.glob("*.txt")
print(read_files)

arq = str(input("Adicione o nome do arquivo: "))

with open(arq, "wb") as outfile:
  for f in read_files:
      with open(f, "rb") as infile:
          outfile.write(infile.read())

2 answers

0

If the intention is to work with text files, there is no reason to open the input and output files in binary mode.
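
As a small illustration of the difference, here is a sketch that uses a throwaway file created only for the demonstration (the file name and its contents are hypothetical). In text mode read() returns a str, in binary mode it returns bytes, and the two cannot be concatenated:

```python
import os
import tempfile

# Create a throwaway file for the demonstration (hypothetical name and content).
path = os.path.join(tempfile.mkdtemp(), "exemplo.txt")
with open(path, "w") as f:
    f.write("00000011098720150131379000100011\n")

with open(path, "r") as f:
    as_text = f.read()       # text mode: str
with open(path, "rb") as f:
    as_bytes = f.read()      # binary mode: bytes

print(type(as_text).__name__, type(as_bytes).__name__)

# Mixing the two fails: bytes + str raises TypeError.
try:
    as_bytes.strip() + "\n"
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)
```

This is why string operations such as adding "\n" only work directly when the files are opened as text.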

Here is a tested solution that concatenates all files with the .txt extension in a given directory into a single file, ignoring blank lines:

import os
import glob

source_folder = input("Entre com o diretorio de origem: ")

try:
    os.chdir(source_folder)
except FileNotFoundError:
    print( "Diretorio nao encontrado: '%s'" % (source_folder) )
    exit(1)

read_files = glob.glob("*.txt")
print(read_files)

arq = str(input("Entre com o nome do arquivo de saida: "))

with open(arq, "w") as outfile:             # Open the output file for writing
    for f in read_files:                    # For each input file...
        with open(f, "r") as infile:        # Open the input file for reading
            for ln in infile:               # For each line of the input file...
                if ln.strip():              # Skip blank lines
                    # Normalize the ending so that a file without a final "\n"
                    # does not run into the first line of the next file
                    outfile.write(ln.rstrip("\n") + "\n")

arquivo1.txt

00000011098720150131379000101528
00000011098720150131379000101561
00000011098720150131379000101594
00000011098720150131379000101627

00000011098720150131379000101660
00000011098720150131379000101693
00000011098720150131379000101726
00000011098720150131379000101759

00000011098720150131379000101792
00000011098720150131379000101825
00000011098720150131379000101858

arquivo2.txt

00000011098720150131379000108227
00000011098720150131379000108260
00000011098720150131379000108293

00000011098720150131379000108326
00000011098720150131379000108359
00000011098720150131379000108392
00000011098720150131379000108425

Testing:

$ python3 teste.py 
Entre com o diretorio de origem: /tmp
['arquivo1.txt', 'arquivo2.txt']
Entre com o nome do arquivo de saida: saida.txt

saida.txt

00000011098720150131379000101528
00000011098720150131379000101561
00000011098720150131379000101594
00000011098720150131379000101627
00000011098720150131379000101660
00000011098720150131379000101693
00000011098720150131379000101726
00000011098720150131379000101759
00000011098720150131379000101792
00000011098720150131379000101825
00000011098720150131379000101858
00000011098720150131379000108227
00000011098720150131379000108260
00000011098720150131379000108293
00000011098720150131379000108326
00000011098720150131379000108359
00000011098720150131379000108392
00000011098720150131379000108425
  • Wow! The code got very clean this way. Thank you so much for your help @Lacobus.

0


A simple way, since you read each input file as a single string, is to use the strip method, which removes all whitespace characters from the beginning and the end of the string.

This removes the blank lines at the end, but also the \n after the last line, which is needed, so we add it back.

Well, all this to say that you just need to rewrite this line:

outfile.write(infile.read())

so that it looks like this:

outfile.write(infile.read().strip() + "\n")
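
To see what strip() followed by + "\n" does, here is a minimal sketch on two made-up file contents (the sample strings are hypothetical and stand in for what infile.read() would return in text mode). One ends with extra blank lines, the other has no final newline at all:

```python
# Hypothetical sample contents: one file ends with extra blank lines,
# the other has no final newline.
samples = [
    "0000001\n0000002\n\n\n",
    "0000003\n0000004",
]

# strip() drops leading/trailing whitespace; adding "\n" restores exactly one newline.
joined = "".join(text.strip() + "\n" for text in samples)
print(repr(joined))  # '0000001\n0000002\n0000003\n0000004\n'
```

Either way, every file contributes exactly one newline at its end, so the records of the next file start on their own line.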

If the blank lines were in the middle of the file instead of at the end, it would be necessary to iterate line by line and drop the blank ones. Thanks to Python's comprehensions, this can also be done in one line:

outfile.writelines(line for line in infile if line.strip())

That is all: the writelines method expects an iterable, which here is the generator expression between parentheses. The for of this expression, in turn, uses the input file as an iterator, taking it line by line, and the filter expression after the if discards blank lines: if a line has only spaces and the \n, strip turns it into an empty string, which has a false boolean value, so it is dropped from the generator.
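
The filtering can be tried without touching any real file. The sketch below uses io.StringIO objects as hypothetical in-memory stand-ins for infile and outfile, with made-up records:

```python
import io

# Hypothetical in-memory stand-ins for the input and output files.
infile = io.StringIO("0000001\n\n0000002\n   \n0000003\n")
outfile = io.StringIO()

# Lines that are empty after strip() are falsy and are filtered out.
outfile.writelines(line for line in infile if line.strip())

print(repr(outfile.getvalue()))  # '0000001\n0000002\n0000003\n'
```

Note that writelines does not add line endings: each kept line is written exactly as it was read, so a last line without "\n" stays without it.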

  • jsbueno, thank you so much for taking the time to answer me. Could you help me with one more thing? When I use "outfile.write(infile.read().strip() + "\n")", it just creates the file, but it is empty (in PyCharm it shows: "Expected type 'bytes', got 'str' instead"). When I use "outfile.writelines(line for line in infile if line.strip())" it does everything right, but when it moves on to the next file it doesn't leave the "\n". Could you tell me what I would have to do to fix it? Sorry if the solution is obvious, but I started studying Python earlier this week, so I'm pretty much lost.

  • All right! After some research, I managed to find the solution. The line ended up like this: "outfile.write(infile.read().strip() + b"\n")". Again, thank you so much for helping me! @jsbueno

  • I didn't even notice that you had opened the files with "b". "b" is for binary files and will make a LOT of things go wrong, because Python treats them completely differently. If you are reading text, you have to open the file as a text file.
