Selecting the right line of a TXT file and splitting it

3

My question is how, after splitting the lines into lists, I can append only the right line (the numero_linha parameter of the function).

I also want to get the whole requested line; right now the function returns line 0 repeated across the positions of the list instead.

File musica.txt:

Roda Viva
Chico Buarque

Tem dias que a gente se sente
Como quem partiu ou morreu
A gente estancou de repente
Ou foi o mundo então que cresceu
A gente quer ter voz ativa
No nosso destino mandar
Mas eis que chega a roda viva
E carrega o destino pra lá

Roda mundo, roda-gigante
Roda moinho, roda pião

O tempo rodou num instante
Nas voltas do meu coração
A gente vai contra a corrente
Até não poder resistir
Na volta do barco é que sente
O quanto deixou de cumprir
Faz tempo que a gente cultiva
A mais linda roseira que há
Mas eis que chega a roda viva
E carrega a roseira pra lá

Roda mundo, roda-gigante
Roda moinho, roda pião

# function to read the TXT file and extract values from it
def extrai_linha_txt(nome_arquivo: str, numero_linha: int):

    palavras_linha = []

    # open the file with 'with', using the 'nome_arquivo' parameter
    with open(file=nome_arquivo, mode='r', encoding='utf8') as fp:

        # extract the line from the file using the 'numero_linha' parameter
        linha = fp.readline()
        count = 1

        # break the line into words with split(' ')
        while count < numero_linha:

            linha = linha.rstrip('\n')
            linha_formatada = linha.split(sep=' ')
            palavras_linha.append(linha_formatada)

            count += 1

    return palavras_linha

# function call with the parameters for the selected line
linha10 = extrai_linha_txt(nome_arquivo='./musica.txt', numero_linha=10)
print(linha10) # should return ['Mas', 'eis', 'que', 'chega', 'a', 'roda', 'viva']

  • Could you [edit] your question and fix the indentation of your code?

  • Also, you call the function to process line 11 of your file, but you expect it to return line 10 (if you start counting from 1).

  • Thanks, @fernandosavio! I’ve edited the question and fixed the indentation. Yes, I had passed 11 to count from 0, but I’m using a counter that already starts at 1. Even so, it only returns [['Roda', 'Viva'], ['Roda', 'Viva'], ['Roda', 'Viva'], ['Roda', 'Viva'], ['Roda', 'Viva'], ['Roda', 'Viva'], ['Roda', 'Viva'], ['Roda', 'Viva'], ['Roda', 'Viva']]

3 answers

4

To select the correct line of the file, just iterate line by line while keeping track of which line you are on; when you reach the desired line, return the words in it.

You can get the line counter together with each line's content using the enumerate function:

def le_linha(nome_arquivo, num_linha):
    with open(nome_arquivo, mode="r", encoding="utf8") as file:
        for i, linha in enumerate(file, start=1):
            if i == num_linha:
                return linha.split()

    # if the line is not found, return an empty list
    return []


print(le_linha("musica.txt", 10))
# ['Mas', 'eis', 'que', 'chega', 'a', 'roda', 'viva']

print(le_linha("musica.txt", 11))
# ['E', 'carrega', 'o', 'destino', 'pra', 'lá']

print(le_linha("musica.txt", 999))
# []

Code running on Repl.it

Since I return the result as soon as I find the line, I do not read the rest of the file unnecessarily, and I do not load the entire file into memory, only the line being read.

Note that whenever you iterate over a file, it yields the file one line at a time (see the documentation), so it is memory efficient and fast. Each line still ends with its line-break character, which split() with no arguments discards along with any other surrounding whitespace.
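
To see what iterating over a file actually yields, here is a small sketch. It uses io.StringIO as a stand-in for an open text file (an assumption made only to keep the example self-contained); each line read keeps its trailing '\n', and split() with no arguments drops it:

```python
import io

# io.StringIO stands in for an open text file (assumption for illustration).
fake_file = io.StringIO("Roda Viva\nChico Buarque\n")

raw_lines = list(fake_file)                    # iterating yields one line at a time
words = [line.split() for line in raw_lines]   # split() also discards the '\n'

print(raw_lines)  # ['Roda Viva\n', 'Chico Buarque\n']
print(words)      # [['Roda', 'Viva'], ['Chico', 'Buarque']]
```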

  • Wow, @fernandosavio, I find it very interesting that there really are several ways to solve the same problem, each with a different cost. Thank you very much!

3


Here:

with open(file=nome_arquivo, mode='r', encoding='utf8') as fp:   
    linha = fp.readline()
    count = 1

You read only one line of the file (namely the first). Then, in the while count < numero_linha loop, you simply insert that same line (the first) into the list palavras_linha several times: the while increments count until it reaches numero_linha, so it loops several times, inserting the same line each time.
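
The behavior described above can be reproduced in isolation. This is only a sketch: io.StringIO stands in for musica.txt, and the value 4 for numero_linha is chosen arbitrarily for illustration:

```python
import io

# io.StringIO stands in for musica.txt (assumption, to keep the sketch self-contained).
fp = io.StringIO("Roda Viva\nChico Buarque\n\nTem dias que a gente se sente\n")

numero_linha = 4
palavras_linha = []

linha = fp.readline()   # read once: this is always the first line
count = 1
while count < numero_linha:
    # nothing in this loop reads from fp again, so 'linha' never changes
    palavras_linha.append(linha.rstrip('\n').split(' '))
    count += 1

print(palavras_linha)
# [['Roda', 'Viva'], ['Roda', 'Viva'], ['Roda', 'Viva']]
```

The loop runs numero_linha - 1 times, which is why the list holds that many copies of the first line.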

So actually, if you want the tenth line, skip the first nine lines, and only then read the next:

def extrai_linha_txt(nome_arquivo: str, numero_linha: int):
    try:
        with open(nome_arquivo, encoding='utf8') as fp:
            # skip lines until reaching the one I want (always one fewer than the desired number)
            for _ in range(numero_linha - 1):
                next(fp)
            return next(fp).rstrip('\n').split(' ')
    except StopIteration:
        # if the file has fewer lines, return an empty list (or show an error message, you decide)
        return []

linha10 = extrai_linha_txt(nome_arquivo='./musica.txt', numero_linha=10)
print(linha10) # ['Mas', 'eis', 'que', 'chega', 'a', 'roda', 'viva']

Since the default mode of open is read ('r'), I omitted it, but you can keep it if you prefer.

First I use a for loop to skip lines of the file (if I pass 10, it skips the first 9; note the numero_linha - 1). Since files are iterators, I can use next to get the next line. Then I remove the line break and do the split, returning the result right away.

I also added a try/except block to catch the StopIteration raised if you pass a number greater than the number of lines in the file. Note that the code does not validate whether the number passed is less than or equal to zero; in that case it always returns the first line.
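
If you would rather reject invalid line numbers, one possible sketch follows. The function name extrai_linha_validada, the error message, and the temporary file used for the demonstration are all hypothetical, not part of the original code:

```python
import os
import tempfile

def extrai_linha_validada(nome_arquivo: str, numero_linha: int):
    # hypothetical variant: reject line numbers below 1 instead of
    # silently returning the first line
    if numero_linha < 1:
        raise ValueError("numero_linha must be 1 or greater")
    try:
        with open(nome_arquivo, encoding='utf8') as fp:
            for _ in range(numero_linha - 1):
                next(fp)
            return next(fp).rstrip('\n').split(' ')
    except StopIteration:
        # the file has fewer lines than requested
        return []

# demonstration with a throwaway file (assumption: any small text file works)
with tempfile.TemporaryDirectory() as pasta:
    caminho = os.path.join(pasta, "exemplo.txt")
    with open(caminho, "w", encoding="utf8") as f:
        f.write("Roda Viva\nChico Buarque\n")

    segunda = extrai_linha_validada(caminho, 2)
    inexistente = extrai_linha_validada(caminho, 99)
    try:
        extrai_linha_validada(caminho, 0)
        erro = None
    except ValueError as exc:
        erro = str(exc)

print(segunda)      # ['Chico', 'Buarque']
print(inexistente)  # []
print(erro)         # numero_linha must be 1 or greater
```

Whether to raise an exception or return an empty list for invalid input is a design choice; raising makes a caller's mistake visible earlier.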

Another answer suggests reading the entire file and splitting its content into lines, but I find that excessive for this case. After all, you only want a single line, so it makes more sense to go up to it, read it, and then close the file (this matters even more if the file is very large, because loading everything into memory just to get a single line does not seem like a good idea).


Another way to do it is to use itertools.islice:

from itertools import islice

def extrai_linha_txt(nome_arquivo: str, numero_linha: int):
    try:
        with open(nome_arquivo, encoding='utf8') as fp:
            return next(islice(fp, numero_linha - 1, numero_linha)).rstrip('\n').split(' ')
    except StopIteration:
        # if the file has fewer lines, return an empty list (or show an error message, you decide)
        return []

linha10 = extrai_linha_txt(nome_arquivo='./musica.txt', numero_linha=10)
print(linha10) # ['Mas', 'eis', 'que', 'chega', 'a', 'roda', 'viva']

Note that in this case the index count starts at zero (the first line is 0, the second is 1, etc.), so I subtracted 1 when calling islice.
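
A tiny sketch of how the islice indexes line up with 1-based line numbers. A plain list stands in for the file here (an assumption for illustration; islice treats any iterable the same way):

```python
from itertools import islice

# a list of strings stands in for the lines of a file (assumption)
linhas = ["primeira\n", "segunda\n", "terceira\n"]

numero_linha = 2  # 1-based, as in the function above
fatia = list(islice(linhas, numero_linha - 1, numero_linha))

print(fatia)  # ['segunda\n']
```

The start index numero_linha - 1 is inclusive and the stop index numero_linha is exclusive, so the slice contains at most one element.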

  • Man, what a lesson. I am further along in Python now, but this problem still wasn't clear to me. I was impressed by your code. Thank you very much!

  • 1

    @Asher Just for the record: if you only want to pick up the tenth line and nothing else, using splitlines (as suggested by the other answer) is, in my opinion, excessive. Unless, of course, you need a list of all the lines to be used again at other points in the code; then it would make sense. But if you only want one line and nothing else, with splitlines you will be loading the entire file into memory unnecessarily. The code I suggested saves memory and also does not read the entire file unnecessarily (it only goes up to the desired line and then stops reading)

  • 1

    Of course, for a small file that is read only a few times it hardly matters, but for several larger files it can make a difference. Anyway, it’s always worth paying attention to these details :-)

  • 1

    As always your answer is excellent and very complete. I added my answer because I found it simpler to understand for those who are starting. :)

  • @fernandosavio Yes, I forgot about the enumerate, I’m glad you remembered :-)

  • Honored to receive so much feedback. I really appreciate you guys!


2

An easier way to read the file is to use the structure:

def extrai_linha_txt(nome_arquivo: str, numero_linha: int):
    with open(file=nome_arquivo, mode='r', encoding='utf8') as fp:
        linhas = fp.read().splitlines()
    return linhas[numero_linha].split()

The splitlines() function returns a list in which each element is a string corresponding to one line of the file. It also removes the line-break characters (\n) for you. Documentation here.

With this new definition, you can directly access the desired line and do the split() without the need for while. So your function has the desired return:

['Mas', 'eis', 'que', 'chega', 'a', 'roda', 'viva']

Code:

def extrai_linha_txt(nome_arquivo: str, numero_linha: int):

    with open(file=nome_arquivo, mode='r', encoding='utf8') as fp:
        linhas = fp.read().splitlines()

    return linhas[numero_linha].split()

# function call with the parameters for the selected line
linha9 = extrai_linha_txt(nome_arquivo='./musica.txt', numero_linha=9)
print(linha9) # should return ['Mas', 'eis', 'que', 'chega', 'a', 'roda', 'viva']

Note that since list indexes start at 0, the desired line is now at index 9.
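
If you prefer to keep 1-based line numbers with this approach, a possible sketch is shown below. The helper name linha_por_numero is hypothetical, and it takes the file content as a string so the example is self-contained. It also guards against out-of-range numbers, since linhas[numero_linha] raises IndexError past the end of the list:

```python
def linha_por_numero(texto: str, numero_linha: int):
    # hypothetical helper: 'texto' is the whole file content, e.g. fp.read()
    linhas = texto.splitlines()
    if 1 <= numero_linha <= len(linhas):
        # convert the 1-based line number to a 0-based list index
        return linhas[numero_linha - 1].split()
    return []

conteudo = "Roda Viva\nChico Buarque\n\nTem dias que a gente se sente\n"

print(linha_por_numero(conteudo, 2))   # ['Chico', 'Buarque']
print(linha_por_numero(conteudo, 3))   # [] (the third line is blank)
print(linha_por_numero(conteudo, 99))  # [] (out of range)
```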

  • 1

    Thank you very much for your dedication in helping with this problem. This splitlines() function is interesting. Thank you very much!
