Printing specific lines from a text file


Suppose I have a text file called teste.txt with the following content:

4 Março 2017- Sábado

    meu aniversario

    -prova de calculo



    6 Março 2017- Segunda

    aniversario do Salomao

    - fazer compras




    8 Março 2017- Quarta

    feriado

    -acordar tarde

The goal is to check whether a specific string, such as "6 Março 2017", appears in the file (its first occurrence, if there is more than one) and then print everything from that line up to, but not including, the next date.

For example, if I search for "6 Março 2017", the program should print:

 6 Março 2017- Segunda

    aniversario do Salomao

    - fazer compras

I did the following:

search = "6 Março 2017"
with open("teste.txt", "r", encoding="utf-8") as f:
    for line in f:
        if search in line:
            print(line)
            break

But it only prints:

6 Março 2017- Segunda

How can I make the program also print the other lines that interest me?

I also tried a simpler approach, and it didn't work.

First, I find the date with a simple:

if "6 Março 2017" in f.readlines():

Then I'd like a while loop to go through the remaining lines (after all, I have already reached the first line I want, the one containing "6 Março 2017- Segunda") and print everything up to, but not including, the next date:

6 Março 2017- Segunda

aniversario do Salomao

- fazer compras

It should stop printing there, because the next non-blank line starts with 8 (the next date).

Could someone show me the code for this?

  • I don't understand the question, or the comment on my answer. The code in BOTH answers you received does exactly that. This "simpler" logic is wrong.

  • @Luiz Vieira: Sorry, I just wanted to understand what was wrong with my idea! The answers are perfect.

  • Well, there are several mistakes. The main one is that "6 Março 2017" in f.readlines() is only True if one of the lines in the list is exactly that text (list membership compares whole elements, and each line also ends with "\n"). And even when it is True, it doesn't tell you which line the text is on, so all you learn is whether it exists.
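To make the comment concrete, here is a minimal sketch (not code from either answer) that contrasts list membership with a per-line substring test, and then implements the asker's intended idea: start at the first line containing the search text and keep collecting lines until the next date line. The helper name section_after and the DATE_LINE pattern (1-2 digits, a word, and a 4-digit year at the start of the line) are assumptions made for this illustration.

```python
import re

# Assumed shape of a date line: optional indentation, 1-2 digits, a space,
# a word (the month), a space and a 4-digit year, e.g. "6 Março 2017- Segunda"
DATE_LINE = re.compile(r"\s*\d{1,2} \w+ \d{4}")


def section_after(lines, search):
    """Collect the first line containing `search` and the lines after it,
    stopping before the next date line."""
    section = []
    found = False
    for line in lines:
        if not found:
            if search in line:  # substring test on a single line
                found = True
                section.append(line)
        elif DATE_LINE.match(line):
            break  # next date reached: stop before including it
        else:
            section.append(line)
    return section


lines = [
    "4 Março 2017- Sábado\n",
    "meu aniversario\n",
    "6 Março 2017- Segunda\n",
    "aniversario do Salomao\n",
    "- fazer compras\n",
    "8 Março 2017- Quarta\n",
    "feriado\n",
]

# List membership compares whole elements (including the "\n"): False
print("6 Março 2017" in lines)
# A substring test on each line finds the date and its position: [2]
print([i for i, line in enumerate(lines) if "6 Março 2017" in line])
# Prints the 6 Março entry, stopping before "8 Março 2017- Quarta"
print("".join(section_after(lines, "6 Março 2017")), end="")
```

With a real file, pass the open file object instead of the list, for example: with open("teste.txt", encoding="utf-8") as f: print("".join(section_after(f, "6 Março 2017")), end="").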

2 answers


Logic

The implemented logic is simple:

  1. The contents of the file are read and stored in content.
  2. The desired date is stored in date.
  3. A regular expression finds every date in the expected format and stores them in dates. A date:
    • starts with one or two digits,
    • followed by a whitespace character,
    • followed by one or more characters of any kind (so non-ASCII characters such as "ç" also match),
    • followed by a whitespace character,
    • followed by a sequence of four digits.
  4. Checks whether the desired date exists in the file.
    • If not, raises an exception with an error message.
  5. Locates the desired date in the file and stores its position in start.
  6. Gets the index of the desired date in the list dates.
  7. Checks whether the desired date is the last one in the file.
    • If it is, sets end to len(content), the end of the file contents, and skips steps 8 and 9.
  8. Otherwise, gets the next date, dates[index + 1], and stores it in next_date.
  9. Locates the next date in the file and stores its position in end.
  10. Displays the contents of the file between positions start and end.

Code

The extra blank lines and indentation from the question's content were removed here to make the answer easier to read; the code works without removing them.

# -*- coding: utf-8 -*-

import re

content = """4 Março 2017- Sábado
meu aniversario
-prova de calculo

6 Março 2017- Segunda
aniversario do Salomao
- fazer compras

8 Março 2017- Quarta
feriado
-acordar tarde"""

# Desired date:
date = "8 Março 2017"

# Find all dates in the file:
dates = re.findall(r"[0-9]{1,2}\s.+\s[0-9]{4}", content)

# Check whether the date exists in the file:
if date not in dates:
    raise Exception("Data não definida")

# Locate the desired date in the file:
start = content.find(date)

# Get the index of the date in the list of dates:
index = dates.index(date)

# Check that it is not the last date in the list:
if index < len(dates) - 1:

    # Get the date that follows the desired one:
    next_date = dates[index + 1]

    # Locate the next date in the file:
    end = content.find(next_date)

else:

    # It is the last date in the list, so display up to the end of the file:
    end = len(content)

# Display the content:
print(content[start:end])

Output

For date = "4 Março 2017", the output is:

4 Março 2017- Sábado
meu aniversario
-prova de calculo

For date = "6 Março 2017", the output is:

6 Março 2017- Segunda
aniversario do Salomao
- fazer compras

For date = "8 Março 2017", the output is:

8 Março 2017- Quarta
feriado
-acordar tarde

For date = "10 Março 2017", which is not in the file, an exception is raised:

Traceback (most recent call last):
  File "python", line 25, in <module>
Exception: Data não definida

See the code running on Repl.it or on Ideone.
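One possible variation, sketched here as an assumption rather than the author's code: re.finditer returns match objects that already carry their positions, so the slice boundaries can come from the matches instead of from content.find. A side effect is that "6 Março 2017" can no longer be located inside "16 Março 2017", which content.find would do if the 16th appeared first. The function name entry_for is hypothetical.

```python
import re

# The date pattern from the answer above: 1-2 digits, whitespace, one or more
# characters on the same line, whitespace, 4 digits
DATE_RE = re.compile(r"[0-9]{1,2}\s.+\s[0-9]{4}")


def entry_for(content, date):
    """Return the text from `date` up to the next date, or None if absent."""
    matches = list(DATE_RE.finditer(content))
    for i, match in enumerate(matches):
        if match.group() == date:
            # Stop at the start of the next date, or at the end of the text
            if i + 1 < len(matches):
                end = matches[i + 1].start()
            else:
                end = len(content)
            return content[match.start():end]
    return None


content = """4 Março 2017- Sábado
meu aniversario

6 Março 2017- Segunda
aniversario do Salomao

8 Março 2017- Quarta
feriado"""

print(entry_for(content, "6 Março 2017"))
print(entry_for(content, "10 Março 2017"))  # None: date not in the file
```

Returning None instead of raising is a design choice for the sketch; the caller decides how to report a missing date.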

I don't know the context of your problem (is it a college exercise?), but if you're trying to build an agenda or something similar, I would suggest a more suitable storage format that is easier to manipulate.

If it's something simpler that still stores data in text files, I would suggest JSON, XML, or YAML; all of them have ready-made packages for Python. If it's something more professional, a database may be a better choice (MySQL, for example, which also has ready-made Python packages).

Anyway, there are several ways to do what you want. To make it easy, I suggest using the datetime module to identify the dates. For that, you first need to set the locale to Portuguese and, very importantly, write the name of the day of the week correctly ("Segunda-feira" instead of "Segunda").

The following code reads the file line by line and tests whether each line is a date, using the function datetime.strptime, which raises a ValueError if the text is not a valid date in the expected format. If the line is a date, the code opens a new "key" for that date in the dictionary info and remembers it in the variable date. If it is not, the line is treated as content of the most recently recognized date (the one stored in date) and is simply appended to that key (info[date] += line + '\n').

Note that the logic is essentially:

  1. Read a line, if the end of the file has not been reached.
  2. Check whether it is a date.
  3. If it is a date, open a new "record" for it and go back to step 1.
  4. If it is not a date, add the line as content of the current record and go back to step 1.
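Step 2 does not have to use datetime. The following minimal sketch, which is not the author's code, implements the same four steps with a regular expression as the date check, so no locale setup is needed. The pattern DATE_LINE (for lines like "6 Março 2017- Segunda") and the function build_agenda are assumptions for illustration; here the record key is the date text, such as "6 Março 2017", rather than a datetime object.

```python
import re

# Hypothetical date check for step 2: a line such as "6 Março 2017- Segunda".
# Group 1 captures "6 Março 2017", which is used as the record key.
DATE_LINE = re.compile(r"(\d{1,2} \w+ \d{4})-")


def build_agenda(lines):
    """Group each line under the most recent date line (steps 1 to 4)."""
    info = {}
    current = None  # key of the record being filled; None before any date
    for raw in lines:                      # step 1: read a line
        line = raw.strip("\n ")
        match = DATE_LINE.match(line)      # step 2: is it a date?
        if match:
            current = match.group(1)       # step 3: open a new record
            info[current] = ""
        elif current is not None:
            info[current] += line + "\n"   # step 4: add to the current record
    return info


sample = [
    "4 Março 2017- Sábado\n",
    "    meu aniversario\n",
    "    -prova de calculo\n",
    "    6 Março 2017- Segunda\n",
    "    aniversario do Salomao\n",
    "    - fazer compras\n",
]
agenda = build_agenda(sample)
print(agenda["6 Março 2017"], end="")
```

The trade-off: the regular expression only checks the shape of the line, so it would also accept something like "99 Foo 2017-", whereas datetime.strptime rejects dates that do not exist. With a real file, build_agenda can receive the open file object directly.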

You can implement this logic however you like; the trick is precisely step 2 (checking whether the line is a date). This code just makes that check easier by using the locale and datetime modules, but nothing stops you from using regular expressions or even manual comparison.

Here’s the code:

import sys
import locale
from datetime import datetime

# Set the locale to Brazilian Portuguese so that %B and %A match
# Portuguese month and weekday names
locale.setlocale(locale.LC_ALL, 'ptg_bra')  # On Windows!
# On other OSes it will probably be:
# locale.setlocale(locale.LC_ALL, 'pt_BR')

date = None  # key of the record being filled (no date seen yet)
info = {}
# Assumes the file is saved as UTF-8; adjust `encoding` if it is not
with open('teste.txt', 'r', encoding='utf-8') as f:
    for line in f:

        line = line.strip('\n ')  # Remove line breaks and spaces

        # Try to convert the current line to a date (in the expected format!)
        # On success, open a new content "key"
        try:
            date = datetime.strptime(line, '%d %B %Y- %A')
            info[date] = ''

        # On failure, the content belongs to the current key (if there is one)
        except ValueError:
            if date is not None:
                info[date] += line + '\n'


query = input('Digite a data para consulta:')
try:
    date = datetime.strptime(query, '%d %B %Y- %A')
except ValueError:
    print('O valor [{}] não é uma data válida.'.format(query))
    sys.exit(-1)

# Print the record, guarding against a valid date that is not in the file
if date in info:
    print(info[date])
else:
    print('No entries for [{}].'.format(query))

Remember that the input file must then look like this (with "Segunda-feira" instead of just "Segunda"):

4 Março 2017- Sábado

    meu aniversario

    -prova de calculo



    6 Março 2017- Segunda-feira

    aniversario do Salomao

    - fazer compras

[. . .]

The output is:

>teste
Digite a data para consulta:6 Março 2017- Segunda-feira

aniversario do Salomao

- fazer compras

P.S.: Note that the expected layout is day Month year- Weekday, as defined by the format string %d %B %Y- %A. If the layout of the dates changes (even adding or removing a space can matter!), you need to change the format string too. The list of format directives can be found in the documentation or in this quick guide.
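A quick way to see this without depending on the Portuguese locale is to use numeric directives (%d, %m, %Y). This small sketch uses a hypothetical helper, parse_or_none, that returns None instead of raising; it shows that strptime fails when the separators differ from the format string, and also when extra text is left over after the format has been matched.

```python
from datetime import datetime


def parse_or_none(text, fmt):
    """Return a datetime if `text` matches `fmt` completely, else None."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None


# Numeric directives avoid any dependence on the active locale
print(parse_or_none("06/03/2017", "%d/%m/%Y"))           # 2017-03-06 00:00:00
print(parse_or_none("06-03-2017", "%d/%m/%Y"))           # None: separator differs
print(parse_or_none("06-03-2017", "%d-%m-%Y"))           # 2017-03-06 00:00:00
print(parse_or_none("06/03/2017- Segunda", "%d/%m/%Y"))  # None: text left over
```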

  • Actually, it is a hypothetical exercise for learning file-handling concepts!

