Program to count the number of words per sentence in a text file

Text file used: https://easyupload.io/j5agtq The method is part of a class called Arqtext.

Objective: The method takes no input and returns, in a tuple, (1) the average number of words per sentence in the file, (2) the number of words in the sentence with the most words and (3) the number of words in the sentence with the fewest words. You may consider the symbols that delimit a sentence to be '!?.'.

What I did:

def media(self):
    import string
    marcadores = ["?", ".", "!"]
    cont = 0
    total_pal = 0
    num_frases = 0
    qtdade_palavras = []
    frase = ""
    with open(arq) as f:
        texto = f.read()
        # print(texto)
        for char in texto:
            frase += char

            if char in marcadores:
                # print(frase)
                # print(len(frase.split()))
                for pal in frase.split():
                    for i in string.punctuation:
                        if i in pal:
                            pal = pal.replace(i, "")
                    pal = pal.lower()
                    cont += 1

                # print(cont)
                total_pal += cont
                num_frases += 1
                frase = ""
                qtdade_palavras.append(cont)
                cont = 0
                continue
    print(f"O numero médio de palavras por sentença é {total_pal / num_frases}")
    print(f"O número de palavras na sentença com mais palavras é {max(qtdade_palavras)}")
    print(f"O número de palavras na sentença com menos palavras é {min(qtdade_palavras)}")

Output of my code:

O numero médio de palavras por sentença é 25.488888888888887
O número de palavras na sentença com mais palavras é 124
O número de palavras na sentença com menos palavras é 2

However, the sentence with the fewest words actually has only 1 word, not 2:

`Prophet!

What am I missing?

Full source code: https://www.pastiebin.com/5df6940dbc109

  • I couldn't find any errors. I reproduced the code and got the expected results, so the problem may be in the text file you are opening. Could you share the full file that is opened in with open(arq) as f:? Other than that, there are only a few lines that could be removed without changing anything, unless you use them in another part of the code.

  • @Guilherme França de Oliveira: I will edit the question and add the link to the full code.

  • @Guilherme França de Oliveira Full code: https://www.pastiebin.com/5df6940dbc109

  • Could you make the file raven.txt available? The solution may lie in it, since the code itself runs fine: running it here with other files gives the correct output. There are some things you should fix in your code, though. I will post here when I have the complete solution.

  • @Guilherme França de Oliveira It is in the question: https://easyupload.io/j5agtq

1 answer

Your code accumulates characters up to a sentence-ending punctuation mark and then counts how many words are in that chunk. In your file, the text just before Prophet! is:

Quoth the raven, `Nevermore.'

`Prophet!' said I, `thing of evil! - prophet still, if bird or devil! -

The previous sentence ends at the . so the next chunk begins with the closing ', leaving the sentence as ' `Prophet! where the stray quote is counted as a word of its own. What you can do is add this line to the code:

frase = frase.replace('`', '').replace('"', '').replace("'", '')

which gives:

def media(self):
    import string
    marcadores = ["?", ".", "!"]
    cont = 0
    total_pal = 0
    num_frases = 0
    qtdade_palavras = []
    frase = ""
    with open(self.arq) as f:
        texto = f.read()
        # print(texto)
        for char in texto:
            frase += char

            if char in marcadores:
                print(frase)
                # print(len(frase.split()))
                frase = frase.replace('`', '').replace('"', '').replace("'", '')
                for pal in frase.split():
                    for i in string.punctuation:
                        if i in pal:
                            pal = pal.replace(i, "")
                    pal = pal.lower()
                    cont += 1

                # print(cont)
                total_pal += cont
                num_frases += 1
                frase = ""
                qtdade_palavras.append(cont)
                cont = 0
                continue
    print(f"O numero médio de palavras por sentença é {total_pal / num_frases}")
    print(f"O número de palavras na sentença com mais palavras é {max(qtdade_palavras)}")
    print(f"O número de palavras na sentença com menos palavras é {min(qtdade_palavras)}")
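To see why the count comes out as 2, here is a minimal sketch that reproduces only the chunk collected between `Nevermore.` and `Prophet!`. The string literal is an assumption about what the loop has accumulated at that point (the exact number of newlines in the file may differ). The last part shows an alternative that is not the fix proposed above: counting only tokens that still contain something after punctuation is stripped.

```python
import string

# Assumed content of `frase` when the "!" after Prophet is reached.
chunk = "'\n\n`Prophet!"

# Without cleaning, the lone closing quote is a token of its own.
raw_tokens = chunk.split()
print(raw_tokens)  # ["'", '`Prophet!']

# Removing the quote characters first leaves a single word.
cleaned = chunk.replace('`', '').replace('"', '').replace("'", '')
clean_tokens = cleaned.split()
print(clean_tokens)  # ['Prophet!']

# Alternative (an assumption, not the answer's method): skip tokens that
# are empty once surrounding punctuation is stripped.
count = sum(1 for tok in raw_tokens if tok.strip(string.punctuation))
print(count)  # 1
```

The original loop does strip punctuation from each token, but it increments cont even when the token ends up empty, which is why the stray quote is still counted.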

Another thing you should change is how you open the file. Your __init__ takes the file name as a parameter and stores it on the instance, but when you call the method it does not use the file given to __init__; it uses the global variable instead. To fix it, use self:

with open(self.arq) as f:

That way, changing the global variable does not change the value held by the instantiated object.
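A small sketch of the difference, using a hypothetical ArqtextDemo class and made-up file names (the real __init__ is only in the linked source, so its shape here is an assumption). No file is opened; each method just returns the name it would open.

```python
arq = "global.txt"  # hypothetical module-level variable


class ArqtextDemo:  # hypothetical stand-in for the Arqtext class
    def __init__(self, arq):
        self.arq = arq  # file name stored on the instance

    def name_from_global(self):
        # Looks up the outer variable named arq, ignoring the instance.
        return arq

    def name_from_self(self):
        # Uses the value given when the object was created.
        return self.arq


demo = ArqtextDemo("raven.txt")
arq = "other.txt"  # the outer variable changes after instantiation

print(demo.name_from_global())  # other.txt
print(demo.name_from_self())    # raven.txt
```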

Apart from that, there are a few lines in the middle of your code that I would say are useless and add nothing, unless you have plans for them.

For example, the continue at the end of the for loop: as soon as the loop body ends, the next iteration starts automatically. continue is only useful in the middle of a for loop, when the execution paths need to go to different places.
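As an illustration of the point about continue, here is a short sketch with hypothetical helper functions: the first shows a trailing continue that changes nothing, the second shows a continue in the middle of the body that actually skips work.

```python
def with_trailing_continue(items):
    seen = []
    for item in items:
        seen.append(item)
        continue  # last statement in the body: has no effect
    return seen


def without_continue(items):
    seen = []
    for item in items:
        seen.append(item)
    return seen


def skip_negatives(items):
    kept = []
    for item in items:
        if item < 0:
            continue  # skips the append below for this item only
        kept.append(item)
    return kept


data = [3, -1, 4, -5]
print(with_trailing_continue(data) == without_continue(data))  # True
print(skip_negatives(data))  # [3, 4]
```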

I made some modifications to your code to make it more organized, but I haven't removed the lines that I find useless. See the link: https://repl.it/repls/OverdueNavyblueNetbsd


Edit

Another way to solve it is to filter out the unwanted characters with an if:

def media(self):
    import string
    marcadores = ["?", ".", "!"]
    cont = 0
    total_pal = 0
    num_frases = 0
    qtdade_palavras = []
    frase = ""
    with open(self.arq) as f:
        texto = f.read()
        # print(texto)
        for char in texto:
            if char not in "`'" + '"-':
                frase += char

                if char in marcadores:
                    print(frase)
                    # print(len(frase.split()))
                    for pal in frase.split():
                        for i in string.punctuation:
                            if i in pal:
                                pal = pal.replace(i, "")
                        pal = pal.lower()
                        cont += 1

                    # print(cont)
                    total_pal += cont
                    num_frases += 1
                    frase = ""
                    qtdade_palavras.append(cont)
                    cont = 0
                    continue
    print(f"O numero médio de palavras por sentença é {total_pal / num_frases}")
    print(f"O número de palavras na sentença com mais palavras é {max(qtdade_palavras)}")
    print(f"O número de palavras na sentença com menos palavras é {min(qtdade_palavras)}")

Edit 2

I had some free time, so I decided to show you a somewhat shorter solution that you can look over and borrow from. I hope it helps.

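def media(self):
    import re
    with open(self.arq) as f:
        texto = f.read()
    # Split after a sentence delimiter followed by non-word characters,
    # right before the start of the next word.
    frases = re.split(r'[!?.]\W+\b', texto.replace('\n', ' '))
    # Drop dashes so that a lone '-' is not counted as a word.
    palavras = [len(frase.replace('-', '').split()) for frase in frases]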

    print(f"O numero médio de palavras por sentença é {sum(palavras) / len(frases)}")
    print(f"O número de palavras na sentença com mais palavras é {max(palavras)}")
    print(f"O número de palavras na sentença com menos palavras é {min(palavras)}")
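Since the objective asks for a tuple rather than printed output, here is a minimal sketch of a variant that returns one. The function name sentence_stats, the choice to work on a string instead of a file, and the rule that a token only counts as a word if it contains a letter or digit are all assumptions made for illustration, not part of the code above. A trailing fragment without a closing delimiter is treated as a sentence.

```python
import re


def sentence_stats(text):
    """Return (average, longest, shortest) word counts per sentence."""
    # Split on runs of sentence delimiters; the delimiters are dropped.
    chunks = re.split(r"[!?.]+", text)
    counts = []
    for chunk in chunks:
        # Count only tokens with at least one letter or digit, so stray
        # quotes and dashes are not counted as words.
        n = sum(1 for tok in chunk.split() if any(c.isalnum() for c in tok))
        if n:
            counts.append(n)
    if not counts:
        return (0.0, 0, 0)
    return (sum(counts) / len(counts), max(counts), min(counts))


sample = (
    "Quoth the raven, `Nevermore.'\n\n"
    "`Prophet!' said I, `thing of evil! - prophet still, if bird or devil! -"
)
stats = sentence_stats(sample)
print(stats)  # (4.0, 6, 1)
```

Inside the class, media could then read the file with open(self.arq) and return sentence_stats(f.read()).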

  • Can you think of a simpler way to solve this than the one I tried?

  • Another thing I forgot to mention is that it is also counting - as a word, so it needs to be removed too. I also thought of a different way to solve it, which I will post above.

  • I added a shortened version at the end. Take a look and see what you think, and good luck!
