Find a word in a text file with Python

0

I have a program called anagrams.py and its function is to show the permutations of a word typed by the user if it is in the file words.txt.

This is the complete code:

""" Anagrams by WhoisBsa """
from itertools import permutations
import sys


def findPermutation(wrd):
    """ Find the permutation of the words """
    parmutationList = permutations(wrd)
    for item in parmutationList:
        print(''.join(item))


def checkWord(wrd, wordLine):
    """ Checks whether the word exists in the file or not """
    while True:
        for i in wordLine.readlines():
            if wrd in i:
                result = True
                break
            else:   
                result = False
        if result:
            findPermutation(wrd)
            break
        else:
            print('This word is not available')
            break


def main():
    """ Main function """
    with open('words.txt', 'r') as f:
        word = sys.argv[1].upper()
        checkWord(word, f)
        f.close()

if __name__ == '__main__':
    main()

This code runs perfectly with one exception: in the function checkWord, the condition

if wrd in i:

matches any word that contains the typed word. For example, if I type house and run the code, the intention is to return all permutations of house only if house itself is in the file. However, the file contains the word Blockhouse, so the program finds house inside Blockhouse, stops there, and generates the permutations of house.

I know that the condition if wrd in i accepts any line that contains the word. The problem is that if I use

if wrd == i:

the program doesn't work: it goes straight to the else.

Another problem is that if I type Hou, the program finds Blockhouse and generates the anagrams of Hou, even though that word does not exist in the file.

I thought about using a regex to validate this, but I do not know how to apply it to this code.

  • It is a little unclear what you want. Note that house is not an anagram (nor a permutation) of Blockhouse, and vice versa. If the user types House and the file has Blockhouse but no house, then the word is not in the file, unless you want to search for substrings inside the strings in the file (or maybe both: anagrams and substrings), in which case the problem statement (and the code) would be different.

  • @Sidon that's the problem: the file has no house but has Blockhouse. I want to report that there is no house in the file, but this does not happen. The program sees the word Blockhouse and then generates the permutations of the word house, because in the code permutationList receives the permutations of the variable word: permutationList = permutations(wrd)

  • Let me get this straight: if the user types house, you have to search the file for house and all the anagrams of house, is that it?

  • A tip that has nothing to do with the question: you do not need to (and, for readability alone, should not) call .close on a file that was opened with with. The with statement already closes the file.

  • @Sidon if I type house, I search for house, and if house is in the file I generate the anagrams of the word house :)

  • @jsbueno thanks for the tip! :)

  • Okay, see if my answer helps.

3 answers

2

The in operator in Python behaves differently depending on whether its second operand (the one that comes after the in) is a str or bytes, or some other kind of sequence.

If you write a in b and both are strings, it returns True whenever a occurs at any position within b, even when a is more than one character long. So "house" in "housekeeper" is True. If, however, b is another type of sequence, such as a list, the result is True only if some element of the sequence is equal to a: "house" in ["housekeeper",] returns False, because here "housekeeper" is an element of the list being "investigated" by in.
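The difference can be checked directly in an interpreter. This is a minimal sketch; the variable names are only illustrative.

```python
# Both operands are strings: "in" performs a substring search.
in_longer_word = "house" in "housekeeper"
print(in_longer_word)      # True

# The right operand is a list: "in" compares against whole elements.
in_list_of_longer = "house" in ["housekeeper"]
print(in_list_of_longer)   # False

in_list_exact = "house" in ["house", "keeper"]
print(in_list_exact)       # True
```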

Therefore, you can solve your problem easily if, instead of searching directly within the raw text of each line of the file, you first convert each line to a list of whole words.

You do not give an example of your .txt file, but assuming it has no punctuation, only whitespace and line breaks, just change the lines

for i in wordLine.readlines():
    if wrd in i:
        ...

to:

for line in wordLine:
    words = line.strip().split()
    if wrd in words:
        ...

The changes are:

  • I process the line, removing whitespace and the newline from its ends, by calling .strip(). (At that point, if there is punctuation, you could remove it as well, in that case using regular expressions.)
  • I split the line into a list of words wherever there is whitespace, with the split method. A line containing "housekeeper heart mother" becomes ['housekeeper', 'heart', 'mother'].
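As a rough illustration of these two steps, the sketch below uses made-up lines, since the real contents of words.txt are not shown in the question. It also suggests why comparing the raw line with == goes straight to the else: a line read from a file still carries its trailing newline.

```python
# Hypothetical one-word line, as it would come out of the file.
raw_line = "HOUSE\n"
raw_equal = "HOUSE" == raw_line
print(raw_equal)        # False: the newline is part of the line
stripped_equal = "HOUSE" == raw_line.strip()
print(stripped_equal)   # True

# Hypothetical line with several words.
multi_line = "BLOCKHOUSE HEART MOTHER\n"
words = multi_line.strip().split()
print(words)            # ['BLOCKHOUSE', 'HEART', 'MOTHER']
print("HOUSE" in words) # False: no element is exactly "HOUSE"
print("HEART" in words) # True
```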

And, not essential to the problem:

  • I removed the call to the readlines method: Python does not need readlines to iterate over the lines of a file in a for loop. This method exists practically for historical reasons.
  • Do not use i as the loop variable of the for: i is widely used in other languages that lack a practical "for each" that traverses the desired sequence directly. In those languages it is common to have an "i" variable that serves as an "index" and is discarded right away. With Python's for we get the element of interest directly (in this case, the lines of the file), so it is better to give the variable a name that makes sense.

In addition, other points of your program need attention. For example, you have a while True that serves no purpose (both the if and the else inside it end in a break), but that has nothing to do with your question.
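One possible way to drop the while True entirely is sketched below. The names check_word and find_permutations are hypothetical renamings of the functions in the question, and io.StringIO with invented content stands in for the opened words.txt.

```python
import io
from itertools import permutations


def find_permutations(wrd):
    """Return every permutation of the letters of wrd as a list of strings."""
    return [''.join(item) for item in permutations(wrd)]


def check_word(wrd, word_lines):
    """Return True if wrd appears as a whole word on any line."""
    return any(wrd in line.split() for line in word_lines)


# Stand-in for the opened file; the content is invented for this sketch.
sample_file = io.StringIO("BLOCKHOUSE\nHEART\nMOTHER\n")
word = "HEART"
if check_word(word, sample_file):
    result = find_permutations(word)
    print(len(result))   # 120 permutations for five distinct letters
else:
    result = None
    print('This word is not available')
```

Returning a boolean from the check, instead of printing inside it, keeps the search and the output separate, which also makes the function easier to test.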

  • 1

    Nice answer, @jsbueno. +1.

  • The answer from @jsbueno was great, but in my view yours was a little lighter, and since I'm studying the re module, I decided to go with your solution. Anyway, thank you @jsbueno! :)

1


One solution, as you suggest in your question, is to use regular expressions. Here is an example that changes only your checkWord function:

import re  # needed for re.search and re.escape


def checkWord(wrd, wordLine):
    """ Checks whether the word exists in the file or not """
    for line in wordLine:
        # \b on both sides makes the pattern match only the whole word
        if re.search(r'\b' + re.escape(wrd) + r'\b', line, flags=re.IGNORECASE):
            findPermutation(wrd)
            return

    print('This word is not available')

The permutations are generated only when the exact word is found, not a derived word (one formed by adding a suffix or prefix).

Depending on the file size, it may be more efficient to read its contents at once and perform only one search.

def checkWord(wrd, wordLine):
    """ Checks whether the word exists in the file or not """
    if re.search(r'\b' + re.escape(wrd) + r'\b', wordLine.read(), flags=re.IGNORECASE):
        findPermutation(wrd)
    else:   
        print('This word is not available')

Don't forget to import the re module.

The \b is a kind of anchor. Depending on which side of the pattern it is added to, the regex matches the pattern only at the beginning of a word, at the end of a word, or as an exact word. In this case we want the exact word, so we add \b both before and after the word being searched for.
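To see the effect of \b on the cases from the question, here is a small sketch. The helper name has_exact_word is hypothetical; it only wraps the same re.search call used in the answer.

```python
import re


def has_exact_word(wrd, text):
    """Return True if wrd occurs in text as a whole word, ignoring case."""
    pattern = r'\b' + re.escape(wrd) + r'\b'
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print(has_exact_word('house', 'Blockhouse'))          # False
print(has_exact_word('hou', 'Blockhouse'))            # False
print(has_exact_word('HOUSE', 'heart house mother'))  # True
# A hyphen is not a word character, so a boundary exists next to it.
print(has_exact_word('house', 'house-keeper'))        # True
```

The last case is worth keeping in mind: \b treats punctuation such as a hyphen as a separator, so hyphenated entries in the file count as separate words.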

From Wikipedia:

\b Matches a word boundary, which also includes the beginning (^) and the end ($) of the string being tested. The definition of which characters form words varies by implementation, but it is safe to assume at least [a-zA-Z0-9_]. If supported, the \w shortcut is a valid alternative. Java is a notable exception in that it supports \b but not \w. Note that although it is similar to the word boundaries defined by POSIX, this escape sequence does not distinguish the beginning from the end of the word, only the boundary itself.

  • I chose @Bruno's solution because it is faster. Thank you to everyone who helped! :)

0

According to the comments, all you need to do is look for the typed word in the text and, if it exists, create the anagrams. For this I created an example that simulates reading text from a file:

import io
from itertools import permutations

txt = '''
Este texto contem house e uma anagrama esouh também casa menino
boy asac boi e mala mas pode-se ver sam e homeoffice assim como
boi e mala etc.
'''

# Read the simulated file and split it into a list of words
text = io.StringIO(txt).read().split()

palavra = input('Type the word: ')

# Generate the permutations only if the exact word is in the text
permutacoes = None
if palavra in text:
    permutacoes = permutations(palavra)

if permutacoes is not None:
    anagramas = []
    for item in permutacoes:
        anagramas.append(''.join(item))
    print(anagramas)
else:
    print('Not found')

Output (assuming the user typed 'house'):

['house', 'houes', 'hosue', 'hoseu', 'hoeus', 'hoesu', 'huose', 'huoes', 
'husoe', 'huseo', 'hueos', 'hueso', 'hsoue', 'hsoeu', 'hsuoe', 'hsueo', 'hseou',
 'hseuo', 'heous', 'heosu', 'heuos', 'heuso', 'hesou', 'hesuo', 'ohuse', 'ohues', 
'ohsue', 'ohseu', 'oheus', 'ohesu', 'ouhse', 'ouhes', 'oushe', 'ouseh', 'ouehs',
 'ouesh', 'oshue', 'osheu', 'osuhe', 'osueh', 'osehu', 'oseuh', 'oehus', 'oehsu',
 'oeuhs', 'oeush', 'oeshu', 'oesuh', 'uhose', 'uhoes', 'uhsoe', 'uhseo', 'uheos',
 'uheso', 'uohse', 'uohes', 'uoshe', 'uoseh', 'uoehs', 'uoesh', 'ushoe', 'usheo',
 'usohe', 'usoeh', 'useho', 'useoh', 'uehos', 'uehso', 'ueohs', 'ueosh', 'uesho',
 'uesoh', 'shoue', 'shoeu', 'shuoe', 'shueo', 'sheou', 'sheuo', 'sohue', 'soheu',
 'souhe', 'soueh', 'soehu', 'soeuh', 'suhoe', 'suheo', 'suohe', 'suoeh', 'sueho',
 'sueoh', 'sehou', 'sehuo', 'seohu', 'seouh', 'seuho', 'seuoh', 'ehous', 'ehosu',
 'ehuos', 'ehuso', 'ehsou', 'ehsuo', 'eohus', 'eohsu', 'eouhs', 'eoush', 'eoshu',
 'eosuh', 'euhos', 'euhso', 'euohs', 'euosh', 'eusho', 'eusoh', 'eshou', 'eshuo',
 'esohu', 'esouh', 'esuho', 'esuoh']

Note:
Of course, depending on the contents of your file, you will need to do some "sanitizing"; punctuation will also have to be cleaned up.
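One way such sanitizing might look is sketched below, assuming the goal is simply to drop punctuation and ignore case. The function name extract_words and the sample sentence are invented for this illustration.

```python
import re


def extract_words(text):
    """Return the set of lowercase words in text, dropping punctuation."""
    # \w+ keeps runs of letters, digits and underscores, so commas,
    # periods and hyphens act as separators.
    return set(re.findall(r'\w+', text.lower()))


sample = "This text has a house, a boy and a bag; see also: home-office."
words = extract_words(sample)
print('house' in words)    # True: the trailing comma is gone
print('office' in words)   # True: the hyphen splits 'home-office'
print('hous' in words)     # False: only whole words are stored
```

Using a set also makes each membership test fast when the file is large, although whether hyphenated entries should be split is a decision that depends on the file.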

See working on repl.it
