Permutations and files

4

I’m working on a game that uses permutations to build word lists, stored in several separate files. Problems start when I need words with more than 4 letters, because the corresponding file becomes too large. What I want is that, every time a file reaches a given number of lines (e.g. 499,999), a new file is opened with the same base name plus a numeric suffix. For example, for 4-character words: "wl4_1.txt", "wl4_2.txt", ..., where "wl4_2.txt" is the continuation of "wl4_1.txt". The first line of "wl4_2.txt" would be what would otherwise have been line 500,000 of "wl4_1.txt"; that is, we close "wl4_1.txt" and continue the list in "wl4_2.txt".

The code below works well; I just want to add the feature described above. This example generates words of up to 3 letters, and the 3-letter file alone is already large (830,584 lines):

import itertools
import string

def main():

    alphabet = string.letters + string.digits + string.punctuation
    alphaLen = len(alphabet)

    print alphabet
    for i in range(3):
        NumToPerm = i+1 #remove 0 from the permutations function
        fileTest = open("word_lists/wl" +str(NumToPerm)+ ".txt", "w")

        perm(fileTest, alphabet, NumToPerm)

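def perm(fileTest, alphabet, NumToPerm):

    for p in itertools.product(alphabet, repeat=NumToPerm):

        # p is a tuple of characters, so join them directly. Going through
        # str(p) and stripping " (),'" would also delete those characters
        # when they are a legitimate part of the word (they are in the alphabet).
        word = "".join(p)

        fileTest.write(word + '\n')

    fileTest.close()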

main()
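
For a sense of scale, the line counts can be worked out before generating anything. The sketch below assumes Python 3, where `string.ascii_letters` takes the place of the Python 2 `string.letters` used above. The 500,000-line limit is the figure mentioned in the question, and `files_needed` is a hypothetical helper name, not part of the original script.

```python
import string

# Same 94-character alphabet as in the script (Python 3 spelling).
alphabet = string.ascii_letters + string.digits + string.punctuation

LINES_PER_FILE = 500000  # assumed limit, taken from the question


def files_needed(word_length, lines_per_file=LINES_PER_FILE):
    """Return (total words, number of files) for one word length."""
    total = len(alphabet) ** word_length
    # Ceiling division using integers only.
    files = -(-total // lines_per_file)
    return total, files


for n in range(1, 5):
    print(n, files_needed(n))
```

With these assumptions, 3-letter words give 830,584 lines (2 files) and 4-letter words give 78,074,896 lines (157 files).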

2 answers

3


First, you don’t need to call perm from within get_file_extn: just return to the calling function. If you call perm from there, the generation starts over from scratch, whereas you want to continue from where you left off.

Just remember to return the new fileTest, because the previous one has been closed (trying to use it would raise an error). And do not close it twice!

def get_file_extn(fileTest, alphabet, NumToPerm, countExtn):
    #fileTest.close()
    fileTest = open("word_lists/wl" +str(NumToPerm)+ "_" +str(countExtn)+ ".txt", "w")
    return fileTest

def perm(fileTest, alphabet, NumToPerm):
    ...
    if countWords == 999:
        fileTest.close()
        fileTest = get_file_extn(fileTest, alphabet, NumToPerm, countExtn)
        countExtn = countExtn + 1

Second, countWords reaches 999, but if you don’t reset it to zero it goes on to 1000 and keeps growing, so the if is never entered again. Since the next line already increments it by 1, set it to zero at the end of the if:

    if countWords == 999:
        fileTest.close()
        fileTest = get_file_extn(fileTest, alphabet, NumToPerm, countExtn)
        countExtn = countExtn + 1
        countWords = 0

    countWords = countWords + 1
    ...

Third, you are setting countExtn to 1 inside the for loop, so it is reset to 1 for every word generated. Instead, set it before the for:

    countExtn = 1
    for p in itertools.product(alphabet, repeat=NumToPerm):

With this, the output is split across files. One last detail: the first file, opened in main, does not follow your wlX_Y.txt convention; it is simply wlX.txt. It receives the first 999 words, while wlX_1.txt receives the words from the thousandth onward. It would be preferable for main to create file 1 and for perm to start countExtn at 2 (because 1 has already been created):

def main():
    ...
    fileTest = open("word_lists/wl" +str(NumToPerm)+ "_1.txt", "w")

    perm(fileTest, alphabet, NumToPerm)  

def perm(fileTest, alphabet, NumToPerm):
    ...
    countExtn = 2
    for p in itertools.product(alphabet, repeat=NumToPerm):
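
Putting the three points together, one possible shape for the whole function is sketched below. It is not the asker's final code: it assumes Python 3, opens the first file inside perm instead of in main, and adds two hypothetical parameters, `out_dir` and `max_lines`, for illustration. The counter names follow the ones used above.

```python
import itertools
import os


def perm(alphabet, num_to_perm, out_dir="word_lists", max_lines=500000):
    """Write every word of length num_to_perm, starting a new file
    each time the current one already holds max_lines lines.
    Returns the list of file paths written."""
    os.makedirs(out_dir, exist_ok=True)

    def path_for(extn):
        return os.path.join(out_dir, "wl%d_%d.txt" % (num_to_perm, extn))

    count_extn = 1    # suffix of the file currently open (set before the loop)
    count_words = 0   # lines written to the current file
    paths = [path_for(count_extn)]
    file_test = open(paths[-1], "w")
    try:
        for p in itertools.product(alphabet, repeat=num_to_perm):
            if count_words == max_lines:
                # Current file is full: close it once, then rebind the
                # name to the newly opened file before writing again.
                file_test.close()
                count_extn += 1
                paths.append(path_for(count_extn))
                file_test = open(paths[-1], "w")
                count_words = 0  # reset so the check can fire again
            file_test.write("".join(p) + "\n")
            count_words += 1
    finally:
        file_test.close()
    return paths
```

Because the check runs before each write, a new file is only opened when there is another word to put in it, so no empty trailing file is created. A call such as `perm("abc", 2, max_lines=4)` would write 9 words across three files.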

  • Thanks, but something is still going wrong: I’m getting an error related to the file. I edited the code above to apply what I understood from your answer.

  • @Miguel What error are you getting? From your edit, I see that you fixed only some of the problems; several of those mentioned in the answer are still there. For example, you return fileTest from your function but never use the returned value, so when you try to write to the stream that is already closed it should raise an error. Is that what is happening?

  • Yes, it is: "... fileTest.write(word + '\n') ValueError: I/O operation on closed file". How do I use the return value in this case?

0

My proposal changes the question somewhat:

  • Simplify the script so that it only generates the permutations of a length given on the command line and writes them to stdout.
  • Let the operating system split the output into files (the split command).

That is:

import itertools
import string
import sys

def main():
    alphabet = string.letters + string.digits + string.punctuation
    perm(alphabet, int(sys.argv[1]))

def perm(alphabet, NumToPerm):
    for p in itertools.product(alphabet, repeat=NumToPerm):
        print "".join(p)

main()

Usage:

python x.py 3 | split -d --additional-suffix=.txt -l 50000 - wl
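
Where the split command is not available, the same chunking step could be approximated in Python. This is only a sketch under assumptions: it targets Python 3, `split_lines` is a hypothetical helper, and the two-digit numeric suffix is loosely modeled on the names the command above produces, not a faithful reimplementation of split.

```python
import itertools


def split_lines(lines, prefix, lines_per_file=50000, suffix=".txt"):
    """Write an iterable of newline-terminated lines to prefix00.txt,
    prefix01.txt, ... with at most lines_per_file lines in each.
    Returns the list of paths written."""
    lines = iter(lines)
    paths = []
    for index in itertools.count():
        # Pull the next chunk; an empty chunk means the input is exhausted.
        chunk = list(itertools.islice(lines, lines_per_file))
        if not chunk:
            break
        path = "%s%02d%s" % (prefix, index, suffix)
        with open(path, "w") as out:
            out.writelines(chunk)
        paths.append(path)
    return paths
```

Reading from a pipe would then look like `split_lines(sys.stdin, "wl")`, after importing sys. Each chunk is held in memory before it is written, which is acceptable for tens of thousands of short lines.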
