How to iterate with Python 3 dictionaries?

Question

Asked 6 years ago

I have two files and need to generate a third from them.

The first file is called "de_para":

de_para.txt

Inside this file I have the following sample content:

Ordem   Campo_para  Campo_de
1   nome_emp    RAZA
2   ende_emp    ENDE
3   cepe_emp    NCEP
4   cida_emp    CIDA

"Ordem" is the order in which the data will be written on each line of the third file.
"Campo_para" is the name the column will have in the third file.
"Campo_de" is the name of the column to look up among the keys of the second file, "empresas.txt".

The second file, "empresas.txt", has the following layout:

RAZA    ENDE    NCEP    CIDA 
empresa1    rua1    cep1    cidade1  
empresa2    rua2    cep2    cidade2 
empresa3    rua3    cep3    cidade3 
empresa4    rua4    cep4    cidade4 
empresa5    rua5    cep5    cidade5

How can I iterate over this data?

With this code I would create the list with the columns of the first file:

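hash_geempre = []  # will hold the "Campo_de" names from the first file
# read_geempre is assumed to yield one dict per row of de_para
for dict_col_geempre in read_geempre:
    hash_geempre.append(dict_col_geempre.get("Campo_de"))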

What I can't work out is the following check:

I need to test whether a column of the second file, such as "RAZA", exists in hash_geempre. If it does, I take the value stored under that key in the second file, "empresas.txt", and so on for the other columns, and then write the values to a third file named "empresas_validas.txt".

I could do it differently, setting a variable for each column of the second file, "empresas.txt", but I think a "de_para" mapping file is the better approach.

How can I do that?

1 answer

by hkotsubo • **55,826** points · Answer 1 · 2019-08-01T01:04:36+00:00

Let's take it step by step. First read the de_para file and store each line (except the first) in a list. I open the file using with, because it ensures the file is closed at the end of the block:

de_para = []
with open('de_para.txt', 'r') as de_para_arq:
    next(de_para_arq) # skip the first line
    for linha in de_para_arq:
        de_para.append(linha.split())
# sort the list by the order field
de_para = sorted(de_para, key=lambda x: int(x[0]))

I use split to break each line on whitespace. The result is a list whose first element is the "Ordem" column, the second is "Campo_para" and the third is "Campo_de". Each line of the file becomes one such list, and de_para is a list containing all of them (one per line of the file).
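As a small illustration of what split returns, here is one sample line taken from the question's de_para file. The variable names campos, ordem, campo_para and campo_de are only for this example.

```python
# Sample line from the de_para file shown in the question
linha = "1   nome_emp    RAZA\n"

# split() with no arguments splits on any run of whitespace
# (spaces, tabs, the trailing newline) and never yields empty strings
campos = linha.split()
print(campos)  # ['1', 'nome_emp', 'RAZA']

# The three elements can be unpacked into named variables
ordem, campo_para, campo_de = campos
```

Note that every element is still a string, including the order number.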

Then I sort de_para by the first column (using int to convert the string to a number) to guarantee the elements are in the correct order. I am assuming the lines of the file are not necessarily in order; otherwise it would make no sense for that column to exist.
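The int conversion matters once the order column has more than one digit. A minimal sketch with made-up rows (the order value "10" is hypothetical, it does not appear in the question's file):

```python
# Hypothetical rows, deliberately out of order, one with a two-digit order
de_para = [
    ["10", "cida_emp", "CIDA"],
    ["2", "ende_emp", "ENDE"],
    ["1", "nome_emp", "RAZA"],
]

# Sorting on the raw string compares character by character, so "10" < "2"
como_texto = sorted(de_para, key=lambda x: x[0])

# Converting to int gives the numeric order
como_numero = sorted(de_para, key=lambda x: int(x[0]))

print([linha[0] for linha in como_texto])   # ['1', '10', '2']
print([linha[0] for linha in como_numero])  # ['1', '2', '10']
```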

Now let's read the companies file:

empresas = []
with open('empresas.txt', 'r') as empresas_arq:
    headers = next(empresas_arq).split() # header names
    for linha in empresas_arq:
        empresa = dict()
        for header, valor in zip(headers, linha.split()):
            empresa[header] = valor
        empresas.append(empresa)

First I call split on the first line to get the column names.

Then, for each remaining line of the file, I iterate over the list of headers and the elements of the line together (zip lets you walk through both lists at the same time, so each value is paired with the name of its column). This solution assumes the file is well formed and every line has all the columns.
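The well-formed assumption is worth spelling out, because zip stops at the shorter sequence without raising an error. The sketch below shows that behaviour on a hypothetical row with a missing column, plus one possible guard; ler_linha is an illustrative helper name, not part of the original code.

```python
headers = ["RAZA", "ENDE", "NCEP", "CIDA"]

# A well-formed row: every header gets a value
completa = dict(zip(headers, "empresa1    rua1    cep1    cidade1".split()))

# A hypothetical malformed row with the last column missing:
# zip stops at the shorter sequence, so 'CIDA' is silently absent
incompleta = dict(zip(headers, "empresa2    rua2    cep2".split()))
print("CIDA" in incompleta)  # False


def ler_linha(headers, linha):
    """Build the dict for one row, rejecting rows with the wrong column count."""
    valores = linha.split()
    if len(valores) != len(headers):
        raise ValueError(f"expected {len(headers)} columns, got {len(valores)}")
    return dict(zip(headers, valores))
```

On Python 3.10 or later, zip(headers, valores, strict=True) raises ValueError by itself when the lengths differ.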

At the end I have a list of companies, each company being a dictionary that maps the column names to their respective values. For example, the first element of the list of companies will be:

{'RAZA': 'empresa1', 'ENDE': 'rua1', 'NCEP': 'cep1', 'CIDA': 'cidade1'}

So for each company I have every column name and its value. Each line of the companies file becomes a dictionary like this, and the list empresas holds all of them (one per line of the file).

Now just write the third file:

with open('empresas_validas.txt', 'w') as out:
    # write the headers
    out.write(' '.join(para for _, para, _ in de_para))
    out.write("\n")
    for emp in empresas:
        out.write(' '.join(emp[de] for _, _, de in de_para))
        out.write("\n")

First I write the headers, using the names in the de_para list. I use join to concatenate the names, separated by a space. To keep it short, I use comprehension syntax (here a generator expression passed directly to join), which is more succinct and Pythonic.

Then, for each company, I take the value of the "Campo_de" column listed in de_para (since the list has been sorted, the fields are guaranteed to be written in the desired order). The result will be the file:

nome_emp ende_emp cepe_emp cida_emp 
empresa1 rua1 cep1 cidade1 
empresa2 rua2 cep2 cidade2 
empresa3 rua3 cep3 cidade3 
empresa4 rua4 cep4 cidade4 
empresa5 rua5 cep5 cidade5
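The question also asks how to check that a "Campo_de" column really exists in the second file. With emp[de], a name missing from the headers raises KeyError. One possible sketch, assuming that unknown columns should simply be skipped and reported (that policy is my assumption, and the "FONE" mapping is a made-up entry for illustration):

```python
# Sample data in the same shapes the code above builds
de_para = [
    ["1", "nome_emp", "RAZA"],
    ["2", "ende_emp", "ENDE"],
    ["3", "fone_emp", "FONE"],  # hypothetical: FONE is not in the companies file
]
headers = ["RAZA", "ENDE", "NCEP", "CIDA"]
empresas = [{"RAZA": "empresa1", "ENDE": "rua1", "NCEP": "cep1", "CIDA": "cidade1"}]

# Keep only the mappings whose Campo_de exists among the headers
validos = [(para, de) for _, para, de in de_para if de in headers]
ausentes = [de for _, _, de in de_para if de not in headers]

# Build the output lines: header line first, then one line per company
linhas = [" ".join(para for para, _ in validos)]
for emp in empresas:
    linhas.append(" ".join(emp[de] for _, de in validos))

print("\n".join(linhas))
print("missing columns:", ausentes)
```

If a missing column should abort the run instead, raising an error when ausentes is not empty would be the stricter alternative.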

If you want, you can also replace the loops that read the files with list and dict comprehensions:

with open('de_para.txt', 'r') as de_para_arq:
    next(de_para_arq) # skip the first line
    de_para = [ linha.split() for linha in de_para_arq ]
# sort the list by the order field
de_para = sorted(de_para, key=lambda x: int(x[0]))

with open('empresas.txt', 'r') as empresas_arq:
    headers = next(empresas_arq).split() # header names
    empresas = [ { header: valor for header, valor in zip(headers, linha.split()) } for linha in empresas_arq ]

with open('empresas_validas.txt', 'w') as out:
    # write the headers
    out.write(' '.join(para for _, para, _ in de_para))
    out.write("\n")
    # write all rows in one call (no newline after the last row)
    out.write("\n".join( ' '.join(emp[de] for _, _, de in de_para) for emp in empresas))
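To try the approach without creating any files, the same steps can be run against in-memory text. This is only a sketch: converter is a hypothetical helper name, the sample data is a shortened copy of the question's files with the de_para rows deliberately out of order, and io.StringIO stands in for the real files.

```python
import io

DE_PARA = """Ordem   Campo_para  Campo_de
2   ende_emp    ENDE
1   nome_emp    RAZA
"""

EMPRESAS = """RAZA    ENDE    NCEP    CIDA
empresa1    rua1    cep1    cidade1
empresa2    rua2    cep2    cidade2
"""


def converter(de_para_arq, empresas_arq, out):
    """Hypothetical helper: the same steps, applied to any file-like objects."""
    next(de_para_arq)  # skip the header line of the mapping file
    de_para = sorted((linha.split() for linha in de_para_arq),
                     key=lambda x: int(x[0]))
    headers = next(empresas_arq).split()
    empresas = [dict(zip(headers, linha.split())) for linha in empresas_arq]
    # header line, then one line per company, each ending in a newline
    out.write(' '.join(para for _, para, _ in de_para) + "\n")
    for emp in empresas:
        out.write(' '.join(emp[de] for _, _, de in de_para) + "\n")


saida = io.StringIO()
converter(io.StringIO(DE_PARA), io.StringIO(EMPRESAS), saida)
print(saida.getvalue(), end="")
```

Because the function only needs objects that can be iterated line by line and written to, the handles returned by open() can be passed in the same way.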