How to iterate with Python 3 dictionaries?

I have two files and need to generate a third from them.

The first file is called "de_para":

  • de_para.txt

Inside this file I have the following sample content:

Ordem   Campo_para  Campo_de
1   nome_emp    RAZA
2   ende_emp    ENDE
3   cepe_emp    NCEP
4   cida_emp    CIDA
  • "Ordem" is the order in which the data will be written on each line of the third file;
  • "Campo_para" is the name the column will have in the third file;
  • "Campo_de" is the name of the column to look up among the keys of the second file, "empresas.txt".

The second file, "empresas.txt", has the following layout:

RAZA    ENDE    NCEP    CIDA 
empresa1    rua1    cep1    cidade1  
empresa2    rua2    cep2    cidade2 
empresa3    rua3    cep3    cidade3 
empresa4    rua4    cep4    cidade4 
empresa5    rua5    cep5    cidade5

How can I iterate over this data?

With this code I would create the list with the columns of the first file:

for dict_col_geempre in read_geempre:
    hash_geempre.append(dict_col_geempre.get("Campo_de"))

I can’t get the following check to work:

I need to check whether the column "RAZA" of the second file exists in hash_geempre; if it does, take the value of that key from the second file "empresas.txt", and so on for the other columns, then write the result to a third file named "empresas_validas.txt".

I could do it differently, setting a variable for each column of the second file "empresas.txt", but I thought using a "de_para" file would be better.

How can I do that?

1 answer


Let’s take it step by step. First read the de_para file and store each line (except the first) in a list (using with to open the file, since it ensures the file is closed at the end of the block):

de_para = []
with open('de_para.txt', 'r') as de_para_arq:
    next(de_para_arq) # skip the first line
    for linha in de_para_arq:
        de_para.append(linha.split())
# sort the list by the Ordem field
de_para = sorted(de_para, key=lambda x: int(x[0]))

I use split to separate each line on whitespace. The result is a list whose first element is the Ordem column, the second is Campo_para, and the third is Campo_de. Each line in the file becomes one such list, and de_para is a list containing all of them (one per line of the file).

Then I sorted de_para by the first column (using int to convert the string to a number) to ensure the elements are in the correct order. I am assuming the lines will not necessarily be in order in the file; otherwise it would make no sense for that column to exist.
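To see why the sort key converts the order to int, here is a small sketch that uses an in-memory list of lines in place of the file. The row numbered 10 and the shuffled order are made-up sample data, not taken from the question:

```python
# Hypothetical sample lines, deliberately out of order, standing in for de_para.txt
linhas = [
    "10   cida_emp    CIDA",
    "2   ende_emp    ENDE",
    "1   nome_emp    RAZA",
]

# split() with no arguments splits on any run of whitespace (spaces or tabs)
de_para = [linha.split() for linha in linhas]

# Sorting the raw strings compares character by character, so "10" sorts before "2"
ordem_texto = [campos[0] for campos in sorted(de_para, key=lambda x: x[0])]

# Converting to int gives the numeric order
ordem_numerica = [campos[0] for campos in sorted(de_para, key=lambda x: int(x[0]))]

print(ordem_texto)     # ['1', '10', '2']
print(ordem_numerica)  # ['1', '2', '10']
```

With fewer than ten rows both keys give the same result, so the int conversion only starts to matter once the mapping file grows.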


Now let’s read the companies file:

empresas = []
with open('empresas.txt', 'r') as empresas_arq:
    headers = next(empresas_arq).split() # column names
    for linha in empresas_arq:
        empresa = dict()
        for header, valor in zip(headers, linha.split()):
            empresa[header] = valor
        empresas.append(empresa)

First I call split on the first line to get the column names.

Then, for each remaining line of the file, I go through the list of headers and the elements of the line together (zip lets you iterate over both lists at the same time, so each value is paired with the name of its column). This solution assumes that the file is well formed and that every line has all the columns.

At the end I have a list of companies, each one a dictionary that maps the column names to their respective values. For example, the first element of the list will be:

{'RAZA': 'empresa1', 'CIDA': 'cidade1', 'ENDE': 'rua1', 'NCEP': 'cep1'}

So I have the name of each column and its value. Each line of the companies file becomes a dictionary like this, and the list empresas holds all of them (one per line of the file).
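As a small illustration of the zip step, this sketch builds the same kind of dictionaries from in-memory lines instead of a file. It uses dict(zip(...)), which should be equivalent to the inner loop shown earlier; the variable linhas is only a stand-in for the file contents:

```python
# Stand-in for the contents of empresas.txt (first two data rows from the question)
linhas = [
    "RAZA    ENDE    NCEP    CIDA",
    "empresa1    rua1    cep1    cidade1",
    "empresa2    rua2    cep2    cidade2",
]

headers = linhas[0].split()  # column names from the first line

# zip pairs each header with the value in the same position;
# dict() turns those pairs into a column name -> value mapping
empresas = [dict(zip(headers, linha.split())) for linha in linhas[1:]]

print(empresas[0]["RAZA"])  # empresa1
print(empresas[1]["CIDA"])  # cidade2
```

Note that zip stops at the shorter of its inputs, so a line with missing columns would silently produce a dictionary without the last keys, and a value containing a space would be split into two fields. That is why the well-formed-file assumption matters.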


Now just write the third file:

with open('empresas_validas.txt', 'w') as out:
    # write the headers
    out.write(' '.join(para for _, para, _ in de_para))
    out.write("\n")
    for emp in empresas:
        out.write(' '.join(emp[de] for _, _, de in de_para))
        out.write("\n")

First I write the headers, using the names in the de_para list. I use join to concatenate the names, separated by a space. To make this easier, I use comprehension syntax (here, a generator expression), which is much more succinct and Pythonic.

Then, for each company, I take the value of each Campo_de column listed in de_para (since the list has been sorted, the fields are guaranteed to be written in the desired order). The result will be the file:

nome_emp ende_emp cepe_emp cida_emp 
empresa1 rua1 cep1 cidade1 
empresa2 rua2 cep2 cidade2 
empresa3 rua3 cep3 cidade3 
empresa4 rua4 cep4 cidade4 
empresa5 rua5 cep5 cidade5 
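The writing step can be tried without touching the disk. The sketch below uses io.StringIO as a stand-in for the output file and small hard-coded lists in place of the data read earlier, so the values are sample data only:

```python
import io

# Already sorted [order, target name, source name] triples, as produced earlier
de_para = [
    ["1", "nome_emp", "RAZA"],
    ["2", "ende_emp", "ENDE"],
]
empresas = [
    {"RAZA": "empresa1", "ENDE": "rua1"},
    {"RAZA": "empresa2", "ENDE": "rua2"},
]

out = io.StringIO()  # in-memory text buffer with the same write() method as a file
# Header line: unpack each triple and keep only the middle element
out.write(" ".join(para for _, para, _ in de_para))
out.write("\n")
for emp in empresas:
    # Data line: look up each source column, in the order given by de_para
    out.write(" ".join(emp[de] for _, _, de in de_para))
    out.write("\n")

resultado = out.getvalue()
print(resultado)
```

Because the loop over de_para drives both the header line and every data line, changing the mapping file is enough to reorder or rename the output columns.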

If you want, you can also replace the loops that read the files with list and dict comprehensions:

with open('de_para.txt', 'r') as de_para_arq:
    next(de_para_arq) # skip the first line
    de_para = [ linha.split() for linha in de_para_arq ]
# sort the list by the Ordem field
de_para = sorted(de_para, key=lambda x: int(x[0]))

with open('empresas.txt', 'r') as empresas_arq:
    headers = next(empresas_arq).split() # column names
    empresas = [ { header: valor for header, valor in zip(headers, linha.split()) } for linha in empresas_arq ]

with open('empresas_validas.txt', 'w') as out:
    # write the headers
    out.write(' '.join(para for _, para, _ in de_para))
    out.write("\n")
    out.write("\n".join( ' '.join(emp[de] for _, _, de in de_para) for emp in empresas))
  • Thank you @hkotsubo, it worked perfectly, just the way I need it. Congratulations on your dedication and on taking the time to explain the process in such detail.

  • I have a question about this line: out.write(' '.join(emp[de] for _, _, de in de_para)), which is in the part that writes the file. I need to validate or process the "de" field before writing it. How would I do that in this case?

  • It depends on what you want to do. If you want to test a condition, you can use out.write(' '.join(emp[de] for _, _, de in de_para if condicao)) (the condition can be any expression that is valid in an if, and only the keys that satisfy it will be used). Alternatively, use out.write(' '.join(emp.get(de, valor_default) for _, _, de in de_para)); in that case, emp.get(de, valor_default) returns the default value if the key de does not exist.

  • I tried to figure it out on my own, but I couldn’t. Could you help me understand the line out.write(' '.join(emp[de] for _, _, de in de_para)), specifically the "for _, _, de in de_para" part? I understood the "de", but not the two underscores. Another thing I have tried and got tangled up in is changing that line so I can run validators before writing it.

  • I tried this, but it didn’t work: for _, _, Competitor in geempre_list: field = emp.get(Competitor, "NULL") empresa_linha_list.append(field) # out.write("\n".join(emp.get(Campo_competitor, "NULL") for _, _, Campo_competitor in geempre_list)) out.write("\t".join(line for line in empresa_linha_list)) out.write("\n") Thanks.

  • Each element of de_para is a list of 3 elements, so for a, b, c in de_para assigns the elements of each list to a, b and c. The name _ is generally used to indicate that a value does not matter and that only the others will be used (it is a convention of the language; it could be any name, but _ signals that the value will not be used).

  • @Clebernandi Anyway, I don’t understand exactly what problem you’re having. I suggest you post another question about the specific problem. That is how the site works: one question per specific problem. In this case, as far as I understand, it is already something different from the problem reported here (related, but different). Another question is also better because there is more space than in the comments, and you can format the code (especially in Python, where indentation makes all the difference in understanding the code).

  • Thanks. I’ll ask another question.

  • @Clebernandi Another advantage of asking another question is that it is visible on the main page to all users, which increases your chances of getting an answer. Here in the comments, fewer people will see it. Don’t forget to include all the necessary context (not everyone will know or remember that you asked this question here, so write the other question so that it is independent of this one).

  • I’ll do that. Thank you very much.
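
The two suggestions from the comments (filtering with a condition, and using get with a default) can be sketched as follows. The helper name formata_linha, the "NULL" default, the rule that blank values count as missing, and the sample data are all assumptions made for illustration; they are not part of the original answer:

```python
def formata_linha(emp, de_para, padrao="NULL"):
    """Build one output line, replacing missing or blank fields with a default."""
    campos = []
    for _, _, de in de_para:  # the two underscores are the unused order and target name
        valor = emp.get(de, padrao)  # default when the key is absent
        if not valor.strip():        # assumption: treat blank values as missing too
            valor = padrao
        campos.append(valor)
    return " ".join(campos)


de_para = [["1", "nome_emp", "RAZA"], ["2", "cida_emp", "CIDA"]]
emp = {"RAZA": "empresa1"}  # hypothetical company without the CIDA column

print(formata_linha(emp, de_para))  # empresa1 NULL

# Alternative: skip missing columns entirely by filtering in the generator expression
print(" ".join(emp[de] for _, _, de in de_para if de in emp))  # empresa1
```

Moving the per-field logic into a function keeps the write call short and leaves room for further checks. Note that the filtering variant changes the number of columns in a line, which may or may not be acceptable for the output file.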
