Let’s take it step by step. First, read the file de_para and store each line (except the first one) in a list. I use with to open the file, because it ensures that the file is closed at the end of the block:
de_para = []
with open('de_para.txt', 'r') as de_para_arq:
    next(de_para_arq)  # skip the first line
    for linha in de_para_arq:
        de_para.append(linha.split())
# sort the list by the order field
de_para = sorted(de_para, key=lambda x: int(x[0]))
I use split to separate the line by spaces. The result is a list where the first element is the order column, the second is the "para" (to) column and the third is the "de" (from) column. Each line in the file becomes a list like this, and de_para is a list containing all these lists (one for each line in the file).
Then I sort the list de_para by its first column (using int to convert the string to a number), so that the elements end up in the correct order. I’m assuming the lines are not necessarily in order in the file; otherwise there would be no reason for that column to exist.
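The int conversion matters because the values read from the file are strings, and strings are compared character by character. A small illustration, using hypothetical rows shaped like the output of linha.split() (the "10" row and the OBSE column are made up only to show the effect):

```python
# Hypothetical rows as returned by linha.split(): [ordem, para, de]
rows = [
    ["10", "obs_emp", "OBSE"],  # hypothetical extra field, only to show the effect of "10"
    ["2", "ende_emp", "ENDE"],
    ["1", "nome_emp", "RAZA"],
]

# Sorting by the raw string compares character by character, so "10" comes before "2"
by_string = sorted(rows, key=lambda x: x[0])

# Converting to int compares numerically, which is what the order column means
by_int = sorted(rows, key=lambda x: int(x[0]))

print([r[0] for r in by_string])  # ['1', '10', '2']
print([r[0] for r in by_int])     # ['1', '2', '10']
```

With fewer than 10 rows both orders happen to coincide, which is why a missing int can go unnoticed until the file grows.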
Now let’s read the companies file:
empresas = []
with open('empresas.txt', 'r') as empresas_arq:
    headers = next(empresas_arq).split()  # header names
    for linha in empresas_arq:
        empresa = dict()
        for header, valor in zip(headers, linha.split()):
            empresa[header] = valor
        empresas.append(empresa)
First I call split on the first line to get the column names.
Then, for each line of the file, I iterate over the list headers and over the elements of the line at the same time (zip lets you traverse both lists in parallel, so each value is paired with the name of its column). This solution assumes that the file is well formed and that every line always has all the columns.
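To see why the "well formed" assumption matters: zip stops at the shorter of the two sequences, so a line with a missing value does not raise an error, it just produces a dictionary without that column. A minimal sketch, where bad_line and to_dict_checked are hypothetical additions showing one way to detect the problem:

```python
headers = "RAZA CIDA ENDE NCEP".split()

good_line = "empresa1 cidade1 rua1 cep1"
bad_line = "empresa2 cidade2 rua2"  # hypothetical malformed line: NCEP is missing


def to_dict(line):
    # zip pairs each header with the value in the same position
    return {header: valor for header, valor in zip(headers, line.split())}


print(to_dict(good_line))  # {'RAZA': 'empresa1', 'CIDA': 'cidade1', 'ENDE': 'rua1', 'NCEP': 'cep1'}
# zip stops at the shorter sequence, so the missing column is silently dropped
print(to_dict(bad_line))   # {'RAZA': 'empresa2', 'CIDA': 'cidade2', 'ENDE': 'rua2'}


def to_dict_checked(line):
    # hypothetical stricter version: reject lines whose field count differs from the header
    valores = line.split()
    if len(valores) != len(headers):
        raise ValueError(f"expected {len(headers)} fields, got {len(valores)}: {line!r}")
    return {header: valor for header, valor in zip(headers, valores)}
```

The silent version only fails later, with a KeyError when the missing column is looked up while writing; checking the length up front points at the offending line instead.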
In the end I have a list of companies, each one a dictionary that maps the column names to their respective values. For example, the first element of the list empresas will be:
{'RAZA': 'empresa1', 'CIDA': 'cidade1', 'ENDE': 'rua1', 'NCEP': 'cep1'}
That is, for each column name I have its value. Each line of the companies file becomes a dictionary like this, and the list empresas holds all these dictionaries (one for each line of the file).
Now just write the third file:
with open('empresas_validas.txt', 'w') as out:
    # write the headers
    out.write(' '.join(para for _, para, _ in de_para))
    out.write("\n")
    for emp in empresas:
        out.write(' '.join(emp[de] for _, _, de in de_para))
        out.write("\n")
First I write the headers, using the names in the list de_para. I use join to join the names, separating them with a space. To keep it short, I pass a generator expression to join (the same syntax as a list comprehension, just without the brackets), which is more succinct and pythonic.
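A generator expression and a list comprehension give join the same result; the generator just avoids building an intermediate list. A small comparison using hypothetical, already sorted de_para rows:

```python
# Hypothetical de_para rows, already sorted: [ordem, para, de]
de_para = [
    ["1", "nome_emp", "RAZA"],
    ["2", "ende_emp", "ENDE"],
]

# Generator expression passed directly to join (no extra brackets needed)
with_generator = ' '.join(para for _, para, _ in de_para)

# Equivalent list comprehension: builds the whole list first, then joins it
with_list = ' '.join([para for _, para, _ in de_para])

print(with_generator)  # nome_emp ende_emp
print(with_generator == with_list)  # True
```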
Then, for each company, I take the value of the "de" (from) column listed in de_para (since the list has been sorted, the fields are guaranteed to be written in the desired order). The resulting file will be:
nome_emp ende_emp cepe_emp cida_emp
empresa1 rua1 cep1 cidade1
empresa2 rua2 cep2 cidade2
empresa3 rua3 cep3 cidade3
empresa4 rua4 cep4 cidade4
empresa5 rua5 cep5 cidade5
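The whole process can be tried without touching the disk by passing io.StringIO objects in place of the real files, since they support next() and line iteration just like an open file. This is a sketch: the input contents below are hypothetical, reconstructed from the output above, and the header line "ordem para de" of de_para.txt is an assumption because the question does not show it. The function name converter is also hypothetical; the body is the same logic as in the answer.

```python
import io

# Hypothetical input files, reconstructed from the output shown above.
# de_para.txt is deliberately out of order to exercise the sort.
DE_PARA_TXT = """ordem para de
3 cepe_emp NCEP
1 nome_emp RAZA
4 cida_emp CIDA
2 ende_emp ENDE
"""

EMPRESAS_TXT = """RAZA CIDA ENDE NCEP
empresa1 cidade1 rua1 cep1
empresa2 cidade2 rua2 cep2
"""


def converter(de_para_arq, empresas_arq, out):
    next(de_para_arq)  # skip the header line
    de_para = sorted((linha.split() for linha in de_para_arq), key=lambda x: int(x[0]))

    headers = next(empresas_arq).split()  # header names
    empresas = [{h: v for h, v in zip(headers, linha.split())} for linha in empresas_arq]

    # write the headers, then one line per company in the order given by de_para
    out.write(' '.join(para for _, para, _ in de_para))
    out.write("\n")
    for emp in empresas:
        out.write(' '.join(emp[de] for _, _, de in de_para))
        out.write("\n")


out = io.StringIO()
converter(io.StringIO(DE_PARA_TXT), io.StringIO(EMPRESAS_TXT), out)
print(out.getvalue(), end="")
# nome_emp ende_emp cepe_emp cida_emp
# empresa1 rua1 cep1 cidade1
# empresa2 rua2 cep2 cidade2
```

Because converter only needs objects that behave like files, the same function works unchanged with the real files opened via with open(...).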
If you want, you can also replace the loops that read the files with list and dict comprehensions:
with open('de_para.txt', 'r') as de_para_arq:
    next(de_para_arq)  # skip the first line
    de_para = [linha.split() for linha in de_para_arq]
# sort the list by the order field
de_para = sorted(de_para, key=lambda x: int(x[0]))

with open('empresas.txt', 'r') as empresas_arq:
    headers = next(empresas_arq).split()  # header names
    empresas = [{header: valor for header, valor in zip(headers, linha.split())} for linha in empresas_arq]

with open('empresas_validas.txt', 'w') as out:
    # write the headers
    out.write(' '.join(para for _, para, _ in de_para))
    out.write("\n")
    out.write("\n".join(' '.join(emp[de] for _, _, de in de_para) for emp in empresas))
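The inner dict comprehension can be shortened even further: dict() accepts an iterable of (key, value) pairs, which is exactly what zip produces. A quick check that both forms build the same dictionary (the sample line is hypothetical):

```python
headers = ["RAZA", "CIDA", "ENDE", "NCEP"]
linha = "empresa1 cidade1 rua1 cep1\n"  # hypothetical line, with its trailing newline

# Dict comprehension, as in the code above
com_comprehension = {header: valor for header, valor in zip(headers, linha.split())}

# dict() built directly from the (header, valor) pairs produced by zip
com_dict_zip = dict(zip(headers, linha.split()))

print(com_comprehension == com_dict_zip)  # True
```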
Thank you @hkotsubo, it worked perfectly, just the way I needed. Congratulations on your dedication and on taking the time to explain the process in such detail.
– Cleber Nandi
I have a question about this line: out.write(' '.join(emp[de] for _, _, de in de_para)), which is in the part that writes the file. I need to validate or process the "de" (from) field before writing it. How would I do that in this case?
– Cleber Nandi
It depends on what you want to do. If you want to test a condition, you can do:
out.write(' '.join(emp[de] for _, _, de in de_para if condicao))
(the condition can be any expression that would be valid in an if; only the keys that satisfy it will be used), or else:
out.write(' '.join(emp.get(de, valor_default) for _, _, de in de_para))
In that case, emp.get(de, valor_default) returns the default value if the key de does not exist.
– hkotsubo
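The two options behave differently when a field is missing, which is easy to see with a company that lacks one column. In this sketch the data, the condition (de in emp) and the placeholder "NULL" are all hypothetical:

```python
# Hypothetical data: this company has no NCEP (postal code) value
emp = {'RAZA': 'empresa1', 'CIDA': 'cidade1', 'ENDE': 'rua1'}
de_para = [
    ["1", "nome_emp", "RAZA"],
    ["2", "ende_emp", "ENDE"],
    ["3", "cepe_emp", "NCEP"],
    ["4", "cida_emp", "CIDA"],
]

# Option 1: filter with a condition (here, hypothetically, "the key exists").
# The missing field is skipped, so the columns after it shift to the left.
filtrado = ' '.join(emp[de] for _, _, de in de_para if de in emp)

# Option 2: default value. Every column is kept; missing ones get the default.
valor_default = "NULL"  # hypothetical placeholder
com_default = ' '.join(emp.get(de, valor_default) for _, _, de in de_para)

print(filtrado)     # empresa1 rua1 cidade1
print(com_default)  # empresa1 rua1 NULL cidade1
```

With the filter, that line ends up with fewer values than the header, so cidade1 lands under cepe_emp; with get, the line stays aligned with the header.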
I tried to figure it out on my own, but I couldn’t. Could you help me understand the line out.write(' '.join(emp[de] for _, _, de in de_para))? Specifically the "for _, _, de in de_para" part: I understood the "de", but not why there are two underscores. Another thing I need to do, which I’ve tried and gotten tangled up in, is to change that line to add validators before writing the line.
– Cleber Nandi
I tried this, but it didn’t work: for _, _, Competitor in geempre_list: field = emp.get(Competitor, "NULL") empresa_linha_list.append(field) # out.write("\n".join(emp.get(Campo_competitor, "NULL") for _, _, Campo_competitor in geempre_list)) out.write("\t".join(line for line in empresa_linha_list)) out.write("\n") Thanks.
– Cleber Nandi
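One way to add validation is to pass every value through a function before joining, building a new list for each company (reusing a single list across companies would accumulate values from previous lines). This is only a sketch: validar_campo, its rules (empty values become "NULL", the postal code keeps only digits) and escrever_empresas are hypothetical, since the real validation rules are not given in the question.

```python
import io


def validar_campo(de, valor):
    # Hypothetical rules: strip spaces, turn empty values into "NULL"
    valor = valor.strip()
    if not valor:
        return "NULL"
    if de == "NCEP":
        # hypothetical rule: keep only the digits of the postal code
        digitos = ''.join(c for c in valor if c.isdigit())
        return digitos or "NULL"
    return valor


def escrever_empresas(out, empresas, de_para):
    out.write(' '.join(para for _, para, _ in de_para))
    out.write("\n")
    for emp in empresas:
        # a fresh list per company, each value validated before being written
        campos = [validar_campo(de, emp.get(de, "")) for _, _, de in de_para]
        out.write(' '.join(campos))
        out.write("\n")


# Usage with hypothetical data; the second company has no NCEP
de_para = [["1", "nome_emp", "RAZA"], ["2", "cepe_emp", "NCEP"]]
empresas = [{'RAZA': 'empresa1', 'NCEP': '12345-678'}, {'RAZA': 'empresa2'}]
out = io.StringIO()
escrever_empresas(out, empresas, de_para)
print(out.getvalue(), end="")
# nome_emp cepe_emp
# empresa1 12345678
# empresa2 NULL
```

Keeping the rules in a separate function means the writing loop does not change when the validation does.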
Each element of de_para is a list of 3 elements, so
for a, b, c in de_para
already assigns a, b and c to the elements of each list. _ is generally used to indicate that a value doesn’t matter and only the others will be used (it’s a language convention: any name would work, but _ signals that the value will not be used).
– hkotsubo
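The unpacking can be seen side by side with and without the underscores (hypothetical rows in the same [ordem, para, de] shape). Note that _ is an ordinary variable name: in for _, _, de, it is simply assigned twice and the second assignment wins.

```python
de_para = [
    ["1", "nome_emp", "RAZA"],
    ["2", "ende_emp", "ENDE"],
]

# Unpacking with three names: each inner list is split into its 3 elements
for ordem, para, de in de_para:
    print(ordem, para, de)  # 1 nome_emp RAZA, then 2 ende_emp ENDE

# Same unpacking, but "_" signals that ordem and para are not used
colunas_origem = [de for _, _, de in de_para]
print(colunas_origem)  # ['RAZA', 'ENDE']
```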
@Clebernandi Anyway, I don’t understand exactly what problem you’re having. I suggest you ask another question about the specific problem. That’s how the site works: one question for each specific problem. In this case, as far as I understand, it’s already something different from the problem reported here (related, but different). Another question is even better because there’s more space than in the comments, and you can format the code (especially in Python, where indentation makes all the difference in understanding the code).
– hkotsubo
Thanks. I’ll ask another question.
– Cleber Nandi
@Clebernandi Another advantage of asking another question is that it is visible on the main page to all users, which increases the chances of getting an answer. Here in the comments, fewer people will see it. Don’t forget to include all the necessary context (not everyone will know or remember that you asked this question here, so write the new question so that it is "independent" of this one).
– hkotsubo
I’ll do that. Thank you very much.
– Cleber Nandi