How to read a text file and generate a dictionary?

Write the function simbolo() that accepts a string (the name of a file: Nasdaq.txt) as input.

The file contains company names and stock symbols. Each company name occupies one line, and its stock symbol is on the next line. After that comes a line with another company name, and so on. Your program must read the file and store each company name and its stock symbol in a dictionary.

File (excerpt):

ACTIVISION INC      
ATVI
ADOBE SYS INC
ADBE
ALTERA CORP     
ALTR
AMAZON  
AMZN
AMERICAN POWER CONVER CORP  
APCC
AMGEN   
AMGN
APOLLO GROUP-A  
APOL

full file: https://easyupload.io/x6v37a

What I did:

def simbolo(arquivo):
    empresas = {}
    with open("nasdaq.txt") as f:
     texto = f.read()
     for i in range(len(texto.split("\n")) - 1) :
        empresas.setdefault(texto.split("\n")[i],texto.split("\n")[i+1])
    return empresas


empresas = simbolo("nasdaq.txt")
print(empresas)

The problem is that the resulting dictionary is wrong and contains several "\t" characters:

{'ACTIVISION INC  \t': 'ATVI', 'ATVI': 'ADOBE SYS INC', 'ADOBE SYS INC': 'ADBE', 'ADBE': 'ALTERA CORP \t', 'ALTERA CORP \t': 'ALTR', 'ALTR': 'AMAZON \t', 'AMAZON \t': 'AMZN', 'AMZN': 'AMERICAN POWER CONVER CORP \t', 'AMERICAN POWER CONVER CORP \t': 'APCC', 'APCC': 'AMGEN \t', 'AMGEN \t': 'AMGN', 'AMGN': 'APOLLO GROUP-A \t', 'APOLLO GROUP-A \t': 'APOL', 'APOL': 'APPLE}

Any ideas on how to fix it? What am I doing wrong?

3 answers

1


Besides the \t character, the text file also contains extra spaces. A combination of the string methods strip() and replace() takes care of both. Before showing the code, here is a brief example of how they work. Suppose you have the following string:

sentence = 'Todos os animais são iguais, \tmas uns são mais que os outros     '

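As another user pointed out, \t is the escape sequence for the tab character, the one produced by the Tab key. It is a single character, but when printed it is usually displayed as a run of blank space whose width depends on the terminal's tab stops rather than being a fixed 4 spaces. To see this, print the string:

print(sentence)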

Output:

Todos os animais são iguais,     mas uns são mais que os outros
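To confirm that \t is one character rather than several spaces, it can help to compare len() and repr() with what gets displayed. This is only a small sketch; the string `exemplo` and the other variable names are made up for illustration.

```python
# Hypothetical sample string: a single tab between two letters
exemplo = 'a\tb'

# The tab counts as one character, so the length is 3
tamanho = len(exemplo)

# repr() shows the escape sequence instead of rendering the tab
representacao = repr(exemplo)

# expandtabs(4) replaces the tab with spaces up to the next multiple of 4 columns
expandido = exemplo.expandtabs(4)

print(tamanho)
print(representacao)
print(expandido)
```

Printing repr() of each line is also a convenient way to spot stray tabs and trailing spaces when reading a file.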

Note that, in addition to the \t character, the string ends with extra spaces typed with the space bar. To remove whitespace from the beginning and end of the string, we can use the strip() method, which leaves the tab in the middle of the string untouched:

sentence = sentence.strip()
print(sentence)

Output:

Todos os animais são iguais,     mas uns são mais que os outros
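One detail worth illustrating: with no arguments, strip() removes any whitespace at the ends of a string, tabs included, but never touches whitespace in the middle. The sample strings below are assumptions modelled on the keys shown in the question's output.

```python
# Sample line resembling the keys in the question's output
linha = '  AMAZON \t'

# strip() removes spaces and tabs from both ends
sem_bordas = linha.strip()

# rstrip() removes them only from the right-hand side
sem_direita = linha.rstrip()

# Whitespace in the middle of the string is left as it is
interno = 'ADOBE \t SYS'.strip()

print(repr(sem_bordas))
print(repr(sem_direita))
print(repr(interno))
```

For lines like the ones in the question, where the tab sits at the end of the company name, strip() alone would therefore already be enough.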

To remove the \t character, we can use the replace() method:

sentence = sentence.replace('\t','')
print(sentence)

Output:

Todos os animais são iguais, mas uns são mais que os outros

Since strip() also returns a string, the two methods can be chained:

sentence.strip().replace('\t', '')

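Applying these concepts to your function, we have:

def simbolo(arquivo):
    empresas = {}
    with open(arquivo) as f:  # open the file passed in, not a hard-coded name
        linhas = f.read().split("\n")  # split once instead of on every iteration
    # Step by 2 so that a symbol line is never used as a key
    for i in range(0, len(linhas) - 1, 2):
        nome = linhas[i].strip().replace('\t', '')
        simbolo_acao = linhas[i + 1].strip().replace('\t', '')
        empresas.setdefault(nome, simbolo_acao)
    return empresas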


empresas = simbolo("nasdaq.txt")
print(empresas)
  • @Luke: The text file is at https://easyupload.io/x6v37a

  • OK, I checked and you're right. I'll edit my answer to include the removal of '\t'.

  • @Lucas: could you please explain in more detail?

1

In Python, the escape sequence \t represents the tab character. To replace tab characters with spaces, use:

string.replace("\t", " ")

In your code:

def simbolo(arquivo):
    empresas = {}
    with open(arquivo) as f:  # open the file passed as the argument
        # once the file has been read, replace every tab with a space
        linhas = f.read().replace("\t", " ").split("\n")
    # step by 2 so each company name is paired only with the symbol after it
    for i in range(0, len(linhas) - 1, 2):
        empresas.setdefault(linhas[i], linhas[i + 1])
    return empresas


empresas = simbolo("nasdaq.txt")
print(empresas)

See the code working on Repl.it
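Replacing each tab with a space still leaves trailing spaces in the keys. One possible way to normalize all whitespace in a single step is to split on whitespace and join the pieces with single spaces. This is a sketch of an alternative, not the approach used in the answer above; `normalizar` is a hypothetical helper name.

```python
def normalizar(linha):
    # split() with no arguments splits on any run of whitespace (spaces, tabs,
    # newlines) and discards empty pieces; join() puts single spaces back
    return " ".join(linha.split())


# Sample lines resembling those in the question's file
print(repr(normalizar("ACTIVISION INC  \t")))
print(repr(normalizar("AMERICAN  POWER\tCONVER CORP \t")))
```

Unlike replace('\t', ''), this keeps words separated when a tab happens to sit between them.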

1

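def simbolo(arquivo):
    empresas = {}
    with open(arquivo) as f:  # open the file passed in, not a hard-coded name
        texto = f.read()

    # Replace each unwanted character (here only the tab) with a space
    caracteres = ["\t"]
    for c in caracteres:
        texto = texto.replace(c, " ")
    # Split into lines, trim each one, and drop lines that are empty after trimming
    texto_novo = [linha.strip() for linha in texto.split("\n") if linha.strip()]
    # Walk the list two lines at a time: company name, then its symbol
    i = 0
    while i < len(texto_novo) - 1:
        empresas[texto_novo[i]] = texto_novo[i + 1]
        i += 2
    return empresas

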
empresas = simbolo("nasdaq.txt")
print(empresas)
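The question's output also contains entries such as 'ATVI': 'ADOBE SYS INC', because the original loop pairs every line with the one after it instead of advancing two lines at a time. Pairing the even-indexed lines with the odd-indexed ones can also be written with slicing and zip(). The sketch below assumes the file strictly alternates name and symbol; `pares_para_dicionario` is a hypothetical name, and an in-memory io.StringIO stands in for the real file.

```python
import io


def pares_para_dicionario(linhas):
    # Trim each line and drop the ones that are empty after trimming
    limpas = [linha.strip() for linha in linhas if linha.strip()]
    # Even positions hold company names, odd positions hold their symbols
    return dict(zip(limpas[::2], limpas[1::2]))


# In-memory stand-in for the file, using lines from the question's excerpt
amostra = io.StringIO(
    "ACTIVISION INC  \t\nATVI\nADOBE SYS INC\nADBE\nAMAZON \t\nAMZN\n"
)
resultado = pares_para_dicionario(amostra)
print(resultado)
```

With a real file, the same function could be called as pares_para_dicionario(f) inside a with open(arquivo) as f: block, since a file object yields its lines when iterated.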
