I want to turn a text into a list of dictionaries

0

I'm writing a Twitter bot in Python that posts news about Latin America. To detect whether a news item is new, I save the items that have already been posted to a file (a .txt for now) so that I can compare them with the items returned by the next GET. This is the text file:

[{'pais': 'Argentina', 'titulo': 'Santistas veem vitória como resposta a críticos e exaltam Sampaoli', 'link': 'https://esportes.estadao.com.br/noticias/futebol,santistas-veem-vitoria-como-resposta-a-criticos-e-exaltam-sampaoli,70002696776'}, {'pais': 'Bolívia', 'titulo': 'Cobertura de água e esgoto no Brasil é pior que no Iraque', 'link': 'https://economia.estadao.com.br/noticias/geral,cobertura-de-agua-e-esgoto-no-brasil-e-pior-que-no-iraque,70002695633'}, {'pais': 'Brasil', 'titulo': 'Rodada dos Estaduais é marcada por homenagens à tragédia de Brumadinho', 'link': 'https://esportefera.com.br/noticias/futebol,rodada-dos-estaduais-e-marcada-por-homenagens-a-tragedia-de-brumadinho,70002696787'}, {'pais': 'Chile', 'titulo': '‘Não acredito na possibilidade de uma guerra civil na Venezuela’', 'link': 'https://internacional.estadao.com.br/noticias/geral,nao-acredito-na-possibilidade-de-uma-guerra-civil,70002695486'}]

I want to turn it into a list of dictionaries so I can compare the old links with the new ones. How can I do this conversion in Python?

5 answers

2

One thing to keep in mind about JSON is that strings must use double quotes ", not single quotes '. The contents of your file are therefore not valid JSON and have to be fixed.

From:

[{'pais': 'Argentina', 'titulo': 'Santistas veem vitória como resposta a críticos e exaltam Sampaoli', 'link': 'https://esportes.estadao.com.br/noticias/futebol,santistas-veem-vitoria-como-resposta-a-criticos-e-exaltam-sampaoli,70002696776'}, {'pais': 'Bolívia', 'titulo': 'Cobertura de água e esgoto no Brasil é pior que no Iraque', 'link': 'https://economia.estadao.com.br/noticias/geral,cobertura-de-agua-e-esgoto-no-brasil-e-pior-que-no-iraque,70002695633'}, {'pais': 'Brasil', 'titulo': 'Rodada dos Estaduais é marcada por homenagens à tragédia de Brumadinho', 'link': 'https://esportefera.com.br/noticias/futebol,rodada-dos-estaduais-e-marcada-por-homenagens-a-tragedia-de-brumadinho,70002696787'}, {'pais': 'Chile', 'titulo': '‘Não acredito na possibilidade de uma guerra civil na Venezuela’', 'link': 'https://internacional.estadao.com.br/noticias/geral,nao-acredito-na-possibilidade-de-uma-guerra-civil,70002695486'}]

To:

[{"pais": "Argentina", "titulo": "Santistas veem vitória como resposta a críticos e exaltam Sampaoli", "link": "https://esportes.estadao.com.br/noticias/futebol,santistas-veem-vitoria-como-resposta-a-criticos-e-exaltam-sampaoli,70002696776"}, {"pais": "Bolívia", "titulo": "Cobertura de água e esgoto no Brasil é pior que no Iraque", "link": "https://economia.estadao.com.br/noticias/geral,cobertura-de-agua-e-esgoto-no-brasil-e-pior-que-no-iraque,70002695633"}, {"pais": "Brasil", "titulo": "Rodada dos Estaduais é marcada por homenagens à tragédia de Brumadinho", "link": "https://esportefera.com.br/noticias/futebol,rodada-dos-estaduais-e-marcada-por-homenagens-a-tragedia-de-brumadinho,70002696787"}, {"pais": "Chile", "titulo": "‘Não acredito na possibilidade de uma guerra civil na Venezuela’", "link": "https://internacional.estadao.com.br/noticias/geral,nao-acredito-na-possibilidade-de-uma-guerra-civil,70002695486"}]

Otherwise you will get an error whenever you try to parse JSON text that uses single quotes '.
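To make the failure concrete, here is a minimal sketch. The two short strings are made-up samples in the style of the file, not its real contents.

```python
import json

# Hypothetical shortened samples: same data, different quote style.
single_quoted = "[{'pais': 'Argentina'}]"
double_quoted = '[{"pais": "Argentina"}]'

try:
    json.loads(single_quoted)
    parsed_ok = True
except json.JSONDecodeError as exc:
    # json only accepts double-quoted strings and property names.
    parsed_ok = False
    error_message = str(exc)

# The double-quoted version parses into a list of dictionaries.
data = json.loads(double_quoted)
print(parsed_ok, data[0]["pais"])
```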

Once the JSON is fixed, you can read your file like this:

import json

with open('arq.txt') as f:    
    myjson = json.load(f)    
    print(myjson[0]['pais'])

Output:

Argentina

Use the with statement to work with files; it is a shorter form of try-finally. Also consider saving your files with the .json extension to avoid confusion.
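As a rough illustration of that equivalence, the sketch below reads the same file both ways. The file name arq.json and the temporary directory are assumptions made only so the example runs on its own.

```python
import json
import os
import tempfile

# Hypothetical file created just for this example.
path = os.path.join(tempfile.mkdtemp(), "arq.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump([{"pais": "Argentina"}], f)

# Form 1: the with statement closes the file automatically.
with open(path, encoding="utf-8") as f:
    via_with = json.load(f)

# Form 2: roughly the same thing written out with try-finally.
f = open(path, encoding="utf-8")
try:
    via_try = json.load(f)
finally:
    f.close()  # runs even if json.load raises
```

Both forms leave the file closed afterwards; the with form is simply harder to get wrong.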

1

You saved the textual Python representation of your objects in the file. This is not a recommended format, because the representation (repr) of an object cannot always be used to reconstruct it.

In this example, where you have only lists, dictionaries and strings (numbers and None would cause no problems either), you can simply evaluate the contents of the file as if they were Python code: this "compiles" the data as if it were written in your program and returns the resulting Python object. The built-in function eval does that.

In short, to recover the data from this specific file as a Python object, simply do:

 dados = eval(open("arquivo.txt").read())

This is not a recommended way to keep developing your system, however. Besides the problem that some objects cannot be de-serialized this way, the use of eval is generally discouraged because it executes whatever code it is given. In this case the implicit security problem of eval is limited, since a user of the program should not have access to the data processed by eval without also having access to the source code.
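If the goal is only to read literals such as lists, dictionaries and strings, the standard library also offers ast.literal_eval, which is a different technique from the eval call above: it accepts literal structures and refuses anything else. A minimal sketch, using a made-up shortened sample instead of the real file:

```python
import ast

# Hypothetical shortened sample of the file's contents.
texto = "[{'pais': 'Argentina', 'link': 'https://example.com/a'}]"

# Parses Python literals only; no code is executed.
dados = ast.literal_eval(texto)

# Anything that is not a plain literal (here, a function call) is rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except (ValueError, SyntaxError):
    rejected = True

print(dados[0]["pais"], rejected)
```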

So here are the suggestions that appear in part in several other answers here: in the next iteration of the program you should serialize this data with a more appropriate mechanism, not just arquivo.write(str(dados)). The most popular mechanisms in Python's standard library for this kind of serialization are pickle and json. The difference is roughly as follows:

json

The file remains directly readable and editable by people in any text editor. The downside is that the data types are restricted, unless you customize the JSON serializer and deserializer. But it is enough if you only need lists, dictionaries, strings, numbers and None.

To serialize an object and read it back using JSON:

import json
with open("meu_arquivo.json", "wt") as arquivo:
     json.dump(meu_objeto, arquivo)

# to read it back:

with open("meu_arquivo.json") as arquivo:
     meu_objeto = json.load(arquivo)
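On the point about customizing the serializer: one possible approach is the default parameter of json.dumps, which is called for objects json cannot handle by itself. The sketch below is only an illustration; the function name encode_extra and the field visto_em are invented for it.

```python
import json
from datetime import date

def encode_extra(obj):
    """Fallback used by json for types it does not know (hypothetical helper)."""
    if isinstance(obj, date):
        return obj.isoformat()   # store dates as ISO strings
    if isinstance(obj, set):
        return sorted(obj)       # store sets as sorted lists
    raise TypeError(f"Cannot serialize {type(obj).__name__}")

registro = {"pais": "Chile", "visto_em": date(2019, 1, 28)}
texto = json.dumps(registro, default=encode_extra)
de_volta = json.loads(texto)
print(texto)
```

Note that the date comes back as a plain string; turning it into a date again would need matching code on the reading side.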

pickle

Pickle is interesting because it can write to a file, and de-serialize back, any kind of data for which this makes sense (that is, data that does not depend on the state of the running program; network connections, open files and the like cannot be pickled). The downside is that the resulting file is an opaque binary that cannot be edited by hand and can only be read back by a Python application. But you can serialize dates, times, sets, instances of your own program's classes and various other things without writing any extra code.

To serialize an object and read it back using pickle:

import pickle
with open("meu_arquivo.pickle", "wb") as arquivo:
     # -1 selects the highest pickle protocol available
     pickle.dump(meu_objeto, arquivo, -1)

# to read it back:

with open("meu_arquivo.pickle", "rb") as arquivo:
     meu_objeto = pickle.load(arquivo)
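To illustrate the claim about sets and dates, here is a small in-memory round trip. The dictionary estado and its keys are made up for the example; pickle.dumps and pickle.loads are used instead of files only to keep it short.

```python
import pickle
from datetime import datetime

# Hypothetical bot state holding types that plain JSON cannot store directly.
estado = {
    "links_vistos": {"https://example.com/a", "https://example.com/b"},
    "ultima_busca": datetime(2019, 1, 28, 12, 0),
}

blob = pickle.dumps(estado, protocol=pickle.HIGHEST_PROTOCOL)  # bytes
restaurado = pickle.loads(blob)

print(type(restaurado["links_vistos"]).__name__, restaurado["ultima_busca"])
```

Since unpickling can run arbitrary code, this only suits files that the program itself wrote.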

0

Hello, my suggestion is the following: if you only want to save and read the file later, you should use pickle:

>>> import pickle
>>> pickle.dumps({'foo': 'bar'})
b'\x80\x03}q\x00X\x03\x00\x00\x00fooq\x01X\x03\x00\x00\x00barq\x02s.'
>>> pickle.loads(_)
{'foo': 'bar'}

But if you want the file to be saved in a human-readable form, then use json:

>>> import json
>>> json.dumps({'foo': 'bar'})
'{"foo": "bar"}'
>>> json.loads(_)
{'foo': 'bar'}

0

Hello,

To solve your problem you need an algorithm similar to a diff: check the stored data to see whether the news item you are about to tweet is already in it; if it is, do X, otherwise do Y.

My suggestion is that you store this data in a JSON file (JSON is just structured plain text) and load it as a dictionary in Python.

Inside Python, you can iterate over a loaded JSON object with the dictionary method .items().
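A minimal sketch of that check, assuming each news item is a dictionary with a link key as in the question. The function names, the file name posted.json and the example.com links are all invented for illustration.

```python
import json
import os
import tempfile

def load_posted(path):
    """Return the list of already posted items, or [] on the first run."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def find_new(posted, fetched):
    """Keep only fetched items whose link has not been posted yet."""
    seen = {item["link"] for item in posted}
    return [item for item in fetched if item["link"] not in seen]

def save_posted(path, posted):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(posted, f, ensure_ascii=False, indent=2)

# Hypothetical storage location and data, just to run the example.
path = os.path.join(tempfile.mkdtemp(), "posted.json")
fetched = [
    {"pais": "Chile", "titulo": "A", "link": "https://example.com/a"},
    {"pais": "Brasil", "titulo": "B", "link": "https://example.com/b"},
]

posted = load_posted(path)
new_items = find_new(posted, fetched)      # first run: everything is new
save_posted(path, posted + new_items)

# Next GET: one extra item appears, and only that one is reported.
second_fetch = fetched + [
    {"pais": "Peru", "titulo": "C", "link": "https://example.com/c"}
]
new_again = find_new(load_posted(path), second_fetch)
```

Comparing by link through a set keeps the lookup cheap even when the file of posted items grows.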

-1

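You can use the json library to turn the string into a list, provided the file contains valid JSON (strings in double quotes):

import json
with open('arquivo.txt', 'r') as arquivo:
    texto = arquivo.read()
lista = json.loads(texto)
# lista is a list of dictionaries, so index an item before reading a key
print(lista[0]['pais'])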
  • Using the json library simply to turn strings into lists is irrational!
