Going back a position in JSON or list data with Python


1

I am working with data similar to the structure below:

{"Id":1,
"Data_inscricao":"2017-01-01",
"Texto":"Loremipsum",
"Numeracao":26,
"Tempo":"25s"}, 
{"Id":3,
"Data_inscricao":"2010-05-02",
"Texto":"LoremipsumLorem",
"Numeracao":656,
"Tempo":9},....

I have the value 656, which belongs to the key "Numeracao". I need to go back 2 positions from my .get("Numeracao") to pick up the value "2010-05-02", i.e. use .get("Data_inscricao"), but for the record where "Numeracao" is 656.

How do I do this with a JSON or list variable? My current code is below:

numeracao = '656'

#The URL is private, so I can't show the content
html = urlopen("https://www.teste.com.br")

#Returns a very large volume of data, not just 2 blocks of records.
bsObj = BeautifulSoup(html)

informacoes = bsObj.findAll(id="Resultados")
print(informacoes)

    #Output of print() - BEGIN

    [<input id=&quot;Resultados&quot; type=&quot;hidden&quot; value=&quot;{

    &quot;result&quot;:true,&quot;message&quot;:&quot;ok&quot;,&quot;Contador&quot;:2282,&quot;Dados&quot;:
    [
    {&quot;Id&quot;:1,
    &quot;Data_inscricao&quot;:&quot;2017-01-01&quot;,
    &quot;Texto&quot;:&quot;Loremipsum&quot;,
    &quot;Numeracao&quot;:26,
    &quot;Tempo&quot;:&quot;25s&quot;}, 
    {&quot;Id&quot;:3,
    &quot;Data_inscricao&quot;:&quot;2010-05-02&quot;,
    &quot;Texto&quot;:&quot;LoremipsumLorem&quot;,
    &quot;Numeracao&quot;:656,
    &quot;Tempo&quot;:9}
    ]

    }&quot;/>]
    #Output of print() - END

informacoes = informacoes.replace('&quot;', '\"')
print(type(informacoes))

    #Output of print() - BEGIN
    <class 'str'>
    #Output of print() - END

print(informacoes)

    #Output of print() - BEGIN

    [<input id="Resultados" type="hidden" value="{

    "result":true,"message":"ok","Contador":2282,"Dados":
    [
    {"Id":1,
    "Data_inscricao":"2017-01-01",
    "Texto":"Loremipsum",
    "Numeracao":26,
    "Tempo":"25s"}, 
    {"Id":3,
    "Data_inscricao":"2010-05-02",
    "Texto":"LoremipsumLorem",
    "Numeracao":656,
    "Tempo":9}
    ]

    }"/>]

    #Output of print() - END

regex = re.compile('(?:\"Dados\":\[)(.*?)(?:[]}"/>]])')

informacoes = re.findall(regex, informacoes)
print(type(informacoes))

    #Output of print() - BEGIN
    <class 'list'>
    #Output of print() - END

#Prints the content, treating it as a list
for dados in informacoes:
    print(type(dados))
        #Output of print() - BEGIN
        <class 'str'>
        #Output of print() - END

    print(dados)
        #Output of print() - BEGIN

        {"Id":1,
        "Data_inscricao":"2017-01-01",
        "Texto":"Loremipsum",
        "Numeracao":26,
        "Tempo":"25s"}, 
        {"Id":3,
        "Data_inscricao":"2010-05-02",
        "Texto":"LoremipsumLorem",
        "Numeracao":656,
        "Tempo":9

        #Output of print() - END
    #In the print above the closing } really is missing, probably because of the regex

  • Can you elaborate on the question? Is your JSON a list of objects? What code do you have so far?

  • @Anderson, I’m not sure, but I believe it’s a list of objects, because the JSON comes back from BeautifulSoup. I don’t have any logic for that code yet.

  • Can you post the full JSON then? Don’t forget your code too...

  • @Andersoncarloswoss, I added the code.

  • The question is strange, to say the least. The concept of JSON is "key:value", so it doesn’t make much sense to talk about going "back 2 positions". If you know the key, go straight to it. If you don’t know the key but know the value, convert to a dict and look for the value to discover the key; the problem is when the same value belongs to more than one key.

  • @Sidon, yes, a value can belong to more than one key with the same name. By "back 2 positions" I mean that I have the value 656 and need to collect the value of the key Data_inscricao, only from the data block with Id:3. Note: I don’t have the Id value.

  • Without the real file in hand it is difficult.

  • I edited the question to make it clearer. Could you take a look, please?

  • @Daniloalbergardi Okay, I posted an answer, see if it helps.


4 answers

2

I believe your difficulty is in turning a string into a dictionary. Since you convert the JSON into a string in order to use replace, you need to get Python to treat it as a dict so you can run your query. To do that you can use the built-in json module (https://docs.python.org/3/library/json.html).

import json

informacoes = '{"resultado":true, "mensagem":"ok", "Contador":2144,\
                "Dados":[{"Id":1, "Data_inscricao":"2017-01-01",\
                          "Texto":"Loremipsum", "Numeracao":26, "Tempo":"25s"},\
                         {"Id":3, "Data_inscricao":"2010-05-02",\
                          "Texto":"LoremipsumLorem", "Numeracao":656, "Tempo":"96s"}]}'

# Tells Python that your string should be read as JSON
data = json.loads(informacoes)

# If you run print(type(data)) you will see that the str is now treated as a dict

Then just search the dictionary (or dictionaries) for the information you want. There are several ways to do this; the code below shows just one of them.

numeracao = 656

for dictio in data["Dados"]:
    if dictio["Numeracao"] == numeracao:
        print(dictio["Data_inscricao"])

NOTE: Do not search for the number as a str, because in the JSON it is an int.
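The note about types can be checked with a small sketch. The sample string and the variable names below are made up for illustration; the only assumption is that the search value starts out as the string '656', as in the question.

```python
import json

# Hypothetical, shortened sample in the same shape as the question's data.
raw = '{"Dados":[{"Id":3,"Data_inscricao":"2010-05-02","Numeracao":656}]}'
data = json.loads(raw)

numeracao = '656'  # a str, as in the question's code

# Comparing the str '656' with the int 656 never matches.
matches_as_str = [d for d in data["Dados"] if d["Numeracao"] == numeracao]

# Converting the search value to int first gives the expected match.
matches_as_int = [d for d in data["Dados"] if d["Numeracao"] == int(numeracao)]

print(len(matches_as_str), len(matches_as_int))
```

If the value comes from user input or another scrape, converting it once with int() before the loop keeps the comparison cheap and avoids silent misses.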

  • I edited the question to make it clearer. Could you take a look, please?

1

It seems your main problem is not converting the data to JSON, but extracting the data from the HTML with BeautifulSoup.

I’m not very familiar with BeautifulSoup, but from what I saw in the documentation you should use soup.find() instead of soup.findAll(): soup.findAll() returns a list of elements, while soup.find() returns a single element or None.

Once you have found the element, just take the attribute you want directly in Python with __getitem__ (for example elemento['atributo']).

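import json
from urllib.request import urlopen

from bs4 import BeautifulSoup

html = urlopen("https://www.teste.com.br")
soup = BeautifulSoup(html, "html.parser")

# Get the element you want
info = soup.find(id="Resultados")

if info is None:
    # No element found
    raise ValueError("no element with id 'Resultados'")

# Get only the attribute you want
json_str = info.get('value')

if json_str is None:
    # The element has no 'value' attribute
    raise ValueError("element has no 'value' attribute")

# Convert the JSON string into Python objects
json_data = json.loads(json_str)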
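To try the extraction step without access to the private URL, here is a minimal sketch that swaps BeautifulSoup for the standard library's html.parser, so it runs with no extra packages. The sample HTML and the ValueFinder class are hypothetical, modelled on the hidden input printed in the question. It suggests that the manual replace of &quot; and the regex are not needed, because the HTML parser already decodes entities in attribute values; reading the attribute through BeautifulSoup should behave the same way.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample mimicking the hidden <input> shown in the question.
SAMPLE_HTML = (
    '<input id="Resultados" type="hidden" value="{'
    '&quot;result&quot;:true,&quot;Dados&quot;:['
    '{&quot;Id&quot;:1,&quot;Data_inscricao&quot;:&quot;2017-01-01&quot;,'
    '&quot;Numeracao&quot;:26},'
    '{&quot;Id&quot;:3,&quot;Data_inscricao&quot;:&quot;2010-05-02&quot;,'
    '&quot;Numeracao&quot;:656}'
    ']}"/>'
)


class ValueFinder(HTMLParser):
    """Collects the 'value' attribute of the first tag with the given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.value = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if self.value is None and attributes.get("id") == self.element_id:
            # The parser has already turned &quot; back into ".
            self.value = attributes.get("value")


finder = ValueFinder("Resultados")
finder.feed(SAMPLE_HTML)

# The attribute value is plain JSON text, so it can be loaded directly.
json_data = json.loads(finder.value)
print(json_data["Dados"][1]["Data_inscricao"])
```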

Now that you already have your data loaded as JSON it’s time to choose how the necessary data will be extracted.


If you only search the data by the field Numeracao and need to do more than one search, you can build a dict keyed by Numeracao, so that each lookup in the dict is fast (O(1)).

Example:

import json

# json_data assigned in the previous code

dados = { data['Numeracao']: data for data in json_data['Dados'] }

# print(dados)
"""
dados = {
    656: {
        'Tempo': '96s', 
        'Data_inscricao': '2010-05-02', 
        'Id': 3, 
        'Numeracao': 656, 
        'Texto': 'LoremipsumLorem'
    }, 
    26: {
        'Tempo': '25s', 
        'Data_inscricao': '2017-01-01', 
        'Id': 1, 
        'Numeracao': 26, 
        'Texto': 'Loremipsum'
    }
}
"""

# Now searching by number is simple and fast
numeracao = 656

# Returns the item with key 656, or None if the item does not exist
item = dados.get(numeracao)

if item:
    print('Data found ---> Data_inscricao:', item['Data_inscricao'])
else:
    print('Data not found')
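One caveat with a dict keyed by Numeracao: if two records share the same number, as the comments on the question say can happen, the dict comprehension keeps only the last one. A possible variation, sketched here with made-up records and a hypothetical name by_numeracao, groups the records in lists instead.

```python
from collections import defaultdict

# Hypothetical records: two of them share Numeracao 656.
json_data = {"Dados": [
    {"Id": 1, "Data_inscricao": "2017-01-01", "Numeracao": 26},
    {"Id": 3, "Data_inscricao": "2010-05-02", "Numeracao": 656},
    {"Id": 7, "Data_inscricao": "2012-09-15", "Numeracao": 656},
]}

# Map each Numeracao to the list of all records that carry it.
by_numeracao = defaultdict(list)
for record in json_data["Dados"]:
    by_numeracao[record["Numeracao"]].append(record)

# .get() with a default avoids creating an empty entry for a missing key.
dates = [r["Data_inscricao"] for r in by_numeracao.get(656, [])]
print(dates)
```

Lookups stay O(1) on average, and the caller decides what to do when more than one date comes back.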

But if you only search once and then discard the data, you can iterate over the list and take just the item you need (O(n)).

#!/usr/bin/python3

import json

# json_data assigned in the previous code

# Desired number
numeracao = 656

# Iterates over json_data['Dados'] and keeps only the items with that number (returns a generator)
dados = (x for x in json_data['Dados'] if x['Numeracao'] == numeracao)

# Returns the item with the desired number, or None if it does not exist
item = next(dados, None)

if item:
    print('Data found ---> Data_inscricao:', item['Data_inscricao'])
else:
    print('Data not found')

  • I edited the question to make it clearer. Could you take a look, please?

  • I updated the answer, see if it helps you.

1

Hi, I’m a Java programmer, but I believe this line of reasoning works for Python too...

1 - You need to transform this JSON into a Python data structure, that is, an array of objects;

2 - With the array of objects in hand, you need to filter the items that have the value 656 in the attribute "Numeracao". Maybe there is already a library that abstracts this for you; otherwise you will have to go through all the items, checking the value of the property "Numeracao" of each one, and return the one object you are looking for;

3 - Once you have found the item (object), you access the property "Data_inscricao" from within it;
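The three steps above can be sketched in Python roughly as follows. The sample string is a shortened, hypothetical version of the question's data, and the variable names are only illustrative.

```python
import json

# Hypothetical sample in the same shape as the question's records.
raw = (
    '[{"Id":1,"Data_inscricao":"2017-01-01","Numeracao":26},'
    ' {"Id":3,"Data_inscricao":"2010-05-02","Numeracao":656}]'
)

# Step 1: turn the JSON text into a Python list of dicts.
records = json.loads(raw)

# Step 2: keep only the records whose Numeracao is 656.
matches = [r for r in records if r["Numeracao"] == 656]

# Step 3: read Data_inscricao from the record that was found, if any.
found = matches[0] if matches else None
if found is not None:
    print(found["Data_inscricao"])
```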

  • I edited the question to make it clearer. Could you take a look, please?

1


I think good answers have already been given here, but I will leave my interpretation anyway, partly to do the analysis requested in the comments. The first thing I notice is that the JSON example given is not actually valid. In terms of format, JSON is very similar to Python dictionaries (or a list of them), so I adapted it to the most likely format:
(tl;dr).

[  
   {  
      "Id":1,
      "Data_inscricao":"2017-01-01",
      "Texto":"Loremipsum",
      "Numeracao":26,
      "Tempo":"25s"
   },
   {  
      "Id":3,
      "Data_inscricao":"2010-05-02",
      "Texto":"LoremipsumLorem",
      "Numeracao":656,
      "Tempo":9
   }
]

If you take the content above, save it in a text file named "j1.txt", open a Python console and run the commands below:

import json

with open('/path/j1.txt') as f:
    data = json.load(f)
print(data)

What you get is a Python object of type list, and the print output will look much like the contents of the text file. With the data in the correct format and loaded into Python, we can do what you want:

number = 656
for d in data:
    for value in d.values():
        if value == number:
            print('The registration date is: ', d['Data_inscricao'])

The registration date is:  2010-05-02

As I said, a JSON document or a dictionary is not "navigable" through a pointer but through its keys/values. So what I did was walk through the list of objects (a list does allow pointer-style navigation), find which object has the value you are looking for (note that I did not check whether the key is Numeracao, but I could have, so let’s do that next), and finally take the value of Data_inscricao.

If you want to restrict the search to the key Numeracao, you could do this:

# Restricting the search to the key 'Numeracao'
for d in data:
    for key in d.keys():
        if key == 'Numeracao':
            if d[key] == number:
                print('The registration date is: ', d['Data_inscricao'])

The registration date is:  2010-05-02

Note that in both cases (the first ignoring the key, the second restricted to the key Numeracao), if the value 656 appears n times, the registration date will be printed n times.

Final consideration
You can devise a strategy to 'go back' one or n positions in the object within the JSON (note that you would not be going back in the JSON itself, but in the object within it, which in this case is a dictionary), but I consider that real masochism. One way would be: identify the dictionary that holds the value you are looking for, put its keys in an array mkeys and its values in another array mvalues, find the position (index) of the value in mvalues, go back the desired n positions in mkeys (only to get the key name) and read the value at that same position in mvalues. Except as a mere exercise, it does not seem wise.

See the code working in repl.it.
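As a mere exercise, the 'go back n positions' strategy from the final consideration could be sketched like this. The function name value_n_positions_back is hypothetical, and the sketch assumes Python 3.7 or later, where dictionaries keep the insertion order of their keys.

```python
# One record in the same shape as the question's data.
record = {
    "Id": 3,
    "Data_inscricao": "2010-05-02",
    "Texto": "LoremipsumLorem",
    "Numeracao": 656,
    "Tempo": 9,
}


def value_n_positions_back(record, value, n):
    """Return the (key, value) pair found n positions before `value`."""
    mkeys = list(record.keys())
    mvalues = list(record.values())
    index = mvalues.index(value)  # raises ValueError if the value is absent
    target = index - n
    if target < 0:
        raise IndexError("cannot go back that many positions")
    return mkeys[target], mvalues[target]


key, found = value_n_positions_back(record, 656, 2)
print(key, found)
```

This depends entirely on the order of the keys in the source data, which is why looking the value up by the key Data_inscricao is the more robust choice.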
