Get data from a JSON structure with Python

Viewed 23,455 times

3

I want to access certain information in the JSON below, using Python:

{
  "informacao1": valor_informação1,
  "informacao2": "{
    dado=informação_dado
}"

print(arquivojson.get("informacao1"))

The print above displays:

valor_informação1

How can I access the value after dado=?

I tried the command below, but it does not work:

print(arquivojson.get("informacao1").get("dado"))
  • Is the JSON exactly the way you posted it? Why is the data there a string instead of a JSON object? To get this information you will have to process the string, with a regex, for example.

  • Yes, that is how I receive it. Is there no other way to capture that value?
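
For context, a small sketch of why the chained .get() fails. It assumes the file parses to a dictionary whose values are plain strings; the content string here is a cleaned-up stand-in for the data in the question, not the asker's actual file. The json module hands back a str for informacao2, and str has no get method.

```python
import json

# Stand-in for the question's data, with informacao2 stored as a string.
content = '{"informacao1": "valor_informação1", "informacao2": "{dado=informação_dado}"}'
arquivojson = json.loads(content)

value = arquivojson.get("informacao2")
print(type(value).__name__)  # the value is a string, not a dict

try:
    value.get("dado")  # strings do not have a .get() method
except AttributeError as error:
    print("AttributeError:", error)
```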

2 answers

5


As it stands, the structure of your data appears to be inconsistent. As Sidon commented, if you are using JSON, a format that would make much more sense is:

{
  "informacao1": "valor_informação1",
  "informacao2": {
    "dado": "informação_dado"
  }
}

If you have control over the code that generates this JSON, I would recommend making this change. Otherwise, assuming the data really has to stay in its current format or you have no way to modify it, you can get the information like this:

Consider the input data:

content = '''{
  "informacao1": "valor_informação1",
  "informacao2": "{dado=informação_dado}"
}'''

Parse the JSON, converting it to a Python object:

import json
data = json.loads(content)

If we do:

print(data.get("informacao2"))

The output is:

{dado=informação_dado}

To get only the content after the =, we can find the position of that character in the string and take the part that runs from just after it up to, but not including, the last character (the closing brace):

part = slice(data.get("informacao2").index("=") + 1, -1)

In this case, if you do print(part), you will see that its value is slice(6, -1, None), that is, the positions of the string from index 6 up to the penultimate character.

dado = data.get("informacao2")[part]

This way, dado holds informação_dado.
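
The steps above, put together as one runnable sketch. It assumes, as in this answer, that the value holds exactly one key=value pair wrapped in braces; the name raw is introduced here only for readability.

```python
import json

content = '''{
  "informacao1": "valor_informação1",
  "informacao2": "{dado=informação_dado}"
}'''

data = json.loads(content)
raw = data.get("informacao2")  # the string '{dado=informação_dado}'

# Slice from just after '=' up to (not including) the closing brace.
part = slice(raw.index("=") + 1, -1)
dado = raw[part]

print(part)
print(dado)
```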

See it working on Repl.it.


Assuming informacao2 is something like:

"{dado=informação_dado, dado2=informação_dado2, dado3=informação_dado3}"

You can get all the dado values with a regular expression (this needs import re):

groups = re.findall(r"(?:dado\d*\=)(.*?)(?:[,}])", data.get("informacao2"))

In this case, print(groups) gives:

['informação_dado', 'informação_dado2', 'informação_dado3']

That is, a list of all the values in the string. To get the last value, just use groups[-1].
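
If the keys matter as well as the values, one possible variation (not part of the original answer) is to capture both sides of each = and build a dictionary. This sketch assumes that neither keys nor values contain a comma, an equals sign or a brace; the helper name parse_pairs is hypothetical.

```python
import re

def parse_pairs(text):
    """Turn '{a=1, b=2}' into {'a': '1', 'b': '2'}; all values stay strings."""
    # Key: a run of characters that are not separators or whitespace.
    # Value: everything up to the next ',' or '}'.
    pairs = re.findall(r"([^{},=\s]+)\s*=\s*([^,}]*)", text)
    return {key: value.strip() for key, value in pairs}

raw = "{dado=informação_dado, dado2=informação_dado2, dado3=informação_dado3}"
parsed = parse_pairs(raw)

print(parsed["dado2"])            # look up one value by its key
print(list(parsed.values())[-1])  # last value, like groups[-1]
```

Dictionaries keep insertion order in Python 3.7 and later, so the last value in the dictionary corresponds to the last pair in the string.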

  • I really appreciate your help, Andersoncarloswoss and Sidon; the file really is poorly structured. I am having trouble because I have a lot of records inside "informacao2" and I need to capture the last one. The file looks something like this: content = '{ "informacao1": "valor_informação1", "informacao2": "{dado=informação_dado, dado2=informação_dado2, dado3=informação_dado}" }' How can I walk through these records to get to the last one?

  • Do you need just the last one, or all of them?

  • For now only the last one, but would it be very difficult to capture other information in the middle of the file?

  • @Daniloalbergardi I added a section at the end of the answer; see if that is what you need.

  • A regex will work. I’ll try it later, thanks for the help!

  • You have the JSON file, right? Then I think the most appropriate approach would be to write a "program" to "normalize" it. That would not be difficult.
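
One way such a normalizing "program" might look, as a sketch only: it rewrites the string value of informacao2 as a real JSON object so the data can then be read with plain dictionary access. The function name normalize, and the assumption that keys and values contain no commas, equals signs or braces, are illustrative and not the commenter's.

```python
import json
import re

def normalize(content):
    """Return JSON text in which "informacao2" is a real object, not a string."""
    data = json.loads(content)
    raw = data["informacao2"]
    if isinstance(raw, str):
        # Split '{k=v, k2=v2}' into (key, value) pairs.
        pairs = re.findall(r"([^{},=\s]+)\s*=\s*([^,}]*)", raw)
        data["informacao2"] = {key: value.strip() for key, value in pairs}
    # ensure_ascii=False keeps characters such as 'ç' and 'ã' readable.
    return json.dumps(data, ensure_ascii=False, indent=2)

content = '{"informacao1": "valor_informação1", "informacao2": "{dado=informação_dado}"}'
fixed = json.loads(normalize(content))
print(fixed["informacao2"]["dado"])
```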

1

The file should look like this:

{
  "informacao1": "valor_informação1",
  "informacao2": {
    "dado": "informação_dado"
  }
}

That way, you can do:

import json
from pprint import pprint

# encoding is given explicitly because the file contains non-ASCII characters
with open('arquivo.json', encoding='utf-8') as f:
    data = json.load(f)

pprint(data)
# {'informacao1': 'valor_informação1', 'informacao2': {'dado': 'informação_dado'}}
print(data['informacao2'])
# {'dado': 'informação_dado'}

Now suppose that both keys hold only strings, so the file is saved like this:

{
  "informacao1": "valor_informação1",
  "informacao2": "{dado=informação_dado}"
}

So you can do:

pprint(data)
# {'informacao1': 'valor_informação1', 'informacao2': '{dado=informação_dado}'}

pprint(data['informacao1'])
# 'valor_informação1'
  • Sidon, see the comments: the value of informacao2 really is a string, not an object.

  • @Andersoncarloswoss, OK, then just save it as a string and not as an object. I edited the answer and added the example.

  • Correct, but the user wants to extract the information after the = character. Something like data.get('informacao2').get('dado') returning informação_dado, do you see?

  • But that is not how JSON works: JSON always assumes a key and a value. To do what you want, you have to write the file as in my first example and, to get the value of the dado key, do: print(data['informacao2']['dado']). Got it? :-)

  • Exactly. Everything indicates that the JSON was generated incorrectly, so the value cannot be retrieved in the usual way. Something like that.

  • If the JSON was written incorrectly, there is no way around it: you have to resort to some hack ("gambiarra") like the one you present. If you already have the workaround, what is the purpose of the question?

  • But I am not the author of the question. I am only pointing out that your answer does not solve the problem that was asked, and that you can edit it if you wish. This code will not necessarily be a hack. It depends on the context, which we do not know.

  • Perhaps the JSON was not generated incorrectly; perhaps the author of the question was "misgenerating" it himself. That is how I read the question and, if that is right, my answer not only solves the problem but also clarifies for him how JSON actually works. As you said yourself, without a clear context it gets complicated.

  • I appreciate your help, Sidon, but unfortunately the JSON file is inconsistent and I can’t modify it.
