How to select JSON elements with Python?

1

I'm using a news API that returns JSON:

import json
import requests

url = ('https://newsapi.org/v2/top-headlines?'
       'country=us&''apiKey=be7b904493554491afde83281651f05a')
response = requests.get(url)
noticias = json.loads(response.text)

Then I print from noticias, but I only get the first title:

 print(noticias['articles'][0]['title'])

How can I get at least the first 10 titles?

  • The zero you used refers to the position in the list of articles you are accessing. If you want the top 10, just vary this value from 0 to 9.

  • When I do that, it returns an error:

  • TypeError: list indices must be integers or slices, not tuple

  • 1

    You need to write a loop and access each one separately.

2 answers

4

To manipulate a JSON, it is easier if you first understand its syntax/structure (and it’s not that hard).

Basically, there are two structures that I consider the most important: arrays and objects.


A JSON object is a set of pairs "key: value", and is bounded by {}. For example, if I have:

{
  "nome": "Fulano",
  "idade": 20
}

This object has the key "nome", whose value is the string "Fulano", and the key "idade", whose value is the number 20. Note that in each "key: value" pair a : separates the key from the value, and the pairs are separated from each other by commas.
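As a minimal sketch of how this object looks once Python has read it (the variable names texto and pessoa are only illustrative):

```python
import json

# The same JSON object as above, written as a Python string
texto = '{"nome": "Fulano", "idade": 20}'

# json.loads parses the text; a JSON object becomes a Python dict
pessoa = json.loads(texto)

print(type(pessoa).__name__)  # dict
print(pessoa["nome"])         # Fulano
print(pessoa["idade"])        # 20
```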


A JSON array is a list of elements, and is bounded by []. For example:

[ 10, "abc", 3 ]

This array has 3 elements: number 10, string "abc" and number 3 (all separated by commas).
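The same array can be read in Python as a quick check; this is just a sketch, and the name lista is an assumption for illustration:

```python
import json

# The same JSON array as above; a JSON array becomes a Python list
lista = json.loads('[ 10, "abc", 3 ]')

print(type(lista).__name__)  # list
print(len(lista))            # 3
print(lista[0])              # 10 (positions start at zero)
print(lista[1])              # abc
```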


What can make a JSON confusing is the fact that we can have these structures nested (we can have an object, with values that are arrays, which in turn contain other objects, which can have other objects or arrays, etc). Example:

{
  "nome": "Fulano",
  "idade": 20,
  "filmes_preferidos": [ "Clube da Luta", "Matrix" ]
}

Note that the object now has the key "filmes_preferidos", whose value is an array (since it is between []). Inside this array there are 2 strings with movie names.

But nothing prevents each element of the array from also being an object:

{
  "nome": "Fulano",
  "idade": 20,
  "filmes_preferidos": [
    {
      "nome": "Clube da Luta",
      "ano_lancamento": 1999
    },
    {
      "nome": "Matrix",
      "ano_lancamento": 1999
    }
  ]
}

Note that the array now has 2 objects within it: each element of the array is bounded by {}, and each has 2 keys ("nome" and "ano_lancamento"). Also note the comma after the } of the first film, which separates the array elements.
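To reach a value in a nested structure like this one, you go down one level at a time: first the key, then the position in the array, then the key of the inner object. A small sketch, with illustrative variable names:

```python
import json

texto = """
{
  "nome": "Fulano",
  "idade": 20,
  "filmes_preferidos": [
    {"nome": "Clube da Luta", "ano_lancamento": 1999},
    {"nome": "Matrix", "ano_lancamento": 1999}
  ]
}
"""

pessoa = json.loads(texto)

filmes = pessoa["filmes_preferidos"]  # the array (a list in Python)
primeiro = filmes[0]                  # first element of the array (a dict)
print(primeiro["nome"])               # Clube da Luta

# Collect the name of every film in the array
nomes = [filme["nome"] for filme in filmes]
print(nomes)                          # ['Clube da Luta', 'Matrix']
```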

And we could go further: inside each object that represents a film, one of the keys could have another array or object as its value, and inside these there could be other arrays/objects, and so on, so the structure can become very complex.

Anyway, to read and manipulate a JSON, the ideal is to first look at its structure and see where the information you need is: is it in an object? Under which key? Is the value of that key an array? Which array positions do I need? Are the array elements other objects or arrays? And so on...


In the case of your JSON (I deleted some parts to simplify):

{
 "status": "ok",
 "totalResults": 38,
 "articles": [
  {
   "source": {
    "id": null,
    "name": "Yahoo.com"
   },
   "author": null,
   "title": "U.S. Futures Climb on Rate-Cut Bets; Dollar Drops: Markets Wrap - Yahoo Finance",
   ...
  },
  {
   "source": {
    "id": "cnbc",
    "name": "CNBC"
   },
   "author": "Elizabeth Schulze",
   "title": "France approves digital tax on American tech giants, defying US trade threat - CNBC",
   ...
  },
  ...
 ]
}

It is an object (it is bounded by {}), which has several keys ("status", "totalResults", "articles").

The value of the "articles" key is an array (it is delimited by []). This array, in turn, has several objects within it (note that each element of the array is between {} and they are separated by commas). And each of these objects has the keys "source", "author", "title", etc. Below is a more detailed explanation:

 "articles": [   <-- key "articles"; its value is an array (this [ marks the start of the array)
  {   <-- the first element of the array is an object (this { marks the start of the object)
   "source": {  <-- key "source" of the first array element; its value is another object (since it is between {})
    "id": null,
    "name": "Yahoo.com"
   },
   "author": null,  <-- key "author" of the first array element
   // key "title" of the first array element
   "title": "U.S. Futures Climb on Rate-Cut Bets; Dollar Drops: Markets Wrap - Yahoo Finance",
   ...
  },  <-- end of the first array element (this } marks the end of the object; the comma separates the array elements)
  {  <-- the second element of the array is an object (this { marks the start of the object)
   "source": {  <-- key "source" of the second array element; its value is another object (since it is between {})
    "id": "cnbc",
    "name": "CNBC"
   },
   "author": "Elizabeth Schulze",  <-- key "author" of the second array element
   // key "title" of the second array element
   "title": "France approves digital tax on American tech giants, defying US trade threat - CNBC",
  },  <-- end of the second array element (this } marks the end of the object; the comma separates the array elements)
  ...  <-- the remaining elements of the array continue here
 ]  <-- this ] closes the array

Knowing this, we now need to know how Python transforms this JSON into the language's own structures. The documentation of the json module has this table:

JSON            Python
object          dict
array           list
string          str
number (int)    int
number (real)   float
true            True
false           False
null            None
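A quick way to see this mapping in practice is to parse a small JSON text containing one value of each kind and inspect the resulting Python types. The JSON below is made up only for this check:

```python
import json

# Made-up JSON with one value of each kind from the table
exemplo = json.loads('{"a": [1, 2.5, "x", true, false, null], "b": {}}')

# Name of the Python type of each element of the array
tipos = [type(valor).__name__ for valor in exemplo["a"]]
print(tipos)  # ['int', 'float', 'str', 'bool', 'bool', 'NoneType']

# The outer object, the array and the inner object
print(type(exemplo).__name__)       # dict
print(type(exemplo["a"]).__name__)  # list
print(type(exemplo["b"]).__name__)  # dict
```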

That is, objects are mapped to dictionaries, and arrays are mapped to lists. So your variable noticias will be a dictionary (since the JSON in question is an object). Therefore, when you access noticias['articles'], you get the value of the "articles" key, which in this case is a list.

That’s why noticias['articles'][0] returns only the first element of the array (lists start from index zero). And this value, in this case, is another dictionary, which refers to the object that is the first element of the array (and this dictionary, in turn, has the key "title").
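This also suggests where the TypeError from the comments may come from. I don't know exactly what was typed, but one way to get that message is to put two numbers separated by a comma inside the brackets: Python reads 0, 1 as a tuple, and a list does not accept a tuple as an index. A sketch with made-up data standing in for the API response:

```python
# Made-up data with the same shape as the API response
noticias = {
    "articles": [
        {"title": "Title A"},
        {"title": "Title B"},
        {"title": "Title C"},
    ]
}

artigos = noticias["articles"]

# One integer index at a time works
print(artigos[0]["title"])  # Title A
print(artigos[1]["title"])  # Title B

# A comma inside the brackets creates a tuple, which a list rejects
try:
    artigos[0, 1]
except TypeError as erro:
    mensagem = str(erro)
    print(mensagem)  # list indices must be integers or slices, not tuple
```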


That said, if you want to go through the titles, just go through the list noticias['articles']. To get the first 10, use slice syntax, and then traverse the elements using a for:

for artigo in noticias['articles'][:10]:
    print(artigo['title'])

Here, [:10] is a way to get a "sub-list" containing only the first 10 elements. If you want to go through the whole list, just remove this snippet and use for artigo in noticias['articles'].
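One detail worth knowing about the slice: if the list has fewer than 10 elements, [:10] does not raise an error, it simply returns whatever exists. The sketch below uses made-up data with only 3 articles and also numbers the titles with enumerate, which is optional:

```python
# Made-up data: only 3 articles, fewer than the 10 requested
noticias = {"articles": [{"title": f"Title {i}"} for i in range(1, 4)]}

# The slice returns at most 10 elements; here it returns the 3 that exist
primeiros = noticias["articles"][:10]

# enumerate gives a counter alongside each element (starting at 1 here)
for posicao, artigo in enumerate(primeiros, start=1):
    print(posicao, artigo["title"])

# If you need the titles as a list instead of printing them
titulos = [artigo["title"] for artigo in primeiros]
```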

3

If you want all the titles:

for artigos in noticias.get('articles'):
    print(artigos['title'])

If you only want the first 10:

for artigos in noticias.get('articles')[:10]:
    print(artigos['title'])
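
One caveat when using get: if the key "articles" is missing, get returns None, and both iterating over None and slicing it raise a TypeError. Passing an empty list as the default avoids that. In the sketch below, the error payload is hypothetical and only stands in for a response without "articles":

```python
# Hypothetical response without the "articles" key
resposta_erro = {"status": "error", "message": "hypothetical error payload"}

# Without a default, get returns None when the key is missing
print(resposta_erro.get("articles"))  # None

# With an empty list as default, the slice and the loop still work
titulos = [
    artigo["title"]
    for artigo in resposta_erro.get("articles", [])[:10]
]
print(titulos)  # []
```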
  • It worked! I just wasn't sure what the "_" you put in the for was.

  • 1

    @Filipe. C By community convention, a variable called _ is used when its value is of no interest in that context. That is not the case in this answer, so in my opinion it was an unfortunate choice. It would be much more readable to call it article.

  • That's exactly it, @Andersoncarloswoss. Using _ is a habit of mine when I access something inside something else and the thing as a whole won't interest me later (treating it as a 'discard' variable). I agree that it is not the most readable way. I will edit the answer.
