Return a JSON block within another block

Hello!

I am using Pentaho Data Integration to read the JSON returned by the Twitter API, but I am having trouble processing the JSON and returning only the values I want.

I’m trying to return the values of the "hashtags" block, which sits inside the second-level block "entities", but I’m unable to break up the array and return only the hashtag.

I’m using the JSON Input step and passing the following paths:

  • $[*].created_at
  • $[*].text
  • $[*].id
  • $[*].user
  • $[*].retweet_count
  • $[*].favorite_count
  • $[*].in_reply_to_screen_name

And I would like to add a column with the hashtag.

Does anyone have any idea how to proceed?

Here is the data present in the JSON:

{
  "statuses": [
    {
      "created_at": "Tue Apr 02 13:15:16 +0000 2019",
      "id": 1113067372695310300,
      "id_str": "1113067372695310336",
      "text": "A leitura \\u00e9 um h\\u00e1bito muito saud\\u00e1vel e que vale a pena ser incentivado desde a primeira inf\\u00e2ncia. \n\nClique aqui\\u2026 https:\\/\\/t.co\\/kv9v4AHYJt",
      "truncated": true,
      "entities": {
        "hashtags": [],
        "symbols": [],
        "user_mentions": [],
        "urls": [
          {
            "url": "https:\\/\\/t.co\\/kv9v4AHYJt",
            "expanded_url": "https:\\/\\/twitter.com\\/i\\/web\\/status\\/1113067372695310336",
            "display_url": "twitter.com\\/i\\/web\\/status\\/1\\u2026",
            "indices": [
              113,
              136
            ]
          }
        ]
      },
      "metadata": {
        "iso_language_code": "pt",
        "result_type": "recent"
      },
      "source": "\\u003ca href=\"http:\\/\\/etus.com.br\" rel=\"nofollow\"\\u003eEtus Brasil\\u003c\\/a\\u003e",
      "in_reply_to_status_id": null,
      "in_reply_to_status_id_str": null,
      "in_reply_to_user_id": null,
      "in_reply_to_user_id_str": null,
      "in_reply_to_screen_name": null,
      "user": {
        "id": 495432339,
        "id_str": "495432339",
        "name": "Festa Na Floresta BH",
        "screen_name": "festaflorestabh",
        "location": "Belo Horizonte, Minas Gerais",
        "description": "Buffet Infantil",
        "url": "http:\\/\\/t.co\\/SUpGLZLV1A",
        "entities": {
          "url": {
            "urls": [
              {
                "url": "http:\\/\\/t.co\\/SUpGLZLV1A",
                "expanded_url": "http:\\/\\/www.festanaflorestabh.com.br",
                "display_url": "festanaflorestabh.com.br",
                "indices": [
                  0,
                  22
                ]
              }
            ]
          },
          "description": {
            "urls": []
          }
        },
        "protected": false,
        "followers_count": 30,
        "friends_count": 9,
        "listed_count": 0,
        "created_at": "Fri Feb 17 23:38:16 +0000 2012",
        "favourites_count": 355,
        "utc_offset": null,
        "time_zone": null,
        "geo_enabled": true,
        "verified": false,
        "statuses_count": 246,
        "lang": "pt",
        "contributors_enabled": false,
        "is_translator": false,
        "is_translation_enabled": false,
        "profile_background_color": "FFFFFF",
        "profile_background_image_url": "http:\\/\\/abs.twimg.com\\/images\\/themes\\/theme1\\/bg.png",
        "profile_background_image_url_https": "https:\\/\\/abs.twimg.com\\/images\\/themes\\/theme1\\/bg.png",
        "profile_background_tile": true,
        "profile_image_url": "http:\\/\\/pbs.twimg.com\\/profile_images\\/1858212508\\/logo_festanafloresta_normal.png",
        "profile_image_url_https": "https:\\/\\/pbs.twimg.com\\/profile_images\\/1858212508\\/logo_festanafloresta_normal.png",
        "profile_link_color": "0084B4",
        "profile_sidebar_border_color": "C0DEED",
        "profile_sidebar_fill_color": "DDEEF6",
        "profile_text_color": "333333",
        "profile_use_background_image": true,
        "has_extended_profile": false,
        "default_profile": false,
        "default_profile_image": false,
        "following": false,
        "follow_request_sent": false,
        "notifications": false,
        "translator_type": "none"
      },
      "geo": null,
      "coordinates": null,
      "place": null,
      "contributors": null,
      "is_quote_status": false,
      "retweet_count": 0,
      "favorite_count": 0,
      "favorited": false,
      "retweeted": false,
      "possibly_sensitive": false,
      "lang": "pt"
    }
  • There is an example in the PDI samples directory: data-integration/samples/transformations/JSON - read nested fields.ktr. In this example, a field containing a nested structure is read first, and then a second read is performed on that field.

1 answer

Pentaho has trouble reading complex JSON when the result has to be flattened like this. To get what you need, do the following.

Step 01 - Extract the root entities from the tweets with a first JSON Input step.
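Outside PDI, this first pass can be sketched in Python to show what the step is meant to produce. This is only an illustration under assumptions: the payload is a trimmed, hypothetical version of the one in the question, and `extract_root_fields` and `ROOT_FIELDS` are illustrative names, not part of Pentaho. The nested `entities` object is kept as a JSON string so that a later pass can parse it.

```python
import json

# Trimmed, hypothetical payload shaped like the response in the question.
payload = json.loads("""
{
  "statuses": [
    {
      "created_at": "Tue Apr 02 13:15:16 +0000 2019",
      "id_str": "1113067372695310336",
      "text": "sample tweet",
      "entities": {"hashtags": [], "symbols": []},
      "retweet_count": 0,
      "favorite_count": 0,
      "in_reply_to_screen_name": null
    }
  ]
}
""")

# Root-level fields to copy into each flat row (illustrative selection).
ROOT_FIELDS = ["created_at", "id_str", "text", "retweet_count",
               "favorite_count", "in_reply_to_screen_name"]

def extract_root_fields(doc):
    """Return one flat row per tweet; keep `entities` as a JSON string."""
    rows = []
    for status in doc["statuses"]:
        row = {name: status.get(name) for name in ROOT_FIELDS}
        # Serialize the nested block so a second pass can read it on its own.
        row["entities"] = json.dumps(status.get("entities", {}))
        rows.append(row)
    return rows

rows = extract_root_fields(payload)
```

Note that the tweets live under the top-level `statuses` key in the sample, so the loop starts there rather than at the document root.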

Step 02 - Build the layout by splitting the stream as you need. In this case, add another JSON Input step to read the second level of the JSON, which contains the hashtags you need. The screenshots below show the setup.

[Screenshot: initial layout]

[Screenshot: second JSON step]

[Screenshot: what to put inside the second step]
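The second read can be sketched the same way in Python. The rows and hashtag values here are hypothetical (the sample in the question has an empty "hashtags" array), and the sketch assumes each hashtag object carries a "text" key, as Twitter's v1.1 entities do. `extract_hashtags` is an illustrative name.

```python
import json

# Hypothetical rows as they might leave the first pass: the tweet id plus the
# nested `entities` block kept as a JSON string.
first_pass_rows = [
    {"id_str": "1",
     "entities": '{"hashtags": [{"text": "books", "indices": [0, 6]},'
                 ' {"text": "kids", "indices": [7, 12]}]}'},
    {"id_str": "2", "entities": '{"hashtags": []}'},
]

def extract_hashtags(rows):
    """Emit one row per hashtag; tweets without hashtags emit nothing."""
    out = []
    for row in rows:
        entities = json.loads(row["entities"])
        for tag in entities.get("hashtags", []):
            # Carry the tweet id along so the streams can be joined later.
            out.append({"id_str": row["id_str"], "hashtag": tag["text"]})
    return out

hashtag_rows = extract_hashtags(first_pass_rows)
```

Because a tweet can have zero, one, or many hashtags, this stream has a different row count from the tweet stream, which is why it is kept separate until the join.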

Step 03 - Join the streams on the tweet id, which is present in each of the streams you copied and split, to produce the flat result you need, and be happy :)

[Screenshot: joining the streams]
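As a rough Python sketch of the join, assuming a left join is wanted so that tweets without hashtags still appear once with an empty hashtag column. The rows are hypothetical and `left_join_on_id` is an illustrative name. The sketch joins on `id_str` rather than `id`, because in the question's sample the numeric `id` (1113067372695310300) does not match `id_str` ("1113067372695310336"), which suggests the number has lost precision.

```python
from collections import defaultdict

# Hypothetical outputs of the two streams.
tweet_rows = [
    {"id_str": "1", "text": "first tweet"},
    {"id_str": "2", "text": "second tweet"},
]
hashtag_rows = [
    {"id_str": "1", "hashtag": "books"},
    {"id_str": "1", "hashtag": "kids"},
]

def left_join_on_id(tweets, hashtags):
    """Repeat each tweet once per hashtag; keep hashtag-less tweets with None."""
    by_id = defaultdict(list)
    for h in hashtags:
        by_id[h["id_str"]].append(h["hashtag"])
    joined = []
    for t in tweets:
        # .get() does not create an entry, so missing ids fall back to [None].
        tags = by_id.get(t["id_str"]) or [None]
        for tag in tags:
            joined.append({**t, "hashtag": tag})
    return joined

flat = left_join_on_id(tweet_rows, hashtag_rows)
```

In PDI itself, steps such as Merge Join (which expects sorted inputs) or Stream lookup are the usual candidates for this, though the answer does not say which one was used.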
