Facebook JSON - Problems with accented characters - Python

I'm trying to import the Friends.json file from Facebook, but the accented characters come out wrong. The Friends.json file looks something like this:

{
  "friends": [
    {
      "name": "Marco Aur\u00c3\u00a9lio Ferreira",
      "timestamp": 1617453534
    },
    {
      "name": "Tha\u00c3\u00ads Everton D. Soares Papaleo",
      "timestamp": 1617420287
    }
  ]
}

The code I’m using is:

# encoding in UTF-8
import json

with open("friends.json", "r", encoding='UTF-8') as read_file:
    data = json.load(read_file)
    data = data['friends']  # list of {"name": ..., "timestamp": ...} entries
    for n in data:
        nome = n['name']
        print(nome)

However the return is as follows:

Marco AurÃ©lio Ferreira
ThaÃ­s Everton D. Soares Papaleo

Honestly, I've tried everything, but I can't fix the accents. Can anyone shed some light on this?
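One likely cause, assuming the file really contains the literal \u escapes shown above: each UTF-8 byte of an accented letter was written as its own escape, so json.load returns two characters (Ã and ©) where é was meant. A minimal check, using a one-entry sample copied from the question:

```python
import json

# One-entry sample copied from the question (the \u escapes are kept literally)
raw = r'{"friends": [{"name": "Marco Aur\u00c3\u00a9lio Ferreira", "timestamp": 1617453534}]}'

name = json.loads(raw)["friends"][0]["name"]

# The two characters sitting where the accented letter should be
accent = name[len("Marco Aur"):len("Marco Aur") + 2]
print([hex(ord(ch)) for ch in accent])  # ['0xc3', '0xa9']

# Those two code points have the same values as the UTF-8 bytes of 'é'
print("é".encode("utf-8"))  # b'\xc3\xa9'
```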

  • Have you tried latin-1?

  • Yes. I also tried the codecs library's codecs.open function, but that didn't work either.

  • The output being different from what you expect looks like a problem at print time (wherever you are reading the string, whether a console, an IDE or something similar, is not using UTF-8), which does not mean Python is wrong. If you can [edit] the post and provide a [mcve], that would help a lot (Edit: also say where you're using the output; if it's the console, which OS, etc. Closures are not meant to be permanent and are not a punishment, so we always invite you to edit posts. It's not personal, and it doesn't mean you're not welcome; quite the contrary).

  • It's not the IDE's problem, and Kaique's solution worked. Besides, my post was a minimal, complete and verifiable example.
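To test the console hypothesis from the comments, one option is to print ascii(nome) instead of nome: ascii() escapes every non-ASCII character, so its output does not depend on the console's encoding. A sketch, reusing the second name from the question as sample data:

```python
import json
import sys

raw = r'{"friends": [{"name": "Tha\u00c3\u00ads Everton D. Soares Papaleo", "timestamp": 1617420287}]}'
name = json.loads(raw)["friends"][0]["name"]

# Encoding that print() uses on this console; it varies by OS and IDE
print("stdout encoding:", getattr(sys.stdout, "encoding", None))

# ascii() escapes all non-ASCII characters, so this looks the same on any console
print(ascii(name))  # 'Tha\xc3\xads Everton D. Soares Papaleo'

# A correctly decoded name would contain a single 'í' (U+00ED) instead
print(ascii("Thaís"))  # 'Tha\xeds'
```

If the escaped form shows \xc3\xad rather than \xed, the string is already wrong before it reaches the console.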

1 answer

# -*- encoding: utf-8 -*-
import json

with open("friends.json", "r", encoding='UTF-8') as read_file:
    data = json.load(read_file)
    data = data['friends']
    for n in data:
        # Turn each code point back into a byte via latin-1, then read those bytes as UTF-8
        nome = n['name'].encode('latin-1').decode('unicode-escape').encode('latin-1').decode('utf-8')
        print(nome)  # inside the loop, so every name is printed
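The shorter two-step version asked about in the comments, encode('latin-1') followed by decode('utf-8'), should be enough for data like this: latin-1 turns each code point from U+0000 to U+00FF back into the byte with the same value, and those bytes are the original UTF-8. The extra decode('unicode-escape') step does not seem to be needed here, and it would change any name containing a literal backslash sequence such as \n. A sketch that repairs every string in the parsed JSON, assuming the whole export was written this way (fix_text and fix_all are hypothetical helper names):

```python
import json


def fix_text(s: str) -> str:
    """Undo the double encoding: each code point is really one UTF-8 byte."""
    try:
        return s.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not valid as double-encoded UTF-8 (e.g. text that was already correct): keep it
        return s


def fix_all(obj):
    """Apply fix_text to every string (keys and values) in a parsed JSON structure."""
    if isinstance(obj, str):
        return fix_text(obj)
    if isinstance(obj, list):
        return [fix_all(item) for item in obj]
    if isinstance(obj, dict):
        return {fix_all(k): fix_all(v) for k, v in obj.items()}
    return obj  # numbers, booleans and None pass through unchanged


# Sample copied from the question; with the real file use json.load(read_file) instead
sample = (
    r'{"friends": [{"name": "Marco Aur\u00c3\u00a9lio Ferreira", "timestamp": 1617453534},'
    r' {"name": "Tha\u00c3\u00ads Everton D. Soares Papaleo", "timestamp": 1617420287}]}'
)

data = fix_all(json.loads(sample))
for friend in data["friends"]:
    print(friend["name"])
```

Catching the decode error is only a heuristic: it usually leaves correctly encoded strings alone, but it cannot prove that a string was double-encoded.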

  • THANK YOU!!! I spent half my Sunday on this, lol.

  • Kaique Silva, I know the code works (I tested it), but could you please add an explanation of why it works?

  • And explain why not just do nome = n['name'].encode('latin-1').decode('utf-8')

  • The JSON provided is already UTF-8. This is one of the reasons why it's important to close posts that lack fundamental information: the question doesn't even say how the output will be used. I understand that Kaique wants to help, but in most scenarios this will make things worse (every post on the site should serve a wide audience, not act as an individual helpdesk). The question author's code does not produce the error described when the output goes to a UTF-8 destination. And the chain of encodes and decodes should already be a sign that this is not a very natural "solution".

  • Bacco, most people understood my question and tried to help me, but you didn't. Maybe you're partially right, and for a more complex program there are better solutions, but Kaique's solution is simple, works, and met my need perfectly. If you thought I needed to tell you where the output was going, you could have asked, but as it turned out, that information wasn't necessary. From your comments, it seems you haven't even tested the code I posted. I know your intentions are good, but if you want the community to grow, you need to revisit some concepts.

  • You're mistaken, @Otavio. This is not a helpdesk; solving the author's problem is almost a side effect. As for testing, it was the first thing I did before closing the post (so I could say with confidence that the error did not reproduce, because with UTF-8 output the accents come out correctly). As for revising concepts, I'm always open to it, but there are two topics on which it would take more preparation to lecture me: one is the purpose of the site, and the other is character encoding, where I have even more expertise than in the first (though I'm not competing with anyone). Even the solution presented here has too many steps.

  • Our colleague @Augustovasques himself pointed out something much simpler: 2 steps that work with latin-1 (which may not even be the right encoding in your case; it could be win-1252, since we don't know where you're going to use the output. They're similar, but not the same). This is easy to verify: https://ideone.com/yQS24V. But we only know it's a solution for YOUR case because it worked where you used it, which is almost a coincidence. As for revising concepts, maybe this will help: What is Stack Overflow?

  • Anyway, I'm not the "owner" of the site; the system is very well designed. If a moderator makes a mistake (and people do make mistakes), users simply vote to reopen the post, and that's it. What I felt obliged to do was leave the comment here so that Kaique can improve the post if he wants, detailing what he did and why he thinks it works, so that whoever comes later (which is the purpose of the site) can make the most of the information. Also, you can take part in [meta] at any time and get feedback from other community members.

  • I started studying Python two weeks ago. I saw that this site already had several similar questions about accents in Python, and I tried all of them before posting, which suggests my question may be useful to others too. Finally, Bacco, I think we've spent too much time on a problem that's already solved. I promise to be more thorough in my next question, and I'm sorry for the trouble, lol.

  • Purpose of the site: What is Stack Overflow? I'm locking the comments because this is not the place to question what the site is. Bacco's tips on encoding are pertinent; he has helped me with this in the past, and it's something he understands very well. I recommend that everyone follow the code of conduct (https://answall.com/conduct) and take criticism constructively. If you still feel you need to ask about how the site works, see Meta (https://pt.meta.stackoverflow.com/) or ask there.

  • Hi @Otavio. It's worth remembering that closures are not always permanent; it depends on the author improving the question, always keeping the [mcve] in mind, which is almost always the "key rule" for a question to be useful to the site and easy to answer. Everyone is welcome to ask and answer; just try to understand a few minimum criteria so that the site doesn't become just a helpdesk. We recommend that everyone read the SOpt Survival Guide to avoid closures and get more out of the site.
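On the latin-1 versus Windows-1252 question raised in the comments: a JSON escape \u00XX decodes to code point U+00XX, and latin-1 encodes that code point back to byte 0xXX, so latin-1 is the codec that exactly reverses this kind of corruption. cp1252 cannot encode the code points U+0080 to U+009F, so it breaks on letters whose second UTF-8 byte falls in that range, such as É (C3 89); for lowercase letters like é (C3 A9) both codecs happen to agree, which may be why they look interchangeable. A small check with a hypothetical name:

```python
import json

# Hypothetical name "Élio": É is UTF-8 C3 89, exported as two separate escapes
name = json.loads(r'"\u00c3\u0089lio"')

# latin-1 maps U+00C3 and U+0089 back to bytes C3 and 89
print(name.encode("latin-1").decode("utf-8"))  # Élio

try:
    name.encode("cp1252").decode("utf-8")
except UnicodeEncodeError as exc:
    # cp1252 has no byte for U+0089, so the round trip fails before decoding
    print("cp1252 failed:", exc.reason)
```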
