I am trying to read a news site, but special characters (such as accents and cedillas) come out wrong.
For example, the HTML source (and the site itself) shows one text and my code returns another:
Example 1: "Brazil prohibits people from entering the border with Venezuela". But my code returns: "Brazil proh-be entry of people on the border with Venezuela"
Example 2: "Without tourists and boats, Venice’s water becomes clearer and clearer". But my code returns: "Without tourists and boats, Venice’s water becomes clearer and unlisted"
I saw that one solution would be to use an ADO Stream object, but I couldn’t implement it. Can someone help?
Public Function getHTTP(ByVal Url As String) As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", Url, False
        .Send
        getHTTP = StrConv(.ResponseBody, vbUnicode)
    End With
End Function
=====================================================================
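Sub analisar()
    Dim Url As String, Html As String, titulo As String
    Dim inicio_titulo As Long, fim_titulo As Long
    Dim c As Range

    Url = "https://g1.globo.com/"
    Html = getHTTP(Url)
    inicio_titulo = 1
    For Each c In Range("A1:A20")
        ' Find the next "title":" marker; stop when there are no more
        inicio_titulo = InStr(inicio_titulo, Html, """title"":""")
        If inicio_titulo = 0 Then Exit For
        inicio_titulo = inicio_titulo + 9 ' skip the 9 characters of "title":"
        fim_titulo = InStr(inicio_titulo, Html, """,""url"":""")
        If fim_titulo = 0 Then Exit For
        titulo = Mid(Html, inicio_titulo, fim_titulo - inicio_titulo)
        c.Value = titulo
    Next c
End Sub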
If you put .Charset = "utf-8" in the CreateObject, before the .Open, does it work?
– Rafael Tavares
No. It raised an invalid property error (error 438).
– Lam Lee
You are converting to Unicode, which will cause problems with accented characters even in Latin text. See this article
– danieltakeshi
Also, the way you are doing it, you may run into the maximum number of characters; see how to extract the HTML to a .txt file in this answer
– danieltakeshi
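The ADO Stream approach mentioned in the question can be sketched roughly as follows. This is a minimal sketch, not a tested fix for this particular site: it assumes the page is served as UTF-8, and the function name getHTTPUtf8 is hypothetical. Charset is a property of ADODB.Stream, not of MSXML2.XMLHTTP, which is consistent with the error 438 reported above. StrConv(..., vbUnicode) decodes the bytes with the system's ANSI code page, so multi-byte UTF-8 characters such as accented letters are likely to come out garbled.

```vba
' Hypothetical helper: downloads a page and decodes the raw bytes as UTF-8.
' Assumption: the server actually sends UTF-8 encoded content.
Public Function getHTTPUtf8(ByVal Url As String) As String
    Dim http As Object
    Dim stm As Object

    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", Url, False
    http.Send

    Set stm = CreateObject("ADODB.Stream")
    With stm
        .Type = 1                 ' adTypeBinary: accept the raw byte array
        .Open
        .Write http.responseBody
        .Position = 0             ' rewind; Type can only be changed at position 0
        .Type = 2                 ' adTypeText: reinterpret the bytes as text
        .Charset = "utf-8"        ' decode as UTF-8 instead of the ANSI code page
        getHTTPUtf8 = .ReadText   ' reads the whole stream by default
        .Close
    End With
End Function
```

To try it, replace the call in analisar with Html = getHTTPUtf8(Url) and check whether the titles written to A1:A20 now show the accents correctly.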