The original HTML entities are not preserved when the document's markup is parsed by the browser, so they are not available for you to query through JavaScript or by any other means. According to the specification, during tokenization (reading the "raw" text and producing "pieces", or tokens, for further analysis), HTML entities (called character references there) produce a single character when consumed:
8.2.4.69 Tokenizing character references
...
The behavior depends on the identity of the next character (the one immediately after U+0026 AMPERSAND), as follows:
...
"#" (U+0023)
Consume the U+0023 NUMBER SIGN.
...
Consume all characters matching the character range listed above (ASCII hexadecimal digits or ASCII digits).
...
Otherwise, if the next character is a U+003B SEMICOLON, consume that too. If it isn't, there is a parse error.
...
Otherwise, return a character token for the Unicode character whose code point is that number.
Anything else
Consume as many characters as possible, provided the consumed characters match one of the identifiers in the first column of the named character references table (in a case-sensitive manner).
...
Return one or two character tokens for the character(s) corresponding to the character reference name (as given by the second column of the named character references table).
(free translation, emphasis mine)
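To make the numeric branch quoted above concrete, here is a minimal sketch of it in JavaScript. The name decodeNumericReference is my own (it is not a browser API), and the sketch only approximates the error handling: the real algorithm also remaps a handful of code points, which I leave out. The point is just that the reference goes in as several characters and comes out as one, with nothing recording how it was written.

```javascript
// Hypothetical helper, not part of any browser API.
// Handles only the numeric branch: "&#" digits or "&#x" hex digits,
// followed by an optional ";".
function decodeNumericReference(text) {
  const match = /^&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));?/.exec(text);
  if (!match) return null; // not a numeric character reference

  const codePoint = match[1] !== undefined
    ? parseInt(match[1], 16)
    : parseInt(match[2], 10);

  // Simplification: zero, surrogates and out-of-range values become U+FFFD.
  // The specification also remaps some other code points; omitted here.
  const invalid = codePoint === 0 || codePoint > 0x10FFFF ||
    (codePoint >= 0xD800 && codePoint <= 0xDFFF);

  return {
    consumed: match[0].length, // how many source characters were read
    char: invalid ? "\uFFFD" : String.fromCodePoint(codePoint),
  };
}
```

Both decodeNumericReference("&#62;") and decodeNumericReference("&#x3E;") yield the same single character, so once the token is produced the two spellings can no longer be told apart.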
That is, once the HTML document has been parsed and the resulting tree built (along with its DOM representation), the information about which HTML entities were used is no longer there: they have been replaced by their corresponding characters. The fact that innerHTML (and outerHTML) return the text with the > escaped as &gt; is because the browser re-serializes the markup for you, regardless of how it was written originally:
alert(document.querySelector('p').innerHTML);
<p>situação > ativo</p>
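As a rough illustration of that "rewriting", the escaping that serialization applies to the contents of a text node can be sketched as below. The function name escapeTextForSerialization is hypothetical; the four replacements are the ones I understand the serialization rules to apply to text content (attribute values are escaped differently).

```javascript
// Hypothetical helper that mimics how text node data is escaped when
// innerHTML/outerHTML serialize it. It only ever sees the resolved
// characters, never the original source spelling.
function escapeTextForSerialization(data) {
  return data
    .replace(/&/g, "&amp;")       // must run first, or later "&" get re-escaped
    .replace(/\u00A0/g, "&nbsp;") // no-break space
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}
```

Whether the source had a literal >, &gt; or &#62;, the node data is the same string, so the output of this step is always &gt;.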
Therefore, if you really need this information, you will have to capture it before the document text reaches the browser as HTML: on the server side, for example, or, if you are fetching the text via Ajax, by analyzing the text before creating elements from it. Unfortunately, this means analyzing the HTML text yourself, which is far from trivial. Perhaps some parsing library can both handle the entities (preferably recognizing every one defined in the specification) and preserve their original source text, but I don't know of one offhand (nor do I know what environment you are working in).
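If a full parser is not an option, a very rough first pass over the raw text (for instance an Ajax response, before it is turned into elements) might look like the sketch below. It is only a sketch under strong assumptions: findCharacterReferences is a name I made up, it requires the trailing semicolon, it does not check names against the specification's table, and it does not skip comments, scripts or attribute values.

```javascript
// Hypothetical helper: lists things that look like character references
// in raw HTML text, with the position where each one starts.
// Assumption: every reference ends with ";" (the parser is more lenient).
function findCharacterReferences(rawHtml) {
  const pattern = /&(?:#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g;
  const found = [];
  for (const match of rawHtml.matchAll(pattern)) {
    found.push({ index: match.index, text: match[0] });
  }
  return found;
}

// Example: inspect the source text before handing it to the browser.
const references = findCharacterReferences(
  "<p>situa&ccedil;&atilde;o &gt; ativo</p>"
);
console.log(references.map((ref) => ref.text));
```

Keeping the index alongside the text lets you map each reference back to its place in the source, which is the information the DOM no longer has.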
That's a great question! I don't know at what point in parsing the HTML entities are resolved, nor whether the original is preserved somewhere or discarded. Checking the children of p, I see there is only one Text node, whose data (from CharacterData) is the string with all its entities resolved (including the >). So it seems to me that the information you want no longer exists after the page is loaded; you would need to get that content from an external source (either on the server side, or perhaps via Ajax if applicable) and process it before the browser interprets your HTML. – mgibsonbr