How to disable emoji rendering inside an HTML tag

5

I have a platform (a website) where anyone can create content (images and text). I would like to control how HTML symbols, in particular emoji, are rendered inside a specific tag (in this case, a div).

  • Control the rendering of emoji: filter (remove, hide, or disable rendering of) all emoji symbols and codes, leaving only the text.
  • The HTML

<div class="title-content" style="padding-top:0">
  <h2 class="title" style="padding-top:0">      </h2>
</div>

  • The Image

[Image: rendering of the HTML code with emoji on the page]

Form for creating content

  • Title input

    <div class="ubi-agenda-form-group col-xs-12">
      <label for="title" class="label-control">
        Titulo
        <span class="symbol required"></span>
      </label>
      <input type="text" class="input-control " autocomplete='name' id="title" required="required" name="title" placeholder="Ex: Noites de Pandza">
    </div>
    
  • Form Submit - jQuery

let titulo = $('input#title').val(); // read the input value here, before sending it to the database

let tituloTratado = tratarTitulo(titulo); // the JavaScript/jQuery function that should remove the emoji symbols and codes, leaving only the text
  • Yes! Without the emojis and respecting the font-family tag

  • If it is possible to change it before storing it in the database, that would be good.

  • @Mauroalmeida I made some changes to the question to try to make it clearer.

  • I’ve answered below, I hope it helps ;-)

2 answers

5

Here is a small example of how to remove emojis using a regex.

To test it, type some emojis and click the button "Clique para retirar emojis" ("Click to remove emojis"); the emojis will not appear in the second text box.

You can store the string returned by replace.

$('#button').click( function() {
  var text = $('#title').val();
  text = text.replace(/(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])/g, '').normalize('NFKC');
  $('#result').val(text);
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<input type="text" id="title" name="title"><br>
<input type="button" id="button" name="button" value="Clique para retirar emojis"><br>
<input type="text" id="result" name="result"><br>

Edit: now using the normalize function, as described by hkotsubo.
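One caveat with the ranges above: `[\u2000-\u3300]` is broad, so it also strips non-emoji characters such as en dashes, curly quotes and Japanese kana. Below is a minimal sketch of a narrower alternative, assuming a JavaScript engine recent enough to support Unicode property escapes such as `\p{Extended_Pictographic}`. The helper name `removeEmojis` and the pattern `emojiPattern` are hypothetical and are not part of the answer's code.

```javascript
// The pattern from the answer, kept here for comparison.
const broadPattern = /(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])/g;

// Hypothetical narrower pattern: pictographic symbols, skin-tone modifiers,
// flag letters, plus the variation selector and zero-width joiner that
// glue emoji sequences together. Requires the "u" flag.
const emojiPattern = /[\p{Extended_Pictographic}\p{Emoji_Modifier}\p{Regional_Indicator}\u{FE0F}\u{200D}]/gu;

// Hypothetical helper, not part of the original answer.
function removeEmojis(text) {
  return text.replace(emojiPattern, '');
}

const sample = 'Noites \u2013 Pandza \u{1F600}'; // contains an en dash and an emoji
console.log(sample.replace(broadPattern, '')); // the en dash is removed along with the emoji
console.log(removeEmojis(sample));             // only the emoji is removed
```

Whether the narrower pattern is preferable depends on which characters the titles are expected to contain; the broad pattern is simpler if plain letters, digits and ASCII punctuation are all that is needed.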

  • Thanks @Mauroalmeida! The regex works, but there is a problem: it does not remove the text formatting. That is, instead of appearing as Meu Time anonimo, the text keeps showing up as 𝑴𝒆𝒖 𝑻𝑰𝑴𝑬 𝑨𝒏𝒐𝒏𝒊𝒎𝒐.

5


In the comments you said you want to remove the formatting (the string "𝑴𝒆𝒖 𝑻𝑰𝑴𝑬 𝑨𝒏𝒐𝒏𝒊𝒎𝒐" should become "Meu TIME Anonimo").

The problem is that the string you are using is not exactly formatted text, at least not in the sense of having HTML tags that format it. In fact, this text uses characters other than the letters of our alphabet:

let s = "𝑴𝒆𝒖 𝑻𝑰𝑴𝑬 𝑨𝒏𝒐𝒏𝒊𝒎𝒐";
// print the code points of the string
console.log(Array.from(s).map(c => c.codePointAt(0).toString(16)));

The code above prints the code points of the string. Code points are a long topic in their own right, but briefly: each existing character (letters, digits, spaces, punctuation marks, mathematical symbols, etc.) has a unique numerical value, assigned by Unicode.

If you run the code above, you will see that the first elements of the array are the code points "1d474" and "1d486" (the values are printed in hexadecimal). The first corresponds to the character "MATHEMATICAL BOLD ITALIC CAPITAL M", that is, an uppercase letter "M" stylized in bold and italics. The second is the character "MATHEMATICAL BOLD ITALIC SMALL E".
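As a small illustration of the point about code points, the sketch below builds the two characters from the values mentioned above and compares them with the plain letters. The variable names are only for illustration. It also shows why the earlier snippet iterates with `Array.from`: code points above U+FFFF take two UTF-16 code units in a JavaScript string.

```javascript
// Build the two characters from the code points mentioned above.
const boldItalicM = String.fromCodePoint(0x1d474); // MATHEMATICAL BOLD ITALIC CAPITAL M
const boldItalicE = String.fromCodePoint(0x1d486); // MATHEMATICAL BOLD ITALIC SMALL E

// They are not the ASCII letters, even though they look similar.
console.log(boldItalicM === 'M'); // false
console.log(boldItalicE === 'e'); // false
console.log('M'.codePointAt(0).toString(16)); // "4d"
console.log('e'.codePointAt(0).toString(16)); // "65"

// Each of them occupies two UTF-16 code units (a surrogate pair), so
// .length reports 2, while Array.from iterates by code point.
console.log(boldItalicM.length);             // 2
console.log(Array.from(boldItalicM).length); // 1
```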

These are different characters from the letters "M" (code point U+004D) and "e" (code point U+0065). The characters "𝑴" and "𝒆", although visually similar to the letters "M" and "e", are not the same characters, because they have different code points. More importantly, as much as they look like a formatted "M" and "e", they are not exactly that. If you write "𝑴𝒆𝒖 𝑻𝑰𝑴𝑬 𝑨𝒏𝒐𝒏𝒊𝒎𝒐" without any formatting, the characters already appear in "bold and italics", and if you then apply that formatting, they become even "more bold and italic" (see this example in Google Docs):

[Image: the example rendered in Google Docs]

Same example in HTML:

<!-- ASCII characters -->
<p>Meu TIME Anonimo</p>
<p><b><i>Meu TIME Anonimo</i></b></p>
<!-- Unicode characters (Mathematical Letters) -->
<p>𝑴𝒆𝒖 𝑻𝑰𝑴𝑬 𝑨𝒏𝒐𝒏𝒊𝒎𝒐</p>
<p><b><i>𝑴𝒆𝒖 𝑻𝑰𝑴𝑬 𝑨𝒏𝒐𝒏𝒊𝒎𝒐</i></b></p>

Notice how they are rendered differently. The Unicode characters are already "bold and italic" without the <b> and <i> tags, and with the tags they become even "more bold and italic" (thicker and more slanted).

So what you want is not exactly to "remove the formatting", but to convert these characters to their ASCII equivalents. To do this, you can use the normalize method:

let s = "𝑴𝒆𝒖 𝑻𝑰𝑴𝑬 𝑨𝒏𝒐𝒏𝒊𝒎𝒐".normalize('NFKC');
console.log(s);
// print the code points of the string
console.log(Array.from(s).map(c => c.codePointAt(0).toString(16)));

Note that the text is now printed with ASCII characters ("not formatted"): Meu TIME Anonimo, and the first code points are "4d" and "65", which correspond to the letters "M" and "e".
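The choice of the 'NFKC' form matters here. A brief sketch of the difference, using the first two characters of the string (built from their code points; the variable name is only for illustration): the canonical form 'NFC' leaves the stylized letters alone, while the compatibility form 'NFKC' maps them to plain letters. It also shows that normalization keeps accents and does not touch emoji, which is why the regex step is still needed.

```javascript
// Stylized "Me" built from the code points discussed above.
const stylizedMe = String.fromCodePoint(0x1d474, 0x1d486);

// NFC only handles canonical equivalence, so the stylized letters stay as they are.
console.log(stylizedMe.normalize('NFC') === stylizedMe); // true

// NFKC also applies compatibility mappings, which turn them into plain letters.
console.log(stylizedMe.normalize('NFKC')); // "Me"

// Accented letters are preserved: NFKC does not strip diacritics.
console.log('An\u00f4nimo'.normalize('NFKC') === 'An\u00f4nimo'); // true

// Emoji are not affected by normalization.
console.log('\u{1F600}'.normalize('NFKC') === '\u{1F600}'); // true
```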

You can apply normalization after removing the emojis (using the solution proposed by Mauroalmeida), so your text will be "clean" the way you need it.
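Putting the two steps together, a possible shape for the `tratarTitulo` function mentioned in the question is sketched below. It reuses the regex from Mauroalmeida's answer and then normalizes. The whitespace cleanup at the end is an extra assumption, added only to collapse the gaps left where emojis were removed; drop it if the original spacing must be preserved.

```javascript
// Regex taken from the other answer in this thread.
const EMOJI_PATTERN = /(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])/g;

// Hypothetical implementation of the tratarTitulo function from the question.
function tratarTitulo(titulo) {
  return titulo
    .replace(EMOJI_PATTERN, '') // 1. remove emojis and other symbols in these ranges
    .normalize('NFKC')          // 2. map stylized letters to their plain equivalents
    .replace(/\s+/g, ' ')       // 3. (assumption) collapse runs of whitespace
    .trim();                    //    and drop leading/trailing spaces
}

console.log(tratarTitulo('😀 𝑴𝒆𝒖 𝑻𝑰𝑴𝑬 🔥 𝑨𝒏𝒐𝒏𝒊𝒎𝒐'));
```

In the submit handler from the question, this would be called as `tratarTitulo(titulo)` before the value is sent to the database.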

To understand a little more about Unicode normalization, more details can be found in the Unicode document describing normalization (Unicode Standard Annex #15, Unicode Normalization Forms).
