Error while removing accents

I have some JavaScript code that removes accents. It works with every accent except the grave accent (crase), and it should handle that one too. I can't find the bug.

var teste = "Çaptúra de Tela 2016-04-27 às 18.21.24.png à à";


function removerAcentos( s ) {
        var map={"â":"a","Â":"A","à":"a","À":"A","á":"a","Á":"A","ã":"a","Ã":"A","ê":"e","Ê":"E","è":"e","È":"E","é":"e","É":"E","î":"i","Î":"I","ì":"i","Ì":"I","í":"i","Í":"I","õ":"o","Õ":"O","ô":"o","Ô":"O","ò":"o","Ò":"O","ó":"o","Ó":"O","ü":"u","Ü":"U","û":"u","Û":"U","ú":"u","Ú":"U","ù":"u","Ù":"U","ç":"c","Ç":"C"};

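        // Look up every non-word character (accented letters count as non-word for \W) in the map;
        // anything without an entry is returned unchanged.
        var resultado = s.replace(/[\W\[\] ]/g, function (a) { return map[a] || a; });

        console.log('remove acentos', resultado);

        return resultado;
}

console.log(removerAcentos(teste));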

1 answer

Although it is not apparent, these à characters are not all the same character. Take a look at this comparison: https://www.diffchecker.com/2t62nhqv, and at what happens when you run what looks like the same code: https://jsfiddle.net/qecnk7Lk/

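What happens is that Unicode has two ways to represent an accented letter: either a single precomposed character (à is U+00E0), or two code points, the base letter followed by the accent as a combining mark (a, U+0061, then the combining grave accent, U+0300). The two render identically, but a lookup keyed on the precomposed form will not match the two-code-point form.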
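To make the difference visible, here is a minimal sketch that writes both spellings of à with escape sequences. The variable names are only illustrative and do not come from the question.

```javascript
// Two spellings of "à", written with escapes so the difference is visible.
var precomposed = "\u00E0";  // single code point: LATIN SMALL LETTER A WITH GRAVE
var decomposed = "a\u0300";  // "a" followed by COMBINING GRAVE ACCENT

// They look the same on screen but are different strings.
console.log(precomposed === decomposed);            // false
console.log(precomposed.length, decomposed.length); // 1 2

// Normalizing converts one form into the other.
console.log(precomposed.normalize("NFD") === decomposed); // true
console.log(decomposed.normalize("NFC") === precomposed); // true
```

This is presumably why a map that contains only the precomposed "à" leaves the other à untouched: the regex sees a plain "a" (a word character, so it is skipped) followed by a combining mark that has no entry in the map.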

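In ES6 you can normalize this with String.prototype.normalize, and the code is very simple. Normalizing to NFD splits each accented letter into its base letter plus combining marks, and the regex matches the range of combining accents (U+0300 to U+036F) so they can be removed: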

var teste = "Çaptúra de Tela 2016-04-27 às 18.21.24.png à à";

function removerAcentos(s) {
  return s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log(removerAcentos(teste));
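As a quick check, the sketch below runs the same function on both spellings of à, again written with escapes so it does not depend on how this page encodes the characters. The sample strings are my own, not from the question.

```javascript
// Same function as in the answer above.
function removerAcentos(s) {
  return s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Both spellings of "à" end up as a plain "a".
console.log(removerAcentos("\u00E0"));  // a
console.log(removerAcentos("a\u0300")); // a

// Precomposed input with several different accents ("Çaptúra às").
console.log(removerAcentos("\u00C7apt\u00FAra \u00E0s")); // Captura as

// Letters with no canonical decomposition, such as "ø" (U+00F8), are left alone.
console.log(removerAcentos("\u00F8") === "\u00F8"); // true
```

Note the last case: this approach only strips marks that Unicode treats as separable accents, so letters like ø would still need a lookup table if you want them converted as well.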

To support old browsers you can also use this library, which does the same thing as the ES6 method. The result looks like this: https://jsfiddle.net/qjwcmo1v/

<script src="https://rawgit.com/walling/unorm/master/lib/unorm.js"></script>
