Remove special characters while maintaining accents


-4

I have the code below:

<?php
    $titulo = "Notícia Com Ácêntös";
    $titulo_novo = preg_replace(array('/[^a-zA-Z0-9 -]/', '/[ -]+/', '/^-|-$/'),  array('', '-', ''), $titulo);
    
    echo $titulo_novo; // Returns: Notcia-Com-cnts
?>

Note that it removes the accented letters. I want it to remove only the special characters and keep the accented letters and ç (cedilla).

I need this function to rename the images I upload, for example:

Convert this:

"Símbolo cachaça & foguete.com.jpg"

Into this:

"Símbolo-cachaça-foguete-com.jpg"
  • For the expected result it seems to me that you don't need a regex at all; just use strtr (https://www.php.net/manual/en/function.strtr.php), remembering to separate the extension from the rest of the string, which can be done with pathinfo or even substr + strrpos.

  • That way I would have to list everything I want to replace. I want something automatic, where I specify what must not be changed and everything else is removed. I think preg_replace is ideal, but I'm not using it correctly.

  • 1

    It doesn't seem ideal to me; it only looks that way because it solved part of what you wanted. Anyway, what I said still applies, even with preg_replace: just specify what needs to be removed instead of what to "ignore". Replace &, _, . and spaces, and finally, when hyphens appear in sequence, do another replace just to collapse those sequences.

  • Look, I only used "&" and the like as an example, because there is a multitude of characters users can put in an image name before uploading it. If I had to list them all, the code would be very long, understand?

  • 1

    Accents in UTF-8 and in latin1 (iso-8859-1, windows-1252, etc.) are not the same. I mean, á in UTF-8 is not the same thing as á in latin1; they look alike, but they are not the same bytes. So the very idea of applying something the way you want is flawed, and there is no way to solve it the way you want.
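To illustrate the encoding point made in the last comment, here is a minimal sketch (not from the thread). It writes the bytes of "á" explicitly with escape sequences, so the result does not depend on how the script file itself is saved. The variable names are only for illustration.

```php
<?php
// The same letter "á" as raw bytes in two encodings.
$utf8   = "\xC3\xA1"; // "á" in UTF-8: two bytes
$latin1 = "\xE1";     // "á" in ISO-8859-1 / windows-1252: one byte

echo bin2hex($utf8), PHP_EOL;   // c3a1
echo bin2hex($latin1), PHP_EOL; // e1

// An empty pattern with the /u flag only succeeds on valid UTF-8 input.
var_dump(preg_match('//u', $utf8));   // int(1)
var_dump(preg_match('//u', $latin1)); // bool(false)

// A byte-oriented pattern (no /u flag) strips both bytes of a UTF-8 "í",
// which is why the accented letters vanish in the question's code.
echo preg_replace('/[^a-zA-Z0-9 -]/', '', "Not\xC3\xADcia"), PHP_EOL; // Notcia
```

Any character-based cleanup therefore has to know, or check, which encoding the file name is in before it runs.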

2 answers

-4


I created an example. Is this what you would like?

$titulos = [
    'Notícia ÷ 2×2 äëïöü',
    'Notícia Com Ácêntös',
    'Símbolo cachaça & foguete.com'
];

foreach($titulos as $titulo) {
    $novo_titulo = preg_replace('/[^\p{L}\p{M}]+/u', '-', $titulo);
    
    echo $novo_titulo . PHP_EOL;
}


  • 3

    Hello Jeff. If the PHP script is saved as windows-1252 or equivalent and the .jpg files in the operating system have their names in UTF-8, this will fail. cc @Alexfonseca

  • 4

    Not to be a nuisance (but what can you do... every answer is a reference for future visitors), it only worked in the test by mere coincidence. The source encoding is the first problem of all and was not considered. Besides that, the regex is wrong: it lets special characters that fall between À and ü pass through. Example: ÷ https://ideone.com/e2KQMU. You can [edit] the post and fix the problems at any time; one of the principles of the site is to always give room to improve posts and contribute to the community.

  • Setting aside the encoding problems already mentioned, you could "simplify" it to preg_replace('/[^\p{L}\p{M}]+/u', '-', $titulo). The lookahead seems unnecessary to me, because a negated character class [^...] is enough to exclude characters, and I used Unicode properties to match letters and diacritics (the accents, if the string is in NFD). Of course, it still does not solve the case of algo.com.jpg, which by the question's example should become algo-com.jpg (only the last dot is left unchanged), but it already improves things a little...

  • Hello folks, indeed the previous regex did not cover the whole range that was needed. Thank you very much @hkotsubo for your contribution.

  • The encoding problem can indeed happen. What options are there to solve it? Unfortunately I can't help without running some tests and understanding the scenario :/
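The two open points from the comments above (the last dot before the extension, and input that is not UTF-8) could be handled roughly as in the sketch below. This is not code from the thread. The function name `sanitizeFileName` is hypothetical, `\p{N}` is added on the assumption that digits should be kept as in the question's original pattern, and the sketch simply rejects input that is not valid UTF-8 instead of guessing its encoding. The extension itself is left untouched.

```php
<?php
// Hypothetical helper, assuming the uploaded file name arrives as UTF-8.
function sanitizeFileName(string $name): string
{
    // An empty pattern with /u matches only if $name is valid UTF-8.
    if (preg_match('//u', $name) !== 1) {
        throw new InvalidArgumentException('File name is not valid UTF-8');
    }

    // Split at the last dot so that only this dot is preserved.
    // '.' is a single ASCII byte, so byte offsets are safe in UTF-8.
    $dot  = strrpos($name, '.');
    $base = $dot === false ? $name : substr($name, 0, $dot);
    $ext  = $dot === false ? '' : substr($name, $dot + 1);

    // Replace each run of characters that are not letters, combining
    // marks or digits with a single hyphen, then trim stray hyphens.
    $base = preg_replace('/[^\p{L}\p{M}\p{N}]+/u', '-', $base);
    $base = trim($base, '-');

    return $ext === '' ? $base : $base . '.' . $ext;
}

echo sanitizeFileName('Símbolo cachaça & foguete.com.jpg'), PHP_EOL;
// Símbolo-cachaça-foguete-com.jpg
```

Because the negated class already matches runs (`+`), no second replace is needed to collapse repeated hyphens.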

-4

Thank you @Jeff.Dev, it was exactly what I wanted. I just shortened it and that's it, it was perfect!

<?php
$removerEspeciais = 'Notícia Com Ácêntös';
$removerEspeciais = preg_replace(['#(?![A-Za-zÀ-ü0-9.]).#i', '#[-]{1,}#i'], ['-', '-'], $removerEspeciais);
echo $removerEspeciais;
?>
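As the comments on the other answer point out, a range from À to ü also lets symbols through. The sketch below (not from the thread) shows this in UTF-8 mode. It assumes the `/u` flag, which the pattern above does not use, and writes the code points as escapes so that the result does not depend on the encoding of the script file.

```php
<?php
// In UTF-8 mode, À-ü is the code point range U+00C0 to U+00FC.
$range = '/[\x{C0}-\x{FC}]/u';

// The range also contains the symbols × (U+00D7) and ÷ (U+00F7).
var_dump(preg_match($range, "\u{00D7}")); // int(1): × is accepted
var_dump(preg_match($range, "\u{00F7}")); // int(1): ÷ is accepted

// The Unicode "letter" property accepts ö (U+00F6) but rejects ÷.
var_dump(preg_match('/\p{L}/u', "\u{00F6}")); // int(1)
var_dump(preg_match('/\p{L}/u', "\u{00F7}")); // int(0)
```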
