How to remove accents from uploaded file names in PHP?


The upload itself works. The problem is when I send a file whose name has accents. For example, when I send a file named ação-íaaa.jpg, the name comes out garbled on the server. So I want to remove the accents so that it looks like this: acao-iaaa.jpg. Suggestions?

$destination_path = getcwd() . DIRECTORY_SEPARATOR;
$result = 0;
$target_path = $destination_path . basename($_FILES['myfile']['name']);
if (@move_uploaded_file($_FILES['myfile']['tmp_name'], $target_path)) {
    $result = 1;
}
sleep(1);
  • As you have already noticed, the page encoding and PHP's encoding must be set correctly (if the client sends garbage, nothing you do on the server will fix it). Once that is done, the proposed solutions should work correctly.

  • The problem has already been solved in that question. The question is worded differently, but the problem and the answer are the same.

12 answers


Simply remove accents:

$file = "ação-íaaa.jpg";
$file = iconv('UTF-8', 'ASCII//TRANSLIT', $file);
echo "{$file} <br>";

Output: acao-iaaa.jpg

Example available on ideone

  • +1, but remember that hardcoding 'UTF-8' in iconv is not enough; you should check the encoding in the request headers. The browser may be sending the names in an ISO encoding, for example.

  • @Bacco, the weird thing is that I upgraded PHP to 5.4.28 and it's producing ac~ao-'iaaa.jpg. I don't know if it's a version bug or some old php.ini setting I forgot to change.

  • But on ideone the output is correct, as it was with the PHP version I used before; only now is it separating the accents. It used to work; I'm looking for a bug report.

  • I found the answer: "Note that the iconv function on some systems may not work as you expect. In such a case, it'd be a good idea to install the GNU libiconv library." http://www.php.net/manual/en/intro.iconv.php - What you described is exactly the symptom of glibc's iconv: it separates the accents instead of removing them.

  • Thanks @Bacco, I'll take a careful look at it. Do you think I should remove the answer, since it does not produce the expected result?

  • The answer is right; you just need to install or update libiconv on your server so that PHP uses GNU libiconv instead of glibc's iconv. The "iconv" section of phpinfo() shows which one is in use.

  • @Bacco, fine, then I’ll install it to see.
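Following the comments above about glibc's iconv versus GNU libiconv, here is a minimal sketch of doing the phpinfo() check from code. ICONV_IMPL and ICONV_VERSION are constants defined by PHP's iconv extension; iconvTranslitWorks() is a hypothetical helper, and 'ação' is just an example probe word.

```php
<?php
// Hypothetical helper (not part of PHP): checks whether iconv's ASCII//TRANSLIT
// turns "ação" into "acao" on this server. Depending on the implementation and
// locale, it may instead return "ac~ao", "?" placeholders, or false.
function iconvTranslitWorks(): bool
{
    $out = @iconv('UTF-8', 'ASCII//TRANSLIT', 'ação');
    return $out === 'acao';
}

// Same information as the "iconv" section of phpinfo().
echo 'iconv implementation: ', ICONV_IMPL, ' ', ICONV_VERSION, PHP_EOL;
echo 'TRANSLIT strips accents cleanly: ', iconvTranslitWorks() ? 'yes' : 'no', PHP_EOL;
```

If it prints "no", expect the ac~ao-style output described above, or "?" placeholders.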




I don't like files whose names contain special characters, so I always "clean" the names.

  function clearId($id){
     // Characters removed entirely. "." is intentionally NOT in this list,
     // otherwise the dot before the file extension would be stripped too.
     $LetraProibi = Array(" ",",","'","\"","&","|","!","#","$","¨","*","(",")","`","´","<",">",";","=","+","§","{","}","[","]","^","~","?","%");
     // Accented/special characters and their replacements, matched by position.
     $special = Array('Á','È','ô','Ç','á','è','Ò','ç','Â','Ë','ò','â','ë','Ø','Ñ','À','Ð','ø','ñ','à','ð','Õ','Å','õ','Ý','å','Í','Ö','ý','Ã','í','ö','ã',
        'Î','Ä','î','Ú','ä','Ì','ú','Æ','ì','Û','æ','Ï','û','ï','Ù','®','É','ù','©','é','Ó','Ü','Þ','Ê','ó','ü','þ','ê','Ô','ß','‘','’','‚','“','”','„');
     $clearspc = Array('a','e','o','c','a','e','o','c','a','e','o','a','e','o','n','a','d','o','n','a','o','o','a','o','y','a','i','o','y','a','i','o','a',
        'i','a','i','u','a','i','u','a','i','u','a','i','u','i','u','','e','u','c','e','o','u','p','e','o','u','b','e','o','b','','','','','','');
     $newId = str_replace($special, $clearspc, $id);
     $newId = str_replace($LetraProibi, "", trim($newId));
     return strtolower($newId);
  }

Usage:

$target_path = $destination_path . clearId(basename($_FILES['myfile']['name']));

PS: Depending on the encoding of your files, you may need to use clearId(utf8_encode($_FILES['myfile']['name'])).

  • Could you add an example of how to use the code above?

  • Using that clearId() function, just change $target_path = $destination_path . basename( $_FILES['myfile']['name'] ); to $target_path = $destination_path . clearId( basename( $_FILES['myfile']['name']) );

  • Added the usage.

  • Solved! It was just that; with some adjustments to the $LetraProibi = Array line it all worked. Thanks.

  • You're welcome. If this solved it, mark the answer as accepted to close the question.

  • Excellent answer!
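clearId() relies on two parallel arrays that must stay aligned index by index, and as first posted it also stripped the "." before the extension. Here is a minimal sketch of the same idea under different assumptions: one associative map applied with strtr(), with pathinfo() used to clean the name and the extension separately. cleanFileName() is a hypothetical name, and the map is deliberately shorter than the one above.

```php
<?php
// Hypothetical variant of clearId(): one associative map (no parallel arrays)
// applied with strtr(); the extension is cleaned separately so its "." survives.
function cleanFileName(string $name): string
{
    $map = [
        'á' => 'a', 'à' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'í' => 'i', 'ì' => 'i', 'î' => 'i', 'ï' => 'i',
        'ó' => 'o', 'ò' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o',
        'ú' => 'u', 'ù' => 'u', 'û' => 'u', 'ü' => 'u',
        'ç' => 'c', 'ñ' => 'n',
        'Á' => 'A', 'À' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A',
        'É' => 'E', 'È' => 'E', 'Ê' => 'E', 'Ë' => 'E',
        'Í' => 'I', 'Ì' => 'I', 'Î' => 'I', 'Ï' => 'I',
        'Ó' => 'O', 'Ò' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'O',
        'Ú' => 'U', 'Ù' => 'U', 'Û' => 'U', 'Ü' => 'U',
        'Ç' => 'C', 'Ñ' => 'N',
    ];

    $info = pathinfo($name);
    // Transliterate, lowercase, then keep only a-z, 0-9, "-" and "_".
    $base = strtolower(strtr($info['filename'], $map));
    $base = preg_replace('/[^a-z0-9_-]/', '', $base);

    // The extension (if any) is reduced to ASCII letters and digits.
    $ext = isset($info['extension'])
        ? preg_replace('/[^a-z0-9]/', '', strtolower($info['extension']))
        : '';

    return $ext === '' ? $base : $base . '.' . $ext;
}

echo cleanFileName('ação-íaaa.jpg'), PHP_EOL; // acao-iaaa.jpg
```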



I use code from the Germanix plugin, by one of the moderators of WordPress Development, who knows a lot about character encoding and internationalization/localization. First it runs html_entity_decode, then transliterates non-ASCII characters based on a very long and complete list, then converts to lowercase and removes every character that is not allowed, and finally collapses repeated permitted meta characters (-=+.) into one (e.g. ++ becomes +).

/**
 * Clean up the file name on upload
 * 
 * Sanitization test done with the filename:
 * ÄäÆæÀàÁáÂâÃãÅåªₐāĆćÇçÐđÈèÉéÊêËëₑƒğĞÌìÍíÎîÏïīıÑñⁿÒòÓóÔôÕõØøₒÖöŒœßŠšşŞ™ÙùÚúÛûÜüÝýÿŽž¢€‰№$℃°C℉°F⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉±×₊₌⁼⁻₋–—‑․‥…‧.png
 * @author toscho
 * @url    https://github.com/toscho/Germanix-WordPress-Plugin
 */
function t5f_sanitize_filename( $filename )
{

    $filename    = html_entity_decode( $filename, ENT_QUOTES, 'utf-8' );
    $filename    = t5f_translit( $filename );
    $filename    = t5f_lower_ascii( $filename );
    $filename    = t5f_remove_doubles( $filename );
    return $filename;
}

/**
 * Converts to lowercase and removes any character other than a-z, 0-9, "_", "." and "-".
 * https://github.com/toscho/Germanix-WordPress-Plugin
 *
 * @param  string $str Input string
 * @return string
 */
function t5f_lower_ascii( $str )
{
    $str     = strtolower( $str );
    $regex   = array(
        'pattern'        => '~([^a-z\d_.-])~'
        , 'replacement'  => ''
    );
    // Leave underscores, otherwise the taxonomy tag cloud in the
    // backend won’t work anymore.
    return preg_replace( $regex['pattern'], $regex['replacement'], $str );
}


/**
 * Reduces repeated meta characters (-=+.) to a single one.
 * https://github.com/toscho/Germanix-WordPress-Plugin
 *
 * @param  string $str Input string
 * @return string
 */
function t5f_remove_doubles( $str )
{
    $regex = array(
        'pattern'        => '~([=+.-])\\1+~'
        , 'replacement'  => "\\1"
    );
    return preg_replace( $regex['pattern'], $regex['replacement'], $str );
}


/**
 * Replaces non-ASCII characters.
 * https://github.com/toscho/Germanix-WordPress-Plugin
 *
 * Modified version of Heiko Rabe’s code.
 *
 * @author Heiko Rabe http://code-styling.de
 * @link   http://www.code-styling.de/?p=574
 * @param  string $str
 * @return string
 */
function t5f_translit( $str )
{
    $utf8 = array(
        'Ä'  => 'Ae'
        , 'ä'    => 'ae'
        , 'Æ'    => 'Ae'
        , 'æ'    => 'ae'
        , 'À'    => 'A'
        , 'à'    => 'a'
        , 'Á'    => 'A'
        , 'á'    => 'a'
        , 'Â'    => 'A'
        , 'â'    => 'a'
        , 'Ã'    => 'A'
        , 'ã'    => 'a'
        , 'Å'    => 'A'
        , 'å'    => 'a'
        , 'ª'    => 'a'
        , 'ₐ'    => 'a'
        , 'ā'    => 'a'
        , 'Ć'    => 'C'
        , 'ć'    => 'c'
        , 'Ç'    => 'C'
        , 'ç'    => 'c'
        , 'Ð'    => 'D'
        , 'đ'    => 'd'
        , 'È'    => 'E'
        , 'è'    => 'e'
        , 'É'    => 'E'
        , 'é'    => 'e'
        , 'Ê'    => 'E'
        , 'ê'    => 'e'
        , 'Ë'    => 'E'
        , 'ë'    => 'e'
        , 'ₑ'    => 'e'
        , 'ƒ'    => 'f'
        , 'ğ'    => 'g'
        , 'Ğ'    => 'G'
        , 'Ì'    => 'I'
        , 'ì'    => 'i'
        , 'Í'    => 'I'
        , 'í'    => 'i'
        , 'Î'    => 'I'
        , 'î'    => 'i'
        , 'Ï'    => 'Ii'
        , 'ï'    => 'ii'
        , 'ī'    => 'i'
        , 'ı'    => 'i'
        , 'İ'    => 'I' // turkish, correct?
        , 'Ñ'    => 'N'
        , 'ñ'    => 'n'
        , 'ⁿ'    => 'n'
        , 'Ò'    => 'O'
        , 'ò'    => 'o'
        , 'Ó'    => 'O'
        , 'ó'    => 'o'
        , 'Ô'    => 'O'
        , 'ô'    => 'o'
        , 'Õ'    => 'O'
        , 'õ'    => 'o'
        , 'Ø'    => 'O'
        , 'ø'    => 'o'
        , 'ₒ'    => 'o'
        , 'Ö'    => 'Oe'
        , 'ö'    => 'oe'
        , 'Œ'    => 'Oe'
        , 'œ'    => 'oe'
        , 'ß'    => 'ss'
        , 'Š'    => 'S'
        , 'š'    => 's'
        , 'ş'    => 's'
        , 'Ş'    => 'S'
        , '™'    => 'TM'
        , 'Ù'    => 'U'
        , 'ù'    => 'u'
        , 'Ú'    => 'U'
        , 'ú'    => 'u'
        , 'Û'    => 'U'
        , 'û'    => 'u'
        , 'Ü'    => 'Ue'
        , 'ü'    => 'ue'
        , 'Ý'    => 'Y'
        , 'ý'    => 'y'
        , 'ÿ'    => 'y'
        , 'Ž'    => 'Z'
        , 'ž'    => 'z'
        // misc
        , '¢'    => 'Cent'
        , '€'    => 'Euro'
        , '‰'    => 'promille'
        , '№'    => 'Nr'
        , '$'    => 'Dollar'
        , '℃'    => 'Grad Celsius'
        , '°C' => 'Grad Celsius'
        , '℉'    => 'Grad Fahrenheit'
        , '°F' => 'Grad Fahrenheit'
        // Superscripts
        , '⁰'    => '0'
        , '¹'    => '1'
        , '²'    => '2'
        , '³'    => '3'
        , '⁴'    => '4'
        , '⁵'    => '5'
        , '⁶'    => '6'
        , '⁷'    => '7'
        , '⁸'    => '8'
        , '⁹'    => '9'
        // Subscripts
        , '₀'    => '0'
        , '₁'    => '1'
        , '₂'    => '2'
        , '₃'    => '3'
        , '₄'    => '4'
        , '₅'    => '5'
        , '₆'    => '6'
        , '₇'    => '7'
        , '₈'    => '8'
        , '₉'    => '9'
        // Operators, punctuation
        , '±'    => 'plusminus'
        , '×'    => 'x'
        , '₊'    => 'plus'
        , '₌'    => '='
        , '⁼'    => '='
        , '⁻'    => '-' // sup minus
        , '₋'    => '-' // sub minus
        , '–'    => '-' // ndash
        , '—'    => '-' // mdash
        , '‑'    => '-' // non breaking hyphen
        , '․'    => '.' // one dot leader
        , '‥'    => '..'  // two dot leader
        , '…'    => '...'  // ellipsis
        , '‧'    => '.' // hyphenation point
        , "\xC2\xA0" => '-'   // nobreak space (U+00A0), written as UTF-8 bytes
        , ' '    => '-'   // normal space
    );

    $str = strtr( $str, $utf8 );
    return trim( $str, '-' );
}

Then just pass the file name to the main function:

t5f_sanitize_filename( $nome_do_arquivo );
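t5f_translit() depends on how strtr() behaves when given an array: the longest matching key wins, and text that has already been replaced is never scanned again. Here is a small demonstration; the '°' => 'deg' entry is hypothetical and only there to show the longest-match rule.

```php
<?php
// Two properties of strtr() with an array that t5f_translit() relies on:
// 1. longer keys are tried first, so '°C' wins over the shorter '°';
// 2. replaced text is not rescanned, so the space inside 'Grad Celsius'
//    is not turned into '-' by the ' ' => '-' entry.
function demoStrtr(string $input): string
{
    // '°' => 'deg' is a hypothetical entry used only for this demonstration.
    $map = ['°C' => 'Grad Celsius', '°' => 'deg', ' ' => '-'];
    return strtr($input, $map);
}

echo demoStrtr('20°C 5° x'), PHP_EOL; // 20Grad Celsius-5deg-x
```

In t5f_sanitize_filename() the result then passes through t5f_lower_ascii(), which lowercases it and drops the space, so "°C" ends up as "gradcelsius".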


Put this at the beginning of the script:

ini_set("default_charset","UTF-8");

or use:

$nome = utf8_encode($_FILES['myfile']['name']);

This should at least make the name come out right, but it does not remove the accents. If you want a unique file name without accents, you can do this:

$nome = md5(date("YmdHis").$_FILES['myfile']['name']).'.jpg';

  • I replaced <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> with <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /> on the page and the name came out right. Now I would like to remove the accents.

  • To remove the accents you need @Kaduamaral's function. But remember that afterwards, searching for the file by its original name will not find it. You can also do it with md5(); see the edit.
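The md5() snippet in this answer hardcodes ".jpg", so a PNG upload would get the wrong extension. Here is a sketch that assumes you want to keep whatever extension the client sent. uniqueUploadName() is a hypothetical name. As the comment above notes, you still need to store the original name somewhere if you want to look the file up by it later.

```php
<?php
// Hypothetical helper: unique, accent-free name built like the md5() answer,
// but keeping the uploaded file's own extension instead of hardcoding ".jpg".
function uniqueUploadName(string $originalName): string
{
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    // Keep only ASCII letters and digits in the extension.
    $ext = preg_replace('/[^a-z0-9]/', '', $ext);
    $hash = md5(date('YmdHis') . $originalName);
    return $ext === '' ? $hash : $hash . '.' . $ext;
}

echo uniqueUploadName('ação-íaaa.jpg'), PHP_EOL; // 32 hex characters + ".jpg"
```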


In that question, the alternative below was the most efficient; I believe this case is quite similar.

function replaceChar($str){
    $str = preg_replace('/[áàãâä]/ui', 'a', $str);
    $str = preg_replace('/[éèêë]/ui', 'e', $str);
    $str = preg_replace('/[íìîï]/ui', 'i', $str);
    $str = preg_replace('/[óòõôö]/ui', 'o', $str);
    $str = preg_replace('/[úùûü]/ui', 'u', $str);
    $str = preg_replace('/[ç]/ui', 'c', $str);
    // Anything other than letters, digits, ".", "-" and "_" becomes "_"
    // (the "." is kept so the file extension survives).
    $str = preg_replace('/[^a-z0-9._-]/i', '_', $str);
    $str = preg_replace('/_+/', '_', $str);
    return $str;
}


The solution below also solves the problem, and in a cleaner way:

$string = 'ÁÍÓÚÉÄÏÖÜËÀÌÒÙÈÃÕÂÎÔÛÊáíóúéäïöüëàìòùèãõâîôûêÇç'; // Input
$semAcentos = preg_replace('/[`^~\'"]/', '', iconv('UTF-8', 'ASCII//TRANSLIT', $string));
echo $semAcentos;

// Output: AIOUEAIOUEAIOUEAOAIOUEaioueaioueaioueaoaioueCc

Thanks to Carlos Coelho, from whom I obtained this solution.
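The one-liner above passes iconv()'s result straight to preg_replace(), but iconv() returns false when it cannot convert the input. Here is a sketch of a wrapper that checks for that, assuming UTF-8 input. removeAccentsTranslit() is a hypothetical name. Stripping the ` ^ ~ ' " characters also removes genuine apostrophes and quotes from the name.

```php
<?php
// Hypothetical wrapper around the iconv + preg_replace approach above.
// Assumes UTF-8 input; returns null when iconv rejects the string
// (for example, bytes that are not valid UTF-8).
function removeAccentsTranslit(string $str): ?string
{
    // The exact TRANSLIT output depends on the iconv implementation and locale.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $str);
    if ($ascii === false) {
        return null;
    }
    // Drop the ` ^ ~ ' " marks that glibc's iconv may leave behind ("ac~ao").
    return preg_replace('/[`^~\'"]/', '', $ascii);
}

var_dump(removeAccentsTranslit('ação-íaaa.jpg'));
```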


Complementing the existing answers:

There is a Unicode character block called Combining Diacritical Marks; these characters combine with the preceding letter to produce accents.

If the text contains any of these characters, none of the solutions presented removes those accents. There are two ways to deal with this:

1 - Remove the characters using a regular expression:

<?php
// removes only the characters in the range U+0300 to U+036F
$string = preg_replace('/[\x{0300}-\x{036f}]+/u', '', $string);
// or: removes every character in the Unicode "Mark" category (\p{M}), which includes that block
$string = preg_replace('/[\p{M}]+/u', '', $string);

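2 - Convert the combining marks into precomposed accented characters before removing the accents:

<?php
// Normalizer::FORM_C (the default form) composes e.g. "c" + U+0327 into "ç"
$string = normalizer_normalize($string, Normalizer::FORM_C);

This function is only available if the "intl" (internationalization) extension is enabled on the server.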
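The two options can also be combined the other way around: decompose to NFD (the opposite of the default NFC), which splits "ç" into "c" + U+0327, and then delete the marks. This is a swapped-in variant, not the answer's own method. It assumes PHP 7+ for the \u{...} escapes; stripCombiningMarks() is a hypothetical name, and without intl it only strips marks that are already separate.

```php
<?php
// Sketch: decompose to NFD (if intl is loaded), then delete nonspacing marks.
function stripCombiningMarks(string $str): string
{
    if (class_exists('Normalizer')) {
        // NFD splits "ç" into "c" + U+0327, turning the accent into a separate mark.
        $decomposed = Normalizer::normalize($str, Normalizer::FORM_D);
        if ($decomposed !== false) {
            $str = $decomposed;
        }
    }
    // \p{Mn} = nonspacing marks, which includes the U+0300..U+036F block.
    // preg_replace() returns null on invalid UTF-8; fall back to the input then.
    return preg_replace('/\p{Mn}+/u', '', $str) ?? $str;
}

echo stripCombiningMarks("ac\u{0327}a\u{0303}o"), PHP_EOL; // acao
```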


The simplest and most efficient way to remove accents is to transliterate the characters with PHP's built-in iconv function:

setlocale(LC_CTYPE, 'pt_BR'); // global (can be LC_ALL); the locale must be installed on the server

function unaccent($str){
    return iconv('UTF-8', 'ASCII//TRANSLIT', $str);
}

iconv is a standardized, mature function with high performance and reliability; in general it calls a common library function of the operating system (on Linux, Windows and other systems).
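setlocale() returns false when the requested locale is not installed, and a bare 'pt_BR' without a charset suffix is often missing on Linux servers. Here is a sketch that tries a few candidate names and restores the previous locale afterwards, since the locale is process-wide. unaccentWithLocale() and the candidate list are assumptions, and the string|false return type needs PHP 8.

```php
<?php
// Sketch: the iconv() approach above with the locale set defensively.
function unaccentWithLocale(string $str): string|false
{
    $previous = setlocale(LC_CTYPE, '0'); // "0" only queries the current value
    // Returns the first candidate that exists, or false if none does;
    // in that case TRANSLIT may output "?" for accented letters.
    setlocale(LC_CTYPE, ['pt_BR.UTF-8', 'pt_BR.utf8', 'C.UTF-8']);
    $result = @iconv('UTF-8', 'ASCII//TRANSLIT', $str);
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous); // restore: the locale is process-wide
    }
    return $result;
}

var_dump(unaccentWithLocale('ação-íaaa.jpg'));
```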


function removeAcentos($string, $slug = false) {
    // Convert UTF-8 input to ISO-8859-1 so each accented letter is a single byte
    if (mb_detect_encoding($string . 'x', 'UTF-8, ISO-8859-1') == 'UTF-8') {
        $string = utf8_decode(strtolower($string));
    }
    // ISO-8859-1 codes of the accented letters, grouped by replacement letter
    $ascii['a'] = range(224, 230);
    $ascii['e'] = range(232, 235);
    $ascii['i'] = range(236, 239);
    $ascii['o'] = array_merge(range(242, 246), array(240, 248));
    $ascii['u'] = range(249, 252);
    $ascii['b'] = array(223);
    $ascii['c'] = array(231);
    $ascii['d'] = array(208);
    $ascii['n'] = array(241);
    $ascii['y'] = array(253, 255);
    // Build one character-class regex per replacement letter
    $troca = array();
    foreach ($ascii as $key => $item) {
        $acentos = '';
        foreach ($item as $codigo) $acentos .= chr($codigo);
        $troca[$key] = '/[' . $acentos . ']/i';
    }
    $string = preg_replace(array_values($troca), array_keys($troca), $string);
    // Optionally turn the result into a slug, using $slug as the separator
    if ($slug) {
        $string = preg_replace('/[^a-z0-9]/i', $slug, $string);
        $string = preg_replace('/' . $slug . '{2,}/i', $slug, $string);
        $string = trim($string, $slug);
    }
    return $string;
}
echo removeAcentos("Palavras com acentuação");
echo removeAcentos("Palavras com acentuação", "_");


Here is a function that removes accents using regular expressions; it is much simpler and more compact.

<?php
// Works on ISO-8859-1 (Latin-1) strings: each pattern matches single Latin-1 bytes.
function removerAcentos( $string ) {
    $mapaAcentosHex  = array(
        'a'=> '/[\xE0-\xE6]/',
        'A'=> '/[\xC0-\xC6]/',
        'e'=> '/[\xE8-\xEB]/',
        'E'=> '/[\xC8-\xCB]/',
        'i'=> '/[\xEC-\xEF]/',
        'I'=> '/[\xCC-\xCF]/',
        'o'=> '/[\xF2-\xF6]/',
        'O'=> '/[\xD2-\xD6]/',
        'u'=> '/[\xF9-\xFC]/',
        'U'=> '/[\xD9-\xDC]/',
        'c'=> '/\xE7/',
        'C'=> '/\xC7/',
        'n'=> '/\xF1/',
        'N'=> '/\xD1/'
    );
    foreach ($mapaAcentosHex as $letra => $expressaoRegular) {
        $string = preg_replace( $expressaoRegular, $letra, $string);
    }
    return $string;
}
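Byte patterns such as /[\xE0-\xE6]/ match raw ISO-8859-1 bytes, so a UTF-8 file name (the usual case for uploads from a UTF-8 page) has to be converted first. Here is a sketch that assumes the mbstring extension is available. removeAccentsViaLatin1() is a hypothetical name, and the uppercase entries use the Latin-1 uppercase ranges (\xC0 to \xDC).

```php
<?php
// Sketch: convert UTF-8 to ISO-8859-1, then replace accented bytes.
// Characters with no Latin-1 equivalent become "?" during the conversion.
function removeAccentsViaLatin1(string $utf8): string
{
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    $map = [
        'a' => '/[\xE0-\xE6]/', 'A' => '/[\xC0-\xC6]/',
        'e' => '/[\xE8-\xEB]/', 'E' => '/[\xC8-\xCB]/',
        'i' => '/[\xEC-\xEF]/', 'I' => '/[\xCC-\xCF]/',
        'o' => '/[\xF2-\xF6]/', 'O' => '/[\xD2-\xD6]/',
        'u' => '/[\xF9-\xFC]/', 'U' => '/[\xD9-\xDC]/',
        'c' => '/\xE7/',        'C' => '/\xC7/',
        'n' => '/\xF1/',        'N' => '/\xD1/',
    ];
    // preg_replace() applies each pattern in order, replacing with its key letter.
    return preg_replace(array_values($map), array_keys($map), $latin1);
}

echo removeAccentsViaLatin1('ação-íaaa.jpg'), PHP_EOL; // acao-iaaa.jpg
```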


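Even though there are several answers already, here is another approach that I learned and found very neat:

function remover_acentos($str) {
    // htmlentities() turns "ç" into "&ccedil;"; the regex keeps only the first letter
    return preg_replace("/&([a-z])[a-z]+;/i", "$1", htmlentities($str, ENT_QUOTES, 'UTF-8'));
}

echo remover_acentos("ação-íaaa.jpg"); // acao-iaaa.jpg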
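The pattern above also matches entities that are not accents: htmlentities() turns "&" into "&amp;", which the regex reduces to "a". Here is a sketch that only strips the entity suffixes used for diacritics and decodes everything else back. removeAccentsEntities() is a hypothetical name. The suffix list (acute, grave, circ, tilde, uml, cedil, ring, slash) is an assumption covering the common Latin-1 letters, so "æ" and "ß" are left unchanged.

```php
<?php
// Sketch: like remover_acentos(), but only diacritic entities are reduced.
function removeAccentsEntities(string $str): string
{
    $encoded = htmlentities($str, ENT_QUOTES, 'UTF-8');
    // "&ccedil;" -> "c", "&atilde;" -> "a"; "&amp;" is left alone.
    $stripped = preg_replace('/&([a-z])(acute|grave|circ|tilde|uml|cedil|ring|slash);/i', '$1', $encoded);
    // Turn the remaining entities (&amp;, &quot;, ...) back into characters.
    return html_entity_decode($stripped, ENT_QUOTES, 'UTF-8');
}

echo removeAccentsEntities('ação & cia.jpg'), PHP_EOL; // acao & cia.jpg
```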


A simpler way:

function retirarAcentos($pTxt){
    $pTxt = str_replace('ç','c',$pTxt);
    $pTxt = str_replace('ã','a',$pTxt);
    return $pTxt;
}

The example only replaces "ç" with "c" and "ã" with "a"; just repeat these lines for every other accented character you want to replace.
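Instead of one str_replace() line per character, str_replace() also accepts arrays, replacing each element of the first array with the element at the same position in the second. Here is a sketch; the character list is only an example and removeAccentsList() is a hypothetical name.

```php
<?php
// Sketch: one str_replace() call; $from[i] is replaced by $to[i].
function removeAccentsList(string $txt): string
{
    $from = ['ç', 'ã', 'á', 'à', 'â', 'é', 'ê', 'í', 'ó', 'ô', 'õ', 'ú', 'ü'];
    $to   = ['c', 'a', 'a', 'a', 'a', 'e', 'e', 'i', 'o', 'o', 'o', 'u', 'u'];
    return str_replace($from, $to, $txt);
}

echo removeAccentsList('ação-íaaa.jpg'), PHP_EOL; // acao-iaaa.jpg
```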
