Comparing strings with accents, UTF-8

Viewed 1,486 times

2

Something is escaping me here. I am using cURL to fetch a weather page. When the result contains accented characters and I compare it with exactly the same string written out in full, the comparison returns false (not equal). This is only for testing:

function get_page($url) {

   $curl = curl_init();
   curl_setopt($curl, CURLOPT_RETURNTRANSFER, True);
   curl_setopt($curl, CURLOPT_URL, $url);
   /*curl_setopt($curl, CURLOPT_TIMEOUT_MS, 1000);*/
   $return = curl_exec($curl);
   curl_close($curl);
  return $return;

}

$weather = get_page("http://www.accuweather.com/pt/pt/cascais/274007/weather-forecast/274007");

preg_match('/<span class="cond">(.*?)<\/span>/s', $weather, $cond);
preg_match('/<strong class="temp">(.*?)<span>/s', $weather, $temp);
$condition = trim($cond[1]); // Céu Limpo (today)
$temp = trim($temp[1]); // 27 (today)

In today's case (30-06-2015) the condition is "Céu Limpo" (clear sky), but when I test the following condition:

if(strtolower($condition) == "céu limpo") {
   ....
}

It returns false (the code inside the if is not executed).

But if I do this:

$hey = "Céu Limpo";
if(strtolower($hey) == "céu limpo") {
   ....
}

It returns true and the code inside the condition is executed. I would like to know why this happens and how to solve it.

  • Do an echo $condition to see what you are getting

  • It's printing "Céu Limpo"

2 answers

2


Your problem is related to HTML entities. If you run this:

$arrayCondition = str_split($condition);
$arrayString = str_split("Céu Limpo");
var_dump($arrayCondition);
var_dump($arrayString);

You'll notice the difference in the output:

array(14) { [0]=> string(1) "C" [1]=> string(1) "&" [2]=> string(1) "#" [3]=> string(1) "2" [4]=> string(1) "3" [5]=> string(1) "3" [6]=> string(1) ";" [7]=> string(1) "u" [8]=> string(1) " [9]=> string(1) "L" [10]=> string(1) "i" [11]=> string(1) "m" [12]=> string(1) "p" [13]=> string(1) "o" }

array(10) { [0]=> string(1) "C" [1]=> string(1) " [2]=> string(1) " [3]=> string(1) "u" [4]=> string(1) " [5]=> string(1) "L" [6]=> string(1) "i" [7]=> string(1) "m" [8]=> string(1) "p" [9]=> string(1) "o" }

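The first string, the one that comes from cURL, contains an HTML entity: the é arrives as &#233;. In your hand-written string the é is a literal character, which takes two bytes in UTF-8 and does not print on its own when the string is split byte by byte.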
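A quick way to confirm this without dumping whole arrays is to compare lengths and raw bytes. This is only a sketch: the value in $fromCurl is an assumed sample copied from the dump above, not a live response, and the "\u{E9}" escape needs PHP 7.0 or later.

```php
<?php
// Assumed sample: the scraped value as it appears in the first var_dump.
$fromCurl = "C&#233;u Limpo";

// The hand-written string, with é given as code point U+00E9 (encoded as UTF-8).
$literal = "C\u{E9}u Limpo";

echo strlen($fromCurl), "\n";                // 14 bytes: the entity alone takes 6
echo strlen($literal), "\n";                 // 10 bytes: é takes 2 bytes in UTF-8
echo bin2hex(substr($literal, 1, 2)), "\n";  // c3a9, the UTF-8 bytes of é
```

The two strings look the same once a browser renders them, but they are different byte sequences, so == reports them as different.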

To solve this you can use html_entity_decode, for example:

if (strtolower(html_entity_decode($condition)) == "céu limpo") {
    echo 'it worked!!!';
}
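A slightly more defensive variant, as a sketch only: it passes the charset to html_entity_decode explicitly instead of relying on the default, and lowercases with mb_strtolower so that accented capitals are handled as well. The helper name normalize_condition and the sample value are hypothetical, and the mbstring extension is assumed to be installed.

```php
<?php
// Hypothetical helper: decode HTML entities, then lowercase in a UTF-8-aware way.
function normalize_condition(string $raw): string
{
    // Decode numeric and named entities into UTF-8 characters.
    $decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // mb_strtolower works on characters rather than single bytes.
    return mb_strtolower(trim($decoded), 'UTF-8');
}

// Assumed sample value, matching what the var_dump of the cURL result showed.
$condition = "C&#233;u Limpo";

if (normalize_condition($condition) === "céu limpo") {
    echo "it worked\n";
}
```

The comparison only holds if this source file is itself saved as UTF-8, since the literal "céu limpo" is compared byte by byte.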

1

The problem probably occurs because the cURL response is not in UTF-8.

Try converting it to UTF-8 before making the comparison.

Functions that can be used: utf8_encode or utf8_decode

Example:

if(strtolower(utf8_encode($condition)) == "céu limpo") {
....
}
// If that doesn't work, try the reverse; sometimes your own file is not in UTF-8

if(strtolower(utf8_decode($condition)) == "céu limpo") {
....
}
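For the cases where the response really is in another encoding, here is a hedged sketch that converts only when needed. It assumes any input that is not valid UTF-8 is ISO-8859-1, which is the same assumption utf8_encode makes; the helper name to_utf8 and the sample bytes are hypothetical. It uses mb_convert_encoding from the mbstring extension because utf8_encode and utf8_decode are deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: return $text as UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function to_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

// Assumed sample: "Céu Limpo" encoded as ISO-8859-1 (é is the single byte 0xE9).
$latin1 = "C\xE9u Limpo";

if (mb_strtolower(to_utf8($latin1), 'UTF-8') === "c\u{E9}u limpo") {
    echo "match after conversion\n";
}
```

Checking first avoids double-encoding a string that is already UTF-8, which is what happens when utf8_encode is applied blindly.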
  • In this case it does not work, but it is a good solution for other cases. Thank you
