PHP character decoding on an IMAP connection

Introduction

I’m working on a mailbox whose messages I will later have to filter by sender. The problem is the encoding of some of the subjects.

  1. I connect to the mail server with the imap_open function:

    $mail_box = imap_open("{" . $incoming_server . ":" . $port . "/imap/ssl/novalidate-cert}INBOX", $username, $password) or die();
    
  2. Then I fetch the header information with the imap_headerinfo function:

    $header = imap_headerinfo($mail_box, $num_da_mensagem);
    

Between these two steps I don’t manipulate anything; everything is handled internally by PHP itself.

Difficulty

The problem is that when I print_r the $header object (imap_headerinfo returns an object, so the field is $header->subject), the subject of some records comes back as an encoded string like this:

[subject] => =?utf-8?B?UkVTOiBSRVM6IFtFWFRFUk5BTF0gUmU6IEluZm9ybWHDp8O1ZXMgc29icmUg?= =?utf-8?B?YSBBdGl2YcOnw6NvIGRvcyBQcm9kdXRvcyBlIFNlcnZpw6dvcyBDb250cmF0?= =?utf-8?Q?ados_-_WJINTERNET?=
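Each =?...?= chunk in that string is an RFC 2047 "encoded-word" with the layout =?charset?encoding?encoded-text?=, where the encoding letter is B (Base64) or Q (a quoted-printable variant in which _ stands for a space). To make that structure concrete, here is a hand-written decoder sketch. decode_encoded_words() is a hypothetical helper, not a PHP built-in, and it ignores finer points of the RFC such as language suffixes on the charset and multi-byte characters split across words in charsets other than UTF-8.

```php
<?php
// Hypothetical helper (not a PHP built-in): decodes RFC 2047 encoded-words
// of the form =?charset?encoding?encoded-text?= inside a header value.
function decode_encoded_words(string $header): string
{
    $word = '=\?([^?]+)\?([BbQq])\?([^?]*)\?=';

    // Whitespace between two adjacent encoded-words is not part of the text.
    $header = preg_replace('/(?<=\?=)\s+(?=' . $word . ')/', '', $header);

    $decoded = preg_replace_callback('/' . $word . '/', function (array $m): string {
        [, $charset, $encoding, $text] = $m;

        if (strtoupper($encoding) === 'B') {
            // B encoding: plain Base64.
            $bytes = base64_decode($text);
        } else {
            // Q encoding: "_" stands for a space, =XX is a hex-encoded byte.
            $bytes = quoted_printable_decode(str_replace('_', ' ', $text));
        }

        if (strtoupper($charset) === 'UTF-8') {
            return $bytes;
        }

        // Other charsets are converted to UTF-8 (needs the iconv extension).
        $converted = iconv($charset, 'UTF-8', $bytes);
        return $converted === false ? $bytes : $converted;
    }, $header);

    return $decoded ?? $header;
}

// The Subject from the question, split only to keep the lines short.
$raw = '=?utf-8?B?UkVTOiBSRVM6IFtFWFRFUk5BTF0gUmU6IEluZm9ybWHDp8O1ZXMgc29icmUg?= '
     . '=?utf-8?B?YSBBdGl2YcOnw6NvIGRvcyBQcm9kdXRvcyBlIFNlcnZpw6dvcyBDb250cmF0?= '
     . '=?utf-8?Q?ados_-_WJINTERNET?=';

echo decode_encoded_words($raw), "\n";
```

Plain subjects contain no =?...?= chunk and pass through untouched, which is why the same call can be applied to every record.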

To decode it I tried htmlentities and also a custom function, shown below:

// Converts $string to $to_encoding, detecting the source encoding when none is given.
function convert_encoding($string, $to_encoding, $from_encoding = '')
{
    if ($from_encoding == '') {
        // Plain function call: $this is only valid inside a class method.
        $from_encoding = detect_encoding($string);
    }

    if ($from_encoding == $to_encoding) {
        return $string;
    }

    return mb_convert_encoding($string, $to_encoding, $from_encoding);
}

// Returns 'UTF-8' for well-formed UTF-8 input, otherwise lets mbstring guess.
function detect_encoding($string)
{
    if (preg_match('%^(?: [\x09\x0A\x0D\x20-\x7E] | [\xC2-\xDF][\x80-\xBF] | \xE0[\xA0-\xBF][\x80-\xBF] | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} | \xED[\x80-\x9F][\x80-\xBF] | \xF0[\x90-\xBF][\x80-\xBF]{2} | [\xF1-\xF3][\x80-\xBF]{3} | \xF4[\x80-\x8F][\x80-\xBF]{2} )*$%xs', $string)) {
        return 'UTF-8';
    }

    return mb_detect_encoding($string, array('UTF-8', 'ASCII', 'ISO-8859-1', 'JIS', 'EUC-JP', 'SJIS'));
}

So the call looks like this: convert_encoding($header->subject, 'UTF-8');
But... nothing happens. I suspect this is because the text is not in a different character encoding but in some predefined format. I should also say that I still don’t understand why some records come through normally and others like this.
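That suspicion can be checked with a quick sketch (the variable names are illustrative). The encoded Subject consists only of ASCII bytes, which are also valid UTF-8, so the regex in detect_encoding() returns 'UTF-8', the source and target encodings are equal, and convert_encoding() hands the string back unchanged. The Base64 and quoted-printable payloads inside it are only decoded by a MIME-aware function.

```php
<?php
// The raw Subject exactly as it arrives from the server.
$raw = '=?utf-8?B?UkVTOiBSRVM6IFtFWFRFUk5BTF0gUmU6IEluZm9ybWHDp8O1ZXMgc29icmUg?= '
     . '=?utf-8?B?YSBBdGl2YcOnw6NvIGRvcyBQcm9kdXRvcyBlIFNlcnZpw6dvcyBDb250cmF0?= '
     . '=?utf-8?Q?ados_-_WJINTERNET?=';

// true when no byte lies outside the 7-bit ASCII range 0x00-0x7F.
$is_ascii = preg_match('/[^\x00-\x7F]/', $raw) === 0;

// An empty pattern with the /u modifier matches only valid UTF-8 input.
$is_valid_utf8 = preg_match('//u', $raw) === 1;

// Both are true: ASCII is a subset of UTF-8, so a UTF-8 check such as
// detect_encoding() accepts the string and no conversion takes place.
var_dump($is_ascii, $is_valid_utf8);
```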

What I need

  1. I’d like to know why some messages arrive with an encoded Subject and others don’t. Understanding the root of the problem could help me look at it from a different angle and reach a viable solution.

  2. If it is a purely technical problem, what technique can I use to convert this encoding into something readable?

1 answer


Every e-mail Subject that contains special (non-ASCII) characters, such as accented letters or the cedilla, is encoded.

I don’t know exactly why this encoding is done, because I’m not familiar with the SMTP protocol; I may study it later and improve this answer.
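For what it’s worth, the mechanism is the MIME header encoding defined in RFC 2047 rather than anything specific to SMTP: header fields are limited to US-ASCII (RFC 5322), so a sending client rewrites header text containing characters such as ç or õ as one or more encoded-words, while plain-ASCII subjects are usually sent as-is. That would explain why only some records arrive encoded. The sketch below imitates the sending side with PHP’s iconv_mime_encode(); the chosen options (Base64 scheme, UTF-8 in and out) are assumptions for illustration.

```php
<?php
// Sketch of the sending side: a non-ASCII Subject becomes an encoded-word
// before it is written into the message header.
$encoded = iconv_mime_encode('Subject', 'Informações', [
    'scheme'         => 'B',      // Base64; 'Q' gives the quoted-printable form
    'input-charset'  => 'UTF-8',
    'output-charset' => 'UTF-8',
]);

// Only ASCII bytes remain, in the =?UTF-8?B?...?= form seen in the question.
echo $encoded, "\n";

// Decoding restores the original text, header name included.
echo iconv_mime_decode($encoded, 0, 'UTF-8'), "\n";
```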

There is a PHP function that does all the heavy lifting of the decoding:

iconv_mime_decode

See the result:

<?php
$assunto = '=?utf-8?B?UkVTOiBSRVM6IFtFWFRFUk5BTF0gUmU6IEluZm9ybWHDp8O1ZXMgc29icmUg?= =?utf-8?B?YSBBdGl2YcOnw6NvIGRvcyBQcm9kdXRvcyBlIFNlcnZpw6dvcyBDb250cmF0?= =?utf-8?Q?ados_-_WJINTERNET?=';

echo iconv_mime_decode($assunto);
?>

The code above returns:

RES: RES: [EXTERNAL] Re: Informações sobre a Ativação dos Produtos e Serviços Contratados - WJINTERNET
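To plug this into the code from the question, a small wrapper can be called on every header. This is a sketch: decode_subject() is a hypothetical name, ICONV_MIME_DECODE_CONTINUE_ON_ERROR keeps one malformed encoded-word from aborting the whole decode, and the raw value is kept if iconv_mime_decode() returns false. Since imap_headerinfo() returns an object, the subject is read as $header->subject rather than with array syntax.

```php
<?php
// Hypothetical wrapper: turns a raw Subject header into readable UTF-8 text.
function decode_subject(string $raw): string
{
    // Plain subjects pass through iconv_mime_decode() unchanged,
    // so the same call works for both kinds of record.
    $decoded = iconv_mime_decode($raw, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');

    // iconv_mime_decode() returns false on failure; keep the raw value then.
    return $decoded === false ? $raw : $decoded;
}

// Inside the question's loop (not run here, it needs ext-imap and a server):
// $header  = imap_headerinfo($mail_box, $num_da_mensagem);
// $subject = decode_subject($header->subject ?? '');

echo decode_subject('=?utf-8?Q?Servi=C3=A7os?='), "\n";
echo decode_subject('Plain ASCII subject'), "\n";
```

PHP also provides mb_decode_mimeheader() (mbstring) and imap_utf8() (imap) for the same format; iconv_mime_decode() lets you pass the output charset explicitly as its third argument.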

References: http://php.net/manual/en/function.iconv-mime-decode.php

  • Great, I learned something new. Thank you so much for sharing this knowledge.
