Problems with str_pad function and accentuation

Question

Asked 8 years, 11 months ago

Viewed 427 times

8

I'm using the str_pad function to left-pad a string with 0 characters up to a total length of 10.

It works perfectly, as in this example:

echo str_pad("dda", 10, "0", STR_PAD_LEFT);

It prints 0000000dda.

The problem occurs when the string contains accented characters, for example:

echo str_pad("ddã", 10, "0", STR_PAD_LEFT);

Instead of printing 0000000ddã it prints 000000ddã, so one 0 is lost. Does anyone know how to fix this?

@Victorstafusa do you know what the problem could be?

– Hugo Borges

2016/09/12 at 20:25
What is the file encoding? The thing is that str_pad() works on bytes, not characters. Accented characters take up 2 or more bytes, so the final string ends up one character short. I ran a test here applying utf8_decode() to the first argument and it worked. There should be a better way to solve this, though.

– rray

2016/09/12 at 20:31
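
The utf8_decode() workaround from the comment above can be sketched as follows. Since utf8_decode() is deprecated as of PHP 8.2, this sketch swaps in mb_convert_encoding() for the same conversion. It assumes the source file is saved as UTF-8 and that every character in the input exists in ISO-8859-1; anything outside that range would be lost, so treat it as an illustration rather than a general fix.

```php
<?php
// Convert to a single-byte encoding, pad there, then convert back.
// Assumption: the input only contains characters available in ISO-8859-1.
$latin1 = mb_convert_encoding("ddã", 'ISO-8859-1', 'UTF-8'); // 3 bytes, one per character
$padded = str_pad($latin1, 10, "0", STR_PAD_LEFT);           // byte count now equals character count
$utf8   = mb_convert_encoding($padded, 'UTF-8', 'ISO-8859-1');

echo $utf8, "\n"; // 0000000ddã
```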

2 answers

7

The problem is that str_pad assumes each character occupies one byte. When the string contains characters longer than one byte (such as ã), the function overestimates the string's length and adds too little padding.

On the English Stack Overflow there is a question about this, and it has four answers. Judging by the comments, two of the answers have problems (including the accepted one) and the other two should be adequate (I did not test them, however). All the answers given there consist of writing a different function that can handle multi-byte characters.

Here is Wes's solution:
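
To make the arithmetic concrete, here is a small sketch of what str_pad measures for the string from the question. It assumes the file is saved as UTF-8, where ã takes two bytes; the variable names are only illustrative.

```php
<?php
// Assumes this file is saved as UTF-8, so "ã" is stored as two bytes.
$input  = "ddã";
$target = 10;

$bytes = strlen($input);             // 4: this is the length str_pad works with
$chars = mb_strlen($input, 'UTF-8'); // 3: this is what a reader sees

$padded = str_pad($input, $target, "0", STR_PAD_LEFT);

echo $target - $bytes, "\n";            // 6 zeros are added, not 7
echo strlen($padded), "\n";             // 10 bytes, as requested
echo mb_strlen($padded, 'UTF-8'), "\n"; // 9 visible characters
```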

function mb_str_pad($str, $pad_len, $pad_str = ' ', $dir = STR_PAD_RIGHT, $encoding = NULL)
{
    $encoding = $encoding === NULL ? mb_internal_encoding() : $encoding;
    $padBefore = $dir === STR_PAD_BOTH || $dir === STR_PAD_LEFT;
    $padAfter = $dir === STR_PAD_BOTH || $dir === STR_PAD_RIGHT;
    $pad_len -= mb_strlen($str, $encoding);
    $targetLen = $padBefore && $padAfter ? $pad_len / 2 : $pad_len;
    $strToRepeatLen = mb_strlen($pad_str, $encoding);
    $repeatTimes = ceil($targetLen / $strToRepeatLen);
    $repeatedString = str_repeat($pad_str, max(0, $repeatTimes)); // safe if used with valid unicode sequences (any charset)
    $before = $padBefore ? mb_substr($repeatedString, 0, floor($targetLen), $encoding) : '';
    $after = $padAfter ? mb_substr($repeatedString, 0, ceil($targetLen), $encoding) : '';
    return $before . $str . $after;
}

Here is Ja͢ck's solution:

function mb_str_pad($input, $pad_length, $pad_string = ' ', $pad_type = STR_PAD_RIGHT, $encoding = 'UTF-8')
{
    $input_length = mb_strlen($input, $encoding);
    $pad_string_length = mb_strlen($pad_string, $encoding);

    if ($pad_length <= 0 || ($pad_length - $input_length) <= 0) {
        return $input;
    }

    $num_pad_chars = $pad_length - $input_length;

    switch ($pad_type) {
        case STR_PAD_RIGHT:
            $left_pad = 0;
            $right_pad = $num_pad_chars;
            break;

        case STR_PAD_LEFT:
            $left_pad = $num_pad_chars;
            $right_pad = 0;
            break;

        case STR_PAD_BOTH:
            $left_pad = floor($num_pad_chars / 2);
            $right_pad = $num_pad_chars - $left_pad;
            break;
    }

    $result = '';
    for ($i = 0; $i < $left_pad; ++$i) {
        $result .= mb_substr($pad_string, $i % $pad_string_length, 1, $encoding);
    }
    $result .= $input;
    for ($i = 0; $i < $right_pad; ++$i) {
        $result .= mb_substr($pad_string, $i % $pad_string_length, 1, $encoding);
    }

    return $result;
}
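
One caveat when copying either function: PHP 8.3 added a built-in mb_str_pad(), and declaring a user function with the same name there is a fatal error. A possible way around that, sketched below, is a wrapper under a different name that delegates to the built-in when it exists and otherwise pads by character count. The name pad_chars is hypothetical, chosen only for this sketch, and the fallback is a simplified variant of the approaches above, not a copy of either.

```php
<?php
// pad_chars() is a hypothetical helper name, not a PHP built-in.
function pad_chars(string $input, int $length, string $pad = ' ', int $type = STR_PAD_RIGHT, string $encoding = 'UTF-8'): string
{
    // Prefer an existing mb_str_pad() (built in since PHP 8.3).
    if (function_exists('mb_str_pad')) {
        return mb_str_pad($input, $length, $pad, $type, $encoding);
    }

    $padLen  = mb_strlen($pad, $encoding);
    $missing = $length - mb_strlen($input, $encoding);
    // Assumption of this sketch: an empty pad string returns the input unchanged.
    if ($missing <= 0 || $padLen === 0) {
        return $input;
    }

    // Same split as str_pad: for STR_PAD_BOTH the extra character goes on the right.
    if ($type === STR_PAD_LEFT) {
        $left = $missing;
    } elseif ($type === STR_PAD_BOTH) {
        $left = intdiv($missing, 2);
    } else {
        $left = 0;
    }
    $right = $missing - $left;

    // Repeat the pad string enough times, then cut on a character boundary.
    $fill = function (int $n) use ($pad, $padLen, $encoding): string {
        return $n > 0 ? mb_substr(str_repeat($pad, intdiv($n, $padLen) + 1), 0, $n, $encoding) : '';
    };

    return $fill($left) . $input . $fill($right);
}

echo pad_chars("ddã", 10, "0", STR_PAD_LEFT), "\n"; // 0000000ddã
```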

by stderr • **30,356** points · Answer 1 · 2016-09-12T20:44:08+00:00

This happens because ã is a multi-byte character, see:

echo strlen("a"); // 1
echo strlen("ã"); // 2

str_pad counts ã as two bytes rather than as a single multi-byte character. To work around this, use mb_strlen to measure the string, so that ã is counted as one character, see:

echo mb_strlen("a"); // 1
echo mb_strlen("ã"); // 1

You can implement mb_str_pad like this (credits):

function mb_str_pad( $input, $pad_length, $pad_string = ' ', $pad_type = STR_PAD_RIGHT, $encoding = "UTF-8") {
    $diff = strlen( $input ) - mb_strlen($input, $encoding);
    return str_pad( $input, $pad_length + $diff, $pad_string, $pad_type );
}

Use like this:

echo mb_str_pad("ddã", 10, "0", STR_PAD_LEFT); // 0000000ddã
echo str_pad("ddã", 10, "0", STR_PAD_LEFT);    // 000000ddã

See DEMO
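
One limitation of the byte-difference trick is worth checking before relying on it: it only corrects for the input string, so a multi-byte pad string is still handled in bytes by str_pad. The sketch below reproduces the function above under the hypothetical name byte_diff_pad (to avoid clashing with the built-in mb_str_pad in PHP 8.3+) and assumes the file is saved as UTF-8.

```php
<?php
// byte_diff_pad() is a hypothetical name; the body mirrors the byte-difference approach above.
function byte_diff_pad(string $input, int $length, string $pad = ' ', int $type = STR_PAD_RIGHT, string $encoding = 'UTF-8'): string
{
    $diff = strlen($input) - mb_strlen($input, $encoding);
    return str_pad($input, $length + $diff, $pad, $type);
}

$ok  = byte_diff_pad("ddã", 10, "0", STR_PAD_LEFT); // single-byte pad string
$bad = byte_diff_pad("dd", 5, "ã", STR_PAD_RIGHT);  // two-byte pad string, 3 bytes of padding needed

var_dump($ok === "0000000ddã");             // bool(true)
var_dump(mb_check_encoding($bad, 'UTF-8')); // bool(false): the last "ã" was cut in half
```

With a single-byte pad string such as "0" the trick gives the expected result; for multi-byte pad strings, one of the character-based functions from the other answer is the safer choice.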