str_pad does not work with multi-byte characters, because it builds the padding one byte at a time instead of one character at a time.
The relevant part of the original C implementation of str_pad is exactly:
    switch (pad_type_val) {
        case STR_PAD_RIGHT:
            left_pad = 0;
            right_pad = num_pad_chars;
            break;
        case STR_PAD_LEFT:
            left_pad = num_pad_chars;
            right_pad = 0;
            break;
        case STR_PAD_BOTH:
            left_pad = num_pad_chars / 2;
            right_pad = num_pad_chars - left_pad;
            break;
    }

    /* First we pad on the left. */
    for (i = 0; i < left_pad; i++)
        ZSTR_VAL(result)[ZSTR_LEN(result)++] = pad_str[i % pad_str_len];

    /* Then we copy the input string. */
    memcpy(ZSTR_VAL(result) + ZSTR_LEN(result), ZSTR_VAL(input), ZSTR_LEN(input));
    ZSTR_LEN(result) += ZSTR_LEN(input);

    /* Finally, we pad on the right. */
    for (i = 0; i < right_pad; i++)
        ZSTR_VAL(result)[ZSTR_LEN(result)++] = pad_str[i % pad_str_len];

    ZSTR_VAL(result)[ZSTR_LEN(result)] = '\0';
    RETURN_NEW_STR(result);
Source.
Note the i % pad_str_len: each iteration copies a single byte, so a multi-byte character can be cut in the middle and leave an orphaned byte behind. For example, chr(160) is a valid character in Latin1, but not in UTF8.
In Latin1, the byte A0 represents a "non-breaking space". The same character in UTF8 takes two bytes, C2 A0. If you cut the pair, for example isolating C2, you get a ?.
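To make the byte cutting visible, here is a minimal sketch (the variable names are mine, and it assumes the mbstring extension is loaded). It converts the Latin1 byte A0 to UTF8 and then pads a two-byte string to five bytes, so str_pad has to fill three bytes with a two-byte character:

```php
<?php
// U+00A0 (non-breaking space) is one byte in Latin1 and two bytes in UTF8.
$nbspLatin1 = chr(160);                                                 // byte A0
$nbspUtf8   = mb_convert_encoding($nbspLatin1, 'UTF-8', 'ISO-8859-1'); // bytes C2 A0

echo bin2hex($nbspUtf8), "\n"; // c2a0

// str_pad counts bytes: 5 - 2 = 3 pad bytes, i.e. one and a half characters.
$padded = str_pad('ab', 5, $nbspUtf8);

echo bin2hex($padded), "\n";                   // 6162c2a0c2
var_dump(mb_check_encoding($padded, 'UTF-8')); // bool(false)
```

The trailing C2 has no continuation byte, which is what a browser or terminal renders as ?.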
If you want a "new version" of str_pad, we could create an mb_str_pad():
    const STR_PAD_INSERT_ALL = 4;

    function mb_str_pad(string $input, int $pad_length, string $pad_string, int $pad_type, string $pad_encoding = 'utf8') : string {
        $result = '';
        $pad_insert_all = 0;   // bit mask; 0 leaves the cycling offset untouched
        $pad_insert_limit = 1; // characters copied per step
        $pad_str_len = mb_strlen($pad_string, $pad_encoding);
        $input_len = mb_strlen($input, $pad_encoding);

        // Nothing to do if the input is already long enough.
        if ($pad_length < 0 || $pad_length <= $input_len) {
            return $input;
        }

        if (($pad_type & STR_PAD_INSERT_ALL) === STR_PAD_INSERT_ALL) {
            // ~PHP_INT_MAX clears every bit of a non-negative offset,
            // so mb_substr() always starts at 0 ...
            $pad_insert_all = PHP_INT_MAX;
            // ... and a null length copies up to the end of the pad string.
            $pad_insert_limit = null;
            $pad_type -= STR_PAD_INSERT_ALL;
        }

        // trigger_error() only accepts the E_USER_* levels.
        if ($pad_str_len === 0) {
            trigger_error("Padding string cannot be empty", E_USER_WARNING);
            return $input;
        }

        if ($pad_type < STR_PAD_LEFT || $pad_type > STR_PAD_BOTH) {
            trigger_error("Padding type has to be STR_PAD_LEFT, STR_PAD_RIGHT, or STR_PAD_BOTH", E_USER_WARNING);
            return $input;
        }

        $num_pad_chars = $pad_length - $input_len;
        if ($num_pad_chars >= PHP_INT_MAX) {
            trigger_error("Padding length is too long", E_USER_WARNING);
            return $input;
        }

        switch ($pad_type) {
            case STR_PAD_RIGHT:
                $left_pad = 0;
                $right_pad = $num_pad_chars;
                break;
            case STR_PAD_LEFT:
                $left_pad = $num_pad_chars;
                $right_pad = 0;
                break;
            case STR_PAD_BOTH:
                $left_pad = intdiv($num_pad_chars, 2);
                $right_pad = $num_pad_chars - $left_pad;
                break;
        }

        // Pad on the left, one character (not byte) per step.
        for ($i = 0; $i < $left_pad; $i++) {
            $result .= mb_substr($pad_string, ($i % $pad_str_len) & ~$pad_insert_all, $pad_insert_limit, $pad_encoding);
        }

        $result .= $input;

        // Pad on the right.
        for ($i = 0; $i < $right_pad; $i++) {
            $result .= mb_substr($pad_string, ($i % $pad_str_len) & ~$pad_insert_all, $pad_insert_limit, $pad_encoding);
        }

        return $result;
    }
This requires PHP 7+. Note that PHP 8.3 and later ship a native mb_str_pad() in mbstring, so on those versions the function above needs a different name.
This version closely follows PHP's original implementation, indicated above, with some changes. PS: this assumes I have not introduced any bugs.
It supports multi-byte characters, so you can do:
    mb_str_pad($nome, 30, "\xc2\xa0", STR_PAD_BOTH, 'utf8');
Differences from the original version:
Multi-byte support:
It handles characters that take more than one byte. You can specify the encoding to use; UTF8 is the default.
A new "STR_PAD_INSERT_ALL":
You can insert the entire padding string at every step instead of cycling through its characters one at a time. If the padding string has more than one character (for example "abc"), you can choose to always insert "abc" as a whole. This has a side effect: the number of inserted characters is not measured, so the result can end up longer than the requested length. To use it, pass STR_PAD_BOTH | STR_PAD_INSERT_ALL, but that is not necessary in your case.
Return value on error:
Even when a WARNING is issued, the function returns the original string, which is not the behavior of the original function.
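The difference between the default cycling and STR_PAD_INSERT_ALL can be seen in isolation with a small sketch. It only reproduces the two mb_substr() call patterns, not the whole function, and $steps is a hypothetical number of padding positions chosen for the illustration:

```php
<?php
$pad    = 'abc';
$padLen = mb_strlen($pad, 'UTF-8');
$steps  = 4; // hypothetical number of padding positions to fill

$cycled = '';
$whole  = '';
for ($i = 0; $i < $steps; $i++) {
    // Default mode: one character per step, cycling through the pad string.
    $cycled .= mb_substr($pad, $i % $padLen, 1, 'UTF-8');
    // STR_PAD_INSERT_ALL mode: the whole pad string on every step.
    $whole  .= mb_substr($pad, 0, null, 'UTF-8');
}

echo $cycled, "\n"; // abca
echo $whole, "\n";  // abcabcabcabc
```

Four steps yield four characters in the default mode but twelve with the whole string inserted each time, which is the side effect mentioned above.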
I tested it on ideone.com and it worked. Are you saving the file with UTF-8 encoding?
– Marcos Xavier
Marcos Xavier, I'm using Notepad++, and yes, I saved it with the "UTF-8 without BOM" option. But I still can't see the spaces: only one is shown and the others are collapsed in the output. PS: using str_replace() as suggested by Isac works fine, but I would like to solve this with str_pad() alone.
– Fernandes