How do I count the characters of a word read from the first line of a text file?

Question


Asked 10 years, 11 months ago

Viewed 15,096 times

8

Below is an example of how to count the number of characters in a string:

$palavra = "coisa";
echo strlen($palavra); // returns 5

However, I am reading this word from a text file, and strlen is not working. See:

$f = fopen("palavras.txt", "r");
echo fgets($f); // Works so far: echoes "casa".
echo strlen($f); // The first word in the text file is "casa",
                 // but this does not echo 4.

I’ve tried to do it this way too and it didn’t work:

$f = fopen("palavras.txt", "r");
$palavra = fgets($f); 
echo strlen($palavra); // Echoes 6, which does not match the 4 characters
                       // of the word "casa".

Note: the file currently contains 3 words, one per line, but I intend to add more.

I then tried the approach below, but it still does not return 4 characters for "casa"; it always returns 3 more characters than the word I put on the first line of the file:

 $f = fopen("palavras.txt", "r");
 $palavra = fgets($f);
 echo strlen(trim($palavra));

ADDED ON 25/08/2014

Guys, since each word is an array of characters, I tried printing each position to check whether there was anything beyond the four letters of the word "casa". I found that the word is at:

echo $palavra[3];
echo $palavra[4];
echo $palavra[5];
echo $palavra[6];

What is in positions 0, 1 and 2? I printed every position with a for loop, and the first three appear on screen as diamonds with a question mark.

I set both the HTML meta tag and the file encoding (when saving) to UTF-8.

I also tried utf8_decode, with no effect.

I figured that if I always subtracted 3 characters from the result, my problem would be solved. I searched and found this Stack Overflow question in English: https://stackoverflow.com/questions/4057742/how-to-remove-efbbbf-in-php-string

One person there does exactly that, but another warns that blindly discarding the BOM is not a good idea, because if the BOM is ever absent I would be dropping 3 real characters from my word. I don't want a hack (gambiarra). I want to understand.

Here is my final, working code:

// The first line of this file contains only the word "casa"
$f = fopen("palavras.txt", "r");
$palavra = fgets($f);
$car = strlen(trim($palavra)) - 3;
echo $car;
// The code above returns 4; without the (- 3) it returns 7.

But is this right?

**SOLVED! SAVE THE FILE WITHOUT THE BLESSED BOM! IN NOTEPAD++ THE OPTION IS IN THE ENCODING MENU; WINDOWS NOTEPAD DOESN'T HAVE THAT OPTION.**

Thank you all! @Jader's answer is very useful and I will definitely use it, but for what the question actually asks, @bfavaretto's answer covers everything, in case anyone else on the forum needs this information.

Does the file contain a single word? Always? Can it be two words? Several lines? It would be best to [edit] the question to clarify this.

– brasofilo

2014/08/24 at 15:14
Added details.

– I Wanna Know

2014/08/24 at 15:27
1

Actually, I'm trying to understand your question and bfavaretto's answer in the context of your comment "And how to measure only the word?"... if you only have one word per line and his code reads the first line, I don't understand what the problem is...

– brasofilo

2014/08/24 at 15:35
I know it may have sounded strange, logically speaking, but even with trim it keeps returning 3 more characters than the word actually has.

– I Wanna Know

2014/08/24 at 16:59
Does this happen only on the first line? Are the other lines counted correctly? It may be the UTF-8 BOM: http://en.wikipedia.org/wiki/Byte_order_mark

– Marcos

2014/08/25 at 12:00
Look, I'm going to use @Jader's answer to check the other lines, because with the simple approach I don't even know how to read the second line.

– I Wanna Know

2014/08/25 at 12:03
@bfavaretto mentioned this, but how do I remove the BOM if it is being used? I set both the HTML meta tag and the file encoding (when saving) to UTF-8.

– I Wanna Know

2014/08/25 at 12:07


2 answers

8

The variable $f represents the file (it is a handle, not the file's contents). The call fgets($f) reads the next line of the file (in your example, the first). So it makes no sense to measure $f; you need to measure what fgets($f) returns:

$f = fopen("palavras.txt", "r");
$linha = fgets($f);
echo strlen($linha);
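
To see the difference between the handle and the line, here is a minimal sketch. It writes a throwaway file with tempnam() only so the example is self-contained (an assumption; in the question the file is palavras.txt) and then inspects both values.

```php
<?php
// Temporary stand-in for palavras.txt (hypothetical contents).
$caminho = tempnam(sys_get_temp_dir(), 'pal');
file_put_contents($caminho, "casa\ncoisa\n");

$f = fopen($caminho, 'r');
var_dump(is_resource($f)); // $f is a file handle (resource), not text

$linha = fgets($f);        // the first line, including its line break
var_dump($linha);

fclose($f);
unlink($caminho);
```

The second var_dump shows the string that strlen should actually receive.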

As @mgibsonbr pointed out, the string returned by fgets includes the line break. In a file containing only the word "casa" and saved with Windows line endings, that means casa\r\n: the carriage return and the line feed are both counted, so the length is 6. You can use trim to remove these characters (it strips whitespace from the beginning and end of the string, including line breaks and tabs):

echo strlen(trim($linha));
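
As a quick check of the arithmetic above, this sketch uses a string literal in place of the value fgets would return from a Windows-formatted file (the literal is an assumption made for illustration):

```php
<?php
// A line as fgets would return it from a Windows-formatted file.
$linha = "casa\r\n";

echo strlen($linha), "\n";       // 6: four letters plus \r and \n
echo strlen(trim($linha)), "\n"; // 4

// rtrim with an explicit character list removes only the line ending,
// keeping leading or trailing spaces that belong to the content.
echo strlen(rtrim($linha, "\r\n")), "\n"; // 4
```

If spaces around the word should never count, trim is the simpler choice; rtrim with "\r\n" is only useful when the surrounding whitespace matters.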

Another important detail: strlen counts bytes, not characters. In UTF-8, certain characters, such as accented letters, occupy more than one byte, so strlen reports a number larger than the character count in those cases. To fix this, use the function mb_strlen:

echo mb_strlen(trim($linha));
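
A small sketch of the byte/character difference, assuming the mbstring extension is enabled. The word "café" is an illustrative choice, and its "é" is written as explicit UTF-8 bytes so the result does not depend on how the script file itself is saved. Passing 'UTF-8' explicitly avoids relying on the configured internal encoding.

```php
<?php
// "café" with "é" written as its two UTF-8 bytes (0xC3 0xA9).
$palavra = "caf\xC3\xA9";

echo strlen($palavra), "\n";             // 5 (bytes)
echo mb_strlen($palavra, 'UTF-8'), "\n"; // 4 (characters)
```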

To read all the lines, just use a loop. Putting everything together, it looks like this:

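$f = fopen("palavras.txt", "r");
// The parentheses matter: without them, !== is evaluated before =,
// and $linha would receive true/false instead of the line.
while (($linha = fgets($f)) !== false) {
    echo $linha . ' - ' . mb_strlen(trim($linha)) . '<br>';
}
fclose($f);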
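
If you need a specific line (the second one, for example), one alternative is to load all lines into an array with file(). This is only a sketch: it uses a temporary file with made-up contents in place of palavras.txt, and it assumes the mbstring extension is available.

```php
<?php
// Temporary stand-in for palavras.txt so the example is self-contained.
$caminho = tempnam(sys_get_temp_dir(), 'pal');
file_put_contents($caminho, "casa\r\ncoisa\r\nmanter\r\n");

// file() returns one array element per line; the flags drop the
// trailing newline and skip empty lines.
$linhas = file($caminho, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

foreach ($linhas as $n => $linha) {
    // trim() also removes any "\r" left over from Windows line endings
    $palavra = trim($linha);
    echo ($n + 1), ': ', $palavra, ' - ', mb_strlen($palavra, 'UTF-8'), "\n";
}

// The second line, accessed by index (indexes start at 0).
$segunda = trim($linhas[1]);
echo $segunda, "\n";

unlink($caminho);
```

Reading the whole file at once is convenient for short word lists; for very large files the fgets loop uses less memory.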

As for PHP counting 3 extra characters on the first line, everything indicates that your TXT file is saved as UTF-8 with a BOM (Byte Order Mark). You need to change the encoding to UTF-8 without BOM. How to do this depends on the editor; the option is usually in the save dialog itself or in a separate encoding menu.
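
If you cannot control how the file is saved, a defensive option is to strip the BOM only when it is actually present, which avoids dropping 3 real characters from a file that has no BOM. The sketch below assumes PHP 7 or later (for the type declarations); remover_bom is a hypothetical helper name, not a built-in function.

```php
<?php
// Hypothetical helper: removes a UTF-8 BOM only when one is present.
function remover_bom(string $texto): string
{
    $bom = "\xEF\xBB\xBF"; // the three bytes of the UTF-8 BOM
    if (strncmp($texto, $bom, 3) === 0) {
        return substr($texto, 3);
    }
    return $texto;
}

$com_bom = "\xEF\xBB\xBFcasa\r\n"; // first line of a file saved with BOM
$sem_bom = "casa\r\n";             // same line saved without BOM

echo bin2hex(substr($com_bom, 0, 3)), "\n";     // efbbbf
echo strlen(trim($com_bom)), "\n";              // 7, as seen in the question
echo strlen(trim(remover_bom($com_bom))), "\n"; // 4
echo strlen(trim(remover_bom($sem_bom))), "\n"; // 4
```

bin2hex is also a handy way to inspect what the mysterious first three positions of the string contain. Only the first line read from the file needs this treatment, since the BOM appears once, at the very start.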

Even with trim it always returns 3 more characters than the word I put on the first line of the file.

– I Wanna Know

2014/08/24 at 17:16
2

Maybe you saved the file as UTF-8 "with BOM". This can cause extra characters at the beginning. @Iwannaknow

– bfavaretto

2014/08/24 at 18:08
I didn't choose the BOM option. At least not intentionally. There isn't even such an option when saving a txt file. I added details at the end of the question.

– I Wanna Know

2014/08/25 at 12:46
From your description, your file has a BOM, @Iwannaknow. The BOM is 3 bytes long, which is exactly the difference you are seeing. Look in your editor for an option that explicitly says UTF-8 without BOM.

– bfavaretto

2014/08/25 at 12:50
There is no such option. Here is what I have: http://goo.gl/2p58pS

– I Wanna Know

2014/08/25 at 13:07
Which editor are you using? I expanded my answer, but you will really need to solve this encoding problem.

– bfavaretto

2014/08/25 at 13:09
Try using another editor, like Notepad++ @Iwannaknow

– bfavaretto

2014/08/25 at 13:09
I just did that and was about to come tell you. Thanks for the help! But the option is not in the save dialog; you have to go to Settings - Preferences - New Document. Thanks, man!

– I Wanna Know

2014/08/25 at 13:22
In Notepad++ just go to the Encoding menu.

– I Wanna Know

2014/08/29 at 14:20


by Jader A. Wagner • **4,921** points · Answer 1 · 2014-08-24T16:27:13+00:00

To go through all the words, you can do something like this:

$texto = file_get_contents('teste.txt');

$palavras = preg_split('/[\s\r\n\t[:punct:]]+/', $texto, -1, PREG_SPLIT_NO_EMPTY);

$tamanhos = array();
// mb_strlen assumes the file is UTF-8; strlen would count "é" as 2 bytes
foreach ($palavras as $palavra) $tamanhos[] = mb_strlen($palavra, 'UTF-8');

for ($i = 0; $i < sizeof($palavras); $i++) {
    echo  $i . '.) "' . $palavras[$i] . '"  - ' . $tamanhos[$i] . '<br>';
}

teste.txt

Casa grande é outra coisa!
Mas, custa caro para manter...

Result:

0.) "Casa" - 4
1.) "grande" - 6
2.) "é" - 1
3.) "outra" - 5
4.) "coisa" - 5
5.) "Mas" - 3
6.) "custa" - 5
7.) "caro" - 4
8.) "para" - 4
9.) "manter" - 6

To pick a word at random, use rand() like this:

$r = rand(0,sizeof($palavras)-1);

echo 'Palavra aleatoria: ' . $palavras[$r] . ' - ' . $tamanhos[$r];

// Result (example):
// Palavra aleatoria: coisa - 5