Function in C that counts how many times a string appears in a file

0

I need to write a function that receives a pointer to a string and the name of a file as parameters, and returns to main how many times the string was found in the file.

So far I have only managed to make it find the string once.

#include <stdio.h>
#include <string.h>

void buscaPalavra();

int main(void) {

  char nomeArquivo[30]= "teste.txt";
  char palavra[30] = "unisinos";
 
  //printf("Digite o nome do arquivo: \n");
  //scanf("%s",nomeArquivo);

  //printf("Digite a palavra que deseja buscar: \n");
  //scanf("%s",palavra);

  buscaPalavra(nomeArquivo,palavra);

  return 0;
}

void buscaPalavra(char *nomeArquivo, char *palavra){

  FILE *arquivo;  
  char conteudo[100];
  int qtd = 0;
  int aux;
  int i;
  int flag;

  if((arquivo = fopen(nomeArquivo,"r")) == NULL){
    printf("Erro ao abrir o arquivo\n");
  }

  fgets(conteudo, 98, arquivo);


    for(i=0; i < strlen(conteudo); i++){
      if(conteudo[i]==palavra[0]){
          aux=i;
          flag =1;
        }
      }
      for(int j = 0; j <= strlen(palavra); j++){
        if(palavra[j] == conteudo[aux]){
          aux++;
        } else {
          flag =0;
          break;
        }
      }
      if(flag != 0){
        qtd++;

  }
  fclose(arquivo);
  printf("Total de palavras eh: %d\n",qtd);
}

2 answers

1


Well, you call fgets(conteudo, 98, arquivo) only once, so only the first line of the file (or at most its first 97 characters) is ever searched. What if the file has more than that? To cover the whole file, call fgets in a loop until there is nothing left to read. Something like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// counts how many times the word appears in the file
int contaOcorrencias(const char *nomeArquivo, const char *palavra) {
    char conteudo[100];
    FILE *arquivo = fopen(nomeArquivo, "r");
    if (arquivo == NULL) {
        printf("Erro ao abrir arquivo %s\n", nomeArquivo);
        exit(EXIT_FAILURE); // could not open the file, so give up
    }

    int cont = 0;
    // loop that reads the file one chunk (at most one line) at a time
    while (fgets(conteudo, sizeof conteudo, arquivo) != NULL) {
        const char *tmp = conteudo;
        while ((tmp = strstr(tmp, palavra)) != NULL) {
            cont++; // found an occurrence
            tmp++;  // resume the search from the next position
        }
    }
    fclose(arquivo);
    return cont;
}

int main(void) {
    // file name and word to search for (same values as in the question)
    char nomeArquivo[30] = "teste.txt";
    char palavra[30] = "unisinos";

    // gets how many times the word occurs in the file
    int ocorrencias = contaOcorrencias(nomeArquivo, palavra);

    // do whatever you want with the value (printf, etc.)
    printf("A palavra \"%s\" ocorre %d vezes no arquivo %s\n", palavra, ocorrencias, nomeArquivo);

    return 0;
}

Keep in mind that fgets reads up to the specified limit (at most size - 1 characters, so 99 here) or until it finds a line break, whichever comes first, and that it keeps the line break in the buffer. People usually worry about removing that line break, but here I don't think it is necessary: we are searching for the word, and a line break at the end does not affect the search (unless, of course, the "word" itself contains line breaks). I am also assuming that no word is "broken" across lines (hyphenated at the end of one line and continued on the next, for example). For the same reason, a line longer than the buffer is read in pieces, and an occurrence that straddles two pieces is not counted.
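For the cases where the line break does matter, a minimal sketch of the usual way to drop it is shown here. The function name removeQuebraLinha is hypothetical (it is not part of the code above); the only library call it relies on is strcspn from <string.h>.

```c
#include <string.h>

/* Hypothetical helper: cuts the string at the first '\r' or '\n'.
 * strcspn returns the index of the first character found in the set,
 * or the length of the string when none is present, so the assignment
 * is safe even when the buffer holds no line break at all. */
void removeQuebraLinha(char *linha) {
    linha[strcspn(linha, "\r\n")] = '\0';
}
```

It would be called right after each successful fgets, before the buffer is compared or tokenized.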

Then I use strstr, which checks whether one string occurs inside another and returns a pointer to the character where the occurrence starts (or NULL if nothing is found). I do this in a loop because the word may occur more than once on the same line, and each time I find an occurrence I update the counter.
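One detail of that loop is how far to advance after each match. The sketch below isolates the counting step on a plain string to show the difference; contaNaString and its sobrepostas flag are hypothetical names introduced only for illustration. Advancing by one character counts overlapping matches, while advancing by the length of the word does not.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts occurrences of palavra in texto with strstr.
 * If sobrepostas is nonzero, the search resumes one character after the
 * start of each match, so overlapping matches are counted; otherwise it
 * resumes right after the end of the match. */
int contaNaString(const char *texto, const char *palavra, int sobrepostas) {
    size_t tam = strlen(palavra);
    if (tam == 0) {
        return 0; /* an empty needle matches at every position; skip it */
    }

    int cont = 0;
    const char *tmp = texto;
    while ((tmp = strstr(tmp, palavra)) != NULL) {
        cont++;
        tmp += sobrepostas ? 1 : tam;
    }
    return cont;
}
```

Searching for "aa" in "aaaa" gives 3 with overlapping matches and 2 without. For an ordinary word such as "unisinos" the two modes give the same result, so the choice only matters for search terms that can overlap themselves.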


Keep in mind that this approach is naive: if I search for "run" and the file contains the words "rerun" or "running", both will be counted. If you want to be more precise (count only the word "run" itself and ignore the cases where it is part of another word), things get a bit more complicated, because you have to analyze the text and check for separators (spaces, punctuation marks, etc.).

One option is to use strtok to separate the string into parts:

// counts how many times the word appears in the file
int contaOcorrencias(const char *nomeArquivo, const char *palavra) {
    char conteudo[100];
    FILE *arquivo = fopen(nomeArquivo, "r");
    if (arquivo == NULL) {
        printf("Erro ao abrir arquivo %s\n", nomeArquivo);
        exit(EXIT_FAILURE); // could not open the file, so give up
    }

    int cont = 0;
    // separators; the line break is included because fgets keeps it in the buffer
    const char *delimiters = " ,.-;!?\t\r\n";
    while (fgets(conteudo, sizeof conteudo, arquivo) != NULL) {
        char *tok = strtok(conteudo, delimiters);
        while (tok != NULL) {
            if (strcmp(tok, palavra) == 0)
                cont++;
            tok = strtok(NULL, delimiters);
        }
    }
    fclose(arquivo);
    return cont;
}

As separators I used " ,.-;!?\t\r\n", that is, the line is split at spaces, tabs, line breaks, commas, periods and the other listed characters (adapt the list to your case). The line break has to be in the list because fgets keeps it at the end of the buffer; without it, a word at the end of a line would come out as "unisinos\n" and never compare equal to the search term. Splitting this way solves the case mentioned above, in which the word being searched for is part of a larger word.
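As an alternative to listing every separator, the check for word boundaries can be sketched on top of strstr instead of strtok. This is only an illustration under stated assumptions: the names ehCaractereDePalavra and contaPalavrasInteiras are hypothetical, a "word character" is assumed to be an ASCII letter or digit as reported by isalnum, and accented letters in multi-byte encodings are not handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumption: a word is made of letters and digits only. */
int ehCaractereDePalavra(char c) {
    return isalnum((unsigned char)c) != 0;
}

/* Hypothetical helper: counts the occurrences of palavra in linha that
 * are not attached to another word character on either side. */
int contaPalavrasInteiras(const char *linha, const char *palavra) {
    size_t tam = strlen(palavra);
    if (tam == 0) {
        return 0; /* nothing sensible to count for an empty word */
    }

    int cont = 0;
    const char *p = linha;
    while ((p = strstr(p, palavra)) != NULL) {
        /* the match must start the line or follow a non-word character */
        int inicioOk = (p == linha) || !ehCaractereDePalavra(p[-1]);
        /* and must be followed by a non-word character or the terminator */
        int fimOk = !ehCaractereDePalavra(p[tam]);
        if (inicioOk && fimOk) {
            cont++;
        }
        p++;
    }
    return cont;
}
```

Unlike strtok, this version does not modify the buffer, and any character that is not a letter or digit (including the line break left by fgets) acts as a separator.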

  • I had already tried with strstr but couldn't get it to work, so I did it another way. Now I understand what was done, thank you very much for the help!!

-1

Putting the call to buscaPalavra inside a loop forces the function to perform more than one search.
