Problems
There are some flaws in the logic that is implemented:
- You are not removing the various repeated successively as you only use the
a
at the end of the second for
- Can’t do
free(p)
and then use on for
to move on to the next with p = p -> prox
- The advance
agenda = p -> prox
it is not necessary to remove duplicates from the first rather than the first itself. This makes your instruction should be agenda->prox = p->prox
or p->prox =q->prox
since it stops a
was NULL
p
must be exactly in the next element
Fixing the problems
Using the logic and variables you have I suggest you do so:
Elemento* remove_duplicados(Elemento* agenda){
Elemento* a;
Elemento* p;
Elemento* q;
for(p = agenda; p != NULL; p = p -> prox ){
a = NULL; //a cada varrimento a começa a NULL
for(q = p -> prox; q != NULL; ){ //sem incremento
if(0 == strcmp(p -> info.nome, q -> info.nome)){ //teste de igual
if(a == NULL){ //remove do inicio
p -> prox = q -> prox;
}
else{ //ou do meio/fim
a -> prox = q -> prox;
}
Elemento* remover = q;
q = q -> prox;
free(remover);
}
else { //so atualiza o anterior quando não é igual
a = q;
q = q->prox;
}
}
}
return agenda;
}
I withdrew the comments I had in order to highlight the ones I put, and with the explanations associated with the exchanges I made.
See this code working on Ideone
Refactoring for better readability
The names of variables that have are very little suggestive and make it difficult to read. You should always try to give representative names of the content that the variables have. In addition, the first element is never changed as duplicates are removed forward, so the type of return is not necessary either, and should be void
.
Taking this into account we can make the function much clearer:
void remove_duplicados(Elemento* agenda){
Elemento *anterior, *corrente, *seguinte;
for(corrente = agenda; corrente != NULL; corrente = corrente -> prox ){
anterior = NULL;
for(seguinte = corrente -> prox; seguinte != NULL; ){
if(0 == strcmp(corrente -> info.nome, seguinte -> info.nome)){
if(anterior == NULL)
corrente -> prox = seguinte -> prox;
else
anterior -> prox = seguinte -> prox;
Elemento* remover = seguinte;
seguinte = seguinte -> prox;
free(remover);
}
else {
anterior = seguinte;
seguinte = seguinte->prox;
}
}
}
}
Efficiency with Hash Table
If there is concern regarding efficiency, the solution presented by you, which was the one I took, can bring problems if the amount of data is large, since it is a quadratic solution, with time complexity in the order of O(n²).
There are several ways to improve the solution but with a hash table and ensuring few collisions we achieve a solution in the order of O(n).
We start with the structure of each hash table entry and the respective array that represents it:
typedef struct entrada {
struct info *dados;
struct entrada *prox;
} entrada;
#define TAMANHO_HASH 100
entrada *hash_nomes[TAMANHO_HASH];
To find the input for a given value we need the hash, which in this case we can use as strings:
unsigned long hash(char *str){
unsigned long hash = 5381;
int c;
while (c = *str++)
hash = ((hash << 5) + hash) + c;
return hash;
}
Create a good function of hash is in itself a science, and will be reflected in the performance of the algorithm as it will create many or few collisions depending on how it is built. For simplicity I used an already known.
The function that remove_duplicados
is now further simplified as it is based entirely on the hash to know if the current value already exists:
void remove_duplicados(Elemento* agenda){
Elemento *corrente = agenda, *anterior = NULL;
while (corrente != NULL){
if (adiciona_se_inexistente(&corrente->info) == 0){ //0 não adicionou por já existir
anterior->prox = corrente->prox;
Elemento* remover = corrente;
corrente = corrente -> prox;
free(remover);
}
else {
anterior = corrente;
corrente = corrente -> prox;
}
}
}
Here the function was used adiciona_se_inexistente
to add to the table of hash if there is no:
int adiciona_se_inexistente(struct info* dados){
unsigned long key = hash(dados->nome) % TAMANHO_HASH; //achar a entrada
entrada *corrente = hash_nomes[key];
while (corrente != NULL){ //verificar se já existe
if (strcmp(corrente->dados->nome, dados->nome) == 0){
return 0;
}
corrente = corrente->prox;
}
//não existe adiciona a nova entrada
entrada *nova = malloc(sizeof(entrada));
nova->prox = hash_nomes[key];
nova->dados = dados;
hash_nomes[key] = nova;
return 1;
}
The table of hash has to be initialized with all entries NULL
, what I did using the function memset
.
Remarks:
- I implemented the hash with internal collision list instead of open addressing, which ultimately uses more space, but is potentially simpler to implement.
- This solution is much more performance efficient if we ensure we have few collisions, something that depends on the function of hash and the size of the table. However it will use more memory than its original solution. Note that the size of
100
that I chose for the table is good for the amount of names I used (5
), but if it has more names it can already generate many collisions, so it is something that has to adapt to your needs.
- It is also worth remembering that if you only need this function at one point in the program and you are sure that it will not be needed later, you need to free up the space of all filled entries in the hash, with
free
.
Example of this implementation in Ideone
Merge a list but do not put repeat results
– Jefferson Quesado