You want to create the following set C
:
To do so, you need to answer two questions:
- How to identify
e
belonging to A
?
- How to identify
e
not belonging to B
?
As in our case A
is a file then e
belongs to A
if it is the result of reading the file. So with this we can answer the first question.
Now, how do we know e
does not belong to B
? The irrelevant transaction entails the following:
If B
is a common set, it means that I will always need to compare with all of its elements to make sure that e
does not belong to B
. But we don’t need to stick to common sets, I can have ordered sets, hash maps or tree names, all these alternatives allow us to make optimizations in the amount of comparisons made. An efficient search data structure will reduce the number of operations performed.
For the scope of this answer, I’m not optimising the amount of comparisons. I’m also taking into account that the file that contains the set B
is relatively small, with a few megabytes at most.
As the whole B
is always consulted as a whole, and the whole A
only has of importance the generation of the next element, in the initialization of my algorithm I will pre-load fully B
. The initialization of the set A
, in this case, it will be simply open the file.
The general idea is more or less the following:
inicializa conjunto A
inicializa conjunto B
faça:
pegue _e_ o próximo elemento de A
se _e_ não pertence a B:
adiciona _e_ em C
enquanto não chegou ao fim do conjunto A
The initialization of A
would only open the file. If we were to try to optimize comparisons between A
and B
, we could use some data structure in A
as an ordered vector.
inicialização conjunto A:
abre arquivo "listaCompleta"
The initialization of the set B
here is its complete reading. As we do not know its total dimension a priori, we can use a chained list, whose node contains a structure nome
and a pointer to the next element of the linked list:
inicialização conjunto B:
nodo_lista *conjuntoB = NULL
abre arquivo "listaPresenca"
faça:
nodo_lista *novoElemento = malloc(sizeof(nodo_lista))
novoElemento->next = conjuntoB
lê do arquivo "listaPresenca" no endereço &(novoElemento->valor)
se a leitura deu certo:
conjuntoB = novoElemento
enquanto não der fim do arquivo "listaPresenca"
fecha arquivo "listaPresenca"
Take the next element and just give one fread
the way you did:
pegue _e_ o próximo elemento de A:
nome _e_
lê do arquivo "listaCompleta" no endereço &_e_
The relevance of e
in B
is being treated by observing the whole set B
, then we need to do a full iteration:
_e_ pertence a B?
nodo_lista *elementoB = conjuntoB
enquanto elementoB != NULL:
se elementoB->valor é igual a _e_:
retorna "_e_ pertence a B"
elementoB = elementoB->next
retorna "_e_ não pertence a B"
The comparison between two elements of the type nome
is comparing the field nomePessoa
of the two objects using strcmp
:
_a_ é igual a _b_?
retorna strcmp(_a_.nomePessoa, _b_.nomePessoa) == 0
Add element e
to the whole C
can be put the new element in a list or write directly to the file with fwrite
. If the alternative of putting yourself on a list is used, after completing the analysis of all A
, it is necessary to write these items in the file.
Differences between our approaches
Basically, I’m filling in the set B
in working memory, while you try to keep using external memory (such as HD). The problem in your approach is that, before reviewing again an item of the set A
, would be necessary to reposition the reading of the file representing the set B
for the start again. A simple fseek
right after reading the first file, forcing the second file back to the beginning, would sometimes return to the beginning of the set iteration B
.
As its alternative implies a quadratic amount of readings to the external memory, I did not think it was a practical solution. Thus, using a linear amount of readings to external memory, I soon fill the whole set B
.
Do you really need to be in [tag:C]? A shell script would solve your life much more easily
– Jefferson Quesado
Yes, unfortunately it has to be in C
– Marcos Rodrigues
I can write an algorithm here with very low efficiency, but it fits your case perfectly. To optimize, I’m already putting alternatives in the answer like sort the data sets or use an efficient search structure, like a hash or tree, but I don’t dwell too much on the subject because your focus here is not speed.
– Jefferson Quesado
You also asked for an algorithm, so I’m taking the liberty of putting the idea, without much focus on the C itself
– Jefferson Quesado