Why does Set remove duplicate elements?
The term Set comes from the mathematical concept of a set. By definition, a set is composed of elements (or of no element at all, in the case of the empty set). A set has no notion of duplicate elements.
From this definition, it is only possible to state that a set either contains or does not contain a given element. It therefore makes no sense to say that an element is present more than once (contrary to what is possible in a list, for example).
So it would not make sense for languages that implement this concept (such as JavaScript's Set) to allow the same element to be included several times in a set. Given this, it is expected that data structures representing the idea of a mathematical set have some mechanism to prevent the insertion of the same element more than once.
How does Set remove duplicate elements?
Elements can be added to a set through the Set.prototype.add method. To prevent equal values from being added to the set, the method's algorithm performs a check before actually inserting the element.
So if the programmer calls add passing a value that already exists in the set, the insertion simply will not happen; the set will remain unchanged. Example:
const set = new Set(); // Set(0) {}
set.add('foo'); // Set(1) {"foo"}
set.add('bar'); // Set(2) {"foo", "bar"}
// Elemento "foo" já existe, `set` será mantido inalterado.
set.add('foo'); // Set(2) {"foo", "bar"}
It is also worth noting that, when instantiating a new Set, the constructor may receive an iterable with the initial elements to be added. In this case, the implementation iterates over each value of the iterable (passed as the first argument) and calls Set.prototype.add for each element. Thus, even if the supplied iterable has duplicated elements, the set will not receive duplicate insertions.
The comparison mechanism used by Set to determine whether two elements are equal is the SameValueZero algorithm.
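To make that concrete, here is a small sketch of SameValueZero's observable behavior: unlike ===, it considers NaN equal to NaN; and it considers +0 and -0 the same value (which is why a set never holds both):
const set = new Set();
set.add(NaN);
set.add(NaN); // NaN equals NaN under SameValueZero, so the second add is a no-op
console.log(set.size); //=> 1
set.add(0);
set.add(-0); // +0 and -0 count as the same value
console.log(set.size); //=> 2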
In short, JavaScript does not allow an instance of Set to have duplicated elements.
And what about the mechanism to remove duplicate values from an array? How does it work?
As we saw above, the Set constructor accepts an iterable as its first argument and uses the add method on each iterated value, which avoids repeated insertions.
Since JavaScript arrays are iterable, when an array is provided as the first argument of the Set constructor, each value of the array will be traversed, so that each unique element of the array is added to the instance under construction. For example:
const set1 = new Set([1, 2, 3, 1]);
// This is the same as doing:
const set2 = new Set();
set2.add(1);
set2.add(2);
set2.add(3);
set2.add(1); // Since 1 already exists, `add` will not change the set
Thus, a new set is created containing each non-repeated element of the array. This same procedure works for any iterable, such as strings.
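For instance, a string is also an iterable, so the same deduplication happens character by character (a small sketch of my own, not from the original answer):
const letters = new Set('banana');
console.log(letters); //=> Set(3) {"b", "a", "n"}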
Once the set has been created, the duplicated elements of the array have already been removed. From there, it is necessary to create a new array, since most people, when providing an array, expect an array back. After all, the Set is just an implementation detail of the "algorithm" that removes duplicate elements from an array.
That is what the spread operator does. When used inside an array literal, the spread iterates over each element of an iterable (such as the set created earlier) and adds them, element by element, to the array in which it was applied.
Once the new array has been created, there is nothing more to do. That is it: you already have a new array without repeated elements.
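Putting the whole idea together in a single sketch (the helper name removeDuplicates is mine, just for illustration):
// Builds a set from the array (dropping duplicates) and
// spreads it back into a new array:
function removeDuplicates(arr) {
  return [...new Set(arr)];
}

console.log(removeDuplicates([1, 2, 3, 1, 2])); //=> [1, 2, 3]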
When and why does this sometimes not work with objects?
As we saw above, the algorithm used by the Set.prototype.add method to determine whether two elements are equal (or different) is SameValueZero. For objects, SameValueZero works similarly to the === operator: it compares them by reference. For SameValueZero, two objects are equal only if they have the same reference.
Thus, although the objects below have exactly the same members, they are all different according to SameValueZero (so much so that the set does not remove them as duplicates). See:
const arr = [
{ id: 1 },
{ id: 1 },
{ id: 1 }
]; // They may look like equal objects, but they have different references,
// since they are not the **same** object.
// Hence, the `Set` does not remove them as "duplicates":
const set = new Set(arr);
// O "novo array" irá conter todos os mesmos três objetos:
const newArr = [...set];
console.log('Initial:', arr); //=> [{ id: 1 }, { id: 1 }, { id: 1 }]
console.log('Final:', newArr); //=> [{ id: 1 }, { id: 1 }, { id: 1 }]
However, in this other example below, we have a list of three elements, all three being the same object. In this case, the set does remove the duplicates. See:
const obj = { id: 1 };
const arr = [
obj,
obj,
obj
]; // All three elements are equal objects (they have the same reference).
// Hence, the `Set` removes them as duplicates:
const set = new Set(arr);
// O "novo array" irá conter apenas um elemento.
const newArr = [...set];
console.log('Initial:', arr); //=> [{ id: 1 }, { id: 1 }, { id: 1 }]
console.log('Final:', newArr); //=> [{ id: 1 }]
Therefore, in cases where objects must be removed according to some custom duplicity criterion, Set may not be ideal. In most such cases, it is better to write the comparison algorithm by hand. See more in this other answer.
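As a minimal sketch of such a hand-written criterion, assuming you want to deduplicate by a key of your choice (the helper uniqueBy and the id criterion are hypothetical, not taken from the linked answer):
// Keeps only the first element for each key produced by `keyFn`:
function uniqueBy(arr, keyFn) {
  const seen = new Set();
  return arr.filter((item) => {
    const key = keyFn(item);
    if (seen.has(key)) return false; // duplicate key, drop element
    seen.add(key);
    return true;
  });
}

console.log(uniqueBy([{ id: 1 }, { id: 1 }, { id: 2 }], (obj) => obj.id));
//=> [{ id: 1 }, { id: 2 }]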
+1 Good Q&A. "...when providing an array as the first argument of the Set constructor, each value of the array will be traversed". So can we say that Set influences the complexity of an algorithm? I had this doubt. – Cmte Cardeal
@CmteCardeal, yes. Some operations of Set, such as has, are, per spec, sublinear in most cases; depending on the implementation, they can reach something even better (see more). However, it is clear that building a set from an array will be linear, since each element of the array has to be traversed. So, yes, it can affect complexity. In the case of this question, the linearity comes from the needs of the algorithm itself. I don't think it is possible to remove all duplicates from an array with anything smaller than O(n). – Luiz Felipe