What is the most efficient way to implement Groupby in Javascript?

Asked

Viewed 228 times

6

I’m trying to implement a GroupBy with these parameters

function GroupBy(keySelector, elementSelector, comparer)
{
    // keySelector = function(e) { return e.ID }
    // elementSelector = function(e) { return e.Name }
    // comparer = { Equals: function(a,b) { return a==b }, GetHashCode:... }
}

However I don’t know an efficient way to implement this.

I created a jsPerf test here with linq.js and a method I’ve created that doesn’t use a comparer, so this only works on 'normal' types'. (Testing of these methods here)

Other libraries like underscore and Lo-Dash does not have the parameter comparer then their implementation is not relevant.

The key of each group can be up to a class, need a comparer to know if the key value is equal between objects.

Basically I’m trying to copy C# Linq GroupBy documented here.

Input example:

var arrComplex =
[
    { N: { Value: 10 }, Name: "Foo" },
    { N: { Value: 10 }, Name: "Bar" },
    { N: { Value: 20 }, Name: "Foo" },
    { N: { Value: 20 }, Name: "Bar" }
];

Output example (or something like that):

[
    {
       "Key": {"Value":10},
       "Elements":["Foo","Bar"]
    },
    {
        "Key": {"Value":20},
        "Elements":["Foo","Bar"]
    }
] 

Any idea how to implement this?

  • Let me get this straight, you want to compare if keySelector (which is equal to the id of an element) is equal to elementSelector (which is the name of another element)?

  • I’m basically trying to copy the GroupBy of c# http://msdn.microsoft.com/en-us/library/bb534304(v=vs.110). aspx . keySelector is what will extract the key from an object to be the key to the group. elementSelector is what will extract the object to be placed in the list of group elements.

  • So you want to create an associative array, that’s it?

  • @Kennyrafael The group by does this, var arr= [{ Chave: 1, Valor: 2}, { Chave: 1, Valor: 3}]... Grouping the result is [{ Key: 1, Elements: [2,3] }]. It could also be an assosiative array... { 1: [2,3] }...

2 answers

5

Many libraries that implement GroupBy add that the entry is already keyed. A sorting by key is O(N*log(N)), but once performed it is possible to group lists of any size quickly, in O(N), because just compare the key of an element with the key of the last found group:

function GroupBySimples(a, keySelector, elementSelector, comparer)
{
    if ( a.length == 0 )
        return [];
    elementSelector = elementSelector || function(e) { return e; };
    comparer = comparer ||
        {
            Equals: function(a,b) { return a==b },
            GetHashCode: function(e) { return e.GetHashCode(); }
        };

    var ultimaChave = keySelector(a[0]);
    var ultimoElemento = { Key: ultimaChave, Elements: [elementSelector(a[0])] };
    var ret = [ultimoElemento];
    for (var i = 1, n = a.length; i < n; ++i)
    {
        var key = keySelector(a[i]);
        if ( comparer.Equals(key, ultimaChave) )
            ultimoElemento.Elements.push(elementSelector(a[i]));
        else {
            ultimaChave = keySelector(a[i]);
            ultimoElemento = { Key: key, Elements: [elementSelector(a[i])] };
            ret.push(ultimoElemento);
        }
    }

    return ret;
}

var arrComplexOrdenado = arrComplex.sort(function(a, b) {
    return a.N.Value - b.N.Value;
});
var resultArray = GroupBySimples(arrComplexOrdenado
        , function(e) { return e.N; }
        , function(e) { return e.Name; }
        , {
              Equals: function(a,b) { return a.Value == b.Value },
              GetHashCode: function(e) { return e.Value.GetHashCode(); }
          }
);

As you can see in this performance test, this method is more than 3x efficient than the alternatives presented (using hashes).

Notes:

  1. I increased the size of the input array of 4 for 100.000, to make the test more relevant.

  2. Are you sure that the linq.js is producing a correct result? His performance continued more or less the same, with a twenty-five thousand times greater input, something can only be wrong...

0


I managed to implement this way:

First I need a way to get the Hashcode of the objects.

Object.prototype.GetHashCode = function () {
    var s = this instanceof Object ? JSON.stringify(this) : this.toString();

    var hash = 0;
    if (s.length === 0) return hash;
    for (var i = 0; i < s.length; ++i) {
        hash = ((hash << 5) - hash) + s.charCodeAt(i);
    }
    return hash;
};
Number.prototype.GetHashCode = function () { return this.valueOf(); };

Now man GroupBy was like this

function GroupBy(a, keySelector, elementSelector, comparer)
{
    // seta os valores padrões para os parâmetros opicionais
    elementSelector = elementSelector || function(e) { return e; };
    comparer = comparer ||
        {
            Equals: function(a,b) { return a==b },
            GetHashCode: function(e) { return e.GetHashCode(); }
        };

    var key, hashKey, reHashKey;

    // agrupa elementos pelo hashcode
    var hashs = {};
    for (var i = 0, n = a.length; i < n; ++i)
    {
        // para caso do hash ser igual, mas retornar false no Equals
        reHashKey = undefined;

        // obtêm a chave do objeto
        key = keySelector(a[i]);
        // pega o hashcode do objeto
        hashKey = comparer.GetHashCode(key);

        // se o hash já existe na lista
        // compara os valores com Equals
        // caso retorne falso, gera um hash unico
        if (typeof hashs[hashKey] !== "undefined")
            reHashKey = comparer.Equals(key, hashs[hashKey].Key) ? hashKey : hashKey + " " + i;

        // se foi gerado um hash novo, utiliza-se esse
        if (typeof reHashKey !== "undefined" && reHashKey !== hashKey)
            hashKey = reHashKey;

        // pega ou cria um grupo e adiciona os elementos
        hashs[hashKey] = hashs[hashKey] || { Key: key, Elements: [] };
        hashs[hashKey].Elements.push(a[i]);
    }

    return hashs;
}

For testing

var arrComplex =
[
    { N: { Value: 10 }, Name: "Foo" },
    { N: { Value: 10 }, Name: "Bar" },
    { N: { Value: 20 }, Name: "Foo" },
    { N: { Value: 20 }, Name: "Bar" }
];
//

var x = GroupBy(arrComplex
        , function(e) { return e.N; }
        , function(e) { return e.Name; }
        , {
              Equals: function(a,b) { return a.Value == b.Value },
              GetHashCode: function(e) { return e.GetHashCode(); }
          }
);
//

console.log(x);

Example in jsFiddle

But, according to my tests, my implementation of GroupBy is slower than Linq.js. It is only faster when I convert the result to an array.

Test results

Nome                         op/s
---------------------------------
GroupBy                   163,261
GroupByToArray            152,382
linq.js groupBy           243,547
linq.js groupBy toArray    26,309
  • Are you sure you’re not comparing apples to oranges? Yours GetHashCode for example is terribly inefficient (and at first glance incorrect), is the "Burst" method making use of it? If your method is using it, and Lynne is not, it would explain the difference in performance...

  • That method GetHashCode I copied from some response in Stack Overflow, I don’t know how to implement this, how it would be more efficient?

  • For starters, it is important that the GetHashCode be it coherent with the Equals: if your comparer says that two keys are equal if the Values are equal, the key hash (according to the comparer) should only do the hash of Value - not the whole object. Why? Simply because two objects have the same Value and some other different property would have different hashes but the same value. As for a general purpose method to hash any object, it would take a while to think of something, but converting to JSON and hashing it doesn’t seem like a good idea.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.