Question about MongoDB's $group

I need to use MongoDB's grouping operator, $group, but every explanation I find is very confusing.

How does it work, and what is the benefit of using this operator?

1 answer

$group is one of the stages of aggregate. The idea of aggregate is to set up a pipeline of operations over a collection that produces some output; it is an alternative to the map-reduce that MongoDB also offers. In MongoDB's aggregation documentation, the use of aggregate is described in pseudo-code as:

db.collection.aggregate([ { <stage> }, ... ])

In other words, db.collection.aggregate receives an array of stages, the stages of the pipeline (such as $group). The documentation linked above describes several stages. The simplest is probably $match, which filters the documents as they pass through it on their way to the next stage of the pipeline. For example:

db.collection.aggregate([
  { $match: { nome: 'Wallace' } },
  { $match: { idade: 10 } }
])
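To make the "array of stages" idea concrete, here is a minimal sketch that emulates it over a plain JavaScript array. It is not how MongoDB implements aggregate: the aggregate function, the matches helper and the sample data are all hypothetical, and only equality filters in $match are supported. The point is only that each stage receives the output of the previous one.

```javascript
// Illustrative sketch only: an in-memory emulation of the pipeline idea,
// NOT MongoDB's implementation. Only equality filters in $match are supported.

// True when every field in `filter` has exactly the given value in `doc`.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, value]) => doc[field] === value);
}

// Runs the stages in order: each stage receives the previous stage's output.
function aggregate(docs, pipeline) {
  return pipeline.reduce((current, stage) => {
    if ("$match" in stage) {
      return current.filter((doc) => matches(doc, stage.$match));
    }
    throw new Error("Stage not supported by this sketch: " + Object.keys(stage)[0]);
  }, docs);
}

// Hypothetical sample collection, invented for this example.
const pessoas = [
  { nome: "Wallace", idade: 10 },
  { nome: "Wallace", idade: 20 },
  { nome: "Maria", idade: 10 },
];

const resultado = aggregate(pessoas, [
  { $match: { nome: "Wallace" } }, // keeps the two Wallaces
  { $match: { idade: 10 } },       // then keeps only the one aged 10
]);
console.log(JSON.stringify(resultado)); // [{"nome":"Wallace","idade":10}]
```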

This first filters all documents on the field nome and then on the field idade. That may look redundant, and slower than running a single { $match: { nome: 'Wallace', idade: 10 } }, but MongoDB optimizes the pipeline you define, and one of those optimizations merges consecutive $match stages into one.
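As far as I know, MongoDB's documentation on pipeline optimization calls this coalescing: when a $match immediately follows another $match, the two become one $match whose conditions are combined with $and. The sketch below imitates that rewrite on an in-memory pipeline. It is not MongoDB's optimizer; coalesceMatches, runMatches, the simplified matcher and the data are hypothetical helpers written for this illustration. It then checks that the merged stage keeps the same documents.

```javascript
// Sketch, not MongoDB's optimizer: merge each run of adjacent $match stages
// into a single $match whose conditions are combined with $and.
function coalesceMatches(pipeline) {
  const optimized = [];
  for (const stage of pipeline) {
    const previous = optimized[optimized.length - 1];
    if ("$match" in stage && previous !== undefined && "$match" in previous) {
      optimized[optimized.length - 1] = {
        $match: { $and: [previous.$match, stage.$match] },
      };
    } else {
      optimized.push(stage);
    }
  }
  return optimized;
}

// Equality-only matcher that also understands $and, enough to check that
// the merged stage keeps exactly the same documents.
function matches(doc, filter) {
  return Object.entries(filter).every(([key, value]) =>
    key === "$and"
      ? value.every((condition) => matches(doc, condition))
      : doc[key] === value
  );
}

// Applies a pipeline that (by assumption) contains only $match stages.
function runMatches(docs, pipeline) {
  return pipeline.reduce(
    (current, stage) => current.filter((doc) => matches(doc, stage.$match)),
    docs
  );
}

// Hypothetical sample data.
const pessoas = [
  { nome: "Wallace", idade: 10 },
  { nome: "Wallace", idade: 20 },
  { nome: "Maria", idade: 10 },
];

const original = [{ $match: { nome: "Wallace" } }, { $match: { idade: 10 } }];
const merged = coalesceMatches(original);
console.log(JSON.stringify(merged));
// [{"$match":{"$and":[{"nome":"Wallace"},{"idade":10}]}}]
```

Using $and rather than simply merging the two filter objects keeps the meaning intact even when both stages filter on the same field.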

As for $group, the idea is to pass an _id field, which defines how the documents reaching this stage of the pipeline are grouped, plus any number of other fields that operate over all documents of each group to produce some final result. For example:

db.collection.aggregate([
  { $match: { nome: 'Wallace' } },
  { $group: { _id: '$idade', total: { $sum: 1 } } }
])

This first filters the documents, keeping those where doc.nome == 'Wallace', and then groups them by idade. Each group of documents with the same age is thus represented by a single object of the form:

{
  _id: <some-age>,
  total: <0 + 1 for each grouped document (that is, the number of Wallaces with that age)>
}
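A rough way to see what that pipeline computes is to do the same work by hand over an in-memory array. This is a sketch, not MongoDB code: countWallacesByAge and the sample data are hypothetical, and it assumes every document has an idade field. The running total starts at 0 and grows by 1 for each document in the group, which is what $sum: 1 does.

```javascript
// Sketch: emulates
//   [{ $match: { nome: 'Wallace' } }, { $group: { _id: '$idade', total: { $sum: 1 } } }]
// over an in-memory array. Assumes every document has an `idade` field.
function countWallacesByAge(docs) {
  const groups = new Map(); // group key (idade) -> output document
  for (const doc of docs) {
    if (doc.nome !== "Wallace") continue; // the $match stage
    const key = doc.idade;                 // _id: '$idade'
    if (!groups.has(key)) groups.set(key, { _id: key, total: 0 }); // sum starts at 0
    groups.get(key).total += 1;            // $sum: 1 adds 1 per grouped document
  }
  // A Map keeps insertion order; MongoDB does not promise any order for $group output.
  return [...groups.values()];
}

// Hypothetical sample data.
const pessoas = [
  { nome: "Wallace", idade: 10 },
  { nome: "Wallace", idade: 10 },
  { nome: "Wallace", idade: 20 },
  { nome: "Maria", idade: 10 },
];

console.log(JSON.stringify(countWallacesByAge(pessoas)));
// [{"_id":10,"total":2},{"_id":20,"total":1}]
```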

The $sum above is an accumulator operator of the $group stage. It takes an expression, which is evaluated for each document, and produces the sum of the results over all documents in the group. If we wrote:

db.collection.aggregate([
  { $group: { _id: '$nome', somaDasIdades: { $sum: '$idade' } } }
])

We would receive the sum of all ages for each group of documents with the same name.
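The same idea, this time adding up a field instead of counting, can be sketched in plain JavaScript. sumAgesByName and the data are hypothetical and this is only an emulation. One detail taken from MongoDB's documentation: $sum ignores non-numeric values, so the sketch skips them too.

```javascript
// Sketch of { $group: { _id: '$nome', somaDasIdades: { $sum: '$idade' } } }
// over an in-memory array (an emulation, not MongoDB code).
function sumAgesByName(docs) {
  const groups = new Map(); // nome -> output document
  for (const doc of docs) {
    if (!groups.has(doc.nome)) groups.set(doc.nome, { _id: doc.nome, somaDasIdades: 0 });
    // $sum ignores non-numeric values, so this sketch skips them as well.
    if (typeof doc.idade === "number") groups.get(doc.nome).somaDasIdades += doc.idade;
  }
  return [...groups.values()];
}

// Hypothetical sample data.
const pessoas = [
  { nome: "Wallace", idade: 10 },
  { nome: "Wallace", idade: 20 },
  { nome: "Maria", idade: 5 },
  { nome: "Maria", idade: "?" }, // non-numeric: ignored by the sum
];

console.log(JSON.stringify(sumAgesByName(pessoas)));
// [{"_id":"Wallace","somaDasIdades":30},{"_id":"Maria","somaDasIdades":5}]
```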

The complete list of accumulator operators available in the $group stage is in the MongoDB documentation.

The value given to _id, or to $sum, can be any valid expression, so it can be:

  • a literal value, such as 'Wallace'

  • a path to a field of the documents passing through the stage, such as '$documento.campo'

  • an object that applies multiple expressions to specific fields
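Here is a simplified sketch of how those three kinds of expression could be evaluated against a single document. The evaluate function and the sample document are hypothetical; real MongoDB expressions are much richer (operators such as { $sum: ... }, system variables, arrays), and none of that is modelled here. A string starting with $ is treated as a field path (dots walk into sub-documents), a plain object has each of its fields evaluated, and anything else is returned as a literal.

```javascript
// Simplified sketch of evaluating the three kinds of expression listed above.
// Operators, variables and arrays are NOT modelled.
function evaluate(expr, doc) {
  // Field path: a string starting with '$', e.g. '$documento.campo'.
  if (typeof expr === "string" && expr.startsWith("$")) {
    return expr
      .slice(1)
      .split(".")
      .reduce((value, part) => (value == null ? undefined : value[part]), doc);
  }
  // Object: evaluate every field's expression and build a new object.
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    const result = {};
    for (const [field, subExpr] of Object.entries(expr)) {
      result[field] = evaluate(subExpr, doc);
    }
    return result;
  }
  // Anything else is a literal and is returned unchanged.
  return expr;
}

// Hypothetical document.
const doc = { nome: "Wallace", idade: 10, documento: { campo: "valor" } };

console.log(evaluate("Wallace", doc));          // Wallace
console.log(evaluate("$documento.campo", doc)); // valor
console.log(JSON.stringify(evaluate({ nome: "$nome", idade: "$idade" }, doc)));
// {"nome":"Wallace","idade":10}
```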

An example using an object as _id would be:

db.collection.aggregate([
  {
    $group: {
      _id: {
        nome: '$nome',
        idade: '$idade'
      }
    }
  }
])

This creates one group (with no fields besides _id) for each distinct combination of idade and nome.
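Emulated over an in-memory array, the compound _id just means that the grouping key is the pair (nome, idade). In this sketch, groupByNameAndAge and the data are hypothetical, and serializing the key with JSON.stringify is only a convenience of the emulation (it works because the key object is always built with its fields in the same order). It is not how MongoDB compares keys.

```javascript
// Sketch of { $group: { _id: { nome: '$nome', idade: '$idade' } } } over an
// in-memory array: one output document per distinct (nome, idade) pair,
// each containing only _id.
function groupByNameAndAge(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const id = { nome: doc.nome, idade: doc.idade };
    // JSON text works as a Map key here because `id` is always built with
    // its fields in the same order.
    const key = JSON.stringify(id);
    if (!groups.has(key)) groups.set(key, { _id: id });
  }
  return [...groups.values()];
}

// Hypothetical sample data.
const pessoas = [
  { nome: "Wallace", idade: 10 },
  { nome: "Wallace", idade: 10 },
  { nome: "Wallace", idade: 20 },
  { nome: "Maria", idade: 10 },
];

console.log(JSON.stringify(groupByNameAndAge(pessoas)));
// [{"_id":{"nome":"Wallace","idade":10}},{"_id":{"nome":"Wallace","idade":20}},{"_id":{"nome":"Maria","idade":10}}]
```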

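There is also a db.collection.group function in the shell (REPL). It is not simply a helper for running aggregate with a single $group stage, though: it wraps the separate group command, which was deprecated in MongoDB 3.4 and removed in 4.2, so today a pipeline with one $group stage is the way to get that result.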

I think this gives the basic idea, as far as a quick overview allows. I strongly suggest reading the aggregate documentation I linked above.

As for why you would use aggregate, I think it depends heavily on what you are going to do. Like map-reduce, it is meant for operations where the amount of data involved is large enough that processing it in your application code is not worth it. In those cases, something like aggregate will be more (much more) efficient than pulling a large amount of data into your application and processing it there.
