Aggregation#

:manual:`Aggregation framework</aggregation/>` operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.

The Aggregation Pipeline#

The aggregation pipeline is a framework for data aggregation modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into aggregated results.

For a full explanation and a complete list of pipeline stages and operators, see the :manual:`manual</core/aggregation-pipeline/>`.

The following example uses the aggregation pipeline on the restaurants sample dataset to find a list of the total number of 5-star restaurants, grouped by restaurant category.

client = Mongo::Client.new([ '127.0.0.1:27017' ], :database => 'test')
coll = client['restaurants']
aggregation = coll.aggregate([
       { '$match'=> { 'stars'=> 5 } },
       { '$unwind'=> '$categories'},
       { '$group'=> { '_id'=> '$categories', 'fiveStars'=> { '$sum'=> 1 } } }
    ])

aggregation.each do |doc|
  #=> Yields a BSON::Document.
end

Inside the aggregate method, the first pipeline stage filters out all documents except those with 5 in the stars field. The second stage unwinds the categories field, which is an array, and treats each item in the array as a separate document. The third stage groups the documents by category and adds up the number of matching 5-star results.

Aggregation pipeline stages have a :manual:`maximum memory use limit</core/aggregation-pipeline-limits/#agg-memory-restrictions>`. To handle large datasets, set the allowDiskUse option to true to enable writing data to temporary files.

  • You can call the allow_disk_use method the aggregation object to get a new object with the option set:

aggregation = coll.aggregate([ <aggregration pipeline expressions> ])
aggregation_with_disk_use = aggregation.allow_disk_use(true)
  • Or you can pass an option to the aggregate method:

aggregation = coll.aggregate([ <aggregration pipeline expressions> ],
                              :allow_disk_use => true)

Single Purpose Aggregation Operations#

MongoDB provides helper methods for some aggregation functions, including :manual:`count</reference/command/count/>` and :manual:`distinct</reference/command/distinct/>`.

Count#

The following example demonstrates how to use the count method to find the total number of documents which have the exact array [ 'Chinese', 'Seafood' ] in the categories field.

client = Mongo::Client.new([ '127.0.0.1:27017' ], :database => 'test')
coll = client['restaurants']
aggregation = coll.count({ 'categories': [ 'Chinese', 'Seafood' ] })

count = coll.count({ 'categories' => [ 'Chinese', 'Seafood' ] })

Distinct#

The distinct helper method eliminates results which contain values and returns one record for each unique value.

The following example returns a list of unique values for the categories field in the restaurants collection:

client = Mongo::Client.new([ '127.0.0.1:27017' ], :database => 'test')
coll = client['restaurants']
aggregation = coll.distinct('categories')

aggregation.each do |doc|
  #=> Yields a BSON::Document.
end