Mongo DB Aggregation(Part 1) – Single Purpose Aggregation Operations
MongoDB offers more than one ways of aggregating data stored in a collection. It has a flexible datamodel and gives us tools to write some pretty complex queries and derive results similar to what SQL queries could do. Like queries, aggregation operations in MongoDB use collections of documents as an input and return results in the form of one or more documents.
There are 3 main aggregation operators :
- Single purpose aggregation operations
- Aggregation Framework
- Map Reduce
In this part, we will have a look at the “Single purpose aggregation operators” and their usage with examples. Though these are limited in scope, they might provide simple way of doing the common data processing operations.
We will consider below collection as an example for reference:
{ "_id": 1, "emp_dept": "Sales", joining_date: "01-10-2014", "job_roles": [ "Manager", "Regional Head" ], salary : { "base": 2100, "variable": 800 }}
{ "_id": 2, "emp_dept": "Sales", joining_date: "03-01-2014", "job_roles": [ "Analyst", "Sales Officer"], salary : { "base": 2000, "variable": 500 }}
{ "_id": 3, "emp_dept": "Finance", joining_date: "31-01-2015", "job_roles": "Manager" , salary : { "base": 1000, "variable": 500 }}
{ "_id": 4, "emp_dept": "Sales", joining_date: "03-03-2015", "job_roles": [ "Analyst"], salary : { "base": 1500, "variable": 500 } }
{ "_id": 5, "emp_dept": "Finance", joining_date: "28-05-2015", "job_roles": [ "Analyst" , "Regional Head"], salary : { "base": 3000, "variable": 500 } }
To help easy understanding, we will first write an SQL query for each example and then translate to a MongoDB command.
1. COUNT – counts all documents or the ones that match a query (if specified)
Syntax:
db.<collection.count(query) or db.runCommand({ count: , query: , limit: , skip: , hint: })
Examples:
Query ==>
select count(*) from employees where joining_date >= '01-01-2015';
In Mongo ==>
db.employees.count({ joining_date: { $gte: new Date("2014","12","31") }}) or db.runCommand( { count:'employees', query: { joining_date: { $gte: new Date("2014","12","31") } } } )
Result ==>
{ "n" : 3, "ok" : 1 }
* hint option is like a performance hint, used to specify a index that can be used.
2. DISTINCT – returns distinct values for a key in collection. In case the key is an array field, it considers each value in the array as different value.
Syntax:
db..distinct(field, query) or db.runCommand({ distinct: "", key: "", query: })
Examples:
Query 1 ===>
Select emp_dept from employees;
In Mongo ===>
db.runCommand({ distinct: "employees", key: "emp_dept"}) or db.employees.distinct("emp_dept")
Result ==>
{ "values" : [ "Sales", "Finance" ], "stats" : { ... }, "ok" : 1 }
Query 2 ===>
Select distinct job_roles from employees where emp_dept = 'Sales';
In Mongo ===>
db.runCommand ( { distinct: "employees", key: "job_roles", query: { emp_dept: "Sales"} } )
Result ===>
{ "values" : [ "Manager", "Regional Head", "Analyst", "Sales Officer"], "stats" : { ... }, "ok" : 1 }
3. GROUP – groups the documents in a collection by given key and allows to perform simple aggreation functions like sum/count on fields.
Syntax:
db.collection.group({ key, reduce, initial [, keyf] [, cond] [, finalize] }) or db.runCommand({ group: { ns: , key: , $reduce: , initial:, $keyf: , cond: , finalize: } })
Examples:
Query ===>
Select emp_dept, sum(salarybase) as basesalary, sum(salaryvariable) as variablesalary, count(_id) as empcount, avg(salary) as avgsalary from employees group by emp_dept;
In Mongo ===>
db.runCommand({ group:{ ns: "employees", key:{emp_dept:1}, initial: { basesalary : 0, variablesalary: 0, empcount:0 }, $reduce: function ( curr, result ) { result.emp_dept = curr.emp_dept, result.basesalary += curr.salary.base; result.variablesalary += curr.salary.variable; result.empcount++; }, $finalize:function(result){ result.avgsalary = (result.basesalary+result.variablesalary)/result.empcount; }}}) or db.employees.group({key:{emp_dept:1}, initial: { basesalary : 0, variablesalary: 0,empcount:0 }, reduce:function ( curr, result ) { result.emp_dept = curr.emp_dept, result.basesalary += curr.salary.base; result.variablesalary += curr.salary.variable; result.empcount++; },finalize:function(result){ result.avgsalary = (result.basesalary+result.variablesalary)/result.empcount; }})
Result ===>
{"retval" :[ { "emp_dept" : "Sales", "basesalary" : 5600 ,"variablesalary": 1800, "empcount": 3, "avgsalary": 2466.666666666667}, { "emp_dept" : "Finance", "basesalary" : 4000 ,"variablesalary": 1000, "empcount":2,"avgsalary": 2500}, ], "count" : 5, "keys" : 3, "ok" : 1}
This Group function / command does not work with sharded clusters and also has limitation on the size of the array that is returned. The aggregation framework or map reduce provide more stronger options.
So, this is it. Simple but handy.
We will look at the Aggregation Framework in Part II of this blog. Stay tuned !!
Shraddha Tambe | Helical IT Solutions
References:
http://docs.mongodb.org/manual/core/single-purpose-aggregation/
http://docs.mongodb.org/manual/reference/method/js-collection/
Best Open Source Business Intelligence Software Helical Insight is Here