Mongo DB Aggregation – Part II – The Aggregation Framework

Posted on August 23, 2015 by By Shraddha Tambe, in Business Intelligence, Databases | 0

Mongo DB Aggregation(Part 2) – The Aggregation Framework
The Mongo DB aggregation framework or aggregation pipeline lets you define a data processing pipeline. It takes multiple documents in a collection, puts them through the multiple Stages we define and transforms them into an aggregated result. It allows us to do much more complex aggregation than the “single purpose aggregation operators”, that we saw in the first part of this blog (Part I).

Let’s dive into an example to see what we can do with this. Let’s take the same collection we used in the part I with some additional documents.


{ "_id": 1, "emp_dept": "Sales", joining_date: ISODate("2014-10-01T00:00:00.000Z"), "job_roles":[ "Manager", "Region Head" ], salary :{ "base": 2100, "variable": "800" }}
{ "_id": 2, "emp_dept": "Sales", joining_date: ISODate("2014-01-03T00:00:00.000Z"), "job_roles": [ "Analyst", "Sales Officer"], salary : { "base": 2000, "variable": 500 }}
{ "_id": 3, "emp_dept": "Finance", joining_date: ISODate("2015-01-31T00:00:00.000Z"), "job_roles": "Manager" , salary : { "base": 1000, "variable": 500 }}
{ "_id": 4, "emp_dept": "Sales", joining_date: ISODate("2015-03-03T00:00:00.000Z"), "job_roles": [ "Analyst"], salary : { "base": 1500, "variable": 500 }}
{ "_id": 5, "emp_dept": "Finance", joining_date: ISODate("2015-05-28T00:00:00.000Z"), "job_roles": [ "Analyst" , "Region Head"], salary : { "base": 3000, "variable": 500 }}
{ "_id": 6, "emp_dept": "Engineering", joining_date: ISODate("2015-01-29T18:30:00.000Z"), "job_roles": "Manager" , salary : { "base": 10000, "variable": 1500 }}
{ "_id": 7, "emp_dept": "Engineering", joining_date: ISODate("2014-05-02T18:30:00.000Z"), "job_roles": [ "Superintendent"], salary : { "base": 5000, "variable": 500 } }
{ "_id": 8, "emp_dept": "Engineering", joining_date: ISODate("2015-05-26T18:30:00.000Z"), "job_roles": [ "Superintendent" , "Region Head"], salary : { "base": 15000, "variable": 2000 } }

I will take the sql query below, which has most of the clauses we generally use and then see how the mongoDB aggregation looks like for the same.

— Query —

Select emp_dept,
sum(salarybase) as basesalary, sum(salaryvariable) as variablesalary,
count(_id) as empcount, avg(salary) as avgsalary
from employees
where joining_date >= '2015-01-01'
group by emp_dept
having avgsalary >= 2500
order by avgsalary desc;

— Aggregate function —

db.employees.aggregate( [
   { $match: { joining_date: { $gte: new Date("2015","00","00") }}},
   { $group: { _id: "$emp_dept", 
               basesalary: { $sum: "$salary.base" }, 
               variablesalary: {$sum: "$salary.variable"},
               empcount :{$sum:1}
              }},
   { $project: { 
               basesalary : "$basesalary",
               variablesalary: "$variablesalary",
               empcount :"$empcount",
               avgsalary: {$divide:[{ $add: [ "$basesalary", "$variablesalary" ] }, "$empcount" ] }
               }},
   { $match: { avgsalary: { $gte: 2500 } } }
] )

— Stages explained —
$match :

This is typically a filter expression. The matching documents are passed to the next stage with no modifications. In our example, it acts as a WHERE as well as HAVING clause, as per the position of the $match stage in the pipeline. Firstly, it was used to filter the whole collection to only retrieve employees with joining date greater than ‘2015-01-01’. Later, it was used to filter only the groups (departments) whose average salary exceeded 2500.

The typical operators used in Expressions are $gt, $lt, $eq, $gte, $lte etc. Apart from Operator Expressions, there are Boolean Expressions, Set Expressions, Comparison Expressions, Arithmetic Expressions, String Expressions and so on. A lot to explore and utilise.

$group :

This the group by clause which allows us to group the documents using a given key and define aggregated fields. In our example, we group the records on “employee department” and then calculate aggregated “basesalary”, “variablesalary” and also the “count of employees” in the group.

Apart from $sum, there are many other Accumulators that can be used in the $group stage, like $avg,$first,$last,$min,$max etc.

$project:

This is the SELECT clause, wherein we define the fields that we would like to see in the resultset.In our example, we have copied some fields from the previous stage results (basesalary, variablesalary,empcount) and added a new field “avgsalary” which uses arithmetic functions like add & divide.

The other Arithmetic Expressions we can use are $multiply, $subtract, $mod. But as described earlier, we have quite some expressions at our disposal.

Apart from the ones described,there are other stages that we haven’t talked about like $unwind, $skip, $limit, $redact, $out, $geoNear.

But I hope, I could at the least get you started on using the Aggregation Pipeline.

Shraddha Tambe | Helical IT Solutions

References:
http://docs.mongodb.org/manual/meta/aggregation-quick-reference/
http://docs.mongodb.org/v2.6/reference/sql-aggregation-comparison/

Best Open Source Business Intelligence Software Helical Insight is Here

A Business Intelligence Framework

aggregation framework aggregation pipeline MongoDB mongodb aggregation mongoDB group No SQL Databases

0 0 votes

Article Rating

0 Comments

Inline Feedbacks

View all comments

You might also like..

Helical Insight

Helical IT Solutions Launches Helical Insight 5.2.2 : Focus on Advance Embedded Analytics

By admin

24 Dec 2024: Helical IT Solutions is excited to unveil Helical Insight 5.2.2, the latest iteration of its cutting-edge Open Source Business Intelligence (BI) platform. This release reinforces Helical Insight's position as a cost-effective, versatile, and powerful alternative to mainstream...

Helical Insight 5.2.1

Helical IT Solutions Launches Helical Insight 5.2.1: Elevating Open Source BI to New Heights

By admin

02 Sept 2024 – Helical IT Solutions is thrilled to announce the release of Helical Insight version 5.2.1, the latest upgrade to its Open Source Business Intelligence (BI) platform. This new version delivers a powerful, cost-effective BI solution that is...

Business Intelligence

Installation of Firebird db

By admin

Steps to install firebird db 1. Go to google and type firebird in search box and then click on first link. License aggrement 2. Click on downloads and then install Firebird latest version(5.0.0). 3. It will navigate to the below...

About Helical IT Solutions Pvt Ltd

Location

Contact Us

Search what you are looking for..

Mongo DB Aggregation – Part II – The Aggregation Framework

Posted on August 23, 2015 by By Shraddha Tambe, in Business Intelligence, Databases | 0

A Business Intelligence Framework

You might also like..

Helical Insight

Helical IT Solutions Launches Helical Insight 5.2.2 : Focus on Advance Embedded Analytics

By admin

Helical Insight 5.2.1

Helical IT Solutions Launches Helical Insight 5.2.1: Elevating Open Source BI to New Heights

By admin

Business Intelligence

Installation of Firebird db

By admin

Contact Form