Memory Group By Vs Group By in Pentaho DI

Posted on March 28, 2016 by Nikhilesh, in Business Intelligence, Open Source Business Intelligence, Pentaho

Group By and Memory Group By: Both steps calculate aggregate values over groups of fields.

Although both steps serve the same purpose, they have some differences and similarities, as follows:

1) Group By works correctly only when the input rows are sorted on the group fields. Otherwise it gives accurate results only for runs of consecutive records that share the same group values. So the input should be sorted before it reaches the Group By step. Memory Group By, in contrast, also handles unsorted records: it builds the groups in memory and gives the output based on the selected aggregate option.
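
The difference in point 1 can be illustrated with a small Python analogy. This is only a sketch of the idea, not how Pentaho implements either step; the rows and the function names `group_consecutive` and `group_in_memory` are made up for illustration. Grouping consecutive rows mirrors a step that expects sorted input, while the dictionary version mirrors a step that keeps every group in memory.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical input rows: (country, amount), deliberately NOT sorted by country.
rows = [("IN", 10), ("US", 5), ("IN", 20), ("US", 7)]

def group_consecutive(rows):
    """Sum amounts over runs of consecutive rows that share a key."""
    return [(key, sum(amount for _, amount in run))
            for key, run in groupby(rows, key=lambda r: r[0])]

def group_in_memory(rows):
    """Sum amounts per key using an in-memory map; input order does not matter."""
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)

# Unsorted input: each key shows up once per run, so the totals are split.
unsorted_result = group_consecutive(rows)

# Sorting first (the "Sort rows" idea) gives one total per key.
sorted_result = group_consecutive(sorted(rows, key=lambda r: r[0]))

# The in-memory version gets the same totals without sorting.
memory_result = group_in_memory(rows)
```

With the unsorted input, `group_consecutive` produces four partial rows instead of two totals, which is the kind of result to expect when unsorted data reaches a step that assumes sorted input.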

2) Group By has an Include all rows option. If you enable it, the output contains all the records, not just the aggregate rows. To differentiate the two types of rows in the output, we have to provide a boolean flag column. Memory Group By does not have this option.

3) Group By has a Temporary files directory option, which becomes available when Include all rows is enabled. It selects the directory in which the temporary files are stored (needed when the Include all rows option is enabled and the number of grouped rows exceeds 5000); the default is the standard temporary directory for the system. Because this option lets the step store data on disk, Group By is the preferred choice for large amounts of data. The Memory Group By step cannot handle huge amounts of data, because it processes all the rows in memory, which causes performance issues; it works well only for small amounts of data. So when the number of rows is too large to fit into memory, use the combination of the Sort rows and Group by steps.

4) Group By has one more option, Add line number, restart in each group: enable it to add a line number that restarts at 1 in each group.
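
As a rough illustration of what a line number that restarts in each group looks like, here is a minimal Python sketch. It is an analogy only; the sample rows and the helper `add_group_line_numbers` are hypothetical, and the input is assumed to be sorted on the group field already.

```python
from itertools import groupby

# Hypothetical rows, already sorted by the group field (first element).
rows = [("A", "x"), ("A", "y"), ("B", "z")]

def add_group_line_numbers(rows):
    """Append a line number that restarts at 1 whenever the group key changes."""
    numbered = []
    for _, run in groupby(rows, key=lambda r: r[0]):
        for line_number, row in enumerate(run, start=1):
            numbered.append(row + (line_number,))
    return numbered

numbered_rows = add_group_line_numbers(rows)
```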

5) Always give back a result row: This option is the same in both the Group By and Memory Group By steps. If you enable it, the step always gives back a result row, even if there is no input row. This is useful if you want to count the number of rows; without this option you would never get a count of zero (0).
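
The effect of this option on a row count can be sketched in Python as follows. This is an assumption-laden analogy rather than Pentaho code: `count_rows` and its `always_return_row` parameter are hypothetical names, and the sketch uses no group fields.

```python
def count_rows(rows, always_return_row=False):
    """Count input rows; optionally emit a result row even for empty input."""
    rows = list(rows)
    if not rows and not always_return_row:
        return []  # no input rows, so no output row at all
    return [{"row_count": len(rows)}]

no_row = count_rows([])                            # option off, empty input
zero_row = count_rows([], always_return_row=True)  # option on, empty input
two_rows = count_rows(["r1", "r2"])                # normal case
```

With the option off, an empty input yields no output row, so a downstream step never sees a count of zero.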

6) The fields that make up the group: This option also works the same way in both steps. After retrieving fields using the Get Fields button, designate the fields to include in the group.

7) Aggregates: Both Group By and Memory Group By offer the same aggregate functions. Specify the fields that must be aggregated, the method, and the name of the resulting new field.
Here are the available aggregation methods:

Sum
Average (Mean)
Median
Percentile
Minimum
Maximum
Number of values (N)
Concatenate strings separated by , (comma)
First non-null value
Last non-null value
First value (including null)
Last value (including null)
Cumulative sum (all rows option only!)
Cumulative average (all rows option only!)
Standard deviation
Concatenate strings separated by <Value>: specify the separator in the Value column
Number of distinct values
Number of rows (without field argument)
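
To make the two cumulative methods concrete, here is a small Python sketch of a per-group running sum and running average. It only illustrates the arithmetic under assumed sample data; `cumulative_by_group` is a hypothetical helper, the input is assumed to be sorted on the group field, and the actual output layout of the step may differ.

```python
from itertools import groupby

# Hypothetical rows: (group, value), already sorted by group.
rows = [("A", 10), ("A", 20), ("B", 5), ("B", 15), ("B", 10)]

def cumulative_by_group(rows):
    """Return (group, value, running_sum, running_average) for every input row."""
    out = []
    for key, run in groupby(rows, key=lambda r: r[0]):
        running_sum = 0
        for n, (_, value) in enumerate(run, start=1):
            running_sum += value
            out.append((key, value, running_sum, running_sum / n))
    return out

cumulative_rows = cumulative_by_group(rows)
```

Every input row appears in the output, which is why these two methods only make sense when all rows are passed through rather than one aggregate row per group.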

Thank You

Lalitha
