Creating a Multidimensional Data Cube using Talend ETL Tool

Posted on by By admin, in Talend | 0

What is a Cube in Data warehousing?

  1. Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provide multidimensional views of data, querying and analytical capabilities to clients. 
  2. There are different types of cubes, and we would be covering only multidimensional cube in this blog-post.

Make data easy with Helical Insight.
Helical Insight is the world’s best open source business intelligence tool.

Get your 30 Days Trail Version

What is a Multidimensional Cube?

  1. A multidimensional cube will give us a view of the data for the required dimensions along with the configured aggregations.
  2. For example, please consider the below data table which gives us a one dimensional view.
  3. Multidimensional Data Cube

  4. If we wish to view the data differently i.e., need to know the paid to each then we need to aggregate the above data and capture the results.
  5. Aggregation of data on a reactive basis works fine for systems dealing with small amount of data.But, when we talk about big data then the aggregations are time taking (take forever) and resource consuming.
  6. As most the OLAP applications in the current IT world would wish to play around with as much data as possible, we need big data solutions in place to address any data response/processing delays.
  7. The solution described in this blog will pre-aggregate the data as required on the dimensions given using an open source ETL tool. This pre-aggregation will avoid data processing/response delays when dealing with BIG DATA.

Data Cube Using Talend ETL Tool :

Prerequisites :

  1. Basic knowledge of ETL and data-warehousing concepts
  2. Basic knowledge of Talend ETL Tool

Tool Used :

  1. Talend Open Source Studio for Big Data v6.4

Solution:

We would be considering the below data file for creating the cube. The same logic can be applied for aggregating the data on a larger scale.

Multi Dimensional Data Cube

Cube Dimensions:

  1. <Name>
  2. <epartment>
  3. <Salary>

The aggregations that need to be applied on the data are:

  1. Average <Salary>
  2. Count of employees

Step 1:

Open the Talend application and create a new job.

Multidimensional Data Cube

Step 2:

From the palette drag and drop the ‘tFileInputDelimited’ component and configure it as per below.

Make data easy with Helical Insight.
Helical Insight is the world’s best open source business intelligence tool.

Grab The 30 Days Free Trail

Component Configuration :
Multidimensional Data Cube

Schema :

Multidimensional Data Cube

Step 3:

  1. The number of the combinations of the cube depends on the number of dimensions required.
  2. The formula to calculate the number of combinations is 2(n)
  3. n=number of dimensions.
  4. We have 3 dimensions, so we will have a total of 8 combinations.
  5. Each dimension will have one aggregation based on dimension combination.
  6. Drag a tmap component from the pallete and connect the previous component output to tmap.
  7. Configure the tmap component such that it reflects the number of possible cube combinations, which in the current example is 8.
  8. We can calculate the number of possible cube combinations using the below link
  9. https://planetcalc.com/3757/.

  10. For the data we are processing, the number of possible cube combinations are as follows
  11. Combination 1 --> <Name>,<Department>,<Salary>,<Aggregation1>,<Aggregation2>
    Combination 2 --><Name>,<Department>,NULL,<Aggregation1>,<Aggregation2>
    Combination 3 --><Name>,NULL,<Salary>,<Aggregation1>,<Aggregation2>
    Combination 4 -->NULL,<Department>,<Salary>,<Aggregation1>,<Aggregation>
    Combination 5 --><Name>,NULL,NULL,<Aggregation1>,<Aggregation2>
    Combination 6 -->NULL,<Department>,NULL,<Aggregation1>,<Aggregation2>
    Combination 7 -->NULL,NULL,<Salary>,<Aggregation1><,Aggregation2>
    Combination 8 -->NULL,NULL,NULL,<Aggregation1>,<Aggregation2>
    
  12. There should be 8 outputs mapped in the tmap as per below.

Multidimensional Data Cube

Step 4:

  1. Output each values of combinations i.e., Cube1,2,3 etc to different taggregaterow components.
  2. Each taggregaterow component will have the same aggregation with different dimensions as per below.

Multidimensional Data Cube

Multidimensional Data Cube

Step 5:

Redirect the output of the to a tlogrow component(can be redirected to a file or different database) and read the cube data.

Output:

Columns separated by “|”.

Multidimensional Data Cube

Complete Design of the job in Talend

Multidimensional Data Cube

Multidimensional Data Cube

Conclusion :

  1. Above design is one of the solutions for pre-aggregating big data and making it readily available querying.
  2. Make data easy with Helical Insight.
    Helical Insight is the world’s best open source business intelligence tool.

    Get your 30 Days Trail Version

  3. Like everything else, this solution comes with a trade-off i.e., storage space.
  4. The pre-aggregated cube data can be stored in a clustered file system with Apache cassandra on top of it for faster data retrieval.

In case if you have any queries comment in the below comment section

Thank You
SaiCharan
Senior ETL Developer
Helical IT Solutions Pvt Ltd

logo

Best Open Source Business Intelligence Software Helical Insight Here

logo

A Business Intelligence Framework


logo

Best Open Source Business Intelligence Software Helical Insight is Here

logo

A Business Intelligence Framework

0 0 votes
Article Rating
Subscribe
Notify of
0 Comments
Inline Feedbacks
View all comments