Checklist before Starting new ETL Work

Posted on by By Nikhilesh, in ETL, Jaspersoft, Pentaho | 0

ETL Checklist

ETL Checklist is the things to know when you starting a ETL project.

Database Connections: Metadata and the jar files required for Source and Target Databases

Make data easy with Helical Insight.
Helical Insight is the world’s best open source business intelligence tool.

Get your 30 Days Trail Version

Source data : We will have multiple sources and some times Single source also. We need to check is there any redundant data . We need to confirm regarding Data Types and data format. On which basis they are formatting data.

How the customer want to maintain the historical data. Is there any flags for deleted/invalid records. When to use SCD type 2 and SCD type 3 changes.

Logging and Auditing: We need to log all the details to track if any error occurs where it is exactly occurred. Because In our project we will have number of jobs and sub jobs.

So we have to track how many records are successfully inserted/updated

how many records got rejected. Job name, sub job name component name(if any error occurs we can track easily)

Rollback and Restarting Jobs: In ETL process the effected thing is always a Target database. So we need to keep check points in our job so that if any error occurs and results the job to be stopped. In this case ,when restarting the job the job should process from the last saving check point. So that no need to start the entire process again, which save the time and also improves the performance of the project.

Batch Loading: When we have a huge data, it is always better to load the data in multiple chunks rather than loading the entire data at a time. So Batch loading should be implemented.

Dependencies : In Target Database, we need to check for the dependencies like Lookup tables and Parent Tables which are connected with foreign key relationship. Are those having similar data types or not? Need to load the Parent tables first so that the foreign key relation will not get disturbed and also handled easily.

Alerting/Notifications: We have to implement Alerting/Notification mechanism. If any error occurs it should trigger a mail including error description,job name, component name etc.

Documentation: This is also one of the important task . Need to document the mappings in any format. These will be useful like a reference guide.


Best Open Source Business Intelligence Software Helical Insight is Here


A Business Intelligence Framework

Thank you



0 0 votes
Article Rating
Notify of
Inline Feedbacks
View all comments