Change Data Capture (CDC) – Methods, Approaches, Alternatives

Posted on October 14, 2013 by Nikhilesh, in Business Intelligence, Jaspersoft, Open Source Business Intelligence, Pentaho

Change data capture:

It is the process of tracking data that has changed so that a corresponding action can be taken. In other words, CDC captures the changes made at the data source and applies them everywhere else they are needed, which reduces the effort required at the ETL step.

Different strategies:

Push: the source process creates a snapshot of changes within its own process and delivers the rows downstream. The downstream process uses the snapshot, creates its own subset and delivers it to the next process.

Pull: the target immediately downstream from the source prepares a request for data from the source. That target then delivers the snapshot to the next target, as in the push model.
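To make the difference concrete, here is a minimal Python sketch of the two strategies. The `Source` class and its method names are hypothetical, made up purely for illustration; real tools differ in the details.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class Source:
    """Hypothetical source that keeps track of its own pending changes."""

    def __init__(self) -> None:
        self._pending: List[Row] = []
        self._subscribers: List[Callable[[List[Row]], None]] = []

    def subscribe(self, callback: Callable[[List[Row]], None]) -> None:
        self._subscribers.append(callback)

    def record_change(self, row: Row) -> None:
        self._pending.append(row)

    def push(self) -> None:
        """Push: the source snapshots its changes and delivers them downstream."""
        snapshot, self._pending = self._pending, []
        for deliver in self._subscribers:
            deliver(list(snapshot))

    def pull(self) -> List[Row]:
        """Pull: a downstream target asks the source for the pending changes."""
        snapshot, self._pending = self._pending, []
        return snapshot


src = Source()

# Push: the target only registers a callback; the source decides when to send.
pushed: List[Row] = []
src.subscribe(pushed.extend)
src.record_change({"id": 1, "name": "Ann"})
src.push()

# Pull: the target decides when to ask.
src.record_change({"id": 2, "name": "Bob"})
pulled = src.pull()
```

In both cases the same snapshot of changes moves downstream; what differs is which side initiates the transfer.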

Usage of Change Data Capture:

Change data capture is often used in data warehousing, since warehousing involves extracting and transforming data from one or more databases into a data warehouse for analysis.

Prior to CDC, changed data was captured using other methodologies such as table differencing or table comparison. This is like taking a copy of the source, comparing it with the target and spotting the differences. For a limited number of records this approach might work, but for a large number of records it is highly inefficient and results in immense overheads.

Table differencing: this involves copying the entire table from the source to staging. Because the entire table is moved, there are additional overheads and the method is not efficient. Also, if a row changes multiple times between two copies, the intermediate changes cannot be captured with this methodology.
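As a rough illustration, table differencing boils down to something like the Python sketch below. The `diff_tables` function and the sample rows are assumptions for the example, not part of any particular tool.

```python
def diff_tables(source_rows, target_rows, key="id"):
    """Compare two full table copies and classify the differences by key."""
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}

    # Keys only in the source are new rows; keys only in the target were deleted.
    inserts = [source[k] for k in sorted(source.keys() - target.keys())]
    deletes = [target[k] for k in sorted(target.keys() - source.keys())]
    # Keys on both sides whose rows differ are updates.
    updates = [
        source[k]
        for k in sorted(source.keys() & target.keys())
        if source[k] != target[k]
    ]
    return inserts, updates, deletes


# Full copy of the source table, and the copy the target currently holds.
source_rows = [{"id": 1, "price": 12}, {"id": 3, "price": 30}]
target_rows = [{"id": 1, "price": 10}, {"id": 2, "price": 20}]

inserts, updates, deletes = diff_tables(source_rows, target_rows)
```

Note that every row has to be read on both sides, and if the price of row 1 went from 10 to 11 and then to 12 between the two copies, only the final value 12 shows up in `updates`.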

Change value selection: this involves capturing changed values from the source database based on the value of a specific column (for example, a last-modified timestamp). However, to implement this we need to run queries to find the changed data, which is an additional overhead. Again, this method cannot find or capture intermediate values: if a row changes several times between two queries, only its latest state is seen, and rows deleted in between are not returned at all.
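The sketch below shows the idea with Python's built-in sqlite3 module. The `orders` table, the `updated_at` column and the `fetch_changes` helper are all assumed names for illustration; a real source would typically use a timestamp or version column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
)
conn.execute("INSERT INTO orders VALUES (1, 'new', 100)")
conn.execute("INSERT INTO orders VALUES (2, 'new', 105)")


def fetch_changes(conn, last_seen):
    """Return rows modified after `last_seen`, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (last_seen,),
    ).fetchall()
    new_mark = max((row[2] for row in rows), default=last_seen)
    return rows, new_mark


# First poll picks up both rows.
first_batch, watermark = fetch_changes(conn, 0)

# The same row changes twice before the next poll.
conn.execute("UPDATE orders SET status = 'paid', updated_at = 110 WHERE id = 1")
conn.execute("UPDATE orders SET status = 'shipped', updated_at = 120 WHERE id = 1")

# Second poll sees only the latest state; the 'paid' step is lost.
second_batch, watermark = fetch_changes(conn, watermark)
```

The second poll returns a single row for order 1 with status 'shipped', which is exactly the intermediate-value problem described above.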

A slightly better approach than the ones mentioned above is to use database triggers. Triggers can be fired after any insert, update or delete statement, so that whenever there is a change, the changed information is passed on to a change table. The main disadvantage of this solution is that it is database specific, so migrating it to another database becomes complex.
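Here is a small sketch of trigger-based capture, using SQLite through Python's sqlite3 module. The table and trigger names are made up for the example, and the trigger syntax shown is SQLite's own, which illustrates the portability concern: other databases need the triggers rewritten in their own dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);

    -- Change table that the triggers write to.
    CREATE TABLE customers_changes (
        change_id INTEGER PRIMARY KEY AUTOINCREMENT,
        op TEXT,
        id INTEGER,
        city TEXT
    );

    CREATE TRIGGER customers_ai AFTER INSERT ON customers BEGIN
        INSERT INTO customers_changes (op, id, city) VALUES ('I', NEW.id, NEW.city);
    END;

    CREATE TRIGGER customers_au AFTER UPDATE ON customers BEGIN
        INSERT INTO customers_changes (op, id, city) VALUES ('U', NEW.id, NEW.city);
    END;

    CREATE TRIGGER customers_ad AFTER DELETE ON customers BEGIN
        INSERT INTO customers_changes (op, id, city) VALUES ('D', OLD.id, OLD.city);
    END;
    """
)

conn.execute("INSERT INTO customers VALUES (1, 'Pune')")
conn.execute("UPDATE customers SET city = 'Mumbai' WHERE id = 1")
conn.execute("UPDATE customers SET city = 'Delhi' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")

captured = conn.execute(
    "SELECT op, id, city FROM customers_changes ORDER BY change_id"
).fetchall()
```

Unlike the two earlier methods, every intermediate change (including the short-lived 'Mumbai' value and the delete) ends up in the change table.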

Another approach is to use log readers. Many enterprise DBMSs maintain transaction logs that can be used for this purpose; logging can be switched on and off, and the amount and detail of information stored in the log can be configured as well. By reading the log we can learn about the changes happening in the system. The problems with this approach are that a new component may have to be purchased for log reading, migration might be an issue, and some databases might not have such logging features. Also, logs are sometimes written for changes that are never committed, which can be an issue as well unless the reader filters them out.
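Real transaction logs are binary and vendor specific, so the sketch below uses a simplified, made-up log format (a list of Python dictionaries) just to show how a log reader might deal with uncommitted changes: it buffers each transaction's records and releases them only when the commit record appears.

```python
# Hypothetical, simplified log: real DBMS logs look nothing like this.
log_records = [
    {"txn": 1, "op": "BEGIN"},
    {"txn": 1, "op": "UPDATE", "table": "accounts", "id": 7, "balance": 50},
    {"txn": 2, "op": "BEGIN"},
    {"txn": 2, "op": "UPDATE", "table": "accounts", "id": 9, "balance": 10},
    {"txn": 1, "op": "COMMIT"},
    {"txn": 2, "op": "ROLLBACK"},
    {"txn": 3, "op": "BEGIN"},
    {"txn": 3, "op": "INSERT", "table": "accounts", "id": 11, "balance": 0},
]


def committed_changes(records):
    """Return data changes in commit order, skipping uncommitted work."""
    pending = {}   # txn id -> change records buffered so far
    emitted = []
    for rec in records:
        txn, op = rec["txn"], rec["op"]
        if op == "BEGIN":
            pending[txn] = []
        elif op == "COMMIT":
            emitted.extend(pending.pop(txn, []))
        elif op == "ROLLBACK":
            pending.pop(txn, None)       # discard rolled-back changes
        else:
            pending.setdefault(txn, []).append(rec)
    return emitted                       # still-open transactions stay buffered


log_changes = committed_changes(log_records)
```

Only the update from transaction 1 is emitted: transaction 2 was rolled back and transaction 3 has not committed yet.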

Capturing data with CDC:

CDC can capture data using two modes: synchronous and asynchronous. In synchronous mode, any changed data in the source system is made available to the target immediately, as part of the DML (data manipulation language) operation that made the change, whereas in asynchronous mode the change is captured after the operation has completed. Synchronous mode uses triggers and results in little to no latency, though it has more overhead than asynchronous mode.
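The trade-off can be sketched in a few lines of Python. The `SyncCapture` and `AsyncCapture` classes are hypothetical stand-ins (plain dictionaries play the role of source and target), not how any specific product implements the two modes.

```python
from collections import deque


class SyncCapture:
    """The change reaches the target as part of the write itself."""

    def __init__(self):
        self.source = {}
        self.target = {}

    def write(self, key, value):
        self.source[key] = value
        self.target[key] = value   # extra work on every write, but no lag


class AsyncCapture:
    """The write only queues the change; a separate step applies it later."""

    def __init__(self):
        self.source = {}
        self.target = {}
        self.backlog = deque()

    def write(self, key, value):
        self.source[key] = value
        self.backlog.append((key, value))   # cheap for the writer

    def drain(self):
        applied = 0
        while self.backlog:
            key, value = self.backlog.popleft()
            self.target[key] = value
            applied += 1
        return applied


sync_cdc = SyncCapture()
sync_cdc.write("order-1", "paid")

async_cdc = AsyncCapture()
async_cdc.write("order-1", "paid")
seen_before_drain = async_cdc.target.get("order-1")   # target lags behind
async_cdc.drain()
```

In the synchronous case the target is up to date as soon as `write` returns; in the asynchronous case the target catches up only when `drain` runs, which is where the latency comes from.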

Benefits of CDC:

– CDC can improve performance and reduce redundancy by copying only the changed data.

– Log-based CDC also reduces the load on operational systems, since it only reads the logs rather than querying the database

– Low latency, so business users see the changes in their BI reports almost immediately

– CDC can easily handle high-volume transactions

We can implement CDC using Talend Integration Suite. More information about this will be covered in another blog post.

For any queries, please reach out to me at nikhilesh@Helicaltech.com

