Change Data Capture (CDC) Process in dbt Documentation

Implement a CDC process in dbt to capture and process only the changed or new data from source systems, ensuring efficient and up-to-date analytics models.

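  1. Source System Integration:
    – Ensure that your source data carries CDC information, such as last-modified timestamps or change logs.
    – Connect your dbt project to the warehouse that holds the loaded source data, and declare those tables as dbt sources. dbt transforms data that is already in the warehouse; extracting it from the source system is the job of a separate loading tool.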
  2. Define Timestamps or Change Logs:
    – Identify the columns in your source data that contain timestamps or change log information.
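    – Reference these columns in your models, for example as the filter column of an incremental model or as the updated_at column of a snapshot that uses the timestamp strategy.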
    Example: Configuring a timestamp column in your dbt model

    SELECT
      *,
      modified_timestamp_column AS dbt_valid_from
    FROM my_source_table
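  3. Incremental Model Building:
    – Materialize the models as incremental so that, after the first full build, each run processes only new or changed rows.
    – Set unique_key in the model config if updated rows should replace their earlier versions instead of being appended as duplicates.
    – Run the models with dbt run. dbt run --full-refresh rebuilds an incremental model from scratch.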
    Example: Incremental model SQL in dbt

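    {{ config(materialized='incremental') }}
    WITH changed_data AS (
      SELECT *, modified_timestamp_column AS dbt_valid_from
      FROM my_source_table
      {% if is_incremental() %}  -- incremental runs: only rows newer than those already loaded
      WHERE modified_timestamp_column > (SELECT MAX(dbt_valid_from) FROM {{ this }})
      {% endif %}
    )
    SELECT * FROM changed_data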
  4. Dependency Management:
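    – Let dbt manage the processing order. dbt builds a dependency graph from the ref() calls in your models, so a model that selects from {{ ref('incremental_model') }} always runs after incremental_model. Reference other models with ref() rather than hard-coded table names.
    Example: Declaring my_model's properties (its dependency comes from ref() in its SQL)

    # my_model depends on incremental_model through the ref() in its SQL, e.g.
    #   SELECT * FROM {{ ref('incremental_model') }}
    version: 2
    models:
      - name: my_model
        description: "My main analytics model"
        config:
          materialized: table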
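  5. Scheduled Execution:
    – Schedule dbt runs at regular intervals with dbt Cloud jobs, or by invoking the dbt CLI from a scheduler such as cron or Airflow.
    Example: Running the project every hour from cron (the project path is a placeholder)

    0 * * * * cd /path/to/dbt_project && dbt run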
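  6. Monitoring and Validation:
    – Validate results with dbt tests, for example unique and not_null on the model's key column, run with dbt test or dbt build.
    – Track run status and execution times in the dbt Cloud run history, or in the run_results.json artifact that dbt writes to the target/ directory.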
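Steps 1 and 2 assume that the source table actually carries a change-tracking column. The sketch below checks that assumption before any incremental logic depends on it. It uses Python's built-in sqlite3 module as a stand-in for the warehouse. The helper name has_cdc_column and the id and amount columns are hypothetical. On a real warehouse you would query its information schema instead.

```python
import sqlite3


def has_cdc_column(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Return True if `table` exposes a column named `column` (hypothetical helper)."""
    # PRAGMA table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so only pass trusted identifiers.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Stand-in for the loaded source table; id and amount are illustrative columns.
    conn.execute(
        "CREATE TABLE my_source_table "
        "(id INTEGER, amount REAL, modified_timestamp_column TEXT)"
    )
    print(has_cdc_column(conn, "my_source_table", "modified_timestamp_column"))  # True
    print(has_cdc_column(conn, "my_source_table", "updated_at"))  # False
```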
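The high-water-mark filter from step 3 can be simulated outside dbt to see how successive runs behave. This is a minimal sketch that assumes SQLite in place of the warehouse, ISO-8601 timestamp strings (which compare correctly as text), and a hypothetical id key column. It is not dbt's implementation. The keyed INSERT OR REPLACE plays the role that unique_key plays in a dbt incremental model.

```python
import sqlite3


def incremental_load(conn: sqlite3.Connection) -> int:
    """Copy rows of my_source_table changed since the last load into my_model.

    Returns the number of rows processed. Mirrors the step 3 filter:
    modified_timestamp_column > MAX(dbt_valid_from) of the target.
    """
    # First run: create the target, like dbt's initial (non-incremental) build.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS my_model "
        "(id INTEGER PRIMARY KEY, amount REAL, dbt_valid_from TEXT)"
    )
    # High-water mark: newest timestamp already loaded (None while the target is empty).
    (watermark,) = conn.execute("SELECT MAX(dbt_valid_from) FROM my_model").fetchone()
    changed = conn.execute(
        "SELECT id, amount, modified_timestamp_column FROM my_source_table "
        "WHERE ? IS NULL OR modified_timestamp_column > ?",
        (watermark, watermark),
    ).fetchall()
    # Keyed upsert: an updated row replaces its earlier version, as a unique_key would.
    conn.executemany(
        "INSERT OR REPLACE INTO my_model (id, amount, dbt_valid_from) VALUES (?, ?, ?)",
        changed,
    )
    conn.commit()
    return len(changed)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_source_table "
        "(id INTEGER, amount REAL, modified_timestamp_column TEXT)"
    )
    conn.executemany(
        "INSERT INTO my_source_table VALUES (?, ?, ?)",
        [(1, 10.0, "2024-01-01T00:00:00"), (2, 20.0, "2024-01-01T00:00:00")],
    )
    print(incremental_load(conn))  # 2: first run loads everything
    conn.execute(
        "UPDATE my_source_table SET amount = 25.0, "
        "modified_timestamp_column = '2024-01-02T00:00:00' WHERE id = 2"
    )
    print(incremental_load(conn))  # 1: only the updated row
    print(incremental_load(conn))  # 0: nothing changed since the last load
```

Because the comparison is a strict >, the filter skips any row that arrives late with a timestamp at or before the current high-water mark. A common mitigation is to subtract a small lookback window from the watermark and let the keyed upsert absorb the overlap.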
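Step 4 relies on dbt ordering models by their ref() calls. The sketch below approximates that idea: it extracts single-argument ref() calls with a regular expression and orders models topologically with Python's graphlib (Python 3.9+). The regex is a simplification. dbt resolves ref() while rendering Jinja and also supports other forms, so treat build_order and REF_PATTERN as hypothetical illustrations, not dbt's parser.

```python
import re
from graphlib import TopologicalSorter

# Matches the single-argument form {{ ref('name') }} / {{ ref("name") }}.
# This is a rough approximation: dbt itself resolves ref() while rendering Jinja.
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"]([^'"]+)['"]\s*\)\s*\}\}""")


def build_order(models: dict[str, str]) -> list[str]:
    """Return model names so that every model comes after the models it refs."""
    # Map each model to the set of models it depends on (its predecessors).
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    # static_order() raises graphlib.CycleError if the refs form a cycle.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    models = {
        "my_model": "SELECT * FROM {{ ref('incremental_model') }}",
        "incremental_model": "SELECT * FROM my_source_table",
    }
    print(build_order(models))  # ['incremental_model', 'my_model']
```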
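Where neither dbt Cloud nor a system scheduler is available for step 5, a long-lived process can invoke the dbt CLI on an interval. This is a minimal sketch, and run_dbt_periodically is a hypothetical helper. It relies only on dbt run existing on PATH and on dbt exiting with a non-zero code when a run fails. The runner and sleep parameters are injectable so the loop can be exercised without dbt installed.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def run_dbt_periodically(
    interval_seconds: float,
    max_runs: int,
    command: Sequence[str] = ("dbt", "run"),
    runner: Callable[[List[str]], subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Invoke `command` up to max_runs times, interval_seconds apart; return exit codes."""
    exit_codes: List[int] = []
    for i in range(max_runs):
        result = runner(list(command))
        exit_codes.append(result.returncode)
        if result.returncode != 0:
            # A non-zero exit means the run failed; stop instead of repeating a failing run.
            break
        if i < max_runs - 1:
            sleep(interval_seconds)
    return exit_codes
```

For example, run_dbt_periodically(3600, 24), called from a dbt project directory, would run the project hourly for a day. Stopping at the first failure is a design choice: it keeps later runs from silently building on a broken state, at the cost of requiring someone to restart the loop.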
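The unique and not_null tests named in step 6 pass when their query finds no failing records. The sketch below reproduces that idea in SQLite so the checks can be seen in isolation. It is not dbt's generated SQL. count_test_failures is a hypothetical helper, and it interpolates identifiers, so it should only receive trusted table and column names.

```python
import sqlite3


def count_test_failures(conn: sqlite3.Connection, table: str, column: str, test: str) -> int:
    """Count failures of a dbt-style `unique` or `not_null` check (0 means pass)."""
    if test == "not_null":
        sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    elif test == "unique":
        # Count distinct non-null values that occur more than once.
        sql = (
            f"SELECT COUNT(*) FROM (SELECT {column} FROM {table} "
            f"WHERE {column} IS NOT NULL GROUP BY {column} HAVING COUNT(*) > 1)"
        )
    else:
        raise ValueError(f"unsupported test: {test}")
    (failures,) = conn.execute(sql).fetchone()
    return failures


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_model (id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO my_model VALUES (?, ?)",
        [(1, 10.0), (2, 20.0), (2, 25.0), (None, 5.0)],
    )
    for test in ("unique", "not_null"):
        print(test, count_test_failures(conn, "my_model", "id", test))
    # unique 1    (id 2 appears twice)
    # not_null 1  (one row has a NULL id)
```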
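For the monitoring half of step 6, the run_results.json artifact can be summarized without extra tooling. This sketch assumes each entry in its results list carries unique_id, status and execution_time fields. Check those names against the artifact schema your dbt version writes. The helper names load_run_results and summarize_run_results are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict


def load_run_results(path: str = "target/run_results.json") -> Dict[str, Any]:
    """Read the run_results.json artifact written by a dbt invocation."""
    return json.loads(Path(path).read_text())


def summarize_run_results(data: Dict[str, Any]) -> Dict[str, Any]:
    """Tally node statuses, list failed nodes, and find the slowest node."""
    results = data.get("results", [])
    statuses = Counter(r["status"] for r in results)
    # Models report "error" on failure; tests report "fail" (or "error").
    problems = [r["unique_id"] for r in results if r["status"] in ("error", "fail")]
    slowest = max(results, key=lambda r: r["execution_time"], default=None)
    return {
        "statuses": dict(statuses),
        "problems": problems,
        "slowest": slowest["unique_id"] if slowest else None,
    }
```

After dbt build, summarize_run_results(load_run_results()) gives a compact view that a scheduler or alerting hook could act on, for example by notifying when problems is non-empty.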

