Airbyte – Transformation | Transformations with Airbyte | Airbyte transformations guide

Posted on by By admin, in Airbyte | 0

First transformation step: Normalization

Airbyte is geared to handle the EL (Extract Load) steps of an ELT process.

It is actually producing a table in the destination with a JSON blob column… For the typical analytics use case, you probably want this json blob normalized so that each field is its own column.

Airbyte runs this step before handing the final data over to other tools that will manage further transformation

While we setting up connection we can toggle this and also access

While dumping(syncing) manually and we can add custom Transformations

Transformations with Airbyte

We can add our Customown transformation tool such as dbt or Dataform.

Example of normalization SQLscript

create  table
mysql.`products__dbt_tmp`
  as (

with __dbt__cte__products_ab1 as (

-- SQL model to parse JSON blob stored in a single column and extract into separated field columns as described by the JSON Schema
-- depends_on: mysql._airbyte_raw_products
select
json_value(_airbyte_data, 
    '$."updated_at"' RETURNING CHAR) as updated_at,
json_value(_airbyte_data, 
    '$."year"' RETURNING CHAR) as `year`,
json_value(_airbyte_data, 
    '$."price"' RETURNING CHAR) as price,
json_value(_airbyte_data, 
    '$."created_at"' RETURNING CHAR) as created_at,
json_value(_airbyte_data, 
    '$."model"' RETURNING CHAR) as model,
json_value(_airbyte_data, 
    '$."id"' RETURNING CHAR) as id,
json_value(_airbyte_data, 
    '$."make"' RETURNING CHAR) as make,
    _airbyte_ab_id,
    _airbyte_emitted_at,

    CURRENT_TIMESTAMP
 as _airbyte_normalized_at
from mysql._airbyte_raw_products as table_alias
-- products
where 1 = 1
),  __dbt__cte__products_ab2 as (

-- SQL model to cast each column to its adequate SQL type converted from the JSON schema type
-- depends_on: __dbt__cte__products_ab1
select
    cast(nullif(updated_at, '') as char(1024)) as updated_at,
    cast(`year` as 
    signed
) as `year`,
    cast(price as 
    float
) as price,
    cast(nullif(created_at, '') as char(1024)) as created_at,
    cast(model as char(1024)) as model,
    cast(id as 
    signed
) as id,
    cast(make as char(1024)) as make,
    _airbyte_ab_id,
    _airbyte_emitted_at,

    CURRENT_TIMESTAMP
 as _airbyte_normalized_at
from __dbt__cte__products_ab1
-- products
where 1 = 1
),  __dbt__cte__products_ab3 as (

-- SQL model to build a hash column based on the values of this record
-- depends_on: __dbt__cte__products_ab2
select
    md5(cast(concat(coalesce(cast(updated_at as char), ''), '-', coalesce(cast(`year` as char), ''), '-', coalesce(cast(price as char), ''), '-', coalesce(cast(created_at as char), ''), '-', coalesce(cast(model as char), ''), '-', coalesce(cast(id as char), ''), '-', coalesce(cast(make as char), '')) as char)) as _airbyte_products_hashid,
    tmp.*
from __dbt__cte__products_ab2 tmp
-- products
where 1 = 1
)-- Final base SQL model
-- depends_on: __dbt__cte__products_ab3
select
updated_at,
    `year`,
    price,
created_at,
    model,
    id,
    make,
    _airbyte_ab_id,
    _airbyte_emitted_at,

    CURRENT_TIMESTAMP
 as _airbyte_normalized_at,
    _airbyte_products_hashid
from __dbt__cte__products_ab3
-- products from mysql._airbyte_raw_products
where 1 = 1
  )

At Helical, with more than 10+ years of experience in the space of data domain, we have got extensive experience of implementing data solutions including modern data stack. We also provide consulting, implementation, maintenance and support services on top of Airbyte. Reach out to us on nikhilesh@Helicaltech.com to learn more about our services

0 0 votes
Article Rating
Subscribe
Notify of
0 Comments
Inline Feedbacks
View all comments