First transformation step: Normalization
Airbyte is geared to handle the EL (Extract Load) steps of an ELT process.
It is actually producing a table in the destination with a JSON blob column… For the typical analytics use case, you probably want this json blob normalized so that each field is its own column.
Airbyte runs this step before handing the final data over to other tools that will manage further transformation
While we setting up connection we can toggle this and also access
While dumping(syncing) manually and we can add custom Transformations
We can add our Customown transformation tool such as dbt or Dataform.
Example of normalization SQLscript
create table mysql.`products__dbt_tmp` as ( with __dbt__cte__products_ab1 as ( -- SQL model to parse JSON blob stored in a single column and extract into separated field columns as described by the JSON Schema -- depends_on: mysql._airbyte_raw_products select json_value(_airbyte_data, '$."updated_at"' RETURNING CHAR) as updated_at, json_value(_airbyte_data, '$."year"' RETURNING CHAR) as `year`, json_value(_airbyte_data, '$."price"' RETURNING CHAR) as price, json_value(_airbyte_data, '$."created_at"' RETURNING CHAR) as created_at, json_value(_airbyte_data, '$."model"' RETURNING CHAR) as model, json_value(_airbyte_data, '$."id"' RETURNING CHAR) as id, json_value(_airbyte_data, '$."make"' RETURNING CHAR) as make, _airbyte_ab_id, _airbyte_emitted_at, CURRENT_TIMESTAMP as _airbyte_normalized_at from mysql._airbyte_raw_products as table_alias -- products where 1 = 1 ), __dbt__cte__products_ab2 as ( -- SQL model to cast each column to its adequate SQL type converted from the JSON schema type -- depends_on: __dbt__cte__products_ab1 select cast(nullif(updated_at, '') as char(1024)) as updated_at, cast(`year` as signed ) as `year`, cast(price as float ) as price, cast(nullif(created_at, '') as char(1024)) as created_at, cast(model as char(1024)) as model, cast(id as signed ) as id, cast(make as char(1024)) as make, _airbyte_ab_id, _airbyte_emitted_at, CURRENT_TIMESTAMP as _airbyte_normalized_at from __dbt__cte__products_ab1 -- products where 1 = 1 ), __dbt__cte__products_ab3 as ( -- SQL model to build a hash column based on the values of this record -- depends_on: __dbt__cte__products_ab2 select md5(cast(concat(coalesce(cast(updated_at as char), ''), '-', coalesce(cast(`year` as char), ''), '-', coalesce(cast(price as char), ''), '-', coalesce(cast(created_at as char), ''), '-', coalesce(cast(model as char), ''), '-', coalesce(cast(id as char), ''), '-', coalesce(cast(make as char), '')) as char)) as _airbyte_products_hashid, tmp.* from __dbt__cte__products_ab2 tmp -- products where 1 = 1 )-- Final base SQL model -- depends_on: __dbt__cte__products_ab3 select updated_at, `year`, price, created_at, model, id, make, _airbyte_ab_id, _airbyte_emitted_at, CURRENT_TIMESTAMP as _airbyte_normalized_at, _airbyte_products_hashid from __dbt__cte__products_ab3 -- products from mysql._airbyte_raw_products where 1 = 1 )
At Helical, with more than 10+ years of experience in the space of data domain, we have got extensive experience of implementing data solutions including modern data stack. We also provide consulting, implementation, maintenance and support services on top of Airbyte. Reach out to us on nikhilesh@Helicaltech.com to learn more about our services
Airbyte transformation example Airbyte transformation tutorial Airbyte transformations guide Can Airbyte do transformations? How Airbyte works? Transformations with Airbyte