What are sources in dbt? | YAML Files in dbt

Posted by admin in DBT


In dbt, sources are configurations that define where your raw data comes from. Sources help dbt understand the structure and schema of your raw data tables, allowing you to build analytics on top of them. Typically, sources are associated with the raw, untransformed data in your data warehouse.

Sources are defined in .yml files under the “sources:” key.

Example

Let’s consider an example of defining a source in dbt for a hypothetical e-commerce dataset. Suppose you have a raw data table named raw_orders containing information about customer orders.

Create a YAML file for the source:

In your dbt project, create a file named raw_orders.yml in the models directory.

# models/raw_orders.yml
version: 2
sources:
  - name: raw_orders
    tables:
      - name: orders

This YAML file defines a source named raw_orders with a single table named orders.

Define source configuration:

The raw_orders.yml file specifies the basic configuration for the raw_orders source. It tells dbt that there’s a table named orders in this source.
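A source definition can carry more than table names. As a minimal sketch, the same file could also document the table and test its columns. The column order_id is an assumption made for illustration; order_date and order_total are the columns used by the model later in this post. The “tests:” key shown here is the long-standing spelling; recent dbt versions also accept “data_tests:”.

```yaml
# models/raw_orders.yml
version: 2

sources:
  - name: raw_orders
    description: Raw e-commerce order data loaded into the warehouse.
    tables:
      - name: orders
        description: One row per customer order.
        columns:
          - name: order_id        # hypothetical column, assumed to be the primary key
            description: Unique identifier of the order.
            tests:
              - unique
              - not_null
          - name: order_date
            description: Date the order was placed.
          - name: order_total
            description: Total value of the order.
```

With a definition like this, running dbt test --select source:raw_orders should check the raw table before any model builds on it.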

Reference in models:

Now, you can reference this source in your dbt models. Let’s say you want to create a model that aggregates order data.

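-- models/orders_summary.sql
with aggregated_orders as (
    select
        date_trunc('day', order_date) as order_day,
        count(*) as order_count,
        sum(order_total) as total_sales
    -- sources are referenced with source(); ref() is only for other models
    from {{ source('raw_orders', 'orders') }}
    group by 1
)
select * from aggregated_orders

This model (orders_summary.sql) aggregates order data by day. It reads from the orders table of the raw_orders source through the source() function, which takes the source name and the table name as two separate arguments. When dbt compiles the model, it replaces that call with the table's full name in the warehouse and records the source as an upstream dependency of the model. Note that a dbt model should not end with a semicolon, because dbt wraps the select statement in its own SQL.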

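A note on automatic discovery:

dbt does not scan your data warehouse and register sources on its own. Every table you want to reference as a source has to be declared in a .yml file, and an empty “sources: []” list simply declares no sources. What dbt does infer is the schema: unless you say otherwise, it looks for a source's tables in a schema with the same name as the source. If your raw data tables are organized in a differently named schema, set the schema property.

# models/raw_orders.yml
version: 2
sources:
  - name: raw_orders
    schema: ecommerce_raw  # hypothetical schema name; defaults to the source name if omitted
    tables:
      - name: orders

In this case, dbt resolves the source to the orders table in the ecommerce_raw schema. If you have many raw tables, the dbt-codegen package provides a generate_source macro that can draft this YAML from an existing schema, but the result is still an explicit definition that you keep in your project.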

By defining sources in dbt, you provide the necessary context for dbt to understand the raw data’s structure and efficiently build upon it for analytics purposes. This separation of sources and models promotes modularity and ease of maintenance in your dbt projects.

