What are sources in dbt?
In dbt, sources are configurations that define where your raw data comes from. Sources help dbt understand the structure and schema of your raw data tables, allowing you to build analytics on top of them. Typically, sources are associated with the raw, untransformed data in your data warehouse.
Sources are defined in .yml files under the sources: key.
Example
Let’s consider an example of defining a source in dbt for a hypothetical e-commerce dataset. Suppose you have a raw data table named raw_orders containing information about customer orders.
Create a YAML file for the source:
In your dbt project, create a file named raw_orders.yml in the models directory.

    # models/raw_orders.yml
    version: 2

    sources:
      - name: raw_orders
        tables:
          - name: orders

This YAML file defines a source named raw_orders with a single table named orders.
Define source configuration:
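The raw_orders.yml file specifies the basic configuration for the raw_orders source. It tells dbt that there’s a table named orders in this source. Because no schema is set, dbt looks for the table in a schema with the same name as the source, raw_orders.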
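A source definition can carry more than a table name. The sketch below extends the same file with a few optional properties that dbt supports for sources (description, database, schema, identifier, columns). The database, schema, and identifier values are hypothetical placeholders, not part of the original example, and the column names are taken from the aggregation model in this article.

```yaml
# models/raw_orders.yml
version: 2

sources:
  - name: raw_orders
    description: Raw e-commerce order data loaded into the warehouse.
    database: analytics_raw   # hypothetical database name
    schema: ecommerce         # hypothetical schema; defaults to the source name if omitted
    tables:
      - name: orders
        identifier: orders_v1   # hypothetical physical table name, if it differs from `name`
        description: One row per customer order.
        columns:
          - name: order_date
            description: Date the order was placed.
          - name: order_total
            description: Total value of the order.
```

Setting schema and identifier keeps the names used in your models stable even if the physical location of the raw table changes.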
Reference in models:
Now, you can reference this source in your dbt models. Let’s say you want to create a model that aggregates order data.
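    -- models/orders_summary.sql
    with aggregated_orders as (
        select
            date_trunc('day', order_date) as order_day,
            count(*) as order_count,
            sum(order_total) as total_sales
        from {{ source('raw_orders', 'orders') }}
        group by 1
    )

    select * from aggregated_orders

This model (orders_summary.sql) aggregates order data by day and reads from the orders table of the raw_orders source. Sources are referenced with the source() function, which takes the source name and the table name as two separate arguments; ref() is reserved for other dbt models. The model has no trailing semicolon because dbt wraps the query in its own statement when it builds the model.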
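No automatic discovery:
dbt does not discover sources by inspecting your data warehouse. Every raw table you want to reference with source() must be declared in a .yml file, even if your raw tables are already neatly organized in schemas. An empty sources list, for example, declares nothing:

    # models/raw_orders_auto.yml
    version: 2

    sources: []

With only this file, dbt has no sources defined, and any model that calls source() will fail to compile. If you have many raw tables, the generate_source macro from the dbt-codegen package can draft the YAML from an existing schema, which you then review and add to your project.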
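When a schema holds several raw tables, they can all be listed under one source. This is a minimal sketch, assuming a hypothetical schema named raw and hypothetical customers and payments tables alongside orders:

```yaml
version: 2

sources:
  - name: raw_orders
    schema: raw            # hypothetical schema holding the raw tables
    tables:
      - name: orders
      - name: customers    # hypothetical table
      - name: payments     # hypothetical table
```

Each listed table can then be referenced in a model by source name and table name, for example source('raw_orders', 'customers').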
By defining sources in dbt, you provide the necessary context for dbt to understand the raw data’s structure and efficiently build upon it for analytics purposes. This separation of sources and models promotes modularity and ease of maintenance in your dbt projects.
We at Helical have more than 10 years of experience providing data solutions and services and have served more than 85 clients. We are also dbt partners, so if you are looking for assistance, consulting, or services, please reach out at nikhilesh@Helicaltech.com