Structuring dbt Projects in Snowflake: The Definitive Guide

If you’ve ever inherited a dbt project, you know there are two kinds: the clean, logical, and easy-to-navigate project, and the other kind—a tangled mess of models that makes you question every life choice that led you to that moment. The difference between the two isn’t talent; it’s structure. For high-performing data teams, a well-defined structure for dbt projects in Snowflake isn’t just a nice-to-have; it’s the very foundation of a scalable, maintainable, and trustworthy analytics workflow.

While dbt and Snowflake are a technical match made in heaven, simply putting them together doesn’t guarantee success. Without a clear and consistent project structure, even the most powerful tools can lead to chaos. Dependencies become circular, model names become ambiguous, and new team members spend weeks just trying to understand the data flow.

This guide provides a battle-tested blueprint for structuring dbt projects in Snowflake. We’ll move beyond the basics and dive into a scalable, multi-layered framework that will save you and your team countless hours of rework and debugging.

Why dbt and Snowflake Are a Perfect Match

Before we dive into project structure, it’s crucial to understand why this combination has become the gold standard for the modern data stack. Their synergy comes from a shared philosophy of decoupling, scalability, and performance.

  • Snowflake’s Decoupled Architecture: Its separation of storage and compute is revolutionary. This means you can run massive dbt transformations using a dedicated, powerful virtual warehouse without slowing down your BI tools.
  • dbt’s Transformation Power: dbt focuses on the “T” in ELT—transformation. It allows you to build, test, and document your data models using simple SQL, which it then compiles and runs directly inside Snowflake’s powerful engine.
  • Cost and Performance Synergy: Running dbt models in Snowflake is incredibly efficient. You can spin up a warehouse for a dbt run and spin it down the second it’s finished, meaning you only pay for the exact compute you use.
  • Zero-Copy Cloning for Development: Instantly create a zero-copy clone of your entire production database for development. This allows you to test your dbt project against production-scale data without incurring storage costs or impacting the production environment.
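
To make those last two points concrete, here is a minimal Snowflake SQL sketch. The warehouse and database names (dbt_transforming, analytics, analytics_dev) are placeholders; substitute whatever your environment actually uses.

SQL

-- A dedicated warehouse for dbt runs that suspends itself when idle,
-- so you only pay for the seconds your models are actually running.
create warehouse if not exists dbt_transforming
  warehouse_size = 'medium'
  auto_suspend = 60
  auto_resume = true;

-- A zero-copy clone of production for development: the full dataset is
-- available instantly, without duplicating storage.
create database analytics_dev clone analytics;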

In short, Snowflake provides the powerful, elastic engine, while dbt provides the organized, version-controlled, and testable framework to harness that engine.

The Layered Approach: From Raw Data to Actionable Insights

A scalable dbt project is like a well-organized factory. Raw materials come in one end, move through a series of refinement stages, and emerge as a finished product. We achieve this by structuring our models into distinct layers, each with a specific job.

Our structure will follow this flow: Sources -> Staging -> Intermediate -> Marts.
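
Mapped onto the models/ directory, that flow looks roughly like this (the file names correspond to the models referenced throughout this guide):

models/
  staging/
    sources.yml
    stg_customers.sql
    stg_orders.sql
    stg_payments.sql
  intermediate/
    int_orders_with_payments.sql
  marts/
    fct_customer_orders.sql
    fct_customer_orders.yml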

Layer 1: Declaring Your Sources (The Contract with Raw Data)

Before you write a single line of transformation SQL, you must tell dbt where your raw data lives in Snowflake. This is done in a .yml file. Think of this file as a formal contract that declares your raw tables, allows you to add data quality tests, and serves as a foundation for your data lineage graph.

Example: models/staging/sources.yml

Let’s assume we have a RAW_DATA database in Snowflake containing jaffle_shop and stripe schemas.

YAML

version: 2

sources:
  - name: jaffle_shop
    database: raw_data 
    schema: jaffle_shop
    description: "Raw data from the primary application database."
    tables:
      - name: customers
        columns:
          - name: id
            tests:
              - unique
              - not_null
      - name: orders
        loaded_at_field: _etl_loaded_at
        freshness:
          warn_after: {count: 12, period: hour}

  - name: stripe
    database: raw_data
    schema: stripe
    tables:
      - name: payment
        columns:
          - name: orderid
            tests:
              - relationships:
                  to: source('jaffle_shop', 'orders')
                  field: id
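
With this contract in place, dbt test will exercise the column tests declared above, and a scheduled dbt source freshness run will warn you whenever the orders source has not been loaded within the last 12 hours.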

Layer 2: Staging Models (Clean and Standardize)

Staging models are the first line of transformation. They should have a 1:1 relationship with your source tables. The goal here is strict and simple:

  • DO: Rename columns, cast data types, and perform very light cleaning.
  • DO NOT: Join to other tables.

This creates a clean, standardized version of each source table, forming a reliable foundation for the rest of your project.

Example: models/staging/stg_customers.sql

SQL

-- models/staging/stg_customers.sql
with source as (
    select * from {{ source('jaffle_shop', 'customers') }}
),

renamed as (
    select
        id as customer_id,
        first_name,
        last_name
    from source
)

select * from renamed
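
The payments source gets the same treatment. Here is a sketch of a matching staging model; it assumes the raw Stripe table stores amounts in cents alongside a raw status column, so adjust the renames and conversions to whatever your source actually contains.

SQL

-- models/staging/stg_payments.sql
with source as (
    select * from {{ source('stripe', 'payment') }}
),

renamed as (
    select
        id as payment_id,
        orderid as order_id,
        status as payment_status,
        -- assumed to be stored in cents; convert to dollars
        amount / 100.0 as amount
    from source
)

select * from renamed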

Layer 3: Intermediate Models (Build, Join, and Aggregate)

This is where the real business logic begins. Intermediate models are the “workhorses” of your dbt project. They take the clean outputs of your staging models and start combining them.

  • DO: Join different staging models together.
  • DO: Perform complex calculations, aggregations, and business-specific logic.
  • Materialize them as tables if they are slow to run or used by many downstream models (see the configuration sketch below).

These models are not typically exposed to business users. They are building blocks for your final data marts.
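
How aggressively to materialize this layer is a judgment call. One common pattern is to set a folder-level default in dbt_project.yml; the sketch below assumes a project named my_dbt_project and the folder layout shown earlier.

YAML

# dbt_project.yml (excerpt)
models:
  my_dbt_project:
    staging:
      +materialized: view
    intermediate:
      +materialized: table
    marts:
      +materialized: table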

Example: models/intermediate/int_orders_with_payments.sql

SQL

-- models/intermediate/int_orders_with_payments.sql
with orders as (
    select * from {{ ref('stg_orders') }}
),

payments as (
    select * from {{ ref('stg_payments') }}
),

order_payments as (
    select
        order_id,
        sum(case when payment_status = 'success' then amount else 0 end) as total_amount
    from payments
    group by 1
),

final as (
    select
        orders.order_id,
        orders.customer_id,
        orders.order_date,
        coalesce(order_payments.total_amount, 0) as amount
    from orders
    left join order_payments 
      on orders.order_id = order_payments.order_id
)

select * from final

Layer 4: Data Marts (Ready for Analysis)

Finally, we arrive at the data marts. These are the polished, final models that power your dashboards, reports, and analytics. They should be clean, easy to understand, and built for a specific business purpose (e.g., finance, marketing, product).

  • DO: Join intermediate models (and, where a simple dimension is all you need, staging models directly).
  • DO: Have clear, business-friendly column names.
  • DO NOT: Contain complex, nested logic. All the heavy lifting should have been done in the intermediate layer.

These models are the “products” of your data factory, ready for consumption by BI tools like Tableau, Looker, or Power BI.

Example: models/marts/fct_customer_orders.sql

SQL

-- models/marts/fct_customer_orders.sql
with customers as (
    select * from {{ ref('stg_customers') }}
),

orders as (
    select * from {{ ref('int_orders_with_payments') }}
),

customer_orders as (
    select
        customers.customer_id,
        min(orders.order_date) as first_order_date,
        max(orders.order_date) as most_recent_order_date,
        count(orders.order_id) as number_of_orders,
        sum(orders.amount) as lifetime_value
    from customers
    left join orders 
      on customers.customer_id = orders.customer_id
    group by 1
)

select * from customer_orders
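
To keep testing and documentation consistent at this layer too, it is worth pairing each mart with a .yml file. Here is a minimal sketch for the model above; the file name, descriptions, and tests are illustrative, so expand them to match your own requirements.

YAML

# models/marts/fct_customer_orders.yml
version: 2

models:
  - name: fct_customer_orders
    description: "One row per customer, with order counts and lifetime value."
    columns:
      - name: customer_id
        description: "Primary key for the customer."
        tests:
          - unique
          - not_null
      - name: lifetime_value
        description: "Total amount of successful payments across all of the customer's orders."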

Conclusion: Structure is Freedom

By adopting a layered approach to your dbt projects in Snowflake, you move from a chaotic, hard-to-maintain process to a scalable, modular, and efficient analytics factory. This structure gives you:

  • Maintainability: When logic needs to change, you know exactly which model to edit.
  • Scalability: Onboarding new data sources or team members becomes a clear, repeatable process.
  • Trust: With testing at every layer, you build confidence in your data and empower the entire organization to make better, faster decisions.

This framework isn’t just about writing cleaner code—it’s about building a foundation for a mature and reliable data culture.
