Tag: azure

  • Synapse to Fabric: Your ADX Migration Guide 2025

    Synapse to Fabric: Your ADX Migration Guide 2025

    The clock is ticking for Azure Synapse Data Explorer (ADX). With its retirement announced, a strategic Synapse to Fabric migration is now a critical task for data teams. This move to Microsoft Fabric’s Real-Time Analytics and its Eventhouse database unlocks a unified, AI-powered experience, and this guide will show you how.

    This guide will walk you through the entire process, from planning to execution, complete with practical examples and KQL code snippets to ensure a smooth transition.

    Why This is Happening: The Drive Behind the Synapse to Fabric Migration

    Microsoft’s vision is clear: a single, integrated platform for all data and analytics workloads. This Synapse to Fabric migration is a direct result of that vision. While powerful, Azure Synapse Analytics was built from a collection of distinct services. Microsoft Fabric breaks down these silos, offering a unified SaaS experience where data engineering, data science, and business intelligence coexist seamlessly.

[Image: A 'before and after' architecture diagram comparing the separate services of Azure Data Explorer with the integrated Microsoft Fabric Eventhouse solution for real-time analytics.]

    Eventhouse is the next evolution of the Kusto engine that powered ADX, now deeply integrated within the Fabric ecosystem. It’s built for high-performance querying on streaming, semi-structured data—making it the natural successor for your ADX workloads.

    Key Benefits of Migrating to Fabric Eventhouse:

    • OneLake Integration: Your data lives in OneLake, a single, tenant-wide data lake, eliminating data duplication and movement.
    • Unified Experience: Switch from data ingestion to query to Power BI reporting within a single UI.
    • Enhanced T-SQL Support: Query your Eventhouse data using both KQL and a more robust T-SQL surface area.
    • AI-Powered Future: Tap into the power of Copilot and other AI capabilities inherent to the Fabric platform.

    Phase 1: Assess and Plan Your Migration

    Before you move a single byte of data, you need a clear inventory of your current ADX environment.

[Image: A hand-drawn flowchart infographic detailing the three key steps for a Synapse to Fabric migration: Assess & Plan, Migrate Data, and Update Reports.]
    1. Document Your Clusters: List all your ADX clusters, databases, and tables.
    2. Analyze Ingestion Pipelines: Identify all data sources. Are you using Event Hubs, IoT Hubs, or custom scripts?
    3. Map Downstream Consumers: Who and what consumes this data? Document all Power BI reports, dashboards, Grafana instances, and applications that query ADX.
4. Export Your Schema: You’ll need the schema for every table and function. Use .show management commands in the ADX query editor to script your objects (see the example below).

Example: Scripting Your Database Schema

    Run this KQL management command in your Azure Data Explorer query window to script out every table and function in a database in one pass.

    .show database YourDatabaseName schema as csl script

    The output contains the creation commands for your tables and functions, including columns, data types, and folder/docstring properties. Save this script for each database. If you only need a single table, .show table YourTableName cslschema returns its column definitions, and .show function YourFunctionName does the same for an individual function.
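
    If you would rather capture the schema script programmatically, the snippet below is a minimal sketch using the azure-kusto-data Python package (not part of the original walkthrough); the cluster URI, database name, and output file name are placeholders.

    # Minimal sketch: save an ADX database's schema script to a local file.
    # Requires: pip install azure-kusto-data; cluster URI and database are placeholders.
    from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

    cluster_uri = "https://youradxcluster.westeurope.kusto.windows.net"  # placeholder
    database = "YourDatabaseName"                                        # placeholder

    # Reuse your existing Azure CLI login for authentication
    kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster_uri)
    client = KustoClient(kcsb)

    # Run the management command and write each returned script line to a file
    response = client.execute_mgmt(database, f".show database {database} schema as csl script")
    with open("adx_schema_script.kql", "w") as f:
        for row in response.primary_results[0]:
            f.write(str(row[0]) + "\n")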

    Phase 2: The Migration – Data and Schema

    With your plan in place, it’s time to create your new home in Fabric and move your data.

Step 1: Create an Eventhouse and KQL Database in Fabric

    1. Navigate to your Microsoft Fabric workspace.
    2. Select the Real-Time Analytics experience.
    3. Create a new Eventhouse. Fabric provisions a default KQL Database inside it; the Eventhouse is the container, and the KQL Database is where your tables, functions, and data will live.
    4. Optionally, create KQL Querysets, which are saved collections of queries you can run against your database.

    Step 2: Recreate Your Schema

Using the schema script you exported in Phase 1, run the .create table and .create function commands in your new Fabric KQL Database query window.
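
    For databases with many objects, you can replay the saved script programmatically instead of pasting it by hand. The sketch below is one way to do it with the same azure-kusto-data package, pointed at the Fabric KQL Database query URI copied from the database’s details page; the URI, database name, and file name are placeholders.

    # Minimal sketch: replay the saved schema script against the new Fabric KQL Database.
    # The query URI is a placeholder; copy the real one from the database details page.
    from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

    fabric_uri = "https://<your-query-uri>.kusto.fabric.microsoft.com"  # placeholder
    database = "YourFabricKqlDatabase"                                  # placeholder

    kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(fabric_uri)
    client = KustoClient(kcsb)

    # .execute database script runs a multi-command KQL script in a single call
    with open("adx_schema_script.kql") as f:
        script_text = f.read()

    client.execute_mgmt(database, ".execute database script <|\n" + script_text)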

    Step 3: Migrate Your Data

    For historical data, the most effective method is exporting from ADX to Parquet format in Azure Data Lake Storage (ADLS) Gen2 and then ingesting into Fabric.

    Example: One-Time Data Ingestion with a Fabric Pipeline

1. Export from ADX: Use the .export command in ADX to push your historical table data to a container in ADLS Gen2 (see the KQL snippet after this list).
    2. Ingest into Fabric: In your Fabric workspace, create a new Data Pipeline.
    3. Use the Copy data activity.
      • Source: Connect to your ADLS Gen2 account and point to the exported Parquet files.
      • Destination: Select “Workspace” and choose your KQL Database and target table.
    4. Run the pipeline. Fabric will handle the ingestion into your Eventhouse table with optimized performance.
// Run in ADX. Append a credential to the storage URI: ';impersonate' runs the export
    // with your own identity; a storage account key or SAS token also works.
    .export async to parquet (
        h@"abfss://your-container@your-storage-account.dfs.core.windows.net/path/to/export;impersonate"
    ) <|
    YourTableName

    Because the command runs asynchronously, it returns an operation ID that you can monitor with .show operations.

For ongoing data streams, re-point your Event Hubs or IoT Hub ingestion from your old ADX cluster to Fabric, either through an Eventstream or a direct data connection on your new KQL Database.

    Phase 3: Update Queries and Reports

    Most of your KQL queries will work in Fabric without modification. The primary task here is updating connection strings in your downstream tools.

    Connecting Power BI to Fabric Eventhouse:

    This is where the integration shines.

    1. Open Power BI Desktop.
    2. Click Get Data.
    3. Search for the KQL Database connector.
    4. Instead of a cluster URI, you’ll see a simple dialog to select your Fabric workspace and the specific KQL Database.
    5. Select DirectQuery for real-time analysis.

    Your existing Power BI data models and DAX measures should work seamlessly once the connection is updated.

    Example: Updating an Application Connection

    If you have an application using the ADX SDK, you will need to update the connection string.

• Old ADX Connection String: https://youradxcluster.<region>.kusto.windows.net
    • New Fabric KQL DB Connection String: a generated query URI ending in kusto.fabric.microsoft.com, for example https://<your-query-uri>.kusto.fabric.microsoft.com

    You can find the exact query URI in the Fabric portal on your KQL Database’s details page.
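
    As a concrete illustration, here is a minimal sketch of that change for an application built on the azure-kusto-data Python SDK; the URIs, database name, and table name are placeholders, and the only line that changes is the connection string.

    # Minimal sketch: swapping an ADX cluster URI for a Fabric KQL Database query URI.
    # All URIs and names below are placeholders.
    from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

    # Before: Azure Data Explorer cluster
    # kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
    #     "https://youradxcluster.westeurope.kusto.windows.net")

    # After: Fabric KQL Database query URI (copy it from the Fabric portal)
    kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
        "https://<your-query-uri>.kusto.fabric.microsoft.com")

    client = KustoClient(kcsb)
    results = client.execute("YourDatabaseName", "YourTableName | take 10")
    for row in results.primary_results[0]:
        print(row)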

    Embracing the Future

    Completing your Synapse to Fabric migration is more than a technical task—it’s a strategic step into the future of data analytics. By consolidating your workloads, you reduce complexity, unlock powerful new AI capabilities, and empower your team with a truly unified platform. Start planning today to ensure you’re ahead of the curve.

    Further Reading & Official Resources

    For those looking to dive deeper, here are the official Microsoft documents and resources to guide your migration and learning journey:

    1. Official Microsoft Documentation: Migrate to Real-Time Analytics in Fabric
    2. Microsoft Fabric Real-Time Analytics Overview
    3. Quickstart: Create a KQL Database
    4. Get data into a KQL database
    5. OneLake, the OneDrive for Data
    6. Microsoft Fabric Community Forum
  • How to Build a Data Lakehouse on Azure

    How to Build a Data Lakehouse on Azure

     For years, data teams have faced a difficult choice: the structured, high-performance world of the data warehouse, or the flexible, low-cost scalability of the data lake. But what if you could have the best of both worlds? Enter the Data Lakehouse, an architectural pattern that combines the reliability and performance of a warehouse with the openness and flexibility of a data lake. And when it comes to implementation, building a data lakehouse on Azure has become the go-to strategy for future-focused data teams.

    The traditional data lake, while great for storing vast amounts of raw data, often turned into a “data swamp”—unreliable and difficult to manage. The data warehouse, on the other hand, struggled with unstructured data and could become rigid and expensive. The Lakehouse architecture solves this dilemma.

    In this guide, we’ll walk you through the blueprint for building a powerful and modern data lakehouse on Azure, leveraging a trio of best-in-class services: Azure Data Lake Storage (ADLS) Gen2, Azure Databricks, and Power BI.

    The Azure Lakehouse Architecture: A Powerful Trio

    A successful Lakehouse implementation relies on a few core services working in perfect harmony. This architecture is designed to handle everything from raw data ingestion and large-scale ETL to interactive analytics and machine learning.

    Here’s the high-level architecture we will build:

    1. Azure Data Lake Storage (ADLS) Gen2: This is the foundation. ADLS Gen2 is a highly scalable and cost-effective cloud storage solution that combines the best of a file system with massive scale, making it the perfect storage layer for our Lakehouse.

    2. Azure Databricks: This is the unified analytics engine. Databricks provides a collaborative environment for data engineers and data scientists to run large-scale data processing (ETL/ELT) with Spark, build machine learning models, and manage the entire data lifecycle.

    3. Delta Lake: The transactional storage layer. Built on top of ADLS, Delta Lake is an open-source technology (natively integrated into Databricks) that brings ACID transactions, data reliability, and high performance to your data lake, effectively turning it into a Lakehouse.

    4. Power BI: The visualization and reporting layer. Power BI integrates seamlessly with Azure Databricks, allowing business users to run interactive queries and build insightful dashboards directly on the data in the Lakehouse.

    Let’s explore each component.

    Step 1: The Foundation – Azure Data Lake Storage (ADLS) Gen2

    Every great data platform starts with a solid storage foundation. For a Lakehouse on Azure, ADLS Gen2 is the undisputed choice. Unlike standard object storage, it includes a hierarchical namespace, which allows you to organize your data into directories and folders just like a traditional file system. This is critical for performance and organization in large-scale analytics.

    A best practice is to structure your data lake using a multi-layered approach, often called “medallion architecture”:

    • Bronze Layer (/bronze): Raw, untouched data ingested from various source systems.

    • Silver Layer (/silver): Cleaned, filtered, and standardized data. This is where data quality rules are applied.

    • Gold Layer (/gold): Highly aggregated, business-ready data that is optimized for analytics and reporting.
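
    To make the layering concrete, here is a short sketch of how these layer paths might be defined in a Databricks notebook using ADLS Gen2 abfss:// URIs; the storage account and container names are placeholders. The worked example later in this post uses a /mnt/ mount path instead, which is an alternative way of exposing the same storage.

    # Hypothetical storage account and container names; abfss:// is the URI scheme
    # Spark uses to address ADLS Gen2 paths directly.
    storage_account = "yourlakehouseadls"
    container = "lakehouse"
    base = f"abfss://{container}@{storage_account}.dfs.core.windows.net"

    bronze_path = f"{base}/bronze"  # raw, untouched source data
    silver_path = f"{base}/silver"  # cleaned, standardized data
    gold_path = f"{base}/gold"      # aggregated, business-ready data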

    Step 2: The Engine – Azure Databricks

    With our storage in place, we need a powerful engine to process the data. Azure Databricks is a first-class service on Azure that provides a managed, high-performance Apache Spark environment.

    Data engineers use Databricks notebooks to:

    • Ingest raw data from the Bronze layer.

    • Perform large-scale transformations, cleaning, and enrichment using Spark.

    • Write the processed data to the Silver and Gold layers.

    Here’s a simple PySpark code snippet you might run in a Databricks notebook to process raw CSV files into a cleaned-up table:

# Databricks notebook code snippet
    from pyspark.sql.functions import col, to_date

    # Define paths for our data layers
    bronze_path = "/mnt/datalake/bronze/raw_orders.csv"
    silver_path = "/mnt/datalake/silver/cleaned_orders"

    # Read raw data from the Bronze layer using Spark
    df_bronze = (
        spark.read.format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load(bronze_path)
    )

    # Perform basic transformations: rename columns, parse dates, cast amounts,
    # and drop rows with a missing amount
    df_silver = df_bronze.select(
        col("OrderID").alias("order_id"),
        col("CustomerID").alias("customer_id"),
        to_date(col("OrderDate"), "MM/dd/yyyy").alias("order_date"),
        col("Amount").cast("decimal(18, 2)").alias("order_amount"),
    ).where(col("Amount").isNotNull())

    # Write the cleaned data to the Silver layer in Delta format
    df_silver.write.format("delta").mode("overwrite").save(silver_path)

    print("Successfully processed raw orders into the Silver layer.")
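
    To carry the data the rest of the way, the sketch below, a continuation of the same notebook with placeholder database and table names that are not part of the original example, aggregates the Silver data into a Gold table and registers it in the metastore so reporting tools can query it by name.

    # Minimal sketch: aggregate the Silver data into a business-ready Gold table
    # and register it so Power BI can query it by name. Names are placeholders.
    from pyspark.sql.functions import col, sum as spark_sum

    gold_path = "/mnt/datalake/gold/daily_order_totals"

    df_silver = spark.read.format("delta").load(silver_path)

    df_gold = (
        df_silver.groupBy("order_date")
        .agg(spark_sum(col("order_amount")).alias("total_order_amount"))
    )

    # Save as a Delta table registered in a 'gold' database
    spark.sql("CREATE DATABASE IF NOT EXISTS gold")
    (
        df_gold.write.format("delta")
        .mode("overwrite")
        .option("path", gold_path)
        .saveAsTable("gold.daily_order_totals")
    )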

    Step 3: The Magic – Delta Lake

Notice the .format("delta") in the code above? That’s the secret sauce. Delta Lake is an open-source storage layer that runs on top of your existing data lake (ADLS) and brings warehouse-like capabilities.

    Key features Delta Lake provides:

    • ACID Transactions: Ensures that your data operations either complete fully or not at all, preventing data corruption.

    • Time Travel (Data Versioning): Allows you to query previous versions of your data, making it easy to audit changes or roll back errors.

    • Schema Enforcement & Evolution: Prevents bad data from corrupting your tables by enforcing a schema, while still allowing you to gracefully evolve it over time.

    • Performance Optimization: Features like data skipping and Z-ordering dramatically speed up queries.

    By writing our data in the Delta format, we’ve transformed our simple cloud storage into a reliable, high-performance Lakehouse.
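
    As a quick illustration of time travel, the sketch below, which assumes the silver_path from the earlier notebook, lists the table’s commit history and reads it back as of its first version using standard Delta Lake syntax.

    # Minimal sketch: Delta Lake time travel on the Silver table written earlier.
    # Assumes silver_path from the previous snippets.

    # List the table's commit history (version, timestamp, operation, ...)
    spark.sql(f"DESCRIBE HISTORY delta.`{silver_path}`").show(truncate=False)

    # Read the table exactly as it looked at version 0 (its first write)
    df_first_version = (
        spark.read.format("delta")
        .option("versionAsOf", 0)
        .load(silver_path)
    )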

    Step 4: The Payoff – Visualization with Power BI

    With our data cleaned and stored in the Gold layer of our Lakehouse, the final step is to make it accessible to business users. Power BI has a native, high-performance connector for Azure Databricks.

    You can connect Power BI directly to your Databricks cluster and query the Gold tables. This allows you to:

    • Build interactive dashboards and reports.

    • Leverage Power BI’s powerful analytics and visualization capabilities.

    • Ensure that everyone in the organization is making decisions based on the same, single source of truth from the Lakehouse.

    Conclusion: The Best of Both Worlds on Azure

    By combining the low-cost, scalable storage of Azure Data Lake Storage Gen2 with the powerful processing engine of Azure Databricks and the reliability of Delta Lake, you can build a truly modern data lakehouse on Azure. This architecture eliminates the need to choose between a data lake and a data warehouse, giving you the flexibility, performance, and reliability needed to support all of your data and analytics workloads in a single, unified platform.