Microsoft Fabric Lakehouse Migration Guide

Microsoft Fabric is a Microsoft SaaS analytics platform that combines data engineering, data integration, data warehousing, real-time intelligence, data science, and Power BI over OneLake. Recent Fabric documentation confirms lakehouse support for Delta Lake, Spark and SQL access, OneLake shortcuts, and Direct Lake semantic models; however, shortcut source support, workload availability, and regional availability must still be verified against current Microsoft Learn documentation before designing around a specific external storage target. For Malaysian enterprises running on-premises SQL Server data warehouses or Azure Synapse Dedicated Pools, the question is no longer "should we migrate?" but "how do we migrate without breaking existing BI reports and ETL pipelines?"

This guide provides a practical, incrementally-applicable migration plan — from inventory and assessment through parallel run and decommissioning — with real code examples and Malaysian enterprise context.

What Is a Lakehouse?

A lakehouse combines the flexibility of a data lake (cheap object storage, open formats like Parquet and Delta Lake) with the reliability of a data warehouse (ACID transactions, schema enforcement, BI integration). Microsoft Fabric implements this with OneLake as the centralized logical data lake for Fabric — designed to support shared analytics data access across Fabric experiences such as lakehouses, warehouses, Spark, SQL, Power BI semantic models, and notebooks.

Key architectural principles:

One copy, many engines. Data is stored once in Delta format in OneLake. Compute engines (Spark, SQL, Notebooks) read the same files. No costly data movement between systems.
Open formats. Delta Lake and Parquet are open-source, Apache-licensed formats. Your data is never locked into a proprietary binary format.
Shortcuts instead of copies. Instead of copying data between workspaces or clouds, Fabric shortcuts let you reference external data (ADLS Gen2, AWS S3, GCS) without duplicating it.

Why Migrate to Fabric Lakehouse in 2026?

1. Cost Reduction vs. Traditional SQL Data Warehouses

Traditional Azure Synapse Dedicated Pools are provisioned by data warehouse units (DWUs), while Fabric uses capacity-based billing through Fabric capacity SKUs. Exact monthly cost depends on region, capacity size, reservation/commitment model, workload concurrency, data volume, and whether the workload can pause or scale. Use the official Azure Synapse and Microsoft Fabric pricing pages or Azure Pricing Calculator before committing to a budget.

How to build a credible comparison for a Malaysian mid-size enterprise (for example, a 5 TB warehouse):

Cost Component	Traditional Synapse / SQL Warehouse	Fabric Lakehouse
Storage	Database/data-lake storage, backup retention, redundancy, and export copies	OneLake storage plus any shortcut cache/egress implications
Compute	Provisioned or reserved warehouse capacity, query concurrency, and ETL runtime	Fabric capacity SKU, workload concurrency, Spark/SQL usage, and scale/pause pattern
Integration	ADF/SSIS runtime, self-hosted integration runtimes, and network transfer	Fabric Data Factory/Dataflow/Notebook runtime and gateway/network requirements
BI	Power BI licensing, refresh mode, and semantic model design	Power BI licensing plus Direct Lake/Import/DirectQuery suitability
Operations	Monitoring, performance tuning, backup/restore, security administration	Capacity monitoring, workspace governance, lineage, security, and optimization jobs

Do not assume a fixed percentage saving. Fabric can reduce duplication and simplify operations, but the commercial outcome depends on workload shape, capacity sizing, reservation choices, data movement, and governance overhead.

2. Unified Analytics

A shared OneLake data foundation can serve multiple analytics workloads when the chosen Fabric item and access mode support the pattern:

SQL analytics via the T-SQL endpoint (Power BI, SSMS, Azure Data Studio)
Data engineering via Spark notebooks (Python, Scala, SQL)
Real-time analytics via Event Streams (Kafka-compatible ingestion)
AI/ML via Fabric notebooks with direct MLflow integration
Data science via integrated Synapse Data Science

No more copying data from the warehouse to a data lake for ML training, then to Power BI for dashboards.

3. AI and Copilot Integration

Fabric Copilot capabilities can assist with data engineering, data science, real-time intelligence, data warehousing, and Power BI experiences when they are available and enabled for the tenant. Treat Copilot as an accelerator, not a substitute for data engineering review, governance, and testing.

Migration Architecture

The target architecture for a Fabric lakehouse migration:

Source Systems                    Fabric Lakehouse                    Consumption
┌──────────────┐                ┌─────────────────────┐           ┌──────────────┐
│ On-prem SQL   │──Data Factory──▶│ OneLake (Delta)     │──SQL─────▶│ Power BI     │
│ Server        │                │ Tables/             │ endpoint  │              │
├──────────────┤                │ ├── sales           │           ├──────────────┤
│ Azure SQL DB  │──Notebook──────▶│ ├── customers       │──Spark───▶│ Notebooks    │
│               │                │ ├── inventory       │           │ (ML, ETL)    │
├──────────────┤                │ └── orders          │           ├──────────────┤
│ CSV/Parquet   │──OneLake───────▶│                     │──RealTime─▶│ Power BI     │
│ files (ADLS)  │  Shortcut      │ Shortcuts:          │           │ (Direct Lake)│
└──────────────┘                │ ├── archive (S3)    │           └──────────────┘
                                 │ └── raw-logs (ADLS) │
                                 └─────────────────────┘

Step-by-Step Migration

Step 1: Assess Your Current Warehouse

Before any migration, run a thorough inventory:

-- SQL Server: Inventory all tables, sizes, and row counts
SELECT
    s.name AS SchemaName,
    t.name AS TableName,
    SUM(p.rows) AS TotalRows,
    SUM(a.total_pages) * 8 / 1024 AS TotalSizeMB,
    MAX(i.type_desc) AS IndexType
FROM sys.schemas s
INNER JOIN sys.tables t ON s.schema_id = t.schema_id
INNER JOIN sys.indexes i ON t.object_id = i.object_id
INNER JOIN sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
WHERE i.index_id IN (0, 1)  -- Heap or clustered index
GROUP BY s.name, t.name
ORDER BY TotalSizeMB DESC;

Also document: - Query patterns: Most frequently run queries, slowest queries, peak concurrency - ETL dependencies: SSIS packages, ADF pipelines, stored procedure chains - BI connections: Which Power BI datasets, Excel files, and reporting tools connect to the warehouse - Data freshness requirements: Real-time? Daily? Weekly batch?

Malaysian enterprise tip: For BNM-regulated or PDPA-sensitive workloads, document lineage, data classification, retention, access approvals, and audit evidence before migration. Confirm sector-specific requirements with the organisation's compliance owner rather than assuming one generic regulatory control.

Step 2: Create Fabric Lakehouse and OneLake Storage

# PowerShell: Create Fabric workspace via API
$body = @{
    displayName = "analytics-lakehouse-prod"
    description = "Production lakehouse for migrated data warehouse"
} | ConvertTo-Json

Invoke-RestMethod -Uri "https://api.fabric.microsoft.com/v1/workspaces" \
    -Method POST \
    -Headers @{
        Authorization = "Bearer $accessToken"
        "Content-Type" = "application/json"
    } \
    -Body $body

In the Fabric portal: 1. Navigate to your workspace -> New item -> Lakehouse 2. Name it analytics-lakehouse 3. Choose a Fabric capacity/workspace region that satisfies the organisation's data residency requirements, and confirm current Fabric workload availability in that region before deployment 4. Create default folders: Tables/, Files/, and subfolders like Tables/sales, Tables/ref

Step 3: Migrate Data

Method A: Fabric Data Factory (Pipelines) — Best for scheduled, managed migrations

Create a pipeline with Copy Activity that reads from your source warehouse and writes to OneLake as Delta:

Source: SQL Server / Azure SQL DB via Linked Service
Sink: OneLake Lakehouse table (Delta format)
Staging: Enabled for large tables (uses temporary ADLS Gen2)

Method B: Fabric Notebook (Spark) — Best for complex transformations during migration

# Fabric notebook: Schema migration with type mapping
from pyspark.sql import SparkSession
from pyspark.sql.types import *

# Define schema mapping for SQL Server → Delta Lake types
type_mapping = {
    "int": IntegerType(),
    "bigint": LongType(),
    "varchar": StringType(),
    "nvarchar": StringType(),
    "datetime": TimestampType(),
    "decimal": lambda p, s: DecimalType(p, s),
    "float": DoubleType(),
    "bit": BooleanType(),
}

# Read from source SQL Server
source_table = "sales.orders"
df = spark.read \
    .format("sqlserver") \
    .option("url", "jdbc:sqlserver://source-server.database.windows.net:1433") \
    .option("databaseName", "WarehouseDB") \
    .option("table", source_table) \
    .option("user", dbutils.secrets.get("sql-user")) \
    .option("password", dbutils.secrets.get("sql-password")) \
    .load()

# Write to OneLake as Delta table
target_path = "Tables/sales/orders"
df.write \
    .format("delta") \
    .mode("overwrite") \
    .option("delta.autoOptimize.optimizeWrite", "true") \
    .save(target_path)

# Register table for SQL endpoint access
spark.sql(f"CREATE TABLE IF NOT EXISTS sales.orders USING DELTA LOCATION '{target_path}'")

Method C: OneLake Shortcut — Best for data that must stay in its source location

For data that cannot move (compliance/regulatory reasons), create a shortcut directly to the source:

In your lakehouse, click "New Shortcut"
Select source type: Azure Data Lake Storage Gen2, Amazon S3, or Google Cloud Storage
Point to the folder path
Fabric creates the shortcut — no data moves, queries read in-place

Step 4: Migrate ETL Pipelines

Source ETL Technology	Fabric Migration Target	Migration Effort
SSIS packages	Fabric Data Factory / Data Factory in Microsoft Fabric	Medium to High — reassess control flow, connectors, custom code, and scheduling
ADF pipelines	Fabric Data Factory / Data Factory in Microsoft Fabric	Low to Medium — concepts transfer, but validate connector parity, expressions, triggers, and integration runtime dependencies
SQL Stored Procedures (ETL)	Fabric SQL scripts or Spark notebooks	Medium to High — T-SQL mostly works but Spark SQL has differences
SSAS Multidimensional cubes	Power BI datasets (Direct Lake over OneLake)	High — different modeling paradigm
Azure Data Factory (v2)	Fabric Data Factory / Data Factory in Microsoft Fabric	Low to Medium — migrate patterns carefully and test feature compatibility

Step 5: Configure Security

Fabric lakehouse security uses a layered model:

OneLake data access roles — Define which users/groups can read/write which tables
SQL analytics endpoint permissions — Use supported SQL permissions for query access where applicable, and validate the exact permission model for lakehouse and warehouse items.
Row-level security (RLS) — Commonly enforced in Power BI semantic models with DAX roles; validate whether any SQL-layer security feature is supported for the specific Fabric item type before using it.

For Malaysian PDPA-sensitive data, avoid relying on illustrative SQL masking syntax unless the target engine explicitly supports it. Classify personal data, restrict access through workspace/item/security roles, apply sensitivity labels, minimise copied PII, and use masking, tokenisation, or encryption in the supported layer for the chosen architecture.

Step 6: Migrate BI Reports

Point Power BI datasets to the Fabric SQL analytics endpoint:

Old connection string:
  Server=myserver.database.windows.net;Database=WarehouseDB

New connection string:
  Server=<copy the SQL analytics endpoint server from the Fabric portal>;
  Database=<copy the endpoint/database name from the Fabric portal>

Prefer Direct Lake mode in Power BI (not Import, not DirectQuery). Direct Lake reads Delta files directly from OneLake without going through the SQL engine — it combines the speed of Import mode with the data freshness of DirectQuery.

Step 7: Validate with Parallel Run

Run both systems in parallel for at least two full business cycles:

-- Validation query: compare row counts between old and new
SELECT 'Old Warehouse' AS Source, COUNT(*) AS RowCount FROM linked.old_warehouse.sales.orders
UNION ALL
SELECT 'Fabric Lakehouse', COUNT(*) FROM sales.orders;

-- Compare aggregated values
SELECT
    o.Source,
    o.OrderYear,
    SUM(o.TotalAmount) AS TotalRevenue
FROM (
    SELECT 'Old' AS Source, YEAR(OrderDate) AS OrderYear, TotalAmount FROM linked.old_warehouse.sales.orders
    UNION ALL
    SELECT 'New' AS Source, YEAR(OrderDate) AS OrderYear, TotalAmount FROM sales.orders
) o
GROUP BY o.Source, o.OrderYear
HAVING COUNT(DISTINCT o.Source) > 1  -- Shows mismatches only
ORDER BY o.OrderYear;

Validation criteria for go-live: - Row counts match within 0.1% tolerance - Aggregated values match within 0.5% tolerance - Top 10 Power BI reports render correctly with identical numbers - ETL completion time is equal to or better than the old system

Performance Optimization for Fabric Lakehouse

Warehouse performance tuning skills (indexes, partition functions, statistics) don't transfer directly to lakehouse. Instead, learn these Delta Lake optimization techniques:

Z-Ordering (Multi-dimensional Clustering)

# Optimize by customer_id and order_date for fast point lookups
spark.sql("""
    OPTIMIZE sales.orders
    ZORDER BY (customer_id, order_date)
""")

Use Z-ordering on columns that appear frequently in WHERE clauses and JOIN predicates.

Clustering / table optimization patterns (verify current availability)

Fabric and Delta Lake optimization capabilities evolve. Before standardising on a specific clustering feature or SQL syntax, validate support in Microsoft Fabric documentation and test it in the target workspace. For continuously-ingested tables, compare available clustering, compaction, and partitioning options using real query patterns rather than assuming one feature is universally available.

Vacuum (Clean Up Old Files)

# Remove files older than 7 days (default retention)
spark.sql("VACUUM sales.orders RETAIN 168 HOURS")

Run VACUUM after every major data load to reclaim storage space from old Parquet files.

File Compaction

# Compact small files into optimal 256MB-1GB files
spark.sql("OPTIMIZE sales.orders")

This is especially important for streaming or micro-batch ingestion that produces many tiny Parquet files.

Migration Timeline (Realistic for a Malaysian Enterprise)

Phase	Duration	Activities
Assessment	Week 1-2	Inventory tables, map data types, document ETL dependencies
POC	Week 3-4	Migrate one schema (e.g., Sales), validate with Power BI
Core migration	Week 5-10	Migrate remaining schemas, rebuild ETL pipelines
Parallel run	Week 11-14	Both systems running, daily validation
Cutover	Week 15	Redirect Power BI, decommission old warehouse connections
Decommission	Week 16	Archive old warehouse, remove old Synapse/SQL Server resources

Common Pitfalls (and How to Avoid Them)

1. Schema Evolution Mismatch

Traditional data warehouses have rigid, static schemas. Lakehouses support schema evolution (adding columns, changing types), but unplanned changes break downstream reports. Fix: Use Delta Lake schema enforcement (mergeSchema option) carefully. Add new columns with NULL defaults to avoid breaking existing queries.

2. Performance Expectations

Everyone expects lakehouse SQL endpoints to match Synapse Dedicated Pool performance. For some well-tuned workloads with appropriate file layout, table optimization, and semantic model design, it can perform well. For unoptimized tables (many small files, poor clustering/partitioning, or unsuitable query patterns), it may not. Fix: Run OPTIMIZE and VACUUM as part of your ETL pipeline, not as a separate manual step.

3. Data Type Migration Issues

Some SQL Server data types don't have direct Delta Lake equivalents: - MONEY / SMALLMONEY → DECIMAL(19,4) - UNIQUEIDENTIFIER → STRING - HIERARCHYID → No direct equivalent (flatten before migration) - GEOGRAPHY → No direct equivalent (store as WKT string)

Fix: Review data type mappings in a spreadsheet before migration. Flag unsupported types for manual handling.

4. Security Model Reconstruction

Warehouse security (schemas, roles, row-level security) doesn't migrate automatically. Fix: Document the entire security model before starting. Re-implement in Fabric using OneLake data access roles + SQL endpoint permissions + Power BI RLS.

5. Underestimating ETL Rewrite Effort

SSIS packages with complex control flow (loops, precedence constraints, event handlers) require significant rework in Fabric Data Factory. Fix: Budget 50-100% more time for ETL migration than for data migration.

Key Takeaways

Fabric lakehouse is a viable migration target when validated workload-by-workload — Microsoft documents lakehouse, OneLake, shortcuts, SQL/Spark access, Direct Lake, and Fabric capacity capabilities, but performance, security design, and endpoint behaviour must be proven against your own data model and reports.
Start with a thorough assessment — you cannot migrate what you haven't documented. Inventory every table, ETL dependency, and BI report before touching anything in Fabric.
Migrate incrementally, one schema at a time — start with a non-critical schema (e.g., Sales reporting), validate thoroughly with Power BI and business users, then expand. This de-risks the entire project.
Learn Delta Lake optimization techniques — Z-ordering, liquid clustering, and VACUUM are your new performance toolkit. Traditional warehouse tuning (indexes, partitions) does not transfer. Allocate training time for your team.
Run parallel for at least two business cycles — validate row counts, aggregated values, and report rendering before cutting over. A day-one data discrepancy erodes trust quickly.
Budget 50% of total effort for ETL migration — data moves easily; pipeline logic is where the real work lives. SSIS → Fabric Data Factory rewrites require careful test planning.

Fabric lakehouse migration is a strategic project that touches data engineering, BI, and governance. I advise Malaysian enterprises on migration roadmaps, capacity planning, and team upskilling. Connect on LinkedIn.