Datalake Architecture Reconciliation
Date: 2026-04-22
Author: Ffinlo Wright (Revenue Sciences / Data Engineering)
Status: Proposal for discussion
Audience: Data engineering team, Gregg (datalake architect), technical stakeholders
Context
Plan.com now has two Dagster-based data platforms built independently:
- The Rev-Sci Dagster stack: on-prem Docker Swarm, Polars transforms, ClickHouse SQL layer, NFS-backed Parquet storage. Serves wholesale reporting, ML drift detection, and billing analytics. In production.
- The proposed datalake (Gregg's briefing, 2026-04-26): ADLS Gen2 storage, DuckDB transforms, 35 bronze sources, medallion architecture. Currently single-node Docker Swarm with no run launcher. Targets company-wide analytics.
Both share Dagster as orchestrator and Parquet as storage format. They diverge on storage location, transform engine, compute distribution, query access, and operational patterns. This dossier analyses both systems, identifies what to keep from each, and proposes two consolidation strategies with clear trade-offs.
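As a rough illustration of the shared-versus-divergent split described above, the sketch below encodes only the attributes that this page states for both platforms and separates the axes on which they agree from those on which they differ. The dictionary layout, the key names, and the `split_axes` helper are hypothetical and exist only for this example; neither platform defines them.

```python
# Attributes are taken from the two platform descriptions in the Context
# section. The key names and this dictionary layout are assumptions made
# for illustration only.
REV_SCI = {
    "orchestrator": "Dagster",
    "storage_format": "Parquet",
    "storage_location": "NFS (on-prem)",
    "transform_engine": "Polars",
}

DATALAKE = {
    "orchestrator": "Dagster",
    "storage_format": "Parquet",
    "storage_location": "ADLS Gen2",
    "transform_engine": "DuckDB",
}


def split_axes(a: dict, b: dict) -> tuple[list[str], list[str]]:
    """Return (shared, divergent) axis names for keys present in both dicts."""
    common = [k for k in a if k in b]
    shared = sorted(k for k in common if a[k] == b[k])
    divergent = sorted(k for k in common if a[k] != b[k])
    return shared, divergent


shared, divergent = split_axes(REV_SCI, DATALAKE)
print("shared:", shared)
print("divergent:", divergent)
```

Compute distribution, query access, and operational patterns are left out of the sketch because this page does not give a like-for-like value for both platforms on those axes; the Cross-Comparison document covers them.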
Documents
| Document | Contents |
|---|---|
| Executive Summary | Two strategies, pros/cons, recommendation |
| Cross-Comparison | Detailed side-by-side of both Dagster systems |
| ClickHouse Unification | How ClickHouse serves both stacks and end-user SQL access |
| DuckDB and Polars | Transform engines: similarities, divergences, when to use which |
| IO Managers | Why IO managers matter and how they simplify the merged stack |
| Gaps and Risks | Shortcomings in both systems and open questions |
| Gregg's Datalake Briefing (Summary) | Restructured summary of the original datalake proposal |
How to read this
Start with the Executive Summary for the two-strategy decision. Read Cross-Comparison for the full technical picture. The remaining documents each go deep on one topic; read them as needed for the meeting.
This dossier is an addendum to Gregg's datalake briefing (2026-04-26). It assumes the reader has access to that document. Where we reference it, we cite by section number (e.g. "Briefing s.8" = section 8, Operational watchdog).