
Datalake Architecture Reconciliation

Date: 2026-04-22
Author: Ffinlo Wright (Revenue Sciences / Data Engineering)
Status: Proposal for discussion
Audience: Data engineering team, Gregg (datalake architect), technical stakeholders


Context

Plan.com now has two Dagster-based data platforms built independently:

  1. The Rev-Sci Dagster stack — on-prem Docker Swarm, Polars transforms, ClickHouse SQL layer, NFS-backed Parquet storage. Serves wholesale reporting, ML drift detection, and billing analytics. In production.

  2. The proposed datalake (Gregg's briefing, 2026-04-26) — ADLS Gen2 storage, DuckDB transforms, 35 bronze sources, medallion architecture. Currently single-node Docker Swarm with no run launcher. Targets company-wide analytics.

Both share Dagster as orchestrator and Parquet as storage format. They diverge on storage location, transform engine, compute distribution, query access, and operational patterns. This dossier analyses both systems, identifies what to keep from each, and proposes two consolidation strategies with clear trade-offs.

Documents

| Document | Contents |
|---|---|
| Executive Summary | Two strategies, pros/cons, recommendation |
| Cross-Comparison | Detailed side-by-side of both Dagster systems |
| ClickHouse Unification | How ClickHouse serves both stacks and end-user SQL access |
| DuckDB and Polars | Transform engines: similarities, divergences, when to use which |
| IO Managers | Why IO managers matter and how they simplify the merged stack |
| Gaps and Risks | Shortcomings in both systems and open questions |
| Gregg's Datalake Briefing (Summary) | Restructured summary of the original datalake proposal |

How to read this

Start with the Executive Summary for the two-strategy decision. Read Cross-Comparison for the full technical picture. The remaining sections go deep on specific topics — read as needed for the meeting.

This dossier is an addendum to Gregg's datalake briefing (2026-04-26). It assumes the reader has access to that document. Where we reference it, we cite by section number (e.g. "Briefing s.8" = section 8, Operational watchdog).