Enterprise Customer Analytics Platform
Customer lifecycle analytics with operational dashboards
The Problem
Customer data is messy: duplicate records, mixed currencies, inconsistent timezones. I built a platform that deduplicates customers using fuzzy matching, normalizes financial data across currencies, and surfaces CLV, retention, and cohort insights through self-service dashboards.
Architecture
Raw Data ──▶ Pydantic Validation ──▶ Transformation Modules
                   ┌───────────────┼───────────────┐
                   │               │               │
             Dedup (Fuzz)      Currency        Timezone
                   │               │               │
                   └───────────────┴───────────────┘
                                   │
                 PostgreSQL Star Schema (6 dims, 3 facts)
                                   │
                 Airflow Orchestration + GX Validation
                                   │
                 Metabase Dashboards (3) + Looker Studio
Key Technical Decisions
PostgreSQL over Cloud DW
Right-sized for the data volume. Docker-deployable for portability. Full SQL power without cloud costs for this scale.
RapidFuzz for Deduplication
Fuzzy matching handles real-world customer name variations that exact matching misses. Configurable thresholds balance precision and recall.
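The pairing logic can be sketched as follows. This is a minimal stand-in using stdlib `difflib` so it runs anywhere; in the actual pipeline, RapidFuzz's `fuzz.token_sort_ratio` is the (much faster) drop-in scorer. The function names and the threshold value are illustrative, not the platform's real API.

```python
import difflib
import re

def token_sort_score(a: str, b: str) -> float:
    """Lowercase, strip punctuation, sort tokens, score 0-100.
    Mirrors the idea behind rapidfuzz's fuzz.token_sort_ratio."""
    norm = lambda s: " ".join(sorted(re.findall(r"[a-z0-9]+", s.lower())))
    return 100.0 * difflib.SequenceMatcher(None, norm(a), norm(b)).ratio()

def find_duplicates(names, threshold=90.0):
    """Return (i, j, score) for every pair at or above the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = token_sort_score(names[i], names[j])
            if score >= threshold:
                pairs.append((i, j, round(score, 1)))
    return pairs

customers = ["Acme Corp.", "ACME Corporation", "Acme Corp", "Globex Inc"]
print(find_duplicates(customers, threshold=90))  # "Acme Corp." == "Acme Corp"
```

Lowering the threshold would also merge "ACME Corporation" into the Acme cluster at the cost of more false positives, which is the precision/recall trade-off the configurable threshold controls.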
Metabase over Tableau
Open-source, Docker-deployable, low barrier for stakeholder self-service. Embeddable for product analytics use cases.
Pydantic + Great Expectations
Pydantic for row-level schema validation at ingestion. Great Expectations for dataset-level quality assertions. Complementary layers.
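The row-level layer might look like the sketch below: each raw record must pass a Pydantic schema before it enters the warehouse, and failures are quarantined rather than loaded, while dataset-level assertions (null rates, row counts, referential checks) stay in Great Expectations. The `CustomerRow` fields and `validate_rows` helper are hypothetical names for illustration.

```python
from datetime import datetime
from pydantic import BaseModel, ValidationError

class CustomerRow(BaseModel):  # field names are illustrative
    customer_id: int
    email: str
    signup_ts: datetime
    currency: str

def validate_rows(rows):
    """Split raw dicts into validated models and quarantined rejects."""
    valid, rejects = [], []
    for row in rows:
        try:
            valid.append(CustomerRow(**row))
        except ValidationError as exc:
            rejects.append((row, str(exc)))
    return valid, rejects

good = {"customer_id": 1, "email": "a@b.com",
        "signup_ts": "2024-01-05T10:00:00", "currency": "EUR"}
bad = {"customer_id": "not-an-int", "email": "a@b.com",
       "signup_ts": "2024-01-05T10:00:00", "currency": "EUR"}
valid, rejects = validate_rows([good, bad])
print(len(valid), len(rejects))  # one clean row, one quarantined
```

Keeping the two layers separate means a single malformed row never fails the whole batch, while Great Expectations still catches distribution-level drift that no single row can reveal.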
Results & Impact
- Designed star schema with 6 dimension tables and 3 fact tables supporting customer behavior and revenue analysis
- Implemented SCD Type 2 for the customer dimension, enabling historical journey analysis and cohort tracking across 4,268 records
- Built 4 transformation modules: customer deduplication (RapidFuzz fuzzy matching), multi-currency conversion, timezone normalization, and data quality validation
- Created aggregated data marts producing daily revenue summaries, customer lifetime value (CLV), cohort retention curves, and product performance KPIs
- Deployed 3 Metabase dashboards with 100% availability using Docker Compose, with structured logging for pipeline health monitoring
- Developed data quality framework with Great Expectations + Pydantic producing schema documentation and test coverage reports
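The SCD Type 2 mechanics mentioned above can be sketched in a few lines: when a tracked attribute changes, the current dimension row is end-dated and a new version is inserted, so no history is overwritten. This is a simplified in-memory illustration (the real logic runs as SQL against the PostgreSQL dimension table); `scd2_apply` and the tracked attributes are assumed names.

```python
from datetime import date

def scd2_apply(dim_rows, incoming, today, tracked=("segment", "country")):
    """dim_rows: dicts with valid_from/valid_to/is_current columns.
    incoming: {customer_id: {attr: value}} latest source snapshot."""
    for cust_id, attrs in incoming.items():
        current = next((r for r in dim_rows
                        if r["customer_id"] == cust_id and r["is_current"]),
                       None)
        if current and all(current[k] == attrs[k] for k in tracked):
            continue                      # no tracked change: keep as-is
        if current:                       # close out the old version
            current["valid_to"] = today
            current["is_current"] = False
        dim_rows.append({"customer_id": cust_id, **attrs,
                         "valid_from": today, "valid_to": None,
                         "is_current": True})
    return dim_rows

dim = [{"customer_id": 1, "segment": "SMB", "country": "DE",
        "valid_from": date(2023, 1, 1), "valid_to": None,
        "is_current": True}]
scd2_apply(dim, {1: {"segment": "Enterprise", "country": "DE"}},
           today=date(2024, 6, 1))
print(len(dim))  # two versions of customer 1 now exist
```

Cohort queries then join facts to the dimension on `valid_from <= event_date < valid_to`, which is what makes historical journey analysis possible.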