E-Commerce Streaming Analytics Platform
Real-time Lambda architecture with AI-powered operations agent
The Problem
E-commerce teams need both historical completeness and real-time freshness. Existing approaches force a choice between batch accuracy and streaming speed. I built a Lambda architecture that delivers both, plus an AI agent that lets non-technical stakeholders query the system in natural language.
Architecture
Olist CSV ──▶ S3 Bronze ──▶ Dagster Batch Assets ──▶ BigQuery
│ (staging → dims → facts → marts)
│ │
└── Kafka Producers ─▶ 7 Topics ─▶ Consumers ─▶ BigQuery Realtime
└─▶ MongoDB
│
React Dashboard (6 pages) ◄── FastAPI ◄── LangGraph AI AgentKey Technical Decisions
Dagster over Airflow
Asset-oriented model makes dependencies emerge from data, not manual DAG wiring. Clearer lineage and simpler local development.
Lambda Architecture (batch + stream)
Batch path provides historical completeness (100K+ orders). Stream path delivers 2–5s freshness. Unified views merge both for consistent KPIs.
LangGraph for the AI Agent
StateGraph enables multi-step reasoning with tool orchestration. MemorySaver provides conversation persistence. Far more capable than a simple LLM wrapper.
BigQuery + MongoDB split
Analytical/operational separation. BigQuery for columnar analytics and SQL-based BI. MongoDB for event documents and real-time operational queries.
Results & Impact
- ▸Orchestrated 80+ Dagster assets across batch, streaming, and analysis layers with asset-level data quality checks blocking downstream materialization on failures
- ▸Built real-time event ingestion processing orders, payments, deliveries, and clickstream across 7 Kafka topics with 2–5s end-to-end latency
- ▸Designed statistical A/B testing framework evaluating 10 experiments with chi-square, t-tests, Bayesian analysis, power analysis, and multiple testing correction
- ▸Created AI operations agent with 11 tools, SSE streaming, conversation memory, and anomaly detection - enabling natural-language queries over the full data stack
- ▸Produced automated weekly BI reports (HTML/PDF, Excel workbook) with experiment recommendations (LAUNCH / ITERATE / CONTINUE) and projected revenue impact
- ▸Built 6 React dashboards: Real-Time Ops, A/B Testing Hub, Customer Journey, Business Performance, Data Quality, and AI Agent
Technologies Used
Orchestration
Streaming
Analytics DB
NoSQL
API
Frontend
AI Agent
Statistics
Infrastructure
Platform Demos
Dashboard & Analytics (5 Dashboards)
Real-time streaming analytics, A/B testing, cohort analysis, and more.
AI Operations Agent
LangGraph-powered agent analyzing metrics, detecting anomalies, and recommending actions.