Data Engineering / AI

E-Commerce Streaming Analytics Platform

Real-time Lambda architecture with AI-powered operations agent

80+ Dagster Assets · 7 Kafka Topics · 2–5s End-to-End Latency · 6 Interactive Dashboards · 11 AI Agent Tools · 10 A/B Experiments

The Problem

E-commerce teams need both historical completeness and real-time freshness. Existing approaches force a choice between batch accuracy and streaming speed. I built a Lambda architecture that delivers both, plus an AI agent that lets non-technical stakeholders query the system in natural language.

Architecture

Olist CSV ──▶ S3 Bronze ──▶ Dagster Batch Assets ──▶ BigQuery
    │                         (staging → dims → facts → marts)
    │                                                    │
    └── Kafka Producers ─▶ 7 Topics ─▶ Consumers ─▶ BigQuery Realtime
                                                  └─▶ MongoDB
                                                         │
    React Dashboard (6 pages) ◄── FastAPI ◄── LangGraph AI Agent
1. Source: Olist CSV (100K+ orders)
2. Ingestion: AWS S3 Bronze → Kafka (7 topics)
3. Orchestration: Dagster (80+ assets)
4. Warehouse: BigQuery (staging → dims → facts → marts)
5. Operational Store: MongoDB (events, real-time)
6. API Layer: FastAPI + LangGraph AI Agent
7. Frontend: React + TypeScript (6 dashboards)

Key Technical Decisions

Dagster over Airflow

Dagster's asset-oriented model lets dependencies emerge from the data itself rather than manual DAG wiring, giving clearer lineage and simpler local development.
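As a toy illustration of the asset-oriented idea (pure Python, not the Dagster API), the sketch below infers each asset's upstream dependencies from its function parameter names and materializes them recursively; asset names and values are made up for the example:

```python
# Toy sketch of asset-oriented orchestration: an asset's parameters name its
# upstream assets, so lineage is inferred instead of wired by hand.
import inspect

ASSETS = {}

def asset(fn):
    """Register fn as an asset keyed by its function name."""
    ASSETS[fn.__name__] = fn
    return fn

def materialize(name, cache=None):
    """Materialize an asset after recursively materializing its upstreams."""
    cache = {} if cache is None else cache
    if name not in cache:
        fn = ASSETS[name]
        deps = {p: materialize(p, cache) for p in inspect.signature(fn).parameters}
        cache[name] = fn(**deps)
    return cache[name]

@asset
def staging_orders():
    return [{"order_id": 1, "amount": 120.0}, {"order_id": 2, "amount": 80.0}]

@asset
def fact_orders(staging_orders):
    return staging_orders  # cleaning/typing would happen here

@asset
def mart_revenue(fact_orders):
    return sum(row["amount"] for row in fact_orders)
```

Calling `materialize("mart_revenue")` walks staging → fact → mart in dependency order, which is the lineage property the real asset graph provides for free.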

Lambda Architecture (batch + stream)

Batch path provides historical completeness (100K+ orders). Stream path delivers 2–5s freshness. Unified views merge both for consistent KPIs.
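The merge behind the unified views can be sketched as follows; in the real platform this is a BigQuery view, and the row shapes here are illustrative:

```python
# Minimal sketch of a Lambda "unified view": start from the batch table
# (historical completeness), then let fresher streaming rows win on key
# collisions (2-5s freshness).

def unified_orders(batch_rows, stream_rows):
    merged = {r["order_id"]: r for r in batch_rows}
    merged.update({r["order_id"]: r for r in stream_rows})  # stream wins
    return list(merged.values())

batch = [{"order_id": 1, "status": "delivered"},
         {"order_id": 2, "status": "shipped"}]
stream = [{"order_id": 2, "status": "delivered"},   # fresher than batch
          {"order_id": 3, "status": "created"}]     # not yet in batch
rows = unified_orders(batch, stream)
```

Downstream KPIs read only the merged result, so batch and stream consumers never disagree on a number.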

LangGraph for the AI Agent

StateGraph enables multi-step reasoning with tool orchestration. MemorySaver provides conversation persistence. Far more capable than a simple LLM wrapper.
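Schematically, the multi-step loop looks like the pure-Python sketch below (not the LangGraph API); the tool names and thresholds are hypothetical stand-ins:

```python
# Sketch of multi-step tool orchestration with persistent memory: each tool
# updates shared state and names the next step, and the memory dict survives
# across turns the way a checkpointer keeps conversation state.

def revenue_today(state):
    state["facts"]["revenue"] = 4200.0
    return "call:check_anomaly"   # multi-step: one tool chains into the next

def check_anomaly(state):
    state["facts"]["anomaly"] = state["facts"]["revenue"] < 5000.0
    return "done"

TOOLS = {"revenue_today": revenue_today, "check_anomaly": check_anomaly}

def run_agent(first_tool, memory):
    """Run the tool loop until a step returns "done"; memory persists."""
    state = {"facts": memory}
    nxt = f"call:{first_tool}"
    while nxt != "done":
        nxt = TOOLS[nxt.removeprefix("call:")](state)
    return state["facts"]
```

A simple LLM wrapper answers one prompt and forgets; the loop-plus-memory shape is what lets the agent chain metric lookups into anomaly checks across a conversation.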

BigQuery + MongoDB split

Analytical/operational separation. BigQuery for columnar analytics and SQL-based BI. MongoDB for event documents and real-time operational queries.
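The dual-write idea can be sketched as below; the sinks are plain lists standing in for the real MongoDB and BigQuery clients, and the field names are illustrative:

```python
# Sketch of the analytical/operational split: the raw event document goes to
# the operational store (MongoDB role), while a flat, typed row goes to the
# warehouse's realtime table (BigQuery role).

def route_event(event, mongo_sink, bq_sink):
    mongo_sink.append(event)          # full document, operational queries
    bq_sink.append({                  # flat row, columnar analytics
        "order_id": event["order_id"],
        "event_type": event["type"],
        "ts": event["ts"],
    })
```

Keeping the two shapes separate means neither store has to serve queries it is bad at.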

Results & Impact

  • Orchestrated 80+ Dagster assets across batch, streaming, and analysis layers with asset-level data quality checks blocking downstream materialization on failures
  • Built real-time event ingestion processing orders, payments, deliveries, and clickstream across 7 Kafka topics with 2–5s end-to-end latency
  • Designed statistical A/B testing framework evaluating 10 experiments with chi-square, t-tests, Bayesian analysis, power analysis, and multiple testing correction
  • Created AI operations agent with 11 tools, SSE streaming, conversation memory, and anomaly detection, enabling natural-language queries over the full data stack
  • Produced automated weekly BI reports (HTML/PDF, Excel workbook) with experiment recommendations (LAUNCH / ITERATE / CONTINUE) and projected revenue impact
  • Built 6 React dashboards: Real-Time Ops, A/B Testing Hub, Customer Journey, Business Performance, Data Quality, and AI Agent
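The chi-square piece of the A/B framework reduces to a short calculation. The platform uses scipy for this; the hand-rolled 2×2 version below shows the arithmetic, with made-up conversion counts:

```python
# Chi-square test of independence for a 2x2 conversion table:
# sum((observed - expected)^2 / expected) over all four cells.

def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    observed = [conv_a, n_a - conv_a, conv_b, n_b - conv_b]
    p_conv = (conv_a + conv_b) / (n_a + n_b)        # pooled conversion rate
    expected = [n_a * p_conv, n_a * (1 - p_conv),
                n_b * p_conv, n_b * (1 - p_conv)]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Example: variant A converts 200/2000 (10%), variant B 250/2000 (12.5%).
stat = chi_square_2x2(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
significant = stat > 3.841  # chi-square critical value, df=1, alpha=0.05
```

The power analysis and multiple-testing correction layers then decide whether a nominally significant result should actually drive a LAUNCH recommendation.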

Technologies Used

Orchestration

Dagster

Streaming

Apache Kafka

Analytics DB

Google BigQuery

NoSQL

MongoDB

API

FastAPI · Uvicorn

Frontend

React 19 · TypeScript · Tailwind CSS · Recharts

AI Agent

LangGraph · Claude API (Anthropic)

Statistics

scipy · statsmodels · pandas

Infrastructure

Docker Compose · GCP · AWS S3

Platform Demos

Dashboard & Analytics (5 Dashboards)

Real-time streaming analytics, A/B testing, cohort analysis, and more.

AI Operations Agent

LangGraph-powered agent analyzing metrics, detecting anomalies, and recommending actions.