
Real-time Analytics Pipeline

A streaming data pipeline processing 100K+ events/second with sub-second median latency for product analytics and anomaly detection.

Date: Jul 10, 2024 | Tags: Backend, Data, Python

Problem

The existing analytics stack was batch-oriented: aggregations ran hourly, so results arrived too late for operational decisions.

Architecture

  • Ingestion: HTTP collector (Go) into Kafka
  • Processing: Faust stream processors for aggregation and feature computation
  • Storage: TimescaleDB for time-series, Redis for real-time counters
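
The aggregation step in the processing layer can be illustrated with a small tumbling-window counter. This is a minimal standard-library sketch, not the project's Faust code: the `Event` type, the event names, and the 60-second window size are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str         # hypothetical event type, e.g. "page_view"
    timestamp: float  # event time in seconds since the epoch


def window_start(timestamp: float, size_seconds: int) -> int:
    """Return the start of the tumbling window that contains the timestamp."""
    return int(timestamp // size_seconds) * size_seconds


def count_by_window(events, size_seconds: int = 60) -> dict:
    """Count events per (name, window start) pair."""
    counts = defaultdict(int)
    for event in events:
        key = (event.name, window_start(event.timestamp, size_seconds))
        counts[key] += 1
    return dict(counts)


# Illustrative input: three page views and one click across two windows.
events = [
    Event("page_view", 0.5),
    Event("page_view", 59.9),
    Event("page_view", 60.0),
    Event("click", 61.2),
]
counts = count_by_window(events)
```

In a streaming setup the same keying would be applied per incoming event rather than over a list, with the per-window counts held in a store such as the Redis counters mentioned above.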

Performance

  • 120K events/second sustained
  • End-to-end latency: p50 = 400ms, p99 = 1.2s
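
For readers unfamiliar with the p50/p99 notation, the sketch below shows one common way to derive such figures from raw latency samples, using the nearest-rank method. How the project actually measured latency is not stated; the sample values and the `percentile` helper are hypothetical.

```python
import math


def percentile(samples, pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical end-to-end latencies in milliseconds (1 ms .. 100 ms).
latencies_ms = list(range(1, 101))
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
```

A p99 well above the p50, as in the figures reported here, indicates a tail of slow events even when the typical event is fast.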