
Events flow into Kinesis and are read by a Glue Streaming job using Spark Structured Streaming. The job parses JSON, applies light transforms, and lands rows in a Redshift staging table in micro-batches (e.g., every 1–5 minutes). A post-write step runs a MERGE to upsert into the analytics tables, keeping them current without duplicates. Invalid records are parked in an S3 dead-letter queue (DLQ); CloudWatch and Glue provide throughput, latency, and error metrics.
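The parse → stage → MERGE flow above can be sketched in plain Python. This is a minimal simulation of the batch semantics, not the Glue job itself: the table names, key column, and MERGE statement below are illustrative assumptions, and the in-memory dict stands in for the Redshift analytics table.

```python
import json

# Illustrative MERGE that the post-write step would run against Redshift
# after each micro-batch load (table/column names are assumptions).
MERGE_SQL = """
MERGE INTO analytics.events
USING analytics.events_staging s
ON analytics.events.event_id = s.event_id
WHEN MATCHED THEN UPDATE SET payload = s.payload
WHEN NOT MATCHED THEN INSERT (event_id, payload) VALUES (s.event_id, s.payload);
"""

def parse_record(raw, dlq):
    """Parse one JSON event; park invalid records in the DLQ (stands in for S3)."""
    try:
        event = json.loads(raw)
        return {"event_id": event["id"], "payload": event}
    except (ValueError, KeyError):
        dlq.append(raw)
        return None

def merge_batch(target, staged):
    """Upsert semantics of the MERGE: last write per key wins, no duplicates."""
    for row in staged:
        target[row["event_id"]] = row["payload"]

# One simulated micro-batch: two versions of event 1, plus one bad record.
dlq = []
target = {}
batch = [b'{"id": 1, "v": "a"}', b'not-json', b'{"id": 1, "v": "b"}']
staged = [r for r in (parse_record(raw, dlq) for raw in batch) if r]
merge_batch(target, staged)
# target now holds one row for event 1 (latest version); dlq holds the bad record.
```

The key property shown is idempotent keyed upserts: replaying or duplicating a micro-batch does not create duplicate rows, because the MERGE matches on the event key.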
- A Kinesis Data Stream and a simple event producer (Python).
- A Glue Streaming job (Spark Structured Streaming) that: