In the previous pipeline, data was loaded into Amazon S3 using scheduled batch jobs. While this works well for many use cases, some systems need data to be available sooner than a batch schedule allows.
Instead of waiting for the next batch run, organizations often capture changes directly from the database as they happen and continuously move those changes into the data lake.
Here, you will build a Change Data Capture (CDC) pipeline using AWS Database Migration Service (DMS). As records are inserted, updated, or deleted in PostgreSQL, the changes are automatically captured and written to Amazon S3.
You will also simulate realistic healthcare events, such as new lab orders, lab results, medication updates, and billing charges, to see how CDC pipelines behave in real-world systems.
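To make the idea of captured changes concrete, here is a minimal sketch of how a downstream consumer might replay a stream of change records. It assumes each record is a CSV row whose first field is an operation flag (`I` for insert, `U` for update, `D` for delete), which is the general shape of the CDC files DMS writes to S3 by default. The column layout (`order_id`, `status`), the sample rows, and the `apply_cdc_rows` helper are hypothetical and exist only for illustration.

```python
import csv
import io


def apply_cdc_rows(csv_text, state=None):
    """Replay CDC rows (op, key, value) in order and return the resulting state.

    Assumed row layout (hypothetical): operation flag, order_id, status.
    """
    state = {} if state is None else state
    for op, key, value in csv.reader(io.StringIO(csv_text)):
        if op in ("I", "U"):
            # Inserts and updates both leave the latest value for the key.
            state[key] = value
        elif op == "D":
            # Deletes remove the key; ignore it if it was never seen.
            state.pop(key, None)
        else:
            raise ValueError(f"unknown operation flag: {op!r}")
    return state


# Hypothetical lab-order changes, in the order they were captured.
sample = (
    "I,1001,ordered\n"
    "U,1001,resulted\n"
    "I,1002,ordered\n"
    "D,1002,ordered\n"
)

lab_orders = apply_cdc_rows(sample)
print(lab_orders)
```

Order matters in this sketch: replaying the same rows in a different sequence can produce a different final state, which is one reason CDC consumers usually process changes in the order they were captured.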
After completing this pipeline, you will be able to: