External SFTP → Lambda → S3 → Redshift COPY
- Difficulty: Intermediate
- Tech stack: Lambda, SFTP (Paramiko), S3, Redshift (COPY), EventBridge, Secrets Manager
- Estimated time: 2 hrs
Overview
A scheduled Lambda uses key-based SSH (Paramiko) to list and pull new files from the partner's SFTP server into an S3 raw bucket (e.g., `vendorA/2025/10/12/*.csv.gz`). It validates filenames, runs a few sanity checks, and routes failures to a `reject/` prefix; valid files trigger a Redshift COPY from S3 into staging tables via the Redshift Data API (or a small Step Functions task). After a successful load, files move to `history/` for audit. EventBridge handles the cadence; Secrets Manager stores the SFTP credentials and database auth.
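The fetch-and-validate portion of this flow could look roughly like the sketch below. The naming convention (`vendorA_YYYYMMDD.csv.gz`), the `/outbound` drop folder, and the function names are assumptions for illustration, not part of any vendor contract:

```python
import re
from datetime import date

# Hypothetical naming convention for this sketch: vendorA_YYYYMMDD.csv.gz
NAME_PATTERN = re.compile(r"^vendorA_\d{8}\.csv\.gz$")

def is_valid_name(filename: str) -> bool:
    """Filename check used to route files to the raw vs. reject/ prefix."""
    return NAME_PATTERN.match(filename) is not None

def raw_key(vendor: str, received: date, filename: str) -> str:
    """Build the date/vendor S3 key, e.g. vendorA/2025/10/12/vendorA_20251012.csv.gz."""
    return f"{vendor}/{received:%Y/%m/%d}/{filename}"

def fetch_new_files(host: str, username: str, key_path: str,
                    bucket: str, vendor: str) -> list[str]:
    """Pull files from the SFTP drop folder into S3: valid names land under
    the date/vendor prefix, everything else under reject/."""
    # Imported here so the pure helpers above stay usable without the AWS SDK.
    import boto3
    import paramiko

    s3 = boto3.client("s3")
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=username, key_filename=key_path)  # key-based auth
    uploaded = []
    try:
        sftp = client.open_sftp()
        for name in sftp.listdir("/outbound"):  # hypothetical drop folder
            dest = (raw_key(vendor, date.today(), name)
                    if is_valid_name(name) else f"reject/{name}")
            with sftp.open(f"/outbound/{name}", "rb") as fh:
                s3.upload_fileobj(fh, bucket, dest)
            uploaded.append(dest)
    finally:
        client.close()
    return uploaded
```

In a real Lambda, the host, username, and private key would come from Secrets Manager rather than function arguments.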
Outcome
- Automated vendor feed from SFTP into Redshift with minimal ops.
- Safe & auditable loads (schema checks, error handling, archives).
- Hands-off cadence via EventBridge or S3 triggers.
What you’ll build
- Lambda function that connects to SFTP (key-based auth) and fetches new CSV/JSON.
- S3 raw/ layout (date/vendor prefixes), optional gzip on upload.
- Validation step (filename pattern, row count, basic schema) + reject path.
- Redshift COPY from S3 (IAM role), plus simple post-COPY checks.
- Archive of processed files (history/) and logs/metrics for observability.
- Orchestration via EventBridge (schedule) or S3-put trigger.
- (Optional) Secrets in AWS Secrets Manager; retries/backoff.
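The COPY step via the Redshift Data API can be sketched as follows. The staging table name, cluster identifier, and IAM role ARN are placeholders you would substitute for your own:

```python
def build_copy_sql(table: str, bucket: str, key: str, iam_role_arn: str) -> str:
    """COPY a gzipped CSV from S3 into a staging table using an attached IAM role."""
    return (
        f"COPY {table} "
        f"FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV GZIP IGNOREHEADER 1;"
    )

def run_copy(sql: str, cluster: str, database: str, secret_arn: str) -> str:
    """Submit the COPY through the Redshift Data API; returns the statement id,
    which a poller (or Step Functions task) can use for post-COPY checks."""
    # Imported here so build_copy_sql stays usable without the AWS SDK.
    import boto3

    client = boto3.client("redshift-data")
    resp = client.execute_statement(
        ClusterIdentifier=cluster,
        Database=database,
        SecretArn=secret_arn,  # DB credentials kept in Secrets Manager
        Sql=sql,
    )
    return resp["Id"]
```

Because `execute_statement` is asynchronous, a follow-up `describe_statement` call (or a Step Functions wait loop) is needed before moving the file to `history/`.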