← Back

External SFTP → Lambda → S3 → Redshift COPY

Pull partner files from SFTP with Lambda, land to S3, and load Redshift via COPY – validated, archived, and scheduled.

sftp-lambda-s3-redshift-copy

Overview

A scheduled Lambda uses key-based SSH (Paramiko) to list and pull new files from the partner’s SFTP into an S3 raw bucket (e.g., `vendorA/2025/10/12/*.csv.gz`). It validates naming and a few sanity checks, routes failures to a reject/ prefix, and triggers a Redshift COPY from S3 into staging tables using the Redshift Data API (or a small Step Functions task). After a successful load, files move to history/ for audit. EventBridge handles cadence; Secrets Manager stores SFTP creds and DB auth.

Outcome

  • Automated vendor feed from SFTP into Redshift with minimal ops.
  • Safe & auditable loads (schema checks, error handling, archives).
  • Hands-off cadence via EventBridge or S3 triggers.

What you’ll build

  • Lambda function that connects to SFTP (key-based auth) and fetches new CSV/JSON.
  • S3 raw/ layout (date/vendor prefixes), optional gzip on upload.
  • Validation step (filename pattern, row count, basic schema) + reject path.
  • Redshift COPY from S3 (IAM role), plus simple post-COPY checks.
  • Archive of processed files (history/) and logs/metrics for observability.
  • Orchestration via EventBridge (schedule) or S3-put trigger.
  • (Optional) Secrets in AWS Secrets Manager; retries/backoff.