RDBMS → Sqoop → HDFS/Hive (Batch Ingestion)
- Difficulty: Beginner
- Tech stack: Hadoop (HDFS), Hive, Sqoop, MySQL, shell
- Estimated time: 1-2 hrs
Overview
You’ll configure Sqoop to pull data from an OLTP database (MySQL) into HDFS: first a full load, then incremental loads driven by a watermark column such as a last-modified timestamp. Data is stored in columnar formats (Parquet/ORC) and surfaced through Hive tables (external/managed, partitioned). Jobs run via shell/cron for repeatable batch ingestion.
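A minimal sketch of the two imports, assuming a MySQL database `shopdb` with an `orders` table that has an `updated_at` timestamp column (connection strings, paths, and names here are illustrative, not part of the project spec):

```sh
# Full load: copy the whole orders table into HDFS as delimited text.
# (The Hive step later converts the landing data to a columnar format.)
# --password-file keeps credentials out of the process list.
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/shopdb \
  --username etl --password-file /user/etl/.mysql.pw \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

# Incremental load: saved as a Sqoop job so the metastore records the
# watermark (--last-value) after each run. lastmodified mode pulls rows whose
# updated_at exceeds the stored watermark; --merge-key folds updated rows
# into the existing files instead of duplicating them.
sqoop job --create orders_incr -- import \
  --connect jdbc:mysql://mysql-host:3306/shopdb \
  --username etl --password-file /user/etl/.mysql.pw \
  --table orders \
  --target-dir /data/raw/orders \
  --incremental lastmodified \
  --check-column updated_at \
  --merge-key order_id

# Run the saved job once by hand; cron takes over later.
sqoop job --exec orders_incr
```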
Outcome
- Ingest full + incremental tables from MySQL into HDFS with Sqoop.
- Expose data via Hive tables (Parquet/ORC, partitioned); see the DDL sketch after this list.
- Automate daily runs with shell and cron.
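A sketch of the Hive layer, assuming the landing directory and columns from the Sqoop example above (the schema is hypothetical; adjust types to your source table):

```sh
# Write the DDL to a file, then run it with a date variable.
# Quoted heredoc so the shell does not expand ${hivevar:dt}.
cat > orders_ddl.hql <<'SQL'
CREATE DATABASE IF NOT EXISTS shop;

-- External table: Hive tracks metadata only; dropping it
-- leaves the raw HDFS files in place.
CREATE EXTERNAL TABLE IF NOT EXISTS shop.orders_raw (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(10,2),
  updated_at  TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/raw/orders';

-- Managed, columnar table partitioned by snapshot date: each daily
-- run overwrites that day's partition with the current full snapshot.
CREATE TABLE IF NOT EXISTS shop.orders (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(10,2),
  updated_at  TIMESTAMP
)
PARTITIONED BY (ingest_date STRING)
STORED AS ORC;

INSERT OVERWRITE TABLE shop.orders PARTITION (ingest_date='${hivevar:dt}')
SELECT order_id, customer_id, amount, updated_at FROM shop.orders_raw;
SQL

hive --hivevar dt="$(date +%F)" -f orders_ddl.hql
```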
What you’ll build
- Docker/EMR lab with Hadoop, Hive, Sqoop, and MySQL.
- Sqoop jobs: a full import plus incremental imports keyed on a last-modified column.
- Hive DDL for external/managed tables and partitions.
- Shell/cron scripts to orchestrate daily runs (see the wrapper sketch after this list).
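A hypothetical wrapper that ties the pieces together, with the crontab entry for a daily 02:00 run shown as a trailing comment (all paths illustrative):

```sh
#!/usr/bin/env bash
# ingest_orders.sh: run the saved incremental Sqoop job, then refresh Hive.
set -euo pipefail

DT=$(date +%F)
LOG="/var/log/ingest/orders_${DT}.log"

{
  echo "[$(date)] starting incremental import"
  sqoop job --exec orders_incr                        # watermark comes from the Sqoop metastore
  hive --hivevar dt="$DT" -f /opt/etl/orders_ddl.hql  # rebuild today's ORC partition
  echo "[$(date)] done"
} >> "$LOG" 2>&1

# crontab -e entry: run the wrapper every day at 02:00
# 0 2 * * * /opt/etl/ingest_orders.sh
```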