Doing ACID on Spark
Sep 29, 2021
In this post I walk through doing CRUD with Delta Lake and Spark. I had heard a lot of good things about Delta Lake, so I tried it out and want to share my experience. Delta Lake also brings ACID transactions to Spark.
Here are the technologies used:
- Delta Lake: Delta tables as the storage format
- Spark: Processing and ETL
- Hive metastore: Creating tables and querying them
- Presto: Running distributed queries on Delta tables
- Airflow: Workflow management
- MinIO (S3): Object storage that holds the Delta Lake files
- Superset: Building dashboards with Presto and Hive on top of the Delta files
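To make the Spark, Delta Lake, and MinIO pieces of this stack concrete, here is a minimal PySpark sketch of how they could fit together. It is not the exact setup from the walkthrough: the endpoint, credentials, bucket path, and the helper names `delta_minio_conf`, `build_session`, and `crud_demo` are all hypothetical, and it assumes the Delta Lake and `hadoop-aws` jars are already on the Spark classpath.

```python
# Sketch only: endpoint, credentials, and table path below are assumed
# placeholder values, not taken from the original setup.
MINIO_ENDPOINT = "http://localhost:9000"
TABLE_PATH = "s3a://demo-bucket/delta/users"


def delta_minio_conf(endpoint, access_key, secret_key):
    """Return Spark settings that enable Delta Lake and point S3A at MinIO."""
    return {
        # Enable Delta Lake SQL support and the Delta catalog.
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        # S3A settings for an S3-compatible store such as MinIO.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }


def build_session(conf):
    """Create a SparkSession from a dict of settings (pyspark imported lazily)."""
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("delta-crud-demo")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def crud_demo(spark, path):
    """Run create, update, delete, and upsert on a Delta table, then read it."""
    from delta.tables import DeltaTable

    # Create: write an initial Delta table.
    users = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
    users.write.format("delta").mode("overwrite").save(path)

    table = DeltaTable.forPath(spark, path)

    # Update: values in `set` are SQL expression strings.
    table.update(condition="id = 1", set={"name": "'alicia'"})

    # Delete: remove rows matching a SQL predicate.
    table.delete("id = 2")

    # Upsert: merge new and changed rows into the table.
    changes = spark.createDataFrame([(1, "alice"), (3, "carol")], ["id", "name"])
    (
        table.alias("t")
        .merge(changes.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

    # Read: load the current state of the table.
    return spark.read.format("delta").load(path)
```

With a running MinIO instance, something like `crud_demo(build_session(delta_minio_conf(MINIO_ENDPOINT, "<access-key>", "<secret-key>")), TABLE_PATH).show()` would exercise the whole flow. Each write, update, delete, and merge is committed as a single entry in the Delta transaction log, which is where the ACID behavior comes from.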
More details and a walkthrough can be found here:
References: