Scaling Airflow sounds really appealing when we want to grow and run as many tasks as possible in parallel. Here I present an Airflow setup for scaling out, along with some Spark Airflow ETL. The source code for this article is here.

There are several ways to scale Airflow workers; one of the best is to use a Celery queue. Another is the Kubernetes Executor, but Argo is a better solution if Kubernetes is going to be used for workflow management.
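As a sketch of the Celery route, the executor and broker are configured in airflow.cfg; the broker and result-backend URLs below are assumptions for a local Redis and Postgres, so adjust them to your environment:

```ini
[core]
executor = CeleryExecutor

[celery]
; assumed local Redis broker and Postgres result backend
broker_url = redis://localhost:6379/0
result_backend = db+postgresql://airflow:airflow@localhost:5432/airflow
; task slots per worker process
worker_concurrency = 16
```

Extra capacity is then added by starting `airflow celery worker` on each new machine.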

Here I am going to explain scaling a Spark-on-Kubernetes task in Airflow:
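As a sketch of that task, Airflow's SparkKubernetesOperator (from the cncf-kubernetes provider) submits a SparkApplication manifest like the one below to the spark-on-k8s operator; the image, file path, and names are placeholders:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-etl                    # hypothetical job name
  namespace: default
spec:
  type: Python
  mode: cluster
  image: my-registry/spark-etl:latest          # placeholder image
  mainApplicationFile: local:///app/etl_job.py # placeholder path
  sparkVersion: "3.1.1"
  driver:
    cores: 1
    memory: 1g
    serviceAccount: spark
  executor:
    cores: 1
    instances: 2
    memory: 1g
```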


Here I am going to talk about doing CRUD with Delta Lake and Spark. I heard a lot of good things about Delta Lake, so I tried it out and am sharing my experience. It also supports ACID transactions on Spark.

Here are the technologies that are going to be used:

  • Delta Lake: Delta tables as the file format
  • Spark: processing and ETL
  • Hive metastore: creating tables and querying them
  • Presto: running distributed queries on Delta tables
  • Airflow: workflow management
  • MinIO (S3): storage and the Delta Lake file system
  • Superset: creating dashboards using Presto and Hive on Delta files
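As a minimal sketch of the CRUD side: Delta Lake supports SQL MERGE, UPDATE, and DELETE on Delta tables. The helper below (table and key names are hypothetical) only assembles a Delta-style upsert statement, which would then be executed with `spark.sql(...)` on a session that has the Delta extensions enabled:

```python
def build_merge_sql(target: str, source: str, key: str) -> str:
    """Build a Delta Lake MERGE statement that upserts `source` into `target`.

    `UPDATE SET *` / `INSERT *` are Delta SQL shorthands for copying all columns.
    """
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source} AS s "
        f"ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET * "
        f"WHEN NOT MATCHED THEN INSERT *"
    )

# Hypothetical tables: upsert staged user rows into the main Delta table.
sql = build_merge_sql("users_delta", "users_staging", "user_id")
```

Because the MERGE runs as a single Delta transaction, the update and insert either both commit or neither does.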

More details and a walking tour can be found here:



In this story, LOCUST is going to be used to load test Kafka. On each API call made by Locust, a random record stamped with the current time is produced into Kafka.
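As a sketch of the data side only (the function name and fields are my assumptions), each Locust call could build its random record like this before handing it to a Kafka producer:

```python
import json
import random
import time
import uuid

def make_random_event() -> bytes:
    """Build one random Kafka record stamped with the current time."""
    event = {
        "id": str(uuid.uuid4()),          # unique key per message
        "value": random.randint(0, 100),  # hypothetical random payload
        "ts": time.time(),                # current timestamp
    }
    return json.dumps(event).encode("utf-8")

# Inside a Locust task this payload would be sent with a Kafka client,
# e.g. kafka-python: producer.send("events", make_random_event())
```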


In this story, an ML pipeline service for training machine learning models is going to be illustrated.

This platform consists of these services:

  • RESTful web service using FastAPI
  • Celery for handling ML tasks
  • Flower for monitoring ML jobs
  • MySQL as the database for storing events and model results
  • RabbitMQ as the broker…
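As a sketch, these services could be wired together with a docker-compose file like the one below; the build context, module paths, and credentials are placeholders, not the project's actual configuration:

```yaml
version: "3.8"
services:
  api:
    build: .                      # the FastAPI app; placeholder build context
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000
    ports: ["8000:8000"]
    depends_on: [rabbitmq, mysql]
  worker:
    build: .
    command: celery -A app.worker worker --loglevel=info  # hypothetical module path
    depends_on: [rabbitmq, mysql]
  flower:
    image: mher/flower
    command: celery -A app.worker flower
    ports: ["5555:5555"]
    depends_on: [rabbitmq]
  mysql:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: example   # placeholder credential
      MYSQL_DATABASE: ml_results
  rabbitmq:
    image: rabbitmq:3-management
    ports: ["15672:15672"]
```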

In this article, the implementation of real-time Bitcoin prediction and monitoring is going to be explained.

The project for this story has been developed in this repository.

Getting Bitcoin Value

For this job I use one of the cool open APIs, and I am very thankful to them.

"https://api.coinbase.com/v2/prices/BTC-USD/spot"
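As a sketch (the helper names are mine, and the response shape below matches what this Coinbase endpoint returns at the time of writing, so verify it against their docs), the spot price can be fetched and parsed like this:

```python
import json
from urllib.request import urlopen

def parse_spot_price(body: str) -> float:
    """Extract the USD amount from a Coinbase spot-price response body."""
    return float(json.loads(body)["data"]["amount"])

def fetch_btc_usd() -> float:
    # live call to the endpoint quoted above (requires network access)
    with urlopen("https://api.coinbase.com/v2/prices/BTC-USD/spot") as resp:
        return parse_spot_price(resp.read().decode("utf-8"))

# Example response body in the shape this endpoint returns:
sample = '{"data": {"base": "BTC", "currency": "USD", "amount": "43210.55"}}'
```

`parse_spot_price(sample)` turns the quoted string amount into a float ready for the prediction pipeline.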

I want to ask…


In this story, we develop a live chatroom with a FastAPI WebSocket, build it with Docker, and deploy it on Heroku :))

This repository includes all the code and Docker files.

Chatroom class

This class makes sure that messages are delivered to the members of the specified chatroom.
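A minimal sketch of such a class (the names and structure are my assumptions, not the repository's code): it keeps a list of connections per room and delivers each message through `send_text`, which matches FastAPI's WebSocket interface:

```python
import asyncio
from typing import Dict, List

class ChatRoom:
    """Tracks members per room name and delivers messages to all of them."""

    def __init__(self) -> None:
        self.rooms: Dict[str, List] = {}  # room name -> connected sockets

    def join(self, room: str, websocket) -> None:
        self.rooms.setdefault(room, []).append(websocket)

    def leave(self, room: str, websocket) -> None:
        self.rooms.get(room, []).remove(websocket)

    async def broadcast(self, room: str, message: str) -> None:
        # deliver the message to every member of the specified chatroom
        for ws in self.rooms.get(room, []):
            await ws.send_text(message)
```

In a FastAPI endpoint, each accepted `WebSocket` would be passed to `join`, and every incoming message forwarded through `broadcast`.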


In the first part, the machine learning side is going to be described. The notebook for this fraud detection project can be found here.

This fraud detection process has been implemented with two algorithms, Random Forest and a CNN, with an accuracy of 99% on test data.

Because of the greater simplicity of Random…


This story covers the Kubernetes deployment of the nameko microservice explained HERE. As mentioned in part 1, each microservice component has its own Docker image; therefore, this story explains the process of deploying them.

Docker registry

First, a Docker registry will be brought up with this compose file; then we should push the Docker images to it with these commands:
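As a sketch (the port and volume name are assumptions), a local registry compose file looks like this:

```yaml
version: "3"
services:
  registry:
    image: registry:2
    ports:
      - "5000:5000"
    volumes:
      - registry-data:/var/lib/registry   # persist pushed images
volumes:
  registry-data:
```

Each service image is then tagged for the registry and pushed, e.g. `docker tag my-service localhost:5000/my-service` followed by `docker push localhost:5000/my-service` (image names are placeholders).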


In this story, the whole process of building a microservice app with nameko that exposes both Flask and FastAPI web services is going to be explained.

This project is developed in Docker containers and is going to be deployed with Kubernetes. The repo for this project is here.

Services

There are…

Alireza Moosavi

doing some data engineering
