This story illustrates an ML pipeline service for training machine learning models.
This platform consists of these services:
- RESTful web service using FastAPI
- Celery for handling ML tasks
- Flower for monitoring ML jobs
- MySQL as the database for storing events and model results
- RabbitMQ as the broker for Celery
- MinIO for storing datasets and joblib-trained models
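To make the RabbitMQ and Celery wiring a bit more concrete, here is a minimal sketch of how a broker URL could be assembled from environment-style settings. The variable names (`RABBITMQ_USER`, `RABBITMQ_PASSWORD`, `RABBITMQ_HOST`, `RABBITMQ_PORT`), the `guest` credentials, and the `rabbitmq` host name are assumptions for illustration, not values taken from the project.

```python
import os
from urllib.parse import quote


def broker_url(env=None):
    """Build an AMQP broker URL for Celery from environment-style settings."""
    env = os.environ if env is None else env
    user = env.get("RABBITMQ_USER", "guest")          # assumed variable name
    password = env.get("RABBITMQ_PASSWORD", "guest")  # assumed variable name
    host = env.get("RABBITMQ_HOST", "rabbitmq")       # assumed compose service name
    port = env.get("RABBITMQ_PORT", "5672")           # RabbitMQ's default AMQP port
    # Percent-encode credentials so characters like "@" or "/" stay valid in the URL.
    user, password = quote(user, safe=""), quote(password, safe="")
    # The trailing "//" selects the default virtual host "/".
    return f"amqp://{user}:{password}@{host}:{port}//"


# Celery reads lowercase setting names such as broker_url from its configuration.
CELERY_SETTINGS = {"broker_url": broker_url({})}
```

A dictionary like `CELERY_SETTINGS` could then be handed to the Celery app's configuration; how the actual project does this may differ.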
Let’s build it:
The web service exposes its API docs at http://localhost:8080/docs#/:
To load a dataset (CSV) for modeling, select the file and send it to the platform.
After execution, the data_id is returned:
The data record is saved in MySQL and the data object is saved in MinIO:
To start training, first get the available models from:
Then, to start the training, we need to post data like:
dataset_id comes from the data loading step, and the class column and feature column names also need to be set (comma separated!). Then the result is:
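The upload step above can also be done from code instead of the docs page. Below is a rough sketch that builds a multipart file upload using only the standard library. The endpoint path `/datasets` and the form field name `file` are hypothetical; check the API docs page for the real ones.

```python
import urllib.request
import uuid


def build_csv_upload(url, csv_bytes, filename="data.csv", field="file"):
    """Return a POST request that uploads csv_bytes as a multipart form file."""
    boundary = uuid.uuid4().hex
    disposition = (
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
    )
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        disposition.encode(),
        b"Content-Type: text/csv\r\n\r\n",
        csv_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Hypothetical usage (the path is an assumption, not the project's real route):
# req = build_csv_upload("http://localhost:8080/datasets", open("iris.csv", "rb").read())
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())  # expected to contain the data_id
```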
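Since the feature columns are passed as one comma-separated string, it is easy to get the payload wrong by hand. Here is a small sketch of a helper that assembles it. Only `dataset_id` is named in this story; the other field names (`model`, `class_column`, `feature_columns`) are assumptions, so adjust them to whatever the API docs show.

```python
import json


def build_training_payload(dataset_id, model_name, class_column, feature_columns):
    """Assemble a training request body with comma-separated feature columns."""
    if not feature_columns:
        raise ValueError("at least one feature column is required")
    for name in [class_column, *feature_columns]:
        # A comma inside a column name would break the comma-separated format.
        if not name or "," in name:
            raise ValueError(f"invalid column name: {name!r}")
    return {
        "dataset_id": dataset_id,                        # from the data loading step
        "model": model_name,                             # assumed field name
        "class_column": class_column,                    # assumed field name
        "feature_columns": ",".join(feature_columns),    # assumed field name
    }


# Hypothetical usage with made-up model and column names:
# body = json.dumps(build_training_payload("1", "some_model", "label", ["f1", "f2"]))
```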
The response contains only an id; the results themselves are not returned here and are fetched later using this id!
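Because training runs as a background Celery task, a client would typically poll with this id until the task finishes. The loop below is a minimal sketch of that idea. The `fetch_status` callable and the `{"status": ...}` response shape are hypothetical stand-ins for the real results endpoint; `SUCCESS` and `FAILURE` are standard Celery task states.

```python
import time


def wait_for_result(fetch_status, task_id, interval=1.0, max_attempts=30,
                    sleep=time.sleep):
    """Poll fetch_status(task_id) until the task reaches a final state."""
    for _ in range(max_attempts):
        response = fetch_status(task_id)
        # Assumed response shape: a dict with a Celery-style "status" field.
        if response.get("status") in ("SUCCESS", "FAILURE"):
            return response
        sleep(interval)  # task is still pending or running, wait and retry
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} attempts")
```

Passing `fetch_status` in as a function keeps the loop independent of any HTTP client, so it can wrap whichever call the results endpoint actually needs.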
To get the results of training:
And the result:
To download the trained model:
The response is the joblib model, ready to be downloaded!
The model training records are saved in MySQL and MinIO:
With RabbitMQ specified as the broker, Flower monitors the ML jobs:
Using rabbitmq-management as the message broker:
The repository for this project can be found HERE, and it contains an example file to get your hands dirty with.
How to run?
docker-compose up -d mysql rabbitmq minio
Then after they are up…
docker-compose up -d celery apis flower
To stop and remove all:
docker-compose down -v
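The "after they are up" step in the run sequence above can be checked from a script rather than by eye. This is a small sketch that waits until a TCP port accepts connections. The ports in the usage comment are the default ports of MySQL, RabbitMQ, and MinIO, and it is an assumption that the compose file publishes them on localhost. An open port also does not guarantee the service has finished initializing, so treat this as a rough check.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Hypothetical usage before starting celery, apis, and flower:
# for name, port in [("mysql", 3306), ("rabbitmq", 5672), ("minio", 9000)]:
#     print(name, "up" if wait_for_port("localhost", port) else "not reachable")
```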
A simple version of the pipeline is deployed on Heroku:
Built with: FastAPI, Celery, SQLite on Docker