Skip to content

Latest commit

 

History

History
241 lines (165 loc) · 14.5 KB

README.md

File metadata and controls

241 lines (165 loc) · 14.5 KB

🚀 Pipeline for Asynchronous Task Scheduling

Pipeline for Asynchronous Task Scheduling in Python via Flask-RESTPlus (API) + Redis + Celery + MongoDB

GitHub license

📖 Table of contents




💡 TL;DR: What is this at all?

Together with advancements in computing capacity and increasing trends in working on large-scale datasets, in some projects, we might need to execute time-consuming and/or heavy functions upon the plenty of requests coming from the client side.

So, we shall run them in the background while still getting new requests from the client and finally, we should get back the result later once the process is done.

Hence, here, asynchronous (async) task scheduling via microservices provides a good offer to handle our issue!

🧪 What are async tasks and microservices?

Microservices

During designing a software pipeline, Microservices Architecture, where the applications, modules/functions work independently, has shown up for the last decade.

Monolithic vs Microservice

Image source: https://dev.to/alex_barashkov/microservices-vs-monolith-architecture-4l1m.

In a microservices architecture, in contrast to monolithic architecture, the functional parts of your whole work are split into smaller units. Those smaller units are dedicated to working on specific tasks and they do their job well!

While working they do not know what the others perform however they have a common port/way of communication where they are able to talk to each other.

For more about the microservices, you might check this blog post: https://medium.com/sciant/microservices-for-dummies-e55e428a5eef.


Here is one real-world examples of microservice architecture:

In this F1 pit spot, the fastest one which only takes few seconds, every single person works in a single task and does the best for that task!

Real-world example of async programming


Asynchronous (async) Programming

Let's get started with the synchronous one. In this case, the tasks, or the jobs are performed once at a time. So, to be able to work on the 2nd (following) task, first of all, the 1st (previous) task has to be completed. So, you have to wait for a task to complete to go for the next one.

By contrast, in async programming, the pipeline can go with any process without waiting for the completion of the previous task if they are independent of each other. Hence, you might catch several requests at a time simultaneously.

Sync. vs. Async

Image resource: https://www.baeldung.com/cs/async-vs-multi-threading.

💭 Use cases (aka. Real-life examples)

Maybe you might be asking why we need this or what are the real-life examples? So, currently, several processes are going through in async way even we do not realize.

Here are some real-life examples:

If you have more (and/or clearer) examples, please just reach out to me.


Use case #1 | Ordering a burger | Literally real-life 😊


Assume that you are starving and looking forward to eating a burger!

In this scenario, every customer is getting reference number after their ordering. While the order is on its own way, the customers can just relax and control the dashboard for the status of their orders while the crew are working on other orders.

Please, check this awesome blog post here to see the rest and to get the issue in the easiest way! (Even worthy to look at the cartoons!)


Use case #2 | Basic example


Think that we as a team or company are providing a form in our web page to our users or customers and they can fill the form. Supposing that we need a sort of computation which takes a lot of time once the form is filled.


Now, in this scenario, our user will fill and submit the form, we will return that "OK, we got your request and will let you know". Meanwhile, we will perform needed calculation in background and once it's done, we will send an email containing the result of calculation to our user. Yey!

For this real-life example, you can find another repository and blog with its implementation here.


Use case #3 | Bioinformatics/Computational Biology

Almost same protocol also applies here. Consider that our user is willing to predict the 3D structure of a protein with given sequence or post long-time genome alignment work.

So, while you might get other request from same person and/or other users, the tasks will undergo the pipeline in the background and will turn back once they are done, if possible 😊.




🎨 Architecture of the pipeline

Here is the schematic view of my pipeline: pipeline

Technologies used:

Name Description Link
Flask Python micro web-framework https://flask.palletsprojects.com/en/1.1.x/
Flask-RESTPlus Extension for Flask that adds support for quickly building REST APIs https://flask-restplus.readthedocs.io/en/stable/
Celery Asynchronous task queue or job queue package for Python https://docs.celeryproject.org/en/stable/
Flower Tool for monitoring and administrating Celery clusters https://flower.readthedocs.io/en/latest/
Redis In-memory data structure store, a distributed, key–value database https://redis.io/
MongoDB Document-oriented NoSQL database program https://www.mongodb.com/python

If you wonder the comparison of several brokers, please visit this https://docs.celeryproject.org/en/stable/getting-started/brokers/index.html page by Celery offical docs.

For use of Flask-RESTPlus extension (using Swagger UI), you might visit my previous repo and post: https://github.com/furkanmtorun/Flask-RESTPlusAPI.

Reminder: Do not forget to change MONGO_CLIENT_URL for your setup. For MongoDB, you might give a try to use MongoDB Atlas.


🧵 Installation & Running

Here is classic way of installation and running.

Docker-compose file might come later!

Installation

Running

  • Start redis:

    redis-server --port 6380 --slaveof 127.0.0.1 6379

  • Execute Celery, Flower and our flask_api file inside the project folder:

    • celery worker -A flask_api.celery_client -E

      If you are using Win10 and face some problems with Celery, try the following instead:

      celery worker -A flask_api.celery_client -E --pool=solo -l info

    • flower -A flask_api.celery_client

      Then, you can visit http://localhost:5555/

    • python flask_api.py

      Now, time to open our REST-Api on http://localhost:5000/


💎 How the things go (aka. Results)?

In our simple pipeline, we provide a REST API for our users where they can calculate the body-mass index (BMI) of a person. So, they can list all of the calculations or post a new one under the end point namely calculationsEP Operations for Calculations while they can get the details of a calculation using unique process ID pid or delete it via pid.

page-overview

Our Flask RESTPlus API page overview

Here, in our case, bmi_calculation() function mimics a time-consuming process. Once the user submits a new entry with name, weight and height, it automatically add this request to query via message broker or queue system Redis and it forwards them to Celery workers to be processed and it returns a unique pid. Once the result is ready, it is saved into MongoDB Atlas database to operate CRUD operations on the API (getting the list of calculations, deleting, etc.).

✨ If you wonder about the details, all is explained with screenshots! 👇

Flower-0 Flower UI (no task provided yet)

image Documents (Entries) of MongoDB Atlas Collection

posting New calculation post in our REST-API

Screenshot_2 PID for our calculation returned upon the post

Screenshot_4 Several new calculations are posted and see how their status in Celery look like on Flower

Screenshot_3 Once several new posts are submitted, see the queue in Redis on Flower

Screenshot_6 The states of all Tasks in Celery on Flower UI

Screenshot_5
Results of new calculations posted are shown in MongoDB Atlas


So, now, our users can fetch easily the calculated BMI values (in the background) by using the given pid later time.

That's all for now!




📘 References & Similar works

  1. Miranda, L. (2019). Distill: Why do we need Flask, Celery, and Redis? (with McDonalds in Between). Retrieved 2 May 2021, from https://ljvmiranda921.github.io/notebook/2019/11/08/flask-redis-celery-mcdo/
  2. Smyth, P. (2018). Creating Web APIs with Python and Flask . Programming Historian. Retrieved from https://programminghistorian.org/en/lessons/creating-apis-with-python-and-flask
  3. Rocha, B. (2021). Microservices with Python, RabbitMQ and Nameko. Retrieved 2 May 2021, from http://brunorocha.org/python/microservices-with-python-rabbitmq-and-nameko.html
  4. Herman, M. (2021). Asynchronous Tasks with Flask and Celery. Retrieved 2 May 2021, from https://testdriven.io/blog/flask-and-celery/
  5. Costa, R. (2021). Asynchronous vs. Synchronous Programming: When to Use What. Retrieved 2 May 2021, from https://www.outsystems.com/blog/posts/asynchronous-vs-synchronous-programming/
  6. Vassilev, K. (2019). Microservices for Dummies. Retrieved 2 May 2021, from https://medium.com/sciant/microservices-for-dummies-e55e428a5eef

See other similar works:

👁‍🗨 TO-DO List

  • Elaborate the README file/post
  • Prepare a docker-compose.yml file for containerization
  • Enable re-trying the failed celery tasks
  • Write tests via pytest & integrate Github Actions for CI/CD

🎈 Author & Feedback/Contribution

About the author & developer:

Moreover, please do not hesitate to comment via opening an issue via GitHub if you have any suggestions or feedback!