Skip to content
This repository has been archived by the owner on Nov 14, 2023. It is now read-only.

[Feature] Task Queue System & Rate Limit on API #335

Open
wants to merge 118 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
118 commits
Select commit Hold shift + click to select a range
9cf5f24
added flower celery and redis
kshitijrajsharma Sep 21, 2022
0a96256
added background task
kshitijrajsharma Sep 21, 2022
1f351ab
added pickle serializer
kshitijrajsharma Sep 21, 2022
2dcc8a4
fixed return error too
kshitijrajsharma Sep 21, 2022
bf62819
added task id to export name itself so that it will be easy to track
kshitijrajsharma Sep 21, 2022
db168d7
updated readme
kshitijrajsharma Sep 22, 2022
8708f29
applied versioning on rawdata snapshot
kshitijrajsharma Sep 22, 2022
8ddfd34
updated version of click
kshitijrajsharma Sep 22, 2022
59ce9bb
enabled fiona
kshitijrajsharma Sep 22, 2022
3959423
added docker workflow
kshitijrajsharma Sep 22, 2022
2f181c0
ignored all .out files
kshitijrajsharma Sep 22, 2022
78b5d7e
remove fiona since its no longer used
kshitijrajsharma Sep 22, 2022
455dda9
Merge branch 'feature/celery' of https://github.com/hotosm/galaxy-api…
kshitijrajsharma Sep 22, 2022
f3041e9
added try except block
kshitijrajsharma Sep 22, 2022
ec6654f
added config file validation and restored previous docker file
kshitijrajsharma Sep 22, 2022
b795f54
removed systemctl part
kshitijrajsharma Sep 22, 2022
fb6ba25
updated docs
kshitijrajsharma Sep 22, 2022
9c25472
Merge branch 'develop' into feature/celery
kshitijrajsharma Sep 22, 2022
3af19ce
updated config doc
kshitijrajsharma Sep 22, 2022
f9dab03
Update GETTING_STARTED_WITH_DOCKER.md
kshitijrajsharma Sep 22, 2022
3d02bb9
Update GETTING_STARTED_WITH_DOCKER.md
kshitijrajsharma Sep 22, 2022
c057081
Update README.md
kshitijrajsharma Sep 22, 2022
5b36c99
Update GETTING_STARTED_WITH_DOCKER.md
kshitijrajsharma Sep 22, 2022
2f161f2
Update README.md
kshitijrajsharma Sep 22, 2022
4354fa0
Update README.md
kshitijrajsharma Sep 22, 2022
7af6549
Update CONFIG_DOC.md
kshitijrajsharma Sep 22, 2022
aad8870
removed config sample unnecessary text and moved instructions to conf…
kshitijrajsharma Sep 22, 2022
b5f99c6
changed workflow for unit test with postgis image
kshitijrajsharma Sep 22, 2022
e2b8a74
fixed yaml indent error
kshitijrajsharma Sep 22, 2022
00dcf8a
indent fix
kshitijrajsharma Sep 22, 2022
7a45838
added postgres
kshitijrajsharma Sep 22, 2022
be3988d
added sudo command
kshitijrajsharma Sep 22, 2022
7c430ac
tet create database option in github action
kshitijrajsharma Sep 22, 2022
9a990d9
check to insert data
kshitijrajsharma Sep 22, 2022
72e6a84
Update CONFIG_DOC.md
kshitijrajsharma Sep 22, 2022
ebf5630
removed auto remove
kshitijrajsharma Sep 22, 2022
3c782a0
removed typo
kshitijrajsharma Sep 22, 2022
c9c5039
check for file path
kshitijrajsharma Sep 22, 2022
31f681b
rerun with filepath
kshitijrajsharma Sep 22, 2022
d173883
added password
kshitijrajsharma Sep 22, 2022
15c6a51
added host information
kshitijrajsharma Sep 22, 2022
eea2106
added -c option
kshitijrajsharma Sep 22, 2022
1d5dd9c
added config section and added insight only
kshitijrajsharma Sep 22, 2022
499bb3f
removed space
kshitijrajsharma Sep 22, 2022
c8e937a
testing workflow
kshitijrajsharma Sep 22, 2022
6a7040a
removed space
kshitijrajsharma Sep 22, 2022
a2ceecd
fixed yml error
kshitijrajsharma Sep 22, 2022
340766a
test
kshitijrajsharma Sep 22, 2022
5b7c96d
added password export option
kshitijrajsharma Sep 22, 2022
4a6a78c
changed password
kshitijrajsharma Sep 22, 2022
ed97604
added db add data
kshitijrajsharma Sep 22, 2022
64c4544
reverted
kshitijrajsharma Sep 22, 2022
93bd453
droped idea of db insert
kshitijrajsharma Sep 22, 2022
4c92ca5
changed to psql 12
kshitijrajsharma Sep 22, 2022
c63e974
added psql 14 postgis
kshitijrajsharma Sep 22, 2022
cd697ce
changed scripts
kshitijrajsharma Sep 22, 2022
f9538a0
final test for psql 14
kshitijrajsharma Sep 22, 2022
97498ff
check psql version
kshitijrajsharma Sep 22, 2022
aae7572
apt install
kshitijrajsharma Sep 22, 2022
a285edf
changed postgis command
kshitijrajsharma Sep 22, 2022
e916c5b
opted old method
kshitijrajsharma Sep 22, 2022
7501bb4
final test
kshitijrajsharma Sep 22, 2022
c882905
setup test with the database
kshitijrajsharma Sep 22, 2022
3a66d87
added port info
kshitijrajsharma Sep 22, 2022
b09f727
exported password
kshitijrajsharma Sep 22, 2022
5e2dd87
binded port of redis
kshitijrajsharma Sep 22, 2022
338f557
updated doc for the docker compose
kshitijrajsharma Sep 22, 2022
5121305
updated sample
kshitijrajsharma Sep 22, 2022
28946dc
Update GETTING_STARTED_WITH_DOCKER.md
kshitijrajsharma Sep 22, 2022
3bf13e8
Update CONFIG_DOC.md
kshitijrajsharma Sep 22, 2022
0b63d6a
Update CONFIG_DOC.md
kshitijrajsharma Sep 22, 2022
194628e
Update README.md
kshitijrajsharma Sep 22, 2022
4f1037a
Check if fails or not for worker
kshitijrajsharma Sep 22, 2022
88f857a
handled error
kshitijrajsharma Sep 22, 2022
394c34c
formatted worker
kshitijrajsharma Sep 22, 2022
59bda45
added build and separated from unit test
kshitijrajsharma Sep 23, 2022
bc01ff9
redis minimal installation added
kshitijrajsharma Sep 23, 2022
c952c75
removed double installation of redis
kshitijrajsharma Sep 23, 2022
ee806aa
moved db section to top
kshitijrajsharma Sep 23, 2022
27b46d8
check for server error
kshitijrajsharma Sep 23, 2022
f18c91a
added timeout
kshitijrajsharma Sep 23, 2022
e07ef58
changed redis url and get api
kshitijrajsharma Sep 23, 2022
a74df8d
added flower and mapathon endpoint test
kshitijrajsharma Sep 23, 2022
1e1291a
fixed typo
kshitijrajsharma Sep 23, 2022
e01775f
updated documentation along with curl command
kshitijrajsharma Sep 23, 2022
7096b7f
curl command setup
kshitijrajsharma Sep 23, 2022
ad9ead2
updated readme
kshitijrajsharma Sep 23, 2022
325e38f
updated doc
kshitijrajsharma Sep 23, 2022
a77f4ae
check db connection
kshitijrajsharma Sep 23, 2022
1a50d45
check if we can install gdal without update
kshitijrajsharma Sep 23, 2022
5b1ffba
check with env
kshitijrajsharma Sep 23, 2022
825b240
check with disabled upgrade command
kshitijrajsharma Sep 23, 2022
5f49c51
added rawdata snapshot
kshitijrajsharma Sep 23, 2022
82e14cd
Update README.md
kshitijrajsharma Sep 23, 2022
a18b781
Update README.md
kshitijrajsharma Sep 23, 2022
e6cba6b
Update README.md
kshitijrajsharma Sep 23, 2022
b058ee5
Update README.md
kshitijrajsharma Sep 23, 2022
12b2f90
Update README.md
kshitijrajsharma Sep 23, 2022
79f9482
Changed URL with relative and fixed typo
kshitijrajsharma Sep 23, 2022
7141f6c
changed url to relative url
kshitijrajsharma Sep 23, 2022
ab0c94a
resolved healthcheck url
kshitijrajsharma Sep 23, 2022
8ccf55d
added note for docker users to use local postgres from container
kshitijrajsharma Sep 23, 2022
13b8a25
added supporting doc if connection fails from container to psql
kshitijrajsharma Sep 23, 2022
2d7ce77
formatted md file
kshitijrajsharma Sep 23, 2022
2b1c0b1
resolved docker cache on dependencies and updated options to connect …
kshitijrajsharma Sep 24, 2022
b44e4ab
added pickle and status
kshitijrajsharma Sep 24, 2022
8bcbd72
round digit to 2 decimal for binded file size
kshitijrajsharma Sep 24, 2022
f349755
Introduces rate limit
kshitijrajsharma Sep 24, 2022
c9b9b7b
resolved mapathon detail docs
kshitijrajsharma Sep 24, 2022
fd28684
resolved raise exception if config file not found
kshitijrajsharma Sep 24, 2022
0116b0c
removed string in error file
kshitijrajsharma Sep 24, 2022
b55ff6b
reverted lgic with previous
kshitijrajsharma Sep 25, 2022
a9a0c98
Reformatted boto exceptions
kshitijrajsharma Sep 26, 2022
b3ba9e4
ignore all log files
kshitijrajsharma Sep 26, 2022
e1ab356
disabled logs
kshitijrajsharma Sep 26, 2022
f2db19f
reformatted
kshitijrajsharma Sep 27, 2022
40e9394
changed worker to dev debug setup and added fix for ogr2ogr devsetup
kshitijrajsharma Sep 27, 2022
276b527
removed test sql
kshitijrajsharma Sep 27, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 31 additions & 12 deletions .github/workflows/Unit-Test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -13,17 +13,22 @@ jobs:
deploy:
runs-on:
ubuntu-latest

services:
postgres:
image: postgis/postgis:14-3.3
env:
POSTGRES_PASSWORD: admin
POSTGRES_DB: insights
ports:
- 5434:5432
options: --health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 2
steps:
- uses: actions/checkout@v2
- name: Set up Python 3.8
uses: actions/setup-python@v1
with:
python-version: 3.8
- name: Check postgresql version
run: |
psql -V
- name: Remove postgresql version 14
- name: Clean up PSQL
run: |
sudo apt-get --purge remove postgresql
sudo apt-get purge postgresql*
Expand All @@ -36,17 +41,31 @@ jobs:
run: |
sudo apt-get update
sudo apt install postgis postgresql-12-postgis-3
- name: Install gdal
run: |
sudo apt-add-repository ppa:ubuntugis/ubuntugis-unstable
sudo apt-get update
sudo apt-get install gdal-bin libgdal-dev


- name: Create Databases
run : |
export PGPASSWORD='admin';
psql -U postgres -h localhost -p 5434 -c "CREATE DATABASE underpass;"
psql -U postgres -h localhost -p 5434 -c "CREATE DATABASE tm;"
psql -U postgres -h localhost -p 5434 -c "CREATE DATABASE raw;"

- name: Insert sample db data
run : |
export PGPASSWORD='admin';
psql -U postgres -h localhost -p 5434 insights < tests/src/fixtures/insights.sql
psql -U postgres -h localhost -p 5434 raw < tests/src/fixtures/raw_data.sql
psql -U postgres -h localhost -p 5434 underpass < tests/src/fixtures/underpass.sql
wget https://raw.githubusercontent.com/hotosm/tasking-manager/develop/tests/database/tasking-manager.sql
psql -U postgres -h localhost -p 5434 tm < tasking-manager.sql

- name: Install Dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -e .
- name: Creating config.txt
run: |
mv src/config.txt.sample src/config.txt
- name: Run Tests
run: |
py.test -v -s
py.test -v -s
86 changes: 86 additions & 0 deletions .github/workflows/build.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
name: Check Build
on:
push:
branches:
- master
- develop
pull_request:
branches:
- master
- develop

jobs:
build:
timeout-minutes: 4
runs-on: ubuntu-latest
services:
postgres:
image: postgis/postgis:14-3.3
env:
POSTGRES_PASSWORD: admin
POSTGRES_DB: insights
ports:
- 5434:5432
options: --health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 2
steps:
- uses: actions/checkout@v2
- name: Set up Python 3.8
uses: actions/setup-python@v1
with:
python-version: 3.8

- name: Create Databases
run: |
export PGPASSWORD='admin';
psql -U postgres -h localhost -p 5434 -c "CREATE DATABASE underpass;"
psql -U postgres -h localhost -p 5434 -c "CREATE DATABASE tm;"
psql -U postgres -h localhost -p 5434 -c "CREATE DATABASE raw;"

- name: Insert sample db data
run: |
export PGPASSWORD='admin';
psql -U postgres -h localhost -p 5434 insights < tests/src/fixtures/insights.sql
psql -U postgres -h localhost -p 5434 insights < tests/src/fixtures/mapathon_summary.sql
psql -U postgres -h localhost -p 5434 raw < tests/src/fixtures/raw_data.sql
psql -U postgres -h localhost -p 5434 underpass < tests/src/fixtures/underpass.sql
wget https://raw.githubusercontent.com/hotosm/tasking-manager/develop/tests/database/tasking-manager.sql
psql -U postgres -h localhost -p 5434 tm < tasking-manager.sql

- name: Install gdal
run: |
sudo apt-get -y install gdal-bin python3-gdal && sudo apt-get -y autoremove && sudo apt-get clean

- name: Install redis
run: |
sudo apt install lsb-release
curl -fsSL https://packages.redis.io/gpg | sudo gpg --dearmor -o /usr/share/keyrings/redis-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/redis-archive-keyring.gpg] https://packages.redis.io/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/redis.list
sudo apt-get update
sudo apt-get install redis
redis-cli ping

- name: Install Dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -e .
- name: Creating config.txt
run: |
mv src/config.txt.sample src/config.txt
- name: Run uvicorn server
run: |
uvicorn API.main:app &
env:
PORT: 8000
- name: Run celery server
run: |
celery --app API.api_worker worker --loglevel=INFO &
- name: Run flower dashboard
run: |
celery --app API.api_worker flower --port=5555 --broker=redis://localhost:6379/ &
- name: Run mapathon summary endpoint
run: |
curl -d '{"project_ids": [11224, 10042, 9906, 1381, 11203, 10681, 8055, 8732, 11193, 7305,11210, 10985, 10988, 11190, 6658, 5644, 10913, 6495, 4229],"fromTimestamp":"2021-08-27T9:00:00","toTimestamp":"2021-08-27T11:00:00","hashtags": ["mapandchathour2021"]}' -H 'Content-Type: application/json' http://127.0.0.1:8000/v1/mapathon/summary/
- name: Run rawdata current snapshot
run: |
curl -d '{"geometry":{"type":"Polygon","coordinates":[[[83.96919250488281,28.194446860487773],[83.99751663208006,28.194446860487773],[83.99751663208006,28.214869548073377],[83.96919250488281,28.214869548073377],[83.96919250488281,28.194446860487773]]]}}' -H 'Content-Type: application/json' http://127.0.0.1:8000/v2/raw-data/current-snapshot/
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -12,4 +12,5 @@ build
newrelic.ini
newrelic.ini_backup
exports
nohup.out
*.out
*.log
144 changes: 144 additions & 0 deletions API/api_worker.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,144 @@
import os
import pathlib
import orjson
import shutil
import time
import requests
from datetime import datetime as dt
import zipfile
from celery import Celery
from src.galaxy.config import config
from fastapi.responses import JSONResponse
from src.galaxy.query_builder.builder import format_file_name_str
from src.galaxy.validation.models import RawDataOutputType
from src.galaxy.app import RawData, S3FileTransfer
from src.galaxy.config import use_s3_to_upload, logger as logging, config

celery = Celery(__name__)
celery.conf.broker_url = config.get(
"CELERY", "CELERY_BROKER_URL", fallback="redis://localhost:6379"
)
celery.conf.result_backend = config.get(
"CELERY", "CELERY_RESULT_BACKEND", fallback="redis://localhost:6379"
) # using redis as backend , make sure you have redis server started on your system on port 6379

celery.conf.task_serializer = 'pickle'
celery.conf.result_serializer = 'pickle'
celery.conf.accept_content = ['application/json', 'application/x-python-serialize']


@celery.task(bind=True, name="process_raw_data")
def process_raw_data(self, incoming_scheme, incoming_host, params):
try:
start_time = dt.now()
if params.output_type is None: # if no ouput type is supplied default is geojson output
params.output_type = RawDataOutputType.GEOJSON.value

# unique id for zip file and geojson for each export
if params.file_name:
# need to format string from space to _ because it is filename , may be we need to filter special character as well later on
formatted_file_name = format_file_name_str(params.file_name)
# exportname = f"{formatted_file_name}_{datetime.now().isoformat()}_{str(self.request.id)}"
exportname = f"""{formatted_file_name}_{str(self.request.id)}_{params.output_type}""" # disabled date for now

else:
# exportname = f"Raw_Export_{datetime.now().isoformat()}_{str(self.request.id)}"
exportname = f"Raw_Export_{str(self.request.id)}_{params.output_type}"

logging.info("Request %s received", exportname)

dump_temp_file, geom_area, root_dir_file = RawData(
params).extract_current_data(exportname)
path = f"""{root_dir_file}{exportname}/"""

if os.path.exists(path) is False:
return JSONResponse(
status_code=400,
content={"Error": "Request went too big"}
)

logging.debug('Zip Binding Started !')
# saving file in temp directory instead of memory so that zipping file will not eat memory
zip_temp_path = f"""{root_dir_file}{exportname}.zip"""
zf = zipfile.ZipFile(zip_temp_path, "w", zipfile.ZIP_DEFLATED)

directory = pathlib.Path(path)
for file_path in directory.iterdir():
zf.write(file_path, arcname=file_path.name)

# Compressing geojson file
zf.writestr("clipping_boundary.geojson",
orjson.dumps(dict(params.geometry)))

zf.close()
logging.debug('Zip Binding Done !')
inside_file_size = 0
for temp_file in dump_temp_file:
# clearing tmp geojson file since it is already dumped to zip file we don't need it anymore
if os.path.exists(temp_file):
inside_file_size += os.path.getsize(temp_file)

# remove the file that are just binded to zip file , we no longer need to store it
remove_file(path)
# check if download url will be generated from s3 or not from config
if use_s3_to_upload:
file_transfer_obj = S3FileTransfer()
download_url = file_transfer_obj.upload(zip_temp_path, exportname)
else:
# getting from config in case api and frontend is not hosted on same url
client_host = config.get(
"API_CONFIG", "api_host", fallback=f"""{incoming_scheme}://{incoming_host}""")
client_port = config.get("API_CONFIG", "api_port", fallback=8000)

if client_port:
download_url = f"""{client_host}:{client_port}/v1/exports/{exportname}.zip""" # disconnected download portion from this endpoint because when there will be multiple hits at a same time we don't want function to get stuck waiting for user to download the file and deliver the response , we want to reduce waiting time and free function !
else:
download_url = f"""{client_host}/v1/exports/{exportname}.zip""" # disconnected download portion from this endpoint because when there will be multiple hits at a same time we don't want function to get stuck waiting for user to download the file and deliver the response , we want to reduce waiting time and free function !

# getting file size of zip , units are in bytes converted to mb in response
zip_file_size = os.path.getsize(zip_temp_path)
# watches the status code of the link provided and deletes the file if it is 200
watch_s3_upload(download_url, zip_temp_path)
response_time = dt.now() - start_time
response_time_str = str(response_time)
logging.info(
f"Done Export : {exportname} of {round(inside_file_size/1000000)} MB / {geom_area} sqkm in {response_time_str}")
return {"download_url": download_url, "file_name": exportname, "process_time": response_time_str, "query_area": f"""{geom_area} Sq Km """, "binded_file_size": f"""{round(inside_file_size/1000000,2)} MB""", "zip_file_size_bytes": zip_file_size}

except Exception as ex:
raise ex


def remove_file(path: str) -> None:
"""Used for removing temp file dir and its all content after zip file is delivered to user"""
try:
shutil.rmtree(path)
except OSError as ex:
logging.error("Error: %s - %s.", ex.filename, ex.strerror)


def watch_s3_upload(url: str, path: str) -> None:
"""Watches upload of s3 either it is completed or not and removes the temp file after completion

Args:
url (_type_): url generated by the script where data will be available
path (_type_): path where temp file is located at
"""
start_time = time.time()
remove_temp_file = True
check_call = requests.head(url).status_code
if check_call != 200:
logging.debug("Upload is not done yet waiting ...")
while check_call != 200: # check until status is not green
check_call = requests.head(url).status_code
if time.time() - start_time > 300:
logging.error(
"Upload time took more than 5 min , Killing watch : %s , URL : %s", path, url)
remove_temp_file = False # don't remove the file if upload fails
break
time.sleep(3) # check each 3 second
# once it is verfied file is uploaded finally remove the file
if remove_temp_file:
logging.debug(
"File is uploaded at %s , flushing out from %s", url, path)
os.unlink(path)
10 changes: 9 additions & 1 deletion API/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -37,8 +37,11 @@
# from .test_router import router as test_router
from .status import router as status_router
from src.galaxy.db_session import database_instance
from src.galaxy.config import use_connection_pooling, use_s3_to_upload, logger as logging, config
from src.galaxy.config import limiter, use_connection_pooling, use_s3_to_upload, logger as logging, config
from fastapi_versioning import VersionedFastAPI
from slowapi import _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded


# only use sentry if it is specified in config blocks
if config.get("SENTRY", "dsn", fallback=None):
Expand All @@ -56,6 +59,8 @@
import os
os.environ['OAUTHLIB_INSECURE_TRANSPORT'] = '1'



app = FastAPI(title="Galaxy API")

# app.include_router(test_router)
Expand All @@ -81,6 +86,9 @@
version_format='{major}', prefix_format='/v{major}')


app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

origins = ["*"]


Expand Down
Loading