This is a project of group 7, stream 24 of the Yandex Practicum Middle Python Developer course. The goal of the project is to build an online streaming platform.
This repository contains the User Generated Content (UGC) service.
Features of the UGC service:
- tracking of watching progress (see the sketch below)

Tech stack: FastAPI, Kafka, ClickHouse, MongoDB
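As a rough illustration of how the watching-progress feature can be wired into this stack, below is a minimal sketch of a FastAPI endpoint that publishes progress events to Kafka. The topic name, event schema, and the `get_producer` dependency are assumptions for illustration, not the service's actual code.

```python
# Minimal sketch: accept a watching-progress event and publish it to Kafka.
# Topic name, schema, and producer wiring are illustrative assumptions.
import json

from aiokafka import AIOKafkaProducer
from fastapi import APIRouter, Depends
from pydantic import BaseModel

router = APIRouter()


class WatchingProgress(BaseModel):
    user_id: str
    film_id: str
    viewed_seconds: int


async def get_producer() -> AIOKafkaProducer:
    # In a real service the producer would be created once at startup
    # and reused; this is a simplified placeholder.
    producer = AIOKafkaProducer(bootstrap_servers="localhost:9092")
    await producer.start()
    return producer


@router.post("/progress")
async def save_progress(
    event: WatchingProgress,
    producer: AIOKafkaProducer = Depends(get_producer),
) -> dict:
    # Kafka acts as a buffer; events can later be moved into ClickHouse.
    await producer.send_and_wait(
        "watching_progress",
        json.dumps(event.dict()).encode("utf-8"),
    )
    return {"status": "ok"}
```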
ClickHouse was chosen as the distributed storage due to its performance in synthetic tests. Other options considered were Vertica and Greenplum.
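For context, here is a minimal sketch of working with ClickHouse from Python via the clickhouse-driver package. The table name and schema are assumptions for illustration, not the project's actual DDL.

```python
# Minimal sketch of writing/reading progress events in ClickHouse with
# clickhouse-driver; table name and schema are illustrative assumptions.
from datetime import datetime

from clickhouse_driver import Client

client = Client(host="localhost")

client.execute(
    """
    CREATE TABLE IF NOT EXISTS watching_progress (
        user_id String,
        film_id String,
        viewed_seconds UInt32,
        event_time DateTime
    ) ENGINE = MergeTree()
    ORDER BY (user_id, film_id)
    """
)

# Bulk insert: clickhouse-driver takes a list of tuples after the VALUES clause.
client.execute(
    "INSERT INTO watching_progress VALUES",
    [("user-1", "film-1", 120, datetime.utcnow())],
)

rows = client.execute(
    "SELECT user_id, film_id, viewed_seconds FROM watching_progress LIMIT 10"
)
print(rows)
```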
To launch the benchmarks:
- cd to /benchmark
- choose the database you are interested in and cd into its folder
- In the root of the db folder: docker compose up -d --build
- While waiting for the containers to start, install the requirements from requirements/local.txt in the root of the db folder
- In the root of the db folder run "run.py" (a simplified sketch of its structure is shown below)
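Each benchmark's "run.py" follows roughly the same pattern: generate test rows, time bulk writes and sample reads, and print lines like func:'check_write' took: ... sec. The sketch below shows that timing pattern; the decorator and the check functions are simplified assumptions, not the exact benchmark code.

```python
# Simplified sketch of the timing pattern that produces the
# "func:'...' took: ... sec" lines in the results below.
# The decorator and check functions are illustrative assumptions.
import time
from functools import wraps


def timing(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"func:'{func.__name__}' took: {elapsed:.4f} sec")
        return result

    return wrapper


@timing
def check_write(client, rows):
    # Bulk-insert generated rows into the database under test.
    client.insert(rows)  # hypothetical client helper


@timing
def check_read(client, limit=1000):
    # Read back a sample and print it, mirroring the "Sample Data" lines.
    data = client.select(limit)  # hypothetical client helper
    print(f"Selected {limit} rows Sample Data: {data[0]}")
    return data
```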
Greenplum and Vertica
- Greenplum's docker-compose file is located in the "docker" folder; run from there: docker compose up -d --build
- Greenplum's "run.py" is located in greenplum/src/
- Vertica's equivalent of "run.py" is named "main.py" and is located in vertica/src/
Note: you can also edit the content-generation scripts in utils.py (see the sketch below). For all databases except MongoDB it is also important to adjust the pre-test scripts accordingly.
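For reference, here is a minimal sketch of the kind of content-generation helper utils.py could contain, producing rows shaped like the sample data in the results below (three numeric id-like strings plus a timestamp). The function name and row shape are assumptions for illustration.

```python
# Illustrative sketch of a content-generation helper like the ones in utils.py.
# The row shape mirrors the "Sample Data" lines in the benchmark results;
# the function name and column meanings are assumptions.
import random
from datetime import datetime, timedelta


def generate_rows(count: int) -> list[tuple]:
    rows = []
    start = datetime(1990, 1, 1)
    for _ in range(count):
        rows.append(
            (
                str(random.randint(1, 10**10)),  # numeric id-like string
                str(random.randint(1, 10**10)),  # numeric id-like string
                str(random.randint(1, 10**10)),  # numeric id-like string
                start + timedelta(seconds=random.randint(0, 10**9)),
            )
        )
    return rows
```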
Benchmark results
Created database, table, inserted 10000000 rows
func:'check_write' took: 0.0220 sec
Selected 1000 rows Sample Data: ['1509', '218846485', '6402693977', '2007-03-26 21:33:32'] func:'check_read' took: 0.0809 sec
Selected 1000 rows Sample Data: ['1509', '218846485', '6402693977', '2007-03-26 21:33:32'] func:'check_read' took: 0.0135 sec
func:'check_write' took: 0.0205 sec
func:'check_write' took: 472.5615 sec
Selected 10000000 rows Sample Data: ['1698400264', '6350138181', '2329006959', '1997-01-21 15:24:22'] func:'chech_read' took: 0.3870 sec
Selected 10000000 rows Sample Data: ['8497188662', '2435800366', '5013646629', '2015-12-13 04:44:03'] func:'chech_read' took: 0.0625 sec
Inserted 1000 rows func:'check_write_likes' took: 0.1114 sec func:'check_write_bookmarks' took: 0.2343 sec func:'check_write_reviews' took: 1.9533 sec
Selected 1000 rows func:'check_read_likes' took: 0.0039 sec func:'check_read_bookmarks' took: 0.0033 sec func:'check_read_reviews' took: 0.0149 sec
Selected 1000 rows func:'check_read_likes' took: 0.0130 sec func:'check_read_bookmarks' took: 0.0030 sec func:'check_read_reviews' took: 0.0119 sec
Inserted 1000 rows func:'check_write_likes' took: 0.1235 sec func:'check_write_bookmarks' took: 0.2437 sec func:'check_write_reviews' took: 1.9946 sec
Inserted 1000 rows func:'check_write_likes' took: 4.6993 sec func:'check_write_bookmarks' took: 4.9062 sec func:'check_write_reviews' took: 4.6401 sec
Selected 1000 rows func:'check_read_likes' took: 0.0504 sec func:'check_read_bookmarks' took: 0.0102 sec func:'check_read_reviews' took: 0.0626 sec
Selected 1000 rows func:'check_read_likes' took: 0.0064 sec func:'check_read_bookmarks' took: 0.0047 sec func:'check_read_reviews' took: 0.0377 sec
Inserted 1000 rows func:'check_write_likes' took: 2.4458 sec func:'check_write_bookmarks' took: 2.9509 sec func:'check_write_reviews' took: 3.6189 sec
We chose MongoDB for our UGC endpoints because of its performance in the read/write benchmarks, including performance under load even on a relatively weak machine. Read performance is outstanding and write performance is decent compared to the other databases. Pros: MongoDB is easy to deploy, its collections are flexible, and it needs almost no preparation to use out of the box. Cons: the syntax is unusual for those who are used to relational databases.
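As a small illustration of how UGC data (likes, bookmarks, reviews) can be stored in MongoDB, below is a minimal pymongo sketch. Database, collection, and field names are assumptions for illustration, not the service's actual schema.

```python
# Minimal sketch of storing and reading UGC documents with pymongo.
# Database, collection, and field names are illustrative assumptions.
from datetime import datetime

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["ugc"]

# Flexible schema: each item is just a small document, no migrations needed.
db.likes.insert_one(
    {"user_id": "user-1", "film_id": "film-1", "rating": 10}
)
db.bookmarks.insert_one(
    {"user_id": "user-1", "film_id": "film-2", "created_at": datetime.utcnow()}
)
db.reviews.insert_one(
    {"user_id": "user-1", "film_id": "film-1", "text": "Great movie!"}
)

# Reads are simple filters on the same fields.
user_likes = list(db.likes.find({"user_id": "user-1"}))
print(user_likes)
```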
See the main movix repository for instructions on running the whole project.