feat: add an aggregated velocity statistics table #24

Open
wants to merge 3 commits into base: main
Conversation

NoamGaash
Member

No description provided.

@NoamGaash NoamGaash marked this pull request as ready for review December 23, 2024 18:24
@NoamGaash NoamGaash requested a review from OriHoch December 23, 2024 18:24
@OriHoch
Contributor

OriHoch commented Dec 24, 2024

I tried to run this query manually for a single day

by adding WHERE date >= '2024-12-01' and date < '2024-12-02'

  • to the HourlyAverages query - it takes 10 seconds and returns ~20,000 rows
  • to the external query - it was running for more than 5 minutes so I stopped it
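
For reference, the single-day restriction described above would look roughly like this; the actual HourlyAverages query is in the PR diff, so the table and column names below are placeholders, not the real schema:

```sql
-- Sketch of the single-day benchmark; table/column names are placeholders.
SELECT
    date,
    date_trunc('hour', recorded_at_time) AS hour,
    AVG(velocity) AS avg_velocity,
    COUNT(*) AS sample_count
FROM siri_vehicle_locations
WHERE date >= '2024-12-01' AND date < '2024-12-02'   -- the added restriction
GROUP BY 1, 2;
```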

extrapolating to all the data we have (currently ~2 years, about 730 days), the total number of rows will be 20,000 * 730 = 14,600,000

because it's a materialized view, which stores all its rows, and taking the indices into account, this will add significant size to the DB. We also need to account for the load on the DB from refreshing the materialized view, which we would have to run periodically.

I think, because the basic query takes 10 seconds per day and assuming it won't run very often, it's worth just running the query itself, or maybe adding a non-materialized view.
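
To illustrate the difference: a plain (non-materialized) view stores only the query definition and recomputes the result on every read, so it adds no storage and needs no periodic refresh. A minimal sketch, again with placeholder names rather than the PR's actual query:

```sql
-- A plain view: nothing is stored, the SELECT runs on each read.
-- Table and column names are placeholders for the query in the PR diff.
CREATE VIEW hourly_velocity_averages AS
SELECT
    date,
    date_trunc('hour', recorded_at_time) AS hour,
    AVG(velocity) AS avg_velocity,
    COUNT(*) AS sample_count
FROM siri_vehicle_locations
GROUP BY 1, 2;

-- Each caller pays the ~10 seconds per day at query time:
-- SELECT * FROM hourly_velocity_averages
-- WHERE date >= '2024-12-01' AND date < '2024-12-02';
```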

Contributor

@OriHoch OriHoch left a comment


^

@NoamGaash
Member Author

Thank you!
Ideally, I'd like to have some cache mechanism, because I want to expose an API endpoint that returns a heatmap of velocity statistics, which I will call from the frontend.
Considering 14 million rows, where each row contains 7 values (8 bytes each, I guess), that's about 56 bytes per row; 56 bytes * 14,600,000 rows ≈ 800MB. I can see how that's a lot.
The front-end will cache the responses, but it's still a computationally heavy operation.
I'll read about non materialized views.
Thank you very much for the fast and educative feedback 🙏

@NoamGaash NoamGaash changed the title feat: add an aggregated velocity statistics view feat: add an aggregated velocity statistics table Jan 17, 2025
@NoamGaash
Member Author

Hi @OriHoch! I came up with this proposal to implement some kind of cache mechanism.

The last_used column will store the last time a specific date was calculated; it will be used to remove old entries from the DB.
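
Roughly, the idea looks like this (names and columns are illustrative only; the actual definition is in the PR diff):

```sql
-- Illustrative sketch of the cached statistics table; actual columns are in the PR.
CREATE TABLE velocity_daily_stats_cache (
    date          DATE NOT NULL,
    hour          SMALLINT NOT NULL,
    avg_velocity  FLOAT,
    sample_count  INTEGER,
    last_used     TIMESTAMP NOT NULL DEFAULT now(),  -- touched whenever this date is requested
    PRIMARY KEY (date, hour)
);

-- Periodic cleanup of dates that have not been requested recently:
DELETE FROM velocity_daily_stats_cache
WHERE last_used < now() - INTERVAL '30 days';
```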

I made an implementation for the API and tested it locally
hasadna/open-bus-stride-api#44

@OriHoch
Copy link
Contributor

OriHoch commented Jan 18, 2025

  1. Currently the API is strictly for SELECT, and I want to keep it that way; introducing updates/inserts would add a lot of complexity and risk that I want to keep out.
  2. I would prefer not to implement caching mechanisms in the DB, as that also introduces risks and scale problems. If you want caching we can add a Redis server, but I'd prefer to avoid it if we can.
  3. The idea of adding a new table and populating it is good, but the way to do it is via the ETL system: add a task that runs daily, iterates over all the dates for which all the source data exists and for which no data is yet in the new table, and inserts their data into this table. Most of the tasks in open_bus_stride_etl do something like this (see the sketch below). I would give this table a more generic name, maybe something like siri_vehicle_locations_daily_stats, so in the future we can add other statistics to it.
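
A rough sketch of the statement such a daily task might run; the column names and the source-data completeness check are assumptions, and the real task would live in open_bus_stride_etl alongside the existing ones:

```sql
-- Sketch of the daily backfill; names other than the suggested table name are placeholders.
-- A real task would also verify that the source data for each date is complete before inserting.
INSERT INTO siri_vehicle_locations_daily_stats (date, hour, avg_velocity, sample_count)
SELECT
    recorded_at_time::date,
    date_trunc('hour', recorded_at_time),
    AVG(velocity),
    COUNT(*)
FROM siri_vehicle_locations
WHERE recorded_at_time::date NOT IN (
    SELECT DISTINCT date FROM siri_vehicle_locations_daily_stats
)
GROUP BY 1, 2;
```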
