-
-
Notifications
You must be signed in to change notification settings - Fork 18
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
add --stats-by-route-id-and-date option 📝 [todo:postgraphile?,tests?,…
…benchmark] Supports analyzing a feed by route and/or date and/or day of the week. part of #33
- Loading branch information
Showing
7 changed files
with
145 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
# analysing a GTFS dataset by route ID and/or date | ||
|
||
Are you trying to answer a question like those below? | ||
|
||
- Are there certain dates or days of the week that have sigificantly less arrivals/departures (hereinafter "stop time events")? – This *may* indicate errors in the data, e.g. a faulty `calendar.csv` or `calendar_dates.csv` file. | ||
- Has the number of stop time events decreased, compared to the last dataset version? | ||
- Do specific routes stop running during certain time periods? | ||
|
||
`gtfs-via-postgres` optionally provides a **(materialized) view `stats_by_route_id_and_date` to help with such SQL queries. Use the `--stats-by-route-id-and-date` flag to enable it** in the generated SQL: | ||
|
||
- If you run `gtfs-to-sql` with `--stats-by-route-id-and-date=view`, `stats_by_route_id_and_date` will be a "regular" non-materialized view. Use this option if you want to import the GTFS data quickly, and if you only query `stats_by_route_id_and_date` rarely or in time-uncritical scenarios. | ||
- If you pass `--stats-by-route-id-and-date=materialized-view`, the `stats_by_route_id_and_date` view will [be materialized](https://www.postgresql.org/docs/14/rules-materializedviews.html). Use this option if you need fast queries, and if you can tolerate significantly longer import times (3m for the 64mb 2023-03-05 SNCB/NMBS GTFS feed, todo for the 540mb 2023-02-27 VBB GTFS feed). | ||
|
||
`stats_by_route_id_and_date` has the following columns: | ||
|
||
- `date` | ||
- `dow` – day of the week, following the [PostgreSQL notation `0` (Sunday) to `6` (Saturday)](https://www.postgresql.org/docs/14/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT) | ||
- `nr_of_trips` – nr of trips starting on that date | ||
- `nr_of_arrs_deps` – nr of trips taking place on that date | ||
- `is_effective` – wether `nr_of_trips` & `nr_of_arrs_deps` are calculated based on the *effective* date (i.e. the date that the stop time event actually happens on) or *schedule* date (i.e. the date which their `stop_time` rows refer to) | ||
|
||
So | ||
|
||
- if you want to take a customer-facing perspective on the data (as in "I don't care which if trips are scheduled before midnight, I want to know if they run today"), filter for `is_effective = True` rows; | ||
- If you're interested in the operational/planning perspective (e.g. if you're looking for data errors), filter for `is_effective = False` rows. | ||
|
||
## example: nr of effective stop time events of a single route over a week | ||
|
||
```sql | ||
-- using VBB's 2023-02-27 GTFS data | ||
SELECT | ||
*, | ||
EXTRACT(dow FROM "date") as dow | ||
FROM stats_by_route_id_and_date stats | ||
WHERE is_effective = True | ||
AND route_id = '17438_900', -- M1 tram line | ||
AND "date" >= '2023-03-19' -- Sunday, dow = 0 | ||
AND "date" <= '2023-03-25' -- Saturday, dow = 6 | ||
ORDER BY route_id, "date", is_effective DESC | ||
``` | ||
|
||
```csv | ||
route_id,date,dow,nr_of_trips,nr_of_arrs_deps,is_effective,dow | ||
17438_900,2023-03-19,0,258,5870,t,0 | ||
17438_900,2023-03-20,1,345,7831,t,1 | ||
17438_900,2023-03-21,2,345,7831,t,2 | ||
17438_900,2023-03-22,3,345,7831,t,3 | ||
17438_900,2023-03-23,4,345,7831,t,4 | ||
17438_900,2023-03-24,5,345,7831,t,5 | ||
17438_900,2023-03-25,6,326,9001,t,6 | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
'use strict' | ||
|
||
const afterAll = (opt) => { | ||
let materialized = false | ||
if (opt.statsByRouteIdAndDate === 'materialized-view') { | ||
materialized = true | ||
} else if (opt.statsByRouteIdAndDate !== 'view') { | ||
throw new Error('invalid opt.statsByRouteIdAndDate, must be one of these: none, view, materialized-view.') | ||
} | ||
const createViewCmd = materialized | ||
? `CREATE MATERIALIZED VIEW` | ||
: `CREATE OR REPLACE VIEW` | ||
|
||
return `\ | ||
${createViewCmd} "${opt.schema}".stats_by_route_id_and_date AS | ||
WITH | ||
arrs_deps_with_svc_date AS NOT MATERIALIZED ( | ||
SELECT | ||
route_id, stop_sequence, | ||
"date"::date AS svc_date, | ||
EXTRACT(DOW FROM "date") AS svc_dow | ||
FROM "${opt.schema}".arrivals_departures | ||
), | ||
by_svc_date AS NOT MATERIALIZED ( | ||
SELECT DISTINCT ON (route_id, svc_date) | ||
route_id, | ||
svc_date AS "date", | ||
svc_dow AS dow, | ||
count(*) FILTER (WHERE stop_sequence = 0) OVER (PARTITION BY route_id, svc_date) AS nr_of_trips, | ||
count(*) OVER (PARTITION BY route_id, svc_date) AS nr_of_arrs_deps | ||
FROM arrs_deps_with_svc_date | ||
), | ||
arrs_deps_with_effective_date AS NOT MATERIALIZED ( | ||
SELECT | ||
route_id, stop_sequence, | ||
coalesce(t_departure, t_arrival)::date AS effective_date, | ||
EXTRACT(DOW FROM coalesce(t_departure, t_arrival)) AS effective_dow | ||
FROM "${opt.schema}".arrivals_departures | ||
), | ||
by_effective_date AS NOT MATERIALIZED ( | ||
SELECT DISTINCT ON (route_id, effective_date) | ||
route_id, | ||
effective_date AS "date", | ||
effective_dow AS dow, | ||
count(*) FILTER (WHERE stop_sequence = 0) OVER (PARTITION BY route_id, effective_date) AS nr_of_trips, | ||
count(*) OVER (PARTITION BY route_id, effective_date) AS nr_of_arrs_deps | ||
FROM arrs_deps_with_effective_date | ||
) | ||
SELECT | ||
*, | ||
True AS is_effective | ||
FROM by_effective_date | ||
UNION | ||
SELECT | ||
*, | ||
False AS is_effective | ||
FROM by_svc_date; | ||
${materialized ? `\ | ||
CREATE INDEX ON "${opt.schema}".stats_by_route_id_and_date (route_id); | ||
CREATE INDEX ON "${opt.schema}".stats_by_route_id_and_date ("date"); | ||
CREATE INDEX ON "${opt.schema}".stats_by_route_id_and_date (route_id, "date", is_effective); | ||
CREATE INDEX ON "${opt.schema}".stats_by_route_id_and_date (route_id, dow, is_effective); | ||
` : ''} | ||
` | ||
} | ||
|
||
module.exports = { | ||
afterAll, | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,6 +12,7 @@ | |
"index.js", | ||
"scripts", | ||
"lib", | ||
"docs", | ||
"example.sh", | ||
"LICENSE", | ||
"LICENSE-PROSPERITY.md", | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters