From 639728699a4615044b4886c73d9d6c7b4c0c400e Mon Sep 17 00:00:00 2001
From: Jason Petersen
Date: Tue, 19 Nov 2024 08:03:51 -0700
Subject: [PATCH] Add docs: incr. views/remove dead feature mentions (#38)

The user-facing README and such did not include mention of
make_view_incremental, so I added that where it was missing. In
addition, the future feature section still contained entries we have
decided are no longer on the roadmap, so updated that, too.

Co-authored-by: Adam Hendel
---
 doc/guide.md      | 66 +++++++++++++++++++++++++++++++++++++++++++++++
 doc/timeseries.md | 10 +++++--
 2 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/doc/guide.md b/doc/guide.md
index 223f6ad..02d9ced 100644
--- a/doc/guide.md
+++ b/doc/guide.md
@@ -458,6 +458,72 @@ WITH rides AS (
 
 The 100th percentile is likely bad data, but the rest is interesting! Most rides are under ten minutes, but one in ten exceeds a half-hour. Putting this in to a `MATERIALIZED VIEW` and refreshing it weekly might make a nice source for an office dashboard or other visualization.
 
+## Using incremental views
+
+On that note, a `MATERIALIZED VIEW` refreshed periodically is a nice idea, but will only reflect reality as often as it is refreshed. `pg_timeseries` includes a better solution: incremental views. The technology underlying this feature is provided by [`pg_ivm`](https://github.com/sraoss/pg_ivm).
+
+First, create a normal `VIEW` for your query. We'll reuse the query about bicycle type from the previous section, but modified to remove the restriction on date (our view should show all ongoing data) and to remove `ORDER BY` (not supported by `pg_ivm`)…
+
+```sql
+CREATE VIEW monthly_type_counts AS
+  SELECT
+    date_trunc('month', started_at)::date AS month,
+    SUM(CASE WHEN rideable_type = 'classic_bike' THEN 1 ELSE 0 END) AS classic,
+    SUM(CASE WHEN rideable_type = 'docked_bike' THEN 1 ELSE 0 END) AS docked,
+    SUM(CASE WHEN rideable_type = 'electric_bike' THEN 1 ELSE 0 END) AS electric
+  FROM divvy_trips
+  GROUP BY month;
+```
+
+Next, we simply call the `make_view_incremental` function. When it returns, our plain `VIEW` will have been changed to point at an incremental view, which `pg_ivm` ensures stays up to date…
+
+```sql
+SELECT make_view_incremental('monthly_type_counts');
+```
+
+`make_view_incremental` rejects views whose queries combine a partitioned table with any other relation, so ensure your queries reference only a single partitioned table. This restriction may be lifted in a future release.
+
+Let's check out our incremental view…
+
+```sql
+SELECT * FROM monthly_type_counts ORDER BY month DESC LIMIT 1;
+```
+
+```
+┌────────────┬─────────┬────────┬──────────┐
+│   month    │ classic │ docked │ electric │
+├────────────┼─────────┼────────┼──────────┤
+│ 2024-02-01 │    1877 │      0 │     1099 │
+└────────────┴─────────┴────────┴──────────┘
+```
+
+What if we add a docked bicycle ride on Valentine's Day 2024?
+
+```sql
+INSERT INTO divvy_trips (
+    ride_id,
+    rideable_type,
+    started_at,
+    ended_at)
+  VALUES (
+    '1234567890',
+    'docked_bike',
+    '2024-02-14',
+    '2024-02-15');
+
+SELECT * FROM monthly_type_counts ORDER BY month DESC LIMIT 1;
+```
+
+```
+┌────────────┬─────────┬────────┬──────────┐
+│   month    │ classic │ docked │ electric │
+├────────────┼─────────┼────────┼──────────┤
+│ 2024-02-01 │    1877 │      1 │     1099 │
+└────────────┴─────────┴────────┴──────────┘
+```
+
+Nice! The view is updated immediately. _Note: incremental view updates are implemented using triggers, which will impact your write throughput._ If keeping a view up-to-date is worth this overhead, incremental views are a good solution, but be sure to use them wisely if you are concerned about throughput.
+
 ## Configuring retention
 
 Up until now we've been exploring older data, but in a timeseries system it's usually the case that new data is always being appended to a main table and older data either rolls off to long-term storage or is dropped entirely.
diff --git a/doc/timeseries.md b/doc/timeseries.md
index e12bbf4..ed2d2dd 100644
--- a/doc/timeseries.md
+++ b/doc/timeseries.md
@@ -31,6 +31,7 @@ CREATE EXTENSION timeseries CASCADE;
 NOTICE: installing required extension "columnar"
 NOTICE: installing required extension "pg_cron"
 NOTICE: installing required extension "pg_partman"
+NOTICE: installing required extension "pg_ivm"
 CREATE EXTENSION
 ```
 
@@ -110,6 +111,12 @@ The output of this query will differ from simply hitting the target table direct
 * The time column's values will be binned to the provided width
 * Extra rows will be added for periods with no data. They will include the time stamp for that bin and NULL in all other columns
 
+### `make_view_incremental`
+
+This function accepts a view and converts it into a materialized view which is kept up-to-date after every modification. This removes the need for users to pick between always up-to-date `VIEW`s and having to call `REFRESH` on `MATERIALIZED VIEW`s.
+
+The underlying functionality is provided by [`pg_ivm`](https://github.com/sraoss/pg_ivm); consult that project for more information.
+
 ## Requirements
 
 As seen in the Docker installation demonstration, the `pg_timeseries` extension depends on three other extensions:
@@ -117,6 +124,7 @@ As seen in the Docker installation demonstration, the `pg_timeseries` extension
 * [Hydra Columnar](https://github.com/hydradatabase/hydra)
 * [pg_cron](https://github.com/citusdata/pg_cron)
 * [pg_partman](https://github.com/pgpartman/pg_partman)
+* [pg_ivm](https://github.com/sraoss/pg_ivm)
 
 We recommend referring to documentation within these projects for more advanced use cases, or for a better understanding at how this extension works.
 
@@ -129,11 +137,9 @@ This list is somewhat ordered by likelihood of near-term delivery, or maybe diff
 - Assorted "analytic" functions frequently associated with time-series workloads
 - Periodic `REFRESH MATERIALIZED VIEW` — set schedules for background refresh of materialized views (useful for dashboarding, etc.)
 - Roll-off to `TABLESPACE` — as data ages, it will be moved into a specified table space
- Use of "tiered storage", i.e. moving older partitions to be stored in S3 rather than on-disk
 - Automatic `CLUSTER BY`/repack for non-live partitions
 - Migration tools — adapters for existing time-scale installations to ease migration and promote best practices in new table configuration
 - "Approximate" functions — maintain statistics within known error bounds without rescanning all data
 - Change partition width — modify partition width of existing table (for future data)
 - "Roll-up and roll-off" — as data ages, combine multiple rows into single summary rows
- Incremental view maintenance — define views which stay up-to-date with incoming data without the performance hit of a `REFRESH`
 - Repartition — modify partition width of existing table data