Add docs: incr. views/remove dead feature mentions (#38)
The user-facing README and such did not include mention of
make_view_incremental, so I added that where it was missing. In addition,
the future feature section still contained entries we have decided
are no longer on the roadmap, so updated that, too.

Co-authored-by: Adam Hendel <[email protected]>
jasonmp85 and ChuckHend authored Nov 19, 2024
1 parent c142d94 commit 6397286
Showing 2 changed files with 74 additions and 2 deletions.
66 changes: 66 additions & 0 deletions doc/guide.md
@@ -458,6 +458,72 @@ WITH rides AS (

The 100th percentile is likely bad data, but the rest is interesting! Most rides are under ten minutes, but one in ten exceeds a half-hour. Putting this into a `MATERIALIZED VIEW` and refreshing it weekly might make a nice source for an office dashboard or other visualization.

## Using incremental views

On that note, a `MATERIALIZED VIEW` refreshed periodically is a nice idea, but will only reflect reality as often as it is refreshed. `pg_timeseries` includes a better solution: incremental views. The technology underlying this feature is provided by [`pg_ivm`](https://github.com/sraoss/pg_ivm).

First, create a normal `VIEW` for your query. We'll reuse the query about bicycle type from the previous section, but modified to remove the restriction on date (our view should show all ongoing data) and to remove `ORDER BY` (not supported by `pg_ivm`)…

```sql
CREATE VIEW monthly_type_counts AS
  SELECT
    date_trunc('month', started_at)::date AS month,
    SUM(CASE WHEN rideable_type = 'classic_bike' THEN 1 ELSE 0 END) AS classic,
    SUM(CASE WHEN rideable_type = 'docked_bike' THEN 1 ELSE 0 END) AS docked,
    SUM(CASE WHEN rideable_type = 'electric_bike' THEN 1 ELSE 0 END) AS electric
  FROM divvy_trips
  GROUP BY month;
```

Next, we simply call the `make_view_incremental` function. When it returns, our plain `VIEW` will have been changed to point at an incremental view, which `pg_ivm` ensures stays up to date…

```sql
SELECT make_view_incremental('monthly_type_counts');
```

`make_view_incremental` rejects views whose queries combine a partitioned table with any other relation, so ensure your queries reference only a single partitioned table. This restriction may be lifted in a future release.
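To make the restriction concrete, here is a sketch of a view that would fall foul of it. The `bike_labels` lookup table and the `monthly_label_counts` view are hypothetical names invented for this illustration, and the example assumes `rideable_type` is a `text` column; the exact error raised is not shown because it may vary between releases.

```sql
-- Hypothetical lookup table; not part of the guide's dataset.
CREATE TABLE bike_labels (
  rideable_type text PRIMARY KEY,
  label         text NOT NULL
);

-- This view joins the partitioned divvy_trips table to a second relation.
CREATE VIEW monthly_label_counts AS
  SELECT
    date_trunc('month', t.started_at)::date AS month,
    l.label,
    COUNT(*) AS rides
  FROM divvy_trips t
  JOIN bike_labels l ON l.rideable_type = t.rideable_type
  GROUP BY month, l.label;

-- Per the restriction described above, this call is expected to be
-- rejected, because the view combines a partitioned table with another
-- relation.
SELECT make_view_incremental('monthly_label_counts');
```

The plain `VIEW` itself is still valid and can be queried as usual; only the conversion to an incremental view is affected.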

Let's check out our incremental view…

```sql
SELECT * FROM monthly_type_counts ORDER BY month DESC LIMIT 1;
```

```
┌────────────┬─────────┬────────┬──────────┐
│   month    │ classic │ docked │ electric │
├────────────┼─────────┼────────┼──────────┤
│ 2024-02-01 │    1877 │      0 │     1099 │
└────────────┴─────────┴────────┴──────────┘
```

What if we add a docked bicycle ride on Valentine's Day 2024?

```sql
INSERT INTO divvy_trips (
  ride_id,
  rideable_type,
  started_at,
  ended_at)
VALUES (
  '1234567890',
  'docked_bike',
  '2024-02-14',
  '2024-02-15');

SELECT * FROM monthly_type_counts ORDER BY month DESC LIMIT 1;
```

```
┌────────────┬─────────┬────────┬──────────┐
│   month    │ classic │ docked │ electric │
├────────────┼─────────┼────────┼──────────┤
│ 2024-02-01 │    1877 │      1 │     1099 │
└────────────┴─────────┴────────┴──────────┘
```

Nice! The view is updated immediately. _Note: incremental view updates are implemented using triggers, which will impact your write throughput._ If keeping a view up to date is worth this overhead, incremental views are a good solution, but use them sparingly if write throughput is a concern.
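One way to get a feel for that overhead is to run a sample write under `EXPLAIN (ANALYZE)`, which in PostgreSQL executes the statement and reports time spent in any triggers it fires. The snippet below is only a sketch of such a probe, not something `pg_timeseries` provides: the `ride_id` value is a made-up placeholder, and exactly which triggers are listed will depend on the versions you have installed.

```sql
-- EXPLAIN (ANALYZE) really performs the INSERT, so wrap it in a
-- transaction and roll back to keep the probe row out of the table.
BEGIN;

EXPLAIN (ANALYZE, COSTS OFF)
INSERT INTO divvy_trips (
  ride_id,
  rideable_type,
  started_at,
  ended_at)
VALUES (
  'trigger-cost-probe',  -- placeholder id, chosen for this example only
  'docked_bike',
  '2024-02-14',
  '2024-02-15');

ROLLBACK;
```

Comparing the trigger timings in the output against the total execution time gives a rough idea of how much of each write is spent keeping the view current. A single-row probe is only indicative; bulk loads are worth measuring separately.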

## Configuring retention

Up until now we've been exploring older data, but in a timeseries system it's usually the case that new data is always being appended to a main table and older data either rolls off to long-term storage or is dropped entirely.
10 changes: 8 additions & 2 deletions doc/timeseries.md
@@ -31,6 +31,7 @@ CREATE EXTENSION timeseries CASCADE;
NOTICE: installing required extension "columnar"
NOTICE: installing required extension "pg_cron"
NOTICE: installing required extension "pg_partman"
NOTICE: installing required extension "pg_ivm"
CREATE EXTENSION
```

@@ -110,13 +111,20 @@ The output of this query will differ from simply hitting the target table direct
* The time column's values will be binned to the provided width
* Extra rows will be added for periods with no data. They will include the time stamp for that bin and NULL in all other columns

### `make_view_incremental`

This function accepts a view and converts it into a materialized view which is kept up-to-date after every modification. This removes the need for users to pick between always up-to-date `VIEW`s and having to call `REFRESH` on `MATERIALIZED VIEW`s.

The underlying functionality is provided by [`pg_ivm`](https://github.com/sraoss/pg_ivm); consult that project for more information.
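As a rough illustration of the trade-off, the sketch below contrasts a manually refreshed materialized view with an incremental one. It assumes the `divvy_trips` table used in the guide; the names `monthly_rides_mv` and `monthly_rides` are hypothetical and exist only for this example.

```sql
-- Manual approach: the materialized view is a snapshot of the query
-- result at the time it was created or last refreshed.
CREATE MATERIALIZED VIEW monthly_rides_mv AS
  SELECT
    date_trunc('month', started_at)::date AS month,
    COUNT(*) AS rides
  FROM divvy_trips
  GROUP BY month;

-- New rows in divvy_trips are not reflected until this is run again.
REFRESH MATERIALIZED VIEW monthly_rides_mv;

-- Incremental approach: define an ordinary view, then convert it.
CREATE VIEW monthly_rides AS
  SELECT
    date_trunc('month', started_at)::date AS month,
    COUNT(*) AS rides
  FROM divvy_trips
  GROUP BY month;

-- After this call, reads from monthly_rides reflect each write to
-- divvy_trips without any REFRESH.
SELECT make_view_incremental('monthly_rides');
```

Both views are queried the same way; the difference is whether freshness is paid for at refresh time or at write time.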

## Requirements

As seen in the Docker installation demonstration, the `pg_timeseries` extension depends on four other extensions:

* [Hydra Columnar](https://github.com/hydradatabase/hydra)
* [pg_cron](https://github.com/citusdata/pg_cron)
* [pg_partman](https://github.com/pgpartman/pg_partman)
* [pg_ivm](https://github.com/sraoss/pg_ivm)

We recommend referring to documentation within these projects for more advanced use cases, or for a better understanding of how this extension works.

@@ -129,11 +137,9 @@ This list is somewhat ordered by likelihood of near-term delivery, or maybe diff
- Assorted "analytic" functions frequently associated with time-series workloads
- Periodic `REFRESH MATERIALIZED VIEW` — set schedules for background refresh of materialized views (useful for dashboarding, etc.)
- Roll-off to `TABLESPACE` — as data ages, it will be moved into a specified table space
- Use of "tiered storage", i.e. moving older partitions to be stored in S3 rather than on-disk
- Automatic `CLUSTER BY`/repack for non-live partitions
- Migration tools — adapters for existing time-scale installations to ease migration and promote best practices in new table configuration
- "Approximate" functions — maintain statistics within known error bounds without rescanning all data
- Change partition width — modify partition width of existing table (for future data)
- "Roll-up and roll-off" — as data ages, combine multiple rows into single summary rows
- Incremental view maintenance — define views which stay up-to-date with incoming data without the performance hit of a `REFRESH`
- Repartition — modify partition width of existing table data
