
Performance best practice suggestions #32

Open
dino2gnt opened this issue Sep 20, 2024 · 2 comments
@dino2gnt
Hi,

Let me preface this by stating that I am not a PostgreSQL expert, so if there's something obvious I'm not understanding I apologize.

I'm writing a plugin that leverages pg_timeseries in an open source enterprise network monitoring platform. I'm using a simple time series schema:

                                   Partitioned table "public.pgtimeseries_time_series"
 Column |           Type           | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+--------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 key    | text                     |           | not null |         | extended |             |              |
 time   | timestamp with time zone |           | not null |         | plain    |             |              |
 value  | double precision         |           |          |         | plain    |             |              |
Partition key: RANGE ("time")
Indexes:
    "pgtimeseries_time_series_key_idx" btree (key)
    "pgtimeseries_time_series_time_idx" btree ("time" DESC)

There are currently several hundred unique keys, but this could grow to tens of thousands for busy instances, making a table per datasource untenable.

# select count(distinct key) from pgtimeseries_time_series;
 count
-------
   637
(1 row)

Samples are collected every five minutes, but the interval is configurable per data source, so samples may arrive more or less frequently.

When using date_bin_table for time bucketing, I'm getting query plans showing a lot of temp IO, as roughly a million rows are selected and then filtered during the function scan, resulting in several hundred MB of read and write IO per query. I assume this is caused by the date_bin_table function selecting everything between the start time and end time, then filtering out rows that don't match the key?

=# explain (analyze, buffers) SELECT time AS step, COALESCE(avg(value), 'NaN') as aggregation FROM date_bin_table(NULL::pgtimeseries_time_series, '361 Seconds', '[2024-09-13 08:37:24.305, 2024-09-20 08:37:24.305]') where (key='name=G1YngCollCnt_resourceId=snmp/1/jvm/java_lang_type_GarbageCollector_name_G1_Young_Generation' OR key is null) GROUP BY step ORDER BY step;
                                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=12.92..13.12 rows=10 width=16) (actual time=1400.602..1401.472 rows=1676 loops=1)
   Group Key: "time"
   Buffers: shared hit=116131 read=5947 written=5, temp read=53560 written=53607
   ->  Sort  (cost=12.92..12.94 rows=10 width=16) (actual time=1400.590..1400.746 rows=2016 loops=1)
         Sort Key: "time"
         Sort Method: quicksort  Memory: 159kB
         Buffers: shared hit=116131 read=5947 written=5, temp read=53560 written=53607
         ->  Function Scan on date_bin_table  (cost=0.25..12.75 rows=10 width=16) (actual time=1243.910..1399.759 rows=2016 loops=1)
               Filter: ((key = 'name=G1YngCollCnt_resourceId=snmp/1/jvm/java_lang_type_GarbageCollector_name_G1_Young_Generation'::text) OR (key IS NULL))
               Rows Removed by Filter: 994966
               Buffers: shared hit=116128 read=5947 written=5, temp read=53560 written=53607
 Planning:
   Buffers: shared hit=19 read=2
 Planning Time: 0.279 ms
 Execution Time: 1416.850 ms
(15 rows)

See also: https://explain.depesz.com/s/G4gq

Granted, this development instance is quite under-resourced and the wall-clock time here reflects that, but as the total number of keys grows, the absolute number of rows between a given start and end will grow too, creating even more IO as more rows are selected and discarded.

Is there something in my schema design or query that violates the expected best practice for pg_timeseries that I should change to mitigate this circumstance? Are there indexes that date_bin_table expects or could take advantage of that I am missing?
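
Would a composite index help here, e.g. something like the following, or does date_bin_table never see the caller's key filter at all?

-- hypothetical composite index: serves a key equality plus time-range
-- predicate with a single index scan on direct queries against the table
CREATE INDEX pgtimeseries_time_series_key_time_idx
    ON pgtimeseries_time_series (key, "time" DESC);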

@jasonmp85 (Contributor)

Hey there, @dino2gnt

The query performed by date_bin_table does include the date bounds, so it's not "filtering out" any of your data in memory (well, at least not PL/pgSQL memory); the bounds are applied as part of a SELECT.

That said, PL/pgSQL is not capable of streaming rows to a caller, so the result set built here does reside in memory before being scanned by the parent query.
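
Conceptually (a simplification, not the literal implementation), the function materializes something like this, and your key predicate is only applied to its output afterward:

-- simplified picture of what date_bin_table builds: every row in the
-- requested range, with "time" snapped to the stride; no key predicate
SELECT date_bin('361 seconds'::interval, "time",
                '2024-09-13 08:37:24.305'::timestamptz) AS "time",
       key, value
  FROM pgtimeseries_time_series
 WHERE "time" BETWEEN '2024-09-13 08:37:24.305'
                  AND '2024-09-20 08:37:24.305';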

date_bin_table, whose purpose is aligning data to a stride, is better suited for use as a late step in preparing data for something like an aggregation system or charting; I don't know how well it will handle millions of rows in a scenario such as this.

That all said, your feedback and use case are important… PL/pgSQL is always just a first step, and it would be possible to reimplement this function as a custom scan node in C, which would have the side effect of cleaning up the usage syntax and would also make a more stream/scan-like approach possible. I'll consider this a feature request.

What is it you need to accomplish with this query? Maybe there's some workaround I can help you with in the meantime?

@dino2gnt (Author)

> What is it you need to accomplish with this query? Maybe there's some workaround I can help you with in the meantime?

In the application I am developing this plugin for, a time series request arrives at the plugin containing the start and end times, the key, the aggregation (if any), the metric type, and the stride (or step, in seconds, calculated dynamically by something higher up in the stack). This lent itself to jamming everything into a single query that used date_bin_table to return an aggregated result in the correct time buckets for the request, which is what the application expects a time series plugin to return. It kept the SQL and the plugin code simple.

I spent some time this afternoon working on this, and came up with this CTE
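
The gist is to push the key and time-range predicates into the base-table scan and do the binning with date_bin(); roughly this shape (illustrative, reusing the values from the query above):

-- filter by key and time range directly in the scan, snap each row's
-- time to the stride with date_bin(), then aggregate per bucket
WITH binned AS (
    SELECT date_bin('361 seconds'::interval,
                    "time",
                    '2024-09-13 08:37:24.305'::timestamptz) AS step,
           value
      FROM pgtimeseries_time_series
     WHERE key = 'name=G1YngCollCnt_resourceId=snmp/1/jvm/java_lang_type_GarbageCollector_name_G1_Young_Generation'
       AND "time" >= '2024-09-13 08:37:24.305'
       AND "time" <= '2024-09-20 08:37:24.305'
)
SELECT step,
       COALESCE(avg(value), 'NaN') AS aggregation
  FROM binned
 GROUP BY step
 ORDER BY step;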

In my testing environment, this returns numerically equal results to date_bin_table while being significantly faster in wall-clock time and generating much less temporary IO... but it is mostly the result of me reading a bunch of Stack Overflow posts, the PostgreSQL docs, and a lot of trial and error, so it can probably still be optimized by someone who actually knows what they're doing :D
