Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: REINDEX CONCURRENTLY of some _timescaledb_catalog indexes causes "ERROR: could not open relation with OID" in all sessions #7590

Open
pmenke-de opened this issue Jan 14, 2025 · 0 comments
Labels

Comments

@pmenke-de
Copy link

pmenke-de commented Jan 14, 2025

What type of bug is this?

Crash, Unexpected error

What subsystems and features are affected?

Other

What happened?

REINDEX CONCURRENTLY of at least the following indexes in _timescaledb_catalog causes all concurrent sessions on the same database to become unusable:

  • _timescaledb_catalog.chunk_schema_name_table_name_key
  • _timescaledb_catalog.hypertable_table_name_schema_name_key

Queries following the REINDEX will fail with ERROR: could not open relation with OID nnnnn; nnnnn being the OID of the index before reindexing. This affects queries within the same session, as well as queries in concurrent sessions. Sessions started after completion of the REINDEX don't seem to be affected (reconnecting "fixes" the issue).

My suspicion is, that the timescaledb extension keeps a cache of the index OIDs somewhere, which doesn't get invalidated on REINDEX CONCURRENTLY. The PostgreSQL docs mention explicitly that they invalidate such caches within PostgreSQL itself (step 4 in "Rebuilding Indexes Concurrently").

The issue is easily reproduced in a fresh database with timescaledb enabled:
(text moved to "How can we reproduce the bug?" section)

Observations:
By listing the OIDs from pg_class before the REINDEX, the affected indexes can be determined. In this test for me it always was _timescaledb_catalog.chunk_schema_name_table_name_key. Our actual database, which suffered from this issue, also reported the OID of _timescaledb_catalog.hypertable_table_name_schema_name_key in concurrent sessions.

Reindexing those identified indexes individually produces the same issue; REINDEX DATABASE CONCURRENTLY is not strictly required:
REINDEX INDEX CONCURRENTLY _timescaledb_catalog.chunk_schema_name_table_name_key; produces the same result.

REINDEX without CONCURRENTLY doesn't produce the issue, as the OID of the index doesn't change.

TimescaleDB version affected

2.17.2

PostgreSQL version used

16.6, 17.2

What operating system did you use?

Debian GNU/Linux 12 on x64

What installation method did you use?

Deb/Apt, Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

-- docker logs of the listed reproduction steps. 
2025-01-14 10:31:45.766 UTC [1] LOG:  starting PostgreSQL 16.6 on x86_64-pc-linux-musl, compiled by gcc (Alpine 13.2.1_git20240309) 13.2.1 20240309, 64-bit
2025-01-14 10:31:45.766 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2025-01-14 10:31:45.766 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2025-01-14 10:31:45.769 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2025-01-14 10:31:45.772 UTC [74] LOG:  database system was shut down at 2025-01-14 10:31:45 UTC
2025-01-14 10:31:45.777 UTC [1] LOG:  database system is ready to accept connections
2025-01-14 10:31:45.778 UTC [77] LOG:  TimescaleDB background worker launcher connected to shared catalogs
2025-01-14 10:32:08.604 UTC [93] ERROR:  could not open relation with OID 17289
2025-01-14 10:32:08.607 UTC [1] LOG:  background worker "Telemetry Reporter [1]" (PID 93) exited with exit code 1
2025-01-14 10:32:08.607 UTC [92] ERROR:  could not open relation with OID 17289
2025-01-14 10:32:08.610 UTC [1] LOG:  background worker "TimescaleDB Background Worker Scheduler" (PID 92) exited with exit code 1
2025-01-14 10:32:13.259 UTC [90] ERROR:  could not open relation with OID 17210
2025-01-14 10:32:13.259 UTC [90] STATEMENT:  SELECT n.nspname as "Schema",
	  c.relname as "Name",
	  CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 't' THEN 'TOAST table' WHEN 'f' THEN 'foreign table' WHEN 'p' THEN 'partitioned table' WHEN 'I' THEN 'partitioned index' END as "Type",
	  pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
	FROM pg_catalog.pg_class c
	     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
	     LEFT JOIN pg_catalog.pg_am am ON am.oid = c.relam
	WHERE c.relkind IN ('r','p','v','m','S','f','')
	      AND n.nspname <> 'pg_catalog'
	      AND n.nspname !~ '^pg_toast'
	      AND n.nspname <> 'information_schema'
	  AND pg_catalog.pg_table_is_visible(c.oid)
	ORDER BY 1,2;

How can we reproduce the bug?

# 1. start a fresh timescaledb server
docker run -d --name timescale-test -e POSTGRES_PASSWORD=password timescale/timescaledb:latest-pg16
# 2. run psql against the server
docker exec -ti --user postgres timescale-test psql
-- in psql:
-- 3. create a test database, and connect
CREATE DATABASE test;
\c test
-- execute at least one query (seems to be required to get the assumed cache populated)
-- \d performs as SELECT on pg_catalog to list tables
\d
-- perform REINDEX
REINDEX DATABASE CONCURRENTLY;
-- execute some query and observe the error
\d
ERROR:  could not open relation with OID 17210
@pmenke-de pmenke-de added the bug label Jan 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant