
Build a cache of Postgres system objects #185

Open
jmafc opened this issue Mar 2, 2018 · 7 comments

@jmafc
Member

jmafc commented Mar 2, 2018

Issue #175, the test_operclass.py tests that were marked xfail in the last phase of dealing with #176, and issue #183 (probably also #184) all point to deficiencies in Pyrseas' knowledge of PG system types and functions, such as integer and tsvector_concat. These objects reside in the pg_catalog schema, and Pyrseas has always avoided fetching objects from it.

I doubt that more than a few users CREATE OPERATOR CLASSes on something like the pg_catalog.integer type as shown in those xfail tests. Maybe a sophisticated extension like PostGIS does, but such extensions are the exception. OTOH, usage of tsvector functions is more likely, as seen in #175 and in some test_trigger.py tests that use tsvector_update_trigger. Issues #183 and #184 request display of attribute information that may go beyond the PG system types, since there may be user-defined record types (table row types).
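
For illustration, this is the sort of user-level DDL that makes an object depend on a pg_catalog function (the docs table and its columns are hypothetical):

    -- hypothetical table; the trigger references the built-in
    -- pg_catalog.tsvector_update_trigger function
    CREATE TABLE docs (id serial PRIMARY KEY, body text, tsv tsvector);
    CREATE TRIGGER docs_tsv_update BEFORE INSERT OR UPDATE ON docs
        FOR EACH ROW EXECUTE PROCEDURE
        tsvector_update_trigger(tsv, 'pg_catalog.english', body);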

The proposal is to retrieve very basic information, probably just oid and name, at initialization, so that it can be used in subsequent processing. The precedent for this is Post-Facto, but with one difference: whereas PF was connected live to both source and target databases, yamltodb only has access to the target's oids. If we limit the cache to system types, it is highly unlikely that the oid for tsvector_concat in one database will differ from that in the other.
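
Concretely, the initialization query could be something along these lines (a sketch only; which catalogs to scan, and whether to fetch anything beyond oid and name, are open questions):

    -- minimal catalog scan at initialization: oids and names only
    SELECT t.oid, t.typname AS name, 'type' AS kind
      FROM pg_type t JOIN pg_namespace n ON n.oid = t.typnamespace
     WHERE n.nspname = 'pg_catalog'
    UNION ALL
    SELECT p.oid, p.proname, 'function'
      FROM pg_proc p JOIN pg_namespace n ON n.oid = p.pronamespace
     WHERE n.nspname = 'pg_catalog';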

@jmafc
Member Author

jmafc commented Sep 24, 2022

@greggailly Regarding our discussion under #251.
If I were to attempt to implement this, I'd first grep through the files in pyrseas/dbobject for 'pg_catalog'. Most of the results will turn out to be like this:

    WHERE (nspname != 'pg_catalog' AND ...

The first step is to remove that first predicate from the WHERE clause, which will result in retrieving everything in the pg_catalog schema. Since you have a specific use case in the aggregate described in #251, you could just change the query starting at about line 99 of dbobject/dbtype.py. BTW, I've tried this and it doesn't fetch the row for 'anyarray'. The problem appears to be that the query includes AND typtype = 'b', but 'anyarray' is a pseudo-type, so that would have to be changed to AND typtype IN ('b', 'p').
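
As a before/after sketch (the actual query text in dbtype.py differs; this is just the shape of the change):

    -- before: excludes pg_catalog objects and accepts only base types
    WHERE nspname != 'pg_catalog' AND typtype = 'b'

    -- after: drop the schema exclusion and admit pseudo-types like 'anyarray'
    WHERE typtype IN ('b', 'p')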

Once this change is made, I would test it against an empty database to see what dbtoyaml produces as output. It may break elsewhere because I don't think we've catered for pseudo-types. 'anyarray' may also still be excluded because, aside from the SQL queries, there may be other places (in Python) where we exclude pg_catalog objects. Once those possible issues are resolved or worked around, I would test against your aggregate.

Hope this helps.

@benjamin-kirkbride
Contributor

@jmafc do we have any idea how big a lift this would be?

@jmafc
Member Author

jmafc commented Mar 23, 2023

"this"? If by "this" you mean what Daniele suggested (which you have essentially done) and fixing loose ends such as the specific problem of "misplaced" functions, I suspect it may not be too difficult to come up with some fix or general workaround. But (there's always one of those), there could be something more ragged. However, the problem of #165 may be easier to solve now than when the issue was first opened, because in the interim we have done away with treating schema public as a special case [although looking at CatDbConnection.connect, I don't recall why we do what we do there--it seems like we ought to not set search_path at all, and then presumably PG would have to return all qualified names].

@benjamin-kirkbride
Contributor

@jmafc I'm asking specifically about building a cache of all Postgres system objects; it sounded like there were a lot of gotchas there, and this issue has been open a long time. Sorry if I'm being dense and not understanding.

@jmafc
Member Author

jmafc commented Mar 23, 2023

No, I'm the one who needs to apologize. I thought your comment was under issue #244 so I thought "this" meant fixing that issue.

This, i.e., the cache, is probably like lifting 100 kg or more. Looking at one of my medium-sized databases where \d shows about 50 rows, pg_class has a little over 100 rows in the public and pg_toast schemas, which I believe is primarily due to indexes and varchar/text columns, respectively, but pg_catalog has over 250 rows. In pg_attribute, there are about 600 rows each in public and pg_toast, for an average of six attributes per relation, but there are nearly 1900 rows in pg_catalog. And reading in the pg_catalog rows probably isn't where the big lift is (although it will affect performance); the lift is then having to decide (a) in dbtoyaml, whether to include a particular object in the output, and (b) in yamltodb, how to reconcile these normally "hidden" objects.
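
For anyone wanting to reproduce those counts on their own database, queries along these lines will do:

    -- rows in pg_class, per schema
    SELECT n.nspname, count(*)
      FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace
     GROUP BY n.nspname ORDER BY count(*) DESC;

    -- rows in pg_attribute, per schema of the owning relation
    SELECT n.nspname, count(*)
      FROM pg_attribute a
      JOIN pg_class c ON c.oid = a.attrelid
      JOIN pg_namespace n ON n.oid = c.relnamespace
     GROUP BY n.nspname ORDER BY count(*) DESC;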

@benjamin-kirkbride
Contributor

Is this something you have a clear vision for, or is the above description about as far into planning/design as it's gotten?

@jmafc
Member Author

jmafc commented Mar 24, 2023

I may have had a "vision" five years ago, but it seems rather blurry right now, not only due to the passage of time, but also due to other changes and the fact that I haven't really been thinking a lot about Pyrseas in the meantime. As far as I can recall, the idea was to read in all the catalogs and build a cache, either in memory or in a YAML file, so that, for example, when we found a pg_catalog.tsvector in a user object, we could link to it and output it or use it as a dependency, as we do with other objects.
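
To make the YAML-file variant concrete, the cache might look something like this (the shape is entirely hypothetical; 3614 is tsvector's actual, stable oid in pg_type):

    pg_catalog:
      types:
        tsvector:
          oid: 3614
        # ...more types, and similarly functions keyed by signature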

At this point, in fact, if I were to try to implement something, I'd start by building a test case for #175 and use TDD to implement the minimal stuff necessary to get the test to pass. Then, based on what I may have learned from that exercise, I would try to generalize.
