Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crash in oracle_fdw 2.1.0 #279

Closed
Jaime2ndQuadrant opened this issue Oct 15, 2018 · 25 comments
Closed

crash in oracle_fdw 2.1.0 #279

Jaime2ndQuadrant opened this issue Oct 15, 2018 · 25 comments
Labels

Comments

@Jaime2ndQuadrant
Copy link

I have been running sqlsmith on postgres with oracle_fdw (2.1.0) and found a crash. I reduced the example as much as i could and i was able to reproduce it from 9.6 upto master (didn't try with other versions).
To reproduce it:

  1. Install Oracle XE and unlock schema HR
  2. compile and install postgres and oracle_fdw
  3. create extension oracle_fdw;
  4. CREATE SERVER oradb FOREIGN DATA WRAPPER oracle_fdw OPTIONS (dbserver '');
  5. GRANT USAGE ON FOREIGN SERVER oradb TO postgres;
  6. CREATE USER MAPPING FOR postgres SERVER oradb OPTIONS (user 'HR', password 'passwd');
  7. IMPORT FOREIGN SCHEMA "HR" FROM SERVER oradb INTO public;
  8. execute query in query_crash.txt (attached)

I also attached the stack trace of the crash. Let me know if i can help debugging this in any other way.

gdb.txt

query_crash.txt

@laurenz laurenz added the bug label Oct 15, 2018
@laurenz
Copy link
Owner

laurenz commented Oct 15, 2018

Thanks for the test and the report!

Since I don't have the HR schema around, would you mind sending me the CREATE TABLE statements?

@Jaime2ndQuadrant
Copy link
Author

i'm not very knowledgable about Oracle so i'm not sure how i get that, what i did was to export the whole schema with /u01/app/oracle/product/11.2.0/xe/bin/exp
It's a small schema (it comes pre-installed with express edition)
Just unzip it and restore it with imp utility.
User HR has password 'oraclexe'

let me know if this is what you need or how to get a sql script with CREATE objects and rows of those objects

expdat.dmp.zip

@laurenz
Copy link
Owner

laurenz commented Oct 16, 2018

I have been trying to reduce the query as much as possible and have been able to come up with a similar query using our regression test table that also crashes:

SELECT 1
FROM typetest1 a
   LEFT JOIN typetest1 b
   ON (a.id IS NOT NULL)
WHERE (a.c = a.vc)
    = (b.id IS NOT NULL);

The error happens in a.c = a.vc, but so far I have not been able to figure out what is going on.

@yamatattsu, can you see if you can figure out something?
The query does not crash if I replace the outer join with an inner join.

@yamatattsu
Copy link
Collaborator

Hi Laurenz!

@yamatattsu, can you see if you can figure out something?
The query does not crash if I replace the outer join with an inner join.

$ psql
psql (10.4)
Type "help" for help.

# explain (analyze, verbose) SELECT 1
FROM typetest1 a
   LEFT JOIN typetest1 b
   ON (a.id IS NOT NULL)
WHERE (a.c = a.vc)
    = (b.id IS NOT NULL);

                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------- 
Foreign Scan  (cost=10000.00..20000.00 rows=1000 width=4) (actual time=0.778..0.778 rows=0 loops=1)
   Output: 1
   Filter: ((a.c = (a.vc)::bpchar) = (b.id IS NOT NULL))
   Rows Removed by Filter: 9
   Oracle query: SELECT /*a573aba54b7dccf91e19e6dff41baef2*/ '1' FROM ("TYPETEST1" r1 LEFT JOIN "TYPETEST1" r2 ON (r1."ID" IS NOT NULL))
   Oracle plan: SELECT STATEMENT
   Oracle plan:   NESTED LOOPS OUTER
   Oracle plan:     INDEX FULL SCAN TYPETEST1_PKEY
   Oracle plan:     VIEW
   Oracle plan:       INDEX FULL SCAN TYPETEST1_PKEY
 Planning time: 2.182 ms
 Execution time: 0.827 ms
(12 rows)

I tried the query using PG10.4 and oracle_fdw 2.1.0 but I couldn't get the error.
What PG version did you use?
I'll investigate the bug after PGConf.Eu. Is it okay?

Regards,
Tatsuro Yamada

@Jaime2ndQuadrant
Copy link
Author

Jaime2ndQuadrant commented Oct 17, 2018

Hi @yamatattsu,

I noted that to replicate the failure using typetest1 table you need to add at least one row to it, the one i added was:
insert into typetest1 values (1, 'no', 'si', '0', '1', null, null, null, null, null, null, null, null, null, null, null, null, null);

Also i reproduced the original problem on all versions starting in 9.6 (latest minor version in each case) upto master. For example, in v10 i used 10.5.

I'll investigate the bug after PGConf.Eu. Is it okay?

Only if you send us a video of your talk ;) enjoy pgconf.eu

@yamatattsu
Copy link
Collaborator

Hi, Jaime and Laurenz,

insert into typetest1 values (1, 'no', 'si', '0', '1', null, null, null, null, null, null, null, null, null, null, null, null, null);

Thank you. I reproduced it using PG10.5 but I couldn't reproduce it on PG10.4.
It means FDW APIs or helper functions were changed, possibly?!

Only if you send us a video of your talk ;) enjoy pgconf.eu

Oh, I see.
I'm not sure whether taking video or not. However, I'll share at least my presentation material to you.
Thanks! :)

Regards,
Tatsuro Yamada

@laurenz
Copy link
Owner

laurenz commented Oct 17, 2018

Thanks for the new information.

I'll see if I can make any progress with that.

@laurenz
Copy link
Owner

laurenz commented Oct 17, 2018

I just built 10.4 and 10.5 from source, and the query crashed both of them.

@laurenz
Copy link
Owner

laurenz commented Oct 17, 2018

Ah, I have a lead:

                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=10000.00..20000.00 rows=1000 width=4)
   Filter: ((c = (vc)::bpchar) = (id IS NULL))
   Oracle query: SELECT /*a573aba54b7dccf91e19e6dff41baef2*/ '1' FROM ("TYPETEST1" r1 LEFT JOIN "TYPETEST1" r2 ON (r1."ID" IS NOT NULL))
(3 rows)

The problem is clearly that the Oracle query has SELECT '1' rather than SELECT r2."ID", r1."C", r1."VC". The columns would have to be fetched for the filter to work.

@Jaime2ndQuadrant
Copy link
Author

Jaime2ndQuadrant commented Oct 17, 2018

The problem is clearly that the Oracle query has SELECT '1' rather than SELECT r2."ID", r1."C", r1."VC". The columns would have to be fetched for the filter to work.

right, you need at least a.c and a.vc for the query to work:

SELECT a.c, a.vc
FROM typetest1 a
   LEFT JOIN typetest1 b
   ON (a.id IS NOT NULL)
WHERE (a.c = a.vc)
    = (b.id IS NOT NULL);

btw, in a maybe non-related failure i got an ERROR when trying to explain analyze that last query:

postgres=# explain (analyze, verbose) SELECT a.c, a.vc      
FROM typetest1 a
   LEFT JOIN typetest1 b
   ON (a.id IS NOT NULL)
WHERE (a.c = a.vc)
    = (b.id IS NOT NULL);
ERROR:  no SELECT privilege on V$SQL in the remote database
DETAIL:  ORA-00942: table or view does not exist

@laurenz
Copy link
Owner

laurenz commented Oct 18, 2018

Jaime, that last error message is covered by the README:

For EXPLAIN VERBOSE the user will also need SELECT privileges on V$SQL and
V$SQL_PLAN.

@laurenz
Copy link
Owner

laurenz commented Oct 18, 2018

The problem is in oracleGetForeignPlan and build_join_oratable.

In oracleGetForeignPlan, we build fdw_scan_tlist, which seems to contain exactly the three TargetEntrys we need.

But rather than passing fdw_scan_tlist to build_join_oratable, we pass foreignrel->reltarget which does not contain them.

@yamatattsu, this is a bit above my head. I'll let you have a go at it (whenever you find the time).

Maybe I broke something in 1e53ada...

@Jaime2ndQuadrant
Copy link
Author

is there any advance on this? i can try to do something but this is not at my level i will be like a monkey trying to write a coherent phrase on a keyboard ;)

@yamatattsu
Copy link
Collaborator

Hi All,

Happy new year! :)

I restart to investigate the problem. I've not checked oracleGetForeignPlan and build_join_oratable yet.
However, I found other errors (?) regarded to Where clauses like this:

[First time]

# SELECT a.c, a.vc
FROM typetest1 a
... 
 c | vc
---+----
(0 rows)

[Second time]

[local] postgres@postgres:9910=# SELECT a.c, a.vc
FROM typetest1 a
...
     c      | vc
------------+----
 no         | 0
(1 row)

[Execution plan of the first time]

 Foreign Scan  (cost=10000.00..20000.00 rows=1000 width=82) (actual time=0.804..0.806 rows=0 loops=1)
   Output: a.c, a.vc
   Filter: ((a.c = (a.vc)::bpchar) = (b.id IS NOT NULL))
   Rows Removed by Filter: 1
   Oracle query: SELECT /*11dc6f4286462b19368acfc605e397d1*/ r1."C", r1."VC" FROM ("TYPETEST1" r1 LEFT JOIN "TYPETEST1" r2 ON (r1."ID" IS NOT NULL))
...

[Execution plan of the second time]

 Foreign Scan  (cost=10000.00..20000.00 rows=1000 width=82) (actual time=0.719..0.737 rows=1 loops=1)
   Output: a.c, a.vc
   Filter: ((a.c = (a.vc)::bpchar) = (b.id IS NOT NULL))
   Oracle query: SELECT /*11dc6f4286462b19368acfc605e397d1*/ r1."C", r1."VC" FROM ("TYPETEST1" r1 LEFT JOIN "TYPETEST1" r2 ON (r1."ID" IS NOT NULL))
   ...

So, I guess following code is wrong in createQuery() because fdwState->where_clause didn't have all fillter conditions based on Outerrel and Innerrel.
In other words, if there are some fillter conditions, it should be add them to remote query.

createQuery()
/*
 * For base relations and OUTER joins, add a WHERE clause if there is one.
 *
 * For an INNER join, all conditions that are pushed down get added
 * to fdwState->joinclauses and have already been added above,
 * so there is no extra WHERE clause.
 */
if (fdwState->where_clause)
    appendStringInfo(&query, "%s", fdwState->where_clause);

@laurenz, What do you think about that?

BTW,
to @Jaime2ndQuadrant,

I'll share at least my presentation material to you.

I uploaded it on PGConf.Eu web site last year. Sorry for the delayed response.

@yamatattsu
Copy link
Collaborator

Hi All,

Oops, above investigation is wrong, probably.
Because I forgot to create fdwState->where_clause for OUTER join in foreign_join_ok().

I'll check oracleGetForeignPlan and build_join_oratable tomorrow.

Thanks,

@yamatattsu
Copy link
Collaborator

I wonder why there is no error after Explain command execution.
Following [A] is crashed but [B] is not crashed. Hmm...

[A]

$ psql
# SELECT 1
...
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
@:!> \q

[B]

$ psql
# explain SELECT 1
...
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=10000.00..20000.00 rows=1000 width=4)
   Filter: ((c = (vc)::bpchar) = (id IS NOT NULL))
   Oracle query: SELECT /*a573aba54b7dccf91e19e6dff41baef2*/ '1' FROM ("TYPETEST1" r1 LEFT JOIN "TYPETEST1" r2 ON (r1."ID" IS NOT NULL))
(3 rows)

# SELECT 1
...
 ?column?
----------
(0 rows)

@yamatattsu
Copy link
Collaborator

Hi,

I checked oraTable and fdw_scan_tlist by using debugger. I share my results to you. :)

The results are:

  • oraTable was not made from foreignrel->reltarget because reltarget has no exprs.
    See below [1] and [3].
  • fdw_scan_tlist was maybe Okay because there were three elements in the list such as
    (varno, varattno) = (1,3), (1,5) and (2,1). See below [2].

Therefore,
We have to check why exprs in foreignrel->reltarget are empty when the query executed.

[Query]

SELECT 1
FROM typetest1 a
   LEFT JOIN typetest1 b
   ON (a.id IS NOT NULL)
WHERE (a.c = a.vc)
    = (b.id IS NOT NULL);

[Breakpoints]

b oracleGetForeignPlan
b oracle_fdw.c:1189
b oracle_fdw.c:3293

[Breakpoint 1]

Breakpoint 1, oracleGetForeignPlan (root=0x1e98860, foreignrel=0x1eb1978, foreigntableid=0, best_path=0x1f8ddb0, tlist=0x1f8f5e8, scan_clauses=0x0,
    outer_plan=0x0) at oracle_fdw.c:1099
1099            struct OracleFdwState *fdwState = (struct OracleFdwState *)foreignrel->fdw_private;
(gdb) p *foreignrel->reltarget
$1 = {type = T_PathTarget, exprs = 0x0, sortgrouprefs = 0x0, cost = {startup = 0, per_tuple = 0}, width = 0}

[Breakpoint 2]

Breakpoint 2, oracleGetForeignPlan (root=0x1e98860, foreignrel=0x1eb1978, foreigntableid=0, best_path=0x1f8ddb0, tlist=0x1f8f5e8, scan_clauses=0x0,
    outer_plan=0x0) at oracle_fdw.c:1189
1189                    if (outer_plan)
(gdb) p *fdw_scan_tlist
$2 = {type = T_List, length = 3, head = 0x1f8f780, tail = 0x1f8f968}

(gdb) p *(Var *)((TargetEntry *)fdw_scan_tlist->head->data->ptr_value)->expr
$3 = {xpr = {type = T_Var}, varno = 1, varattno = 3, vartype = 1042, vartypmod = 14, varcollid = 100, varlevelsup = 0, varnoold = 1, varoattno = 3,
  location = 84}

(gdb) p *(Var *)((TargetEntry *)fdw_scan_tlist->head->next->data->ptr_value)->expr
$4 = {xpr = {type = T_Var}, varno = 1, varattno = 5, vartype = 1043, vartypmod = 14, varcollid = 100, varlevelsup = 0, varnoold = 1, varoattno = 5,
  location = 90}

(gdb) p *(Var *)((TargetEntry *)fdw_scan_tlist->head->next->next->data->ptr_value)->expr
$5 = {xpr = {type = T_Var}, varno = 2, varattno = 1, vartype = 23, vartypmod = -1, varcollid = 0, varlevelsup = 0, varnoold = 2, varoattno = 1,
  location = 103}

[Breakpoint 3]

Breakpoint 3, build_join_oratable (fdwState=0x1eb1648, reltarget=0x1f8db58) at oracle_fdw.c:3293
3293            oraTable->npgcols = oraTable->ncols;
(gdb) p *reltarget
$6 = {type = T_PathTarget, exprs = 0x0, sortgrouprefs = 0x0, cost = {startup = 0, per_tuple = 0}, width = 0}
(gdb) p *oraTable
$7 = {name = 0x1f8f9c8 "", pgname = 0x1f8f9e8 "", ncols = 0, npgcols = 0, cols = 0x1f8fa08}
(gdb) p *oraTable->cols
$8 = (struct oraColumn *) 0x0

yamatattsu added a commit that referenced this issue Jan 9, 2019
build_join_oratable() couldn't create suitable oraTable when foreignrel->reltarget is empty.
Therefore, this fix replaces foreignrel->reltarget with fdw_scan_list as a parameter.

Reported by Jaime2ndQuadrant in #279.
@yamatattsu
Copy link
Collaborator

Hi Laurenz and Jaime2ndQuadrant,

I realized foreignrel->reltarget which is given form PostgreSQL core is empty sometimes.
However, build_join_oratable() expects it is not empty and create oraTable. So, not suitable oraTable causes the SIGSEGV, I suppose.

To fix, I replaced the parameter with fdw_scan_tlist because fdw_scan_tlist contains all colmuns in the query. And It seems that the fix works fine since creating suitable oraTable by using fdw_scan_tlist.
Also regression test oracle_join.sql is fine. :)

I commited the patch, if possible, could you check and try the patch? 839b125

Regards,
Tatsuro Yamada

@Jaime2ndQuadrant
Copy link
Author

Hi @yamatattsu

I still haven't tested your patch, but it makes sense... actually i got the same conclusion just wasn't able to code it...

will test the patch later today, thanks

@laurenz
Copy link
Owner

laurenz commented Jan 10, 2019

Thank you!
That looks exactly what I had in mind.
I didn't realize there was that little to change in build_join_oratable!

Could you add my crasher query from above to the regression tests?

@yamatattsu
Copy link
Collaborator

yamatattsu commented Jan 10, 2019 via email

@yamatattsu
Copy link
Collaborator

Hi Laurenz!

I created following regression test. Is it fine with you?
If it's okay, I'll commit it with expected/oracle_join.out.

diff --git a/sql/oracle_join.sql b/sql/oracle_join.sql
index 4ce1484..d653ed8 100644
--- a/sql/oracle_join.sql
+++ b/sql/oracle_join.sql
@@ -57,6 +57,16 @@ SELECT t1.id, t2.id FROM typetest1  t1 RIGHT JOIN typetest1  t2 ON t1.d = t2.d W
 EXPLAIN (COSTS off)
 SELECT t1.id, t2.id FROM typetest1  t1 FULL  JOIN typetest1  t2 ON t1.d = t2.d WHERE t1.id > 1 ORDER BY t1.id, t2.id;
 SELECT t1.id, t2.id FROM typetest1  t1 FULL  JOIN typetest1  t2 ON t1.d = t2.d WHERE t1.id > 1 ORDER BY t1.id, t2.id;
+---- left outer join with WHERE clause but foreignrel->reltarget is empty (Issues #279)
+CREATE TABLE typetest1_bak AS SELECT * FROM TYPETEST1;
+DELETE FROM typetest1;
+INSERT INTO typetest1 VALUES (1, 'no', 'si', '0', '1', NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);
+EXPLAIN (COSTS off)
+SELECT 1 FROM typetest1 t1 LEFT JOIN typetest1 t2 ON (t1.id IS NOT NULL) WHERE (t1.c = t1.vc) = (t2.id IS NOT NULL);
+SELECT 1 FROM typetest1 t1 LEFT JOIN typetest1 t2 ON (t1.id IS NOT NULL) WHERE (t1.c = t1.vc) = (t2.id IS NOT NULL);
+DELETE FROM typetest1;
+INSERT INTO typetest1 select * from typetest1_bak;
+DROP TABLE typetest1_bak;

Thanks,
Tatsuro Yamada

@laurenz
Copy link
Owner

laurenz commented Jan 15, 2019

What about simply using my query from here? Then we don't need an extra foreign table.

@Jaime2ndQuadrant
Copy link
Author

Hi @yamatattsu

I still haven't tested your patch, but it makes sense... actually i got the same conclusion just wasn't able to code it...

will test the patch later today, thanks

at last i could test this (i cherry-picked this into 2.1.0) and it worked just fine, this will be backpatched to older branches with the same bug?

good work @yamatattsu and @laurenz

@laurenz
Copy link
Owner

laurenz commented Jan 17, 2019

Thanks for checking.
I have pushed a regression test and a changelog entry, so we can close this one.
Thanks again to both of you!

oracle_fdw only has a single persistent branch, and the idea is that the master branch is always kept stable.
So you'd just upgrade to current git HEAD if you need the bug fix.
Releases are more or less just points of reference, whenever enough has changed and I can be bothered to build Windows binaries.

@laurenz laurenz closed this as completed Jan 17, 2019
Repository owner locked as resolved and limited conversation to collaborators May 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants