
Enable dblink to be called multiple times in one query #29

Closed


h-serizawa

This PR addresses issue #28.

If dblink is called multiple times in one query, for example when the query has several subqueries and each subquery calls dblink, the UDx side process goes down. Several ODBC-related variables are defined as global variables so that they can be shared between the DBLink class and the DBLinkFactory class. This was a reasonable choice: the Vertica UDx SDK doesn't provide a way to share variables between those two classes, and sharing them avoids retrieving the column definitions in both classes. On the other hand, it is the reason dblink cannot be called multiple times in one query, because each call has to keep its own ODBC-related variables until its processing completes.

This PR implements the following:

  • An ODBCBase class, inherited by the DBLink and DBLinkFactory classes, which holds the ODBC-related variables and methods. DBLink and DBLinkFactory no longer share ODBC connections or variables; each class has its own database connection.
  • In the DBLink.processPartition method, the data types of the columns returned by the remote query are checked against the return columns of the Vertica UDx before the columns are bound. The return columns are defined in the DBLinkFactory class, while the columns returned by the remote query are known in the DBLink class, and the two are obtained through different database connections. This check prevents a data type mismatch if the column definitions change between the two connections.

@CLAassistant

CLAassistant commented Nov 22, 2023

CLA assistant check
All committers have signed the CLA.

@h-serizawa
Author

[Test case 1 with Oracle Database]

--
-- Table definition on Oracle
--
CREATE TABLE tab_test (
  col1 INTEGER,
  col2 FLOAT,
  col3 NUMERIC,
  col4 CHAR(100),
  col5 VARCHAR2(100),
  col6 CLOB,
  col7 LONG,
  col8 DATE,
  col9 TIMESTAMP,
  col10 RAW(100),
  col11 BLOB
);

--
-- Single call from Vertica
--
SELECT DBLINK(USING PARAMETERS cid='orcl', query='SELECT * FROM tab_test') OVER();

--
-- Three calls from Vertica
--
SELECT l.id, l.description, tab_a.col2 value_a, tab_b.col3 value_b, tab_c.col4 value_c
FROM tab_local l
LEFT JOIN (SELECT DBLINK(USING PARAMETERS cid='orcl', query='SELECT * FROM tab_test') OVER()) tab_a
  ON tab_a.col1 = l.id
LEFT JOIN (SELECT DBLINK(USING PARAMETERS cid='orcl', query='SELECT * FROM tab_test') OVER()) tab_b
  ON tab_b.col1 = l.id
LEFT JOIN (SELECT DBLINK(USING PARAMETERS cid='orcl', query='SELECT * FROM tab_test') OVER()) tab_c
  ON tab_c.col1 = l.id;

@h-serizawa
Author

[Test case 2 with MS SQL Server and Oracle Database]

--
-- Table definition on MS SQL Server
--
USE [MSSQLTRIAL]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tab_test](
	[col1] [bit] NULL,
	[col2] [int] NULL,
	[col3] [decimal] NULL,
	[col4] [numeric] NULL,
	[col5] [float] NULL,
	[col6] [date] NULL,
	[col7] [datetime] NULL,
	[col8] [char] NULL,
	[col9] [varchar] NULL,
	[col10] [text] NULL,
	[col11] [varbinary] NULL
)
GO

--
-- Single call from Vertica
--
SELECT DBLINK(USING PARAMETERS cid='mssql', query='SELECT * FROM [MSSQLTRIAL].[dbo].[tab_test]') OVER();

--
-- Four calls from Vertica (two to Oracle Database, two to MS SQL Server)
--
SELECT l.id, l.description, tab_a.col2 value_a, tab_b.col3 value_b, tab_c.col4 value_c, tab_d.col5 value_d
FROM tab_local l
LEFT JOIN (SELECT DBLINK(USING PARAMETERS cid='orcl', query='SELECT * FROM tab_test') OVER()) tab_a
  ON tab_a.col1 = l.id
LEFT JOIN (SELECT DBLINK(USING PARAMETERS cid='orcl', query='SELECT * FROM tab_test') OVER()) tab_b
  ON tab_b.col1 = l.id
LEFT JOIN (SELECT DBLINK(USING PARAMETERS cid='mssql', query='SELECT * FROM [MSSQLTRIAL].[dbo].[tab_test]') OVER()) tab_c
  ON tab_c.col2 = l.id
LEFT JOIN (SELECT DBLINK(USING PARAMETERS cid='mssql', query='SELECT * FROM [MSSQLTRIAL].[dbo].[tab_test]') OVER()) tab_d
  ON tab_d.col2 = l.id;

@roypaulin roypaulin requested a review from mfelici November 28, 2023 17:13
@mfelici
Collaborator

mfelici commented Nov 28, 2023

I will look at both the code and the ODBC efficiency as soon as possible. I think I will need 2 or 3 weeks.

@h-serizawa
Author

I am closing this PR because I saw connection issues during the tests.

@h-serizawa h-serizawa closed this Nov 13, 2024
@h-serizawa h-serizawa deleted the multiple-calls-in-one-query branch November 13, 2024 07:34

3 participants