
Connection closed when trying to use com.microsoft.sqlserver.jdbc.spark connector #247

Rozenceto opened this issue Dec 8, 2023 · 4 comments


@Rozenceto

Hi, I'm trying to write a row from Azure Databricks (11.3 LTS, which includes Apache Spark 3.3.0 and Scala 2.12) to Azure SQL DB using the com.microsoft.azure:spark-mssql-connector_2.12:1.3.0-BETA library, but unfortunately I receive the following error:

23/12/08 17:29:56 ERROR Executor: Exception in task 0.3 in stage 198.0 (TID ...) com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.

Below is the part of the code where I invoke the DataFrame write. The service principal has the db_owner database role.
if "access_token" in token:
access_token = token["access_token"]
result = df.write
.format("com.microsoft.sqlserver.jdbc.spark")
.option("url", the_sql_url)
.option("dbtable", f"staging.test_table")
.option("accessToken", access_token)
.mode("overwrite")
.save()
else:
raise ValueError("Failed to obtain an access token.")

At the same time, I do not have this problem when I use the plain JDBC connector.
Would anyone be able to help me, or is the JDBC connector my only option at the moment?
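
For comparison, a minimal sketch of what the plain Spark JDBC write could look like (not my exact code; the user/password names here are placeholders standing in for whatever authentication you use):

(
    df.write
    .format("jdbc")
    .option("url", the_sql_url)
    .option("dbtable", "staging.test_table")
    .option("user", username)        # placeholder credentials
    .option("password", password)
    .mode("overwrite")
    .save()
)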

Thank you,
Roza

@StevenM11

StevenM11 commented Dec 14, 2023

I'm getting the same error using:
13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12)
spark_mssql_connector_2_12_1_4_0_BETA.jar

I can see that the table itself, with its columns, is created before the error occurs.

Standard JDBC works, but it processed only 2M rows in 2 hours.

@RozalinaZaharieva

RozalinaZaharieva commented Dec 14, 2023

I found the problem on my side, so please check for it on yours.
The problem was a mismatch between data types in Databricks and SQL Server, more precisely between the TIMESTAMP column in the Databricks source table and the DATETIME column in the SQL Server target table. In SQL Server I changed the data type to DATETIME2, CAST() the column in the Databricks ETL before writing the data, and everything started working.
Unfortunately, the error message is quite unhelpful. I hope MS/Databricks will consider returning more useful error messages in these cases.
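
A minimal sketch of the ETL-side cast (the column name "event_time" is just a placeholder for the TIMESTAMP column, and the exact target type will depend on your data; this only shows the shape of the change):

from pyspark.sql.functions import col

df = df.withColumn("event_time", col("event_time").cast("timestamp"))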

Best Regards,
Roza

@StevenM11

Interesting, I am going to play around a bit with the data types. The weird thing is that I was creating a new table without specifying any data types. I probably should, because with standard JDBC all strings become NVARCHAR(MAX); there is a sketch of one way to pin the types after the snippet below.

I'm just creating a new table using this:

(
    df.write
    .format("com.microsoft.sqlserver.jdbc.spark")
    .mode("overwrite")
    .option("url", url)
    .option("dbtable", table_name)
    .option("user", username)
    .option("password", password)
    .save()
)
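
Not from this thread, but if the connector honors the standard Spark JDBC "createTableColumnTypes" option, you can pin the column types at table-creation time instead of taking the defaults (the column names below are made up):

(
    df.write
    .format("com.microsoft.sqlserver.jdbc.spark")
    .mode("overwrite")
    .option("url", url)
    .option("dbtable", table_name)
    .option("user", username)
    .option("password", password)
    .option("createTableColumnTypes", "name NVARCHAR(128), city NVARCHAR(64)")
    .save()
)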

@franko14

franko14 commented Feb 22, 2024

For anyone interested, we faced a similar issue recently, and the solution is to recast the data types on the PySpark DataFrame. It looks like, for some reason, the write to SQL works more reliably with explicit casting. We added this step before every write in our ETL pipelines.

from pyspark.sql.functions import col

for column in df.schema:
    df = df.withColumn(column.name, col(column.name).cast(column.dataType))

If you use .mode("overwrite"), you will probably not see any issue, because this drops and recreates the table in SQL with data types inferred from the DataFrame. However, if you want to write into a table with predefined data types without actually dropping it before the write (i.e. .mode("overwrite") together with .option("truncate", True)), I strongly suggest recasting with the code snippet above.
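
Putting it together, a sketch of the recast-then-truncate write (the connection option values are the generic placeholders used earlier in this thread):

from pyspark.sql.functions import col

for column in df.schema:
    df = df.withColumn(column.name, col(column.name).cast(column.dataType))

(
    df.write
    .format("com.microsoft.sqlserver.jdbc.spark")
    .mode("overwrite")
    .option("truncate", True)   # keep the existing table and its predefined types
    .option("url", url)
    .option("dbtable", table_name)
    .option("user", username)
    .option("password", password)
    .save()
)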
