Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1337012: ExecuteNonQuery deadlocks when called from a ThreadPool thread #923

Open
DVN237294 opened this issue Apr 19, 2024 · 4 comments
Assignees
Labels
enhancement The issue is a request for improvement or a new feature status-triage_done Initial triage done, will be further handled by the driver team

Comments

@DVN237294
Copy link

Issue description

We are currently using the PUT file:// API to upload files to an internal stage in Snowflake. Due to current limitations, PUTs must be done synchronously. We have however seen that calls to ExecuteNonQuery sometimes deadlocks our application. After some investigation I believe that these deadlocks stems from use of "sync over async" constructs in the RestRequester implementation, here:

public T Post<T>(IRestRequest request)
{
//Run synchronous in a new thread-pool task.
return Task.Run(async () => await (PostAsync<T>(request, CancellationToken.None)).ConfigureAwait(false)).Result;
}

Post here will execute the PostAsync method on a ThreadPool thread and block the currently executing thread. However, when the currently executing thread is already a ThreadPool thread there is a risk that the work will be scheduled on the same thread, causing a deadlock.

For completeness, here is the code and query I am running:

using var connection = new SnowflakeDbConnection(_snowflakeConnectionString);
using var command = connection.CreateCommand();

connection.Open();
command.CommandTimeout = 0;
command.CommandText = string.Format(_uploadFileToStageScript.Contents, filePath, overwrite);
command.ExecuteNonQuery();

_uploadFileToStageScript.Contents contains PUT file://{{0}} @{Stage} AUTO_COMPRESS=false OVERWRITE={{1}};

Suggested solution

The proper solution would be to avoid any "sync over async" constructs and use entirely synchronous methods in the synchronous APIs.
Secondarily, async support for PUT commands would be nice.

@DVN237294 DVN237294 added the bug label Apr 19, 2024
@github-actions github-actions bot changed the title ExecuteNonQuery deadlocks when called from a ThreadPool thread SNOW-1337012: ExecuteNonQuery deadlocks when called from a ThreadPool thread Apr 19, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka self-assigned this Apr 22, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka added the status-triage Issue is under initial triage label Apr 22, 2024
@sfc-gh-dszmolka
Copy link
Contributor

hi - thank you for reporting this and providing all the details! i'll try to set up a repro and see

@sfc-gh-dszmolka
Copy link
Contributor

this is how i try to reproduce the issue, and so far, did not see it. Of course not sure about its frequency either (or if I'm headed in the right direction at all.)

Repro attempt: run a big long SELECT query + PUT in the same thread and hope there's a deadlock. PUT has a ~22MB zipped CSV and executes in a couple seconds.

Tried the below too, with following modifications:

  • instead of one big SELECT + PUT, use two PUTs
  • use two ThreadPool.QueueUserWorkItem to execute tasks
using System;
using System.Data;
using Snowflake.Data.Client;
using System.Threading;

class Program
{
    // Connection parameters
    private const string Account = "myorg-myaccount";
    private const string Username = "myuser";
    private const string Password = "mypassword";
    private const string Warehouse = "mywh";

    static void Main(string[] args)
    {
        // Initialize Snowflake connection
        string connectionString = $"account={Account};user={Username};password={Password};warehouse={Warehouse};";
        using (var connection = new SnowflakeDbConnection())
        {
            connection.ConnectionString = connectionString;
            connection.Open();

            
            // Execute both queries in the same ThreadPool thread
            ThreadPool.QueueUserWorkItem(state =>
            {
                PutQuery1(connection);
                //PutQuery2(connection);
                SelectQuery(connection);

            });
            

            /*
            // Execute two queries in separate ThreadPool threads
            ThreadPool.QueueUserWorkItem(SelectQuery, connection);
            ThreadPool.QueueUserWorkItem(PutQuery, connection);
            */

            Thread.Sleep(5000); // 

            Console.WriteLine("Press any key to exit...");
            Console.ReadKey();
        }
    }

    static void SelectQuery(object state)
    {
        SnowflakeDbConnection connection = (SnowflakeDbConnection)state;
        using (var command = connection.CreateCommand())
        {
            // quite big query, takes some time and returns 15.000.000 rows
            command.CommandText = "SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.CUSTOMER;";
            using (var reader = command.ExecuteReader())
            {
                int count = 0;
                while (reader.Read()) { count++; };
                Console.WriteLine($"Thread {Thread.CurrentThread.ManagedThreadId}: Select returned {count} rows.");
            }
        }
    }

    static void PutQuery1(object state)
    {
        SnowflakeDbConnection connection = (SnowflakeDbConnection)state;
        using (var command = connection.CreateCommand())
        {
            command.CommandText = "PUT file://C:\\temp\\building-consents-issued-february-2024.zip @test_db.public.MYSTAGE1 auto_compress=false overwrite=true;";
            command.ExecuteNonQuery();
            Console.WriteLine($"Thread {Thread.CurrentThread.ManagedThreadId}: PUT query executed successfully.");
        }
    }

    static void PutQuery2(object state)
    {
        SnowflakeDbConnection connection = (SnowflakeDbConnection)state;
        using (var command = connection.CreateCommand())
        {
            command.CommandText = "PUT file://C:\\temp\\building-consents-issued-february-2024-2.zip @test_db.public.MYSTAGE1 auto_compress=false overwrite=true;";
            command.ExecuteNonQuery();
            Console.WriteLine($"Thread {Thread.CurrentThread.ManagedThreadId}: PUT query 2 executed successfully.");
        }
    }
}

Can you advise is this something representing what you're trying to do and see the issue with ? Or if there's a sample minimal viable repro program, which when run, exhibits the issue you're seeing - that would be fantastic and speed up the debugging a lot. Thank you in advance !

@sfc-gh-dszmolka sfc-gh-dszmolka added the status-information_needed Additional information is required from the reporter label Apr 22, 2024
@DVN237294
Copy link
Author

Hey, thanks for taking the time to look into this.

I have been trying to debug this for a while, and I'm now at the point where I'm not entirely convinced the lockup is due to this "sync over async" construct. I have a dotnet-dump memory dump from the application, but my knowledge of TPL and/or default ThreadPool implementation is too limited to narrow the cause of the lockup to prove that this Task.Run(...).Result construct is at the cause of it. All I know for certain is that the threadpool thread is waiting "indefinitely" (or at least 16+ hours) on the UnwrapPromise returned by Task.Run.

I have been trying to get to the bottom of the question "can Task.Run ever deadlock?". Some say yes. Others (Herman Schoenfeld and stephentoub) says no.
The only logical argument I can see why it would deadlock is if any continuation is put in the ThreadPool thread's local work queue (instead of the global work queue). I just feel like if that was the cause then it should deadlock much faster.
So far, I've only seen one lockup over millions of PUT queries in the previous months.
However If that was the cause of the deadlock, then I suppose starting the task with TaskCreationOptions.PreferFairness should alleviate any deadlock.

For now I have decided to ignore this issue until it may appear again, but from my point of view the proposed solutions in this issue still seem relevant: It is generally considered bad practice to do these kinds of "sync over async" workarounds, so it would make sense for this library to work towards avoiding it. If nothing else then just for the increased performance of not having to block threadpool threads.

@sfc-gh-dszmolka
Copy link
Contributor

hi - appreciate all the additional details and digging deeper ! I agree we could look into avoiding the "sync over async" approach in general, and thus this is now an enhancement request for the team to consider (alongside with the 'async support for PUT) and if deemed feasible, then pick up and prioritize.

@sfc-gh-dszmolka sfc-gh-dszmolka added enhancement The issue is a request for improvement or a new feature status-triage_done Initial triage done, will be further handled by the driver team and removed bug status-triage Issue is under initial triage status-information_needed Additional information is required from the reporter labels Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement The issue is a request for improvement or a new feature status-triage_done Initial triage done, will be further handled by the driver team
Projects
None yet
Development

No branches or pull requests

3 participants