
Overhead of creating new threads in Streaming Response #2201

Open
dkalinowski opened this issue Nov 4, 2024 · 7 comments


@dkalinowski

dkalinowski commented Nov 4, 2024

Is your feature request related to a problem? Please describe.
There is an overhead to creating new threads when using the streaming response feature.
This drogon example demonstrates it very well:

// A new thread is spawned (and detached) for every streaming response:
std::thread([stream = std::shared_ptr<drogon::ResponseStream>{
                 std::move(stream)}]() mutable {
    std::cout << std::boolalpha << stream->send("hello ") << std::endl;
    std::this_thread::sleep_for(std::chrono::seconds(2));
    std::cout << std::boolalpha << stream->send("world");
    std::this_thread::sleep_for(std::chrono::seconds(2));
    stream->close();
}).detach();

An uncontrolled number of new threads gets created, which is a bottleneck for high-throughput workloads.
Another issue is that the thread is detached and never joined.

Describe the solution you'd like
Other HTTP frameworks use an internal event loop to schedule partial send operations, for example:
https://github.com/tensorflow/serving/blob/8eec641dee627243e13f8d04d8980f326c3e371d/tensorflow_serving/util/net_http/server/internal/evhttp_request.cc#L359-L360
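
A minimal sketch of that idea in drogon terms (assuming `stream` is the drogon::ResponseStream from the example above, and using `drogon::app().getLoop()` with trantor's `runAfter` to schedule the partial sends; this is an illustration, not drogon's current behavior):

    // Sketch only: schedule the partial sends on drogon's own event loop
    // instead of spawning a detached thread per response.
    auto shared = std::shared_ptr<drogon::ResponseStream>{std::move(stream)};
    auto *loop = drogon::app().getLoop();
    loop->queueInLoop([shared] { shared->send("hello "); });
    loop->runAfter(2.0, [shared] { shared->send("world"); });
    loop->runAfter(4.0, [shared] { shared->close(); });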

Describe alternatives you've considered
No alternative

Additional context
High-throughput scenarios, for example serving LLM text generation with a streaming output approach.

@hwc0919
Member

hwc0919 commented Nov 12, 2024

It's just an example to illustrate API usage. You could use a thread pool. There is also an async stream API now.

@dkalinowski
Author

@hwc0919 What do you mean by the async stream API? Can you elaborate?
As for our own thread pool: I don't think it is a good idea to spin up a separate thread pool for streaming responses, since there is already one inside drogon, designed for regular responses.

@hwc0919
Member

hwc0919 commented Nov 21, 2024

drogon's internal pool is for non-blocking, fast IO operations, not for CPU-intensive workloads or blocking operations. You will block incoming connections if you block those threads.

@dkalinowski
Author

@hwc0919 Do you mean I should not run CPU-intensive workloads in this lambda?
https://github.com/drogonframework/drogon/blob/master/examples/helloworld/main.cc#L20-L26

This is what I meant.

@hwc0919
Member

hwc0919 commented Nov 21, 2024

Do you mean I should not run CPU-intensive workloads in this lambda?

Yes.

If you have lots of CPU-intensive workloads that may block the threads, you should run them in another thread pool.

For example:

    app().registerHandler(
        "/",
        [](const HttpRequestPtr &request,
           std::function<void(const HttpResponsePtr &)> &&callback) {
            LOG_INFO << "connected:"
                     << (request->connected() ? "true" : "false");

            trantor::EventLoopThreadPool *pool = ...;  // get the pool from where it is allocated

            // Hand the blocking work to a worker loop so drogon's io
            // threads stay free to accept new connections.
            pool->getNextLoop()->queueInLoop(
                [callback = std::move(callback)]() {
                    sleep(10);  // cpu intensive jobs or blocking operations
                    auto resp = HttpResponse::newHttpResponse();
                    resp->setBody("Hello, World!");
                    callback(resp);
                });
        },
        {Get});
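
For the `pool = ...` placeholder above, one possible setup (a sketch only; the pool size of 4 is arbitrary) is to create the pool once before `app().run()` and keep it alive for the app's lifetime:

    // Hypothetical worker-pool setup: one pool, created at startup.
    auto workerPool =
        std::make_shared<trantor::EventLoopThreadPool>(4, "WorkerPool");
    workerPool->start();  // spawn the worker event-loop threads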

@dkalinowski
Author

dkalinowski commented Nov 21, 2024

drogon's internal pool is for non-blocking, fast IO operations, not for CPU-intensive workloads or blocking operations. You will block incoming connections if you block those threads.

This is exactly what I want: the number of drogon threads limits the number of concurrently processed requests in my application. Once a thread is freed, I let another connection in. Is that bad usage with regard to drogon's architecture? Or is there another reason why I should not use drogon's connection thread pool?

@hwc0919
Member

hwc0919 commented Nov 22, 2024

It's bad in every respect.

You should tell the client the server is full, not just leave it hanging. What's worse, you are doing that by suspending the whole server.

Why not just keep a count of the running jobs and reject excess requests?
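
A minimal sketch of that suggestion (hypothetical names: `runningJobs` and `kMaxJobs`; `workerPool` is the pool from the earlier example, assumed globally accessible), rejecting with 503 instead of blocking drogon's IO threads:

    // Sketch: count in-flight jobs and reject excess requests up front.
    static std::atomic<int> runningJobs{0};
    static constexpr int kMaxJobs = 64;  // arbitrary limit for illustration

    app().registerHandler(
        "/",
        [](const HttpRequestPtr &,
           std::function<void(const HttpResponsePtr &)> &&callback) {
            if (runningJobs.fetch_add(1) >= kMaxJobs)
            {
                runningJobs.fetch_sub(1);
                auto resp = HttpResponse::newHttpResponse();
                resp->setStatusCode(k503ServiceUnavailable);
                callback(resp);  // tell the client the server is full
                return;
            }
            // workerPool: the worker pool from the example above.
            workerPool->getNextLoop()->queueInLoop(
                [callback = std::move(callback)]() {
                    // ... cpu intensive job or blocking operation ...
                    auto resp = HttpResponse::newHttpResponse();
                    resp->setBody("done");
                    callback(resp);
                    runningJobs.fetch_sub(1);
                });
        },
        {Get});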
