
Overhead of creating new threads in Streaming Response #2201

Open
dkalinowski opened this issue Nov 4, 2024 · 7 comments


@dkalinowski

dkalinowski commented Nov 4, 2024

Is your feature request related to a problem? Please describe.
There is an overhead to creating new threads when using the streaming response feature.
This drogon example demonstrates it very well:

// A new thread is spawned (and detached) for every streaming response:
std::thread([stream = std::shared_ptr<drogon::ResponseStream>{
                 std::move(stream)}]() mutable {
    std::cout << std::boolalpha << stream->send("hello ") << std::endl;
    std::this_thread::sleep_for(std::chrono::seconds(2));
    std::cout << std::boolalpha << stream->send("world");
    std::this_thread::sleep_for(std::chrono::seconds(2));
    stream->close();
}).detach();

An uncontrolled number of new threads gets created, which is a bottleneck for high-throughput workloads.
Another issue is that the thread is detached and never joined.

Describe the solution you'd like
Other HTTP frameworks use an internal event loop to schedule partial send operations, for example:
https://github.com/tensorflow/serving/blob/8eec641dee627243e13f8d04d8980f326c3e371d/tensorflow_serving/util/net_http/server/internal/evhttp_request.cc#L359-L360
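
A minimal sketch of that idea in drogon terms (assuming `stream` is the drogon::ResponseStream from the example above, and using `drogon::app().getLoop()` with trantor's `runAfter` to schedule the partial sends; this is an illustration, not drogon's current behavior):

    // Sketch only: schedule the partial sends on drogon's own event loop
    // instead of spawning a detached thread per response.
    auto shared = std::shared_ptr<drogon::ResponseStream>{std::move(stream)};
    auto *loop = drogon::app().getLoop();
    loop->queueInLoop([shared] { shared->send("hello "); });
    loop->runAfter(2.0, [shared] { shared->send("world"); });
    loop->runAfter(4.0, [shared] { shared->close(); });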

Describe alternatives you've considered
No alternative

Additional context
High-throughput scenarios, for example serving LLM text generation with a streaming output approach.

@hwc0919
Member

hwc0919 commented Nov 12, 2024

It's just an example to illustrate API usage. You could use a thread pool. There is also an async stream API now.

@dkalinowski
Author

@hwc0919 What do you mean by the async stream API? Can you elaborate?
As for our own thread pool: I don't think it is a good idea to spin up a separate thread pool for streaming responses, since there is already one inside drogon, designed for regular responses.

@hwc0919
Member

hwc0919 commented Nov 21, 2024

drogon's internal pool is for non-blocking, fast IO operations, not for CPU-intensive workloads or blocking operations. You will block incoming connections if you block those threads.

@dkalinowski
Author

@hwc0919 Do you mean I should not run CPU-intensive workloads in this lambda?
https://github.com/drogonframework/drogon/blob/master/examples/helloworld/main.cc#L20-L26

This is what I meant.

@hwc0919
Member

hwc0919 commented Nov 21, 2024

Do you mean I should not run CPU-intensive workloads in this lambda?

Yes.

If you have lots of CPU-intensive workloads that may block the threads, you should run them in another thread pool.

For example:

    app().registerHandler(
        "/",
        [](const HttpRequestPtr &request,
           std::function<void(const HttpResponsePtr &)> &&callback) {
            LOG_INFO << "connected:"
                     << (request->connected() ? "true" : "false");

            trantor::EventLoopThreadPool *pool = ...;  // get the pool from where it is allocated

            // Hand the blocking work to a worker loop so drogon's io
            // threads stay free to accept new connections.
            pool->getNextLoop()->queueInLoop(
                [callback = std::move(callback)]() {
                    sleep(10);  // cpu intensive jobs or blocking operations
                    auto resp = HttpResponse::newHttpResponse();
                    resp->setBody("Hello, World!");
                    callback(resp);
                });
        },
        {Get});
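
For the `pool = ...` placeholder above, one possible setup (a sketch only; the pool size of 4 is arbitrary) is to create the pool once before `app().run()` and keep it alive for the app's lifetime:

    // Hypothetical worker-pool setup: one pool, created at startup.
    auto workerPool =
        std::make_shared<trantor::EventLoopThreadPool>(4, "WorkerPool");
    workerPool->start();  // spawn the worker event-loop threads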

@dkalinowski
Author

dkalinowski commented Nov 21, 2024

drogon's internal pool is for non-blocking, fast IO operations, not for CPU-intensive workloads or blocking operations. You will block incoming connections if you block those threads.

This is exactly what I want: the number of drogon threads limits the number of concurrently processed requests in my application. Once a thread is freed, I let another connection in. Is that bad usage with regard to drogon's architecture? Or is there another reason why I should not use drogon's connection thread pool?

@hwc0919
Member

hwc0919 commented Nov 22, 2024

It's bad in every respect.

You should tell the client the server is full, not just leave it hanging. What's worse, you are doing that by suspending the whole server.

Why not just keep a count of the running jobs and reject excess requests?
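
A minimal sketch of that suggestion (hypothetical names: `runningJobs` and `kMaxJobs`; `workerPool` is the pool from the earlier example, assumed globally accessible), rejecting with 503 instead of blocking drogon's IO threads:

    // Sketch: count in-flight jobs and reject excess requests up front.
    static std::atomic<int> runningJobs{0};
    static constexpr int kMaxJobs = 64;  // arbitrary limit for illustration

    app().registerHandler(
        "/",
        [](const HttpRequestPtr &,
           std::function<void(const HttpResponsePtr &)> &&callback) {
            if (runningJobs.fetch_add(1) >= kMaxJobs)
            {
                runningJobs.fetch_sub(1);
                auto resp = HttpResponse::newHttpResponse();
                resp->setStatusCode(k503ServiceUnavailable);
                callback(resp);  // tell the client the server is full
                return;
            }
            // workerPool: the worker pool from the example above.
            workerPool->getNextLoop()->queueInLoop(
                [callback = std::move(callback)]() {
                    // ... cpu intensive job or blocking operation ...
                    auto resp = HttpResponse::newHttpResponse();
                    resp->setBody("done");
                    callback(resp);
                    runningJobs.fetch_sub(1);
                });
        },
        {Get});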
