
# The C10k problem

The C10k problem is the challenge of optimising how a server handles its network sockets so that it can serve a very large number of clients (ten thousand or more) at the same time.

Further Reading on C10k

As part of being a high-performance, highly-scalable server, Couchbase Server must be capable of handling a large number of connections. This applies especially to the KV Engine, where each node will typically handle thousands, if not tens of thousands, of connections.

## Naive: Thread per connection

Probably the simplest approach to the problem is to create a new thread for each new connection received. Most of the handling in this case is automatic: if any thread becomes blocked waiting for I/O, the operating system can simply switch to another thread.

The downside of this approach is that, whilst it scales reasonably well up to several hundred connections, performance degrades severely once you reach thousands of connections, due to the amount of context-switching between threads. As with any kernel operation, context-switching is pure overhead; you can soon reach a point where you spend more time context-switching than servicing requests.
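
As a rough illustration, here is a minimal sketch of a thread-per-connection server in C++ using POSIX sockets. It simply echoes data back rather than speaking a real protocol, and the port number and `handle_connection` helper are illustrative only; this is not Couchbase or Memcached code.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <thread>

// Hypothetical per-connection handler: blocking reads and writes, echoing data back.
void handle_connection(int fd) {
    char buf[4096];
    ssize_t n;
    // While this thread is blocked in read(), the OS schedules other threads.
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        write(fd, buf, n); // a real server would parse and execute a request here
    }
    close(fd);
}

int main() {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(12345); // arbitrary port for illustration
    bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    listen(listener, SOMAXCONN);

    while (true) {
        int client = accept(listener, nullptr, nullptr);
        // One OS thread per client: simple, but thousands of clients means
        // thousands of threads, heavy context-switching and per-thread stacks.
        std::thread(handle_connection, client).detach();
    }
}
```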

## Original Approach: Single-threaded event loop

An alternative approach, originally used by Memcached, is to service all of the connections using a single thread. This is typically achieved using an event loop and [cooperative multitasking](https://en.wikipedia.org/wiki/Cooperative_multitasking). Whenever servicing a particular connection would have to wait for some blocking operation (e.g. a disk read), the thread switches (or 'yields') to a different connection that is not currently blocked.

The downside of this approach is that it under-utilises system resources. Although it achieves concurrency, having only one real thread of execution means it cannot achieve any parallelism and will leave the remaining CPUs unused. This is particularly wasteful on server-grade hardware, where you can have upwards of 24 CPUs.
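
Below is a minimal sketch of a single-threaded event loop, using Linux `epoll` and non-blocking sockets as the readiness mechanism (Memcached itself builds on libevent); again an echo handler stands in for real protocol parsing, and the port number is arbitrary.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <fcntl.h>
#include <unistd.h>

// Put a socket into non-blocking mode so the event loop never stalls on one connection.
static void set_nonblocking(int fd) {
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
}

int main() {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(12345); // arbitrary port for illustration
    bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    listen(listener, SOMAXCONN);
    set_nonblocking(listener);

    int epfd = epoll_create1(0);
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = listener;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listener, &ev);

    epoll_event events[128];
    while (true) {
        // A single thread services every connection: sleep until some socket is ready.
        int ready = epoll_wait(epfd, events, 128, -1);
        for (int i = 0; i < ready; ++i) {
            int fd = events[i].data.fd;
            if (fd == listener) {
                int client = accept(listener, nullptr, nullptr);
                set_nonblocking(client);
                epoll_event cev{};
                cev.events = EPOLLIN;
                cev.data.fd = client;
                epoll_ctl(epfd, EPOLL_CTL_ADD, client, &cev);
            } else {
                char buf[4096];
                ssize_t n = read(fd, buf, sizeof(buf));
                if (n <= 0) close(fd);   // peer closed (or error; simplified handling)
                else write(fd, buf, n);  // a real server would parse a request here
            }
        }
    }
}
```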

## Current Approach: Why not both

Memcached combines the best of both worlds by creating a pool of worker threads (usually one for roughly three quarters of the CPUs, in order to leave some CPUs free for other tasks), each of which services roughly an equal number of connections using its own event loop. A dispatch thread is responsible for delegating new inbound connections across these worker threads.

This approach achieves high parallelism while avoiding the high context-switching overhead of the first approach.
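
To make the combined model concrete, here is a minimal sketch of the dispatcher-plus-worker-pool pattern, under simplifying assumptions: the dispatcher registers each new socket directly on a worker's `epoll` instance chosen round-robin (Memcached instead notifies workers over a pipe and hands the connection across), and the handler is again just an echo rather than real protocol code.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <algorithm>
#include <thread>
#include <vector>

// Each worker thread runs its own event loop over its own epoll instance.
void worker_loop(int epfd) {
    epoll_event events[64];
    while (true) {
        int ready = epoll_wait(epfd, events, 64, -1);
        for (int i = 0; i < ready; ++i) {
            int fd = events[i].data.fd;
            char buf[4096];
            ssize_t n = read(fd, buf, sizeof(buf));
            if (n <= 0) close(fd);   // connection closed or error
            else write(fd, buf, n);  // a real server would parse a request here
        }
    }
}

int main() {
    // Roughly 3/4 of the CPUs, leaving headroom for other tasks.
    unsigned num_workers = std::max(1u, std::thread::hardware_concurrency() * 3 / 4);

    std::vector<int> worker_epfds;
    for (unsigned i = 0; i < num_workers; ++i) {
        int epfd = epoll_create1(0);
        worker_epfds.push_back(epfd);
        std::thread(worker_loop, epfd).detach();
    }

    int listener = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(12345); // arbitrary port for illustration
    bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    listen(listener, SOMAXCONN);

    // The dispatch thread (here simply the main thread) spreads new connections
    // round-robin across the workers, so each event loop serves a similar share.
    size_t next = 0;
    while (true) {
        int client = accept(listener, nullptr, nullptr);
        epoll_event ev{};
        ev.events = EPOLLIN;
        ev.data.fd = client;
        epoll_ctl(worker_epfds[next], EPOLL_CTL_ADD, client, &ev);
        next = (next + 1) % worker_epfds.size();
    }
}
```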