This file contains descriptions and notes for the demos and exercises in this repo.
I am sorry that the generated table of contents contains so much uppercase stuff...
- NOTES OF DEMOS AND EXERCISES
- DESCRIPTION
- TABLE OF CONTENTS
- DEMONSTRATIONS
- DEMO 00 - INTRODUCTION TO MULTITHREADING
- DEMO 01 - HELLO
- DEMO 02 - THREAD JOINS
- DEMO 03 - PASSING ARGUMENTS
- DEMO 04 - SLEEP
- DEMO 05 - GETTING THREAD'S ID
- DEMO 06 - LIST OF MULTIPLE THREADS
- DEMO 07 - FORCING A THREAD TO TERMINATE
- DEMO 08 - GETTING RETURNED VALUES FROM THREADS
- DEMO 09 - THREAD DETACHING
- DEMO 10 - THREAD YIELDING
- DEMO 11 - EXECUTOR SERVICES AND THREAD POOLS
- DEMO 12A - RACE CONDITIONS
- DEMO 12B - DATA RACES
- DEMO 12C - RACE CONDITIONS AND DATA RACES
- IMPORTANT NOTES
- DEMO 13 - MUTEXES
- DEMO 14 - SYNCHRONIZED BLOCKS
- DEMO 15 - DEADLOCK
- DEMO 16 - MONITORS
- DEMO 17 - REENTRANT LOCKS (RECURSIVE MUTEXES)
- DEMO 18 - BARRIERS AND LATCHES
- DEMO 19 - READ-WRITE LOCKS
- DEMO 20A - SEMAPHORES
- DEMO 20B - SEMAPHORES
- DEMO 21 - CONDITION VARIABLES
- DEMO 22 - BLOCKING QUEUES
- DEMO 23 - THREAD-LOCAL STORAGE
- DEMO 24 & 25 - THE VOLATILE KEYWORD AND ATOMIC ACCESS
- EXERCISES
Just run the code several times and observe the results: they are not the same!!!
This is because threads execute concurrently. The operating system decides the order of thread execution, so depending on its current state, the corresponding result varies.
You learn how to create a thread. That's all.
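For instance, a minimal sketch in Java (assuming Java; the repo's demos may be written in other languages) could look like this:

```java
// A minimal Java sketch: create a thread from a Runnable and start it.
public class Hello {
    public static void main(String[] args) {
        Thread th = new Thread(() -> System.out.println("Hello from a child thread!"));
        th.start(); // runs the Runnable concurrently with main
    }
}
```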
When I say "thread X waits for thread Y to join", it means thread X waits for thread Y to complete, and then thread X continues its execution.
You learn how to pass arguments to a thread (a sketch follows the list):
- Passing variables of various data types.
- Passing some special types (such as C++ references).
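As an illustration, here is a minimal Java sketch (assuming Java). In Java, arguments are usually captured by a lambda, so they must be final or effectively final:

```java
// Java sketch: passing arguments of various types to a thread.
public class PassingArgs {
    public static void main(String[] args) {
        int count = 10;            // a primitive value
        String name = "worker-1";  // an object reference
        Thread th = new Thread(() ->
            System.out.println(name + " received count = " + count));
        th.start();
    }
}
```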
Making a thread sleep for a while.
Note that thread sleep is:
- Useful when you want to wait for something to be ready.
- Awful when performance is important: you may waste a lot of resources while the thread is asleep.
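A minimal Java sketch of sleeping (assuming Java):

```java
// Java sketch: the current thread sleeps for about one second.
public class SleepDemo {
    public static void main(String[] args) throws InterruptedException {
        System.out.println("Going to sleep...");
        Thread.sleep(1000); // milliseconds
        System.out.println("Woke up!");
    }
}
```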
Each thread has its own identifier. This demo helps you learn how to get the thread's id.
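A minimal Java sketch (assuming Java, where `Thread.currentThread().getId()` returns the current thread's id):

```java
// Java sketch: print the ids of the main thread and a child thread.
public class ThreadIdDemo {
    public static void main(String[] args) {
        Thread th = new Thread(() ->
            System.out.println("Child id: " + Thread.currentThread().getId()));
        th.start();
        System.out.println("Main id: " + Thread.currentThread().getId());
    }
}
```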
Handling a list of multiple threads.
Be careful when you pass arguments to threads, due to the variable reference mechanism. In C/C++ you can usually ignore this warning because variables are passed by value. For example:
```
function doTask(i) {
}

for (int i = 0; i < 3; ++i) {
    th = new Thread(doTask(i));
    th.start();
}
```
There is only one variable `i`, and its reference is passed into thread `th`. In the end, all 3 threads will receive `i = 3` as the parameter.
How to solve this problem? Just create new variables.
```
function doTask(i) {
}

for (int i = 0; i < 3; ++i) {
    arg = i;
    th = new Thread(doTask(arg));
    th.start();
}
```
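In Java this fix is actually enforced by the compiler: a lambda may only capture final or effectively final variables, so the buggy first version does not even compile (the pitfall bites at runtime in languages such as Python or JavaScript). A sketch of the fixed version, with illustrative names:

```java
// Java sketch: give each thread its own (effectively final) copy
// of the loop variable instead of sharing one variable.
public class LoopArgs {
    public static void main(String[] args) {
        for (int i = 0; i < 3; ++i) {
            final int arg = i;                 // fresh copy per iteration
            new Thread(() -> doTask(arg)).start();
        }
    }

    static void doTask(int i) {
        System.out.println("Task " + i);
    }
}
```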
Forcing a thread to terminate, a.k.a. "killing the thread".
Sometimes we want to force a thread to terminate (for convenience).
However, be careful: a thread should terminate by itself, not through external factors. Assume a thread is using a resource or holding a mutex and is suddenly killed from outside; the harmful results are:
- The resource may not be disposed/freed.
- The mutex is not unlocked, which is very likely to lead to deadlock.
You learn how to return a value from a thread, and how to use that value for future tasks.
Please note that if you do not use a synchronization mechanism (e.g. thread join, mutex...), the result may be incorrect. To be clear, let's look at this:
```
result = 0;

function doTask() {
    do something for a while;
    result = 9;
}

th = new Thread(doTask);
th.start();
print(result);
```
We are not sure that the printed result is `9`, because at the time `print(result)` executes, `th` may not have completed yet (`result = 9` has not been executed). So we need to wait for `th` to complete before printing the result.
```
result = 0;

function doTask() {
    do something for a while;
    result = 9;
}

th = new Thread(doTask);
th.start();
th.join(); // wait for th to complete to make sure result is set
print(result);
```
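In Java, a common way to get a returned value is a `Callable` submitted to an executor; `Future.get()` blocks until the task completes, playing the role of `join`. A minimal sketch (assuming Java):

```java
import java.util.concurrent.*;

// Java sketch: get a returned value from a thread via Callable/Future.
public class ReturnValue {
    public static void main(String[] args) throws Exception {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        Future<Integer> future = ex.submit(() -> {
            Thread.sleep(100); // "do something for a while"
            return 9;
        });
        System.out.println(future.get()); // blocks until done: always prints 9
        ex.shutdown();
    }
}
```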
When a thread is created, one of its attributes defines whether it is joinable or detached. Only threads that are created as joinable can be joined. If a thread is created as detached, it can never be joined.
Yield is an action that occurs in a computer program during multithreading: it forces the processor to relinquish control of the currently running thread and sends the thread to the end of the running queue of the same scheduling priority.
You learn how to use executor services (and thread pools) and how they work.
From Wikipedia:
A thread pool is a software design pattern for achieving concurrency of execution in a computer program. Often also called a replicated workers or worker-crew model, a thread pool maintains multiple threads waiting for tasks to be allocated for concurrent execution by the supervising program. By maintaining a pool of threads, the model increases performance and avoids latency in execution due to frequent creation and destruction of threads for short-lived tasks.
An executor service (or a thread-pool executor) includes:
- A thread pool, and
- Methods to manage this thread pool:
  - `submit()`: Push tasks to the thread pool.
  - `shutdown()`: Stop/join all threads in the pool.
  - ...
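A minimal Java sketch of these methods in action (assuming Java's `ExecutorService`):

```java
import java.util.concurrent.*;

// Java sketch: a fixed-size thread pool running several short tasks.
public class PoolDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 10; ++i) {
            final int taskId = i;
            pool.submit(() ->
                System.out.println("Task " + taskId + " on " +
                                   Thread.currentThread().getName()));
        }
        pool.shutdown();                            // stop accepting new tasks
        pool.awaitTermination(5, TimeUnit.SECONDS); // wait for running tasks
    }
}
```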
A race condition or race hazard is the condition of an electronics, software, or other system where the system's substantive behavior is dependent on the sequence or timing of other uncontrollable events.
This program illustrates race condition: Each time you run the program, the results displayed are different.
Data race specifically refers to the non-synchronized conflicting "memory accesses" (or actions, or operations) to the same memory location.
I use a problem statement for illustration: in the range [1..N], count the integers that are divisible by 2 or by 3.
For example, with N = 8, the integers that match the requirement are 2, 3, 4, 6 and 8. Hence, the result is five numbers.
Of course, you can easily solve the problem with a single loop. However, we need to go deeper: there is another solution using a "marking array".
- Let `a` be an array of boolean values. Initialize all elements of `a` to `false`.
- For `i` in range `1` to `N`: if `i` is divisible by `2` or by `3`, assign `a[i] = true`.
- Finally, the result is the number of `true` values in `a`.
With `N = 8`, the elements `a[2], a[3], a[4], a[6], a[8]` are marked `true`.
About the source code, there are two versions:
- Version 01 uses traditional single threading to let you get started.
- Version 02 uses multithreading:
  - Thread `markDiv2` marks `true` for all numbers divisible by 2.
  - Thread `markDiv3` marks `true` for all numbers divisible by 3.
  - The rules of threading tell us that `a[6]` might be accessed by both threads at the same time ==> DATA RACE. However, in the end `a[6] = true` is obvious, so the final result is still correct ==> no race condition.
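A minimal Java sketch of version 02 (thread names follow the notes; this is an illustration, not the repo's exact code):

```java
// Both threads may write a[6] at the same time (a data race), but both
// write the same value `true`, so the counted result is still correct.
public class MarkingArray {
    static final int N = 8;
    static boolean[] a = new boolean[N + 1];

    public static void main(String[] args) throws InterruptedException {
        Thread markDiv2 = new Thread(() -> {
            for (int i = 2; i <= N; i += 2) a[i] = true;
        });
        Thread markDiv3 = new Thread(() -> {
            for (int i = 3; i <= N; i += 3) a[i] = true;
        });
        markDiv2.start(); markDiv3.start();
        markDiv2.join();  markDiv3.join();

        int count = 0;
        for (int i = 1; i <= N; ++i) if (a[i]) ++count;
        System.out.println(count); // 5 for N = 8
    }
}
```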
Many people are confused about race conditions and data races.
- There is a case where a race condition appears without a data race: that is demo version A.
- There is also a case where a data race appears without a race condition: that is demo version B.
Small note: from a deeper perspective, demo version A still causes a data race (namely... on the output console terminal, hahaha).
Usually, a race condition happens together with a data race. A race condition often occurs when two or more threads perform operations on the same memory area (a data race) and the results of the computations depend on the order in which these operations are performed.
Concurrent accesses to shared resources can lead to unexpected or erroneous behavior, so parts of the program where the shared resource is accessed need to be protected in ways that avoid the concurrent access. This protected section is the critical section or critical region.
Multithreading makes things run in parallel/concurrently. Therefore we need techniques that handle the control flow to:
- make sure the app runs correctly, and
- avoid race conditions.
There are several techniques, which are divided into two types: synchronization and non-synchronization.
|  | SYNCHRONIZATION | NON-SYNCHRONIZATION |
|---|---|---|
| Description | Block threads until a condition is satisfied | Do not block threads |
| Other names | Blocking | Non-blocking, lock-free |
| Techniques | - Low-level: mutex, semaphore, condition variable<br>- High-level: synchronized block, blocking queue, barrier and latch | Atomic, thread-local storage |
| Pros | - Gives you in-depth control<br>- Cooperation among threads | - App's performance may be better (compared to synchronization)<br>- Avoids deadlock |
| Cons | - Hard to control in complex synchronization<br>- May be dangerous (when deadlock appears) | Usually too simple |
Please note that we cannot replace sync techniques with non-sync techniques and vice versa. Each technique has its own use cases, strengths and weaknesses.
In practice:
- We mostly use synchronized blocks, blocking queues, atomic operations and mutexes.
- We usually combine sync and non-sync techniques.
------------------ End of important notes ------------------
The mutex is the synchronization primitive used to prevent race conditions.
If a resource is subject to race conditions (i.e. it is accessed by more than one thread), we create a mutex associated with it:
- Lock the mutex before accessing the resource.
- Release the mutex after accessing the resource.
The code block between the lock and release actions is executed by only one thread at a time.
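A minimal Java sketch using `ReentrantLock` as the mutex (an illustration, not the repo's exact code); note the `finally` block, which guarantees the mutex is released even if the protected code throws:

```java
import java.util.concurrent.locks.ReentrantLock;

// Java sketch: protect a shared counter with a mutex.
public class MutexDemo {
    static final ReentrantLock mutex = new ReentrantLock();
    static int counter = 0;

    static void increment() {
        mutex.lock();
        try {
            counter++; // only one thread at a time executes this
        } finally {
            mutex.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { for (int i = 0; i < 100000; ++i) increment(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 100000; ++i) increment(); });
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println(counter); // always 200000
    }
}
```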
Synchronized blocks are blocks of code that prevent concurrent execution by multiple threads: only one thread can execute a synchronized block at a time, even though multiple threads may be running through it. A synchronized block can be built with a mutex:
- Lock the mutex at the beginning of the block.
- Unlock the mutex at the end of the block.
Some programming languages, such as Java and C#, support synchronized blocks out of the box.
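For example, in Java the same counter can be protected with a built-in `synchronized` block; the JVM locks and unlocks the monitor for us:

```java
// Java sketch: a counter protected by a synchronized block.
public class SyncBlockDemo {
    static final Object lock = new Object();
    static int counter = 0;

    static void increment() {
        synchronized (lock) { // lock on entry, unlock on exit
            counter++;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { for (int i = 0; i < 100000; ++i) increment(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 100000; ++i) increment(); });
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println(counter); // always 200000
    }
}
```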
A simple demo of deadlock: forgetting to release a mutex.
There are 2 workers, "foo" and "bar".
They try to access resources A and B (which are protected by mutResourceA and mutResourceB).
Scenario:
```
foo():
    synchronized (mutResourceA):
        access resource A
        synchronized (mutResourceB):
            access resource B

bar():
    synchronized (mutResourceB):
        access resource B
        synchronized (mutResourceA):
            access resource A
```
After foo has locked A and bar has locked B, foo and bar may wait for each other forever ==> deadlock occurs.
Monitor: Concurrent programming meets object-oriented programming.
- When concurrent programming became a big deal, object-oriented programming did too.
- People started to think about ways to make concurrent programming more structured.
A monitor is a thread-safe class, object, or module that wraps around a mutex in order to safely allow access to a method or variable by more than one thread.
The reason for using a reentrant lock is to avoid deadlock caused by, for example, recursion.
A reentrant lock is a synchronization primitive that may be acquired multiple times by the same thread. Internally, it uses the concepts of "owning thread" and "recursion level" in addition to the locked/unlocked state used by primitive locks.
In the locked state, some thread owns the lock; in the unlocked state, no thread owns it.
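A minimal Java sketch with `ReentrantLock` (an illustration): the same thread locks recursively without deadlocking, and `getHoldCount()` exposes the recursion level:

```java
import java.util.concurrent.locks.ReentrantLock;

// Java sketch: the same thread acquires a ReentrantLock recursively.
// A non-reentrant mutex would deadlock on the second lock() call.
public class ReentrantDemo {
    static final ReentrantLock lock = new ReentrantLock();

    static int factorial(int n) {
        lock.lock(); // recursion level goes up by 1
        try {
            System.out.println("hold count = " + lock.getHoldCount());
            return (n <= 1) ? 1 : n * factorial(n - 1);
        } finally {
            lock.unlock(); // recursion level goes down by 1
        }
    }

    public static void main(String[] args) {
        System.out.println(factorial(3)); // 6
    }
}
```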
In cases where you must wait for a number of tasks to be completed before an overall task can proceed, barrier synchronization can be used.
There are two types of barriers:
- Cyclic barrier: A general, reusable barrier.
- Countdown latch: there is no possibility to increase or reset the counter, which makes the latch a single-use barrier (see the sketch below).
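A minimal Java sketch of a countdown latch (assuming Java's `CountDownLatch`):

```java
import java.util.concurrent.CountDownLatch;

// Java sketch: main() proceeds only after all 3 workers count down.
public class LatchDemo {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(3);
        for (int i = 0; i < 3; ++i) {
            final int id = i;
            new Thread(() -> {
                System.out.println("Worker " + id + " done");
                latch.countDown();
            }).start();
        }
        latch.await(); // blocks until the counter reaches 0
        System.out.println("All workers finished, overall task proceeds");
    }
}
```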
In many situations, data is read more often than it is modified or written. In these cases, you can allow threads to read concurrently while holding the lock and allow only one thread to hold the lock when data is modified. A multiple-reader single-writer lock (or read-write lock) does this.
A read-write lock is acquired either for reading or writing, and then is released. The thread that acquires the read-write lock must be the one that releases it.
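A minimal Java sketch (assuming Java's `ReentrantReadWriteLock`):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Java sketch: many readers may hold the read lock concurrently,
// but the write lock is exclusive.
public class RwLockDemo {
    static final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    static int data = 0;

    static int read() {
        rw.readLock().lock();
        try { return data; }            // concurrent with other readers
        finally { rw.readLock().unlock(); }
    }

    static void write(int value) {
        rw.writeLock().lock();
        try { data = value; }           // exclusive access
        finally { rw.writeLock().unlock(); }
    }

    public static void main(String[] args) {
        write(42);
        System.out.println(read()); // 42
    }
}
```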
In an exam, each candidate is given 2 scratch papers. Write a program to illustrate this scenario.
The program combines the 2 scratch papers into one test package, concurrently.
The problem in version 01 is:
- When "makeOnePaper" produces too fast, a lot of pending papers pile up...
This version 02 solves the problem:
- Use a semaphore to restrict "makeOnePaper": only make new papers when a package is finished.
The problem in this version is DEADLOCK, due to a mistake in the semaphore synchronization.
A car is manufactured at each stop on a conveyor belt in a car factory.
A car is constructed from the following parts: chassis and tires. Thus there are 2 tasks in manufacturing a car. However, the 4 tires cannot be added until the chassis is placed on the belt.
There are:
- 2 production lines (i.e. 2 threads) of making tires.
- 1 production line (i.e. 1 thread) of making chassis.
Write a program to illustrate this scenario.
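A hedged Java sketch of the gating idea (names are illustrative, not the repo's actual code): the tire lines block on a semaphore until the chassis line releases permits:

```java
import java.util.concurrent.Semaphore;

// Java sketch: tires may only be mounted after the chassis signals.
public class CarFactory {
    // 0 permits initially: the tire lines must wait for the chassis line
    static final Semaphore chassisPlaced = new Semaphore(0);

    public static void main(String[] args) {
        for (int line = 1; line <= 2; ++line) {
            final int id = line;
            new Thread(() -> {
                try {
                    chassisPlaced.acquire(2); // each line mounts 2 of the 4 tires
                    System.out.println("Tire line " + id + ": tires mounted");
                } catch (InterruptedException ignored) { }
            }).start();
        }
        new Thread(() -> {
            System.out.println("Chassis line: chassis placed on the belt");
            chassisPlaced.release(4); // allow all 4 tires to be mounted
        }).start();
    }
}
```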
Condition variables are synchronization primitives that enable threads to wait until a particular condition occurs. Condition variables are user-mode objects that cannot be shared across processes.
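A minimal Java sketch using `Lock`/`Condition` (an illustration): the waiter re-checks the predicate in a loop to guard against spurious wakeups:

```java
import java.util.concurrent.locks.*;

// Java sketch: one thread waits on a condition variable until
// another thread sets the flag and signals.
public class CondVarDemo {
    static final Lock lock = new ReentrantLock();
    static final Condition ready = lock.newCondition();
    static boolean isReady = false;

    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(() -> {
            lock.lock();
            try {
                while (!isReady) ready.await(); // re-check the predicate
                System.out.println("Condition met, continuing");
            } catch (InterruptedException ignored) {
            } finally { lock.unlock(); }
        });
        waiter.start();

        Thread.sleep(100);
        lock.lock();
        try {
            isReady = true;
            ready.signal(); // wake up the waiter
        } finally { lock.unlock(); }
    }
}
```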
A blocking queue is a queue that blocks when you:
- try to dequeue from it and the queue is empty, or...
- try to enqueue items to it and the queue is already full.
"Synchronous queue" is usually a synonym of "blocking queue". In Java, a synchronous queue has zero capacity (i.e. it does not store any value at all).
In some cases, a shared resource can be duplicated per thread: every thread gets its own copy of the resource. Then the race condition disappears.
BEFORE:
```
        X = 8 (X is a shared resource)
                  |
        ---------------------
        |         |         |
        v         v         v
    ThreadA   ThreadB   ThreadC
```
AFTER:
```
        ---------------------
        |         |         |
        v         v         v
    ThreadA   ThreadB   ThreadC
     X = 8     X = 8     X = 8
```
Thread-local storage helps you to avoid synchronization, because synchronization might be dangerous and hard to handle.
Applications of thread-local storage:
- Counter: each thread does its own counting job.
- Security: random functions often use an initialization seed. When multiple threads call a random function, the results may be the same for some threads, which may lead to security issues. Using synchronization of course solves this problem, but it has too much overhead. In this case thread-local storage is great: each thread uses an individual random generator with a different seed.
In the demo code, by using thread-local storage, each thread has its own counter. So the counter in one thread is completely independent of the counters in the other threads.
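A minimal Java sketch of the same idea with `ThreadLocal` (an illustration, not the repo's exact code):

```java
// Java sketch: each thread increments its own ThreadLocal counter,
// so no synchronization is needed and each thread prints 1000.
public class ThreadLocalCounter {
    static final ThreadLocal<Integer> counter = ThreadLocal.withInitial(() -> 0);

    public static void main(String[] args) throws InterruptedException {
        Runnable job = () -> {
            for (int i = 0; i < 1000; ++i) counter.set(counter.get() + 1);
            System.out.println(Thread.currentThread().getName() + ": " + counter.get());
        };
        Thread t1 = new Thread(job), t2 = new Thread(job);
        t1.start(); t2.start();
        t1.join();  t2.join();
    }
}
```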
Please read article "Volatile vs Atomic" in notes-articles.md for better understanding.
Problem statement: Find the integer in the range 1 to 100000 that has the largest number of divisors.
The solution without multithreading.
This source code file contains the solution using multithreading.
There are 2 phases:
- Phase 1:
  - Each worker finds the result on a specific range.
  - This phase uses multiple threads.
- Phase 2:
  - Based on the workers' results, the main function obtains the final result with the maximum numDiv.
  - This phase uses a single thread (i.e. the main function).
The differences between version C and version B are:
- Each worker finds the result on a specific range, and then updates the final result itself.
- So the main function does nothing.
The producer–consumer problem (also known as the bounded-buffer problem) is a classic example of a multi-process synchronization problem. The first version was proposed by Edsger W. Dijkstra in 1965.
In the producer-consumer problem, there is a producer producing something and a consumer consuming the products produced by the producer. The producer and consumer share the same fixed-size memory buffer.
The job of the producer is to generate data, put it into the buffer, and start generating data again, while the job of the consumer is to consume the data from the buffer.
In the later formulation of the problem, Dijkstra proposed multiple producers and consumers sharing a finite collection of buffers.
What are the problems here?
- The producer and consumer should not access the buffer at the same time.
- The producer should produce data only when the buffer is not full.
  - If the buffer is full, the producer shouldn't be allowed to put any data into the buffer.
- The consumer should consume data only when the buffer is not empty.
  - If the buffer is empty, the consumer shouldn't be allowed to take any data from the buffer.
Consider a situation where we have a file shared between many people.
If one person tries to edit the file, no other person should be reading or writing it at the same time; otherwise the changes will not be visible to them. However, if someone is only reading the file, others may read it at the same time.
In computer science, this situation is called the readers-writers problem.
What are the problems here?
- One set of data is shared among a number of processes.
- Once a writer is ready, it performs its write. Only one writer may write at a time.
- If a process is writing, no other process can read it.
- If at least one reader is reading, no other process can write.
The first solution is suboptimal, because it is possible that a reader R1 might have the lock, a writer W be waiting for the lock, and then a reader R2 requests access.
It would be unfair for R2 to jump in immediately, ahead of W; if that happened often enough, W would STARVE. Instead, W should start as soon as possible.
This is the motivation for the second readers–writers problem, in which the constraint is added that no writer, once added to the queue, shall be kept waiting longer than absolutely necessary. This is also called writers-preference.
In fact, the solutions implied by both problem statements can result in starvation - the first one may starve writers in the queue, and the second one may starve readers.
Therefore, the third readers–writers problem is sometimes proposed, which adds the constraint that no thread shall be allowed to starve; that is, the operation of obtaining a lock on the shared data will always terminate in a bounded amount of time.
Solution:
- The idea is to use a semaphore "serviceQueue" to preserve the ordering of requests (signaling must be FIFO).
The dining philosophers problem states that there are 5 philosophers sharing a circular table, and they eat and think alternately. There is a bowl of rice for each philosopher, and there are 5 chopsticks.
A philosopher needs both their right and left chopstick to eat.
A hungry philosopher may only eat if there are both chopsticks available.
Otherwise a philosopher puts down their chopstick and begins thinking again.
A solution of the dining philosophers problem is to use a semaphore to represent a chopstick.
A chopstick can be picked up by executing a wait operation on the semaphore and released by executing a signal operation on the semaphore.
The structure of an arbitrary philosopher `i` is given as follows:
```
while true do
    wait( chopstick[i] );
    wait( chopstick[ (i+1) % 5 ] );
    EATING THE RICE
    signal( chopstick[i] );
    signal( chopstick[ (i+1) % 5 ] );
    THINKING
```
What are the problems here?
- Deadlock.
- Starvation.
The above solution makes sure that no two neighboring philosophers can eat at the same time. But this solution can lead to a deadlock. This may happen if all the philosophers pick their left chopstick simultaneously. Then none of them can eat and deadlock occurs.
Some ways to avoid the deadlock are as follows (a sketch of the first approach follows this list):
- An even philosopher should pick the right chopstick and then the left chopstick, while an odd philosopher should pick the left chopstick and then the right chopstick.
- A philosopher should only be allowed to pick up their chopsticks if both are available at the same time.
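A hedged Java sketch of the first fix (illustrative, not the repo's code): even philosophers acquire the right chopstick first, odd philosophers the left, which breaks the circular wait:

```java
import java.util.concurrent.Semaphore;

// Java sketch: dining philosophers with asymmetric chopstick ordering.
public class Philosophers {
    static final int N = 5;
    static final Semaphore[] chopstick = new Semaphore[N];

    public static void main(String[] args) {
        for (int i = 0; i < N; ++i) chopstick[i] = new Semaphore(1);
        for (int i = 0; i < N; ++i) {
            final int id = i;
            new Thread(() -> {
                try {
                    int left = id, right = (id + 1) % N;
                    int first  = (id % 2 == 0) ? right : left; // even: right first
                    int second = (id % 2 == 0) ? left  : right;
                    chopstick[first].acquire();
                    chopstick[second].acquire();
                    System.out.println("Philosopher " + id + " is eating");
                    chopstick[second].release();
                    chopstick[first].release();
                } catch (InterruptedException ignored) { }
            }).start();
        }
    }
}
```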
For example, matrix A:
```
| 1 2 3 |
| 4 5 6 |
| 7 8 9 |
```
and vector b:
```
|  3 |
| -1 |
|  0 |
```
The multiplication of A and b is the vector:
```
|  1 |
|  7 |
| 13 |
```
Solution (see the sketch after this list):
- Separate matrix A into a list of rows.
- For each row, calculate the scalar product with vector b.
- Each row can be processed individually, so multithreading will get the job done.
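A minimal Java sketch of the row-per-thread idea (an illustration; each thread writes only its own slot of the result, so there is no race):

```java
// Java sketch: one thread per row of the matrix-vector product.
public class MatVec {
    public static void main(String[] args) throws InterruptedException {
        int[][] A = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        int[] b = { 3, -1, 0 };
        int[] result = new int[A.length];

        Thread[] workers = new Thread[A.length];
        for (int r = 0; r < A.length; ++r) {
            final int row = r;
            workers[r] = new Thread(() -> {
                int sum = 0; // scalar product of row `row` with b
                for (int c = 0; c < b.length; ++c) sum += A[row][c] * b[c];
                result[row] = sum; // each thread writes its own slot
            });
            workers[r].start();
        }
        for (Thread w : workers) w.join();
        for (int v : result) System.out.println(v); // 1, 7, 13
    }
}
```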
For example, matrix A:
```
| 1 3 5 |
| 2 4 6 |
```
and matrix B:
```
| 1 0 1  0 |
| 0 1 0  1 |
| 1 0 0 -2 |
```
The result of dot(A, B) is the matrix:
```
| 6 3 1 -7 |
| 8 4 2 -8 |
```
Blocking queues are strongly related to the producer-consumer problem:
- The `enqueue` method corresponds to the `produce` action, which creates an object and pushes it onto the rear of the queue.
- The `dequeue` method corresponds to the `consume` action, which removes an object from the front of the queue.
There are many ways to implement the producer-consumer problem, and they are similar to the ways of implementing blocking queues.
The internal data server of a company performs two main tasks for a request:
- check_auth_user()
- process_files()
The pseudo code is:
```
function process_request():
    if check_auth_user(), then:
        list_file_data = process_files()
        return list_file_data

function check_auth_user():
    check username, permissions, encryption...
    return (true or false)

function process_files():
    read file data from disk
    do some stuff...
    write log
    return file data
```
The problem: usually, these two tasks may take a long time. How can we improve the performance? Suppose this server serves employees inside the company.
We can assume that `check_auth_user()` usually returns true, so instead of waiting a long time for `check_auth_user()`, we can run `process_files()` in parallel.
BEFORE:
```
Main thread  ------------------->  ------------------->
             check_auth_user()     process_files()
```
AFTER:
```
                                  Thread join
                                  Synchronize
Main thread  ------------------->----------------->
             check_auth_user()   ^
                                 |
Child thread ------------------->---------
             process_files()
```
The most important and interesting things are here.
The first thing:
- Checking for authorization is CPU bound (and maybe network bandwidth bound).
- Processing files is disk bound.
By running `check_auth_user()` and `process_files()` in parallel, we can increase the performance.
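A hedged Java sketch of the "AFTER" diagram (function names follow the pseudocode above; the bodies are stand-ins): `process_files()` runs on a child thread while the main thread checks authorization, and `Future.get()` is the join/synchronize point:

```java
import java.util.concurrent.*;

// Java sketch: speculatively process files while checking authorization.
public class RequestHandler {
    public static void main(String[] args) throws Exception {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        Future<String> files = ex.submit(RequestHandler::processFiles); // child thread

        boolean authorized = checkAuthUser(); // main thread, in parallel
        if (authorized) {
            System.out.println(files.get()); // join/synchronize point
        } else {
            files.cancel(true); // discard the speculative work
        }
        ex.shutdown();
    }

    static boolean checkAuthUser() throws InterruptedException {
        Thread.sleep(200); // pretend to check username, permissions...
        return true;
    }

    static String processFiles() throws InterruptedException {
        Thread.sleep(300); // pretend to read file data from disk
        return "file data";
    }
}
```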
The second thing:
The function `process_files()` not only reads files but also writes logs. After reading the files, we can return the file data to the user immediately; the "writing logs" task can be performed later.
```
             Synchronize
------------------------|----------------------------> Time
CPU    Check auth user  |  Do other tasks
Disk   Read file        |  Write log
                        |
             User is authorized
             and files are loaded
```
Let's have a look at the code. I hope you can learn something good from it.
To implement an executor service, you need to focus on the worker function (the function that is executed by threads in the pool). For an idle thread:
- Pick a task (a job) in the queue.
- Do the task.
Step 1 requires synchronization (via a blocking queue, or a plain queue guarded by a mutex/synchronized block/condition variable...).
It looks simple, but if we need to expand the features of our thread pool, things start to get complicated. You may have to take care of synchronization everywhere:
- When a task is dequeued.
- When a task is done.
- When all tasks are done.
- When users want to shut down the thread pool.
- ...
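A hedged Java sketch of such a worker function (illustrative, not the repo's code): workers block on a queue, and a "poison pill" task implements shutdown:

```java
import java.util.concurrent.*;

// Java sketch: a tiny thread pool built on a blocking queue.
public class TinyThreadPool {
    static final Runnable POISON = () -> { };
    final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();

    TinyThreadPool(int nThreads) {
        for (int i = 0; i < nThreads; ++i) {
            new Thread(() -> {
                try {
                    while (true) {
                        Runnable task = tasks.take(); // step 1: synchronized pick
                        if (task == POISON) return;   // shutdown signal
                        task.run();                   // step 2: do the task
                    }
                } catch (InterruptedException ignored) { }
            }).start();
        }
    }

    void submit(Runnable task) { tasks.add(task); }

    void shutdown(int nThreads) {
        for (int i = 0; i < nThreads; ++i) tasks.add(POISON);
    }

    public static void main(String[] args) {
        TinyThreadPool pool = new TinyThreadPool(2);
        for (int i = 0; i < 5; ++i) {
            final int id = i;
            pool.submit(() -> System.out.println("task " + id));
        }
        pool.shutdown(2);
    }
}
```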