Chapter 10 Kernel Synchronization Methods #74

jason--liu · 2021-07-11T04:54:42Z

原子操作

Atomic operations provide instructions that execute atomically—without interruption.
内核提高了两种原子操作，一种是对整数操作，一种是对比特位操作。一些体系架构提高了原子操作指令，而一些架构则是通过Lock内存总线来实现的。

原子整数操作

原子整数操作由atomic_t表示。

typedef struct {
    volatile int counter;
}atomic_t

定义及操作原子变量示例：

atomic_t v; 					/* define v */
atomic_t u = ATOMIC_INIT(0); 	/* define u and initialize it to zero */

atomic_set(&v, 4); 		/* v = 4 (atomically) */
atomic_add(2, &v); 		/* v = v + 2 = 6 (atomically) */
atomic_inc(&v); 		/* v = v + 1 = 7 (atomically) */

原子变量操作集

原子变量与屏障

Atomicity ensures that instructions occur without interruption and that they complete either in their entirety or not at all. Ordering, on the other hand, ensures that the desired, relative ordering of two or more instructions

原子变量只能保证操作不能打断，但是不能保证指令的前后顺序。
在大多数架构上，与更复杂的同步方法相比，一两个原子操作会产生更少的开销和更少的缓存抖动。

64位原子操作

定义：

typedef struct {
    volatile long counter;
} atomic64_t;

64位原子操作

为了可移植性，建议使用32位原子变量

原子位操作

原子位操作，操作的是内存地址。
例如：

unsigned long word = 0;
set_bit(0, &word); 		/* bit zero is now set (atomically) */
set_bit(1, &word);     /* bit one is now set (atomically) */
printk(“%ul\n”, word); /* will print “3” */
clear_bit(1, &word);      /* bit one is now unset (atomically) */
change_bit(0, &word); /* bit zero is flipped; now it is unset (atomically) */
/* atomically sets bit zero and returns the previous value (zero) */
if (test_and_set_bit(0, &word)) {
/* never true ... */
}
/* the following is legal; you can mix atomic bit instructions with normal C */
word = 7;

原子位操作集

对于不需要原子操作的位操作指令前面有两个__“”，比如test_bit()对应的普通位操作指令是__test_bit()
其他原子位操作函数

// 找到第一个设置或者位设置的比特位
int find_first_bit(unsigned long *addr, unsigned int size)
int find_first_zero_bit(unsigned long *addr, unsigned int size)

类似的如果输入参数是word类型，可以省略一个参数

 unsigned long __ffs(unsigned long word)
#define ffz(x)  __ffs(~(x))

自旋锁

对应临界区数据保护，可以使用自旋锁。比如将一个structure中的数据修改移除然后添加到另外一个structure中。
自旋锁特点：

It is not wise to hold a spin lock for a long time.This is the nature of the spin lock: a lightweight single-holder lock that should be held for short durations.

自旋时间应该尽可能短。

自旋锁方法

#include <linux/spinlock.h>  // 通用接口
#include <asm/spinlock.h> // 架构相关
DEFINE_SPINLOCK(mr_lock);

spin_lock(&mr_lock);
/* critical region ... */
spin_unlock(&mr_lock);

在SMP系统上，只有一个线程能进入临界区，其他线程只有自旋等待；在UP系统上，spinlock仅仅是disable/enable抢占
spinlock不能递归

This means that if you attempt to acquire a lock you already hold, you will spin, waiting for yourself to release the lock. But because you are busy spinning, you will never release the lock and you will deadlock. Be careful!

spinlock可以在中断中使用，而信号量不行；如果在中断处理函数中使用spinlock，在获取锁前还必须关闭local interrupt，否则其他代码路径持有锁时中断处理程序可能会中断内核代码并尝试重新获取锁，这样中断处理函数回spin，而锁持有者因为被中断没法继续运行。这样就出现了double-acquire deadlock。因此这种情况下需要disable local irq，内核已经提供了相关接口。

DEFINE_SPINLOCK(mr_lock);
unsigned long flags;
spin_lock_irqsave(&mr_lock, flags); // save the current state and disable irq
/* critical region ... */
spin_unlock_irqrestore(&mr_lock, flags);

在UP系统中，上面的例子只需要关中断和关闭抢占就可以了。如果已经知道中断是打开的，那么可以直接使用

DEFINE_SPINLOCK(mr_lock);
spin_lock_irq(&mr_lock);
/* critical section ... */
spin_unlock_irq(&mr_lock);

可以打开CONFIG_DEBUG_SPINLOCK调试spinlock，比如对未初始化的spinlock以及未加锁就lock的代码，调试lock生命周期，可以打开CONFIG_DEBUG_LOCK_ALLOC

其他spinclo方法

spin_lock_init()可以动态初始化，spin_trylock尝试获取锁，如果获取不到，函数立即返回0，如果获取到锁，返回非0.

spinlock和中断下半部

因为下半部可以抢占进程上下文，如果进程上下文和下半部有数据共享，需要加锁。
spin_lock_bh() 获取锁并关下半部，spin_unlock_bh过程则相反l。
对于只有一类tasklet使用的共享数据，没必要加锁，因为相同类型的tasklet不能同时运行；如果是不同类型的tasklet共享数据，那么需要spinlock保护，这里不需要关闭下半部，因为同一个处理器上tasklet不会抢占另外tasklet.
对于softirq，无论是否是相同类型，都需要spinclock保护，因为相同类型的softirq可以同时运行，softirq也不能抢占当前正在运行的softirq，因此可以不用关闭下半部

读写自旋锁

读写自旋锁使用场景

.When the list is updated (written to), it is important that no other threads of execution concurrently write to or read from the list. Writing demands mutual exclusion. On the other hand, when the list is searched (read from), it is only important that nothing else writes to the list. Multiple concurrent readers are safe so long as there are no writers.

task list就是使用读写自旋锁保护的。
读写自旋锁规则：

One or more readers can concurrently hold the reader lock.The writer lock, conversely, can be held by at most one writer with no concurrent readers.

使用示例：

DEFINE_RWLOCK(mr_rwlock);
//Then, in the reader code path:
read_lock(&mr_rwlock);
/* critical section (read only) ... */
read_unlock(&mr_rwlock);

//Finally, in the writer code path:
write_lock(&mr_rwlock);
/* critical section (read and write) ... */
write_unlock(&mr_lock);

读锁不能被提升为写锁，否则可能引起死锁。

read_lock(&mr_rwlock);
write_lock(&mr_rwlock);

当写者自旋的时候，等待所有读者释放，包括“自己”，引起死锁。
读写锁接口


读写锁更倾向于读者。如果读者持锁，则写者自旋，而有其他读者来时可以获取锁，只有当所有读者释放了，写者才能拿到锁，者有可能引起写者“饥饿”

信号量（semaphores）

信号量是可以睡眠的锁。如果进程获取不到信号量，将被放入等待队列并睡眠。

信号量适用于持有锁时间比较长的情况
信号量只能用于进程上下文
获取了信号量就不能获取自旋锁，因为随时可能睡眠

信号量不会关闭抢占，持有信号量的进程也可以被抢占，因此信号量不会引起调度延时。

计数和二进制信号量

信号量允许同时有多个锁持有者。信号量的持有者个数可以在初始化时设置，这个数量称为usage count。通常情况下，设置最多只有一个信号量持有者，也就是二进制信号量或mutex。

定义和使用信号量

初始化

struct semaphore name;
sema_init(&name, count); //动态初始化
// 静态初始化互斥锁
static DECLARE_MUTEX(name);
// 动态初始化互斥锁
init_MUTEX(sem);

使用信号量

/* define and declare a semaphore, named mr_sem, with a count of one */
static DECLARE_MUTEX(mr_sem);
/* attempt to acquire the semaphore ... */
if (down_interruptible(&mr_sem)) {
/* signal received, semaphore not acquired ... */
}
/* critical region ... */
/* release the given semaphore */
up(&mr_sem)

信号量接口


down_trylock尝试获取信号量，如果已经被持有，立即返回非0，否则返回0并获取到锁，不会阻塞。
down_interruptible()当收到信号时返回-EINTR。
up()释放信号量，如果等待队列不为空，唤醒下一个进程

读写信号量

读写信号量由struct rw_semaphore表示。
静态声明：

static DECLARE_RWSEM(name);

动态声明：

init_rwsem(struct rw_semaphore *sem)

所有读写信号量都是互斥锁，即count位1。只要没有写者，任何数量的读者都可以获取锁，相反，只有一个写者能获取锁，所有的读写锁都是不可中断睡眠，因此只有一种获取锁函数down_read()和down_write()
down_read_trylock()和down_write_trylock()如果获取到锁返回非0，否则返回0，这和普通信号量是的接口函数行为是相反。
downgrate_write()可以将写者降级为读者。使用读写信号量和读写自旋锁一样，需要在读写路径清晰明确的情况下使用。

互斥锁（mutexes）

互斥锁由struct mutex表示，它的机制和count位1的信号量相似，但是有更简单的接口和更好的性能，以及额外的限制。
静态声明互斥锁：
``C
DEFINE_MUTEX(name);

动态初始化互斥锁：
```C
mutex_init(&mutex);

使用示例：

mutex_lock(&mutex);
/* critical region ... */
mutex_unlock(&mutex);

看起来确实比二进制信号量使用简单很多。
对于使用互斥锁的限制：

Only one task can hold the mutex at a time.That is, the usage count on a mutex is always one.
Whoever locked a mutex must unlock it.That is, you cannot lock a mutex in one context and then unlock it in another.
Recursive locks and unlocks are not allowed
A process cannot exit while holding a mutex.
A mutex cannot be acquired by an interrupt handler or bottom half, even with mutex_trylock().
A mutex can be managed only via the official API: It must be initialized via the methods described in this section and cannot be copied, hand initialized, or reinitialized.

互斥锁最有用的部分可能是调试部分了，内核可以打开CONFIG_DEBUG_MUTEX来进行检查，比如是否符合上诉限制。

信号量vs互斥锁

二者使用原则：

Unless one of mutex’s additional constraints prevent you from using them, prefer the new mutex type to semaphores.

spinlock vs互斥锁

使用原则：

. Only a spin lock can be used in interrupt context, whereas only a mutex can be held while a task sleeps.

完成量（completion variables）

Using completion variables is an easy way to synchronize between two tasks in the kernel when one task needs to signal to the other that an event has occurred.

比如vfork系统调用使用完成量来唤醒父进程当子进程退出或开始执行。
完成量由struct completion表示
静态初始化：

DECLARE_COMPLETION(mr_comp);

动态初始化

init_completion()

针对指定的完成量，代码中可以调用wait_for_completion()来等待。如果事件发生了，调用complete来唤醒所有等待进程。

对于完成量的使用，可以参考kernel/sched.c和kernel/fork.c

BKL

TODO

顺序锁（dequential locks）

什么是顺序锁

It provides a simple mechanism for reading and writing shared data. It works by
maintaining a sequence counter.

工作机制

Whenever the data in question is written to, a lock is obtained and a sequence number is incremented. Prior to and after reading the data, the sequence number is read. If the values are the same, a write did not begin in the middle of the read. Further, if the values are even, a write is not underway. (Grabbing the write lock makes the value odd, whereas releasing it makes it even because the lock starts at zero.)

定义seq lock

seqlock_t mr_seq_lock = DEFINE_SEQLOCK(mr_seq_lock);

写路径是：

write_seqlock(&mr_seq_lock);
/* write lock is obtained... */
write_sequnlock(&mr_seq_lock);

上面看起来和spin lock相似，但在读路径上就不同了

unsigned long seq;
do {
seq = read_seqbegin(&mr_seq_lock);
/* read data here ... */
} while (read_seqretry(&mr_seq_lock, seq));

Seq locks are useful to provide a lightweight and scalable lock for use with many readers and a few writers..An acquisition of the write lock always succeeds as long as there are no other writers.
读者不会影响写者，这与读写自旋锁和读写信号量不同。挂起的写入者不断地导致读取循环（前面的例子）重复，直到不再有任何写入者持有锁。下面场景适合顺序锁：

Your data has a lot of readers.
Your data has few writers.
Although few in number, you want to favor writers over readers and never allow readers to starve writers.
Your data is simple, such as a simple structure or even a single integer that, for whatever reason, cannot be made atomic.

一个实际使用例子是jiffies

u64 get_jiffies_64(void)
{
unsigned long seq;
u64 ret;
do {
seq = read_seqbegin(&xtime_lock);
ret = jiffies_64;
} while (read_seqretry(&xtime_lock, seq));
return ret;
}

在始终中断中更新jiffies

write_seqlock(&xtime_lock);
jiffies_64 += 1;
write_sequnlock(&xtime_lock);

抢占关闭

在一些场景下，代码路径可以不用spinlock，比如per-cpu变量数据，如果per数据每个cpu都不同，没必要用锁因为只有当前cpu才能访问它，但是需要关抢占。场景如下：

task A manipulates per-processor variable foo, which is not protected by a lock
task A is preempted
task B is scheduled
task B manipulates variable foo
task B completes
task A is rescheduled
task A continues manipulating variable foo

为了解决这个问题，代码中可以使用preempt_disable()，这个函数是可嵌套的，可以多次调用，但需要有对应次数的preempt_enable()。
如果计数为0，可抢占。

对于per-cpu的数据问题，有更简洁的方式来保护数据，可以使用get_cpu()，这个函数会关抢占，使用流程如下。

int cpu;
/* disable kernel preemption and set “cpu” to the current processor */
cpu = get_cpu();
/* manipulate per-processor data ... */
/* reenable kernel preemption, “cpu” can change and so is no longer valid */
put_cpu();

内存屏障

When dealing with synchronization between multiple processors or with hardware devices, it is sometimes a requirement that memory-reads (loads) and memory-writes (stores) issue in the order specified in your program code.

rmb()

The rmb() method provides a read memory barrier. It ensures that no loads are reordered across the rmb() call.That is, no loads prior to the call will be reordered to after
the call, and no loads after the call will be reordered to before the call.

wmb()

The wmb() method provides a write barrier. It functions in the same manner as rmb(),
but with respect to stores instead of loads—it ensures no stores are reordered across the
barrier

mb()

The mb() call provides both a read barrier and a write barrier. No loads or stores will
be reordered across a call to mb().

常用内存屏障接口


如果架构不支持乱序执行（out-of-order）,wmb()是空函数，比如x86。

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Chapter 10 Kernel Synchronization Methods #74

Chapter 10 Kernel Synchronization Methods #74

jason--liu commented Jul 11, 2021 •

edited

Loading

Chapter 10 Kernel Synchronization Methods #74

Chapter 10 Kernel Synchronization Methods #74

Comments

jason--liu commented Jul 11, 2021 • edited Loading

原子操作

原子整数操作

原子变量与屏障

64位原子操作

原子位操作

自旋锁

自旋锁方法

其他spinclo方法

spinlock和中断下半部

读写自旋锁

信号量（semaphores）

计数和二进制信号量

定义和使用信号量

读写信号量

互斥锁（mutexes）

信号量vs互斥锁

spinlock vs互斥锁

完成量（completion variables）

BKL

顺序锁（dequential locks）

抢占关闭

内存屏障

jason--liu commented Jul 11, 2021 •

edited

Loading