Work queues were added in Linux kernel 2.6. They are another way of deferring work: the work is handed off to a kernel thread, so this bottom half always runs in process context. The workqueue mechanism also lets users create a kernel thread and bind work to it.

A work item describing which function to execute is put on a queue. An independent thread serves as the asynchronous execution context. The queue is called a workqueue and the thread is called a worker.

This introduces the concept of a work item: essentially a function pointer to the function that should be executed asynchronously. A work item is also the basic element inside a workqueue. A work item can be executed in either a thread or the BH (softirq) context.
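
For reference, this is roughly what the structure looks like in include/linux/workqueue.h (abridged; lockdep/debug fields omitted):

typedef void (*work_func_t)(struct work_struct *work);

struct work_struct {
    atomic_long_t data;        /* flags plus an identifier of the pool/pwq the item is (or was) on */
    struct list_head entry;    /* links the item into a worker pool's worklist */
    work_func_t func;          /* the function to execute asynchronously */
};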

kworkers are the kernel threads responsible for executing the work items on a workqueue. These workers are managed inside worker pools.

The number of worker pools is not fixed. There are at least two worker pools: one for normal work items and one for high-priority work. Additional worker pools can also exist, so the set is dynamic.

A (bound) worker pool is tied to a single CPU.

We need to be clear about the relationships among work item, worker, worker pool, CPU, and workqueue:

  • workqueue to work item is one-to-many: a work item is never put on more than one workqueue.
  • worker to worker pool is many-to-one: a worker can only belong to one worker pool.
  • worker pool to CPU is roughly many-to-one: one CPU can host several worker pools, but a worker pool can be bound to only one CPU; see the cpu field of struct worker_pool.
  • worker pool and workqueue are not in direct correspondence either: different work items from the same workqueue may execute in different worker pools.
  • worklist to worker pool is one-to-one: every worker pool has its own worklist (the worklist field of struct worker_pool), and the pending work items on it are executed one by one.
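
For reference, an abridged sketch of struct worker_pool from kernel/workqueue.c (the exact field set varies between kernel versions):

struct worker_pool {
    raw_spinlock_t lock;        /* protects the pool */
    int cpu;                    /* CPU this pool is bound to, -1 for unbound pools */
    int node;                   /* NUMA node */
    int id;                     /* pool ID */
    unsigned int flags;

    struct list_head worklist;  /* pending work items, executed one by one */

    int nr_workers;             /* total number of workers in the pool */
    int nr_idle;                /* currently idle workers */
    /* ... idle/busy worker management, pool attributes, etc. ... */
};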

As long as there are one or more runnable workers on the CPU, the worker-pool doesn’t start execution of a new work, but, when the last running worker goes to sleep, it immediately schedules a new worker so that the CPU doesn’t sit idle while there are pending work items. This allows using a minimal number of workers without losing execution bandwidth. Seen this way, would it matter if a worker pool had only a single worker? Since a new worker is only started once the running workers have gone to sleep, does that mean only one worker is runnable at any moment? If so, why design for multiple worker threads at all?

As described above, work queues defer work into a kernel thread, so the deferred work runs in process context and is allowed to sleep. That makes it easy to decide between using a workqueue and a softirq/tasklet:

  • If the deferred work needs to sleep, then a workqueue is used (see the sketch below).
  • If the deferred work does not need to sleep, then a softirq or tasklet is used.
[Mastering Workqueue in Linux Kernel Programming Full Tutorial](https://embetronicx.com/tutorials/linux/device-drivers/workqueue-in-linux-kernel/)
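
A minimal sketch of the first case (handler and item names are made up): because the handler runs in process context it may sleep, which a softirq/tasklet handler must never do.

#include <linux/workqueue.h>
#include <linux/delay.h>

static void my_deferred_fn(struct work_struct *work)
{
    msleep(100);                   /* sleeping is allowed in process context */
    pr_info("deferred work finished\n");
}

static DECLARE_WORK(my_work, my_deferred_fn);

/* e.g. from an interrupt handler:  schedule_work(&my_work); */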

A "workqueue" is a list of tasks to perform, along with a (per-CPU) kernel thread to execute those tasks.

Workqueue subsystem init process

start_kernel
    // 1. set up worker pools and the system workqueues early, so work items
    //    can already be queued (no kworker threads exist yet)
    workqueue_init_early();
    rest_init
        kernel_init
            kernel_init_freeable
                // 2. create the real kworker threads and start executing
                //    work queued earlier
                workqueue_init();
                // 3. set up the CPU topology (pods/affinity scopes) used by
                //    unbound workqueues
                workqueue_init_topology();

The fixed worker pools on each CPU

Each CPU always has:

  • 2 BH worker pools: one normal-priority pool and one high-priority pool;
  • 2 cpu (thread-context) worker pools: one normal-priority pool and one high-priority pool.
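
This is roughly how those pools are defined in kernel/workqueue.c (abridged; the names follow recent kernels, and the BH pools only exist once BH workqueue support was merged):

enum {
    NR_STD_WORKER_POOLS = 2,    /* one normal-priority and one high-priority pool */
};

/* the per-CPU worker pools for thread-context work */
static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS], cpu_worker_pools);

/* the per-CPU BH worker pools */
static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS], bh_worker_pools);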

Why are workqueue and worker pool not 1:1?

Because the queues the kernel creates by default are shared, every driver may put its own work items on the same queue. The queue then gets crowded, and if some item's code runs for a long time or calls delay() for a particularly long time, your item may not get executed for quite a while!

So in the early days many driver developers created their own workqueue and added their own work to it. In the Linux 2.x era, creating a workqueue also created kernel threads that belonged to that workqueue; these threads were "private". That was convenient for driver developers, but with every driver creating a workqueue at the slightest excuse there were far too many threads, which wasted system resources and hurt efficiency. So in the Linux 3.x era the community decoupled workqueues from kernel threads: the kernel itself creates an appropriate number of worker threads ahead of time, shared by all drivers, and a call to alloc_workqueue() only creates the workqueue "shell".

Default workqueues and worker pools created by the kernel / workqueue_init_early() Kernel

start_kernel
    workqueue_init_early


system_wq = alloc_workqueue("events", 0, 0);
system_highpri_wq = alloc_workqueue("events_highpri", WQ_HIGHPRI, 0);
system_long_wq = alloc_workqueue("events_long", 0, 0);
system_unbound_wq = alloc_workqueue("events_unbound", WQ_UNBOUND, WQ_MAX_ACTIVE);
system_freezable_wq = alloc_workqueue("events_freezable", WQ_FREEZABLE, 0);
system_power_efficient_wq = alloc_workqueue("events_power_efficient", WQ_POWER_EFFICIENT, 0);
system_freezable_power_efficient_wq = alloc_workqueue("events_freezable_pwr_efficient", WQ_FREEZABLE | WQ_POWER_EFFICIENT, 0);
system_bh_wq = alloc_workqueue("events_bh", WQ_BH, 0);
system_bh_highpri_wq = alloc_workqueue("events_bh_highpri", WQ_BH | WQ_HIGHPRI, 0);

Pros and cons: kthread_create() vs. workqueue

Of course, once a user has implemented their function they can simply call it directly, or use kthread_create() to make the function the body of a new thread, or use add_timer to run it later from a timer.

So why wrap the function in a work_struct work item first and then hand it to a workqueue_struct for processing? It depends on the use case. If it is a big function that does a lot of processing and has to run repeatedly, it is worth giving it its own kernel thread; if it is latency-sensitive, a timer fits better.

If it is just a simple function, and the function contains delays or may sleep, then it is a good fit for a workqueue, because

  • a timer handler runs in interrupt context, so it must not delay or call any API that can cause a context switch;
  • and spawning a kernel thread is slow and resource-hungry, so it is usually reserved for functions that loop forever in a while(1).

Otherwise the thread runs the function once, exits, and is destroyed again, which is simply a waste!

In short, workqueues are better suited to lightweight tasks; a small sketch of the contrast follows.
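
A minimal sketch (do_the_job() and the other names are made up for illustration): the kthread path creates and destroys a whole thread for one call, while the workqueue path just queues an item onto an already running, shared kworker.

#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/workqueue.h>

static void do_the_job(void)                 /* hypothetical one-shot job */
{
    pr_info("job done\n");
}

/* kthread way: a whole thread is created, runs once, then exits and is destroyed */
static int one_shot_thread_fn(void *data)
{
    do_the_job();
    return 0;
}

/* workqueue way: the handler borrows an existing, shared kworker thread */
static void one_shot_work_fn(struct work_struct *work)
{
    do_the_job();
}
static DECLARE_WORK(one_shot_work, one_shot_work_fn);

static int __init example_init(void)
{
    kthread_run(one_shot_thread_fn, NULL, "one_shot");   /* heavyweight */
    schedule_work(&one_shot_work);                       /* lightweight */
    return 0;
}
module_init(example_init);
MODULE_LICENSE("GPL");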

Linux-workqueue讲解 - Vedic - 博客园

Linux Workqueue Interface

create_workqueue() Kernel

Tasks to be run out of a workqueue need to be packaged in a struct work_struct.
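
For example (queue name chosen arbitrarily); on current kernels create_workqueue() is a legacy wrapper around alloc_workqueue():

#include <linux/workqueue.h>

static struct workqueue_struct *my_wq;

static int my_driver_init(void)
{
    my_wq = create_workqueue("my_queue");    /* legacy helper */
    /* or, on current kernels:
     * my_wq = alloc_workqueue("my_queue", WQ_MEM_RECLAIM, 0);
     */
    if (!my_wq)
        return -ENOMEM;
    return 0;
}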

DECLARE_WORK() Kernel

Declare and initialize a work_struct structure at compile time.
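
For example (names are illustrative); DECLARE_DELAYED_WORK() is the delayed-work counterpart:

static void my_handler(struct work_struct *work)
{
    pr_info("running on a kworker\n");
}

static DECLARE_WORK(my_static_work, my_handler);
static DECLARE_DELAYED_WORK(my_static_dwork, my_handler);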

INIT_WORK() / PREPARE_WORK() / Kernel

Set up a work_struct structure at run time.

The difference between the two is that INIT_WORK initializes the linked list pointers within the work_struct, while PREPARE_WORK changes only the function and data pointers. INIT_WORK must be used at least once before queueing the work_struct, but should not be used if the work_struct might already be in a workqueue. (PREPARE_WORK has since been removed; modern kernels only provide INIT_WORK and its variants.)
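
A small sketch of run-time initialization, e.g. for a work item embedded in a driver-private structure (struct my_dev and the names are made up):

struct my_dev {
    struct work_struct work;
    int value;
};

static void my_dev_work_fn(struct work_struct *work)
{
    struct my_dev *dev = container_of(work, struct my_dev, work);

    pr_info("value = %d\n", dev->value);
}

static void my_dev_setup(struct my_dev *dev)
{
    INIT_WORK(&dev->work, my_dev_work_fn);   /* must happen before the first queue_work() */
    dev->value = 42;
}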

queue_work() / queue_delayed_work() Kernel

The second form of the call ensures that a minimum delay (in jiffies) passes before the work is actually executed.
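
For example, submitting onto the driver's own queue (my_wq, my_static_work and my_static_dwork come from the sketches above; note that on current kernels queue_delayed_work() takes a struct delayed_work):

static void submit_examples(void)
{
    queue_work(my_wq, &my_static_work);

    /* run no sooner than about 2 seconds from now */
    queue_delayed_work(my_wq, &my_static_dwork, 2 * HZ);
}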

flush_workqueue() Kernel

Entries in workqueues are executed at some undefined time in the future, when the associated worker thread is scheduled to run.

This call waits until all workqueue entries have actually run.

Note that if the queue contains work with long delays, or if something keeps refilling the queue, this call could take a long time to complete.

destroy_workqueue() / Kernel

Flush the queue, then delete it.
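
A teardown sketch for the driver's exit path (continuing the hypothetical names above); the delayed work is cancelled first so its timer cannot queue onto a workqueue that has already been destroyed:

static void my_driver_exit(void)
{
    cancel_delayed_work_sync(&my_static_dwork);   /* stop the timer and wait for the handler */

    flush_workqueue(my_wq);      /* wait for everything already queued */
    destroy_workqueue(my_wq);    /* drains the queue again, then frees it */
}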

schedule_work() / schedule_delayed_work() / Kernel

Of course, some work just needs to be run, and it is not worth defining a dedicated workqueue to hold it; in that case these two functions can be called to use the global queue.
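
For example (reusing the hypothetical items declared earlier); these helpers simply queue onto the global system workqueue:

static void use_global_queue(void)
{
    schedule_work(&my_static_work);                /* equivalent to queue_work(system_wq, ...) */
    schedule_delayed_work(&my_static_dwork, HZ);   /* runs after roughly one second */
}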

flush_scheduled_work() Kernel

Wait for everything on this global queue to be executed.