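cpuset(7)              Miscellaneous Information Manual              cpuset(7)

NAME
       cpuset - confine processes to processor and memory node subsets

DESCRIPTION
       The cpuset filesystem is a pseudo-filesystem interface to the kernel cpuset mechanism, which is used to control the processor placement and memory placement of processes. It is commonly mounted at /dev/cpuset.

       On systems with kernels compiled with built-in support for cpusets, all processes are attached to a cpuset, and cpusets are always present. If a system supports cpusets, then it will have the entry nodev cpuset in the file /proc/filesystems. By mounting the cpuset filesystem (see the EXAMPLES section below), the administrator can configure the cpusets on a system to control the processor and memory placement of processes on that system. By default, if the cpuset configuration on a system is not modified or if the cpuset filesystem is not even mounted, then the cpuset mechanism, though present, has no effect on the system's behavior.

       A cpuset defines a list of CPUs and memory nodes.

       The CPUs of a system include all the logical processing units on which a process can execute, including, if present, multiple processor cores within a package and Hyper-Threads within a processor core. Memory nodes include all distinct banks of main memory; small and SMP systems typically have just one memory node that contains all the system's main memory, while NUMA (non-uniform memory access) systems have multiple memory nodes.

       Cpusets are represented as directories in a hierarchical pseudo-filesystem, where the top directory in the hierarchy (/dev/cpuset) represents the entire system (all online CPUs and memory nodes) and any cpuset that is the child (descendant) of another parent cpuset contains a subset of that parent's CPUs and memory nodes. The directories and files representing cpusets have normal filesystem permissions.

       Every process in the system belongs to exactly one cpuset. A process is confined to run only on the CPUs in the cpuset it belongs to, and to allocate memory only on the memory nodes in that cpuset. When a process calls fork(2), the child process is placed in the same cpuset as its parent. With sufficient privilege, a process may be moved from one cpuset to another, and the allowed CPUs and memory nodes of an existing cpuset may be changed.

       When the system begins booting, a single cpuset is defined that includes all CPUs and memory nodes on the system, and all processes are in that cpuset. During the boot process, or later during normal system operation, other cpusets may be created as subdirectories of this top cpuset, under the control of the system administrator, and processes may be placed in these other cpusets.

       Cpusets are integrated with the sched_setaffinity(2) scheduling affinity mechanism and the mbind(2) and set_mempolicy(2) memory-placement mechanisms in the kernel. None of these mechanisms lets a process make use of a CPU or memory node that is not allowed by that process's cpuset. If changes to a process's cpuset placement conflict with these other mechanisms, then cpuset placement is enforced even if it means overriding these other mechanisms. The kernel accomplishes this overriding by silently restricting the CPUs and memory nodes requested by these other mechanisms to those allowed by the invoking process's cpuset. This can result in these other calls returning an error if, for example, such a call ends up requesting an empty set of CPUs or memory nodes after that request is restricted to the invoking process's cpuset.

       Typically, a cpuset is used to manage the CPU and memory-node confinement for a set of cooperating processes such as a batch scheduler job, and these other mechanisms are used to manage the placement of individual processes or memory regions within that set or job.

FILES
       Each directory below /dev/cpuset represents a cpuset and contains a fixed set of pseudo-files describing the state of that cpuset.

       New cpusets are created using the mkdir(2) system call or the mkdir(1) command. The properties of a cpuset, such as its flags, allowed CPUs and memory nodes, and attached processes, are queried and modified by reading or writing the appropriate file in that cpuset's directory, as listed below.

       The pseudo-files in each cpuset directory are created automatically when the cpuset is created, as a result of the mkdir(2) invocation. It is not possible to directly add or remove these pseudo-files.

       A cpuset directory that contains no child cpuset directories, and has no attached processes, can be removed using rmdir(2) or rmdir(1). It is not necessary, or possible, to remove the pseudo-files inside the directory before removing it.

       The pseudo-files in each cpuset directory are small text files that may be read and written using traditional shell utilities such as cat(1) and echo(1), or from a program by using file I/O library functions or system calls, such as open(2), read(2), write(2), and close(2).

       The pseudo-files in a cpuset directory represent internal kernel state and do not have any persistent image on disk. Each of these per-cpuset files is listed and described below.

       tasks
              List of the process IDs (PIDs) of the processes in that cpuset. The list is formatted as a series of ASCII decimal numbers, each followed by a newline. A process may be added to a cpuset (automatically removing it from the cpuset that previously contained it) by writing its PID to that cpuset's tasks file (with or without a trailing newline).

              Warning: only one PID may be written to the tasks file at a time. If a string is written that contains more than one PID, only the first one will be used.

       notify_on_release
              Flag (0 or 1). If set (1), that cpuset will receive special handling after it is released, that is, after all processes cease using it (i.e., terminate or are moved to a different cpuset) and all child cpuset directories have been removed. See the Notify On Release section, below.

       cpuset.cpus
              List of the physical numbers of the CPUs on which processes in that cpuset are allowed to execute. See List Format below for a description of the format of cpus.

              The CPUs allowed to a cpuset may be changed by writing a new list to its cpus file.

       cpuset.cpu_exclusive
              Flag (0 or 1). If set (1), the cpuset has exclusive use of its CPUs (no sibling or cousin cpuset may overlap CPUs). By default, this is off (0). Newly created cpusets also initially default this to off (0).

              Two cpusets are sibling cpusets if they share the same parent cpuset in the /dev/cpuset hierarchy. Two cpusets are cousin cpusets if neither is the ancestor of the other. Regardless of the cpu_exclusive setting, if one cpuset is the ancestor of another, and if both of these cpusets have nonempty cpus, then their cpus must overlap, because the cpus of any cpuset are always a subset of the cpus of its parent cpuset.

       cpuset.mems
              List of memory nodes on which processes in this cpuset are allowed to allocate memory. See List Format below for a description of the format of mems.

       cpuset.mem_exclusive
              Flag (0 or 1). If set (1), the cpuset has exclusive use of its memory nodes (no sibling or cousin may overlap). Also, if set (1), the cpuset is a Hardwall cpuset (see below). By default, this is off (0). Newly created cpusets also initially default this to off (0).

              Regardless of the mem_exclusive setting, if one cpuset is the ancestor of another, then their memory nodes must overlap, because the memory nodes of any cpuset are always a subset of the memory nodes of that cpuset's parent cpuset.

       cpuset.mem_hardwall (since Linux 2.6.26)
              Flag (0 or 1). If set (1), the cpuset is a Hardwall cpuset (see below). Unlike mem_exclusive, there is no constraint on whether cpusets marked mem_hardwall may have overlapping memory nodes with sibling or cousin cpusets. By default, this is off (0). Newly created cpusets also initially default this to off (0).

       cpuset.memory_migrate (since Linux 2.6.16)
              Flag (0 or 1). If set (1), then memory migration is enabled. By default, this is off (0). See the Memory Migration section, below.

       cpuset.memory_pressure (since Linux 2.6.16)
              A measure of how much memory pressure the processes in this cpuset are causing. See the Memory Pressure section, below. Unless memory_pressure_enabled is enabled, this file always has the value zero (0). This file is read-only. See the WARNINGS section, below.

       cpuset.memory_pressure_enabled (since Linux 2.6.16)
              Flag (0 or 1). This file is present only in the root cpuset, normally /dev/cpuset. If set (1), the memory_pressure calculations are enabled for all cpusets in the system. By default, this is off (0). See the Memory Pressure section, below.

       cpuset.memory_spread_page (since Linux 2.6.17)
              Flag (0 or 1). If set (1), pages in the kernel page cache (filesystem buffers) are uniformly spread across the cpuset. By default, this is off (0) in the top cpuset, and inherited from the parent cpuset in newly created cpusets. See the Memory Spread section, below.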
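       cpuset.memory_spread_slab (since Linux 2.6.17)
              Flag (0 or 1). If set (1), the kernel slab caches for file I/O (directory and inode structures) are uniformly spread across the cpuset. By default, this is off (0) in the top cpuset, and inherited from the parent cpuset in newly created cpusets. See the Memory Spread section, below.

       cpuset.sched_load_balance (since Linux 2.6.24)
              Flag (0 or 1). If set (1, the default), the kernel will automatically load balance processes in that cpuset over the allowed CPUs in that cpuset. If cleared (0), the kernel will avoid load balancing processes in this cpuset, unless some other cpuset with overlapping CPUs has its sched_load_balance flag set. See Scheduler Load Balancing, below, for further details.

       cpuset.sched_relax_domain_level (since Linux 2.6.26)
              Integer, between -1 and a small positive value. The sched_relax_domain_level controls the width of the range of CPUs over which the kernel scheduler performs immediate rebalancing of runnable tasks across CPUs. If sched_load_balance is disabled, then the setting of sched_relax_domain_level does not matter, as no such load balancing is done. If sched_load_balance is enabled, then the higher the value of sched_relax_domain_level, the wider the range of CPUs over which immediate load balancing is attempted. See Scheduler Relax Domain Level, below, for further details.

       In addition to the above pseudo-files in each directory below /dev/cpuset, each process has a pseudo-file, /proc/pid/cpuset, that displays the path of the process's cpuset directory relative to the root of the cpuset filesystem.

       Also, the /proc/pid/status file for each process has four added lines, displaying the process's Cpus_allowed (on which CPUs it may be scheduled) and Mems_allowed (on which memory nodes it may obtain memory), in the two formats Mask Format and List Format (see below), as shown in the following example:

           Cpus_allowed:   ffffffff,ffffffff,ffffffff,ffffffff
           Cpus_allowed_list:     0-127
           Mems_allowed:   ffffffff,ffffffff
           Mems_allowed_list:     0-63

       The "allowed" fields were added in Linux 2.6.24; the "allowed_list" fields were added in Linux 2.6.26.

EXTENDED CAPABILITIES
       In addition to controlling which cpus and mems a process is allowed to use, cpusets provide the following extended capabilities.

   Exclusive cpusets
       If a cpuset is marked cpu_exclusive or mem_exclusive, no other cpuset, other than a direct ancestor or descendant, may share any of the same CPUs or memory nodes.

       A cpuset that is mem_exclusive restricts kernel allocations for buffer cache pages and other internal kernel data pages commonly shared by the kernel across multiple users. All cpusets, whether mem_exclusive or not, restrict allocations of memory for user space. This enables configuring a system so that several independent jobs can share common kernel data, while isolating each job's user allocation in its own cpuset. To do this, construct a large mem_exclusive cpuset to hold all the jobs, and construct child, non-mem_exclusive cpusets for each individual job. Only a small amount of kernel memory, such as requests from interrupt handlers, is allowed to be placed on memory nodes outside even a mem_exclusive cpuset.

   Hardwall
       A cpuset that has mem_exclusive or mem_hardwall set is a hardwall cpuset. A hardwall cpuset restricts kernel allocations for page, buffer, and other data commonly shared by the kernel across multiple users. All cpusets, whether hardwall or not, restrict allocations of memory for user space.

       This enables configuring a system so that several independent jobs can share common kernel data, such as filesystem pages, while isolating each job's user allocation in its own cpuset. To do this, construct a large hardwall cpuset to hold all the jobs, and construct child cpusets for each individual job which are not hardwall cpusets.

       Only a small amount of kernel memory, such as requests from interrupt handlers, is allowed to be taken outside even a hardwall cpuset.

   Notify on release
       If the notify_on_release flag is enabled (1) in a cpuset, then whenever the last process in the cpuset leaves (exits or attaches to some other cpuset) and the last child cpuset of that cpuset is removed, the kernel will run the command /sbin/cpuset_release_agent, supplying the pathname (relative to the mount point of the cpuset filesystem) of the abandoned cpuset. This enables automatic removal of abandoned cpusets.

       The default value of notify_on_release in the root cpuset at system boot is disabled (0). The default value of other cpusets at creation is the current value of their parent's notify_on_release setting.

       The command /sbin/cpuset_release_agent is invoked with the name (the path relative to /dev/cpuset) of the to-be-released cpuset in argv[1].

       The usual content of the command /sbin/cpuset_release_agent is simply the shell script:

           #!/bin/sh
           rmdir /dev/cpuset/$1

       As with the other flag values below, this flag can be changed by writing an ASCII number 0 or 1 (with optional trailing newline) into the file, to clear or set the flag, respectively.

   Memory pressure
       The memory_pressure of a cpuset provides a simple per-cpuset running average of the rate at which the processes in a cpuset are attempting to free up in-use memory on the nodes of the cpuset to satisfy additional memory requests.

       This enables batch managers that are monitoring jobs running in dedicated cpusets to efficiently detect what level of memory pressure a job is causing.

       This is useful both on tightly managed systems running a wide mix of submitted jobs, which may choose to terminate or reprioritize jobs that are trying to use more memory than allowed on the nodes assigned to them, and with tightly coupled, long-running, massively parallel scientific computing jobs that will dramatically fail to meet required performance goals if they start to use more memory than allowed to them.

       This mechanism provides a very economical way for the batch manager to monitor a cpuset for signs of memory pressure. It is up to the batch manager or other user code to decide what action to take if it detects signs of memory pressure.

       Unless memory pressure calculation is enabled by setting the pseudo-file /dev/cpuset/cpuset.memory_pressure_enabled, it is not computed for any cpuset, and reads from any memory_pressure file always return zero, as represented by the ASCII string "0\n". See the WARNINGS section, below.

       A per-cpuset running average is employed for the following reasons:

       o  Because this meter is per-cpuset rather than per-process or per virtual memory region, the system load imposed by a batch scheduler monitoring this metric is sharply reduced on large systems, because a scan of the tasklist can be avoided on each set of queries.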
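       o  Because this meter is a running average rather than an accumulating counter, a batch scheduler can detect memory pressure with a single read, instead of having to read and accumulate results for a period of time.

       o  Because this meter is per-cpuset rather than per-process, the batch scheduler can obtain the key information--memory pressure in a cpuset--with a single read, rather than having to query and accumulate results over all the (dynamically changing) set of processes in the cpuset.

       The memory_pressure of a cpuset is calculated using a per-cpuset simple digital filter that is kept within the kernel. For each cpuset, this filter tracks the recent rate at which processes attached to that cpuset enter the kernel direct reclaim code.

       The kernel direct reclaim code is entered whenever a process has to satisfy a memory page request by first finding some other page to repurpose, due to the lack of any readily available free pages. Dirty filesystem pages are repurposed by first writing them to disk. Unmodified filesystem buffer pages are repurposed by simply dropping them, though if such a page is needed again, it will have to be reread from disk.

       The cpuset.memory_pressure file provides an integer number representing the recent (half-life of 10 seconds) rate of entries to the direct reclaim code caused by any process in the cpuset, in units of reclaims attempted per second, times 1000.

   Memory spread
       There are two Boolean flag files per cpuset that control where the kernel allocates pages for the filesystem buffers and related in-kernel data structures. They are called cpuset.memory_spread_page and cpuset.memory_spread_slab.

       If the per-cpuset Boolean flag file cpuset.memory_spread_page is set, then the kernel will spread the filesystem buffers (page cache) evenly over all the nodes that the faulting process is allowed to use, instead of preferring to put those pages on the node where the process is running.

       If the per-cpuset Boolean flag file cpuset.memory_spread_slab is set, then the kernel will spread some filesystem-related slab caches, such as those for inodes and directory entries, evenly over all the nodes that the faulting process is allowed to use, instead of preferring to put those pages on the node where the process is running.

       The setting of these flags does not affect the data segment (see brk(2)) or stack segment pages of a process.

       By default, both kinds of memory spreading are off, and the kernel prefers to allocate memory pages on the node local to where the requesting process is running. If that node is not allowed by the process's NUMA memory policy or cpuset configuration, or if there are insufficient free memory pages on that node, then the kernel looks for the nearest node that is allowed and has sufficient free memory.

       When new cpusets are created, they inherit the memory spread settings of their parent.

       Setting memory spreading causes allocations for the affected page or slab caches to ignore the process's NUMA memory policy and be spread instead. However, the effect of these changes in memory placement caused by cpuset-specified memory spreading is hidden from the mbind(2) or set_mempolicy(2) calls. These two NUMA memory policy calls always appear to behave as if no cpuset-specified memory spreading is in effect, even if it is. If cpuset memory spreading is subsequently turned off, the NUMA memory policy most recently specified by these calls is automatically reapplied.

       Both cpuset.memory_spread_page and cpuset.memory_spread_slab are Boolean flag files. By default, they contain "0", meaning that the feature is off for that cpuset. If a "1" is written to that file, the named feature is turned on.

       Cpuset-specified memory spreading behaves similarly to what is known (in other contexts) as round-robin or interleave memory placement.

       Cpuset-specified memory spreading can provide substantial performance improvements for jobs that:

       o  need to place thread-local data on memory nodes close to the CPUs that are running the threads that most frequently access that data; but also

       o  need to access large filesystem data sets that must be spread across the several nodes in the job's cpuset in order to fit.

       Without this policy, the memory allocation across the nodes in the job's cpuset can become very uneven, especially for jobs that might have just a single thread initializing or reading in the data set.

   Memory migration
       Normally, under the default setting (disabled) of cpuset.memory_migrate, once a page is allocated (given a physical page of main memory), that page stays on whatever node it was allocated, so long as it remains allocated, even if the cpuset's memory-placement policy mems subsequently changes.

       When memory migration is enabled in a cpuset, if the mems setting of the cpuset is changed, then any memory page in use by any process in the cpuset that is on a memory node that is no longer allowed will be migrated to a memory node that is allowed.

       Furthermore, if a process is moved into a cpuset with memory_migrate enabled, any memory pages it uses that were on memory nodes allowed in its previous cpuset, but which are not allowed in its new cpuset, will be migrated to a memory node allowed in the new cpuset.

       The relative placement of a migrated page within the cpuset is preserved during these migration operations if possible. For example, if the page was on the second valid node of the prior cpuset, then the page will be placed on the second valid node of the new cpuset, if possible.

   Scheduler load balancing
       The kernel scheduler automatically load balances processes. If one CPU is underutilized, the kernel will look for processes on other, more overloaded CPUs and move those processes to the underutilized CPU, within the constraints of such placement mechanisms as cpusets and sched_setaffinity(2).

       The algorithmic cost of load balancing and its impact on key shared kernel data structures such as the process list increases more than linearly with the number of CPUs being balanced. For example, it costs more to load balance across one large set of CPUs than it does to balance across two smaller sets of CPUs, each of half the size of the larger set. (The precise relationship between the number of CPUs being balanced and the cost of load balancing depends on implementation details of the kernel process scheduler, which is subject to change over time, as improved kernel scheduler algorithms are implemented.)

       The per-cpuset flag sched_load_balance provides a mechanism to suppress this automatic scheduler load balancing in cases where it is not needed and suppressing it would have worthwhile performance benefits.

       By default, load balancing is done across all CPUs, except those marked isolated using the kernel boot time "isolcpus=" argument. (See Scheduler Relax Domain Level, below, to change this default.)

       This default load balancing across all CPUs is not well suited to the following two situations:

       o  On large systems, load balancing across many CPUs is expensive. If the system is managed using cpusets to place independent jobs on separate sets of CPUs, full load balancing is unnecessary.

       o  Systems supporting real-time on some CPUs need to minimize system overhead on those CPUs, including avoiding process load balancing if that is not needed.

       When the per-cpuset flag sched_load_balance is enabled (the default setting), it requests load balancing across all the CPUs in that cpuset's allowed CPUs, ensuring that load balancing can move a process (not otherwise pinned, as by sched_setaffinity(2)) from any CPU in that cpuset to any other.

       When the per-cpuset flag sched_load_balance is disabled, then the scheduler will avoid load balancing across the CPUs in that cpuset, except in so far as is necessary because some overlapping cpuset has sched_load_balance enabled.

       So, for example, if the top cpuset has the flag sched_load_balance enabled, then the scheduler will load balance across all CPUs, and the setting of the sched_load_balance flag in other cpusets has no effect, as load balancing is already being done fully.

       Therefore, in the above two situations, the flag sched_load_balance should be disabled in the top cpuset, and only some of the smaller, child cpusets would have this flag enabled.

       When doing this, you don't usually want to leave any unpinned processes in the top cpuset that might use nontrivial amounts of CPU, as such processes may be artificially constrained to some subset of CPUs, depending on the particulars of this flag setting in descendant cpusets. Even if such a process could use spare CPU cycles on some other CPUs, the kernel scheduler might not consider the possibility of load balancing that process to the underused CPU.

       Of course, processes pinned to a particular CPU can be left in a cpuset that disables sched_load_balance, as those processes aren't going anywhere else anyway.

   Scheduler relax domain level
       The kernel scheduler performs immediate load balancing whenever a CPU becomes free or another task becomes runnable. This load balancing works to ensure that as many CPUs as possible are usefully employed running tasks. The kernel also performs periodic load balancing off the software clock described in time(7). The setting of sched_relax_domain_level applies only to immediate load balancing. Regardless of the sched_relax_domain_level setting, periodic load balancing is attempted over all CPUs (unless disabled by turning off sched_load_balance). In any case, of course, tasks will be scheduled to run only on CPUs allowed by their cpuset, as modified by sched_setaffinity(2) system calls.

       On small systems, such as those with just a few CPUs, immediate load balancing is useful to improve system interactivity and to minimize wasteful idle CPU cycles. But on large systems, attempting immediate load balancing across a large number of CPUs can be more costly than it is worth, depending on the particular performance characteristics of the job mix and the hardware.

       The exact meaning of the small integer values of sched_relax_domain_level will depend on internal implementation details of the kernel scheduler code and on the non-uniform architecture of the hardware. Both of these will evolve over time and vary by system architecture and kernel version.

       As of this writing, when this capability was introduced in Linux 2.6.26, on certain popular architectures, the positive values of sched_relax_domain_level have the following meanings:

       1  Perform immediate load balancing across Hyper-Thread siblings on the same core.
       2  Perform immediate load balancing across other cores in the same package.
       3  Perform immediate load balancing across other CPUs on the same node or blade.
       4  Perform immediate load balancing across several (implementation detail) nodes (on NUMA systems).
       5  Perform immediate load balancing across all CPUs in the system (on NUMA systems).

       The sched_relax_domain_level value of zero (0) always means don't perform immediate load balancing, hence load balancing is done only periodically, not immediately when a CPU becomes available or another task becomes runnable.

       The sched_relax_domain_level value of minus one (-1) always means use the system default value. The system default value can vary by architecture and kernel version. This system default value can be changed by the kernel boot-time "relax_domain_level=" argument.

       In the case of multiple overlapping cpusets that have conflicting sched_relax_domain_level values, the highest such value applies to all CPUs in any of the overlapping cpusets. In such cases, -1 is the lowest value, overridden by any other value, and 0 is the next lowest value.

FORMATS
       The following formats are used to represent sets of CPUs and memory nodes.

   Mask format
       The Mask Format is used to represent CPU and memory-node bit masks in the /proc/pid/status file.

       This format displays each 32-bit word in hexadecimal (using ASCII characters "0"-"9" and "a"-"f"); words are filled with leading zeros, if required. For masks longer than one word, a comma separator is used between words. Words are displayed in big-endian order, which has the most significant bit first. The hex digits within a word are also in big-endian order.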
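       The number of 32-bit words displayed is the minimum number needed to display all bits of the bit mask, based on the size of the bit mask.

       Examples of the Mask Format:

           00000001                        # just bit 0 set
           40000000,00000000,00000000      # just bit 94 set
           00000001,00000000,00000000      # just bit 64 set
           000000ff,00000000               # bits 32-39 set
           00000000,000e3862               # bits 1,5,6,11-13,17-19 set

       A mask with bits 0, 1, 2, 4, 8, 16, 32, and 64 set displays as:

           00000001,00000001,00010117

       The first "1" is for bit 64, the second for bit 32, the third for bit 16, the fourth for bit 8, the fifth for bit 4, and the "7" is for bits 2, 1, and 0.

   List format
       The List Format for cpus and mems is a comma-separated list of CPU or memory-node numbers and ranges of numbers, in ASCII decimal.

       Examples of the List Format:

           0-4,9           # bits 0, 1, 2, 3, 4, and 9 set
           0-2,7,12-14     # bits 0, 1, 2, 7, 12, 13, and 14 set

RULES
       The following rules apply to each cpuset:

       o  Its CPUs and memory nodes must be a (possibly equal) subset of its parent's.

       o  It can be marked cpu_exclusive only if its parent is.

       o  It can be marked mem_exclusive only if its parent is.

       o  If it is cpu_exclusive, its CPUs may not overlap any sibling.

       o  If it is mem_exclusive, its memory nodes may not overlap any sibling.

PERMISSIONS
       The permissions of a cpuset are determined by the permissions of the directories and pseudo-files in the cpuset filesystem, normally mounted at /dev/cpuset.

       For instance, a process can put itself in some other cpuset (than its current one) if it can write the tasks file for that cpuset. This requires execute permission on the encompassing directories and write permission on the tasks file.

       An additional constraint is applied to requests to place some other process in a cpuset. One process may not attach another to a cpuset unless it would have permission to send that process a signal (see kill(2)).

       A process may create a child cpuset if it can access and write the parent cpuset directory. It can modify the CPUs or memory nodes in a cpuset if it can access that cpuset's directory (execute permission on each of the parent directories) and write the corresponding cpus or mems file.

       There is one minor difference between the manner in which these permissions are evaluated and the manner in which normal filesystem operation permissions are evaluated. The kernel interprets relative pathnames starting at a process's current working directory. Even if one is operating on a cpuset file, relative pathnames are interpreted relative to the process's current working directory, not relative to the process's current cpuset. The only ways that cpuset paths relative to a process's current cpuset can be used are if either the process's current working directory is its cpuset (it first did a cd or chdir(2) to its cpuset directory beneath /dev/cpuset, which is a bit unusual) or if some user code converts the relative cpuset path to a full filesystem path.

       In theory, this means that user code should specify cpusets using absolute pathnames, which requires knowing the mount point of the cpuset filesystem (usually, but not necessarily, /dev/cpuset). In practice, user-level code commonly assumes that if the cpuset filesystem is mounted, then it is mounted at /dev/cpuset. Furthermore, it is common practice for carefully written user code to verify the presence of the pseudo-file /dev/cpuset/tasks in order to verify that the cpuset pseudo-filesystem is currently mounted.

WARNINGS
   Enabling memory_pressure
       By default, the per-cpuset file cpuset.memory_pressure always contains zero (0). Unless this feature is enabled by writing "1" to the pseudo-file /dev/cpuset/cpuset.memory_pressure_enabled, the kernel does not compute per-cpuset memory_pressure.

   Using the echo command
       When using the echo command at the shell prompt to change the values of cpuset files, beware that the built-in echo command in some shells does not display an error message if the write(2) system call fails. For example, if the command:

           echo 19 > cpuset.mems

       failed because memory node 19 was not allowed (perhaps the current system does not have a memory node 19), then the echo command might not display any error. It is better to use the /bin/echo external command to change cpuset file settings, as this command will display write(2) errors, as in the example:

           /bin/echo 19 > cpuset.mems
           /bin/echo: write error: Invalid argument

EXCEPTIONS
   Memory placement
       Not all allocations of system memory are constrained by cpusets, for the following reasons.

       If hot-plug functionality is used to remove all the CPUs that are currently assigned to a cpuset, then the kernel will automatically update the cpus_allowed of all processes attached to CPUs in that cpuset to allow all CPUs. When memory hot-plug functionality for removing memory nodes is available, a similar exception is expected to apply there as well. In general, the kernel prefers to violate cpuset placement rather than starve a process that has had all its allowed CPUs or memory nodes taken offline. User code should reconfigure cpusets to refer only to online CPUs and memory nodes when using hot-plug to add or remove such resources.

       A few kernel-critical, internal memory-allocation requests, marked GFP_ATOMIC, must be satisfied immediately. The kernel may drop some request or malfunction if one of these allocations fails. If such a request cannot be satisfied within the current process's cpuset, then the kernel relaxes the cpuset and looks for memory anywhere it can find it. It is better to violate the cpuset than to stress the kernel.

       Allocations of memory requested by kernel drivers while processing an interrupt lack any relevant process context, and are not confined by cpusets.

   Renaming cpusets
       You can use the rename(2) system call to rename cpusets. Only simple renaming is supported; that is, changing the name of a cpuset directory is permitted, but moving a directory into a different directory is not permitted.

ERRORS
       The Linux kernel implementation of cpusets sets errno to specify the reason for a failed system call affecting cpusets.

       The possible errno settings and their meaning when set on a failed cpuset call are as listed below.

       E2BIG  Attempted a write(2) on a special cpuset file with a length larger than some kernel-determined upper limit on the length of such writes.

       EACCES Attempted to write(2) the process ID (PID) of a process to a cpuset tasks file when one lacks permission to move that process.

       EACCES Attempted to add, using write(2), a CPU or memory node to a cpuset, when that CPU or memory node was not already in its parent.

       EACCES Attempted to set, using write(2), cpuset.cpu_exclusive or cpuset.mem_exclusive on a cpuset whose parent lacks the same setting.

       EACCES Attempted to write(2) a cpuset.memory_pressure file.

       EACCES Attempted to create a file in a cpuset directory.

       EBUSY  Attempted to remove, using rmdir(2), a cpuset with attached processes.

       EBUSY  Attempted to remove, using rmdir(2), a cpuset with child cpusets.

       EBUSY  Attempted to remove a CPU or memory node from a cpuset that is also in a child of that cpuset.

       EEXIST Attempted to create, using mkdir(2), a cpuset that already exists.

       EEXIST Attempted to rename(2) a cpuset to a name that already exists.

       EFAULT Attempted to read(2) or write(2) a cpuset file using a buffer that is outside the writing process's accessible address space.

       EINVAL Attempted to change a cpuset, using write(2), in a way that would violate a cpu_exclusive or mem_exclusive attribute of that cpuset or any of its siblings.

       EINVAL Attempted to write(2) an empty cpuset.cpus or cpuset.mems list to a cpuset which has attached processes or child cpusets.

       EINVAL Attempted to write(2) a cpuset.cpus or cpuset.mems list which included a range with the second number smaller than the first number.

       EINVAL Attempted to write(2) a cpuset.cpus or cpuset.mems list which included an invalid character in the string.

       EINVAL Attempted to write(2) a list to a cpuset.cpus file that did not include any online CPUs.

       EINVAL Attempted to write(2) a list to a cpuset.mems file that did not include any online memory nodes.

       EINVAL Attempted to write(2) a list to a cpuset.mems file that included a node that held no memory.

       EIO    Attempted to write(2) a string to a cpuset tasks file that does not begin with an ASCII decimal integer.

       EIO    Attempted to rename(2) a cpuset into a different directory.

       ENAMETOOLONG
              Attempted to read(2) a /proc/pid/cpuset file for a cpuset path that is longer than the kernel page size.

       ENAMETOOLONG
              Attempted to create, using mkdir(2), a cpuset whose base directory name is longer than 255 characters.

       ENAMETOOLONG
              Attempted to create, using mkdir(2), a cpuset whose full pathname, including the mount point (typically "/dev/cpuset/") prefix, is longer than 4095 characters.

       ENODEV The cpuset was removed by another process at the same time as a write(2) was attempted on one of the pseudo-files in the cpuset directory.

       ENOENT Attempted to create, using mkdir(2), a cpuset in a parent cpuset that doesn't exist.

       ENOENT Attempted to access(2) or open(2) a nonexistent file in a cpuset directory.

       ENOMEM Insufficient memory is available within the kernel; this can occur on a variety of system calls affecting cpusets, but only if the system is extremely short of memory.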
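       ENOSPC Attempted to write(2) the process ID (PID) of a process to a cpuset tasks file when the cpuset had an empty cpuset.cpus or empty cpuset.mems setting.

       ENOSPC Attempted to write(2) an empty cpuset.cpus or cpuset.mems setting to a cpuset that has tasks attached.

       ENOTDIR
              Attempted to rename(2) a nonexistent cpuset.

       EPERM  Attempted to remove a file from a cpuset directory.

       ERANGE Specified a cpuset.cpus or cpuset.mems list to the kernel which included a number too large for the kernel to set in its bit masks.

       ESRCH  Attempted to write(2) the process ID (PID) of a nonexistent process to a cpuset tasks file.

HISTORY
       Cpusets appeared in Linux 2.6.12.

NOTES
       Despite its name, the pid parameter is actually a thread ID, and each thread in a threaded group can be attached to a different cpuset. The value returned from a call to gettid(2) can be passed in the argument pid.

BUGS
       cpuset.memory_pressure cpuset files can be opened for writing, creation, or truncation, but then the write(2) fails with errno set to EACCES, and the creation and truncation options on open(2) have no effect.

EXAMPLES
       The following examples demonstrate querying and setting cpuset options using shell commands.

   Creating and attaching to a cpuset
       To create a new cpuset and attach the current command shell to it, the steps are:

       (1)  mkdir /dev/cpuset (if not already done)
       (2)  mount -t cpuset none /dev/cpuset (if not already done)
       (3)  Create the new cpuset using mkdir(1).
       (4)  Assign CPUs and memory nodes to the new cpuset.
       (5)  Attach the shell to the new cpuset.

       For example, the following sequence of commands will set up a cpuset named "Charlie", containing just CPUs 2 and 3, and memory node 1, and then attach the current shell to that cpuset.

           $ mkdir /dev/cpuset
           $ mount -t cpuset cpuset /dev/cpuset
           $ cd /dev/cpuset
           $ mkdir Charlie
           $ cd Charlie
           $ /bin/echo 2-3 > cpuset.cpus
           $ /bin/echo 1 > cpuset.mems
           $ /bin/echo $$ > tasks
           # The current shell is now running in cpuset Charlie
           # The next line should display '/Charlie'
           $ cat /proc/self/cpuset

   Migrating a job to different memory nodes
       To migrate a job (the set of processes attached to a cpuset) to different CPUs and memory nodes in the system, including moving the memory pages currently allocated to that job, perform the following steps.

       (1)  Let's say we want to move the job in cpuset alpha (CPUs 4-7 and memory nodes 2-3) to a new cpuset beta (CPUs 16-19 and memory nodes 8-9).
       (2)  First create the new cpuset beta.
       (3)  Then allow CPUs 16-19 and memory nodes 8-9 in beta.
       (4)  Then enable cpuset.memory_migrate in beta.
       (5)  Then move each process from alpha to beta.

       The following sequence of commands accomplishes this.

           $ cd /dev/cpuset
           $ mkdir beta
           $ cd beta
           $ /bin/echo 16-19 > cpuset.cpus
           $ /bin/echo 8-9 > cpuset.mems
           $ /bin/echo 1 > cpuset.memory_migrate
           $ while read i; do /bin/echo "$i"; done < ../alpha/tasks > tasks

       The above should move any processes in alpha to beta, and any memory held by these processes on memory nodes 2-3 to memory nodes 8-9, respectively.

       Notice that the last step of the above sequence did not do:

           $ cp ../alpha/tasks tasks

       The while loop, rather than the seemingly easier use of the cp(1) command, was necessary because only one process PID at a time may be written to the tasks file.

       The same effect (writing one PID at a time) as the while loop can be accomplished more efficiently, in fewer keystrokes and in syntax that works on any shell, but alas more obscurely, by using the -u (unbuffered) option of sed(1):

           $ sed -un p < ../alpha/tasks > tasks

SEE ALSO
       taskset(1), get_mempolicy(2), getcpu(2), mbind(2), sched_getaffinity(2), sched_setaffinity(2), sched_setscheduler(2), set_mempolicy(2), CPU_SET(3), proc(5), cgroups(7), numa(7), sched(7), migratepages(8), numactl(8)

       Documentation/admin-guide/cgroup-v1/cpusets.rst in the Linux kernel source tree (or Documentation/cgroup-v1/cpusets.txt before Linux 4.18, and Documentation/cpusets.txt before Linux 2.6.29)

TRANSLATION
       The Russian translation of this manual page was prepared by Azamat Hackimov, Dmitriy S. Seregin, Dmitry Bolkhovskikh, Katrin Kutepova, and Yuri Kozlov.

       The translation is free documentation; it is distributed under the terms of the GNU General Public License, version 3 or later, with no warranty of any kind.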
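       As a supplement to the FORMATS section, the two formats can be decoded with a few lines of shell. The following is only a minimal sketch under stated assumptions: the helper names expand_list and mask_to_bits are hypothetical (they are not part of the cpuset interface or of any existing tool), and the code assumes a POSIX shell whose arithmetic expansion accepts hexadecimal constants and 64-bit values.

```sh
#!/bin/sh
# Hypothetical helpers for illustration only; not part of any cpuset tooling.

# expand_list LIST
# Print every number named by a List Format string (e.g. "0-2,7,12-14"),
# space-separated. The body runs in a subshell so the IFS change stays local.
expand_list() (
    IFS=,
    out=
    for item in $1; do
        case $item in
            *-*) lo=${item%-*}; hi=${item#*-} ;;   # a range "lo-hi"
            *)   lo=$item; hi=$item ;;             # a single number
        esac
        n=$lo
        while [ "$n" -le "$hi" ]; do
            out="$out${out:+ }$n"
            n=$((n + 1))
        done
    done
    printf '%s\n' "$out"
)

# mask_to_bits MASK
# Print the numbers of the bits set in a Mask Format string
# (e.g. "00000000,000e3862"), lowest bit first.
mask_to_bits() (
    IFS=,
    set -- $1                  # one positional parameter per 32-bit word
    word_index=$#              # the leftmost word is the most significant
    out=
    for word in "$@"; do
        word_index=$((word_index - 1))
        value=$((0x$word))
        bit=0
        bits=
        while [ "$bit" -lt 32 ]; do
            if [ "$(( (value >> bit) & 1 ))" -eq 1 ]; then
                bits="$bits${bits:+ }$((word_index * 32 + bit))"
            fi
            bit=$((bit + 1))
        done
        # Later words hold lower bits, so their results go in front.
        out="$bits${bits:+${out:+ }}$out"
    done
    printf '%s\n' "$out"
)
```

       For instance, expand_list 0-2,7,12-14 prints "0 1 2 7 12 13 14", and mask_to_bits 00000000,000e3862 prints "1 5 6 11 12 13 17 18 19", which agrees with the Mask Format example given in FORMATS. The helpers do no input validation; a malformed list or mask is an error the caller would need to handle.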
Linux man-pages 6.06                31 2023                cpuset(7)