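capabilities(7)    Miscellaneous Information Manual

NAME
capabilities - overview of Linux capabilities

DESCRIPTION
For the purpose of performing permission checks, traditional UNIX implementations distinguish two categories of processes: privileged processes (whose effective user ID is 0, referred to as superuser or root), and unprivileged processes (whose effective UID is nonzero). Privileged processes bypass all kernel permission checks, while unprivileged processes are subject to full permission checking based on the process's credentials (usually: effective UID, effective GID, and supplementary group list).

Starting with Linux 2.2, Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities, which can be independently enabled and disabled. Capabilities are a per-thread attribute.

Capabilities list
The following list shows the capabilities implemented on Linux, and the operations or behaviors that each capability permits:

CAP_AUDIT_CONTROL (since Linux 2.6.11)
    Enable and disable kernel auditing; change auditing filter rules; retrieve auditing status and filtering rules.

CAP_AUDIT_READ (since Linux 3.16)
    Allow reading the audit log via a multicast netlink socket.

CAP_AUDIT_WRITE (since Linux 2.6.11)
    Write records to the kernel auditing log.

CAP_BLOCK_SUSPEND (since Linux 3.5)
    Employ features that can block system suspend (epoll(7) EPOLLWAKEUP, /proc/sys/wake_lock).

CAP_BPF (since Linux 5.8)
    Employ privileged BPF operations; see bpf(2) and bpf-helpers(7).
    This capability was added in Linux 5.8 to separate out BPF functionality from the overloaded CAP_SYS_ADMIN capability.

CAP_CHECKPOINT_RESTORE (since Linux 5.9)
    o Update /proc/sys/kernel/ns_last_pid (see pid_namespaces(7));
    o employ the set_tid feature of clone3(2);
    o read the contents of the symbolic links in /proc/pid/map_files for other processes.
    This capability was added in Linux 5.9 to separate out checkpoint/restore functionality from the overloaded CAP_SYS_ADMIN capability.

CAP_CHOWN
    Make arbitrary changes to file UIDs and GIDs (see chown(2)).

CAP_DAC_OVERRIDE
    Bypass file read, write, and execute permission checks. (DAC is an abbreviation of "discretionary access control".)

CAP_DAC_READ_SEARCH
    o Bypass file read permission checks and directory read and execute permission checks;
    o invoke open_by_handle_at(2);
    o use the linkat(2) AT_EMPTY_PATH flag to create a link to a file referred to by a file descriptor.

CAP_FOWNER
    o Bypass permission checks on operations that normally require the filesystem UID of the process to match the UID of the file (e.g., chmod(2), utime(2)), excluding those operations covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH;
    o set inode flags (see ioctl_iflags(2)) on arbitrary files;
    o set Access Control Lists (ACLs) on arbitrary files;
    o ignore the directory sticky bit on file deletion;
    o modify user extended attributes on a sticky directory owned by any user;
    o specify O_NOATIME for arbitrary files in open(2) and fcntl(2).

CAP_FSETID
    o Don't clear set-user-ID and set-group-ID mode bits when a file is modified;
    o set the set-group-ID bit for a file whose GID does not match the filesystem GID or any of the supplementary GIDs of the calling process.

CAP_IPC_LOCK
    o Lock memory (mlock(2), mlockall(2), mmap(2), shmctl(2));
    o allocate memory using huge pages (memfd_create(2), mmap(2), shmctl(2)).

CAP_IPC_OWNER
    Bypass permission checks for operations on System V IPC objects.

CAP_KILL
    Bypass permission checks for sending signals (see kill(2)). This includes use of the ioctl(2) KDSIGACCEPT operation.

CAP_LEASE (since Linux 2.4)
    Establish leases on arbitrary files (see fcntl(2)).

CAP_LINUX_IMMUTABLE
    Set the FS_APPEND_FL and FS_IMMUTABLE_FL inode flags (see ioctl_iflags(2)).

CAP_MAC_ADMIN (since Linux 2.6.25)
    Allow MAC configuration or state changes. Implemented for the Smack Linux Security Module (LSM).

CAP_MAC_OVERRIDE (since Linux 2.6.25)
    Override Mandatory Access Control (MAC). Implemented for the Smack LSM.

CAP_MKNOD (since Linux 2.4)
    Create special files using mknod(2).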
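
CAP_NET_ADMIN
    Perform various network-related operations:
    o interface configuration;
    o administration of IP firewall, masquerading, and accounting;
    o modify routing tables;
    o bind to any address for transparent proxying;
    o set type-of-service (TOS);
    o clear driver statistics;
    o set promiscuous mode;
    o enable multicasting;
    o use setsockopt(2) to set the following socket options: SO_DEBUG, SO_MARK, SO_PRIORITY (for a priority outside the range 0 to 6), SO_RCVBUFFORCE, and SO_SNDBUFFORCE.

CAP_NET_BIND_SERVICE
    Bind a socket to Internet domain privileged ports (port numbers less than 1024).

CAP_NET_BROADCAST
    (Unused) Make socket broadcasts, and listen to multicasts.

CAP_NET_RAW
    o Use RAW and PACKET sockets;
    o bind to any address for transparent proxying.

CAP_PERFMON (since Linux 5.8)
    Employ various performance-monitoring mechanisms, including:
    o call perf_event_open(2);
    o employ various BPF operations that have performance implications.
    This capability was added in Linux 5.8 to separate out performance monitoring functionality from the overloaded CAP_SYS_ADMIN capability. See also the kernel source file Documentation/admin-guide/perf-security.rst.

CAP_SETGID
    o Make arbitrary manipulations of process GIDs and the supplementary GID list;
    o forge a GID when passing socket credentials via UNIX domain sockets;
    o write a group ID mapping in a user namespace (see user_namespaces(7)).

CAP_SETFCAP (since Linux 2.6.24)
    Set arbitrary capabilities on a file.
    Since Linux 5.12, this capability is also needed to map user ID 0 in a new user namespace; see user_namespaces(7) for details.

CAP_SETPCAP
    If file capabilities are supported (i.e., since Linux 2.6.24): add any capability from the calling thread's bounding set to its inheritable set; drop capabilities from the bounding set (via prctl(2) PR_CAPBSET_DROP); make changes to the securebits flags.
    If file capabilities are not supported (i.e., before Linux 2.6.24): grant or remove any capability in the caller's permitted capability set to or from any other process. (This property of CAP_SETPCAP is not available when the kernel is configured to support file capabilities, since CAP_SETPCAP has entirely different semantics for such kernels.)

CAP_SETUID
    o Make arbitrary manipulations of process UIDs (setuid(2), setreuid(2), setresuid(2), setfsuid(2));
    o forge a UID when passing socket credentials via UNIX domain sockets;
    o write a user ID mapping in a user namespace (see user_namespaces(7)).

CAP_SYS_ADMIN
    Note: this capability is overloaded; see Notes to kernel developers below.
    o Perform a range of system administration operations including: quotactl(2), mount(2), umount(2), pivot_root(2), swapon(2), swapoff(2), sethostname(2), and setdomainname(2);
    o perform privileged syslog(2) operations (since Linux 2.6.37, CAP_SYSLOG should be used to permit such operations);
    o perform the VM86_REQUEST_IRQ vm86(2) command;
    o access the same checkpoint/restore functionality that is governed by CAP_CHECKPOINT_RESTORE (but the latter, weaker capability is preferred for accessing that functionality);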
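    o perform the same BPF operations as are governed by CAP_BPF (but the latter, weaker capability is preferred for accessing that functionality);
    o employ the same performance monitoring mechanisms as are governed by CAP_PERFMON (but the latter, weaker capability is preferred for accessing that functionality);
    o perform IPC_SET and IPC_RMID operations on arbitrary System V IPC objects;
    o override the RLIMIT_NPROC resource limit;
    o perform operations on trusted and security extended attributes (see xattr(7));
    o use lookup_dcookie(2);
    o use ioprio_set(2) to assign IOPRIO_CLASS_RT and (before Linux 2.6.25) IOPRIO_CLASS_IDLE I/O scheduling classes;
    o forge a PID when passing socket credentials via UNIX domain sockets;
    o exceed /proc/sys/fs/file-max, the system-wide limit on the number of open files, in system calls that open files (e.g., accept(2), execve(2), open(2), pipe(2));
    o employ CLONE_* flags that create new namespaces with clone(2) and unshare(2) (but, since Linux 3.8, creating user namespaces does not require any capability);
    o access privileged perf event information;
    o call setns(2) (requires CAP_SYS_ADMIN in the target namespace);
    o call fanotify_init(2);
    o perform privileged KEYCTL_CHOWN and KEYCTL_SETPERM keyctl(2) operations;
    o perform the madvise(2) MADV_HWPOISON operation;
    o employ the TIOCSTI ioctl(2) to insert characters into the input queue of a terminal other than the caller's controlling terminal;
    o employ the obsolete nfsservctl(2) system call;
    o employ the obsolete bdflush(2) system call;
    o perform various privileged block-device ioctl(2) operations;
    o perform various privileged filesystem ioctl(2) operations;
    o perform privileged ioctl(2) operations on the /dev/random device (see random(4));
    o install a seccomp(2) filter without first having to set the no_new_privs thread attribute;
    o modify allow/deny rules for device control groups;
    o employ the ptrace(2) PTRACE_SECCOMP_GET_FILTER operation to dump a tracee's seccomp filters;
    o employ the ptrace(2) PTRACE_SETOPTIONS operation to suspend the tracee's seccomp protections (i.e., the PTRACE_O_SUSPEND_SECCOMP flag);
    o perform administrative operations on many device drivers;
    o modify autogroup nice values by writing to /proc/pid/autogroup (see sched(7)).

CAP_SYS_BOOT
    Use reboot(2) and kexec_load(2).

CAP_SYS_CHROOT
    o Use chroot(2);
    o change mount namespaces using setns(2).

CAP_SYS_MODULE
    o Load and unload kernel modules (see init_module(2) and delete_module(2));
    o before Linux 2.6.25: drop capabilities from the system-wide capability bounding set.

CAP_SYS_NICE
    o Lower the process nice value (nice(2), setpriority(2)) and change the nice value for arbitrary processes;
    o set real-time scheduling policies for the calling process, and set scheduling policies and priorities for arbitrary processes (sched_setscheduler(2), sched_setparam(2), sched_setattr(2));
    o set CPU affinity for arbitrary processes (sched_setaffinity(2));
    o set I/O scheduling class and priority for arbitrary processes (ioprio_set(2));
    o apply migrate_pages(2) to arbitrary processes and allow processes to be migrated to arbitrary nodes;
    o apply move_pages(2) to arbitrary processes;
    o use the MPOL_MF_MOVE_ALL flag with mbind(2) and move_pages(2).

CAP_SYS_PACCT
    Use acct(2).

CAP_SYS_PTRACE
    o Trace arbitrary processes using ptrace(2);
    o apply get_robust_list(2) to arbitrary processes;
    o transfer data to or from the memory of arbitrary processes using process_vm_readv(2) and process_vm_writev(2);
    o inspect processes using kcmp(2).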
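
CAP_SYS_RAWIO
    o Perform I/O port operations (iopl(2) and ioperm(2));
    o access /proc/kcore;
    o employ the FIBMAP ioctl(2) operation;
    o open devices for accessing x86 model-specific registers (MSRs, see msr(4));
    o update /proc/sys/vm/mmap_min_addr;
    o create memory mappings at addresses below the value specified by /proc/sys/vm/mmap_min_addr;
    o map files in /proc/bus/pci;
    o open /dev/mem and /dev/kmem;
    o perform various SCSI device commands;
    o perform certain operations on hpsa(4) and cciss(4) devices;
    o perform a range of device-specific operations on other devices.

CAP_SYS_RESOURCE
    o Use reserved space on ext2 filesystems;
    o make ioctl(2) calls controlling ext3 journaling;
    o override disk quota limits;
    o increase resource limits (see setrlimit(2));
    o override the RLIMIT_NPROC resource limit;
    o override the maximum number of consoles on console allocation;
    o override the maximum number of keymaps;
    o allow more than 64hz interrupts from the real-time clock;
    o raise the msg_qbytes limit for a System V message queue above the limit in /proc/sys/kernel/msgmnb (see msgop(2) and msgctl(2));
    o allow the RLIMIT_NOFILE resource limit on the number of "in-flight" file descriptors to be bypassed when passing file descriptors to another process via a UNIX domain socket (see unix(7));
    o use the F_SETPIPE_SZ fcntl(2) command to increase the capacity of a pipe above the limit specified by /proc/sys/fs/pipe-max-size;
    o override /proc/sys/fs/mqueue/queues_max, /proc/sys/fs/mqueue/msg_max, and /proc/sys/fs/mqueue/msgsize_max limits when creating POSIX message queues (see mq_overview(7));
    o employ the prctl(2) PR_SET_MM operation;
    o set /proc/pid/oom_score_adj to a value lower than the value last set by a process with CAP_SYS_RESOURCE.

CAP_SYS_TIME
    Set the system clock (settimeofday(2), stime(2), adjtimex(2)); set the real-time (hardware) clock.

CAP_SYS_TTY_CONFIG
    Use vhangup(2); employ various privileged ioctl(2) operations on virtual terminals.

CAP_SYSLOG (since Linux 2.6.37)
    o Perform privileged syslog(2) operations. See syslog(2) for information on which operations require privilege.
    o View kernel addresses exposed via /proc and other interfaces when /proc/sys/kernel/kptr_restrict has the value 1. (See the discussion of kptr_restrict in proc(5).)

CAP_WAKE_ALARM (since Linux 3.0)
    Trigger something that will wake up the system (set CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM timers).

Past and current implementation
A full implementation of capabilities requires that:
    o For all privileged operations, the kernel must check whether the thread has the required capability in its effective set.
    o The kernel must provide system calls allowing a thread's capability sets to be changed and retrieved.
    o The filesystem must support attaching capabilities to an executable file, so that a process gains those capabilities when the file is executed.

Before Linux 2.6.24, only the first two of these requirements are met; since Linux 2.6.24, all three requirements are met.

Notes to kernel developers
When adding a new kernel feature that should be governed by a capability, consider the following points.
    o The goal of capabilities is to divide the power of superuser into pieces, such that if a program that has one or more capabilities is compromised, its power to do damage to the system would be less than the same program running with root privilege.
    o You have the choice of either creating a new capability for your new feature, or associating the feature with one of the existing capabilities. In order to keep the set of capabilities to a manageable size, the latter option is preferable, unless there are compelling reasons to take the former option. (There is also a technical limit: the size of capability sets is currently limited to 64 bits.)
    o To determine which existing capability might best be associated with your new feature, review the list of capabilities above in order to find a "silo" into which your new feature best fits. One approach to take is to determine if there are other features requiring capabilities that will always be used along with the new feature. If the new feature is useless without these other features, you should use the same capability as the other features.
    o Don't choose CAP_SYS_ADMIN if you can possibly avoid it! A vast proportion of existing capability checks are associated with this capability (see the partial list above). It can plausibly be called "the new root", since on the one hand, it confers a wide range of powers, and on the other hand, its broad scope means that this is the capability that is required by many privileged programs. Don't make the problem worse. The only new features that should be associated with CAP_SYS_ADMIN are ones that closely match existing uses in that silo.
    o If you have determined that it really is necessary to create a new capability for your feature, don't make or name it as a "single-use" capability. Thus, for example, the addition of the highly specific CAP_SYS_PACCT was probably a mistake. Instead, try to identify and name your new capability as a broader silo into which other related future use cases might fit.

Thread capability sets
Each thread has the following capability sets containing zero or more of the above capabilities:

Permitted
    This is a limiting superset for the effective capabilities that the thread may assume. It is also a limiting superset for the capabilities that may be added to the inheritable set by a thread that does not have the CAP_SETPCAP capability in its effective set.
    If a thread drops a capability from its permitted set, it can never reacquire that capability (unless it execve(2)s either a set-user-ID-root program, or a program whose associated file capabilities grant that capability).

Inheritable
    This is a set of capabilities preserved across an execve(2). Inheritable capabilities remain inheritable when executing any program, and inheritable capabilities are added to the permitted set when executing a program that has the corresponding bits set in the file inheritable set.
    Because inheritable capabilities are not generally preserved across execve(2) when running as a non-root user, applications that wish to run helper programs with elevated capabilities should consider using ambient capabilities, described below.

Effective
    This is the set of capabilities used by the kernel to perform permission checks for the thread.

Bounding (per-thread since Linux 2.6.25)
    The capability bounding set is a mechanism that can be used to limit the capabilities that are gained during execve(2).
    Since Linux 2.6.25, this is a per-thread capability set. In older kernels, the capability bounding set was a system-wide attribute shared by all threads on the system.
    For more details, see Capability bounding set below.

Ambient (since Linux 4.3)
    This is a set of capabilities that are preserved across an execve(2) of a program that is not privileged. The ambient capability set obeys the invariant that no capability can ever be ambient if it is not both permitted and inheritable.
    The ambient capability set can be directly modified using prctl(2). Ambient capabilities are automatically lowered if either of the corresponding permitted or inheritable capabilities is lowered.
    Executing a program that changes UID or GID due to the set-user-ID or set-group-ID bits, or executing a program that has any file capabilities set, will clear the ambient set. Ambient capabilities are added to the permitted set and assigned to the effective set when execve(2) is called. If ambient capabilities cause a process's permitted and effective capabilities to increase during an execve(2), this does not trigger the secure-execution mode described in ld.so(8).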
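
The ambient set obeys the invariant that a capability can be ambient only if it is both permitted and inheritable, and ambient capabilities are lowered automatically when either of those is lowered. A minimal sketch of that bookkeeping follows, assuming capability sets are modeled as 64-bit masks. The struct cap_sets type and both helper names are hypothetical and exist only for illustration; a real program changes the ambient set through prctl(2), and the sketch ignores the SECBIT_NO_CAP_AMBIENT_RAISE flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: each set is a bit mask, bit N = capability number N. */
struct cap_sets {
    uint64_t permitted;
    uint64_t inheritable;
    uint64_t ambient;
};

/* Try to raise capability 'cap' (0..63) in the ambient set.
   The raise is refused unless the capability is already both
   permitted and inheritable. */
static bool ambient_raise(struct cap_sets *s, unsigned cap)
{
    uint64_t bit = UINT64_C(1) << cap;

    if ((s->permitted & bit) == 0 || (s->inheritable & bit) == 0)
        return false;
    s->ambient |= bit;
    return true;
}

/* Drop capability 'cap' (0..63) from the inheritable set; the ambient
   set is then trimmed so that the invariant keeps holding. */
static void inheritable_lower(struct cap_sets *s, unsigned cap)
{
    s->inheritable &= ~(UINT64_C(1) << cap);
    s->ambient &= s->permitted & s->inheritable;
}
```

Modeling the sets as plain masks keeps the invariant visible as a single AND; it is not how the kernel stores them.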
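
A child created via fork(2) inherits copies of its parent's capability sets. For details on how execve(2) affects capabilities, see Transformation of capabilities during execve() below.

Using capset(2), a thread may manipulate its own capability sets; see Programmatically adjusting capability sets below.

Since Linux 3.2, the file /proc/sys/kernel/cap_last_cap exposes the numerical value of the highest capability supported by the running kernel; this can be used to determine the highest bit that may be set in a capability set.

File capabilities
Since Linux 2.6.24, the kernel supports associating capability sets with an executable file using setcap(8). The file capability sets are stored in an extended attribute (see setxattr(2) and xattr(7)) named security.capability. Writing to this extended attribute requires the CAP_SETFCAP capability. The file capability sets, in conjunction with the capability sets of the thread, determine the capabilities of a thread after an execve(2).

The three file capability sets are:

Permitted (formerly known as forced):
    These capabilities are automatically permitted to the thread, regardless of the thread's inheritable capabilities.

Inheritable (formerly known as allowed):
    This set is ANDed with the thread's inheritable set to determine which inheritable capabilities are enabled in the permitted set of the thread after the execve(2).

Effective:
    This is not a set, but rather just a single bit. If this bit is set, then during an execve(2) all of the new permitted capabilities for the thread are also raised in the effective set. If this bit is not set, then after an execve(2), none of the new permitted capabilities is in the new effective set.
    Enabling the file effective capability bit implies that any file permitted or inheritable capability that causes a thread to acquire the corresponding permitted capability during an execve(2) (see Transformation of capabilities during execve() below) will also acquire that capability in its effective set. Therefore, when assigning capabilities to a file (setcap(8), cap_set_file(3), cap_set_fd(3)), if we specify the effective flag as being enabled for any capability, then the effective flag must also be specified as enabled for all other capabilities for which the corresponding permitted or inheritable flag is enabled.

File capability extended attribute versioning
To allow extensibility, the kernel supports a scheme to encode a version number inside the security.capability extended attribute that is used to implement file capabilities. These version numbers are internal to the implementation, and not directly visible to user-space applications. To date, the following versions are supported:

VFS_CAP_REVISION_1
    This was the original file capability implementation, which supported 32-bit masks for file capabilities.

VFS_CAP_REVISION_2 (since Linux 2.6.25)
    This version allows for file capability masks that are 64 bits in size, and was necessary as the number of supported capabilities grew beyond 32. The kernel transparently continues to support the execution of files that have 32-bit version 1 capability masks, but when adding capabilities to files that did not previously have capabilities, or modifying the capabilities of existing files, it automatically uses the version 2 scheme (or possibly the version 3 scheme, as described below).

VFS_CAP_REVISION_3 (since Linux 4.14)
    Version 3 file capabilities are provided to support namespaced file capabilities (described below).
    As with version 2 file capabilities, version 3 capability masks are 64 bits in size. But in addition, the root user ID of the namespace is encoded in the security.capability extended attribute. (A namespace's root user ID is the value that user ID 0 inside that namespace maps to in the initial user namespace.)
    Version 3 file capabilities are designed to coexist with version 2 capabilities; that is, on a modern Linux system, there may be some files with version 2 capabilities while others have version 3 capabilities.

Before Linux 4.14, the only kind of file capability extended attribute that could be attached to a file was a VFS_CAP_REVISION_2 attribute. Since Linux 4.14, the version of the security.capability extended attribute that is attached to a file depends on the circumstances in which the attribute was created.

Starting with Linux 4.14, a security.capability extended attribute is automatically created as (or converted to) a version 3 (VFS_CAP_REVISION_3) attribute if both of the following are true:
    o The thread writing the attribute resides in a noninitial user namespace. (More precisely: the thread resides in a user namespace other than the one from which the underlying filesystem was mounted.)
    o The thread has the CAP_SETFCAP capability over the file inode, meaning that (a) the thread has the CAP_SETFCAP capability in its own user namespace; and (b) the UID and GID of the file inode have mappings in the writer's user namespace.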
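
When a VFS_CAP_REVISION_3 security.capability extended attribute is created, the root user ID of the creating thread's user namespace is saved in the extended attribute.

By contrast, creating or modifying a security.capability extended attribute from a privileged (CAP_SETFCAP) thread that resides in the namespace where the underlying filesystem was mounted (this normally means the initial user namespace) automatically results in the creation of a version 2 (VFS_CAP_REVISION_2) attribute.

Note that the creation of a version 3 security.capability extended attribute is automatic. That is to say, when a user-space application writes (setxattr(2)) a security.capability attribute in the version 2 format, the kernel will automatically create a version 3 attribute if the attribute is created in the circumstances described above. Correspondingly, when a version 3 security.capability attribute is retrieved (getxattr(2)) by a process that resides inside a user namespace that was created by the root user ID (or a descendant of that user namespace), the returned attribute is (automatically) simplified to appear as a version 2 attribute (i.e., the returned value is the size of a version 2 attribute and does not include the root user ID). These automatic translations mean that no changes are required to user-space tools (e.g., setcap(8) and getcap(8)) in order for those tools to be used to create and retrieve version 3 security.capability attributes.

Note that a file can have either a version 2 or a version 3 security.capability extended attribute associated with it, but not both: creation or modification of the security.capability extended attribute will automatically modify the version according to the circumstances in which the extended attribute is created or modified.

Transformation of capabilities during execve()
During an execve(2), the kernel calculates the new capabilities of the process using the following algorithm:

    P'(ambient)     = (file is privileged) ? 0 : P(ambient)

    P'(permitted)   = (P(inheritable) & F(inheritable)) |
                      (F(permitted) & P(bounding)) | P'(ambient)

    P'(effective)   = F(effective) ? P'(permitted) : P'(ambient)

    P'(inheritable) = P(inheritable)    [i.e., unchanged]

    P'(bounding)    = P(bounding)       [i.e., unchanged]

where:

    P()     denotes the value of a thread capability set before the execve(2)
    P'()    denotes the value of a thread capability set after the execve(2)
    F()     denotes a file capability set

Note the following details relating to the above capability transformation rules:
    o The ambient capability set is present only since Linux 4.3. When determining the transformation of the ambient set during execve(2), a privileged file is one that has capabilities or has the set-user-ID or set-group-ID bit set.
    o Prior to Linux 2.6.25, the bounding set was a system-wide attribute shared by all threads. That system-wide value was employed to calculate the new permitted set during execve(2) in the same manner as shown above for P(bounding).

Note: during the capability transitions described above, file capabilities may be ignored (treated as empty) for the same reasons that the set-user-ID and set-group-ID bits are ignored; see execve(2). File capabilities are similarly ignored if the kernel was booted with the no_file_caps option.

Note: according to the rules above, if a process with nonzero user IDs performs an execve(2) then any capabilities that are present in its permitted and effective sets will be cleared. For the treatment of capabilities when a process with a user ID of zero performs an execve(2), see Capabilities and execution of programs by root below.

Safety checking for capability-dumb binaries
A capability-dumb binary is an application that has been marked to have file capabilities, but has not been converted to use the libcap(3) API to manipulate its capabilities. (In other words, this is a traditional set-user-ID-root program that has been switched to use file capabilities, but whose code has not been modified to understand capabilities.) For such applications, the effective capability bit is set on the file, so that the file permitted capabilities are automatically enabled in the process effective set when executing the file. The kernel recognizes a file which has the effective capability bit set as capability-dumb for the purpose of the check described here.

When executing a capability-dumb binary, the kernel checks if the process obtained all permitted capabilities that were specified in the file permitted set, after the capability transformations described above have been performed. (The typical reason why this might not occur is that the capability bounding set masked out some of the capabilities in the file permitted set.) If the process did not obtain the full set of file permitted capabilities, then execve(2) fails with the error EPERM. This prevents possible security risks that could arise when a capability-dumb application is executed with less privilege than it needs.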
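Note that, by definition, the application could not itself recognize this problem, since it does not employ the libcap(3) API.

Capabilities and execution of programs by root
In order to mirror traditional UNIX semantics, the kernel performs special treatment of file capabilities when a process with UID 0 (root) executes a program and when a set-user-ID-root program is executed.

After having performed any changes to the process effective ID that were triggered by the set-user-ID mode bit of the binary (e.g., switching the effective user ID to 0 (root) because a set-user-ID-root program was executed), the kernel calculates the file capability sets as follows:

    (1) If the real or effective user ID of the process is 0 (root), then the file inheritable and permitted sets are ignored; instead they are notionally considered to be all ones (i.e., all capabilities enabled). (There is one exception to this behavior, described in Set-user-ID-root programs that have file capabilities below.)

    (2) If the effective user ID of the process is 0 (root) or the file effective bit is in fact enabled, then the file effective bit is notionally defined to be one (enabled).

These notional values for the file's capability sets are then used as described above to calculate the transformation of the process's capabilities during execve(2).

Thus, when a process with nonzero UIDs execve(2)s a set-user-ID-root program that does not have capabilities attached, or when a process whose real and effective UIDs are zero execve(2)s a program, the calculation of the process's new permitted capabilities simplifies to:

    P'(permitted)   = P(inheritable) | P(bounding)

    P'(effective)   = P'(permitted)

Consequently, the process gains all capabilities in its permitted and effective capability sets, except those masked out by the capability bounding set. (In the calculation of P'(permitted), the P'(ambient) term can be simplified away because it is by definition a subset of P(inheritable).)

The special treatments of user ID 0 (root) described in this subsection can be disabled using the securebits mechanism described below.

Set-user-ID-root programs that have file capabilities
There is one exception to the behavior described in Capabilities and execution of programs by root above. If (a) the binary that is being executed has capabilities attached and (b) the real user ID of the process is not 0 (root) and (c) the effective user ID of the process is 0 (root), then the file capability bits are honored (i.e., they are not notionally considered to be all ones). The usual way in which this situation can arise is when executing a set-UID-root program that also has file capabilities. When such a program is executed, the process gains just the capabilities granted by the program (i.e., not all capabilities, as would occur when executing a set-user-ID-root program that does not have any associated file capabilities).

Note that one can assign empty capability sets to a program file, and thus it is possible to create a set-user-ID-root program that changes the effective and saved set-user-ID of the process that executes the program to 0, but confers no capabilities to that process.

Capability bounding set
The capability bounding set is a security mechanism that can be used to limit the capabilities that can be gained during an execve(2). The bounding set is used in the following ways:
    o During an execve(2), the capability bounding set is ANDed with the file permitted capability set, and the result of this operation is assigned to the thread's permitted capability set. The capability bounding set thus places a limit on the permitted capabilities that may be granted by an executable file.
    o (Since Linux 2.6.25) The capability bounding set acts as a limiting superset for the capabilities that a thread can add to its inheritable set using capset(2). This means that if a capability is not in the bounding set, then a thread can't add this capability to its inheritable set, even if it was in its permitted capabilities, and thereby cannot have this capability preserved in its permitted set when it execve(2)s a file that has the capability in its inheritable set.

Note that the bounding set masks the file permitted capabilities, but not the inheritable capabilities. If a thread maintains a capability in its inheritable set that is not in its bounding set, then it can still gain that capability in its permitted set by executing a file that has the capability in its inheritable set.

Depending on the kernel version, the capability bounding set is either a system-wide attribute, or a per-process attribute.

Capability bounding set from Linux 2.6.25 onward
From Linux 2.6.25, the capability bounding set is a per-thread attribute. (The system-wide capability bounding set described below no longer exists.)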
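
The execve(2) transformation rules given earlier, including the masking of the file permitted set by the bounding set, can be sketched as pure bit-mask arithmetic. This is an illustrative model and not kernel code: the struct names, caps_after_execve(), root_file_caps(), and dumb_binary_check_fails() are all hypothetical, and capability sets are assumed to fit in one 64-bit mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the thread sets P() and the file sets F(). */
struct thread_caps {
    uint64_t permitted, inheritable, effective, bounding, ambient;
};

struct file_caps {
    uint64_t permitted, inheritable;
    bool effective;    /* the single file effective bit */
    bool privileged;   /* has file capabilities, or set-user-ID/set-group-ID */
};

/* Apply the P -> P' rules for one execve(2). */
static struct thread_caps caps_after_execve(struct thread_caps p,
                                            struct file_caps f)
{
    struct thread_caps n;

    n.ambient     = f.privileged ? 0 : p.ambient;
    n.permitted   = (p.inheritable & f.inheritable) |
                    (f.permitted & p.bounding) | n.ambient;
    n.effective   = f.effective ? n.permitted : n.ambient;
    n.inheritable = p.inheritable;   /* unchanged */
    n.bounding    = p.bounding;      /* unchanged */
    return n;
}

/* Notional file sets used when root executes a program that has no
   file capabilities: all ones, with the effective bit enabled. */
static struct file_caps root_file_caps(void)
{
    struct file_caps f = { UINT64_MAX, UINT64_MAX, true, false };
    return f;
}

/* Capability-dumb check: with the file effective bit set, execve(2)
   fails with EPERM if some file permitted capability was not obtained. */
static bool dumb_binary_check_fails(struct thread_caps n, struct file_caps f)
{
    return f.effective && (n.permitted & f.permitted) != f.permitted;
}
```

With the notional root sets, the permitted result reduces to P(inheritable) | P(bounding), which matches the simplified formula for programs executed by root.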
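
The bounding set is inherited at fork(2) from the thread's parent, and is preserved across an execve(2).

A thread may remove capabilities from its capability bounding set using the prctl(2) PR_CAPBSET_DROP operation, provided it has the CAP_SETPCAP capability. Once a capability has been dropped from the bounding set, it cannot be restored to that set. A thread can determine if a capability is in its bounding set using the prctl(2) PR_CAPBSET_READ operation.

Removing capabilities from the bounding set is supported only if file capabilities are compiled into the kernel. Before Linux 2.6.33, file capabilities were an optional feature configurable via the CONFIG_SECURITY_FILE_CAPABILITIES option. Since Linux 2.6.33, the configuration option has been removed and file capabilities are always part of the kernel. When file capabilities are compiled into the kernel, the init process (the ancestor of all processes) begins with a full bounding set. If file capabilities are not compiled into the kernel, then init begins with a full bounding set minus CAP_SETPCAP, because this capability has a different meaning when there are no file capabilities.

Removing a capability from the bounding set does not remove it from the thread's inheritable set. However, it does prevent the capability from being added back into the thread's inheritable set in the future.

Capability bounding set prior to Linux 2.6.25
Before Linux 2.6.25, the capability bounding set is a system-wide attribute that affects all threads on the system. The bounding set is accessible via the file /proc/sys/kernel/cap-bound. (Confusingly, this bit mask parameter is expressed as a signed decimal number in /proc/sys/kernel/cap-bound.)

Only the init process may set capabilities in the capability bounding set; other than that, the superuser (more precisely: a process with the CAP_SYS_MODULE capability) may only clear capabilities from this set.

On a standard system the capability bounding set always masks out the CAP_SETPCAP capability. To remove this restriction (dangerous!), modify the definition of CAP_INIT_EFF_SET in include/linux/capability.h and rebuild the kernel.

The system-wide capability bounding set feature was added to Linux 2.2.11.

Effect of user ID changes on capabilities
To preserve the traditional semantics for transitions between 0 and nonzero user IDs, the kernel makes the following changes to a thread's capability sets on changes to the thread's real, effective, saved set, and filesystem user IDs (using setuid(2), setresuid(2), or similar):
    o If one or more of the real, effective, or saved set user IDs was previously 0, and as a result of the UID changes all of these IDs have a nonzero value, then all capabilities are cleared from the permitted, effective, and ambient capability sets.
    o If the effective user ID is changed from 0 to nonzero, then all capabilities are cleared from the effective set.
    o If the effective user ID is changed from nonzero to 0, then the permitted set is copied to the effective set.
    o If the filesystem user ID is changed from 0 to nonzero (see setfsuid(2)), then the following capabilities are cleared from the effective set: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH, CAP_FOWNER, CAP_FSETID, CAP_LINUX_IMMUTABLE (since Linux 2.6.30), CAP_MAC_OVERRIDE, and CAP_MKNOD (since Linux 2.6.30). If the filesystem UID is changed from nonzero to 0, then any of these capabilities that are enabled in the permitted set are enabled in the effective set.

If a thread that has a 0 value for one or more of its user IDs wants to prevent its permitted capability set being cleared when it resets all of its user IDs to nonzero values, it can do so using the SECBIT_KEEP_CAPS securebits flag described below.

Programmatically adjusting capability sets
A thread can retrieve and change its permitted, effective, and inheritable capability sets using the capget(2) and capset(2) system calls. However, the use of cap_get_proc(3) and cap_set_proc(3), both provided in the libcap package, is preferred for this purpose. The following rules govern changes to the thread capability sets:
    o If the caller does not have the CAP_SETPCAP capability, the new inheritable set must be a subset of the combination of the existing inheritable and permitted sets.
    o (Since Linux 2.6.25) The new inheritable set must be a subset of the combination of the existing inheritable set and the capability bounding set.
    o The new permitted set must be a subset of the existing permitted set (i.e., it is not possible to acquire permitted capabilities that the thread does not currently have).
    o The new effective set must be a subset of the new permitted set.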
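
The four rules that govern a capset(2) request are: without CAP_SETPCAP, the new inheritable set must lie within the old inheritable and permitted sets; the new inheritable set must lie within the old inheritable set plus the bounding set; the new permitted set must lie within the old permitted set; and the new effective set must lie within the new permitted set. They can be expressed as a small predicate over bit masks. The following is a sketch under the assumption that sets are 64-bit masks; struct cap_triple and capset_allowed() are hypothetical names and not part of capset(2) or libcap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the three sets that capset(2) can change. */
struct cap_triple {
    uint64_t permitted, inheritable, effective;
};

/* Return true if changing 'old' into 'want' obeys the four rules.
   'has_setpcap' says whether the caller holds CAP_SETPCAP. */
static bool capset_allowed(struct cap_triple old, struct cap_triple want,
                           uint64_t bounding, bool has_setpcap)
{
    /* Rule 1: without CAP_SETPCAP, new inheritable must be a subset of
       old inheritable | old permitted. */
    if (!has_setpcap &&
        (want.inheritable & ~(old.inheritable | old.permitted)) != 0)
        return false;

    /* Rule 2: new inheritable must be a subset of
       old inheritable | bounding. */
    if ((want.inheritable & ~(old.inheritable | bounding)) != 0)
        return false;

    /* Rule 3: new permitted must be a subset of old permitted. */
    if ((want.permitted & ~old.permitted) != 0)
        return false;

    /* Rule 4: new effective must be a subset of new permitted. */
    return (want.effective & ~want.permitted) == 0;
}
```

A caller could use such a predicate to reject an invalid request before issuing the system call, although the kernel remains the authority on what is accepted.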
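
The securebits flags: establishing a capabilities-only environment
Starting with Linux 2.6.26, and with a kernel in which file capabilities are enabled, Linux implements a set of per-thread securebits flags that can be used to disable special handling of capabilities for UID 0 (root). These flags are as follows:

SECBIT_KEEP_CAPS
    Setting this flag allows a thread that has one or more 0 UIDs to retain capabilities in its permitted set when it switches all of its UIDs to nonzero values. If this flag is not set, then such a UID switch causes the thread to lose all permitted capabilities. This flag is always cleared on an execve(2).
    Note that even with the SECBIT_KEEP_CAPS flag set, the effective capabilities of a thread are cleared when it switches its effective UID to a nonzero value. However, if the thread has set this flag and its effective UID is already nonzero, and the thread subsequently switches all other UIDs to nonzero values, then the effective capabilities will not be cleared.
    The setting of the SECBIT_KEEP_CAPS flag is ignored if the SECBIT_NO_SETUID_FIXUP flag is set. (The latter flag provides a superset of the effect of the former flag.)
    This flag provides the same functionality as the older prctl(2) PR_SET_KEEPCAPS operation.

SECBIT_NO_SETUID_FIXUP
    Setting this flag stops the kernel from adjusting the process's permitted, effective, and ambient capability sets when the thread's effective and filesystem UIDs are switched between zero and nonzero values. See Effect of user ID changes on capabilities above.

SECBIT_NOROOT
    If this bit is set, then the kernel does not grant capabilities when a set-user-ID-root program is executed, or when a process with an effective or real UID of 0 calls execve(2). (See Capabilities and execution of programs by root above.)

SECBIT_NO_CAP_AMBIENT_RAISE
    Setting this flag disallows raising ambient capabilities via the prctl(2) PR_CAP_AMBIENT_RAISE operation.

Each of the above "base" flags has a companion "locked" flag. Setting any of the "locked" flags is irreversible, and has the effect of preventing further changes to the corresponding "base" flag. The locked flags are: SECBIT_KEEP_CAPS_LOCKED, SECBIT_NO_SETUID_FIXUP_LOCKED, SECBIT_NOROOT_LOCKED, and SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED.

The securebits flags can be modified and retrieved using the prctl(2) PR_SET_SECUREBITS and PR_GET_SECUREBITS operations. The CAP_SETPCAP capability is required to modify the flags. Note that the SECBIT_* constants are available only after including the <linux/securebits.h> header file.

The securebits flags are inherited by child processes. During an execve(2), all of the flags are preserved, except SECBIT_KEEP_CAPS which is always cleared.

An application can use the following call to lock itself, and all of its descendants, into an environment where the only way of gaining capabilities is by executing a program with associated file capabilities:

    prctl(PR_SET_SECUREBITS,
            /* SECBIT_KEEP_CAPS off */
            SECBIT_KEEP_CAPS_LOCKED |
            SECBIT_NO_SETUID_FIXUP |
            SECBIT_NO_SETUID_FIXUP_LOCKED |
            SECBIT_NOROOT |
            SECBIT_NOROOT_LOCKED);
            /* Setting/locking SECBIT_NO_CAP_AMBIENT_RAISE
               is not required */

Per-user-namespace "set-user-ID-root" programs
A set-user-ID program whose UID matches the UID that created a user namespace will confer capabilities in the process's permitted and effective sets when executed by any process inside that namespace or any descendant user namespace.

The rules about the transformation of the process's capabilities during the execve(2) are exactly as described in Transformation of capabilities during execve() and Capabilities and execution of programs by root above, with the difference that, in the latter subsection, "root" is the UID of the creator of the user namespace.

Namespaced file capabilities
Traditional (i.e., version 2) file capabilities associate only a set of capability masks with a binary executable file.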
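When a process executes a binary with such capabilities, it gains the associated capabilities (within its user namespace) as per the rules described in Transformation of capabilities during execve() above.

Because version 2 file capabilities confer capabilities to the executing process regardless of which user namespace it resides in, only privileged processes are permitted to associate capabilities with a file. Here, "privileged" means a process that has the CAP_SETFCAP capability in the user namespace where the filesystem was mounted (normally the initial user namespace). This limitation renders file capabilities useless for certain use cases. For example, in user-namespaced containers, it can be desirable to be able to create a binary that confers capabilities only to processes executed inside that container, but not to processes that are executed outside the container.

Linux 4.14 added so-called namespaced file capabilities to support such use cases. Namespaced file capabilities are recorded as version 3 (i.e., VFS_CAP_REVISION_3) security.capability extended attributes. Such an attribute is automatically created in the circumstances described in File capability extended attribute versioning above. When a version 3 security.capability extended attribute is created, the kernel records not just the capability masks in the extended attribute, but also the namespace root user ID.

As with a binary that has VFS_CAP_REVISION_2 file capabilities, a binary with VFS_CAP_REVISION_3 file capabilities confers capabilities to a process during execve(). However, capabilities are conferred only if the binary is executed by a process that resides in a user namespace whose UID 0 maps to the root user ID that is saved in the extended attribute, or when executed by a process that resides in a descendant of such a namespace.

Interaction with user namespaces
For further information on the interaction of capabilities and user namespaces, see user_namespaces(7).

STANDARDS
No standards govern capabilities, but the Linux capability implementation is based on the withdrawn POSIX.1e draft standard.

NOTES
When attempting to strace(1) binaries that have capabilities (or set-user-ID-root binaries), you may find the -u <username> option useful. Something like:

    $ sudo strace -o trace.log -u ceci ./myprivprog

From Linux 2.5.27 to Linux 2.6.26, capabilities were an optional kernel component, and could be enabled/disabled via the CONFIG_SECURITY_CAPABILITIES kernel configuration option.

The /proc/pid/task/TID/status file can be used to view the capability sets of a thread. The /proc/pid/status file shows the capability sets of a process's main thread. Before Linux 3.8, nonexistent capabilities were shown as being enabled (1) in these sets. Since Linux 3.8, all nonexistent capabilities (above CAP_LAST_CAP) are shown as disabled (0).

The libcap package provides a suite of routines for setting and getting capabilities that is more comfortable and less likely to change than the interface provided by capset(2) and capget(2). This package also provides the setcap(8) and getcap(8) programs.

Before Linux 2.6.24, and from Linux 2.6.24 to Linux 2.6.32 if file capabilities are not enabled, a thread with the CAP_SETPCAP capability can manipulate the capabilities of threads other than itself.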
However, this is only theoretically possible, since no thread ever has CAP_SETPCAP in either of these cases:
    o In the pre-2.6.25 implementation the system-wide capability bounding set, /proc/sys/kernel/cap-bound, always masks out the CAP_SETPCAP capability, and this cannot be changed without modifying the kernel source and rebuilding the kernel.
    o If file capabilities are disabled (i.e., the kernel CONFIG_SECURITY_FILE_CAPABILITIES option is disabled), then init starts out with the CAP_SETPCAP capability removed from its per-process bounding set, and that bounding set is inherited by all other processes created on the system.

SEE ALSO
capsh(1), setpriv(1), prctl(2), setfsuid(2), cap_clear(3), cap_copy_ext(3), cap_from_text(3), cap_get_file(3), cap_get_proc(3), cap_init(3), capgetp(3), capsetp(3), libcap(3), proc(5), credentials(7), pthreads(7), user_namespaces(7), captest(8), filecap(8), getcap(8), getpcaps(8), netcap(8), pscap(8), setcap(8)

include/linux/capability.h in the Linux kernel source tree

TRANSLATION
The Russian translation of this manual page was prepared by Azamat Hackimov, Dmitriy S. Seregin, Dmitry Bolkhovskikh, Katrin Kutepova, and Yuri Kozlov.

This translation is free documentation; see the GNU General Public License version 3 or later for the copyright conditions. There is no warranty.

If you find errors in the translation of this manual page, please report them to the translators by e-mail or to the Russian translators' mailing list.

Linux man-pages 6.06    31 October 2023    capabilities(7)