.\" Copyright (c) 2021, IBM Corporation. .\" Written by Mike Rapoport .\" .\" Based on memfd_create(2) man page .\" Copyright (C) 2014 Michael Kerrisk .\" and Copyright (C) 2014 David Herrmann .\" .\" SPDX-License-Identifier: GPL-2.0-or-later .\" .TH memfd_secret 2 2024-05-02 "Linux man-pages 6.9.1" .SH NAME memfd_secret \- create an anonymous RAM-based file to access secret memory regions .SH LIBRARY Standard C library .RI ( libc ", " \-lc ) .SH SYNOPSIS .nf .P .BR "#include " " /* Definition of " SYS_* " constants */" .B #include .P .BI "int syscall(SYS_memfd_secret, unsigned int " flags ); .fi .P .IR Note : glibc provides no wrapper for .BR memfd_secret (), necessitating the use of .BR syscall (2). .SH DESCRIPTION .BR memfd_secret () creates an anonymous RAM-based file and returns a file descriptor that refers to it. The file provides a way to create and access memory regions with stronger protection than usual RAM-based files and anonymous memory mappings. Once all open references to the file are closed, it is automatically released. The initial size of the file is set to 0. Following the call, the file size should be set using .BR ftruncate (2). .P The memory areas backing the file created with .BR memfd_secret (2) are visible only to the processes that have access to the file descriptor. The memory region is removed from the kernel page tables and only the page tables of the processes holding the file descriptor map the corresponding physical memory. (Thus, the pages in the region can't be accessed by the kernel itself, so that, for example, pointers to the region can't be passed to system calls.) .P The following values may be bitwise ORed in .I flags to control the behavior of .BR memfd_secret (): .TP .B FD_CLOEXEC Set the close-on-exec flag on the new file descriptor, which causes the region to be removed from the process on .BR execve (2). See the description of the .B O_CLOEXEC flag in .BR open (2) .P As its return value, .BR memfd_secret () returns a new file descriptor that refers to an anonymous file. This file descriptor is opened for both reading and writing .RB ( O_RDWR ) and .B O_LARGEFILE is set for the file descriptor. .P With respect to .BR fork (2) and .BR execve (2), the usual semantics apply for the file descriptor created by .BR memfd_secret (). A copy of the file descriptor is inherited by the child produced by .BR fork (2) and refers to the same file. The file descriptor is preserved across .BR execve (2), unless the close-on-exec flag has been set. .P The memory region is locked into memory in the same way as with .BR mlock (2), so that it will never be written into swap, and hibernation is inhibited for as long as any .BR memfd_secret () descriptions exist. However the implementation of .BR memfd_secret () will not try to populate the whole range during the .BR mmap (2) call that attaches the region into the process's address space; instead, the pages are only actually allocated as they are faulted in. The amount of memory allowed for memory mappings of the file descriptor obeys the same rules as .BR mlock (2) and cannot exceed .BR RLIMIT_MEMLOCK . .SH RETURN VALUE On success, .BR memfd_secret () returns a new file descriptor. On error, \-1 is returned and .I errno is set to indicate the error. .SH ERRORS .TP .B EINVAL .I flags included unknown bits. .TP .B EMFILE The per-process limit on the number of open file descriptors has been reached. .TP .B EMFILE The system-wide limit on the total number of open files has been reached. .TP .B ENOMEM There was insufficient memory to create a new anonymous file. .TP .B ENOSYS .BR memfd_secret () is not implemented on this architecture, or has not been enabled on the kernel command-line with .BR secretmem_enable =1. .SH STANDARDS Linux. .SH HISTORY Linux 5.14. .SH NOTES The .BR memfd_secret () system call is designed to allow a user-space process to create a range of memory that is inaccessible to anybody else - kernel included. There is no 100% guarantee that kernel won't be able to access memory ranges backed by .BR memfd_secret () in any circumstances, but nevertheless, it is much harder to exfiltrate data from these regions. .P .BR memfd_secret () provides the following protections: .IP \[bu] 3 Enhanced protection (in conjunction with all the other in-kernel attack prevention systems) against ROP attacks. Absence of any in-kernel primitive for accessing memory backed by .BR memfd_secret () means that one-gadget ROP attack can't work to perform data exfiltration. The attacker would need to find enough ROP gadgets to reconstruct the missing page table entries, which significantly increases difficulty of the attack, especially when other protections like the kernel stack size limit and address space layout randomization are in place. .IP \[bu] Prevent cross-process user-space memory exposures. Once a region for a .BR memfd_secret () memory mapping is allocated, the user can't accidentally pass it into the kernel to be transmitted somewhere. The memory pages in this region cannot be accessed via the direct map and they are disallowed in get_user_pages. .IP \[bu] Harden against exploited kernel flaws. In order to access memory areas backed by .BR memfd_secret (), a kernel-side attack would need to either walk the page tables and create new ones, or spawn a new privileged user-space process to perform secrets exfiltration using .BR ptrace (2). .P The way .BR memfd_secret () allocates and locks the memory may impact overall system performance, therefore the system call is disabled by default and only available if the system administrator turned it on using "secretmem.enable=y" kernel parameter. .P To prevent potential data leaks of memory regions backed by .BR memfd_secret () from a hybernation image, hybernation is prevented when there are active .BR memfd_secret () users. .SH SEE ALSO .BR fcntl (2), .BR ftruncate (2), .BR mlock (2), .BR memfd_create (2), .BR mmap (2), .BR setrlimit (2)