scontrol(1)                     Slurm Commands                     scontrol(1)

NAME
       scontrol - view or modify Slurm configuration and state.

SYNOPSIS
       scontrol [OPTIONS...] [COMMAND...]

DESCRIPTION
       scontrol is used to view or modify Slurm configuration including: job, job step, node, partition, reservation, and overall system configuration. Most of the commands can only be executed by user root or an Administrator. If an attempt to view or modify configuration information is made by an unauthorized user, an error message will be printed and the requested action will not occur.

       If no command is entered on the execute line, scontrol will operate in an interactive mode and prompt for input. It will continue prompting for input and executing commands until explicitly terminated. If a command is entered on the execute line, scontrol will execute that command and terminate.

       All commands and options are case-insensitive, although node names, partition names, and reservation names are case-sensitive (node names "LX" and "lx" are distinct). All commands and options can be abbreviated to the extent that the specification is unique.

       A modified Slurm configuration can be written to a file using the scontrol write config command. The resulting file will be named using the convention "slurm.conf.<datetime>" and located in the same directory as the original "slurm.conf" file. The directory containing the original slurm.conf must be writable for this to occur.

OPTIONS
       -a, --all
              When the show command is used, display all partitions, their jobs and job steps. This causes information to be displayed about partitions that are configured as hidden and partitions that are unavailable to the user's group.

       -M, --clusters=<string>
              The cluster to issue commands to. Only one cluster name may be specified. Note that the SlurmDBD must be up for this option to work properly. This option implicitly sets the --local option.

       -d, --details
              Causes the show command to provide additional details where available.

       --federation
              Report jobs from the federation if a member of one.

       -F, --future
              Report nodes in FUTURE state.

       -h, --help
              Print a help message describing the usage of scontrol.

       --hide
              Do not display information about hidden partitions, their jobs and job steps. By default, neither partitions that are configured as hidden nor partitions unavailable to the user's group will be displayed (i.e. this is the default behavior).

       --json, --json=list, --json=<data_parser>
              Dump information as JSON using the default data_parser plugin or an explicit data_parser with parameters. All information is dumped, even if it would normally not be. Sorting and formatting arguments passed to other options are ignored; however, filtering arguments are still used. This option implicitly sets the --details option.

       --local
              Show only information local to this cluster. Ignore other clusters in the federation if a member of one. Overrides --federation.

       -o, --oneliner
              Print information one line per record.

       -Q, --quiet
              Print no warning or informational messages, only fatal error messages.

       --sibling
              Show all sibling jobs on a federated cluster. Implies --federation.

       -u, --uid=<uid>
              Attempt to update a job as user <uid> instead of the invoking user id.

       -v, --verbose
              Print detailed event logging. Multiple -v's will further increase the verbosity of logging. By default only errors will be displayed.

       -V, --version
              Print version information and exit.

       --yaml, --yaml=list, --yaml=<data_parser>
              Dump information as YAML using the default data_parser plugin or an explicit data_parser with parameters. All information is dumped, even if it would normally not be. Sorting and formatting arguments passed to other options are ignored; however, filtering arguments are still used. This option implicitly sets the --details option.
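       For example, the following invocations combine the options above with the show command (the cluster name "mycluster" is a placeholder, and the --json form assumes a Slurm version recent enough to provide the data_parser plugins):

              # Show all partitions, including hidden ones, one record per line
              $ scontrol -a -o show partitions

              # Dump job information as JSON from a named cluster
              $ scontrol -M mycluster --json show jobs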
COMMANDS
       abort
              Instruct the Slurm controller to terminate immediately and generate a core file. See "man slurmctld" for information about where the core file will be written.

       cancel_reboot <NodeList>
              Cancel pending reboots on nodes. The node will be undrained and the reason cleared if the node was drained by an ASAP reboot.

       create <SPECIFICATION>
              Create a new node, partition, or reservation. See the full list of parameters below.

       completing
              Display all jobs in a COMPLETING state along with associated nodes in either a COMPLETING or DOWN state.

       delete <SPECIFICATION>
              Delete the entry with the specified SPECIFICATION. The three SPECIFICATION choices are NodeName=<nodelist>, PartitionName=<name> and Reservation=<name>. Only dynamic nodes that have no running jobs and that are not part of a reservation can be deleted. Reservations and partitions should have no associated jobs at the time of their deletion (modify the jobs first). If the specified partition is in use, the request is denied.

       errnumstr <errnum>
              Given a Slurm error number, return a descriptive string.

       fsdampeningfactor <factor>
              Set the FairShareDampeningFactor in slurmctld.

       getaddrs <host_list>
              Get IP addresses of <host_list> from slurmctld.

       help
              Display a description of scontrol options and commands.

       hold <job_list>
              Prevent a pending job from being started (sets its priority to 0). Use the release command to permit the job to be scheduled. The job_list argument is a comma-separated list of job IDs OR "jobname=" with the job's name, which will attempt to hold all jobs having that name. Note that when a job is held by a system administrator using the hold command, only a system administrator may release the job for execution (also see the uhold command). When the job is held by its owner, it may also be released by the job's owner. Additionally, attempting to hold a running job will not suspend or cancel it. But, it will set the job priority to 0 and update the job reason field, which would hold the job if it were requeued at a later time. (See the example following the ping command below.)

       notify <job_id> <message>
              Send a message to standard error of the salloc or srun command or batch job associated with the specified job_id.

       pidinfo <proc_id>
              Print the Slurm job id and scheduled termination time corresponding to the supplied process id, proc_id, on the current node. This will work only with processes on the node on which scontrol is run, and only for those processes spawned by Slurm and their descendants.

       listpids [<job_id>[.<step_id>]] [<NodeName>]
              Print a listing of the process IDs in a job step (if job_id.step_id is provided), or all of the job steps in a job (if job_id is provided), or all of the job steps in all of the jobs on the local node (if job_id is not provided or job_id is "*"). This will work only with processes on the node on which scontrol is run, and only for those processes spawned by Slurm and their descendants. Note that some Slurm configurations (ProctrackType value of pgid) are unable to identify all processes associated with a job or job step. Note that the NodeName option is only really useful when you have multiple slurmd daemons running on the same host machine. Multiple slurmd daemons on one host are, in general, only used by Slurm developers.

       ping
              Ping the primary and secondary slurmctld daemon and report if they are responding.
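       The following example illustrates the hold and release commands described above (the job id "1234" and job name "mybatch" are placeholder values):

              # Hold a pending job, then permit it to be scheduled again
              $ scontrol hold 1234
              $ scontrol release 1234

              # Hold every job that has a given name
              $ scontrol hold jobname=mybatch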
       reboot [ASAP] [nextstate={RESUME|DOWN}] [reason=<reason>] {ALL|<NodeList>}
              Reboot the nodes in the system when they become idle using the RebootProgram as configured in Slurm's slurm.conf file. Each node will have the "REBOOT" flag added to its node state. After a node reboots and the slurmd daemon starts up again, the HealthCheckProgram will run once. Then, the slurmd daemon will register itself with the slurmctld daemon and the "REBOOT" flag will be cleared. The "ASAP" option adds the "DRAIN" flag to each node's state, preventing additional jobs from running on the node so it can be rebooted and returned to service "As Soon As Possible" (i.e. ASAP). "ASAP" will also set the node reason to "Reboot ASAP" if the "reason" option isn't specified. If the "nextstate" option is specified as "DOWN", then the node will remain in a down state after rebooting. If "nextstate" is specified as "RESUME", then the nodes will resume as normal and the node's reason and "DRAIN" state will be cleared. Resuming nodes will be considered available for backfill scheduling of future jobs and won't be replaced by idle nodes in a reservation. The "reason" option sets each node's reason to a user-defined message. A default reason of "reboot requested" is set if no other reason is set on the node. The reason will be appended with: "reboot issued" when the reboot is issued; "reboot complete" when the node registers and has a "nextstate" of "DOWN"; or "reboot timed out" when the node fails to register within ResumeTimeout. You must specify either a list of nodes or that ALL nodes are to be rebooted.

              NOTE: By default, this command does not prevent additional jobs from being scheduled on any nodes before reboot. To do this, you can either use the "ASAP" option or explicitly drain the nodes beforehand. You can alternately create an advanced reservation to prevent additional jobs from being initiated on nodes to be rebooted. Pending reboots can be cancelled by using "scontrol cancel_reboot <NodeList>" or setting the node state to "CANCEL_REBOOT". A node will be marked "DOWN" if it doesn't reboot within ResumeTimeout.

       reconfigure
              Instruct all Slurm daemons to re-read the configuration file. This command does not restart the daemons. This mechanism can be used to modify configuration parameters set in slurm.conf. The Slurm controller (slurmctld) forwards the request to all other daemons (the slurmd daemon on each compute node). Running jobs continue execution. Most configuration parameters can be changed by just running this command; however, there are parameters that require a restart of the relevant Slurm daemons. Parameters requiring a restart will be noted in the slurm.conf(5) man page. The slurmctld daemon and all slurmd daemons must also be restarted if nodes are added to or removed from the cluster.

       release <job_list>
              Release a previously held job to begin execution. The job_list argument is a comma-separated list of job IDs OR "jobname=" with the job's name, which will attempt to release all jobs having that name. Also see hold.

       requeue [