scontrol(1)                      Slurm Commands                      scontrol(1)

NAME
    scontrol - view or modify Slurm configuration and state.

SYNOPSIS
    scontrol [OPTIONS...] [COMMAND...]

DESCRIPTION
    scontrol is used to view or modify Slurm configuration, including job, job step, node, partition, reservation, and overall system configuration. Most of the commands can only be executed by user root or an Administrator. If an attempt to view or modify configuration information is made by an unauthorized user, an error message will be printed and the requested action will not occur.

    If no command is entered on the execute line, scontrol will operate in an interactive mode and prompt for input. It will continue prompting for input and executing commands until explicitly terminated. If a command is entered on the execute line, scontrol will execute that command and terminate.

    All commands and options are case-insensitive, although node names, partition names, and reservation names are case-sensitive (node names "LX" and "lx" are distinct). All commands and options can be abbreviated to the extent that the specification is unique.

    A modified Slurm configuration can be written to a file using the scontrol write config command. The resulting file will be named using the convention "slurm.conf.<datetime>" and located in the same directory as the original "slurm.conf" file. The directory containing the original slurm.conf must be writable for this to occur.

OPTIONS
    -a, --all
        When the show command is used, display all partitions, their jobs and job steps. This causes information to be displayed about partitions that are configured as hidden and partitions that are unavailable to the user's group.

    -M, --clusters=<string>
        The cluster to issue commands to. Only one cluster name may be specified. Note that the slurmdbd must be up for this option to work properly, unless running in a federation with either FederationParameters=fed_display configured or the --federation option set. This option implicitly sets the --local option.

    -d, --details
        Causes the show command to provide additional details where available.

    --federation
        Report jobs from the federation if a member of one.

    -F, --future
        Report nodes in FUTURE state.

    -h, --help
        Print a help message describing the usage of scontrol.

    --hide
        Do not display information about hidden partitions, their jobs and job steps. Partitions that are configured as hidden or that are unavailable to the user's group are not displayed (i.e. this is the default behavior).

    --json, --json=list, --json=<data_parser>
        Dump information as JSON using the default data_parser plugin or an explicit data_parser with parameters. All information is dumped, even if it would normally not be. Sorting and formatting arguments passed to other options are ignored; however, filtering arguments are still used. This option is not available for every command. This option implicitly sets the --details option.

    --local
        Show only information local to this cluster. Ignore other clusters in the federation if a member of one. Overrides --federation.

    -o, --oneliner
        Print information one line per record.

    -Q, --quiet
        Print no warning or informational messages, only fatal error messages.

    --sibling
        Show all sibling jobs on a federated cluster. Implies --federation.

    -u, --uid=<uid>
        Attempt to update a job as user <uid> instead of the invoking user id.

    -v, --verbose
        Print detailed event logging. Multiple '-v's will further increase the verbosity of logging. By default only errors will be displayed.

    -V, --version
        Print version information and exit.

    --yaml, --yaml=list, --yaml=<data_parser>
        Dump information as YAML using the default data_parser plugin or an explicit data_parser with parameters. All information is dumped, even if it would normally not be. Sorting and formatting arguments passed to other options are ignored; however, filtering arguments are still used. This option is not available for every command. This option implicitly sets the --details option.
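    For illustration, a few read-only invocations combining the options above might look like the following (a minimal sketch; the partition name "debug" is an assumption and will differ per site):

        # Show one partition, one line per record, with extra details:
        $ scontrol -d -o show partition debug

        # Dump partition information as JSON using the default data_parser:
        $ scontrol --json show partition

        # Write the running configuration to "slurm.conf.<datetime>" next to
        # the original slurm.conf:
        $ scontrol write config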
COMMANDS
    cancel_reboot <node_list>
        Cancel pending reboots on nodes. The node will be undrained and the reason cleared if the node was drained by an ASAP reboot.

    create <SPECIFICATION>
        Create a new node, partition, or reservation. See the full list of parameters below.

    completing
        Display all jobs in a COMPLETING state along with associated nodes in either a COMPLETING or DOWN state.

    delete <SPECIFICATION>
        Delete the entry with the specified SPECIFICATION. The three SPECIFICATION choices are NodeName=<nodelist>, PartitionName=<name> and ReservationName=<name>. Only dynamic nodes that have no running jobs and that are not part of a reservation can be deleted. Reservations and partitions should have no associated jobs at the time of their deletion (modify the jobs first). If the specified partition is in use, the request is denied.

    errnumstr <error_number>
        Given a Slurm error number, return a descriptive string.

    fsdampeningfactor <factor>
        Set the FairShareDampeningFactor in slurmctld.

    getaddrs <host_list>
        Get the IP addresses of <host_list> from slurmctld.

    help
        Display a description of scontrol options and commands.

    hold <job_list>
        Prevent a pending job from being started (sets its priority to 0). Use the release command to permit the job to be scheduled. The job_list argument is a comma separated list of job IDs OR "jobname=<name>" with the job's name, which will attempt to hold all jobs having that name. Note that when a job is held by a system administrator using the hold command, only a system administrator may release the job for execution (also see the uhold command). When the job is held by its owner, it may also be released by the job's owner. Additionally, attempting to hold a running job will not suspend or cancel it, but it will set the job priority to 0 and update the job reason field, which would hold the job if it were requeued at a later time.

    notify <job_id> <message>
        Send a message to the standard error of the salloc or srun command or batch job associated with the specified job_id.

    pidinfo <proc_id>
        Print the Slurm job id and scheduled termination time corresponding to the supplied process id, proc_id, on the current node. This will work only with processes on the node on which scontrol is run, and only for those processes spawned by Slurm and their descendants.

    listjobs [<node_name>]
        Print jobs running on the host on which this command is run. This contacts any slurmstepd processes running locally, and does not contact slurmctld. Specify <node_name> if using --enable-multiple-slurmd.

    listpids [<job_id>[.<step_id>]] [<node_name>]
        Print a listing of the process IDs in a job step (if <job_id>.<step_id> is provided), or all of the job steps in a job (if <job_id> is provided), or all of the job steps in all of the jobs on the local node (if <job_id> is not provided or is "*"). This will work only with processes on the node on which scontrol is run, and only for those processes spawned by Slurm and their descendants. Note that some Slurm configurations (ProctrackType value of pgid) are unable to identify all processes associated with a job or job step. Note that the NodeName option is only really useful when you have multiple slurmd daemons running on the same host machine. Multiple slurmd daemons on one host are, in general, only used by Slurm developers.

    liststeps [<node_name>]
        Print steps running on the host on which this command is run. This contacts any slurmstepd processes running locally, and does not contact slurmctld. Specify <node_name> if using --enable-multiple-slurmd.
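    For example, job control and local process inspection with the commands above might look like this (the job IDs 1234 and 1235 and the job name "myjob" are assumptions for illustration):

        # Hold two pending jobs, then hold every job named "myjob":
        $ scontrol hold 1234,1235
        $ scontrol hold jobname=myjob

        # Send a message to the standard error of job 1234:
        $ scontrol notify 1234 "node maintenance begins at 17:00"

        # List the process IDs of step 0 of job 1234 on the local node:
        $ scontrol listpids 1234.0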
    ping
        Ping the primary and secondary slurmctld daemon and report whether they are responding.

    power {up|down} [asap|force] {ALL|<node_list>|<node_set>}
        Control the power state of the provided node list/set. For 'power down', the optional asap or force flag will be added to the power down request; these flags are rejected for power up requests. All arguments except the node list/set are processed case-insensitively. This subcommand obsoletes the prior usage of scontrol's update command:

            scontrol update NodeName=<node_list> State={POWER_UP|POWER_DOWN|POWER_DOWN_ASAP|POWER_DOWN_FORCE}

        Commands:

        down
            Will use the configured SuspendProgram to explicitly place node(s) into power saving mode. If a node is already in the process of being powered down, the command will only change the state of the node but won't have any effect until the configured SuspendTimeout is reached. Use of this command can be useful in situations where a ResumeProgram, like capmc on Cray machines, is stalled and one wants to restore the node to "IDLE" manually. In this case rebooting the node and setting the state to "power down" will cancel the previous "power up" state and the node will become "IDLE".

        down asap
            Will drain the node(s) and mark them for power down. Currently running jobs will complete first and no additional jobs will be allocated to the node(s).

        down force
            Will cancel all jobs on the node(s), power them down, and reset their state to "IDLE".

        up
            Will use the configured ResumeProgram to explicitly move node(s) out of power saving mode. If a node is already in the process of being powered up, the command will only change the state of the node but won't have any effect until the configured ResumeTimeout is reached.

    reboot [ASAP] [nextstate={RESUME|DOWN}] [reason=<reason>] {ALL|<node_list>}
        Reboot the nodes in the system when they become idle using the RebootProgram as configured in Slurm's slurm.conf file. Each node will have the "REBOOT" flag added to its node state. After a node reboots and the slurmd daemon starts up again, the HealthCheckProgram will run once. Then, the slurmd daemon will register itself with the slurmctld daemon and the "REBOOT" flag will be cleared.

        The "ASAP" option adds the "DRAIN" flag to each node's state, preventing additional jobs from running on the node so it can be rebooted and returned to service "As Soon As Possible" (i.e. ASAP). "ASAP" will also set the node reason to "Reboot ASAP" if the "reason" option isn't specified, and will set nextstate=UNDRAIN if nextstate isn't specified. If the "nextstate" option is specified as "DOWN", then the node will remain in a down state after rebooting. If "nextstate" is specified as "RESUME", then the nodes will resume as normal and the node's reason and "DRAIN" state will be cleared. Resuming nodes will be considered available in backfill future scheduling and won't be replaced by idle nodes in a reservation.

        The "reason" option sets each node's reason to a user-defined message. A default reason of "reboot requested" is set if no other reason is set on the node. The reason will be appended with: "reboot issued" when the reboot is issued; "reboot complete" when the node registers and has a "nextstate" of "DOWN"; or "reboot timed out" when the node fails to register within ResumeTimeout.

        You must specify either a list of nodes or that ALL nodes are to be rebooted.

        NOTE: The reboot request will be ignored for hosts in the following states: FUTURE, POWER_DOWN, POWERED_DOWN, POWERING_DOWN, REBOOT_ISSUED, REBOOT_REQUESTED.

        NOTE: By default, this command does not prevent additional jobs from being scheduled on any nodes before reboot. To do this, you can either use the "ASAP" option or explicitly drain the nodes beforehand. You can alternately create an advanced reservation to prevent additional jobs from being initiated on nodes to be rebooted. Pending reboots can be cancelled by using "scontrol cancel_reboot <node_list>" or setting the node state to "CANCEL_REBOOT". A node will be marked "DOWN" if it doesn't reboot within ResumeTimeout.
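    For example, power and reboot management might look like the following (the node list "node[01-04]" is an assumption):

        # Drain node(s), let running jobs finish, then power them down:
        $ scontrol power down asap node[01-04]

        # Drain nodes, reboot them as soon as they become idle, and return
        # them to service:
        $ scontrol reboot ASAP nextstate=RESUME reason="kernel update" node[01-04]

        # Cancel a pending reboot:
        $ scontrol cancel_reboot node[01-04]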
    reconfigure
        Instruct all slurmctld and slurmd daemons to re-read the configuration file. This mechanism can be used to modify configuration parameters set in slurm.conf(5) without interrupting running jobs. Starting in 23.11, this command operates by creating new processes for the daemons, then passing control to the new processes when or if they start up successfully. This allows it to gracefully catch configuration problems and keep running with the previous configuration if there is a problem. This will not be able to change the daemons' listening TCP port settings or authentication mechanism.

    release <job_list>
        Release a previously held job to begin execution. The job_list argument is a comma separated list of job IDs OR "jobname=<name>" with the job's name, which will attempt to release all jobs having that name. Also see hold.

    requeue [
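    As a further illustration, releasing held jobs and applying configuration changes might look like this (the job ID 1234 is an assumption):

        # Release a previously held job:
        $ scontrol release 1234

        # Have all slurmctld and slurmd daemons re-read slurm.conf:
        $ scontrol reconfigure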