topology.conf(5) Slurm Configuration File topology.conf(5)

topology.conf - Slurm configuration file for the topology plugins

topology.conf is an ASCII file which describes the cluster's network topology for optimized job resource allocation. The file will always be located in the same directory as the slurm.conf.

Parameter names are case insensitive. Any text following a "#" in the configuration file is treated as a comment through the end of that line. Changes to the configuration file take effect upon restart of Slurm daemons, daemon receipt of the SIGHUP signal, or execution of the command "scontrol reconfigure" unless otherwise noted.

This plugin requires you to use the select/cons_tres plugin. The network topology configuration, each line defining a switch name and its children, either node names or switch names. Slurm's hostlist expression parser is used, so the node and switch names need not be consecutive (e.g. "Nodes=tux[0-3,12,18-20]" and "Switches=s[0-2,4-8,12]" will parse fine). An optional link speed may also be specified.

All nodes in the network must be connected to at least one switch. The network must be fully connected to use TopologyParam=RouteTree. Jobs can only span nodes connected by the same switch fabric, even if there are available idle nodes in other areas of the cluster.

The topology.conf file for an Infiniband switch can be automatically generated using the slurmibtopology tool found here: https://github.com/OleHolmNielsen/Slurm_tools/tree/master/slurmibtopology.

The overall configuration parameters available for topology/tree include:

The name of a switch. This name is internal to Slurm and arbitrary. Each switch should have a unique name. This field must be specified.
Child switches of the named switch. Either this option or the Nodes option must be specified.
Child Nodes of the named leaf switch. Either this option or the Switches option must be specified.
An optional value specifying the performance of this communication link. The units used are arbitrary and this information is currently not used. It may be used in the future to optimize resource allocations.

The network topology configuration, each line defining a block name and its children node names. Slurm's hostlist expression parser is used, so the node names need not be consecutive (e.g. "Nodes=tux[0-3,12,18-20]").

This topology plugin places emphasis on reducing fragmentation of the cluster, allowing jobs to take advantage of lower-latency connections between smaller "blocks" of node, rather than starting jobs as quickly as possible on the first available resources.

Defined blocks of nodes are paired with other contiguous blocks, to create a higher level block of nodes. These larger blocks can then be paired with other blocks at the same level for bigger and bigger blocks of contiguous nodes with optimized communication between them. Levels are defined as level 0 being a single defined block, level 1 being a pair of blocks, level 2 being a group of 4 blocks, or as many as can be grouped together if there are an uneven number of blocks, level 3 being a group of (up to) 8 blocks, etc. You can define how many levels you want to enforce jobs remain contiguous across with BlockLevels.

The overall configuration parameters available for topology/block include:

The name of a block. This name is internal to Slurm and arbitrary. Each block should have a unique name. This field must be specified.
Child Nodes of the named block. This must be specified along with the BlockName.
List of the planning base block size, alongside any higher-level block sizes that would be enforced. Each block must have at least the planning base block size count of nodes. Successive BlockSizes must be a power of two larger than the prior values.

##################################################################
# Slurm's network topology configuration file for use with the
# topology/tree plugin
##################################################################
SwitchName=s0 Nodes=dev[0-5]
SwitchName=s1 Nodes=dev[6-11]
SwitchName=s2 Nodes=dev[12-17]
SwitchName=s3 Switches=s[0-2]
##################################################################
# Slurm's network topology configuration file for use with the
# topology/block plugin
##################################################################
BlockName=block1 Nodes=node[1-32]
BlockName=block2 Nodes=node[33-64]
BlockName=block3 Nodes=node[65-96]
BlockName=block4 Nodes=node[97-128]
BlockSizes=30,120

Copyright (C) 2009 Lawrence Livermore National Security. Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
Copyright (C) 2010-2023 SchedMD LLC.

This file is part of Slurm, a resource management program. For details, see https://slurm.schedmd.com/.

Slurm is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

Slurm is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

slurm.conf(5)

Slurm Configuration File November 2023