fi_mon_sampler(1) | Libfabric v2.2.0 | fi_mon_sampler(1) |
NAME
fi_mon_sampler - Simple sampler for ofi_hook_monitor provider.
SYNOPSIS
fi_mon_sampler [OPTIONS] <target>      sample from file(s) at <target>
DESCRIPTION
Extract data from the ofi_hook_monitor provider via communication files. <target> can be either a single communication file or a directory of communication files. Data is exported in the format selected with -f <format> and is either printed to stdout (single files only) or stored, one output file per communication file, at -o <outpath>. With the option -w <msec>, the sampler watches the communication files for changes and samples them repeatedly.
The name format of the output files is based on the ofi_hook_monitor provider and is as follows: <ppid>_<pid>_<sequential id>_<job id>_<provider name>. ppid and pid are taken from the perspective of the monitored application. In a batch environment running SLURM, job id is set to the SLURM job ID; otherwise, it is set to 0.
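As a minimal sketch of how a script might take such a name apart, the snippet below splits a file name into its five fields using POSIX parameter expansion. The file name is hypothetical, and the snippet assumes the name carries no extra suffix. Only the first four underscores are treated as separators, so a provider name that itself contains an underscore would stay intact.

```shell
# Hypothetical output file name following the documented pattern
# <ppid>_<pid>_<sequential id>_<job id>_<provider name>.
name="4711_4712_0_0_ofi_rxm"

rest=$name
ppid=${rest%%_*};   rest=${rest#*_}   # first field, then drop it
pid=${rest%%_*};    rest=${rest#*_}
seq_id=${rest%%_*}; rest=${rest#*_}
job_id=${rest%%_*}                    # fourth field
provider=${rest#*_}                   # everything after the fourth "_"

echo "ppid=$ppid pid=$pid seq=$seq_id job=$job_id provider=$provider"
```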
HOW TO RUN
Launch a libfabric application with FI_HOOK=monitor to enable the ofi_hook_monitor provider. Adjust the monitor provider settings according to fi_hook(7).
Then launch the sampler via fi_mon_sampler -o <outpath> <target>. By default, the ofi_hook_monitor provider stores data at /dev/shm/ofi/<uid>/<hostname>.
The sampler will generate output files in the directory specified by <outpath>, one for each monitored provider.
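Before starting the sampler, it can help to check that the monitored application has actually created its communication files. The sketch below builds the default path and lists its contents. It assumes that `id -u` and `uname -n` produce the <uid> and <hostname> path components, and that the default location has not been changed through the monitor provider settings.

```shell
# Default data directory of the ofi_hook_monitor provider.
# Assumption: `id -u` and `uname -n` match the <uid> and <hostname> components.
target="/dev/shm/ofi/$(id -u)/$(uname -n)"

if [ -d "$target" ]; then
    # List the communication files currently present.
    ls -1 "$target"
else
    echo "no monitor data at $target (is the application running with FI_HOOK=monitor?)" >&2
fi
```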
OPTIONS
-w <msec>
    Watch files for changes, checking every <msec> milliseconds.
-f <format>
    Output format. Currently, only csv is supported.
-o <outpath>
    Output path. If unset, output is printed to stdout, which is only possible when <target> is a single file.
USAGE EXAMPLES
Launch a libfabric application and enable the ofi_hook_monitor provider:
FI_HOOK=monitor fi_pingpong [OPTIONS]
Launch the second fi_pingpong instance (the peer) with the matching settings.
Finally, launch the sampler:
fi_mon_sampler -o $HOME -w 1000 -f csv /dev/shm/ofi/$UID/$HOSTNAME
OUTPUT
Output files will be generated in the directory specified by -o <outpath>.
In -f csv mode, each output file is a CSV file with data for all monitored libfabric functions. For each function, both the count and sum counters are exported, indicated by the column name suffixes _c and _s respectively. In addition, each function has a separate pair of counters for each data size bucket; the bucket bounds are part of the column name, as in mon_recv_0_64_c. Refer to fi_hook(7) for more details.
Example CSV output, first four columns, first three rows:
mon_recv_0_64_c,mon_recv_0_64_s,mon_recv_64_512_c,mon_recv_64_512_s
0,0,0,0
22529,0,0,0
113664,0,0,0
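To read such a file from a script, one option is to pair each column name with the value in the most recent row. The following is a minimal sketch using awk. It recreates the example data above in a temporary file so that it runs on its own; in practice, csv_file would point at a file written by the sampler. The variable names are illustrative only.

```shell
# Recreate the example data in a temporary file.
csv_file=$(mktemp)
cat > "$csv_file" <<'EOF'
mon_recv_0_64_c,mon_recv_0_64_s,mon_recv_64_512_c,mon_recv_64_512_s
0,0,0,0
22529,0,0,0
113664,0,0,0
EOF

# Keep the header names from line 1, overwrite the stored values with
# every later row, and print name=value pairs for the final row.
latest=$(awk -F, '
    NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; n = NF; next }
    { for (i = 1; i <= NF; i++) last[i] = $i }
    END { for (i = 1; i <= n; i++) print name[i] "=" last[i] }
' "$csv_file")

echo "$latest"
rm -f "$csv_file"
```

For the example data, the first line printed is mon_recv_0_64_c=113664. When the sampler runs with -w, a script like this could be re-run periodically to follow the latest sample.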
SEE ALSO
fi_hook(7)
AUTHORS
OpenFabrics.
2025-06-06 | Libfabric Programmer’s Manual |