fi_mon_sampler(1) Libfabric v2.2.0 fi_mon_sampler(1)

fi_mon_sampler - Simple sampler for ofi_hook_monitor provider.

 fi_mon_sampler [OPTIONS] <target>      sample from file(s) at <target>

Extract data from the ofi_hook_monitor provider via communication files. <target> can either be one communication file or a folder of files. Data is exported based on -f <format> and either printed to stdout (only for single files), or stored per communication file at -o <outpath>. The sampler can watch the communication files for changes via the option -w <msec> for repeated sampling.

The name format of the output files is based on the ofi_hook_monitor provider and is as follows: <ppid>_<pid>_<sequential id>_<job id>_<provider name>. ppid and pid are taken from the perspective of the monitored application. In a batched environment running SLURM, job id is set to the SLURM job ID, otherwise it is set to 0.

Launch a libfabric application with FI_HOOK=monitor to enable the ofi_hook_monitor provider. Adjust the monitor provider settings according to fi_hook(7).

Then launch the sampler via fi_mon_sampler -o <output> <target>. By default, the ofi_hook_monitor provider stores data at /dev/shm/ofi/<uid>/<hostname>.

The sampler will generate output files in the directory specified at <output>, one for each monitored provider.

Watch files for changes, check every <msec> milliseconds.
Output format. Currently only supports CSV.
Output file path. Uses stdout if unset.

Launch a libfabric application and enable the ofi_hook_monitor provider:

FI_HOOK=monitor fi_pingpong [OPTIONS]

Launch another fi_pingpong with the respective settings.

Finally, launch the sampler:

fi_mon_sampler -o $HOME -w 1000 -f csv /dev/shm/ofi/$UID/$HOSTNAME

Output files will be generated in the folder specified at -o <output>.

In -f csv mode, this will contain a CSV file with data for all monitored libfabric functions. For each function, both the count and sum counters are exported, indicated by the column name suffix _c and _s respectively. In addition, each function is monitored for each data size bucket. Refer to fi_hook(7) for more details.

Example CSV output, first four columns, first three rows:

mon_recv_0_64_c,mon_recv_0_64_s,mon_recv_64_512_c,mon_recv_64_512_s
0,0,0,0
22529,0,0,0
113664,0,0,0

fi_hook(7)

OpenFabrics.

2025-06-06 Libfabric Programmer’s Manual