perf-mem - Profile memory accesses
perf mem [<options>] (record [<command>] | report)
"perf mem record" runs a command and gathers memory
operation data from it, into perf.data. Perf record options are accepted and
are passed through.
"perf mem report" displays the result. It invokes perf
report with the right set of options to display a memory access profile. By
default, loads and stores are sampled. Use the -t option to limit to loads
or stores.
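As a rough illustration of the record/report workflow described above, the Python sketch below only assembles argument lists that follow the synopsis `perf mem [<options>] (record [<command>] | report)`; it does not run perf. The helper name `perf_mem_argv` and the `./app` command used in the note below are hypothetical and are not part of perf.

```python
from typing import List, Optional, Sequence


def perf_mem_argv(subcommand: str,
                  op_type: Optional[str] = None,
                  command: Sequence[str] = ()) -> List[str]:
    """Assemble (but do not run) a perf mem command line.

    Hypothetical helper. The layout follows the synopsis:
    perf mem [<options>] (record [<command>] | report)
    """
    if subcommand not in ("record", "report"):
        raise ValueError("subcommand must be 'record' or 'report'")
    if command and subcommand != "record":
        raise ValueError("only 'record' takes a command")

    argv = ["perf", "mem"]
    if op_type is not None:
        # -t limits sampling to one operation type; omitting it keeps
        # the documented default of both loads and stores.
        if op_type not in ("load", "store"):
            raise ValueError("type must be 'load' or 'store'")
        argv += ["-t", op_type]
    argv.append(subcommand)
    argv += list(command)
    return argv
```

For example, `perf_mem_argv("record", command=["./app"])` yields the list for `perf mem record ./app`, and `perf_mem_argv("report", op_type="load")` yields the list for `perf mem -t load report`. Passing such a list to `subprocess.run` would execute it on a system where perf is installed.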
Note that on Intel systems the memory latency reported is the
use-latency, not the pure load (or store) latency. Use-latency includes any
pipeline queuing delays in addition to the memory subsystem latency.
On Arm64 this uses SPE to sample load and store operations,
so hardware and kernel support is required. See
perf-arm-spe(1) for a setup guide. Due to the statistical nature of
SPE sampling, not every memory operation will be sampled.
-f, --force
Don’t do ownership validation
-t, --type=<type>
Select the memory operation type: load or store (default:
load,store)
-v, --verbose
Be more verbose (show counter open errors, etc.)
-p, --phys-data
Record/Report sample physical addresses
--data-page-size
Record/Report sample data address page size
<command>...
Any command you can specify in a shell.
-e, --event <event>
Event selector. Use perf mem record -e list to
list available events.
-K, --all-kernel
Configure all used events to run in kernel space.
-U, --all-user
Configure all used events to run in user space.
--ldlat <n>
Specify the desired latency for load events. Supported on
Intel and Arm64 processors only; ignored on other architectures.
-i, --input=<file>
Input file name.
-C, --cpu=<cpu>
Monitor only the CPUs in the provided list. Multiple CPUs
can be given as a comma-separated list with no spaces: 0,1. Ranges of CPUs
are specified with a hyphen: 0-2. The default is to monitor all CPUs.
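The CPU list syntax above can be sketched in Python as follows. This is only an illustration of the notation, not perf's own parser; the function name `parse_cpu_list` is hypothetical, and accepting a mix of single CPUs and ranges in one list (such as `0-2,5`) is an assumption, since the text describes the two forms separately.

```python
from typing import List


def parse_cpu_list(spec: str) -> List[int]:
    """Expand a CPU list such as "0,1" or "0-2" into sorted CPU numbers."""
    # The documented syntax allows no spaces in the list.
    if any(ch.isspace() for ch in spec):
        raise ValueError("CPU list must not contain whitespace")

    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            # A range such as "0-2" includes both endpoints.
            first, last = part.split("-", 1)
            low, high = int(first), int(last)
            if low > high:
                raise ValueError(f"invalid range: {part!r}")
            cpus.update(range(low, high + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)
```

With this sketch, `parse_cpu_list("0,1")` gives `[0, 1]` and `parse_cpu_list("0-2")` gives `[0, 1, 2]`.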
-D, --dump-raw-samples
Dump the raw decoded samples on the screen in a format
that is easy to parse, with one sample per line.
-s, --sort=<key>
Group results by the given key(s). Multiple keys can be
specified in CSV format. The sort keys specific to memory samples are:
symbol_daddr, symbol_iaddr, dso_daddr, locked, tlb, mem, snoop, dcacheline,
phys_daddr, data_page_size, blocked.
• symbol_daddr: name of the data symbol being accessed
at the time of the sample
• symbol_iaddr: name of the code symbol being executed
at the time of the sample
• dso_daddr: name of the library or module containing
the data being accessed at the time of the sample
• locked: whether the bus was locked at the time of
the sample
• tlb: type of TLB access for the data at the time
of the sample
• mem: type of memory access for the data at the
time of the sample
• snoop: type of snoop (if any) for the data at the
time of the sample
• dcacheline: the cacheline the data address is on
at the time of the sample
• phys_daddr: physical address of the data being
accessed at the time of the sample
• data_page_size: the page size of the data being
accessed at the time of the sample
• blocked: reason for the blocked load access for the
data at the time of the sample
The default sort keys are changed to local_weight, mem, sym, dso,
symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat.
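To make the CSV form of the sort option concrete, here is a minimal Python sketch that checks a list of keys against the memory-specific keys listed above and joins them into a `--sort=` argument. The names `MEM_SORT_KEYS` and `mem_sort_option` are hypothetical. Restricting the check to the memory-specific keys is a simplification for illustration only: other keys such as sym and dso also appear in the default sort keys.

```python
from typing import List, Sequence

# Memory-specific sort keys, copied from the list in this page.
MEM_SORT_KEYS = (
    "symbol_daddr", "symbol_iaddr", "dso_daddr", "locked", "tlb", "mem",
    "snoop", "dcacheline", "phys_daddr", "data_page_size", "blocked",
)


def mem_sort_option(keys: Sequence[str]) -> str:
    """Build a --sort=<key>[,<key>...] argument from memory sort keys."""
    if not keys:
        raise ValueError("at least one sort key is required")
    unknown: List[str] = [k for k in keys if k not in MEM_SORT_KEYS]
    if unknown:
        raise ValueError(f"not a memory-specific sort key: {unknown}")
    # Multiple keys are given in CSV format, in the order requested.
    return "--sort=" + ",".join(keys)
```

For instance, `mem_sort_option(["mem", "snoop"])` produces `--sort=mem,snoop`.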
-T, --type-profile
Show data-type profile results instead of code symbols.
This requires debug information and changes the default sort keys
to: mem, snoop, tlb, type.
-U, --hide-unresolved
Only display entries resolved to a symbol.
-x, --field-separator=<separator>
Specify the field separator used when dumping raw samples
(-D option). By default, the separator is the space character.
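Since the dump is one sample per line with a configurable separator, a consumer can split each line into fields. The Python sketch below shows one way to do that; the function name `split_sample_line` is hypothetical, and it makes no assumption about which columns the dump contains. Collapsing runs of spaces in the default case is an assumption made here for convenience, not documented behaviour.

```python
from typing import List


def split_sample_line(line: str, separator: str = " ") -> List[str]:
    """Split one dumped sample line into its fields."""
    if separator == " ":
        # Assumption: with the default separator, repeated spaces are
        # treated as padding and collapsed rather than kept as empty fields.
        return line.split()
    # With an explicit separator (the -x option), split on it exactly.
    return line.rstrip("\n").split(separator)
```

Choosing a separator that cannot occur inside a field, such as a comma passed via -x, keeps the explicit-separator branch unambiguous.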
In addition, all perf report options are valid for report, and all
perf record options are valid for record.