'\" t
.\" Title: perf-list
.\" Author: [FIXME: author] [see http://www.docbook.org/tdg5/en/html/author]
.\" Generator: DocBook XSL Stylesheets vsnapshot
.\" Date: 2024-11-09
.\" Manual: perf Manual
.\" Source: perf
.\" Language: English
.\"
.TH "PERF\-LIST" "1" "2024\-11\-09" "perf" "perf Manual"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
perf-list \- List all symbolic event types
.SH "SYNOPSIS"
.sp
.nf
\fIperf list\fR [<options>]
[hw|sw|cache|tracepoint|pmu|sdt|metric|metricgroup|event_glob]
.fi
.SH "DESCRIPTION"
.sp
This command displays the symbolic event types which can be selected in the various perf commands with the \-e option\&.
.SH "OPTIONS"
.PP
\-d, \-\-desc
.RS 4
Print extra event descriptions\&. (default)
.RE
.PP
\-\-no\-desc
.RS 4
Don\(cqt print descriptions\&.
.RE
.PP
\-v, \-\-long\-desc
.RS 4
Print longer event descriptions\&.
.RE
.PP
\-\-debug
.RS 4
Enable debugging output\&.
.RE
.PP
\-\-details
.RS 4
Print how named events are resolved internally into perf events, and also any extra expressions computed by perf stat\&.
.RE
.PP
\-\-deprecated
.RS 4
Print deprecated events\&. By default the deprecated events are hidden\&.
.RE
.PP
\-\-unit
.RS 4
Print PMU events and metrics limited to the specific PMU name\&. (e\&.g\&. \-\-unit cpu, \-\-unit msr, \-\-unit cpu_core, \-\-unit cpu_atom)
.RE
.PP
\-j, \-\-json
.RS 4
Output in JSON format\&.
.RE
.PP
\-o, \-\-output=
.RS 4
Output file name\&. By default output is written to stdout\&.
.RE
.SH "EVENT MODIFIERS"
.sp
Events can optionally be qualified by appending a colon and one or more modifiers\&. Modifiers allow the user to restrict when events are counted\&. The following modifiers exist:
.sp
.if n \{\
.RS 4
.\}
.nf
u \- user\-space counting
k \- kernel counting
h \- hypervisor counting
I \- non idle counting
G \- guest counting (in KVM guests)
H \- host counting (not in KVM guests)
p \- precise level
P \- use maximum detected precise level
S \- read sample value (PERF_SAMPLE_READ)
D \- pin the event to the PMU
W \- group is weak and will fall back to non\-group if not schedulable
e \- group or event is exclusive and does not share the PMU
b \- use BPF aggregation (see perf stat \-\-bpf\-counters)
R \- retire latency value of the event
.fi
.if n \{\
.RE
.\}
.sp
The \fIp\fR modifier can be used for specifying how precise the instruction address should be\&. The \fIp\fR modifier can be specified multiple times:
.sp
.if n \{\
.RS 4
.\}
.nf
0 \- SAMPLE_IP can have arbitrary skid
1 \- SAMPLE_IP must have constant skid
2 \- SAMPLE_IP requested to have 0 skid
3 \- SAMPLE_IP must have 0 skid, or uses randomization to avoid
sample shadowing effects\&.
.fi
.if n \{\
.RE
.\}
.sp
On Intel systems, precise event sampling is implemented with PEBS, which supports up to precise\-level 2, and precise\-level 3 in some special cases\&.
.sp
On AMD systems it is implemented using IBS OP (up to precise\-level 2)\&. Unlike Intel PEBS, which provides levels of precision, the AMD core PMU is inherently non\-precise and IBS is inherently precise (i\&.e\&. ibs_op//, ibs_op//p, ibs_op//pp and ibs_op//ppp are all the same)\&. The precise modifier works with event types 0x76 (cpu\-cycles, CPU clocks not halted) and 0xC1 (micro\-ops retired)\&. Both events map to IBS execution sampling (IBS op) with the IBS Op Counter Control bit (IbsOpCntCtl) set respectively (see the Core Complex (CCX) \(-> Processor x86 Core \(-> Instruction Based Sampling (IBS) section of the [AMD Processor Programming Reference (PPR)] relevant to the family, model and stepping of the processor being used)\&.
.sp
See also the AMD64 Architecture Programmer\(cqs Manual Volume 2: System Programming, section 13\&.3 Instruction\-Based Sampling\&. Examples of using IBS:
.sp
.if n \{\
.RS 4
.\}
.nf
perf record \-a \-e cpu\-cycles:p \&.\&.\&. # use ibs op counting cycles
perf record \-a \-e r076:p \&.\&.\&. # same as \-e cpu\-cycles:p
perf record \-a \-e r0C1:p \&.\&.\&. # use ibs op counting micro\-ops
.fi
.if n \{\
.RE
.\}
.SH "RAW HARDWARE EVENT DESCRIPTOR"
.sp
Even when an event is not currently available in symbolic form within perf, it can be encoded in a processor\-specific way\&.
.sp
For instance, on x86 CPUs an event can be given as rN, where N is a hexadecimal value that represents the raw register encoding with the layout of IA32_PERFEVTSELx MSRs (see [Intel\(rg 64 and IA\-32 Architectures Software Developer\(cqs Manual Volume 3B: System Programming Guide] Figure 30\-1 Layout of IA32_PERFEVTSELx MSRs) or AMD\(cqs PERF_CTL MSRs (see the Core Complex (CCX) \(-> Processor x86 Core \(-> MSR Registers section of the [AMD Processor Programming Reference (PPR)] relevant to the family, model and stepping of the processor being used)\&.
.sp
Note: Only the following bit fields can be set in x86 counter registers: event, umask, edge, inv, cmask\&. In particular, guest/host\-only and OS/user mode flags must be set up using EVENT MODIFIERS\&.
.sp
Example:
.sp
If the Intel docs for a QM720 Core i7 describe an event as:
.sp
.if n \{\
.RS 4
.\}
.nf
Event  Umask  Event Mask
Num\&.   Value  Mnemonic    Description                        Comment
A8H    01H    LSD\&.UOPS    Counts the number of micro\-ops     Use cmask=1 and
                          delivered by loop stream detector  invert to count
                                                             cycles
.fi
.if n \{\
.RE
.\}
.sp
then a raw encoding of 0x1A8 (umask 0x01 in bits 8\-15, event number 0xA8 in bits 0\-7) can be used:
.sp
.if n \{\
.RS 4
.\}
.nf
perf stat \-e r1a8 \-a sleep 1
perf record \-e r1a8 \&.\&.\&.
.fi
.if n \{\
.RE
.\}
.sp
It\(cqs also possible to use pmu syntax:
.sp
.if n \{\
.RS 4
.\}
.nf
perf record \-e r1a8 \-a sleep 1
perf record \-e cpu/r1a8/ \&.\&.\&.
perf record \-e cpu/r0x1a8/ \&.\&.\&.
.fi
.if n \{\
.RE
.\}
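.sp
The arithmetic behind r1a8 can be sketched in shell\&. This is only an illustration: x86_raw_event is a hypothetical helper, not part of perf, and it assumes the event select occupies bits 0\-7 and the unit mask bits 8\-15 of the raw value, ignoring the edge, inv and cmask fields\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```shell
# Hypothetical helper (not part of perf): compose an x86 raw event code.
# Assumed layout: event select in bits 0..7, unit mask in bits 8..15.
x86_raw_event() {
    event=$(( $1 ))
    umask=$(( $2 ))
    raw=$(( ((umask & 0xFF) << 8) | (event & 0xFF) ))
    echo "r$(printf '%x' "$raw")"
}

x86_raw_event 0xA8 0x01
```
.fi
.if n \{\
.RE
.\}
.sp
For the LSD\&.UOPS values above this prints r1a8, which is the string passed to \-e in the examples\&.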
.sp
Some processors, like those from AMD, support event codes and unit masks larger than a byte\&. In such cases, the bits corresponding to the event configuration parameters can be seen with:
.sp
.if n \{\
.RS 4
.\}
.nf
cat /sys/bus/event_source/devices//format/
.fi
.if n \{\
.RE
.\}
.sp
Example:
.sp
If the AMD docs for an EPYC 7713 processor describe an event as:
.sp
.if n \{\
.RS 4
.\}
.nf
Event  Umask  Event Mask
Num\&.   Value  Mnemonic                        Description
28FH   03H    op_cache_hit_miss\&.op_cache_hit  Counts Op Cache micro\-tag
                                              hit events\&.
.fi
.if n \{\
.RE
.\}
.sp
a raw encoding of 0x0328F cannot be used, since the upper nibble of the EventSelect bits has to be specified via bits 32\-35, as can be seen with:
.sp
.if n \{\
.RS 4
.\}
.nf
cat /sys/bus/event_source/devices/cpu/format/event
.fi
.if n \{\
.RE
.\}
.sp
raw encoding of 0x20000038F should be used instead:
.sp
.if n \{\
.RS 4
.\}
.nf
perf stat \-e r20000038f \-a sleep 1
perf record \-e r20000038f \&.\&.\&.
.fi
.if n \{\
.RE
.\}
.sp
It\(cqs also possible to use pmu syntax:
.sp
.if n \{\
.RS 4
.\}
.nf
perf record \-e r20000038f \-a sleep 1
perf record \-e cpu/r20000038f/ \&.\&.\&.
perf record \-e cpu/r0x20000038f/ \&.\&.\&.
.fi
.if n \{\
.RE
.\}
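.sp
The step from event number 28FH and umask 03H to r20000038f can be sketched in shell\&. This is only an illustration: amd_raw_event is a hypothetical helper, not part of perf, and it assumes the layout described above, with the low byte of the event select in bits 0\-7, its upper nibble in bits 32\-35, and the unit mask in bits 8\-15\&. The format files under /sys/bus/event_source/devices remain the authoritative source for a given system\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```shell
# Hypothetical helper (not part of perf): compose an AMD raw event code.
# Assumed layout: event select in bits 0..7 and 32..35, unit mask in bits 8..15.
amd_raw_event() {
    event=$(( $1 ))
    umask=$(( $2 ))
    # Move the upper nibble of the event select (bits 8..11) up to bits 32..35.
    raw=$(( ((event & 0xF00) << 24) | ((umask & 0xFF) << 8) | (event & 0xFF) ))
    echo "r$(printf '%x' "$raw")"
}

amd_raw_event 0x28F 0x03
```
.fi
.if n \{\
.RE
.\}
.sp
For the op_cache_hit_miss\&.op_cache_hit values above this prints r20000038f\&.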
.sp
Refer to the processor\-specific documentation for these details\&. Some of these documents are referenced in the SEE ALSO section below\&.
.SH "ARBITRARY PMUS"
.sp
perf also supports an extended syntax for specifying raw parameters to PMUs\&. Using this typically requires looking up the specific event in the CPU vendor\-specific documentation\&.
.sp
The available PMUs and their raw parameters can be listed with
.sp
.if n \{\
.RS 4
.\}
.nf
ls /sys/devices/*/format
.fi
.if n \{\
.RE
.\}
.sp
For example, the "LSD\&.UOPS" core PMU event above could be specified as
.sp
.if n \{\
.RS 4
.\}
.nf
perf stat \-e cpu/event=0xa8,umask=0x1,name=LSD\&.UOPS_CYCLES,cmask=0x1/ \&.\&.\&.
.fi
.if n \{\
.RE
.\}
.sp
or using extended name syntax
.sp
.if n \{\
.RS 4
.\}
.nf
perf stat \-e cpu/event=0xa8,umask=0x1,cmask=0x1,name=\e\*(AqLSD\&.UOPS_CYCLES:cmask=0x1\e\*(Aq/ \&.\&.\&.
.fi
.if n \{\
.RE
.\}
.SH "PER SOCKET PMUS"
.sp
Some PMUs are not associated with a core, but with a whole CPU socket\&. Events on these PMUs generally cannot be sampled, but only counted globally with perf stat \-a\&. They can be bound to one logical CPU, but will measure all the CPUs in the same socket\&.
.sp
This example measures memory bandwidth every second on the first memory controller on socket 0 of an Intel Xeon system:
.sp
.if n \{\
.RS 4
.\}
.nf
perf stat \-C 0 \-a \-e uncore_imc_0/cas_count_read/,uncore_imc_0/cas_count_write/ \-I 1000 \&.\&.\&.
.fi
.if n \{\
.RE
.\}
.sp
Each memory controller has its own PMU\&. Measuring the complete system bandwidth would require specifying all imc PMUs (see perf list output), and adding the values together\&. To simplify creation of multiple events, prefix and glob matching is supported in the PMU name, and the prefix \fIuncore_\fR is also ignored when performing the match\&. So the command above can be expanded to all memory controllers by using either of the following syntaxes:
.sp
.if n \{\
.RS 4
.\}
.nf
perf stat \-C 0 \-a \-e imc/cas_count_read/,imc/cas_count_write/ \-I 1000 \&.\&.\&.
perf stat \-C 0 \-a \-e *imc*/cas_count_read/,*imc*/cas_count_write/ \-I 1000 \&.\&.\&.
.fi
.if n \{\
.RE
.\}
.sp
This example measures the combined core power every second
.sp
.if n \{\
.RS 4
.\}
.nf
perf stat \-I 1000 \-e power/energy\-cores/ \-a
.fi
.if n \{\
.RE
.\}
.SH "ACCESS RESTRICTIONS"
.sp
For non\-root users, generally only context\-switched PMU events are available\&. These are normally only the events in the cpu PMU, the predefined events like cycles and instructions, and some software events\&.
.sp
Other PMUs and global measurements are normally root only\&. Some event qualifiers, such as "any", are also root only\&.
.sp
This can be overridden by setting the kernel\&.perf_event_paranoid sysctl to \-1, which allows non\-root users to use these events\&.
.sp
To access tracepoint events, perf needs read access to /sys/kernel/tracing, even when perf_event_paranoid is set to a relaxed value\&.
.SH "TOOL/HWMON EVENTS"
.sp
Some events don\(cqt have an associated PMU; instead they read values that are available to software without perf_event_open\&. As these events don\(cqt support sampling, they can only really be read by tools like perf stat\&.
.sp
Tool events provide times and certain system parameters\&. Examples include duration_time, user_time, system_time and num_cpus_online\&.
.sp
Hwmon events provide easy access to hwmon sysfs data typically in /sys/class/hwmon\&. This information includes temperatures, fan speeds and energy usage\&.
.SH "TRACING"
.sp
Some PMUs control advanced hardware tracing capabilities, such as Intel PT, that allow low\-overhead execution tracing\&. These are described in a separate intel\-pt\&.txt document\&.
.SH "PARAMETERIZED EVENTS"
.sp
Some pmu events listed by \fIperf\-list\fR will be displayed with \fI?\fR in them\&. For example:
.sp
.if n \{\
.RS 4
.\}
.nf
hv_gpci/dtbp_ptitc,phys_processor_idx=?/
.fi
.if n \{\
.RE
.\}
.sp
This means that when provided as an event, a value for \fI?\fR must also be supplied\&. For example:
.sp
.if n \{\
.RS 4
.\}
.nf
perf stat \-C 0 \-e \*(Aqhv_gpci/dtbp_ptitc,phys_processor_idx=0x2/\*(Aq \&.\&.\&.
.fi
.if n \{\
.RE
.\}
.sp
EVENT QUALIFIERS:
.sp
It is also possible to add extra qualifiers to an event:
.sp
percore:
.sp
Sums up the event counts for all hardware threads in a core, e\&.g\&.:
.sp
.if n \{\
.RS 4
.\}
.nf
perf stat \-e cpu/event=0,umask=0x3,percore=1/
.fi
.if n \{\
.RE
.\}
.SH "EVENT GROUPS"
.sp
Perf supports time\-based multiplexing of events when the number of active events exceeds the number of hardware performance counters\&. Multiplexing can cause measurement errors when the workload changes its execution profile\&.
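.sp
To see why a changing execution profile matters, consider how a multiplexed count is usually extrapolated: the raw count is multiplied by the ratio of the time the event was enabled to the time it was actually running on a counter\&. The sketch below assumes that scaling rule; scale_count is a hypothetical helper, not part of perf, and the numbers are made up for illustration\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```shell
# Hypothetical helper (not part of perf): extrapolate a multiplexed count.
# Arguments: raw count, time enabled, time running (same time unit).
# Assumes time running is nonzero.
scale_count() {
    echo $(( $1 * $2 / $3 ))
}

scale_count 1000000 2000 500
```
.fi
.if n \{\
.RE
.\}
.sp
Here an event that ran for a quarter of the enabled time is scaled from 1000000 to 4000000\&. The estimate is only accurate if the event rate during the unmeasured time matched the rate during the measured time\&.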
.sp
When metrics are computed using formulas from event counts, it is useful to ensure some events are always measured together as a group to minimize multiplexing errors\&. Event groups can be specified using { }\&.
.sp
.if n \{\
.RS 4
.\}
.nf
perf stat \-e \*(Aq{instructions,cycles}\*(Aq \&.\&.\&.
.fi
.if n \{\
.RE
.\}
.sp
The number of available performance counters depends on the CPU\&. A group cannot contain more events than available counters\&. For example, Intel Core CPUs typically have four generic performance counters for the core, plus three fixed counters for instructions, cycles and ref\-cycles\&. Some special events have restrictions on which counter they can be scheduled on, and may not support multiple instances in a single group\&. When too many events are specified in the group, some of them will not be measured\&.
.sp
Globally pinned events can limit the number of counters available for other groups\&. On x86 systems, the NMI watchdog pins a counter by default\&. The NMI watchdog can be disabled as root with
.sp
.if n \{\
.RS 4
.\}
.nf
echo 0 > /proc/sys/kernel/nmi_watchdog
.fi
.if n \{\
.RE
.\}
.sp
Events from multiple different PMUs cannot be mixed in a group, with some exceptions for software events\&.
.SH "LEADER SAMPLING"
.sp
perf also supports group leader sampling using the :S specifier\&.
.sp
.if n \{\
.RS 4
.\}
.nf
perf record \-e \*(Aq{cycles,instructions}:S\*(Aq \&.\&.\&.
perf report \-\-group
.fi
.if n \{\
.RE
.\}
.sp
Normally all events in an event group sample, but with :S only the first event (the leader) samples, and it only reads the values of the other events in the group\&.
.sp
However, in the case of AUX area events (e\&.g\&. Intel PT or CoreSight), the AUX area event must be the leader, so the second event samples, not the first\&.
.SH "OPTIONS"
.sp
Without options all known events will be listed\&.
.sp
To limit the list use:
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 1." 4.2
.\}
\fIhw\fR
or
\fIhardware\fR
to list hardware events such as cache\-misses, etc\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 2." 4.2
.\}
\fIsw\fR
or
\fIsoftware\fR
to list software events such as context switches, etc\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 3.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 3." 4.2
.\}
\fIcache\fR
or
\fIhwcache\fR
to list hardware cache events such as L1\-dcache\-loads, etc\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 4.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 4." 4.2
.\}
\fItracepoint\fR
to list all tracepoint events, alternatively use
\fIsubsys_glob:event_glob\fR
to filter by tracepoint subsystems such as sched, block, etc\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 5.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 5." 4.2
.\}
\fIpmu\fR
to print the kernel supplied PMU events\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 6.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 6." 4.2
.\}
\fIsdt\fR
to list all Statically Defined Tracepoint events\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 7.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 7." 4.2
.\}
\fImetric\fR
to list metrics\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 8.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 8." 4.2
.\}
\fImetricgroup\fR
to list metricgroups with metrics\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 9.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 9." 4.2
.\}
If none of the above is matched, it will apply the supplied glob to all events, printing the ones that match\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'10.\h'+01'\c
.\}
.el \{\
.sp -1
.IP "10." 4.2
.\}
As a last resort, it will do a substring search in all event names\&.
.RE
.sp
One or more types can be used at the same time, listing the events for the types specified\&.
.sp
Support raw format:
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 1." 4.2
.\}
\fI\-\-raw\-dump\fR, shows the raw\-dump of all the events\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 2." 4.2
.\}
\fI\-\-raw\-dump [hw|sw|cache|tracepoint|pmu|event_glob]\fR, shows the raw\-dump of a certain kind of events\&.
.RE
.SH "SEE ALSO"
.sp
\fBperf-stat\fR(1), \fBperf-top\fR(1), \fBperf-record\fR(1), \m[blue]\fBIntel\(rg 64 and IA\-32 Architectures Software Developer\(cqs Manual Volume 3B: System Programming Guide\fR\m[]\&\s-2\u[1]\d\s+2, \m[blue]\fBAMD Processor Programming Reference (PPR)\fR\m[]\&\s-2\u[2]\d\s+2
.SH "NOTES"
.IP " 1." 4
Intel\(rg 64 and IA-32 Architectures Software Developer\(cqs Manual Volume 3B: System Programming Guide
.RS 4
\%http://www.intel.com/sdm/
.RE
.IP " 2." 4
AMD Processor Programming Reference (PPR)
.RS 4
\%https://bugzilla.kernel.org/show_bug.cgi?id=206537
.RE