'\" t
.\" Title: trace-cmd-sqlhist
.\" Author: [see the "AUTHOR" section]
.\" Generator: DocBook XSL Stylesheets vsnapshot
.\" Date: 10/13/2024
.\" Manual: libtracefs Manual
.\" Source: libtracefs
.\" Language: English
.\"
.TH "TRACE\-CMD\-SQLHIST" "1" "10/13/2024" "libtracefs" "libtracefs Manual"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
trace-cmd-sqlhist \- Use SQL language to create / show creation of tracefs histograms and synthetic events
.SH "SYNOPSIS"
.sp
\fBtrace\-cmd sqlhist\fR [\fIOPTIONS\fR] [\fISQL\-select\-command\fR]
.SH "DESCRIPTION"
.sp
The trace\-cmd sqlhist(1) will take an SQL like statement to create tracefs histograms and synthetic events that can perform various actions for various handling of the data\&.
.sp
The tracefs file system interfaces with the Linux tracing infrastructure that has various dynamic and static events through out the kernel\&. Each of these events can have a "histogram" attached to it, where the fields of the event will define the buckets of the histogram\&.
.sp
A synthetic event is a way to attach two separate events and use the fields and time stamps of those events to create a new dynamic event\&. This new dynamic event is call a synthetic event\&. The fields of each event can have simple calculations done on them where, for example, the delta between a field of one event to a field of the other event can be taken\&. This also works for the time stamps of the events where the time delta between the two events can also be extracted and placed into the synthetic event\&.
.sp
Other actions can be done from the fields of the events\&. A snapshot can be taken of the kernel ring buffer a variable used in the synthetic event creating hits a max, or simply changes\&.
.sp
The commands to create histograms and synthetic events are complex and not easy to remember\&. \fBtrace\-cmd sqlhist\fR is used to convert SQL syntax into the commands needed to create the histogram or synthetic event\&.
.sp
The \fBSQL\-select\-command\fR is a SQL string defined by \fBtracefs_sqlhist\fR(3)\&.
.sp
Note, this must be run as root (or sudo) as interacting with the tracefs directory requires root privilege, unless the \fB\-t\fR option is given with a copy of the \fItracefs\fR directory and its events\&.
.SH "OPTIONS"
.PP
\fB\-n\fR \fIname\fR
.RS 4
The name of the synthetic event to create\&. This event can then be used like any other event, and enabled via
\fBtrace\-cmd record\fR(1)\&.
.RE
.PP
\fB\-t\fR \fItracefs\-dir\fR
.RS 4
In order to test this out as non root user, a copy of the tracefs directory can be used, and passing that directory with this option will allow the program to work\&. Obviously,
\fB\-e\fR
will not work as non\-root because it will not be able to execute\&.
.sp
.if n \{\
.RS 4
.\}
.nf
# mkdir /tmp/tracing
# cp \-r /sys/kernel/tracing/events /tmp/tracing
# exit
$ trace\-cmd sqlhist \-t /tmp/tracing \&.\&.\&.
.fi
.if n \{\
.RE
.\}
.RE
.PP
\fB\-e\fR
.RS 4
Not only display the commands to create the histogram, but also execute them\&. This requires root privilege\&.
.RE
.PP
\fB\-f\fR \fIfile\fR
.RS 4
Instead of reading the SQL commands from the command line, read them from
\fIfile\fR\&. If
\fIfile\fR
is
\fI\-\fR
then read from standard input\&.
.RE
.PP
\fB\-m\fR \fIvar\fR
.RS 4
Do the given action when the variable
\fIvar\fR
hits a new maximum\&. This can not be used with
\fB\-c\fR\&. The
\fIvar\fR
must be defined in the
\fBSQL\-select\-command\fR\&.
.RE
.PP
\fB\-c\fR \fIvar\fR
.RS 4
Do the given action when the variable
\fIvar\fR
changes its value\&. This can not be used with
\fB\-m\fR\&. The
\fIvar\fR
must be defined in the
\fBSQL\-select\-command\fR\&.
.RE
.PP
\fB\-s\fR
.RS 4
Perform a snapshot instead of calling the synthetic event\&.
.RE
.PP
\fB\-T\fR
.RS 4
Perform both a snapshot and trace the synthetic event\&.
.RE
.PP
\fB\-S\fR \fIfields[,fields]\fR
.RS 4
Save the given fields\&. The fields must be fields of the "end" event given in the
\fBSQL\-select\-command\fR
.RE
.PP
\fB\-B\fR \fIinstance\fR
.RS 4
For simple statements that only produce a histogram, the instance given here will be where the histogram will be created\&. This is ignored for full synthetic event creation, as sythetic events have a global affect on all tracing instances, where as, histograms only affect a single instance\&.
.RE
.SH "EXAMPLES"
.sp
As described above, for testing purposes, make a copy of the event directory:
.sp
.if n \{\
.RS 4
.\}
.nf
$ mkdir /tmp/tracing
$ sudo cp \-r /sys/kernel/tracing/events /tmp/tracing/
$ sudo chmod \-R 0644 /tmp/tracing/
.fi
.if n \{\
.RE
.\}
.sp
For an example of simple histogram output using the copy of the tracefs directory\&.
.sp
.if n \{\
.RS 4
.\}
.nf
$ trace\-cmd sqlhist \-t /tmp/tracing/ \*(AqSELECT CAST(call_site as SYM\-OFFSET), bytes_req, CAST(bytes_alloc AS _COUNTER_) FROM kmalloc\*(Aq
.fi
.if n \{\
.RE
.\}
.sp
Produces the output:
.sp
.if n \{\
.RS 4
.\}
.nf
echo \*(Aqhist:keys=call_site\&.sym\-offset,bytes_req:vals=bytes_alloc\*(Aq > /sys/kernel/tracing/events/kmem/kmalloc/trigger
.fi
.if n \{\
.RE
.\}
.sp
Which could be used by root:
.sp
.if n \{\
.RS 4
.\}
.nf
# echo \*(Aqhist:keys=call_site\&.sym\-offset,bytes_req:vals=bytes_alloc\*(Aq > /sys/kernel/tracing/events/kmem/kmalloc/trigger
# cat /sys/kernel/tracing/events/kmem/kmalloc/hist
# event histogram
#
# trigger info: hist:keys=call_site\&.sym\-offset,bytes_req:vals=hitcount,bytes_alloc:sort=hitcount:size=2048 [active]
#
{ call_site: [ffffffff813f8d8a] load_elf_phdrs+0x4a/0xb0 , bytes_req: 728 } hitcount: 1 bytes_alloc: 1024
{ call_site: [ffffffffc0c69e74] nf_ct_ext_add+0xd4/0x1d0 [nf_conntrack] , bytes_req: 128 } hitcount: 1 bytes_alloc: 128
{ call_site: [ffffffff818355e6] dma_resv_get_fences+0xf6/0x440 , bytes_req: 8 } hitcount: 1 bytes_alloc: 8
{ call_site: [ffffffffc06dc73f] intel_gt_get_buffer_pool+0x15f/0x290 [i915] , bytes_req: 424 } hitcount: 1 bytes_alloc: 512
{ call_site: [ffffffff813f8d8a] load_elf_phdrs+0x4a/0xb0 , bytes_req: 616 } hitcount: 1 bytes_alloc: 1024
{ call_site: [ffffffff8161a44c] __sg_alloc_table+0x11c/0x180 , bytes_req: 32 } hitcount: 1 bytes_alloc: 32
{ call_site: [ffffffffc070749d] shmem_get_pages+0xad/0x5d0 [i915] , bytes_req: 16 } hitcount: 1 bytes_alloc: 16
{ call_site: [ffffffffc07507f5] intel_framebuffer_create+0x25/0x60 [i915] , bytes_req: 408 } hitcount: 1 bytes_alloc: 512
{ call_site: [ffffffffc06fc20f] eb_parse+0x34f/0x910 [i915] , bytes_req: 408 } hitcount: 1 bytes_alloc: 512
{ call_site: [ffffffffc0700ebd] i915_gem_object_get_pages_internal+0x5d/0x270 [i915] , bytes_req: 16 } hitcount: 1 bytes_alloc: 16
{ call_site: [ffffffffc0771188] intel_frontbuffer_get+0x38/0x220 [i915] , bytes_req: 400 } hitcount: 1 bytes_alloc: 512
{ call_site: [ffffffff8161a44c] __sg_alloc_table+0x11c/0x180 , bytes_req: 128 } hitcount: 1 bytes_alloc: 128
{ call_site: [ffffffff813f8f45] load_elf_binary+0x155/0x1680 , bytes_req: 28 } hitcount: 1 bytes_alloc: 32
{ call_site: [ffffffffc07038c8] __assign_mmap_offset+0x208/0x3d0 [i915] , bytes_req: 288 } hitcount: 1 bytes_alloc: 512
{ call_site: [ffffffff813737b2] alloc_bprm+0x32/0x2f0 , bytes_req: 416 } hitcount: 1 bytes_alloc: 512
{ call_site: [ffffffff813f9027] load_elf_binary+0x237/0x1680 , bytes_req: 64 } hitcount: 1 bytes_alloc: 64
{ call_site: [ffffffff8161a44c] __sg_alloc_table+0x11c/0x180 , bytes_req: 64 } hitcount: 1 bytes_alloc: 64
{ call_site: [ffffffffc040ffe7] drm_vma_node_allow+0x27/0xe0 [drm] , bytes_req: 40 } hitcount: 2 bytes_alloc: 128
{ call_site: [ffffffff813cda98] __do_sys_timerfd_create+0x58/0x1c0 , bytes_req: 336 } hitcount: 2 bytes_alloc: 1024
{ call_site: [ffffffff818355e6] dma_resv_get_fences+0xf6/0x440 , bytes_req: 40 } hitcount: 2 bytes_alloc: 128
{ call_site: [ffffffff8139b75a] single_open+0x2a/0xa0 , bytes_req: 32 } hitcount: 2 bytes_alloc: 64
{ call_site: [ffffffff815df715] bio_kmalloc+0x25/0x80 , bytes_req: 136 } hitcount: 2 bytes_alloc: 384
{ call_site: [ffffffffc071e5cd] i915_vma_work+0x1d/0x50 [i915] , bytes_req: 416 } hitcount: 3 bytes_alloc: 1536
{ call_site: [ffffffff81390d0d] alloc_fdtable+0x4d/0x100 , bytes_req: 56 } hitcount: 3 bytes_alloc: 192
{ call_site: [ffffffffc06ff65f] i915_gem_do_execbuffer+0x158f/0x2440 [i915] , bytes_req: 16 } hitcount: 4 bytes_alloc: 64
{ call_site: [ffffffff8137713c] alloc_pipe_info+0x5c/0x230 , bytes_req: 384 } hitcount: 5 bytes_alloc: 2560
{ call_site: [ffffffff813771b4] alloc_pipe_info+0xd4/0x230 , bytes_req: 640 } hitcount: 5 bytes_alloc: 5120
{ call_site: [ffffffff81834cdb] dma_resv_list_alloc+0x1b/0x40 , bytes_req: 40 } hitcount: 6 bytes_alloc: 384
{ call_site: [ffffffff81834cdb] dma_resv_list_alloc+0x1b/0x40 , bytes_req: 56 } hitcount: 9 bytes_alloc: 576
{ call_site: [ffffffff8120086e] tracing_map_sort_entries+0x9e/0x3e0 , bytes_req: 24 } hitcount: 60 bytes_alloc: 1920
Totals:
Hits: 122
Entries: 30
Dropped: 0
.fi
.if n \{\
.RE
.\}
.sp
Note, although the examples use uppercase for the SQL keywords, they do not have to be\&. \fISELECT\fR could also be \fIselect\fR or even \fIsElEcT\fR\&.
.sp
By using the full SQL language, synthetic events can be made and processed\&. For example, using \fBtrace\-cmd sqlhist\fR along with \fBtrace\-cmd record\fR(1), wake up latency can be recorded by creating a synthetic event by attaching the \fIsched_waking\fR and the \fIsched_switch\fR events\&.
.sp
.if n \{\
.RS 4
.\}
.nf
# trace\-cmd sqlhist \-n wakeup_lat \-e \-T \-m lat \*(AqSELECT end\&.next_comm AS comm, (end\&.TIMESTAMP_USECS \- start\&.TIMESTAMP_USECS) AS lat FROM \*(Aq \e
\*(Aqsched_waking AS start JOIN sched_switch AS end ON start\&.pid = end\&.next_pid WHERE end\&.next_prio < 100 && end\&.next_comm == "cyclictest"\*(Aq
# trace\-cmd start \-e all \-e wakeup_lat \-R stacktrace
# cyclictest \-l 1000 \-p80 \-i250 \-a \-t \-q \-m \-d 0 \-b 1000 \-\-tracemark
# trace\-cmd show \-s | tail \-30
\-0 [002] dNh4 23454\&.902246: sched_wakeup: comm=cyclictest pid=12272 prio=120 target_cpu=002
\-0 [005] \&.\&.\&.1 23454\&.902246: cpu_idle: state=4294967295 cpu_id=5
\-0 [007] d\&.\&.1 23454\&.902246: cpu_idle: state=0 cpu_id=7
\-0 [002] dNh1 23454\&.902247: hrtimer_expire_exit: hrtimer=0000000037956dc2
\-0 [005] d\&.\&.1 23454\&.902248: cpu_idle: state=0 cpu_id=5
\-0 [002] dNh1 23454\&.902248: write_msr: 6e0, value 4866ce957272
\-0 [006] \&.\&.\&.1 23454\&.902248: cpu_idle: state=4294967295 cpu_id=6
\-0 [002] dNh1 23454\&.902249: local_timer_exit: vector=236
\-0 [006] d\&.\&.1 23454\&.902250: cpu_idle: state=0 cpu_id=6
\-0 [002] \&.N\&.1 23454\&.902250: cpu_idle: state=4294967295 cpu_id=2
\-0 [002] dN\&.1 23454\&.902251: rcu_utilization: Start context switch
\-0 [002] dN\&.1 23454\&.902252: rcu_utilization: End context switch
\-0 [001] \&.\&.\&.1 23454\&.902252: cpu_idle: state=4294967295 cpu_id=1
\-0 [002] dN\&.3 23454\&.902253: prandom_u32: ret=3692516021
\-0 [001] d\&.\&.1 23454\&.902254: cpu_idle: state=0 cpu_id=1
\-0 [002] d\&.\&.2 23454\&.902254: sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=cyclictest next_pid=12275 next_prio=19
\-0 [002] d\&.\&.4 23454\&.902256: wakeup_lat: next_comm=cyclictest lat=17
\-0 [002] d\&.\&.5 23454\&.902258:
=> trace_event_raw_event_synth
=> action_trace
=> event_hist_trigger
=> event_triggers_call
=> trace_event_buffer_commit
=> trace_event_raw_event_sched_switch
=> __traceiter_sched_switch
=> __schedule
=> schedule_idle
=> do_idle
=> cpu_startup_entry
=> secondary_startup_64_no_verify
.fi
.if n \{\
.RE
.\}
.sp
Here\(cqs the options for above example explained:
.PP
\fB\-n wakeup_lat\fR
.RS 4
Name the synthetic event to use
\fBwakeup_lat\fR\&.
.RE
.PP
\fB\-e\fR
.RS 4
Execute the commands that are printed\&.
.RE
.PP
\fB\-T\fR
.RS 4
Perform both a trace action and then a snapshot action (swap the buffer into the static
\fIsnapshot\fR
buffer)\&.
.RE
.PP
\fB\-m lat\fR
.RS 4
Trigger the actions whenever
\fIlat\fR
hits a new maximum value\&.
.RE
.sp
Now a breakdown of the SQL statement:
.sp
.if n \{\
.RS 4
.\}
.nf
\*(AqSELECT end\&.next_comm AS comm, (end\&.TIMESTAMP_USECS \- start\&.TIMESTAMP_USECS) AS lat FROM \*(Aq \e
\*(Aqsched_waking AS start JOIN sched_switch AS end ON start\&.pid = end\&.next_pid WHERE end\&.next_prio < 100 && end\&.next_comm == "cyclictest"\*(Aq
.fi
.if n \{\
.RE
.\}
.PP
\fBend\&.next_comm AS comm\fR
.RS 4
Save the
\fIsched_switch\fR
field
\fBnext_comm\fR
and place it into the
\fBcomm\fR
field of the
\fIwakeup_lat\fR
synthetic event\&.
.RE
.PP
\fB(end\&.TIMESTAMP_USECS \- start\&.TIMESTAMP_USECS) AS lat\fR
.RS 4
Take the delta of the time stamps from the
\fIsched_switch\fR
event and the
\fIsched_waking\fR
event\&. As time stamps are usually recorded in nanoseconds,
\fBTIMESTAMP\fR
would give the full nanosecond time stamp, but here, the
\fBTIMESTAMP_USECS\fR
will truncate it into microseconds\&. The value is saved in the variable
\fBlat\fR, which will also be recorded in the synthetic event\&.
.RE
.PP
\fBFROM \fR\fB\fIsched_waking\fR\fR\fB AS start JOIN sched_switch AS end ON start\&.pid = end\&.next_pid\fR
.RS 4
Create the synthetic event by joining
\fIsched_waking\fR
to
\fIsched_switch\fR, matching the
\fIsched_waking\fR
\fIpid\fR
field with the
\fIsched_switch\fR
\fInext_pid\fR
field\&. Also make
\fBstart\fR
an alias for
\fIsched_waking\fR
and
\fBend\fR
an alias for
\fIsched_switch\fR
which then an use
\fBstart\fR
and
\fBend\fR
as a subsitute for
\fIsched_waking\fR
and
\fIsched_switch\fR
respectively through out the rest of the SQL statement\&.
.RE
.PP
\fBWHERE end\&.next_prio < 100 && end\&.next_comm == "cyclictest"\fR
.RS 4
Filter the logic where it executes only if
\fIsched_waking\fR
\fInext_prio\fR
field is less than 100\&. (Note, in the Kernel, priorities are inverse, and the real\-time priorities are represented from 0\-100 where 0 is the highest priority)\&. Also only trace when the
\fInext_comm\fR
(the task scheduling in) of the
\fIsched_switch\fR
event has the name "cyclictest"\&.
.RE
.sp
For the \fBtrace\-cmd\fR(3) command:
.sp
.if n \{\
.RS 4
.\}
.nf
trace\-cmd start \-e all \-e wakeup_lat \-R stacktrace
.fi
.if n \{\
.RE
.\}
.PP
\fBtrace\-cmd start\fR
.RS 4
Enables tracing (does not record to a file)\&.
.RE
.PP
\fB\-e all\fR
.RS 4
Enable all events
.RE
.PP
\fB\-e wakeup_lat \-R stacktrace\fR
.RS 4
have the "wakeup_lat" event (our synthetic event) enable the
\fBstacktrace\fR
trigger, were for every instance of the "wakeup_lat" event, a kernel stack trace will be recorded in the ring buffer\&.
.RE
.sp
After calling \fBcyclictest\fR (a real\-time tool to measure wakeup latency), read the snapshot buffer\&.
.PP
\fBtrace\-cmd show \-s\fR
.RS 4
\fBtrace\-cmd show\fR
reads the kernel ring buffer, and the
\fB\-s\fR
option will read the
\fBsnapshot\fR
buffer instead of the normal one\&.
.RE
.sp
.if n \{\
.RS 4
.\}
.nf
\-0 [002] d\&.\&.4 23454\&.902256: wakeup_lat: next_comm=cyclictest lat=17
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
We see on the "wakeup_lat" event happened on CPU 2, with a wake up latency 17 microseconds\&.
.fi
.if n \{\
.RE
.\}
.sp
This can be extracted into a \fBtrace\&.dat\fR file that \fBtrace\-cmd\fR(3) can read and do further analysis, as well as \fBkernelshark\fR\&.
.sp
.if n \{\
.RS 4
.\}
.nf
# trace\-cmd extract \-s
# trace\-cmd report \-\-cpu 2 | tail \-30
\-0 [002] 23454\&.902238: prandom_u32: ret=1633425088
\-0 [002] 23454\&.902239: sched_wakeup: cyclictest:12275 [19] CPU:002
\-0 [002] 23454\&.902241: hrtimer_expire_exit: hrtimer=0xffffbbd68286fe60
\-0 [002] 23454\&.902241: hrtimer_cancel: hrtimer=0xffffbbd6826efe70
\-0 [002] 23454\&.902242: hrtimer_expire_entry: hrtimer=0xffffbbd6826efe70 now=23455294430750 function=hrtimer_wakeup/0x0
\-0 [002] 23454\&.902243: sched_waking: comm=cyclictest pid=12272 prio=120 target_cpu=002
\-0 [002] 23454\&.902244: prandom_u32: ret=1102749734
\-0 [002] 23454\&.902246: sched_wakeup: cyclictest:12272 [120] CPU:002
\-0 [002] 23454\&.902247: hrtimer_expire_exit: hrtimer=0xffffbbd6826efe70
\-0 [002] 23454\&.902248: write_msr: 6e0, value 4866ce957272
\-0 [002] 23454\&.902249: local_timer_exit: vector=236
\-0 [002] 23454\&.902250: cpu_idle: state=4294967295 cpu_id=2
\-0 [002] 23454\&.902251: rcu_utilization: Start context switch
\-0 [002] 23454\&.902252: rcu_utilization: End context switch
\-0 [002] 23454\&.902253: prandom_u32: ret=3692516021
\-0 [002] 23454\&.902254: sched_switch: swapper/2:0 [120] R ==> cyclictest:12275 [19]
\-0 [002] 23454\&.902256: wakeup_lat: next_comm=cyclictest lat=17
\-0 [002] 23454\&.902258: kernel_stack:
=> trace_event_raw_event_synth (ffffffff8121a0db)
=> action_trace (ffffffff8121e9fb)
=> event_hist_trigger (ffffffff8121ca8d)
=> event_triggers_call (ffffffff81216c72)
=> trace_event_buffer_commit (ffffffff811f7618)
=> trace_event_raw_event_sched_switch (ffffffff8110fda4)
=> __traceiter_sched_switch (ffffffff8110d449)
=> __schedule (ffffffff81c02002)
=> schedule_idle (ffffffff81c02c86)
=> do_idle (ffffffff8111e898)
=> cpu_startup_entry (ffffffff8111eba9)
=> secondary_startup_64_no_verify (ffffffff81000107)
.fi
.if n \{\
.RE
.\}
.SH "SEE ALSO"
.sp
trace\-cmd(1), tracefs_sqlhist(3)
.SH "AUTHOR"
.sp
Written by Steven Rostedt, <\m[blue]\fBrostedt@goodmis\&.org\fR\m[]\&\s-2\u[1]\d\s+2>
.SH "RESOURCES"
.sp
\m[blue]\fBhttps://git\&.kernel\&.org/pub/scm/utils/trace\-cmd/trace\-cmd\&.git/\fR\m[]
.SH "COPYING"
.sp
Copyright (C) 2021 , Inc\&. Free use of this software is granted under the terms of the GNU Public License (GPL)\&.
.SH "NOTES"
.IP " 1." 4
rostedt@goodmis.org
.RS 4
\%mailto:rostedt@goodmis.org
.RE