'\"macro stdmacro .\" .\" Copyright (C) 2015-2021 Marko Myllynen .\" Copyright (C) 2016-2018 Red Hat. .\" .\" This program is free software; you can redistribute it and/or modify it .\" under the terms of the GNU General Public License as published by the .\" Free Software Foundation; either version 2 of the License, or (at your .\" option) any later version. .\" .\" This program is distributed in the hope that it will be useful, but .\" WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY .\" or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License .\" for more details. .\" .\" .TH PCP2SPARK 1 "PCP" "Performance Co-Pilot" .SH NAME \f3pcp2spark\f1 \- pcp-to-spark metrics exporter .SH SYNOPSIS \fBpcp2spark\fP [\fB\-5CGHIjLmnrRvV?\fP] [\fB\-4\fP \fIaction\fP] [\fB\-8\fP|\fB\-9\fP \fIlimit\fP] [\fB\-a\fP \fIarchive\fP] [\fB\-A\fP \fIalign\fP] [\fB\-\-archive\-folio\fP \fIfolio\fP] [\fB\-b\fP|\fB\-B\fP \fIspace-scale\fP] [\fB\-c\fP \fIconfig\fP] [\fB\-\-container\fP \fIcontainer\fP] [\fB\-\-daemonize\fP] [\fB\-e\fP \fIderived\fP] [\fB\-g\fP \fIserver\fP] [\fB\-h\fP \fIhost\fP] [\fB\-i\fP \fIinstances\fP] [\fB\-J\fP \fIrank\fP] [\fB\-K\fP \fIspec\fP] [\fB\-N\fP \fIpredicate\fP] [\fB\-O\fP \fIorigin\fP] [\fB\-p\fP \fIport\fP] [\fB\-P\fP|\fB\-0\fP \fIprecision\fP] [\fB\-q\fP|\fB\-Q\fP \fIcount-scale\fP] [\fB\-s\fP \fIsamples\fP] [\fB\-S\fP \fIstarttime\fP] [\fB\-t\fP \fIinterval\fP] [\fB\-T\fP \fIendtime\fP] [\fB\-y\fP|\fB\-Y\fP \fItime-scale\fP] \fImetricspec\fP [...] .SH DESCRIPTION .B pcp2spark is a customizable performance metrics exporter tool from PCP to Apache Spark. Any available performance metric, live or archived, system and/or application, can be selected for exporting using either command line arguments or a configuration file. .PP .B pcp2spark acts as a bridge which provides a network socket stream on a given address/port which an Apache Spark worker task can connect to and pull the configured PCP metrics from .B pcp2spark exporting them using the streaming extensions of the Apache Spark API. .PP .B pcp2spark is a close relative of .BR pmrep (1). Refer to .BR pmrep (1) for the .I metricspec description accepted on .B pcp2spark command line. See .BR pmrep.conf (5) for description of the .B pcp2spark.conf configuration file syntax. This page describes .B pcp2spark specific options and configuration file differences with .BR pmrep.conf (5). .BR pmrep (1) also lists some usage examples of which most are applicable with .B pcp2spark as well. .PP Only the command line options listed on this page are supported, other options available for .BR pmrep (1) are not supported. .PP Options via environment values (see .BR pmGetOptions (3)) override the corresponding built-in default values (if any). Configuration file options override the corresponding environment variables (if any). Command line options override the corresponding configuration file options (if any). .SH GENERAL USAGE A general setup for making use of .B pcp2spark would involve the user configuring .B pcp2spark for the PCP metrics to export followed by starting the .B pcp2spark application. The .B pcp2spark application will then wait and listen on the given address/port for a connection from an Apache Spark worker thread to be started. The worker thread will then connect to .B pcp2spark. .PP When an Apache Spark worker thread has connected, .B pcp2spark will begin streaming PCP metric data to Apache Spark until the worker thread completes or the connection is interrupted. If the connection is interrupted or the socket is closed from the Apache Spark worker thread .B pcp2spark will exit. .PP For an example Apache Spark worker job which will connect to an .B pcp2spark instance on a given address/port and pull in PCP metric data see the example provided in the PCP examples directory for .B pcp2spark (often provided by the PCP development package) or the online version at .IR https://github.com/performancecopilot/pcp/blob/main/src/pcp2spark/ . .SH CONFIGURATION FILE .B pcp2spark uses a configuration file with syntax described in .BR pmrep.conf (5). The following options are common with .BR pmrep.conf : .BR version , .BR source , .BR speclocal , .BR derived , .BR header , .BR globals , .BR samples , .BR interval , .BR type , .BR type_prefer , .BR ignore_incompat , .BR names_change , .BR instances , .BR live_filter , .BR rank , .BR limit_filter , .BR limit_filter_force , .BR invert_filter , .BR predicate , .BR omit_flat , .BR include_labels , .BR precision , .BR precision_force , .BR count_scale , .BR count_scale_force , .BR space_scale , .BR space_scale_force , .BR time_scale , .BR time_scale_force . The rest of the .B pmrep.conf options are recognized but ignored for compatibility. .SS pcp2spark specific options spark_server (string) .RS 4 Specify the address on which .B pcp2spark will listen for connections from an Apache Spark worker thread. Corresponding command line option is \fB\-g\fP. Defaults to \fB127.0.0.1\fP. .RE .PP spark_port (integer) .RS 4 Specify the port on which .B pcp2spark will listen for connections. Corresponding command line option is \fB\-p\fP. Defaults to \fB44325\fP. .RE .SH OPTIONS The available command line options are: .TP 5 \fB\-0\fR \fIprecision\fR, \fB\-\-precision\-force\fR=\fIprecision\fR Like .B \-P but this option \fIwill\fP override per-metric specifications. .TP \fB\-4\fR \fIaction\fR, \fB\-\-names\-change\fR=\fIaction\fR Specify which .I action to take on receiving a metric names change event during sampling. These events occur when a PMDA discovers new metrics sometime after starting up, and informs running client tools like .BR pcp2spark . Valid values for .I action are \fBupdate\fP (refresh metrics being sampled), \fBignore\fP (do nothing \- the default behaviour) and \fBabort\fP (exit the program if such an event occurs). .TP \fB\-5\fR, \fB\-\-ignore\-unknown\fR Silently ignore any metric name that cannot be resolved. At least one metric must be found for the tool to start. .TP \fB\-8\fR \fIlimit\fR, \fB\-\-limit\-filter\fR=\fIlimit\fR Limit results to instances with values above/below .IR limit . A positive integer will include instances with values at or above the limit in reporting. A negative integer will include instances with values at or below the limit in reporting. A value of zero performs no limit filtering. This option will \fInot\fP override possible per-metric specifications. See also .BR \-J " and " .BR \-N . .TP \fB\-9\fR \fIlimit\fR, \fB\-\-limit\-filter\-force\fR=\fIlimit\fR Like .B \-8 but this option \fIwill\fP override per-metric specifications. .TP \fB\-a\fR \fIarchive\fR, \fB\-\-archive\fR=\fIarchive\fR Performance metric values are retrieved from the set of Performance Co-Pilot (PCP) archive files identified by the .I archive argument, which is a comma-separated list of names, each of which may be the base name of an archive or the name of a directory containing one or more archives. .TP \fB\-A\fR \fIalign\fR, \fB\-\-align\fR=\fIalign\fR Force the initial sample to be aligned on the boundary of a natural time unit .IR align . Refer to .BR PCPIntro (1) for a complete description of the syntax for .IR align . .TP \fB\-\-archive\-folio\fR=\fIfolio\fR Read metric source archives from the PCP archive .I folio created by tools like .BR pmchart (1) or, less often, manually with .BR mkaf (1). .TP \fB\-b\fR \fIscale\fR, \fB\-\-space\-scale\fR=\fIscale\fR .I Unit/scale for space (byte) metrics, possible values include .BR bytes , .BR Kbytes , .BR KB , .BR Mbytes , .BR MB , and so forth. This option will \fInot\fP override possible per-metric specifications. See also .BR pmParseUnitsStr (3). .TP \fB\-B\fR \fIscale\fR, \fB\-\-space\-scale\-force\fR=\fIscale\fR Like .B \-b but this option \fIwill\fP override per-metric specifications. .TP \fB\-c\fR \fIconfig\fR, \fB\-\-config\fR=\fIconfig\fR Specify the .I config file or directory to use. In case \fIconfig\fP is a directory all files in it ending \fB.conf\fR will be included. The default is the first found of: .IR ./pcp2spark.conf , .IR \f(CR$HOME\fP/.pcp2spark.conf , .IR \f(CR$HOME\fP/pcp/pcp2spark.conf , and .IR \f(CR$PCP_SYSCONF_DIR\fP/pcp2spark.conf . For details, see the above section and .BR pmrep.conf (5). .TP \fB\-\-container\fR=\fIcontainer\fR Fetch performance metrics from the specified .IR container , either local or remote (see .BR \-h ). .TP \fB\-C\fR, \fB\-\-check\fR Exit before reporting any values, but after parsing the configuration and metrics and printing possible headers. .TP .B \-\-daemonize Daemonize on startup. .TP \fB\-e\fR \fIderived\fR, \fB\-\-derived\fR=\fIderived\fR Specify .I derived performance metrics. If .I derived starts with a slash (``/'') or with a dot (``.'') it will be interpreted as a PCP derived metrics configuration file, otherwise it will be interpreted as comma- or semicolon-separated derived metric expressions. For complete description of derived metrics and PCP derived metrics configuration files see .BR pmLoadDerivedConfig (3) and .BR pmRegisterDerived (3). Alternatively, using .BR pmrep.conf (5) configuration syntax allows defining derived metrics as part of metricsets. .RS .PP In case of issues with derived metrics, review the aforementioned manual pages in detail and ensure all the required metrics are available, especially when using archives. Use .B -Dderive to see additional debug information about parsing derived metrics. .RE .TP \fB\-g\fR \fIserver\fR, \fB\-\-spark\-server\fR=\fIserver\fR .B pcp2spark local .I server address. .TP \fB\-G\fR, \fB\-\-no\-globals\fR Do not include global metrics in reporting (see .BR pmrep.conf (5)). .TP \fB\-h\fR \fIhost\fR, \fB\-\-host\fR=\fIhost\fR Fetch performance metrics from .BR pmcd (1) on .IR host , rather than from the default localhost. .TP \fB\-H\fR, \fB\-\-no\-header\fR Do not print any headers. .TP \fB\-i\fR \fIinstances\fR, \fB\-\-instances\fR=\fIinstances\fR Retrieve and report only the specified metric .IR instances . By default all instances, present and future, are reported. .RS .PP Refer to .BR pmrep (1) for complete description of this option. .RE .TP \fB\-I\fR, \fB\-\-ignore\-incompat\fR Ignore incompatible metrics. By default incompatible metrics (that is, their type is unsupported or they cannot be scaled as requested) will cause .B pcp2spark to terminate with an error message. With this option all incompatible metrics are silently omitted from reporting. This may be especially useful when requesting non-leaf nodes of the PMNS tree for reporting. .TP \fB\-j\fR, \fB\-\-live\-filter\fR Perform instance live filtering. This allows capturing all named instances even if processes are restarted at some point (unlike without live filtering). Performing live filtering over a huge number of instances will add some internal overhead so a bit of user caution is advised. See also .BR \-n . .TP \fB\-J\fR \fIrank\fR, \fB\-\-rank\fR=\fIrank\fR Limit results to highest/lowest .IR rank ed instances of set-valued metrics. A positive integer will include highest valued instances in reporting. A negative integer will include lowest valued instances in reporting. A value of zero performs no ranking. Ranking does not imply sorting, see .BR \-6 . See also .BR \-8 . .TP \fB\-K\fR \fIspec\fR, \fB\-\-spec\-local\fR=\fIspec\fR When fetching metrics from a local context (see .BR \-L ), the .B \-K option may be used to control the DSO PMDAs that should be made accessible. The .I spec argument conforms to the syntax described in .BR pmSpecLocalPMDA (3). More than one .B \-K option may be used. .TP \fB\-L\fR, \fB\-\-local\-PMDA\fR Use a local context to collect metrics from DSO PMDAs on the local host without PMCD. See also .BR \-K . .TP \fB\-m\fR, \fB\-\-include\-labels\fR Include PCP metric labels in the output. .TP \fB\-n\fR, \fB\-\-invert\-filter\fR Perform ranking before live filtering. By default instance live filtering (when requested, see .BR \-j ) happens before instance ranking (when requested, see .BR \-J ). With this option the logic is inverted and ranking happens before live filtering. .TP \fB\-N\fR \fIpredicate\fR, \fB\-\-predicate\fR=\fIpredicate\fR Specify a comma-separated list of .I predicate filter reference metrics. By default ranking (see .BR \-J ) happens for each metric individually. With predicates, ranking is done only for the specified predicate metrics. When reporting, rest of the metrics sharing the same .I instance domain (see .BR PCPIntro (1)) as the predicate will include only the highest/lowest ranking instances of the corresponding predicate. Ranking does not imply sorting, see .BR \-6 . .RS .PP So for example, using \fBproc.memory.rss\fP (resident memory size of process) as the .I predicate metric together with \fBproc.io.total_bytes\fP and \fBmem.util.used\fP as metrics to be reported, only the processes using most/least (as per .BR \-J ) memory will be included when reporting total bytes written by processes. Since \fBmem.util.used\fP is a single-valued metric (thus not sharing the same instance domain as the process related metrics), it will be reported as usual. .RE .TP \fB\-O\fR \fIorigin\fR, \fB\-\-origin\fR=\fIorigin\fR When reporting archived metrics, start reporting at .I origin within the time window (see .B \-S and .BR \-T ). Refer to .BR PCPIntro (1) for a complete description of the syntax for .IR origin . .TP \fB\-p\fR \fIport\fR, \fB\-\-spark\-port\fR=\fIport\fR .B pcp2spark local .IR port . .TP \fB\-P\fR \fIprecision\fR, \fB\-\-precision\fR=\fIprecision\fR Use .I precision for numeric non-integer output values. The default is to use 3 decimal places (when applicable). This option will \fInot\fP override possible per-metric specifications. .TP \fB\-q\fR \fIscale\fR, \fB\-\-count\-scale\fR=\fIscale\fR .I Unit/scale for count metrics, possible values include .BR "count x 10^\-1" , .BR "count" , .BR "count x 10" , .BR "count x 10^2" , and so forth from .B 10^\-8 to .BR 10^7 . .\" https://bugzilla.redhat.com/show_bug.cgi?id=1264124 (These values are currently space-sensitive.) This option will \fInot\fP override possible per-metric specifications. See also .BR pmParseUnitsStr (3). .TP \fB\-Q\fR \fIscale\fR, \fB\-\-count\-scale\-force\fR=\fIscale\fR Like .B \-q but this option \fIwill\fP override per-metric specifications. .TP \fB\-r\fR, \fB\-\-raw\fR Output raw metric values, do not convert cumulative counters to rates. This option \fIwill\fP override possible per-metric specifications. .TP \fB\-R\fR, \fB\-\-raw\-prefer\fR Like .B \-r but this option will \fInot\fP override per-metric specifications. .TP \fB\-s\fR \fIsamples\fR, \fB\-\-samples\fR=\fIsamples\fR The .I samples argument defines the number of samples to be retrieved and reported. If .I samples is 0 or .B \-s is not specified, .B pcp2spark will sample and report continuously (in real time mode) or until the end of the set of PCP archives (in archive mode). See also .BR \-T . .TP \fB\-S\fR \fIstarttime\fR, \fB\-\-start\fR=\fIstarttime\fR When reporting archived metrics, the report will be restricted to those records logged at or after .IR starttime . Refer to .BR PCPIntro (1) for a complete description of the syntax for .IR starttime . .TP \fB\-t\fR \fIinterval\fR, \fB\-\-interval\fR=\fIinterval\fR Set the reporting .I interval to something other than the default 1 second. The .I interval argument follows the syntax described in .BR PCPIntro (1), and in the simplest form may be an unsigned integer (the implied units in this case are seconds). See also the .B \-T option. .TP \fB\-T\fR \fIendtime\fR, \fB\-\-finish\fR=\fIendtime\fR When reporting archived metrics, the report will be restricted to those records logged before or at .IR endtime . Refer to .BR PCPIntro (1) for a complete description of the syntax for .IR endtime . .RS .PP When used to define the runtime before \fBpcp2spark\fP will exit, if no \fIsamples\fP is given (see \fB\-s\fP) then the number of reported samples depends on \fIinterval\fP (see \fB\-t\fP). If .I samples is given then .I interval will be adjusted to allow reporting of .I samples during runtime. In case all of .BR \-T , .BR \-s , and .B \-t are given, .I endtime determines the actual time .B pcp2spark will run. .RE .TP \fB\-v\fR, \fB\-\-omit\-flat\fR Report only set-valued metrics with instances (e.g. disk.dev.read) and omit single-valued ``flat'' metrics without instances (e.g. kernel.all.sysfork). See .B \-i and .BR \-I . .TP \fB\-V\fR, \fB\-\-version\fR Display version number and exit. .TP \fB\-y\fR \fIscale\fR, \fB\-\-time\-scale\fR=\fIscale\fR .I Unit/scale for time metrics, possible values include .BR nanosec , .BR ns , .BR microsec , .BR us , .BR millisec , .BR ms , and so forth up to .BR hour , .BR hr . This option will \fInot\fP override possible per-metric specifications. See also .BR pmParseUnitsStr (3). .TP \fB\-Y\fR \fIscale\fR, \fB\-\-time\-scale\-force\fR=\fIscale\fR Like .B \-y but this option \fIwill\fP override per-metric specifications. .TP \fB\-?\fR, \fB\-\-help\fR Display usage message and exit. .SH FILES .TP 5 .I pcp2spark.conf \fBpcp2spark\fP configuration file (see \fB\-c\fP) .TP .I \f(CR$PCP_SYSCONF_DIR\fP/pmrep/*.conf system provided default \fBpmrep\fP configuration files .SH PCP ENVIRONMENT Environment variables with the prefix \fBPCP_\fP are used to parameterize the file and directory names used by PCP. On each installation, the file \fI/etc/pcp.conf\fP contains the local values for these variables. The \fB$PCP_CONF\fP variable may be used to specify an alternative configuration file, as described in \fBpcp.conf\fP(5). .PP For environment variables affecting PCP tools, see \fBpmGetOptions\fP(3). .SH SEE ALSO .BR PCPIntro (1), .BR mkaf (1), .BR pcp (1), .BR pcp2elasticsearch (1), .BR pcp2graphite (1), .BR pcp2influxdb (1), .BR pcp2json (1), .BR pcp2xlsx (1), .BR pcp2xml (1), .BR pcp2zabbix (1), .BR pmcd (1), .BR pminfo (1), .BR pmrep (1), .BR pmGetOptions (3), .BR pmLoadDerivedConfig (3), .BR pmParseUnitsStr (3), .BR pmRegisterDerived (3), .BR pmSpecLocalPMDA (3), .BR LOGARCHIVE (5), .BR pcp.conf (5), .BR pmrep.conf (5) and .BR PMNS (5). .\" control lines for scripts/man-spell .\" +ok+ limit_filter_force count_scale_force .\" +ok+ space_scale_force time_scale_force ignore_incompat precision_force .\" +ok+ include_labels invert_filter CGHIjLmnrRvV names_change .\" +ok+ limit_filter spark_server live_filter total_bytes count_scale .\" +ok+ space_scale type_prefer metricsets time_scale spark_port .\" +ok+ omit_flat incompat influxdb github .\" +ok+ src