URLWATCH-JOBS(5)                   urlwatch                   URLWATCH-JOBS(5)

NAME
       urlwatch-jobs - Job types and configuration for urlwatch

SYNOPSIS
       urlwatch --edit

DESCRIPTION
       Jobs are the kind of things that urlwatch(1) can monitor.

       The list of jobs to run is stored in the configuration file
       urls.yaml, which can be edited with the command urlwatch --edit. Jobs
       are separated from each other by a line containing only ---. The
       command urlwatch --list prints the name of each job along with its
       index number (1, 2, 3, ...), which is assigned automatically
       according to the job's position in the configuration file.

       While optional, it is recommended that each job starts with a name
       entry:

          name: "This is a human-readable name/label of the job"
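
       As a minimal sketch of how several jobs sit together in urls.yaml,
       the following file holds two jobs separated by a --- line. The url
       and command keys are described in the sections below. With this
       file, urlwatch --list would report the first job as index 1 and the
       second as index 2:

          name: "urlwatch homepage"
          url: "https://thp.io/2008/urlwatch/"
          ---
          name: "What is in my Home Directory?"
          command: "ls -al ~"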

       The following job types are available:

URL
       This is the main job type -- it retrieves a document from a web server:

          name: "urlwatch homepage"
          url: "https://thp.io/2008/urlwatch/"

       Required keys:

       o url: The URL to the document to watch for changes

       Job-specific optional keys:

       o cookies: Cookies to send with the request (see Advanced Topics)

       o method: HTTP method to use (default: GET)

       o data: HTTP POST/PUT data

       o ssl_no_verify: Do not verify SSL certificates (true/false)

       o ignore_cached: Do not use cache control (ETag/Last-Modified) values
         (true/false)

       o http_proxy: Proxy server to use for HTTP requests

       o https_proxy: Proxy server to use for HTTPS requests

       o headers: HTTP headers to send along with the request

       o encoding: Override the character encoding from the server (see
         Advanced Topics)

       o timeout: Override the default socket timeout (see Advanced Topics)

       o ignore_connection_errors: Ignore (temporary) connection errors (see
         Advanced Topics)

       o ignore_http_error_codes: List of HTTP errors to ignore (see Advanced
         Topics)

       o ignore_timeout_errors: Do not report errors when the timeout is hit

       o ignore_too_many_redirects: Ignore redirect loops (see Advanced
         Topics)

       (Note: url implies kind: url)
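
       The optional keys above are written alongside url in the same job.
       The following is an illustrative sketch only: the endpoint, the
       request body and the header are made-up values, and it assumes that
       data takes a string, headers takes a key/value mapping and timeout
       takes a number of seconds (see Advanced Topics for the exact
       formats):

          name: "Example status endpoint"
          url: "https://example.org/api/status"
          method: "POST"
          data: "query=status"
          headers:
            Accept: "application/json"
          timeout: 30
          ignore_cached: true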

BROWSER
       This job type is a resource-intensive variant of "URL" to handle web
       pages that require JavaScript to render the content being monitored.

       The optional playwright package must be installed in order to run
       Browser jobs (see Dependencies). You will also need to install the
       browsers using playwright install (see Playwright Installation
       <https://playwright.dev/python/docs/intro> for details).

          name: "A page with JavaScript"
          navigate: "https://example.org/"

       Required keys:

       o navigate: URL to navigate to with the browser

       Job-specific optional keys:

       o wait_until: Either load, domcontentloaded, networkidle, or commit
         (see Advanced Topics)

       o useragent: User-Agent header used for requests (otherwise browser
         default is used)

       o browser: One of chromium, chrome, chrome-beta, msedge, msedge-beta,
         msedge-dev, firefox, or webkit (must be installed with playwright
         install)

       Because this job uses Playwright <https://playwright.dev/python/> to
       render the page in a headless browser instance, it uses massively more
       resources than a "URL" job. Use it only on pages where url does not
       return the correct results. In many cases, a "Browser" job can be
       avoided: if the page calls an API as it loads and the output of that
       API contains the information you are looking for, you can monitor
       the API directly with the much faster "URL" job type.

       (Note: navigate implies kind: browser)
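
       As a sketch, a "Browser" job that combines the optional keys listed
       above might look as follows. The values networkidle and firefox are
       taken from the lists above and chosen only for illustration.
       Whether they suit a given page is an assumption to verify:

          name: "A page with JavaScript"
          navigate: "https://example.org/"
          wait_until: networkidle
          browser: firefox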

SHELL
       This job type allows you to watch the output of arbitrary shell
       commands, which is useful for e.g. monitoring an FTP upload folder,
       the output of scripts that query external devices (RPi GPIO), etc.

          name: "What is in my Home Directory?"
          command: "ls -al ~"

       Required keys:

       o command: The shell command to execute

       Job-specific optional keys:

       o stderr: Change how standard error is treated (see below)

       (Note: command implies kind: shell)

   Configuring stderr behavior for shell jobs
       By default, urlwatch captures stderr and uses it for error reporting
       when the shell job exits with a non-zero exit code, but ignores the
       output when the shell job exits with exit code 0.

       This behavior can be customized using the stderr key:

       o ignore: Capture stderr, report on non-zero exit code, ignore
         otherwise (default)

       o urlwatch: stderr of the shell job is sent to stderr of the urlwatch
         process; any error message on stderr will not be visible in the error
         message from the reporter (legacy default behavior of urlwatch 2.24
         and older)

       o fail: Treat the job as failed if there is any output on stderr, even
         with exit status 0

       o stdout: Merge stderr output into stdout, which means stderr output is
         also considered for the change detection/diff part of urlwatch (this
         is similar to 2>&1 in a shell)

       For example, this job definition will make the job appear as failed,
       even though the script exits with exit code 0:

          command: |
            echo "Normal standard output."
            echo "Something goes to stderr, which makes this job fail." 1>&2
            exit 0
          stderr: fail

       On the other hand, if you want to diff both stdout and stderr of the
       job, use this:

          command: |
            echo "An important line on stdout."
            echo "Another important line on stderr." 1>&2
          stderr: stdout

OPTIONAL KEYS FOR ALL JOB TYPES

       o name: Human-readable name/label of the job

       o filter: Filters (if any) to apply to the output (can be tested with
         --test-filter)

       o max_tries: Number of times to retry fetching the resource

       o diff_tool: Command of a custom tool for generating diff text

       o diff_filter: Filters (if any) to apply to the diff result (can be
         tested with --test-diff-filter)

       o treat_new_as_changed: Treat jobs that don't have any historic data
         as CHANGED instead of NEW (and create a diff for new jobs)

       o compared_versions: Number of versions to compare for similarity

       o kind (redundant): Either url, shell or browser. Automatically
         derived from the unique key (url, command or navigate) of the job
         type

       o user_visible_url: Different URL to show in reports (e.g. when watched
         URL is a REST API URL, and you want to show a webpage)
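
       These keys can be combined with the keys of any job type. The
       following sketch uses hypothetical URLs. It assumes that filter
       takes a list of filter names and that html2text is among the
       filters described in urlwatch-filters(5):

          name: "Example release list"
          url: "https://example.org/api/releases"
          user_visible_url: "https://example.org/releases"
          filter:
            - html2text
          max_tries: 3
          treat_new_as_changed: true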

SETTING KEYS FOR ALL JOBS AT ONCE
       The main configuration file has a job_defaults key that can be used to
       configure keys for all jobs at once.

       See urlwatch-config(5) for how to configure job defaults.
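
       As a rough sketch, and assuming that job_defaults is subdivided into
       the sections all, url, shell and browser (check urlwatch-config(5)
       for the authoritative layout), defaults might be set like this. The
       values shown are illustrative:

          job_defaults:
            all: {}
            url:
              timeout: 30
            shell:
              stderr: fail
            browser: {}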

EXAMPLES
       See urlwatch-cookbook(7) for example job configurations.

FILES
       $XDG_CONFIG_HOME/urlwatch/urls.yaml

SEE ALSO
       urlwatch(1), urlwatch-intro(5), urlwatch-filters(5)

COPYRIGHT
       2023 Thomas Perl

                                  May 3, 2023                 URLWATCH-JOBS(5)