'\" t
.\" Title: bogofilter
.\" Author: [see the "AUTHOR" section]
.\" Generator: DocBook XSL Stylesheets vsnapshot
.\" Date: 05/19/2019
.\" Manual: Bogofilter Reference Manual
.\" Source: Bogofilter
.\" Language: English
.\"
.TH "BOGOFILTER" "1" "05/19/2019" "Bogofilter" "Bogofilter Reference Manual"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
bogofilter \- fast Bayesian spam filter
.SH "SYNOPSIS"
.HP \w'\fBbogofilter\fR\ 'u
\fBbogofilter\fR [help\ options | classification\ options | registration\ options | parameter\ options | info\ options] [general\ options] [config\ file\ options]
.PP
where
.PP
\fBhelp options\fR
are:
.HP \w'\ 'u
[\-h] [\-\-help] [\-V] [\-Q]
.PP
\fBclassification options\fR
are:
.HP \w'\ 'u
[\-p] [\-e] [\-t] [\-T] [\-u] [\-H] [\-M] [\-b] [\-B\ \fIobject\ \&.\&.\&.\fR] [\-R] [general\ options] [parameter\ options] [config\ file\ options]
.PP
\fBregistration options\fR
are:
.HP \w'\ 'u
[\-s | \-n] [\-S | \-N] [general\ options]
.PP
\fBgeneral options\fR
are:
.HP \w'\ 'u
[\-c\ \fIfilename\fR] [\-C] [\-d\ \fIdir\fR] [\-k\ \fIcachesize\fR] [\-l] [\-L\ \fItag\fR] [\-I\ \fIfilename\fR] [\-O\ \fIfilename\fR]
.PP
\fBparameter options\fR
are:
.HP \w'\ 'u
[\-E\ \fIvalue\fR\fI[,value]\fR] [\-m\ \fIvalue\fR\fI[,value]\fR\fI[,value]\fR] [\-o\ \fIvalue\fR\fI[,value]\fR]
.PP
\fBinfo options\fR
are:
.HP \w'\ 'u
[\-v] [\-y\ \fIdate\fR] [\-D] [\-x\ \fIflags\fR]
.PP
\fBconfig file options\fR
are:
.HP \w'\ 'u
[\-\-\fIoption=value\fR]
.PP
Note: Use
\fBbogofilter \-\-help\fR
to display the complete list of options\&.
.SH "DESCRIPTION"
.PP
Bogofilter
is a Bayesian spam filter\&. In its normal mode of operation, it takes an email message or other text on standard input, does a statistical check against lists of "good" and "bad" words, and returns a status code indicating whether or not the message is spam\&.
Bogofilter
is designed with a fast algorithm, uses the Berkeley DB for fast startup and lookups, coded directly in C, and tuned for speed, so it can be used for production by sites that process a lot of mail\&.
.SH "THEORY OF OPERATION"
.PP
Bogofilter
treats its input as a bag of tokens\&. Each token is checked against a wordlist, which maintains counts of the numbers of times it has occurred in non\-spam and spam mails\&. These numbers are used to compute an estimate of the probability that a message in which the token occurs is spam\&. Those are combined to indicate whether the message is spam or ham\&.
.PP
While this method sounds crude compared to the more usual pattern\-matching approach, it turns out to be extremely effective\&. Paul Graham\*(Aqs paper
\m[blue]\fBA Plan For Spam\fR\m[]\&\s-2\u[1]\d\s+2
is recommended reading\&.
.PP
This program substantially improves on Paul\*(Aqs proposal by doing smarter lexical analysis\&.
Bogofilter
does proper MIME decoding and a reasonable HTML parsing\&. Special kinds of tokens like hostnames and IP addresses are retained as recognition features rather than broken up\&. Various kinds of MTA cruft such as dates and message\-IDs are ignored so as not to bloat the wordlist\&. Tokens found in various header fields are marked appropriately\&.
.PP
Another improvement is that this program offers Gary Robinson\*(Aqs suggested modifications to the calculations (see the parameters robx and robs below)\&. These modifications are described in Robinson\*(Aqs paper
\m[blue]\fBSpam Detection\fR\m[]\&\s-2\u[2]\d\s+2\&.
.PP
Since then, Robinson (see his Linux Journal article
\m[blue]\fBA Statistical Approach to the Spam Problem\fR\m[]\&\s-2\u[3]\d\s+2) and others have realized that the calculation can be further optimized using Fisher\*(Aqs method\&.
\m[blue]\fBAnother improvement\fR\m[]\&\s-2\u[4]\d\s+2
compensates for token redundancy by applying separate effective size factors (ESF) to spam and nonspam probability calculations\&.
.PP
In short, this is how it works: The estimates for the spam probabilities of the individual tokens are combined using the "inverse chi\-square function"\&. Its value indicates how badly the null hypothesis that the message is just a random collection of independent words with probabilities given by our previous estimates fails\&. This function is very sensitive to small probabilities (hammish words), but not to high probabilities (spammish words); so the value only indicates strong hammish signs in a message\&. Now using inverse probabilities for the tokens, the same computation is done again, giving an indicator that a message looks strongly spammish\&. Finally, those two indicators are subtracted (and scaled into a 0\-1\-interval)\&. This combined indicator (bogosity) is close to 0 if the signs for a hammish message are stronger than for a spammish message and close to 1 if the situation is the other way round\&. If signs for both are equally strong, the value will be near 0\&.5\&. Since those message don\*(Aqt give a clear indication there is a tristate mode in
bogofilter
to mark those messages as unsure, while the clear messages are marked as spam or ham, respectively\&. In two\-state mode, every message is marked as either spam or ham\&.
.PP
Various parameters influence these calculations, the most important are:
.PP
robx: the score given to a token which has not seen before\&. robx is the probability that the token is spammish\&.
.PP
robs: a weight on robx which moves the probability of a little seen token towards robx\&.
.PP
min\-dev: a minimum distance from \&.5 for tokens to use in the calculation\&. Only tokens farther away from 0\&.5 than this value are used\&.
.PP
spam\-cutoff: messages with scores greater than or equal to will be marked as spam\&.
.PP
ham\-cutoff: If zero or spam\-cutoff, all messages with values strictly below spam\-cutoff are marked as ham, all others as spam (two\-state)\&. Else values less than or equal to ham\-cutoff are marked as ham, messages with values strictly between ham\-cutoff and spam\-cutoff are marked as unsure; the rest as spam (tristate)
.PP
sp\-esf: the effective size factor (ESF) for spam\&.
.PP
ns\-esf: the ESF for nonspam\&. These ESF values default to 1\&.0, which is the same as not using ESF in the calculation\&. Values suitable to a user\*(Aqs email population can be determined with the aid of the
bogotune
program\&.
.SH "OPTIONS"
.PP
HELP OPTIONS
.PP
The
\fB\-h\fR
option prints the help message and exits\&.
.PP
The
\fB\-V\fR
option prints the version number and exits\&.
.PP
The
\fB\-Q\fR
(query) option prints
bogofilter\*(Aqs configuration, i\&.e\&. registration parameters, parsing options,
bogofilter
directory, etc\&.
.PP
CLASSIFICATION OPTIONS
.PP
The
\fB\-p\fR
(passthrough) option outputs the message with an X\-Bogosity line at the end of the message header\&. This requires keeping the entire message in memory when it\*(Aqs read from stdin (or from a pipe or socket)\&. If the message is read from a file that can be rewound,
bogofilter
will read it a second time\&.
.PP
The
\fB\-e\fR
(embed) option tells
bogofilter
to exit with code 0 if the message can be classified, i\&.e\&. if there is not an error\&. Normally
bogofilter
uses different codes for spam, ham, and unsure classifications, but this simplifies using
bogofilter
with
procmail
or
maildrop\&.
.PP
The
\fB\-t\fR
(terse) option tells
bogofilter
to print an abbreviated spamicity message containing 1 letter and the score\&. Spam is indicated with "Y", ham by "N", and unsure by "U"\&. Note: the formatting can be customized using the config file\&.
.PP
The
\fB\-T\fR
provides an invariant terse mode for scripts to use\&.
bogofilter
will print an abbreviated spamicity message containing 1 letter and the score\&. Spam is indicated with "S", ham by "H", and unsure by "U"\&.
.PP
The
\fB\-TT\fR
provides an invariant terse mode for scripts to use\&.
Bogofilter
prints only the score and displays it to 16 significant digits\&.
.PP
The
\fB\-u\fR
option tells
bogofilter
to register the message\*(Aqs text after classifying it as spam or non\-spam\&. A spam message will be registered on the spamlist and a non\-spam message on the goodlist\&. If the classification is "unsure", the message will not be registered\&. Effectively this option runs
bogofilter
with the
\fB\-s\fR
or
\fB\-n\fR
flag, as appropriate\&. Caution is urged in the use of this capability, as any classification errors
bogofilter
may make will be preserved and will accumulate until manually corrected with the
\fB\-Sn\fR
and
\fB\-Ns\fR
option combinations\&. Note this option causes the database to be opened for write access, which can entail massive slowdowns through lock contention and synchronous I/O operations\&.
.PP
The
\fB\-H\fR
option tells
bogofilter
to not tag tokens from the header\&. This option is for testing, you should not use it in normal operation\&.
.PP
The
\fB\-M\fR
option tells
bogofilter
to process its input as a mbox formatted file\&. If the
\fB\-v\fR
or
\fB\-t\fR
option is also given, a spamicity line will be printed for each message\&.
.PP
The
\fB\-b\fR
(streaming bulk mode) option tells
bogofilter
to classify multiple objects whose names are read from stdin\&. If the
\fB\-v\fR
or
\fB\-t\fR
option is also given,
bogofilter
will print a line giving file name and classification information for each file\&. This is an alternative to
\fB\-B\fR
which lists objects on the command line\&.
.PP
An object in this context shall be a maildir (autodetected), or if it\*(Aqs not a maildir, a single mail unless
\fB\-M\fR
is given \- in that case it\*(Aqs processed as mbox\&. (The Content\-Length: header is not taken into account currently\&.)
.PP
When reading mbox format,
bogofilter
relies on the empty line after a mail\&. If needed,
\fBformail \-es\fR
will ensure this is the case\&.
.PP
The
\fB\-B \fR\fB\fIobject \&.\&.\&.\fR\fR
(bulk mode) option tells
bogofilter
to classify multiple objects named on the command line\&. The objects may be filenames (for single messages), mailboxes (files with multiple messages), or directories (of maildir and MH format)\&. If the
\fB\-v\fR
or
\fB\-t\fR
option is also given,
bogofilter
will print a line giving file name and classification information for each file\&. This is an alternative to
\fB\-b\fR
which lists objects on stdin\&.
.PP
The
\fB\-R\fR
option tells
bogofilter
to output an R data frame in text form on the standard output\&. See the section on integration with R, below, for further detail\&.
.PP
REGISTRATION OPTIONS
.PP
The
\fB\-s\fR
option tells
bogofilter
to register the text presented as spam\&. The database is created if absent\&.
.PP
The
\fB\-n\fR
option tells
bogofilter
to register the text presented as non\-spam\&.
.PP
Bogofilter
doesn\*(Aqt detect if a message registered twice\&. If you do this by accident, the token counts will off by 1 from what you really want and the corresponding spam scores will be slightly off\&. Given a large number of tokens and messages in the wordlist, this doesn\*(Aqt matter\&. The problem
\fIcan\fR
be corrected by using the
\fB\-S\fR
option or the
\fB\-N\fR
option\&.
.PP
The
\fB\-S\fR
option tells
bogofilter
to undo a prior registration of the same message as spam\&. If a message was incorrectly entered as spam by
\fB\-s\fR
or
\fB\-u\fR
and you want to remove it and enter it as non\-spam, use
\fB\-Sn\fR\&. If
\fB\-S\fR
is used for a message that wasn\*(Aqt registered as spam, the counts will still be decremented\&.
.PP
The
\fB\-N\fR
option tells
bogofilter
to undo a prior registration of the same message as non\-spam\&. If a message was incorrectly entered as non\-spam by
\fB\-n\fR
or
\fB\-u\fR
and you want to remove it and enter it as spam, then use
\fB\-Ns\fR\&. If
\fB\-N\fR
is used for a message that wasn\*(Aqt registered as non\-spam, the counts will still be decremented\&.
.PP
GENERAL OPTIONS
.PP
The
\fB\-c \fR\fB\fIfilename\fR\fR
option tells
bogofilter
to read the config file named\&.
.PP
The
\fB\-C\fR
option prevents
bogofilter
from reading configuration files\&.
.PP
The
\fB\-d \fR\fB\fIdir\fR\fR
option allows you to set the directory for the database\&. See the ENVIRONMENT section for other directory setting options\&.
.PP
The
\fB\-k \fR\fB\fIcachesize\fR\fR
option sets the cache size for the BerkeleyDB subsystem, in units of 1 MiB (1,048,576 bytes)\&. Properly sizing the cache improves
bogofilter\*(Aqs performance\&. The recommended size is one third of the size of the database file\&. You can run the
bogotune
script (in the tuning directory) to determine the recommended size\&.
.PP
The
\fB\-l\fR
option writes an informational line to the system log each time
bogofilter
is run\&. The information logged depends on how
bogofilter
is run\&.
.PP
The
\fB\-L \fR\fB\fItag\fR\fR
option configures a tag which can be included in the information being logged by the
\fB\-l\fR
option, but it requires a custom format that includes the %l string for now\&. This option implies
\fB\-l\fR\&.
.PP
The
\fB\-I \fR\fB\fIfilename\fR\fR
option tells
bogofilter
to read its input from the specified file, rather than from
\fBstdin\fR\&.
.PP
The
\fB\-O \fR\fB\fIfilename\fR\fR
option tells
bogofilter
where to write its output in passthrough mode\&. Note that this only works when \-p is explicitly given\&.
.PP
PARAMETER OPTIONS
.PP
The
\fB\-E \fR\fB\fIvalue\fR\fI[,value]\fR\fR
option allows setting the sp\-esf value and the ns\-esf value\&. With two values, both sp\-esf and ns\-esf are set\&. If only one value is given, parameters are set as described in the note below\&.
.PP
The
\fB\-m \fR\fB\fIvalue\fR\fI[,value]\fR\fI[,value]\fR\fR
option allows setting the min\-dev value and, optionally, the robs and robx values\&. With three values, min\-dev, robs, and robx are all set\&. If fewer values are given, parameters are set as described in the note below\&.
.PP
The
\fB\-o \fR\fB\fIvalue\fR\fI[,value]\fR\fR
option allows setting the spam\-cutoff ham\-cutoff values\&. With two values, both spam\-cutoff and ham\-cutoff are set\&. If only one value is given, parameters are set as described in the note below\&.
.PP
Note: All of these options allow fewer values to be provided\&. Values can be skipped by using just the comma delimiter, in which case the corresponding parameter(s) won\*(Aqt be changed\&. If only the first value is provided, then only the first parameter is set\&. Trailing values can be skipped, in which case the corresponding parameters won\*(Aqt be changed\&. Within the parameter list, spaces are not allowed after commas\&.
.PP
INFO OPTIONS
.PP
The
\fB\-v\fR
option produces a report to standard output on
bogofilter\*(Aqs analysis of the input\&. Each additional
\fBv\fR
will increase the verbosity of the output, up to a maximum of 4\&. With
\fB\-vv\fR, the report lists the tokens with highest deviation from a mean of 0\&.5 association with spam\&.
.PP
Option
\fB\-y date\fR
can be used to override the current date when timestamping tokens\&. A value of zero (0) turns off timestamping\&.
.PP
The
\fB\-D\fR
option redirects debug output to stdout\&.
.PP
The
\fB\-x \fR\fB\fIflags\fR\fR
option allows setting of debug flags for printing debug information\&. See header file debug\&.h for the list of usable flags\&.
.PP
CONFIG FILE OPTIONS
.PP
Using GNU longopt
\fB\-\-\fR
syntax, a config file\*(Aqs
\fB\fIname=value\fR\fR
statement becomes a command line\*(Aqs
\fB\-\-\fR\fB\fIoption=value\fR\fR\&. Use command
\fBbogofilter \-\-help\fR
for a list of options and see
bogofilter\&.cf\&.example
for more info on them\&. For example to change the X\-Bogosity header to "X\-Spam\-Header", use:
.PP
\fB\fI\-\-spam\-header\-name=X\-Spam\-Header\fR\fR
.SH "ENVIRONMENT"
.PP
Bogofilter
uses a database directory, which can be set in the config file\&. If not set there,
bogofilter
will use the value of
\fBBOGOFILTER_DIR\fR\&. Both can be overridden by the
\fB\-d \fR\fB\fIdir\fR\fR
option\&. If none of that is available,
bogofilter
will use directory
$HOME/\&.bogofilter\&.
.SH "CONFIGURATION"
.PP
The
bogofilter
command line allows setting of many options that determine how
bogofilter
operates\&. File
/etc/bogofilter/bogofilter\&.cf
can be used to set additional parameters that affect its operation\&. File
/etc/bogofilter/bogofilter\&.cf\&.example
has samples of all of the parameters\&. Status and logging messages can be customized for each site\&.
.SH "RETURN VALUES"
.PP
0 for spam; 1 for non\-spam; 2 for unsure ; 3 for I/O or other errors\&.
.PP
If both
\fB\-p\fR
and
\fB\-e\fR
are used, the return values are: 0 for spam or non\-spam; 3 for I/O or other errors\&.
.PP
Error 3 usually means that the wordlist file
bogofilter
wants to read at startup is missing or the hard disk has filled up in
\fB\-p\fR
mode\&.
.SH "INTEGRATION WITH OTHER TOOLS"
.PP
Use with procmail
.PP
The following recipe (a) spam\-bins anything that
bogofilter
rates as spam, (b) registers the words in messages rated as spam as such, and (c) registers the words in messages rated as non\-spam as such\&. With this in place, it will normally only be necessary for the user to intervene (with
\fB\-Ns\fR
or
\fB\-Sn\fR) when
bogofilter
miscategorizes something\&.
.sp
.if n \{\
.RS 4
.\}
.nf
# filter mail through bogofilter, tagging it as Ham, Spam, or Unsure,
# and updating the wordlist
:0fw
| bogofilter \-u \-e \-p
# if bogofilter failed, return the mail to the queue;
# the MTA will retry to deliver it later
# 75 is the value for EX_TEMPFAIL in /usr/include/sysexits\&.h
:0e
{ EXITCODE=75 HOST }
# file the mail to spam\-bogofilter if it\*(Aqs spam\&.
:0:
* ^X\-Bogosity: Spam, tests=bogofilter
spam\-bogofilter
# file the mail to unsure\-bogofilter
# if it\*(Aqs neither ham nor spam\&.
:0:
* ^X\-Bogosity: Unsure, tests=bogofilter
unsure\-bogofilter
# With this recipe, you can train bogofilter starting with an empty
# wordlist\&. Be sure to check your unsure\-folder regularly, take the
# messages out of it, classify them as ham (or spam), and use them to
# train bogofilter\&.
.fi
.if n \{\
.RE
.\}
.PP
The following procmail rule will take mail on stdin and save it to file
spam
if
bogofilter
thinks it\*(Aqs spam:
.sp
.if n \{\
.RS 4
.\}
.nf
:0HB:
* ? bogofilter
spam
.fi
.if n \{\
.RE
.\}
.sp
and this similar rule will also register the tokens in the mail according to the
bogofilter
classification:
.sp
.if n \{\
.RS 4
.\}
.nf
:0HB:
* ? bogofilter \-u
spam
.fi
.if n \{\
.RE
.\}
.PP
If
bogofilter
fails (returning 3) the message will be treated as non\-spam\&.
.PP
This one is for
maildrop, it automatically defers the mail and retries later when the
xfilter
command fails, use this in your
~/\&.mailfilter:
.sp
.if n \{\
.RS 4
.\}
.nf
xfilter "bogofilter \-u \-e \-p"
if (/^X\-Bogosity: Spam, tests=bogofilter/)
{
to "spam\-bogofilter"
}
.fi
.if n \{\
.RE
.\}
.PP
The following
\&.muttrc
lines will create mutt macros for dispatching mail to
bogofilter\&.
.sp
.if n \{\
.RS 4
.\}
.nf
macro index d "unset wait_key\en\e
bogofilter \-n\en\e
set wait_key\en\e
" "delete message as non\-spam"
macro index \eed "unset wait_key\en\e
bogofilter \-s\en\e
set wait_key\en\e
" "delete message as spam"
.fi
.if n \{\
.RE
.\}
.PP
Integration with Mail Transport Agent (MTA)
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 1." 4.2
.\}
bogofilter
can also be integrated into an MTA to filter all incoming mail\&. While the specific implementation is MTA dependent, the general steps are as follows:
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 2." 4.2
.\}
Install
bogofilter
on the mail server
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 3.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 3." 4.2
.\}
Prime the
bogofilter
databases with a spam and non\-spam corpus\&. Since
bogofilter
will be serving a larger community, it is important to prime it with a representative set of messages\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 4.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 4." 4.2
.\}
Set up the MTA to invoke
bogofilter
on each message\&. While this is an MTA specific step, you\*(Aqll probably need to use the
\fB\-p\fR,
\fB\-u\fR, and
\fB\-e\fR
options\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 5.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 5." 4.2
.\}
Set up a mechanism for users to register spam/non\-spam messages, as well as to correct mis\-classifications\&. The most generic solution is to set up alias email addresses to which users bounce messages\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 6.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 6." 4.2
.\}
See the
doc
and
contrib
directories for more information\&.
.RE
.PP
Use of R to verify
bogofilter\*(Aqs calculations
.PP
The \-R option tells
bogofilter
to generate an R data frame\&. The data frame contains one row per token analyzed\&. Each such row contains the token, the sum of its database "good" and "spam" counts, the "good" count divided by the number of non\-spam messages used to create the training database, the "spam" count divided by the spam message count, Robinson\*(Aqs f(w) for the token, the natural logs of (1 \- f(w)) and f(w), and an indicator character (+ if the token\*(Aqs f(w) value exceeded the minimum deviation from 0\&.5, \- if it didn\*(Aqt)\&. There is one additional row at the end of the table that contains a label in the token field, followed by the number of words actually used (the ones with + indicators), Robinson\*(Aqs P, Q, S, s and x values and the minimum deviation\&.
.PP
The R data frame can be saved to a file and later read into an R session (see
\m[blue]\fBthe R project website\fR\m[]\&\s-2\u[5]\d\s+2
for information about the mathematics package R)\&. Provided with the
bogofilter
distribution is a simple R script (file bogo\&.R) that can be used to verify
bogofilter\*(Aqs calculations\&. Instructions for its use are included in the script in the form of comments\&.
.SH "LOG MESSAGES"
.PP
Bogofilter
writes messages to the system log when the
\fB\-l\fR
option is used\&. What is written depends on which other flags are used\&.
.PP
A classification run will generate (we are not showing the date and host part here):
.sp
.if n \{\
.RS 4
.\}
.nf
bogofilter[1412]: X\-Bogosity: Ham, spamicity=0\&.000227
bogofilter[1415]: X\-Bogosity: Spam, spamicity=0\&.998918
.fi
.if n \{\
.RE
.\}
.PP
Using
\fB\-u\fR
to classify a message and update a wordlist will produce (one a single line):
.sp
.if n \{\
.RS 4
.\}
.nf
bogofilter[1426]: X\-Bogosity: Spam, spamicity=0\&.998918,
register \-s, 329 words, 1 messages
.fi
.if n \{\
.RE
.\}
.PP
Registering words (\fB\-l\fR
and
\fB\-s\fR,
\fB\-n\fR,
\fB\-S\fR, or
\fB\-N\fR) will produce:
.sp
.if n \{\
.RS 4
.\}
.nf
bogofilter[1440]: register\-n, 255 words, 1 messages
.fi
.if n \{\
.RE
.\}
.PP
A registration run (using
\fB\-s\fR,
\fB\-n\fR,
\fB\-N\fR, or
\fB\-S\fR) will generate messages like:
.sp
.if n \{\
.RS 4
.\}
.nf
bogofilter[17330]: register\-n, 574 words, 3 messages
bogofilter[6244]: register\-s, 1273 words, 4 messages
.fi
.if n \{\
.RE
.\}
.SH "FILES"
.PP
/etc/bogofilter/bogofilter\&.cf
.RS 4
System configuration file\&.
.RE
.PP
~/\&.bogofilter\&.cf
.RS 4
User configuration file\&.
.RE
.PP
~/\&.bogofilter/wordlist\&.db
.RS 4
Combined list of good and spam tokens\&.
.RE
.SH "AUTHOR"
.sp
.if n \{\
.RS 4
.\}
.nf
Eric S\&. Raymond \&.
David Relson \&.
Matthias Andree \&.
Greg Louis \&.
.fi
.if n \{\
.RE
.\}
.PP
For updates, see the
\m[blue]\fBbogofilter project page\fR\m[]\&\s-2\u[6]\d\s+2\&.
.SH "SEE ALSO"
.PP
bogolexer(1), bogotune(1), bogoupgrade(1), bogoutil(1)
.SH "NOTES"
.IP " 1." 4
A Plan For Spam
.RS 4
\%http://www.paulgraham.com/spam.html
.RE
.IP " 2." 4
Spam Detection
.RS 4
\%http://radio-weblogs.com/0101454/stories/2002/09/16/spamDetection.html
.RE
.IP " 3." 4
A Statistical Approach to the Spam Problem
.RS 4
\%http://www.linuxjournal.com/article/6467
.RE
.IP " 4." 4
Another improvement
.RS 4
\%http://www.garyrobinson.net/2004/04/improved%5fchi.html
.RE
.IP " 5." 4
the R project website
.RS 4
\%http://cran.r-project.org/
.RE
.IP " 6." 4
bogofilter project page
.RS 4
\%http://bogofilter.sourceforge.net/
.RE