Filtering your aq.org mail with procmail
Introduction: What is procmail?
procmail is a package that lets you automatically filter or otherwise
process your mail. Applications include automatically segregating
mail from particular mailing lists into separate folders, changing
the format of incoming mail, eliminating spam or other unwanted
mail, eliminating duplicate messages, adding an email-driven interface
to other software, generating automated replies (like vacation
messages), delivering incoming mail automatically to your home
directory, running your own small mailing lists, and so forth.
procmail's particular advantage is that it's very flexible and general.
It's really a toolkit for constructing mail filters, rather
than a complete tool in itself, so if you want to use all of
procmail's features, there's a sizable learning curve. But you can
learn the most common procmail idioms very quickly and start filtering
your mail right away. (In fact, if you're in a hurry, you may want to
skim your way down to the bottom of this document, which shows some
example ~/.procmailrc files.)
Sources of further information
- http://www.procmail.org/ is the home of procmail (and related software).
- There's a great procmail Frequently Asked Questions document at
http://www.ling.helsinki.fi/users/reriksso/procmail/mini-faq.html.
(It bills itself as a `Mini-FAQ', but it's actually more of a Mega-FAQ.)
- The procmailrc(5) manual page describes the syntax of the
~/.procmailrc file that controls how mail is processed.
- The procmailex(5) manual page has short example recipes and
demonstrates a lot of the features of procmail.
- The procmail(1) manual page documents the procmail binary itself
(e.g. command-line arguments), and the NOTES section at the very end
has a small sample ~/.procmailrc file set up to deliver most mail to
~/Mail/mbox in standard Unix mailbox format (as used by elm and pine).
Automatic delivery via procmail on aq.org
On aq.org, procmail is automatically run for every incoming mail
message. That means that all you need to do to use it is create a
.procmailrc file. (At some sites, you need to set up a .forward file
to filter your mail through procmail.) However, if you already have a
.forward file, that takes precedence.
Overview of procmail's operation
procmail reads a file called .procmailrc in your home directory to
determine how it should process incoming messages. That file can
contain environment variable assignments (and some of the environment
variables are used by procmail) and recipes which are triggered by
particular patterns in a mail message and control how pieces of mail
that match them are handled.
Each recipe is examined in turn. If a recipe is triggered by
the particular message being processed, it gets a chance to try
to deliver the message in some way. That could mean piping
it through a filter, appending it to a file, dropping it in a
special directory, or even sending it to /dev/null or otherwise
discarding it. Then the message is considered delivered, and no
further recipes are examined. If the recipe didn't apply to this
message or delivery to the folder failed (or if the command to deliver
the mail failed and you've asked for that to be checked), then the
next recipe is examined until something succeeds in delivering the
mail or procmail falls off the end of the .procmailrc file.
If procmail hasn't delivered the message by the time it reaches the
end of the .procmailrc file, then it is (by default) delivered to your
incoming mailbox (/var/spool/mail/username) - the same thing that
would happen if procmail weren't involved. (You can override that
default action if you choose to.) So it's easy to set procmail up to
do something special with certain kinds of mail (from a particular
mailing list, say, or messages over a certain size) without affecting
other messages at all.
The ~/.procmailrc file
The ~/.procmailrc file consists of environment variable assignments
(expressed in a syntax that is very close to the syntax of the shell)
interspersed with recipes for delivering mail. Assignments and recipes
can be mixed together, but typically the variable assignments occur at
the top of the file, so we'll discuss them first.
Comments in a .procmailrc file can be indicated with hash marks, as in
Perl and shell scripts. To be safe, you should only do this at the
beginning of a line. (In some cases it doesn't work at the end of a
line.)
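Here's a small sketch of safe comment placement. The folder name and
the subject pattern are made up for illustration. As far as I know,
condition lines are the main place where a trailing comment goes wrong:
everything after the `*' is treated as part of the pattern.

```procmail
# A comment on a line of its own is always safe.
MAILDIR=$HOME/Mail

# So is a comment directly above a recipe.
:0:
* ^Subject:.*test message
testfolder
# Don't put a comment at the end of the condition line above;
# it would become part of the regular expression.
```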
Variable assignments
An environment variable assignment consists of a variable name,
an equal sign, and a value. You can assign to arbitrary environment
variables, but a number are special to procmail, and those are the
ones you typically set. The syntax is very complete - you can include
backticks and environment variable references in the value, just as in
the shell.
As the procmailrc(5) man page states, `Before you get lost in the
multitude of environment variables, keep in mind that all of them have
reasonable defaults.' Here are a few of the commonly-set ones:
Variable | Meaning | Default |
MAILDIR | The current directory for procmail; most conveniently, where you store most of your mail folders. | $HOME |
ORGMAIL | The normal place where your mail would be delivered by the system, in the absence of procmail. | /var/spool/mail/you |
DEFAULT | Where procmail will deliver mail if no recipe matches (and succeeds); i.e., if it falls off the end of your .procmailrc file. | $ORGMAIL |
PATH | As you would expect. You need to add non-OS directories like /arch/unix/bin if you want them. | $HOME/bin:/bin:/usr/bin (pretty minimal) |
LOGFILE | File to write diagnostics to. (See the man page.) | (unset) |
LOGABSTRACT | Controls whether procmail writes a summary of what it did with each message to $LOGFILE. Set it to `no' to suppress the summary, or to `all' to get one for every recipe that delivers. Very useful. | (unset; the summary is written whenever LOGFILE is set) |
UMASK | As with the shell's umask command. Can be set between recipes if you want some but not all of your mail to be publicly readable. Usually left alone. | 077 (make everything private) |
(Typically you set PATH and DEFAULT; that's all that's necessary.)
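As a rough sketch, the top of a .procmailrc that sets the variables
above might look like this. The folder names (inbox, procmail.log) are
placeholders of mine, not anything aq.org requires.

```procmail
# Programs that recipes run are looked up along this path.
PATH=$HOME/bin:/bin:/usr/bin

# Relative folder names in recipes are resolved against this directory.
# It must already exist.
MAILDIR=$HOME/Mail

# Mail that no recipe delivers ends up here instead of in $ORGMAIL.
DEFAULT=$MAILDIR/inbox

# Diagnostics (and the per-message summary) are appended to this file.
LOGFILE=$MAILDIR/procmail.log
```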
procmail recipes
A recipe starts with a magic line that usually looks like
:0:
but can have additional flags before the second colon. (The
second colon itself can be missing
if no locking is required. Usually it needs to be there.)
I said above that if a recipe succeeds, later recipes aren't considered.
Sometimes you want to do something special with messages that
match a certain pattern, but then you still want them to be affected
by the rest of the .procmailrc file. You can do this by adding a `c'
flag after the `0', so that the line reads `:0c:'. Other flags you can
add include `B' (to apply conditions - see below - to the body rather
than the header), `D' (to make pattern matches case-sensitive), `f'
(to filter the message in place), and a number of others. See the
procmailrc(5) man page for the full story.
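To make the flags a little more concrete, here is a sketch of two
recipes. The address, the phrase, and the folder names are all
invented for the example.

```procmail
# `c' flag: save a copy, then let later recipes see the message too.
:0c:
* ^From:.*boss@example\.com
boss-archive

# `B' flag: look for the pattern in the body instead of the headers.
:0B:
* unsubscribe instructions
bulk
```

With the first recipe, a message from that address is appended to
boss-archive and then carries on down the file; with the second,
delivery to bulk ends processing as usual.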
After the magic :0: line, there are one or more condition lines, which
start with asterisks (*). These typically contain regular expressions
(à la egrep) which are checked against the headers of the mail message.
If they all match, then the recipe is actually triggered - its action
will be applied to the current message. (There's a way to grep for
the regular expressions in the body rather than the header, and
you can also test the value of environment variables or the result
of arbitrary Unix commands. So you could take a certain action
for all mail messages received on weekends that contain the word
`barbeque' in the body - maybe forward them to your alphanumeric
pager so you don't miss the barbeque.) You can negate a condition by
preceding it with an exclamation mark (!).
There are some special macros you can use in condition lines.
The most important one is `^TO', which causes the following expression
to match any recipient of the message. So `* ^TOjs@' will match any
message where `js@' appears as part of a recipient address, whether on
the To:, Cc:, or Bcc: line.
Here are some examples of condition lines:
Condition | Matches... |
* ^TOpostmaster\> | messages sent to postmaster. (The \> matches any non-alphanumeric character; commonly used for a word break) |
* ^Subject:.*laser printer toner cart | messages with the indicated text anywhere in the subject |
* ^From:.*\ | certain bounced mail |
* !Received:.*by amber\.ccs\.neu\.edu | messages that did not (`!') pass through CCS' mail server |
* Subject:.*urgent | messages with `urgent' anywhere in the subject |
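Since all the condition lines of a recipe have to match, you can
combine a positive test with a negated one. A minimal sketch, with a
made-up folder name:

```procmail
# File mail addressed to postmaster, unless the subject says it's a test.
:0:
* ^TOpostmaster\>
* !^Subject:.*test
postmaster-mail
```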
There are some other conditions you can check for besides
regular-expression matches. Here are some examples:
Condition | Matches... |
* > 10240 | messages larger than 10 KB (including headers) |
* ? grep 'gone until' $HOME/.plan | whenever `gone until' appears in my .plan file. This test is independent of the message itself. |
Again, you can see the procmailrc(5) man page for full details, but
the examples above cover the normal cases.
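The weekend barbeque idea from earlier could be sketched roughly like
this. It assumes your system's date command understands +%u (day of
the week, 1 for Monday through 7 for Sunday), and the pager address is
a placeholder.

```procmail
# Forward a copy of weekend mail that mentions a barbeque in the body.
# `B' checks the body; `c' lets the message continue to later recipes.
:0Bc
* barbeque
* ? test `date +%u` -ge 6
! mypager@example.com
```

The `?' condition succeeds when the command exits with status 0, which
here means Saturday or Sunday.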
After the condition lines (all of which start with `*'), there is
exactly one action line, which specifies what to do with the mail
message. An action line can have one of the following forms:
- A mailbox pathname, referring to
  - an MH folder (a directory with numbered messages within it),
    if it ends with `/.' (slash and dot), or
  - a directory of uniquely-named messages, if it ends with just
    a `/', or
  - a mail folder in mbox format (as used by Elm, Pine, Berkeley Mail,
    Netscape, and many other mail programs) otherwise.
  So procmail can deliver directly to the folders used by almost all
  mail readers. If you use the `/.' form to deliver to an MH folder,
  procmail does not update MH's unseen sequence (i.e., it doesn't mark
  the mail as unread).
- An exclamation mark (!), followed by an email address (or addresses)
  to forward the mail to.
- A vertical bar (|), followed by a program to pipe the message
  through. This can be an arbitrarily complex command; it can be a
  pipeline and can have backticks in it.
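Here is one sketch recipe for each kind of action line. The list
names, addresses, and file names are invented; only the shape of the
action line matters.

```procmail
# mbox folder: append to a file, with a lock (the second colon).
:0:
* ^TOsome-list@
somelist

# MH folder: a directory of numbered files, so no lock is needed.
:0
* ^TOother-list@
otherlist/.

# Forward the message to another address.
:0
* ^Subject:.*forward me
! someone@example.com

# Pipe the message through a command; the second colon asks procmail
# to lock based on the file named after `>>'.
:0:
* ^Subject:.*count me
| wc -l >> $HOME/linecount
```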
If you're delivering to a mailbox, procmail will consider the message
undelivered (and therefore continue trying further recipes) if there's
some sort of write error, such as running out of space or permission
problems. If you're delivering to a pipe, by default procmail will
send the message to the pipe, consider it delivered, and exit without
waiting for the command to complete. If you want to handle possible
errors, you can add the `w' (wait) flag to the `:0' line, and then
procmail will wait for the command to complete and look at its exit
status. If the command fails, procmail will consider the message
undelivered and look at the next recipe in your ~/.procmailrc file.
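A sketch of the `w' flag in use, assuming a hypothetical script
$HOME/bin/my-filter that exits non-zero when it can't handle a message:

```procmail
# Wait for the command and check its exit status.
:0w
| $HOME/bin/my-filter

# Only reached if my-filter failed: save the message in a folder.
:0:
fallback
```

Without the `w', the first recipe would always count as a successful
delivery and the fallback would never be tried.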
Other utilities that come with procmail
There are some other tools that come with the procmail distribution.
The most important one is probably formail, which parses and
manipulates mail messages. One use is to add, delete, or change
particular headers. Another is to generate automated reply mail -
formail is commonly used in a `vacation' recipe.
Another one is lockfile, which creates procmail-compatible lock files.
It's useful for writing scripts to work with your mail; that way (if
you're careful) they can coordinate with each other and with procmail
so that they don't step on each other's toes.
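As one illustration of formail at work, this is essentially the
duplicate-elimination recipe from the procmailex(5) man page. The
cache file name and the 8192-byte cache size are the values used in
that example; it assumes formail can be found on your PATH.

```procmail
# Drop messages whose Message-ID has been seen recently.
# `h' feeds only the header to formail; `W' waits for its exit status;
# msgid.lock keeps two deliveries from updating the cache at once.
:0 Wh: msgid.lock
| formail -D 8192 msgid.cache
```

When formail finds the Message-ID already in the cache it reports
success, so procmail treats the duplicate as delivered and stops.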
Some example .procmailrcs
Set up a `safety net'
Mistakes in a .procmailrc file can cause you to lose all your incoming
mail! Because of that, it's a good idea to have a `safety net' recipe
at the top of your .procmailrc file whenever you make any changes.
That will store an independent copy of all your incoming mail in a
file somewhere before the rest of your .procmailrc gets at it. When
you're confident your whole .procmailrc is working right, you can get
rid of (or comment out) the safety net and delete the file it's been
writing to. But if there's a problem in the .procmailrc (after the
safety net), a copy of all your mail has been saved.
- This recipe uses the `c' flag, because it saves a copy of each
message. The rest of the .procmailrc still gets to process the message.
# $HOME/Mail *should already exist*
MAILDIR=$HOME/Mail
# not setting DEFAULT or ORGMAIL, so mail that doesn't match will
# be left in system mailbox
LOGFILE=$MAILDIR/from
# safety net
:0c:
$HOME/tmp_mail
# ... your own recipes would go down here ...
Filter out some spam
This recipe just recognizes the headers of a few common spam messages
and files them in a spam folder.
- The `* !^TO.*ccs.neu.edu' line in a couple of these recipes makes
the recipe fail to match if a piece of mail is addressed to a CCS
email address. That helps avoid `false positives', e.g. if somebody
sends out mail to faculty@ccs.neu.edu or systems@ccs.neu.edu
complaining about one of these pieces of spam, you might want to see
that. (These examples were written for my workplace; of course, here
you'd want to use `* !^TO.*aq.org' instead, if anything. You might
just leave that line off.)
# /arch/unix/bin is necessary for `formail'
PATH=/bin:/usr/bin:/usr/ucb:/arch/unix/bin
# $HOME/Mail *should already exist*
MAILDIR=$HOME/Mail
# not setting DEFAULT or ORGMAIL, so mail that doesn't match will
# be left in system mailbox
LOGFILE=$MAILDIR/from
# # safety net (commented out)
# :0c:
# $HOME/tmp_mail
# Recognize some common spam and save it to $HOME/Mail/spam
# $HOME/Mail must already exist; spam will be created if necessary
:0:
* To:.*friend@public\>
* !^TO.*ccs.neu.edu
spam
:0:
* Subject:[ ]*laser p[re]inter toner advertisement
spam
:0:
* Subject:.*FREE 1 *y(ea)?r USA magazine sub
* !^TO.*ccs.neu.edu
spam
Handle problems delivering mail
This .procmailrc will try to deliver to the file incoming in my home
directory first (I'd read my mail with something like
`pine -f ~/incoming'). If that fails, it tries to deliver mail to the
system-wide mailbox ($ORGMAIL). If that fails, it saves (or tries to
save) a copy in /tmp and also sends a copy to an off-site address. So
this is an extremely paranoid .procmailrc. :-)
Some points about this file:
- $LOGNAME is my login name.
- The recipe that saves to /tmp has the c flag on its :0c: line; that
means it's saving a copy of the mail, but processing should continue
with the following recipe. That way, if we get that far in the
.procmailrc file, a copy will be stored in /tmp (which gets wiped out
from time to time, so it's not really safe there) and also the next
recipe will be processed, which sends the mail to another address.
- There are no condition lines - each recipe applies to
all incoming mail. That means that processing will stop with
the first (non-copy) recipe that succeeds.
# store mailboxes in my home directory by default (atypical)
MAILDIR=$HOME
LOGFILE=$MAILDIR/log
:0:
incoming
:0:
$ORGMAIL
:0c:
/tmp/$LOGNAME.mbox
:0
! me@me.ne.mediaone.net
Deliver to an MH inbox (method 1)
This .procmailrc doesn't update the MH unseen sequence. Some points:
- When you're delivering to a directory (with an action line that ends
in / or /.), you don't need a lockfile; hence `:0' rather than the
more normal `:0:'.
- In inbox/., the `/.' means to treat inbox as an MH mail folder (a
directory containing numbered message files).
# $HOME/Mail *should already exist*. In this case we're assuming
# it's your MH directory.
MAILDIR=$HOME/Mail
LOGFILE=$MAILDIR/from
# # safety net (commented out)
# :0c:
# $HOME/tmp_mail
:0
inbox/.
Deliver to an MH inbox (method 2)
This .procmailrc does update the MH unseen sequence, because it pipes
the message into MH's normal command for receiving mail.
- When you're delivering to a pipe, you don't need a lockfile,
but if you want procmail to be able to tell whether the delivery
succeeded, you should add the `w' flag.
# $HOME/Mail *should already exist*. In this case we're assuming
# it's your MH directory.
MAILDIR=$HOME/Mail
LOGFILE=$MAILDIR/from
# If you use EXMH, you might want to uncomment the following line
#MHCONTEXT=.exmhcontext # so EXMH sees the unseen sequence
# # safety net (commented out)
# :0c:
# $HOME/tmp_mail
:0w
|/usr/lib/mh/rcvstore +inbox
Deliver list mail to a pine/elm/Mail mailbox; handle urgent
mail specially; leave most mail in inbox
This is a more complicated and realistic example.
- $HOME/mail is the Pine convention for where your mail folders are
stored. If you use Elm, this would probably be $HOME/Mail. If you used
MH, you'd probably use $HOME/Mail and also tack on `/.' to all the
folder names (or use rcvstore).
- The `* !^TOjay\>' trick helps ensure that when I'm Cc:'ed on a
message that also goes to the list, it goes into my regular mailbox.
(You might want to change that.)
- Food for thought: What happens if you're filtering two lists,
and the same message is sent to both of them (so they both appear
in the To: line)? What should happen?
# $HOME/Mail *should already exist*.
MAILDIR=$HOME/Mail
LOGFILE=$HOME/procmail.log
# # safety net (commented out)
# :0c:
# $HOME/tmp_mail
##################################################################
# mailing lists
:0:
* !^TOjay\>
* ^TO(alpha-osf|tru64-unix)-managers@
alphamgrs
:0:
* !^TOjay\>
* ^TOsun-managers@
sunmgrs
:0:
* !^TOjay\>
* Sender:.*BUGTRAQ@(netspace.org|securityfocus.com)
bugtraq
# If mail has "urgent" in the subject, send a *copy* to my pager
# and my home address.
:0c
* ^Subject:.*urgent
! mypager@mypagercompany.com, me@myhomeisp.com
A vacation filter
You can use procmail to implement a vacation auto-responder (something
that automatically sends mail back saying you're on vacation and
telling senders when you'll get to their mail). I'm not going to copy
the recipe here, because (1) it uses fancy features (like formail and
chaining recipes onto each other) that we haven't discussed, and (2)
it's in the procmailex(5) manual page. However, here are a few points
to think about when constructing autoresponders:
- You should never autoreply to automated mail (e.g. list mail,
bounce messages, etc.)
- You definitely want the `c' flag so that the mail that triggers the
vacation recipe will still be delivered normally.
- You want to avoid mail loops, where your vacation bounce somehow
gets bounced back to you (and triggers another vacation bounce).
Postmasters tend to get upset when that happens.
- It's courteous to autoreply only once, or only once every week
or so, to each address you get mail from. If somebody sends
you five pieces of mail for you to read when you get back, they
don't really need five copies of your automated reply.
The vacation recipe in the procmailex(5) man page handles all these
issues properly.
Last modified 1999.12.14 by js.