Filtering your aq.org mail with procmail

Introduction: What is procmail?

procmail is a package that lets you automatically filter or otherwise process your mail. Applications include automatically segregating mail from particular mailing lists into separate folders, changing the format of incoming mail, eliminating spam or other unwanted mail, eliminating duplicate messages, adding an email-driven interface to other software, generating automated replies (like vacation messages), delivering incoming mail automatically to your home directory, running your own small mailing lists, and so forth.

procmail's particular advantage is that it's very flexible and general. It's really a toolkit for constructing mail filters, rather than a complete tool in itself, so if you want to use all of procmail's features, there's a sizable learning curve. But you can learn the most common procmail idioms very quickly and start filtering your mail right away. (In fact, if you're in a hurry, you may want to skim your way down to the bottom of this document, which shows some example ~/.procmailrc files.)

Sources of further information

Automatic delivery via procmail on aq.org

On aq.org, procmail is automatically run for every incoming mail message. That means that all you need to do to use it is create a .procmailrc file. (At some sites, you need to set up a .forward file to filter your mail through procmail.) However, if you already have a .forward file, that takes precedence.

Overview of procmail's operation

procmail reads a file called .procmailrc in your home directory to determine how it should process incoming messages. That file can contain environment variable assignments (some of the variables are used by procmail itself) and recipes, which are triggered by particular patterns in a mail message and control how the messages that match them are handled.

Each recipe is examined in turn. If a recipe is triggered by the particular message being processed, it gets a chance to try to deliver the message in some way. That could mean piping it through a filter, appending it to a file, dropping it in a special directory, or even sending it to /dev/null or otherwise discarding it. Then the message is considered delivered, and no further recipes are examined. If the recipe didn't apply to this message or delivery to the folder failed (or if the command to deliver the mail failed and you've asked for that to be checked), then the next recipe is examined until something succeeds in delivering the mail or procmail falls off the end of the .procmailrc file.

If procmail hasn't delivered the message by the time it reaches the end of the .procmailrc file, then it is (by default) delivered to your incoming mailbox (/var/spool/mail/username) - the same thing that would happen if procmail weren't involved. (You can override that default action if you choose to.) So it's easy to set procmail up to do something special with certain kinds of mail (from a particular mailing list, say, or messages over a certain size) without affecting other messages at all.
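
As a minimal sketch of that flow (the list address and folder names are made up for illustration): the first recipe below files mail for one list and stops there. Everything else falls through to the second recipe, which has no conditions and so overrides the default by delivering to a folder of your choosing instead of the system mailbox.

# mail for a (hypothetical) mailing list is appended to the folder
# `somelist'; no later recipe sees it
:0:
* ^TOsomelist@
somelist

# no condition lines, so this matches every message that gets this far;
# $HOME/Mail must already exist
:0:
$HOME/Mail/inbox

If you leave out the second recipe, unmatched mail simply lands in your system mailbox as usual.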

The ~/.procmailrc file

The ~/.procmailrc file consists of environment variable assignments (expressed in a syntax that is very close to the syntax of the shell) interspersed with recipes for delivering mail. Assignments and recipes can be mixed together, but typically the variable assignments occur at the top of the file, so we'll discuss them first.

Comments in a .procmailrc file can be indicated with hash marks, as in Perl and shell scripts. To be safe, you should only do this at the beginning of a line. (In some cases it doesn't work at the end of a line.)
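
A small sketch of the safe and the risky placement (the folder name is just an example):

# Safe: a comment on a line of its own
MAILDIR=$HOME/Mail

# Risky: on a condition line, the text after the `*' is taken as the
# regular expression, so a trailing comment such as
#   * ^Subject:.*test   # test messages
# would become part of the pattern instead of being ignored.
:0:
* ^Subject:.*test
testfolder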

Variable assignments

An environment variable assignment consists of a variable name, an equal sign, and a value. You can assign to arbitrary environment variables, but a number are special to procmail, and those are the ones you typically set. The syntax is very complete - you can include backticks and environment variable references in the value, just as in the shell.
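
For instance, the three kinds of value might look like this (TODAY is an arbitrary variable name chosen for illustration, not one procmail treats specially):

# a plain value built from an existing variable
MAILDIR=$HOME/Mail

# a reference to a variable set earlier in the file
LOGFILE=$MAILDIR/procmail.log

# the output of a command, via backticks
TODAY=`date +%Y%m%d`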

As the procmailrc(5) man page states, `Before you get lost in the multitude of environment variables, keep in mind that all of them have reasonable defaults.' Here are a few of the commonly-set ones:

MAILDIR
    Meaning: The current directory for procmail; most conveniently, where you store most of your mail folders.
    Default: $HOME
ORGMAIL
    Meaning: The normal place where your mail would be delivered by the system, in the absence of procmail.
    Default: /var/spool/mail/you
DEFAULT
    Meaning: Where procmail will deliver mail if no recipe matches (and succeeds); i.e., if it falls off the end of your .procmailrc file.
    Default: $ORGMAIL
PATH
    Meaning: As you would expect. You need to add non-OS directories like /arch/unix/bin if you want them.
    Default: $HOME/bin:/bin:/usr/bin (pretty minimal)
LOGFILE
    Meaning: File to write diagnostics to. (See the man page.)
    Default: (unset)
LOGABSTRACT
    Meaning: Controls the summary of what procmail did with each message. The summary is written to $LOGFILE rather than to a file of its own. Very useful. Set it to `no' to suppress the summary, or to `all' to get one for every delivering recipe.
    Default: summaries are written whenever LOGFILE is set
UMASK
    Meaning: As with the shell's umask command. Can be set between recipes if you want some but not all of your mail to be publicly readable. Usually left alone.
    Default: 077 (make everything private)

(Typically you only need to set PATH and DEFAULT.)
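
As a sketch of that typical setup (the inbox path is only a placeholder, and /arch/unix/bin is the site-specific directory mentioned under PATH above):

# let recipes find formail and other tools outside the OS directories
PATH=$HOME/bin:/bin:/usr/bin:/arch/unix/bin

# unmatched mail goes here instead of the system mailbox;
# $HOME/Mail must already exist
DEFAULT=$HOME/Mail/inbox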

procmail recipes

A recipe starts with a magic line that usually looks like
  :0:
but can have additional flags before the second colon. (The second colon itself can be missing if no locking is required. Usually it needs to be there.)

I said above that if a recipe succeeds, later recipes aren't considered. Sometimes you want to do something special with messages that match a certain pattern, but then you still want them to be affected by the rest of the .procmailrc file. You can do this by adding a `c' flag after the `0', so that the line reads `:0c:'. Other flags you can add include `B' (to apply conditions - see below - to the body rather than the header), `D' to make pattern matches case-sensitive, `f' to filter the message in-place, and a number of others. See the procmailrc(5) man page for the full story.
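
Here is a rough illustration of three of those flags. The folder names and patterns are invented for the example.

# `c': save a copy in `archive', then carry on with later recipes
:0c:
archive

# `B': test the condition against the body instead of the header
:0B:
* barbeque
food

# `D': case-sensitive match (matches URGENT but not urgent)
:0D:
* ^Subject:.*URGENT
shouting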

After the magic :0: line, there are one or more condition lines, which start with asterisks (*). These typically contain regular expressions (à la egrep) which are checked against the headers of the mail message. If they all match, then the recipe is actually triggered - its action will be applied to the current message. (There's a way to grep for the regular expressions in the body rather than the header, and you can also test the value of environment variables or the result of arbitrary Unix commands. So you could take a certain action for all mail messages received on weekends that contain the word `barbeque' in the body - maybe forward them to your alphanumeric pager so you don't miss the barbeque.) You can negate a condition by preceding it with an exclamation mark (!).
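
The weekend barbeque case could be sketched roughly as follows. This assumes your date command supports the %u format (1 for Monday through 7 for Sunday). DAY is an arbitrary variable name, and the pager address is a placeholder.

# day of the week as a single digit: 6 is Saturday, 7 is Sunday
DAY=`date +%u`

# B: match `barbeque' in the body; c: only forward a copy
:0 Bc
* barbeque
* DAY ?? ^^[67]
! mypager@mypagercompany.com

The `DAY ??' prefix tells procmail to match the expression against the value of that variable rather than against the message, and `^^' anchors the match to the start of the value.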

There are some special macros you can use in condition lines. The most important one is `^TO', which causes the following expression to match any recipient of the message. So `* ^TOjs@' will match any message where `js@' appears as part of a recipient address, whether on the To:, Cc:, or Bcc: line.

Here are some examples of condition lines:

Condition
    Matches...

* ^TOpostmaster\>
    messages sent to postmaster. (The \> matches any non-alphanumeric character; it is commonly used as a word break.)
* ^Subject:.*laser printer toner cart
    messages with the indicated text anywhere in the subject
* ^From:.*\
    certain bounced mail
* !Received:.*by amber\.ccs\.neu\.edu
    messages that did not (`!') pass through CCS' mail server
* Subject:.*urgent
    messages with `urgent' anywhere in the subject

There are some other conditions you can check for besides regular-expression matches. Here are some examples:

Condition
    Matches...

* > 10240
    messages larger than 10k (including headers)
* ? grep 'gone until' $HOME/.plan
    whenever `gone until' appears in my .plan file. This test is independent of the message itself.

Again, you can see the procmailrc(5) man page for full details, but the examples above cover the normal cases.
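
Conditions of different kinds can be combined in a single recipe, and all of them must hold for the recipe to trigger. A sketch, with an invented folder name:

# messages over 10k that are not addressed to postmaster
:0:
* > 10240
* !^TOpostmaster\>
large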

After the condition lines (all of which start with `*'), there is exactly one action line, which specifies what to do with the mail message. An action line can have one of the following forms:
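
The forms used by the examples in this document are sketched here. The folder names and addresses are only placeholders, and procmailrc(5) describes the complete set.

# a file name: append the message to that mailbox (with locking)
:0:
* ^TOsun-managers@
sunmgrs

# a directory followed by `/.': store the message as a new file
# in an MH folder
:0
* ^TO(alpha-osf|tru64-unix)-managers@
alphamgrs/.

# `!' and one or more addresses: forward the message
:0
* ^Subject:.*urgent
! me@myhomeisp.com

# `|' and a command: feed the message to the command's standard input
:0w
* ^TOjay\>
| /usr/lib/mh/rcvstore +inbox

# /dev/null: throw the message away
:0
* ^Subject:.*laser printer toner cart
/dev/null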

If you're delivering to a mailbox, procmail will consider the message undelivered (and therefore continue trying further recipes) if there's some sort of write error, such as running out of space or permission problems. If you're delivering to a pipe, by default procmail will send the message to the pipe, consider it delivered, and exit without waiting for the command to complete. If you want to handle possible errors, you can add the `w' (wait) flag to the `:0' line, and then procmail will wait for the command to complete and look at its exit status. If the command fails, procmail will consider the message undelivered and look at the next recipe in your ~/.procmailrc file.
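
A sketch of a pipe delivery with a fallback (the folder name `fallback' is made up; the rcvstore path is the one used in the MH example later in this document):

# hand the message to a command, wait for it, and check its exit status
:0w
| /usr/lib/mh/rcvstore +inbox

# only reached if the command above failed
:0:
fallback

Without the `w' flag, the second recipe would never be reached, because procmail would consider the message delivered as soon as it had been written to the pipe.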

Other utilities that come with procmail

There are some other tools that come with the procmail distribution.

The most important one is probably formail, which parses and manipulates mail messages. One use is to add, delete, or change particular headers. Another is to generate automated reply mail - formail is commonly used in a `vacation' recipe.
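
Two common formail idioms, as a sketch. They assume formail is on your PATH. The cache and lock file names follow the duplicate-suppression example in procmailex(5), and the X-Filtered header is invented for illustration.

# drop duplicates: formail -D keeps a cache of Message-IDs and reports
# success (so the message counts as delivered) if it has seen this one
:0 Wh: msgid.lock
| formail -D 8192 msgid.cache

# add a header to every message, filtering the header in place
:0 fhw
| formail -A "X-Filtered: by procmail"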

Another one is lockfile, which creates procmail-compatible lock files. It's useful for writing scripts that work with your mail: if you're careful, they can coordinate with each other and with procmail so that they don't step on each other's toes.

Some example .procmailrcs

Set up a `safety net'

Mistakes in a .procmailrc file can cause you to lose all your incoming mail! Because of that, it's a good idea to have a `safety net' recipe at the top of your .procmailrc file whenever you make any changes. That will store an independent copy of all your incoming mail in a file somewhere before the rest of your .procmailrc gets at it. When you're confident your whole .procmailrc is working right, you can get rid of (or comment out) the safety net and delete the file it's been writing to. But if there's a problem in the .procmailrc (after the safety net), a copy of all your mail has been saved.
# $HOME/Mail *should already exist*
MAILDIR=$HOME/Mail

# not setting DEFAULT or ORGMAIL, so mail that doesn't match will
# be left in system mailbox

LOGFILE=$MAILDIR/from

# safety net
:0c:
$HOME/tmp_mail

# ... your own recipes would go down here ...

Filter out some spam

These recipes recognize the headers of a few common spam messages and file them in a spam folder.
# /arch/unix/bin is necessary for `formail'
PATH=/bin:/usr/bin:/usr/ucb:/arch/unix/bin
# $HOME/Mail *should already exist*
MAILDIR=$HOME/Mail

# not setting DEFAULT or ORGMAIL, so mail that doesn't match will
# be left in system mailbox

LOGFILE=$MAILDIR/from

# # safety net (commented out)
# :0c:
# $HOME/tmp_mail

# Recognize some common spam and save it to $HOME/Mail/spam
# $HOME/Mail must already exist; spam will be created if necessary
:0:
* To:.*friend@public\>
* !^TO.*ccs.neu.edu
spam

:0:
* Subject:[ 	]*laser p[re]inter toner advertisement
spam

:0:
* Subject:.*FREE 1 *y(ea)?r USA magazine sub
* !^TO.*ccs.neu.edu
spam

Handle problems delivering mail

This .procmailrc will try to deliver to the file incoming in my home directory first (I'd read my mail with something like `pine -f ~/incoming'). If that fails, it tries to deliver mail to the system-wide mailbox ($ORGMAIL). If that fails, it saves (or tries to) a copy in /tmp and also sends a copy to an off-site address. So this is an extremely paranoid .procmailrc. :-)

Some points about this file:

# store mailboxes in my home directory by default (atypical)
MAILDIR=$HOME
LOGFILE=$MAILDIR/log

:0:
incoming

:0:
$ORGMAIL

:0c:
/tmp/$LOGNAME.mbox

:0
! me@me.ne.mediaone.net

Deliver to an MH inbox (method 1)

This .procmailrc doesn't update the MH unseen sequence. Some points:
# $HOME/Mail *should already exist*.  In this case we're assuming
# it's your MH directory.
MAILDIR=$HOME/Mail
LOGFILE=$MAILDIR/from

# # safety net (commented out)
# :0c:
# $HOME/tmp_mail

:0
inbox/.

Deliver to an MH inbox (method 2)

This .procmailrc does update the MH unseen sequence, because it pipes the message into MH's normal command for receiving mail.
# $HOME/Mail *should already exist*.  In this case we're assuming
# it's your MH directory.
MAILDIR=$HOME/Mail
LOGFILE=$MAILDIR/from

# If you use EXMH, you might want to uncomment the following line
#MHCONTEXT=.exmhcontext	# so EXMH sees the unseen sequence

# # safety net (commented out)
# :0c:
# $HOME/tmp_mail

:0w
|/usr/lib/mh/rcvstore +inbox

Deliver list mail to a pine/elm/Mail mailbox; handle urgent mail specially; leave most mail in inbox

This is a more complicated and realistic example.
# $HOME/Mail *should already exist*.
MAILDIR=$HOME/Mail
LOGFILE=$HOME/procmail.log

# # safety net (commented out)
# :0c:
# $HOME/tmp_mail

##################################################################
# mailing lists

:0:
* !^TOjay\>
* ^TO(alpha-osf|tru64-unix)-managers@
alphamgrs

:0:
* !^TOjay\>
* ^TOsun-managers@
sunmgrs

:0:
* !^TOjay\>
* Sender:.*BUGTRAQ@(netspace.org|securityfocus.com)
bugtraq

# If mail has "urgent" in the subject, send a *copy* to my pager
# and my home address.

:0c
* Subject.*urgent
! mypager@mypagercompany.com, me@myhomeisp.com

A vacation filter

You can use procmail to implement a vacation auto-responder (something that automatically sends mail back saying you're on vacation and telling senders when you'll get to their mail). I'm not going to copy the recipe here, because (1) it uses fancy features (like formail and chaining recipes onto each other) that we haven't discussed, and (2) it's in the procmailex(5) manual page. However, here are a few points to think about when constructing autoresponders: The vacation recipe in the procmailex(5) man page handles all these issues properly.
Last modified 1999.12.14 by js.