Spam Filtering for Mail Exchangers

How to reject junk mail in incoming SMTP transactions.

Tor Slettnes

<tor@slett.net>

Edited by

Joost De Cock

Devdas Bhagat

Tom Wright

Version 1.0 -- Release Edition

Table of Contents
Introduction

  1. Purpose of this Document
  2. Audience
  3. New versions of this document
  4. Revision History
  5. Credits
  6. Feedback
  7. Translations
  8. Copyright information
  9. What do you need?
  10. Conventions used in this document
  11. Organization of this document
1. Background
  1.1. Why Filter Mail During the SMTP Transaction?
  1.2. The Good, The Bad, The Ugly
  1.3. The SMTP Transaction
2. Techniques
  2.1. SMTP Transaction Delays
  2.2. DNS Checks
  2.3. SMTP checks
  2.4. Greylisting
  2.5. Sender Authorization Schemes
  2.6. Message data checks
  2.7. Blocking Collateral Spam
3. Considerations
  3.1. Multiple Incoming Mail Exchangers
  3.2. Blocking Access to Other SMTP Servers
  3.3. Forwarded Mail
  3.4. User Settings and Data
4. Questions & Answers
A. Exim Implementation
  A.1. Prerequisites
  A.2. The Exim Configuration File
  A.3. Options and Settings
  A.4. Building the ACLs - First Pass
  A.5. Adding SMTP transaction delays
  A.6. Adding Greylisting Support
  A.7. Adding SPF Checks
  A.8. Adding MIME and Filetype Checks
  A.9. Adding Anti-Virus Software
  A.10. Adding SpamAssassin
  A.11. Adding Envelope Sender Signatures
  A.12. Accept Bounces Only for Real Users
  A.13. Exempting Forwarded Mail
  A.14. Final ACLs
Glossary
B. GNU General Public License
  B.1. Preamble
  B.2. TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
  B.3. How to Apply These Terms to Your New Programs

List of Tables
1. Typographic and usage conventions
1-1. Simple SMTP dialogue
A-1. Use of ACL connection/message variables


Introduction
  1. Purpose of this Document

This document discusses various highly effective and low impact ways to weed out spam and malware during incoming SMTP transactions in a mail exchanger (MX host), with an added emphasis on eliminating so-called Collateral Spam.

The discussions are conceptual in nature, but a sample implementation is provided using the Exim MTA and other specific software tools. Miscellaneous other bigotry is expressed throughout.


2. Audience

The intended audience is mail system administrators, who are already familiar with such acronyms as SMTP, MTA/MDA/MUA, DNS/rDNS, and MX records. If you are an end user who is looking for a spam filtering solution for your mail reader (such as Evolution, Thunderbird, Mail.app or Outlook Express), this document is not for you; but you may wish to point the mail system administrator for your domain (company, school, ISP...) to its existence.


3. New versions of this document

The newest version of this document can be found at http://slett.net/spam-filtering-for-mx/. Please check back periodically for corrections and additions.


4. Revision History

Revision 1.0    2004-09-08    Revised by: TS
  First public release.

Revision 0.18    2004-09-07    Revised by: TS
  Incorporated second language review from Tom Wright.

Revision 0.17    2004-09-06    Revised by: TS
  Incorporated language review from Tom Wright.

Revision 0.16    2004-08-13    Revised by: TS
  Incorporated third round of changes from Devdas Bhagat.

Revision 0.15    2004-08-04    Revised by: TS
  Incorporated second round of changes from technical review by Devdas Bhagat.

Revision 0.14    2004-08-01    Revised by: TS
  Incorporated technical review comments/corrections from Devdas Bhagat.

Revision 0.13    2004-08-01    Revised by: TS
  Incorporated technical review from Joost De Cock.

Revision 0.12    2004-07-27    Revised by: TS
  Replaced "A Note on Controversies" with a more opinionated "The Good, The Bad, the Ugly" section. Also rewrote text on DNS blocklists. Some corrections from Seymour J. Metz.

Revision 0.11    2004-07-19    Revised by: TS
  Incorporated comments from Rick Stewart on RMX++. Swapped order of "Techniques" and "Considerations". Minor typographic fixes in Exim implementation.

Revision 0.10    2004-07-16    Revised by: TS
  Added <?dbhtml..?> tags to control generated HTML filenames - should prevent broken links from google etc. Swapped order of "Forwarded Mail" and "User Settings". Correction from Tony Finch on Bayesian filters; commented out check for Subject:, Date:, and Message-ID: headers per Johannes Berg; processing time subtracted from SMTP delays per suggestion from Alan Flavell.

Revision 0.09    2004-07-13    Revised by: TS
  Elaborated on problems with envelope sender signatures and mailing list servers, and a scheme to make such signatures optional per host/domain for each user. Moved "Considerations" section out as a separate chapter; added subsections "Blocking Access to other SMTP Server", "User Settings" and "Forwarded Mail". Incorporated Matthew Byng-Maddick's comments on the mechanism used to generate these signatures, Chris Edwards' comments on sender callout verification, and Hadmut Danisch's comments on RMX++ and other topics. Changed license terms (GPL instead of GFDL).

Revision 0.08    2004-07-09    Revised by: TS
  Additional work on Exim implementation: Added section on per-user settings and data for SpamAssassin per suggestion from Tollef Fog Heen. Added SPF checks via Exiscan-ACL. Corrections from Sam Michaels.

Revision 0.07    2004-07-08    Revised by: TS
  Made corrections to the Exim Envelope Sender Signatures examples, and added support for users to "opt in" to this feature, per suggestion from Christian Balzer.

Revision 0.06    2004-07-08    Revised by: TS
  Incorporated Exim/MySQL greylisting implementation and various corrections from Johannes Berg. Moved "Sender Authorization Schemes" up two levels to become a top-level section in the Techniques chapter. Added greylisting for NULL empty envelope senders after DATA. Added SpamAssassin configuration to match Exim examples. Incorporated corrections from Dominik Ruff, Mark Valites, "Andrew" at Supernews.

Revision 0.05    2004-07-07    Revised by: TS
  Eliminated the (empty) Sendmail implementation for now, to move ahead with the final review process.

Revision 0.04    2004-07-06    Revised by: TS
  Reorganized layout a little: Combined "SMTP-Time Filtering", "Introduction to SMTP", and "Considerations" into a single "Background" chapter. Split the previous "Building ACLs" section in the Exim implementation into top-level sections. Added alternate sender authorization schemes to SPF: Microsoft Caller-ID for E-Mail and RMX++. Incorporated comments from Ken Raeburn.

Revision 0.03    2004-07-02    Revised by: TS
  Added discussion on Multiple Incoming Mail Exchangers; minor corrections related to Sender Callout Verification.

Revision 0.02    2004-06-30    Revised by: TS
  Added Exim implementation as an appendix.

Revision 0.01    2004-06-16    Revised by: TS
  Initial draft.


5. Credits

A number of people have provided feedback, corrections, and contributions, as indicated in the Revision History. Thank you!

The following are some of the people and groups that have provided tools and ideas to this document, in no particular order:

  *  Evan Harris <eharris (at) puremagic.com>, who conceived and wrote a white paper on greylisting.

  *  Axel Zinser <fifi (at) hiss.org>, who apparently conceived of teergrubing.

  *  The developers of [http://spf.pobox.com/] SPF, [http://www.danisch.de/work/security/antispam.html] RMX++, and other Sender Authorization Schemes.

  *  The creators and maintainers of distributed, collaborative junk mail signature repositories, such as [http://rhyolite.com/anti-spam/dcc/] DCC, [http://razor.sf.net/] Razor, and [http://pyzor.sf.net/] Pyzor.

  *  The creators and maintainers of various DNS blocklists and whitelists, such as [http://www.spamcop.net/] SpamCop, [http://www.spamhaus.org/] SpamHaus, [http://www.sorbs.net/] SORBS, [http://cbl.abuseat.org/] CBL, and [http://moensted.dk/spam/] many others.

  *  The [http://www.spamassassin.org/full/3.0.x/dist/CREDITS] developers of [http://www.spamassassin.org/] SpamAssassin, who have taken giant leaps forward in developing and integrating various spam filtering techniques into a sophisticated heuristics-based tool.

  *  Tim Jackson <tim (at) timj.co.uk>, who collated and maintains a list of bogus virus warnings for use with SpamAssassin.

  *  A lot of smart people who developed the excellent Exim MTA, including: Philip Hazel <ph10 (at) cus.cam.ac.uk>, the maintainer; Tom Kistner <tom (at) duncanthrax.net>, who wrote the Exiscan-ACL patch for SMTP-time content checks; Andreas Metzler <ametzler (at) debian.org>, who did a really good job of building the Exim 4 Debian packages.

  *  Many, many others who contributed ideas, software, and other techniques to counter the spam epidemic.

  *  You, for reading this document and your interest in reclaiming e-mail as a useful communication tool.


6. Feedback

I would love to hear of your experiences with the techniques outlined in this document, and of any other comments, questions, suggestions, and/or contributions you may have. Please send me an e-mail at: <tor@slett.net>.

If you are able to provide implementations for other Mail Transport Agents, such as Sendmail or Postfix, please let me know.


7. Translations

No translations exist yet. If you would like to create one, please let me know.


8. Copyright information

Copyright © 2004 Tor Slettnes.

This document is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. A copy of the license is included in Appendix B.

Read The GNU Manifesto if you want to know why this license was chosen for this book.

The logos, trademarks and symbols used in this book are the properties of their respective owners.


9. What do you need?

The techniques described in this document presuppose system access to the inbound Mail Exchanger(s) for the internet domain where you receive e-mail. Essentially, you need to be able to install software and/or modify the configuration files for the Mail Transport Agent on that system.

Although the discussions in this document are conceptual in nature and can be incorporated into a number of different MTAs, a sample Exim 4 implementation is provided. This implementation, in turn, incorporates other software tools, such as [http://www.spamassassin.org/] SpamAssassin. See Appendix A for details.


10. Conventions used in this document

The following typographic and usage conventions occur in this text:

Table 1. Typographic and usage conventions

|Text type                |Meaning                                          |
+-------------------------+-------------------------------------------------+
|"Quoted text"            |Quotes from people, quoted computer output.      |
+-------------------------+-------------------------------------------------+
|terminal view            |Literal computer input and output captured from  |
|                         |the terminal, usually rendered with a light grey |
|                         |background.                                      |
+-------------------------+-------------------------------------------------+
|command                  |Name of a command that can be entered on the     |
|                         |command line.                                    |
+-------------------------+-------------------------------------------------+
|VARIABLE                 |Name of a variable or pointer to content of a    |
|                         |variable, as in $VARNAME.                        |
+-------------------------+-------------------------------------------------+
|option                   |Option to a command, as in "the -a option to the |
|                         |ls command".                                     |
+-------------------------+-------------------------------------------------+
|argument                 |Argument to a command, as in "read man ls ".     |
+-------------------------+-------------------------------------------------+
|command options arguments|Command synopsis or general usage, on a separated|
|                         |line.                                            |
+-------------------------+-------------------------------------------------+
|filename                 |Name of a file or directory, for example "Change |
|                         |to the /usr/bin directory."                      |
+-------------------------+-------------------------------------------------+
|Key                      |Keys to hit on the keyboard, such as "type Q to  |
|                         |quit".                                           |
+-------------------------+-------------------------------------------------+
|Button                   |Graphical button to click, like the OK button.   |
+-------------------------+-------------------------------------------------+
|Menu->Choice             |Choice to select from a graphical menu, for      |
|                         |instance: "Select Help->About Mozilla in your    |
|                         |browser."                                        |
+-------------------------+-------------------------------------------------+
|Terminology              |Important term or concept: "The Linux kernel is  |
|                         |the heart of the system."                        |
+-------------------------+-------------------------------------------------+
|See Glossary             |link to related subject within this guide.       |
+-------------------------+-------------------------------------------------+
|[http://slett.net/gallery|Clickable link to an external web resource.      |
|/2003-05/IMG_1655] The   |                                                 |
|author                   |                                                 |
+-------------------------+-------------------------------------------------+


11. Organization of this document

This document is organized into the following chapters:

Background

General introduction to SMTP time filtering, and to SMTP.

Techniques

Various ways to block junk mail in an SMTP transaction.

Considerations

Issues that pertain to transaction time filtering.

Questions & Answers

My attempt at anticipating your questions, and then answering them.

A sample Exim implementation is provided in Appendix A.


Chapter 1. Background

Here we cover the advantages of filtering mail during an incoming SMTP transaction, rather than following the more conventional approach of offloading this task to the mail routing and delivery stage. We also provide a brief introduction to the SMTP transaction.


1.1. Why Filter Mail During the SMTP Transaction?

1.1.1. Status Quo

If you receive spam, raise your hands. Keep them up.

If you receive computer viruses or other malware, raise your hands too.

If you receive bogus Delivery Status Notifications (DSNs), such as "Message Undeliverable", "Virus found", "Please confirm delivery", etc., related to messages you never sent, raise your hands as well. This is known as Collateral Spam.

This last form is particularly troublesome, because it is harder to weed out than "standard" spam or malware, and because such messages can be quite confusing to recipients who do not possess godly skills in parsing message headers. In the case of virus warnings, this often causes unnecessary concern on the recipient's end; more generally, a common tendency will be to ignore all such messages, thereby missing out on legitimate DSNs.

Finally, I want those of you who have lost legitimate mail into a big black hole - due to misclassification by spam or virus scanners - to lift your feet.

If you were standing before and are still standing, I suggest that you may not be fully aware of what is happening to your mail. If you have been doing any type of spam filtering, even by manually moving mails to the trash can in your mail reader, let alone by experimenting with primitive filtering techniques such as DNS blacklists (SpamHaus, SPEWS, SORBS...), chances are that you have lost some valid mail.


1.1.2. The Cause

Spam, just like many other artifacts of greed, is a social disease. Call it affluenza, or whatever you like; lower life forms seek to destroy a larger ecosystem, and if successful, will actually end up ruining their own habitat.

Larger social issues and philosophy aside: You - the mail system administrator - face the very concrete and real life dilemma of finding a way to deal with all this junk.

As it turns out, there are some limitations with the conventional way that mail is being processed and delegated by the various components of mail transport and delivery software. In a traditional setup, one or more Mail Exchanger(s) accept most or all incoming mail deliveries to addresses within a domain. Often, they then forward the mail to one or more internal machines for further processing, and/or delivery to the user's mailboxes. If any of these servers discovers that it is unable to perform the requested delivery or function, it generates and returns a DSN back to the sender address in the original mail.

As organizations started deploying spam and virus scanners, they often found that the path of least resistance was to work these into the message delivery path, as mail is transferred from the incoming Mail Exchanger(s) to internal delivery hosts and/or software. For instance, a common way to filter out spam is to route the mail through SpamAssassin or other software before it is delivered to a user's mailbox, and/or to rely on spam filtering capabilities in the user's Mail User Agent.

Options for dealing with mail that is classified as spam or virus at this point are limited:

  *  You can return a Delivery Status Notification back to the sender. The problem is that nearly all spam and e-mail borne viruses are delivered with faked sender addresses. If you return this mail, it will invariably go to innocent third parties -- perhaps warning a grandmother in Sweden, who uses Mac OS X and does not know much about computers, that she is infected by the Blaster worm. In other words, you will be generating Collateral Spam.

  *  You can drop the message into the bit bucket, without sending any notification back to the sender. This is an even bigger problem in the case of False Positives, because neither the sender nor the receiver will ever know what happened to the message (or in the receiver's case, that it ever existed).

  *  Depending on how your users access their mail (for instance, if they access it via the IMAP protocol or use a web-based mail reader, but not if they retrieve it over POP3), you may be able to file it into a separate junk folder for them -- perhaps as an option in their account settings.

     This may be the best of these three options. Even so, the messages may remain unseen for some time, or simply be overlooked as the receiver more-or-less periodically scans through and deletes mail in their "Junk" folder.


1.1.3. The Solution

As you will have guessed by now, the One True solution to this problem is to do spam and virus filtering during the SMTP dialogue from the remote host, as the mail is being received by the inbound mail exchanger for your domain. This way, if the mail turns out to be undesirable, you can issue an SMTP reject response rather than face the dilemma described above. As a result:

  *  You will be able to stop the delivery of most junk mail early in the SMTP transaction, before the actual message data has been received, thus saving you both network bandwidth and CPU processing.

  *  You will be able to deploy some spam filtering techniques that are not possible later, such as SMTP transaction delays and Greylisting.

  *  You will be able to notify the sender in case of a delivery failure (e.g. due to an invalid recipient address) without directly generating Collateral Spam.

     We will discuss how you can avoid causing collateral spam indirectly as a result of rejecting mail forwarded from trusted sources, such as mailing list servers or mail accounts on other sites [1].

  *  You will be able to protect yourself against collateral spam from others (such as bogus "You have a virus" messages from anti-virus software).

OK, you can lower your hands now. If you were standing, and your feet disappeared from under you, you can now also stand up again.


1.2. The Good, The Bad, The Ugly

Some filtering techniques are more suitable for use during the SMTP transaction than others. Some are simply better than others. Nearly all have their proponents and opponents.

Needless to say, these controversies extend to the methods described here as well. For instance:

  *  Some argue that DNS checks penalize individual mail senders purely based on their Internet Service Provider (ISP), not on the merits of their particular message.

  *  Some point out that ratware traps like SMTP transaction delays and Greylisting are easily overcome and will be less effective over time, while continuing to degrade the Quality of Service for legitimate mail.

  *  Some find that Sender Authorization Schemes like the Sender Policy Framework give ISPs a way to lock their customers in, and do not adequately address users who roam between different networks or who forward their e-mail from one host to another.

I will steer away from most of these controversies. Instead, I will try to provide a functional description of the various techniques available, including their possible side effects, and then talk a little about my own experiences using some of them.

That said, there are some filtering methods in use today that I deliberately omit from this document:

  *  Challenge/response systems (like [http://tmda.net/] TMDA). These are not suitable for SMTP time filtering, as they rely on first accepting the mail, then returning a confirmation request to the Envelope Sender. This technique is therefore outside the scope of this document. [2]

  *  Bayesian Filters. These require training specific to a particular user, and/or a particular language. As such, these too are not normally suitable for use during the SMTP transaction (but see User Settings and Data).

  *  Micropayment Schemes are not really suitable for weeding out junk mail until all the world's legitimate mail is sent with a virtual postage stamp. (In the meantime, though, they can be used for the opposite purpose - that is, to accept mail carrying the stamp that would otherwise be rejected.)

Generally, I have attempted to offer techniques that are as precise as possible, and to go to great lengths to avoid False Positives. People's e-mail is important to them, and they spend time and effort writing it. In my view, willfully using techniques or tools that reject large amounts of legitimate mail is a show of disrespect, both to the people that are directly affected and to the Internet as a whole. [3] This is especially true for SMTP-time system wide filtering, because end recipients usually have little or no control over the criteria being used to filter their mail.


1.3. The SMTP Transaction

SMTP is the protocol that is used for mail delivery on the Internet. For a detailed description of the protocol, please refer to RFC 2821, as well as Dave Crocker's introduction to [http://www.brandenburg.com/specifications/draft-crocker-mail-arch-00.htm] Internet Mail Architecture.

Mail deliveries involve an SMTP transaction between the connecting host (client) and the receiving host (server). For this discussion, the connecting host is the peer, and the receiving host is your server.

In a typical SMTP transaction, the client issues SMTP commands such as EHLO, MAIL FROM:, RCPT TO:, and DATA. Your server responds to each command with a 3-digit numeric code indicating whether the command was accepted (2xx), was subject to a temporary failure or restriction (4xx), or failed definitively/permanently (5xx), followed by some human readable explanation. A full description of these codes is included in [http://www.ietf.org/rfc/rfc2821.txt] RFC 2821.
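As a small illustration of these reply classes, the following Python sketch sorts a reply line by the first digit of its code. The function name classify_reply is made up for this example and is not part of any MTA; the 3xx class (such as the 354 reply to DATA) is included because RFC 2821 defines it as a positive intermediate reply.

```python
def classify_reply(line: str) -> str:
    """Classify an SMTP reply line by the first digit of its 3-digit code."""
    code = line[:3]
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"not an SMTP reply line: {line!r}")
    classes = {
        "2": "accepted",
        "3": "intermediate",       # e.g. 354 in response to DATA
        "4": "temporary failure",  # the client should retry later
        "5": "permanent failure",  # the client should give up
    }
    if code[0] not in classes:
        raise ValueError(f"unknown reply class: {code}")
    return classes[code[0]]


if __name__ == "__main__":
    for reply in ("250 OK", "451 Try again later", "550 No such user"):
        print(reply, "->", classify_reply(reply))
```

The distinction between 4xx and 5xx matters for everything that follows: a 4xx response makes a well-behaved client queue the message and retry, whereas a 5xx response makes it return the message to the sender.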

A best case scenario SMTP transaction typically consists of the following relevant steps:

Table 1-1. Simple SMTP dialogue

|Client                               |Server                               |
+-------------------------------------+-------------------------------------+
|Initiates a TCP connection to server.|Presents an SMTP banner - that is, a |
|                                     |greeting that starts with the code   |
|                                     |220 to indicate that it is ready to  |
|                                     |speak SMTP (or usually ESMTP, a      |
|                                     |superset of SMTP):                   |
|                                     |220 your.f.q.d.n ESMTP...            |
+-------------------------------------+-------------------------------------+
|Introduces itself by way of a Hello  |Accepts this greeting with a 250     |
|command, either HELO (now obsolete)  |response. If the client used the     |
|or EHLO, followed by its own Fully   |extended version of the Hello command|
|Qualified Domain Name:               |(EHLO), your server knows that it is |
|EHLO peers.f.q.d.n                   |capable of handling multi-line       |
|                                     |responses, and so will normally send |
|                                     |back several lines indicating the    |
|                                     |capabilities offered by your server: |
|                                     |250-your.f.q.d.n Hello ...           |
|                                     |250-SIZE 52428800                    |
|                                     |250-8BITMIME                         |
|                                     |250-PIPELINING                       |
|                                     |250-STARTTLS                         |
|                                     |250-AUTH                             |
|                                     |250 HELP                             |
|                                     |                                     |
|                                     |If the PIPELINING capability is      |
|                                     |included in this response, the client|
|                                     |can from this point forward issue    |
|                                     |several commands at once, without    |
|                                     |waiting for the response to each one.|
+-------------------------------------+-------------------------------------+
|Starts a new mail transaction by     |Issues a 250 response to indicate    |
|specifying the Envelope Sender:      |that the sender is accepted.         |
|MAIL FROM:<sender@address>           |                                     |
+-------------------------------------+-------------------------------------+
|Lists the Envelope Recipients of the |Issues a response to each command    |
|message, one at a time, using the    |(2xx, 4xx, or 5xx, depending on      |
|command:                             |whether delivery to this recipient   |
|RCPT TO:<receiver@address>           |was accepted, subject to a temporary |
|                                     |failure, or rejected).               |
+-------------------------------------+-------------------------------------+
|Issues a DATA command to indicate    |Responds 354 to indicate that the    |
|that it is ready to send the message.|command has been provisionally       |
|                                     |accepted.                            |
+-------------------------------------+-------------------------------------+
|Transmits the message, starting with |Replies 250 to indicate that the     |
|RFC 2822 compliant header lines (such|message has been accepted.           |
|as: From:, To:, Subject:, Date:,     |                                     |
|Message-ID:). The header and the body|                                     |
|are separated by an empty line. To   |                                     |
|indicate the end of the message, the |                                     |
|client sends a single period (".") on|                                     |
|a separate line.                     |                                     |
+-------------------------------------+-------------------------------------+
|If there are more messages to be     |Disconnects.                         |
|delivered, issues the next MAIL FROM:|                                     |
|command. Otherwise, it says QUIT, or |                                     |
|in rare cases, simply disconnects.   |                                     |
+-------------------------------------+-------------------------------------+
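The server side of this dialogue can be sketched as a toy state machine in Python. This is only an illustration of the command ordering shown in the table, not how any real MTA is written: the class name ToySmtpSession is hypothetical, every well-ordered command is accepted without any checks, and the message body itself is not handled. Insisting on a greeting before MAIL FROM: is a policy choice of this sketch.

```python
class ToySmtpSession:
    """Tracks command order for one connection; performs no filtering."""

    def __init__(self) -> None:
        self.greeted = False
        self.sender = None      # envelope sender, once MAIL FROM: is seen
        self.recipients = []    # envelope recipients accepted so far

    def handle(self, line: str) -> str:
        """Return the reply line for one client command."""
        text = line.strip()
        verb = text.upper()
        if verb.startswith(("EHLO ", "HELO ")):
            self.greeted = True
            self.sender, self.recipients = None, []
            return "250 Hello"
        if verb.startswith("MAIL FROM:"):
            if not self.greeted:
                return "503 Say EHLO first"
            self.sender = text[len("MAIL FROM:"):].strip()
            self.recipients = []
            return "250 Sender accepted"
        if verb.startswith("RCPT TO:"):
            if self.sender is None:
                return "503 Need MAIL FROM: first"
            self.recipients.append(text[len("RCPT TO:"):].strip())
            return "250 Recipient accepted"
        if verb == "DATA":
            if not self.recipients:
                return "503 Need RCPT TO: first"
            return "354 End data with <CR><LF>.<CR><LF>"
        if verb == "QUIT":
            return "221 Bye"
        return "500 Command not recognized"


if __name__ == "__main__":
    session = ToySmtpSession()
    for command in ("EHLO peers.f.q.d.n", "MAIL FROM:<sender@address>",
                    "RCPT TO:<receiver@address>", "DATA", "QUIT"):
        print("C:", command)
        print("S:", session.handle(command))
```

Each branch of handle() is a point where a real mail exchanger can apply the checks discussed in the next chapter and answer with a 4xx or 5xx code instead of 250.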


Chapter 2. Techniques

In this chapter, we look at various ways to weed out junk mail during the SMTP transaction from remote hosts. We will also try to anticipate some of the side effects from deploying these techniques.


2.1. SMTP Transaction Delays

As it turns out, one of the more effective ways of stopping spam is by imposing transaction delays during an inbound SMTP dialogue. This is a primitive form of teergrubing; see http://www.iks-jena.de/mitarb/lutz/usenet/teergrube.en.html

Most spam and nearly all e-mail borne viruses are delivered directly to your server by way of specialized SMTP client software, optimized for sending out large amounts of mail in a very short time. Such clients are commonly known as Ratware.

In order to accomplish this task, ratware authors commonly take a few shortcuts that, ahem, "diverge" a bit from the RFC 2821 specification. One of the intrinsic traits of ratware is that it is notoriously impatient, especially with slow-responding mail servers. It may issue the HELO or EHLO command before the server has presented the initial SMTP banner, and/or try to pipeline several SMTP commands before the server has advertised the PIPELINING capability.

Certain Mail Transport Agents (such as Exim) automatically treat such SMTP protocol violations as synchronization errors, and immediately drop the incoming connection. If you happen to be using such an MTA, you may already see a lot of entries to this effect in your log files. In fact, chances are that if you perform any time-consuming checks (such as DNS checks) prior to presenting the initial SMTP banner, such errors will occur frequently, as ratware clients simply do not take the time to wait for your server to come alive (Things to do, people to spam).
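To make the idea of a synchronization error concrete, here is a minimal Python sketch of the check at the banner stage: wait before greeting, and treat any input that arrives during the wait as a protocol violation. It is not how Exim implements this; the function name greet_with_delay and the banner text are assumptions made for the example.

```python
import select
import socket


def greet_with_delay(conn: socket.socket, banner: bytes, delay: float) -> bool:
    """Wait `delay` seconds, then send the SMTP banner.

    Returns True if the banner was sent. Returns False if the client sent
    data (or hung up) before the banner, in which case the connection is
    dropped without a greeting.
    """
    readable, _, _ = select.select([conn], [], [], delay)
    if readable:
        # The client spoke (or disconnected) before being greeted.
        conn.close()
        return False
    conn.sendall(banner)
    return True


if __name__ == "__main__":
    # Simulate an impatient client with a connected socket pair.
    server_end, client_end = socket.socketpair()
    client_end.sendall(b"EHLO impatient.example\r\n")
    sent = greet_with_delay(server_end, b"220 your.f.q.d.n ESMTP\r\n", 1.0)
    print("banner sent:", sent)
    client_end.close()
```

A patient client simply sees a slow greeting, whereas a client that does not wait for the banner is disconnected before it has delivered anything.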

We can help along by imposing additional delays. For instance, you may decide to wait:

  *  20 seconds before presenting the initial SMTP banner,

  *  20 seconds after the Hello (EHLO or HELO) greeting,

  *  20 seconds after the MAIL FROM: command, and

  *  20 seconds after each RCPT TO: command.

Where did 20 seconds come from, you ask. Why not a minute? Or several minutes? After all, RFC 2821 mandates that the sending host (client) should wait up to several minutes for every SMTP response. The issue is that some receiving hosts, particularly those that use Exim, may perform Sender Callout Verification in response to incoming mail delivery attempts. If you or one of your users send mail to such a host, it will contact the Mail Exchanger (MX host) for your domain and start an SMTP dialogue in order to validate the sender address. The default timeout of such Sender Callout Verifications is 30 seconds - if you impose delays this long, the peer's sender callout verification would fail, and in turn the original mail delivery from you/your user might be rejected (usually with a temporary failure, which means the message delivery will be retried for 5 days or so before the mail is finally returned to the sender).

In other words, 20 seconds is about as long as you can stall before you start interfering with legitimate mail deliveries.

If you do not like imposing such delays on every SMTP transaction (say, you have a very busy site and are low on machine resources), you may choose to use "selective" transaction delays. In this case, you could impose the delay:

  *  If there is a problem with the peer's DNS information (see DNS checks).

  *  After detecting some sign of trouble during the SMTP transaction (see SMTP checks).

  *  Only in the highest-numbered MX host in your DNS zone, i.e. the mail exchanger with the last priority. Often, ratware specifically targets these hosts, whereas legitimate MTAs will try the lower-numbered MX hosts first.

In fact, selective transaction delays may be a good way to incorporate some less conclusive checks that we will discuss in the following sections. You probably do not wish to reject the mail outright based on the results from e.g. the SPEWS blacklist, but on the other hand, it may provide a strong enough indication of trouble that you can at least impose transaction delays. After all, legitimate mail deliveries are not affected, other than being subjected to a slight delay.
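To make the idea of selective delays concrete, here is a minimal sketch in Python. The function name and the four boolean flags are my own invention for illustration; only the 20-second figure comes from the discussion above. A real MTA would express this in its own configuration language instead.

```python
def transaction_delay(dns_problem=False, smtp_problem=False,
                      on_last_mx=False, soft_blacklist_hit=False,
                      delay_seconds=20):
    """Return how many seconds to stall before answering an SMTP command.

    All four flags are hypothetical inputs: each one stands for a
    "sign of trouble" that is too weak to justify outright rejection.
    """
    suspicious = (dns_problem or smtp_problem
                  or on_last_mx or soft_blacklist_hit)
    # Well-behaved peers are not delayed at all.
    return delay_seconds if suspicious else 0
```

A peer with no strikes against it gets a delay of 0; any single strike is enough to earn the full delay.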

Conversely, if you find conclusive evidence of spamming (e.g. by way of certain SMTP checks), and your server can afford it, you may choose to impose an extended delay, e.g. 15 minutes or so, before finally rejecting the delivery [4]. This is for little or no benefit other than slowing down the spammer a little bit in their quest to reach as many people as possible before DNS blacklists and other collaborative network checks catch up. In other words, pure altruism on your side. :-)

In my own case, selective transaction delays and the resulting SMTP synchronization errors account for nearly 50% of rejected incoming delivery attempts. This roughly translates into saying that nearly 50% of incoming junk mail is stopped by SMTP transaction delays alone.

See also What happens when spammers adapt....


2.2. DNS Checks

Some indication of the integrity of a particular peer can be gleaned directly from the Domain Name System (DNS), even before SMTP commands are issued. In particular, various DNS blacklists can be consulted to find out if a particular IP address is known to violate or fulfill certain criteria, and a simple pair of forward/reverse (DNS/rDNS) lookups can be used as a vague indicator of the host's general integrity.

Moreover, various data items presented during the SMTP dialogue (such as the name presented in the Hello greeting) can be subjected to DNS validation, once they become available. For a discussion on these items, see the section on SMTP checks, below.

A word of caution, though. DNS checks are not always conclusive (e.g. a required DNS server may not be responding), and not always indicative of spam. Moreover, if you have a very busy site, they can be expensive in terms of processing time per message. That said, they can provide useful information for logging purposes, and/or as part of a more holistic integrity check.


2.2.1. DNS Blacklists

DNS blacklists (DNSbl's, formerly called "Real-time Black-hole Lists" after the original blacklist, "mail-abuse.org") make up perhaps the most common tool to perform transaction-time spam blocking. The receiving server performs one or more rDNS lookups of the peer's IP address within various DNSbl zones, such as "dnsbl.sorbs.net", "opm.blitzed.org", "lists.dsbl.org", and so forth. If a matching DNS record is found, a typical action is to reject the mail delivery. [5]

If in addition to the DNS address ("A" record) you look up the "TXT" record of an entry, you will typically receive a one-line description of the listing, suitable for inclusion in an SMTP reject response. To try this out, you can use the "host" command provided on most Linux and UNIX systems:

    host -t txt 2.0.0.127.dnsbl.sorbs.net
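The name being looked up in that example is simply the IPv4 address 127.0.0.2 with its octets reversed, followed by the DNSbl zone. A small Python sketch of that construction follows; the function name is my own, and it handles IPv4 only.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSbl zone."""
    # IPv4Address() raises ValueError for anything that is not valid IPv4.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone
```

The resulting name can then be handed to whatever resolver you use, asking for the "A" and/or "TXT" record.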

There are currently hundreds of these lists available, each with different listing criteria, and with different listing/unlisting policies. Some lists even combine several listing criteria into the same DNSbl, and issue different data in response to the rDNS lookup, depending on which criterion affects the address provided. For instance, an rDNS lookup against sbl-xbl.spamhaus.org returns 127.0.0.2 for IP addresses that are believed by the SpamHaus staff to directly belong to spammers and their providers, 127.0.0.4 for Zombie Hosts, or 127.0.0.6 for Open Proxy servers.

Unfortunately, many of these lists contain large blocks of IP addresses that are not directly responsible for the alleged violations, don't have clear listing / delisting policies, and/or post misleading information about which addresses are listed [6]. Blind trust in such lists often causes a large amount of what is referred to as Collateral Damage (not to be confused with Collateral Spam).

For that reason, rather than rejecting mail deliveries outright based on a single positive response from DNS blacklists, many administrators prefer to use these lists in a more nuanced fashion. They may consult several lists, and assign a "score" to each positive response. If the total score for a given IP address reaches a given threshold, deliveries from that address are rejected. This is how DNS blacklists are used by filtering software such as SpamAssassin (Spam Scanners).
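The scoring approach can be sketched in a few lines of Python. The weights and the threshold in the usage note are made-up numbers, not recommendations, and the function assumes that the DNS lookups themselves have already been done elsewhere.

```python
def dnsbl_score(listed_zones, weights, threshold):
    """Add up the weights of every DNSbl zone that lists the peer.

    listed_zones: zones that returned a positive response for the peer.
    weights:      hypothetical per-zone scores chosen by the administrator.
    Returns (total_score, reject), where reject is True once the
    threshold has been reached.
    """
    total = sum(weights.get(zone, 0) for zone in listed_zones)
    return total, total >= threshold
```

For instance, with weights of {"dnsbl.sorbs.net": 3, "opm.blitzed.org": 2} and a threshold of 5, a peer listed in only one of the two zones would be let through, while a peer listed in both would be rejected.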

One could also use such lists as one of several triggers for SMTP transaction delays on incoming connections (a.k.a. "teergrubing"). If a host is listed in a DNSbl, your server would delay its response to every SMTP command issued by the peer for, say, 20 seconds. Several other criteria can be used as triggers for such delays; see the section on SMTP transaction delays.


2.2.2. DNS Integrity Check

Another way to use DNS is to perform a reverse lookup of the peer's IP address, then a forward lookup of the resulting name. If the original IP address is included in the result, its DNS integrity has been validated. Otherwise, the DNS information for the connecting host is not valid.
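Here is one way this reverse-then-forward check might look in Python, using only the standard "socket" module. The function names are my own. The two lookup functions are passed in as parameters so that the logic can be tried out without touching the network; comparing addresses as plain strings is a simplification that works for IPv4 but would need normalization for IPv6.

```python
import socket


def reverse_names(ip):
    """Return the host names that a reverse lookup yields for ip."""
    try:
        name, aliases, _ = socket.gethostbyaddr(ip)
    except OSError:          # covers socket.herror and socket.gaierror
        return []
    return [name] + aliases


def forward_addresses(name):
    """Return the IP addresses that a forward lookup yields for name."""
    try:
        infos = socket.getaddrinfo(name, None)
    except OSError:
        return []
    return [info[4][0] for info in infos]


def dns_integrity_ok(ip, reverse=reverse_names, forward=forward_addresses):
    """True if some reverse name of ip resolves back to ip itself."""
    return any(ip in forward(name) for name in reverse(ip))
```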

Rejecting mails based on this criterion may be an option if you are a militant member of the DNS police, setting up an incoming MX for your own personal domain, and don't mind rejecting legitimate mail as a way to impress upon the sender that they need to ask their own system administrator to clean up their DNS records. For everyone else, the result of a DNS integrity check should probably only be used as one data point in a larger set of heuristics. Alternatively, as above, using SMTP transaction delays for misconfigured hosts may not be a bad idea.


2.3. SMTP checks

Once the SMTP dialogue is underway, you can perform various checks on the commands and arguments presented by the remote host. For instance, you will want to ensure that the name presented in the Hello greeting is valid.

However, even if you decide to reject the delivery attempt early in the SMTP transaction, you may not want to perform the actual rejection right away. Instead, you may stall the sender with SMTP transaction delays until after the RCPT TO:, then reject the mail at that point.

The reason is that some ratware does not understand rejections early in the SMTP transaction; it keeps trying. On the other hand, most ratware gives up if the RCPT TO: fails.

Besides, this gives a nice opportunity to do a little teergrubing.


2.3.1. Hello (HELO/EHLO) checks

Per RFC 2821, the first SMTP command issued by the client should be EHLO (or if unsupported, HELO), followed by its primary, Fully Qualified Domain Name. This is known as the Hello greeting. If no meaningful FQDN is available, the client can supply its IP address enclosed in square brackets: "[1.2.3.4]". This last form is known as an IPv4 address "literal" notation.

Quite understandably, ratware rarely presents its own FQDN in the Hello greeting. Rather, greetings from ratware usually attempt to conceal the sending host's identity, and/or to generate confusing and/or misleading "Received:" trails in the message header. Some examples of such greetings are:

  *  Unqualified names (i.e. names without a period), such as the "local part" (username) of the recipient address.

  *  A plain IP address (i.e. not an IP literal); usually yours, but can be a random one.

  *  Your domain name, or the FQDN of your server.

  *  Third party domain names, such as yahoo.com and hotmail.com.

  *  Non-existing domain names, or domain names with non-existing name servers.

  *  No greeting at all.


2.3.1.1. Simple HELO/EHLO syntax checks

Some of these RFC 2821 violations are both easy to check against, and clear indications that the sending host is running some form of Ratware. You can reject such greetings -- either right away, or e.g. after the RCPT TO: command.

First, feel free to reject plain IP addresses in the Hello greeting. Even if you wish to generously allow everything RFC 2821 mandates, recommends, and suggests, you will note that IP addresses should always be enclosed in square brackets when presented in lieu of a name. [7]

In particular, you may wish to issue a strongly worded rejection message to hosts that introduce themselves using your IP address - or for that matter, your host name. They are plainly lying. Perhaps you want to stall the sender with an exceedingly long SMTP transaction delay in response to such a greeting; say, hours.

For that matter, my own experience indicates that no legitimate sites on the internet present themselves to other internet sites using an IP address literal (the [x.y.z.w] notation) either. Nor should they; all hosts sending mail directly on the internet should use their valid Fully Qualified Domain Name. The only use of IP literals I have come across is from mail user agents on my local area network, such as Ximian Evolution, configured to use my server as outgoing SMTP server (smarthost). Indeed, I only accept literals from my own LAN.

You may or may not also wish to reject unqualified host names (host names without a period). I find that these are rarely (but not never - how's that for double negative negations) legitimate.

Similarly, you can reject host names that contain invalid characters. For internet domains, only letters, digits, and hyphens are valid characters; a hyphen is not allowed as the first character. (You may also want to consider the underscore a valid character, because it is quite common to see this from misconfigured, but ultimately well-meaning, Windows clients).

Finally, if you receive a MAIL FROM: command without first having received a Hello greeting, well, polite people greet first.

On my servers, I reject greetings that fail any of these syntax checks. However, the rejection does not actually take place until after the RCPT TO: command. In the meantime, I impose a 20 second transaction delay after each SMTP command (HELO/EHLO, MAIL FROM:, RCPT TO:).
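The syntax checks in this section could be collected into a single function along the following lines. This is only a sketch: the function name, its parameters, and the wording of the returned reasons are my own, the IP patterns cover IPv4 only, and the underscore is tolerated as discussed above.

```python
import re

# One label of a host name: letters, digits, hyphens (underscore
# tolerated), and no hyphen as the first character.
LABEL = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9_-]*$")
PLAIN_IP = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")
IP_LITERAL = re.compile(r"^\[\d{1,3}(\.\d{1,3}){3}\]$")


def helo_problems(helo, my_names=(), my_ips=(), peer_is_local=False):
    """Return a list of reasons why a Hello argument looks bogus.

    my_names, my_ips and peer_is_local are hypothetical inputs that
    describe your own server and whether the peer is on your LAN.
    An empty list means the greeting passed the syntax checks.
    """
    if not helo:
        return ["no greeting"]
    name = helo.rstrip(".").lower()
    if PLAIN_IP.match(name):
        problems = ["plain IP address"]
        if name in my_ips:
            problems.append("claims to be our IP address")
        return problems
    if IP_LITERAL.match(name):
        return [] if peer_is_local else ["IP literal from non-local host"]
    problems = []
    if name in (n.lower() for n in my_names):
        problems.append("claims to be our host name")
    labels = name.split(".")
    if len(labels) < 2:
        problems.append("unqualified name")
    if not all(LABEL.match(label) for label in labels):
        problems.append("invalid characters")
    return problems
```

Whether a non-empty result leads to an immediate rejection, a delayed rejection, or merely a transaction delay is a policy decision that is deliberately left out of the function.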


2.3.1.2. Verifying the Hello greeting via DNS

Hosts that make it this far have presented at least a superficially credible greeting. Now it is time to verify the provided name via DNS. You can:

  *  Perform a forward lookup of the provided name, and match the result against the peer's IP address.

  *  Perform a reverse lookup of the peer's IP address, and match it against the name provided in the greeting.

If either of these two checks succeeds, the name has been verified.

Your MTA may have a built-in option to perform this check. For instance, in Exim (see Appendix A), you want to set "helo_try_verify_hosts = *", and create ACLs that take action based on the "verify = helo" condition.

This check is a little more expensive in terms of processing time and network resources than the simple syntax checks. Moreover, unlike the syntax checks, a mismatch does not always indicate ratware; several large internet sites, such as hotmail.com, yahoo.com, and amazon.com, frequently present unverifiable Hello greetings.

On my servers, I do a DNS validation of the Hello greeting if I am not already stalling the sender with transaction delays based on prior checks. Then, if this check fails, I impose a 20 second delay on every SMTP command from this point forward. I also prepare a "X-HELO-Warning:" header that I will later add to the message(s), and use to increase the SpamAssassin score for possible rejection after the message data has been received.


2.3.2. Sender Address Checks

After the client has presented the MAIL FROM: <address> command, you can validate the supplied Envelope Sender address as follows. [8]


2.3.2.1. Sender Address Syntax Check

Does the supplied address conform to the format <localpart@domain>? Is the domain part a syntactically valid Fully Qualified Domain Name?

Often, your MTA performs these checks by default.


2.3.2.2. Impostor Check

In the case where you and your users send all your outgoing mail only through a select few servers, you can reject messages from other hosts in which the "domain" of the sender address is your own.

A more general alternative to this check is Sender Policy Framework.
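As a sketch, the impostor check boils down to a two-part test. The function and parameter names below are hypothetical; the sets of domains and outgoing hosts would come from your own configuration.

```python
def is_impostor(sender, peer_ip, my_domains, my_outgoing_hosts):
    """True if the envelope sender uses one of our domains, but the
    connecting host is not one of our own outgoing servers."""
    # rpartition() yields an empty domain for the empty (NULL) sender.
    domain = sender.rpartition("@")[2].lower()
    return domain in my_domains and peer_ip not in my_outgoing_hosts
```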


2.3.2.3. Simple Sender Address Validation

If the address is local, is the "local part" (the part before the @ sign) a valid mailbox on your system?

If the address is remote, does the "domain" (the part after the @ sign) exist?


2.3.2.4. Sender Callout Verification

This is a mechanism that is offered by some MTAs, such as Exim and Postfix, to validate the "local part" of a remote sender address. In Postfix terminology, it is called "Sender Address Verification".

Your server contacts the MX for the domain provided in the sender address, attempting to initiate a secondary SMTP transaction as if delivering mail to this address. It does not actually send any mail; rather, once the RCPT TO: command has been either accepted or rejected by the remote host, your server sends QUIT.

By default, Exim uses an empty envelope sender address for such callout verifications. The goal is to determine if a Delivery Status Notification would be accepted if returned to the sender.

Postfix, on the other hand, defaults to the sender address <postmaster@domain> for address verification purposes (domain is taken from the $myorigin variable). For this reason, you may wish to treat this sender address the same way that you treat the NULL envelope sender (for instance, avoid SMTP transaction delays or Greylisting, but require Envelope Sender Signatures in recipient addresses). More on this in the implementation appendices.
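To illustrate what a callout looks like on the wire, here is a rough Python sketch built on the standard "smtplib" module. It is not how Exim or Postfix implement the feature; the function name, the three-way return value, and the placeholder name "mx.example.org" are assumptions of mine. The "probe_sender" parameter is empty by default, and could be set to a postmaster address to mimic the Postfix behaviour described above. Finding the MX host for the sender domain is left out.

```python
import smtplib


def sender_callout(mx_host, address, probe_sender="",
                   helo_name="mx.example.org", timeout=30,
                   connect=smtplib.SMTP):
    """Ask mx_host whether it would accept mail addressed to address.

    Returns True (accepted), False (permanently rejected) or None
    (undecided: connection problem, temporary failure, and so on).
    No message data is ever sent.
    """
    try:
        smtp = connect(mx_host, 25, timeout=timeout)
    except (OSError, smtplib.SMTPException):
        return None
    try:
        smtp.helo(helo_name)
        code, _ = smtp.mail(probe_sender)   # "" becomes MAIL FROM:<>
        if code // 100 != 2:
            return None
        code, _ = smtp.rcpt(address)
    except (OSError, smtplib.SMTPException):
        return None
    finally:
        try:
            smtp.quit()                     # always end with QUIT
        except (OSError, smtplib.SMTPException):
            pass
    if code // 100 == 2:
        return True
    if code // 100 == 5:
        return False
    return None
```

The "connect" parameter exists only so that the function can be exercised against a stand-in object rather than a live server.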

You may find that this check alone is not suitable as a trigger to reject incoming mail. Occasionally, legitimate mail, such as a recurring billing statement, is sent out from automated services with an invalid return address. Also, an unfortunate side effect of spam is that some users tend to mangle the return address in their outgoing mails (though this may affect the "From:" header in the message itself more often than the Envelope Sender).

Moreover, this check only verifies that an address is valid, not that it was authentic as the sender of this particular message (but see also Envelope Sender Signature).

Finally, there are reports of sites, such as "aol.com", that will unconditionally blacklist any system from which they discover sender callout requests. These sites may be frequent victims of Joe Jobs, and as a result, receive storms of sender callout requests. By taking part in these DDoS (Distributed Denial-of-Service) attacks, you are effectively turning yourself into a pawn in the hands of the spammer.


2.3.3. Recipient Address Checks

This should be simple, you say. A recipient address is either valid, in which case the mail is delivered, or invalid, in which case your MTA takes care of the rejection by default.

Let us have a look, shall we?


2.3.3.1. Open Relay Prevention

Do not relay mail from remote hosts to remote addresses! (Unless the sender is authenticated).

This may seem obvious to most of us, but apparently this is a frequently overlooked consideration. Also, not everyone may have a full grasp of the various internet standards related to e-mail addresses and delivery paths (consider "percent hack domains", "bang (!) paths", etc).

If you are unsure whether your MTA acts as an Open Relay, you can test it via "relay-test.mail-abuse.org". At a shell prompt on your server, type:

    telnet relay-test.mail-abuse.org

This is a service that will use various tests to see whether your SMTP server appears to forward mail to remote e-mail addresses, and/or any number of address "hacks" such as the ones mentioned above.

Preventing your servers from acting as open relays is extremely important. If your server is an open relay, and spammers find you, you will be listed in numerous DNS blacklists instantly. If the maintainers of certain other DNS blacklists find you (by probing, and/or by acting on complaints), you will be listed in those for an extended period of time.


2.3.3.2. Recipient Address Lookups

This, too, may seem banal to most of us. It is not always so.

If your users' mail accounts and mailboxes are stored directly on your incoming mail exchanger, you can simply check that the "local part" of the recipient address corresponds to a valid mailbox. No problem here.

There are two scenarios where verification of the recipient address is more cumbersome:

  *  If your machine is a backup MX for the recipient domain.

  *  If your machine forwards all mail for your domain to another (presumably internal) server.

The alternative to recipient address verification is to accept all recipient addresses within these respective domains, which in turn means that you or the destination server might have to generate a Delivery Status Notification for recipient addresses that later turn out to be invalid. Ultimately, this means that you would be generating collateral spam.

With that in mind, let us see how we can verify the recipient in the scenarios listed above.


2.3.3.2.1. Recipient Callout Verification

This is a mechanism that is offered by some MTAs, such as Exim and Postfix, to verify the "local part" of a remote recipient address (see Sender Callout Verification for a description of how this works). In Postfix terminology, this is called "Recipient Address Verification".

In this case, your server attempts to contact the final destination host to validate each recipient address before you, in turn, accept the RCPT TO: command from your peer.

This solution is simple and elegant. It works with any MTA that might be running on the final destination host, and without access to any particular directory service. Moreover, if that MTA happens to perform a fuzzy match on the recipient address (this is the case with Lotus Domino servers), this check will accurately reflect whether the recipient address is eventually going to be accepted or not - something which may not be true for the mechanisms described below.

Be sure to keep the original Envelope Sender intact for the recipient callout, or the response from the destination host may not be accurate. For instance, it may reject bounces (i.e. mail with no envelope sender) for system users and aliases, as described in Accept Bounces Only for Real Users.

Among major MTAs, Exim and Postfix support this mechanism.


2.3.3.2.2. Directory Services

Another good solution would be a directory service (e.g. one or more LDAP servers) that can be queried by your MTA. The most common MTAs all support LDAP, NIS, and/or various other backends that are commonly used to provide user account information.

The main sticking point is that unless the final destination host of the e-mail already uses such a directory service to map user names to mailboxes, there may be some work involved in setting this up.


2.3.3.2.3. Replicated Mailbox Lists

If none of the options above are viable, you could fall back to a "poor man's directory service", where you would periodically copy a current list of mailboxes from the machine where they are located, to your MX host(s). Your MTA would then consult this list to validate RCPT TO: commands in incoming mail.

If the machine(s) that host(s) your mailboxes is/are running on some flavor of UNIX or Linux, you could write a script to first generate such a list, perhaps from the local "/etc/passwd" file, and then copy it to your MX host(s) using the "scp" command from the [http://www.openssh.org/] OpenSSH suite. You could then set up a "cron" job (type man cron for details) to periodically run this script.
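The list-generating half of such a script might look like the following Python sketch. The function name is my own, and the cut-off of UID 1000 for "real" users as well as the exclusion of accounts whose shell ends in "nologin" or "false" are assumptions that vary between systems; adjust them to match yours.

```python
def mailboxes_from_passwd(passwd_text, min_uid=1000):
    """Extract the login names of ordinary users from passwd-format text.

    min_uid is an assumed boundary between system and user accounts.
    """
    names = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if line.startswith("#") or len(fields) < 7:
            continue                      # comment or malformed line
        name, uid, shell = fields[0], fields[2], fields[6]
        if not uid.isdigit() or int(uid) < min_uid:
            continue                      # system account
        if shell.endswith(("nologin", "false")):
            continue                      # account cannot log in
        names.append(name)
    return sorted(names)
```

The script would read "/etc/passwd", write the resulting names to a file, one per line, and then copy that file to the MX host(s).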


2.3.3.3. Dictionary Attack Prevention

Dictionary Attack is a term used to describe SMTP transactions where the sending host keeps issuing RCPT TO: commands to probe for possible recipient addresses based on common names (often alphabetically starting with "aaron", but sometimes starting later in the alphabet, and/or at random). If a particular address is accepted by your server, that address is added into the spammer's arsenal.

Some sites, particularly larger ones, find that they are frequent targets of such attacks. From the spammer's perspective, the chances of finding a given username on a large site are better than on sites with only a few users.

One effective way to combat dictionary attacks is to issue increasing transaction delays for each failed address. For instance, the first non-existing recipient address can be rejected with a 20-second delay, the second address with a 30-second delay, and so on.
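The increasing delay can be computed with a one-line formula. In this Python sketch, the 20-second start and the 10-second step follow the example above, whereas the upper limit of 300 seconds is an assumption of mine, added so that the delay does not grow without bound.

```python
def failed_rcpt_delay(failures, first=20, step=10, cap=300):
    """Seconds to wait before rejecting the Nth non-existing recipient.

    failures counts the failed RCPT TO: commands seen so far in this
    connection, including the current one.  cap is an assumed limit.
    """
    if failures <= 0:
        return 0
    return min(first + step * (failures - 1), cap)
```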


2.3.3.4. Accept only one recipient for DSNs

Legitimate Delivery Status Notifications should be sent to only one recipient address - the originator of the original message that triggered the notification. You can drop the connection if the Envelope Sender address is empty, but there is more than one recipient.


2.4. Greylisting

The greylisting concept is presented by Evan Harris in a whitepaper at: http://projects.puremagic.com/greylisting/.


2.4.1. How it works

Like SMTP transaction delays, greylisting is a simple but highly effective mechanism to weed out messages that are being delivered via Ratware. The idea is to establish whether a prior relationship exists between the sender and the receiver of a message. For most legitimate mail it does, and the delivery proceeds normally.

On the other hand, if no prior relationship exists, the delivery is temporarily rejected (with a 451 SMTP response). Legitimate MTAs will treat this response accordingly, and retry the delivery in a little while [9]. In contrast, ratware will either make repeated delivery attempts right away, and/or simply give up and move on to the next target in its address list.

Three pieces of information from a delivery attempt, referred to as a triplet, are used to uniquely identify the relationship between a sender and a receiver:

  *  The Envelope Sender.

  *  The sending host's IP address.

  *  The Envelope Recipient.

If a delivery attempt was temporarily rejected, this triplet is cached. It remains greylisted for a given amount of time (nominally 1 hour), after which it is whitelisted, and new delivery attempts would succeed. If no new delivery attempts occur prior to a given timeout (nominally 4 hours), then the triplet expires from the cache.

If a whitelisted triplet has not been seen for an extended duration (at minimum one month, to account for monthly billing statements and the like), it is expired. This prevents unlimited growth of the list.

These timeouts are taken from Evan Harris' original greylisting whitepaper (or should we say, ahem, "greypaper"?) Some people have found that a larger timeout may be needed before greylisted triplets expire, because certain ISPs (such as earthlink.net) retry deliveries only every 6 hours or similar. [10]
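The bookkeeping described in this section can be sketched as a small in-memory class. This is an illustration of the logic only, not of any particular greylisting package: the class and method names are mine, the 36-day whitelist lifetime is an assumed value that satisfies the "at minimum one month" requirement, and a real implementation would keep the triplets in a database and purge expired entries periodically.

```python
class Greylist:
    """In-memory greylist keyed on (sender, peer IP, recipient)."""

    def __init__(self, min_delay=3600, retry_window=4 * 3600,
                 whitelist_ttl=36 * 86400):
        self.min_delay = min_delay          # greylisted for 1 hour
        self.retry_window = retry_window    # must retry within 4 hours
        self.whitelist_ttl = whitelist_ttl  # assumed: 36 days unused
        self.entries = {}  # triplet -> (first_seen, whitelisted, last_seen)

    def check(self, sender, ip, recipient, now):
        """Return "accept" or "defer" (i.e. a 451 response)."""
        key = (sender.lower(), ip, recipient.lower())
        entry = self.entries.get(key)
        if entry:
            first_seen, whitelisted, last_seen = entry
            if whitelisted:
                expired = now - last_seen > self.whitelist_ttl
            else:
                expired = now - first_seen > self.retry_window
            if expired:
                entry = None                # treat as never seen
        if entry is None:
            self.entries[key] = (now, False, now)
            return "defer"
        first_seen, whitelisted, _ = entry
        if whitelisted or now - first_seen >= self.min_delay:
            self.entries[key] = (first_seen, True, now)
            return "accept"
        return "defer"                      # retried too early
```

Times are plain numbers of seconds, so "now" can come from time.time() in real use, or from made-up values when experimenting.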


2.4.2. Greylisting in Multiple Mail Exchangers

If you operate more than one incoming mail exchanger, and each exchanger maintains its own greylisting cache, then:

  *  First-time deliveries from a given sender to one of your users may theoretically be delayed up to N times the initial 1-hour delay, where N is the number of mail exchangers. This is because the message would likely be retried at a different server than the one that issued the 451 response to the initial delivery. In the worst case, the sender host may not get around to retrying the delivery to the first exchanger for 4 hours, or until after the greylist triplet has expired, thereby causing the delivery attempt to be rejected over and over again, until the sender gives up (usually after 4 days or so).

In practice, this is unlikely. If a delivery attempt temporarily fails, the sender host normally retries the delivery immediately, using a different MX. Thus, after one hour, any of these MX hosts would accept the message.

  *  Even after a triplet has been whitelisted in one of your MXs, the next message with the same triplet will be greylisted if it is delivered to a different MX.

For these reasons, you may want to implement a solution where the database of greylist triplets is shared between your incoming mail exchangers. However, since the machine that hosts this database would become a single point of failure, you would have to take a sensible action if that machine is down (e.g. accept all deliveries). Or you could use database replication techniques and have the SMTP server fall back to one of the replicating servers for lookups.


2.4.3. Results

In my own experience, greylisting gets rid of about 90% of unique junk mail deliveries, after most of the SMTP checks previously described are applied! If you used greylisting as a first defense, it would likely catch an even higher percentage of incoming junk mail.

Conversely, there are virtually zero False Positives resulting from this technique. All major Mail Transport Agents perform delivery retries after a temporary failure, in a manner that will eventually result in a successful delivery.

The downside to greylisting is that legitimate mail from people who have not e-mailed a particular recipient in the past is subject to a one-hour delay (or maybe several hours, if you operate several MX hosts).

See also What happens when spammers adapt....


2.5. Sender Authorization Schemes

Various schemes have been developed for sender verification where not only the validity, but also the authenticity, of the sender address is checked. The owner of an internet domain specifies certain criteria that must be fulfilled in authentic deliveries from senders within that domain.

Two early proposed schemes of this kind were:

  *  MAIL-FROM MX records, conceived by Paul Vixie <paul (at) vix.com>

  *  Reverse Mail Exchanger (RMX) records as an addition to DNS itself, conceived and published by Hadmut Danisch <hadmut (at) danisch.de>.

Under both of these schemes, all mails from <user@domain.com> had to come from the hosts specified in <domain.com>'s DNS zone.

These schemes have evolved. Alas, they have also forked.


2.5.1. Sender Policy Framework (SPF)

"Sender Policy Framework" (previously "Sender Permitted From") is perhaps the most well-known scheme for sender authorization. It is loosely based on the original schemes described above, but allows for a bit more flexibility in the criteria that can be posted by the domain holder.

SPF information is published as a TXT record in a domain's top-level DNS zone. This record can specify:

  *  which hosts are allowed to send mail from that domain

  *  the mandatory presence of a GPG (GNU Privacy Guard) signature in outgoing mail from the domain

  *  other criteria; see http://spf.pobox.com/ for details.

The structure of the TXT record is still undergoing development; however, basic features to accomplish the above are in place. It starts with the string v=spf1, followed by such modifiers as:

  *  a - the IP address of the domain itself is a valid sender host

  *  mx - the incoming mail exchanger for that domain is also a valid sender

  *  ptr - if an rDNS lookup of the sending host's IP address yields a name within the domain portion of the sender address, it is a valid sender.

Each of these modifiers may be prefixed with a plus sign (+), minus sign (-), question mark (?), or tilde (~) to indicate whether it specifies an authoritative source, a non-authoritative source, a neutral stance, or a likely non-authoritative source, respectively.

Each modifier may also be extended with a colon, followed by an alternate domain name. For instance, if you are a Comcast subscriber, your own DNS zone may include the string "-ptr:client.comcast.net ptr:comcast.net" to indicate that your outgoing e-mail never comes from a host that resolves to anything.client.comcast.net, but could come from other hosts that resolve to anything.comcast.net.
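To show how such a record breaks down, here is a small Python sketch that splits it into its parts. It only parses; it does not perform any DNS lookups or evaluate the record against a sending host. The function name is my own. The result names ("pass", "fail", "neutral", "softfail") and the rule that a missing prefix counts as a plus sign are taken from the published SPF specification rather than from the description above, and terms using other syntax (for instance ones containing an equals sign) are not handled specially.

```python
# Prefix character -> result name used in the SPF specification.
QUALIFIERS = {"+": "pass", "-": "fail", "?": "neutral", "~": "softfail"}


def parse_spf(record):
    """Split an SPF TXT record into (result, keyword, argument) tuples."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    parsed = []
    for term in terms[1:]:
        qualifier = "+"                    # a missing prefix means "+"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        keyword, _, argument = term.partition(":")
        parsed.append((QUALIFIERS[qualifier], keyword, argument or None))
    return parsed
```

Applied to the Comcast example, with the leading "v=spf1" in place, this yields a "fail" entry for ptr:client.comcast.net followed by a "pass" entry for ptr:comcast.net.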

SPF information is currently published for a number of high-profile internet domains, such as aol.com, altavista.com, dyndns.org, earthlink.net, and google.com.

Sender authorization schemes in general and SPF in particular are not universally accepted. In particular, one objection is that domain holders may effectively establish a monopoly on relaying outgoing mail from their users/customers.

Another objection is that SPF breaks traditional e-mail forwarding - the forwarding host may not have the authority to do so per the SPF information in the envelope sender domain. This is partly addressed via [http://spf.pobox.com/srs.html] SRS, or Sender Rewriting Scheme, wherein the forwarder of the mail will modify the Envelope Sender address to the format: user=source.domain@forwarder.domain
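The rewriting into that simplified format is easy to sketch in Python. Note that this reproduces only the format shown above; actual SRS implementations add further fields (such as a hash) to guard against forgery, and those are not shown here. The function name is my own.

```python
def rewrite_sender(sender, forwarder_domain):
    """Rewrite user@source.domain to user=source.domain@forwarder.domain."""
    local, _, domain = sender.rpartition("@")
    if not local or not domain:
        return sender        # leave the empty (NULL) sender untouched
    return f"{local}={domain}@{forwarder_domain}"
```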


2.5.2. Microsoft Caller-ID for E-Mail

This scheme is similar to SPF, in that acceptance criteria are posted via a TXT record in the sending domain's DNS zone. However, rather than relying on simple keywords, MS CIDE information consists of fairly large structures encoded in XML. The XML schema is published under a license by Microsoft.

While SPF would nominally be used to check the Envelope Sender address of an e-mail, MS CIDE is mainly a tool to validate the RFC 2822 header of the message itself. Thus, the earliest point at which such a check could be applied would be after the message data has been delivered, before issuing the final 250 response.

Quite frankly, dead on arrival. Encumbered by patent issues and sheer complexity.

That said, recent SPF tools posted on http://spf.pobox.com/ are capable of checking MS Caller-ID information in addition to SPF.


2.5.3. RMX++

(part of Simple Caller Authorization Framework - SCAF). This scheme is developed by Hadmut Danisch, who also conceived of the original RMX.

RMX++ allows for dynamic authorization by way of HTTP servers. The domain owner publishes a server location via DNS, and the receiving host contacts that server in order to obtain an authorization record to verify the authenticity of the caller.

This scheme allows the domain owner more fine-grained control of criteria used to authenticate the sender address, without having to publicly reveal the structure of their network (as with SPF information in static TXT records). For instance, an example from Hadmut is an authorization server that allows no more than five messages from a given address per day after business hours, then issues an alert once the limit has been reached.

Moreover, SCAF is not limited to e-mail, but can also be used to provide caller authentication for other services such as Voice over IP (VoIP).

One possible downside with RMX++, as noted by Rick Stewart <rick.stewart (at) theinternetco.net>, is its impact on machine and network resources: replies from HTTP servers are not as widely cached as information obtained directly via DNS, and it is significantly more expensive to make an HTTP request than a DNS request.

Further, Rick notes that the dynamic nature of RMX++ makes faults harder to track. If there is a five-message-per-day limit, as in the example above, and one message gets checked five times, then the limit is hit with a single message. It makes re-checking a message impossible.

For more information on RMX, RMX++, and SCAF, refer to: [http://www.danisch.de/work/security/antispam.html] http://www.danisch.de/work/security/antispam.html.


2.6. Message data checks

Time has come to look at the content of the message itself. This is what conventional spam and virus scanners do, as they normally operate on the message after it has been accepted. However, in our case, we perform these checks before issuing the final 250 response, so that we have a chance to reject the mail on the spot rather than later generating Collateral Spam.

If your incoming mail exchangers are very busy (i.e. large site, few machines), you may find that performing some or all of these checks directly in the mail exchanger is too costly. In particular, running Virus Scanners and Spam Scanners takes up a fair amount of CPU time.

If so, you will want to set up dedicated machines for these scanning operations. Most server-side anti-spam and anti-virus software can be invoked over the network, i.e. from your mail exchanger. More on this in the following chapters, where we discuss implementation for the various MTAs.


2.6.1. Header checks

2.6.1.1. Missing Header Lines

Mail generated by a mainstream Mail User Agent normally contains at least the following header lines:

From: ...
To: ...
Subject: ...
Message-ID: ...
Date: ...

Strictly speaking, RFC 2822 requires only From: and Date:, and merely recommends Message-ID:. In practice, though, the absence of any of these lines means that the message was not generated by a mainstream Mail User Agent, and that it is probably junk [11].
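
As a rough illustration of this check, here is a minimal sketch in Python, using the standard email package. The function name missing_headers and the sample message are made up for this example; a real mail exchanger would perform the check in its own configuration language, once the message data has been received.

```python
from email import message_from_string

# Header lines we expect from a mainstream Mail User Agent.
EXPECTED_HEADERS = ("From", "To", "Subject", "Message-ID", "Date")

def missing_headers(raw_message):
    """Return the expected header names that are absent from the message."""
    msg = message_from_string(raw_message)
    # Header lookups in the email package are case-insensitive.
    return [name for name in EXPECTED_HEADERS if msg[name] is None]

# Hypothetical sample message without Subject: and Message-ID: lines.
sample = (
    "From: alice@example.org\n"
    "To: bob@example.net\n"
    "Date: Mon, 05 Jul 2004 10:00:00 +0000\n"
    "\n"
    "Hello\n"
)
print(missing_headers(sample))
```

A non-empty result would be grounds for rejecting the message, or at least for treating it with more suspicion in later checks.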


2.6.1.2. Header Address Syntax Check

Addresses presented in the message header (i.e. the To:, Cc:, From: ... fields) should be syntactically valid. Enough said.


2.6.1.3. Simple Header Address Validation

For each address in the message header:

  *  If the address is local, is the local part (before the @ sign) a valid mailbox?

  *  If the address is remote, does the domain part (after the @ sign) exist?
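
The two questions above could be sketched in Python roughly as follows. Everything named here (invalid_header_addresses, the local_domains and local_mailboxes sets, and the domain_exists callback) is hypothetical; in particular, the DNS lookup is passed in as a function so that the sketch does not depend on any particular resolver.

```python
from email import message_from_string
from email.utils import getaddresses

def invalid_header_addresses(raw_message, local_domains, local_mailboxes,
                             domain_exists):
    """Return header addresses that fail the simple validation.

    local_domains and local_mailboxes are sets of lowercase names;
    domain_exists is a callable that performs the DNS lookup.
    """
    msg = message_from_string(raw_message)
    fields = []
    for name in ("From", "To", "Cc"):
        fields.extend(msg.get_all(name, []))

    bad = []
    for _display_name, address in getaddresses(fields):
        local_part, at_sign, domain = address.rpartition("@")
        if not at_sign or not local_part or not domain:
            bad.append(address)              # not even syntactically valid
        elif domain.lower() in local_domains:
            if local_part.lower() not in local_mailboxes:
                bad.append(address)          # local address, unknown mailbox
        elif not domain_exists(domain):
            bad.append(address)              # remote address, unknown domain
    return bad

# Hypothetical data; a real setup would query the DNS for the domain.
raw = (
    "From: alice@example.org\n"
    "To: bob@example.org, carol@no-such-domain.invalid\n"
    "Cc: dave@elsewhere.example\n"
    "\n"
    "body\n"
)
print(invalid_header_addresses(
    raw, {"example.org"}, {"alice"},
    domain_exists=lambda domain: domain == "elsewhere.example"))
```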


2.6.1.4. Header Address Callout Verification

This works similarly to Sender Callout Verification and Recipient Callout Verification. Each remote header address is verified by calling the primary MX for the corresponding domain to determine if a Delivery Status Notification would be accepted.


2.6.2. Junk Mail Signature Repositories

One trait of junk mail is that it is sent to a large number of addresses. If 50 other recipients have already flagged a particular message as spam, why couldn't you use this fact to decide whether or not to accept the message when it is delivered to you? Better yet, why not set up Spam Traps that feed a public pool of known spam?

I am glad you asked. As it turns out, such pools do exist:

  *  [http://razor.sf.net/] Razor

  *  [http://pyzor.sf.net/] Pyzor

  *  Distributed Checksum Clearinghouse (DCC)

These tools have progressed beyond simple signature checks that only trigger if you receive an identical copy of a message that is known to be junk mail. Rather, they evaluate common patterns, to account for slight variations in the message header and body.


2.6.3. Binary garbage checks

Messages containing non-printable characters are rare. When they do show up, the message is nearly always a virus, or in some cases spam written in a non-western language, without the appropriate MIME encoding.

One particular case is where the message contains NUL characters (ordinal zero). Even if you decide that figuring out what a non-printable character means is more complex than beneficial, you might consider checking for this character. That is because some Mail Delivery Agents, such as the [http://asg.web.cmu.edu/cyrus/] Cyrus Mail Suite, will ultimately reject mails that contain it [12]. If you use such software, you should definitely consider getting rid of NUL characters.

On the other hand, the (now obsolete) RFC 822 specification did not explicitly prohibit NUL characters in the message. For this reason, as an alternative to rejecting mails containing it, you may choose to strip these characters from the message before delivering it to Cyrus.
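
Both options are simple to express. The following Python sketch, with made-up function names, shows the test and the stripping step on a raw message held as bytes:

```python
def contains_nul(raw_message: bytes) -> bool:
    """True if the raw message contains a NUL character (ordinal zero)."""
    return b"\x00" in raw_message

def strip_nul(raw_message: bytes) -> bytes:
    """Return a copy of the raw message with all NUL characters removed."""
    return raw_message.replace(b"\x00", b"")

raw = b"Subject: test\r\n\r\nbody with a \x00 in it\r\n"
if contains_nul(raw):
    # Either reject at this point, or clean up before delivery.
    raw = strip_nul(raw)
print(contains_nul(raw))
```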


2.6.4. MIME checks

Similarly, it might be worthwhile to validate the MIME structure of incoming messages. MIME decoding errors or inconsistencies do not happen very often; but when they do, the message is definitely junk. Moreover, such errors may indicate potential problems in subsequent checks, such as File Attachment Checks, Virus Scanners, or Spam Scanners.

In other words, if the MIME encoding is illegal, reject the message.
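
To give an idea of what such a check involves, here is a small Python sketch built on the standard email package, which records the structural problems it finds while parsing as "defects". The function name mime_defects is made up, and this parser is more lenient than a dedicated MIME checker would be, so treat it as an illustration only.

```python
from email import message_from_bytes

def mime_defects(raw_message: bytes):
    """Collect the parser defects found in the message and all its parts."""
    msg = message_from_bytes(raw_message)
    defects = []
    for part in msg.walk():
        defects.extend(part.defects)
    return defects

# Claims to be multipart, but does not define a boundary.
broken = (
    b"From: a@example.org\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: multipart/mixed\n"
    b"\n"
    b"no boundary here\n"
)
for defect in mime_defects(broken):
    print(type(defect).__name__)
```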


2.6.5. File Attachment Check

When was the last time someone sent you a Windows screensaver (".scr" file) or Windows Program Information File (".pif") that you actually wanted?

Consider blocking messages with "Windows executable" file attachment(s) - i.e. file names that end with a period followed by any of a number of three-letter combinations such as the above. This check consumes significantly fewer resources on your server than Virus Scanners, and may also catch new viruses for which a signature does not yet exist in your anti-virus scanner.

For a more-or-less comprehensive list of such "file name extensions", please visit: [http://support.microsoft.com/default.aspx?scid=kb;EN-US;290497] http://support.microsoft.com/default.aspx?scid=kb;EN-US;290497.
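
A file name check of this kind might be sketched in Python as follows. The function name and the sample message are invented for the example, and the extension list contains only the two extensions mentioned above; you would extend it from a list such as the one just referenced.

```python
from email import message_from_bytes

# Only the two extensions named in the text; extend as needed.
BLOCKED_EXTENSIONS = (".scr", ".pif")

def blocked_attachments(raw_message: bytes):
    """Return the attachment file names that carry a blocked extension."""
    msg = message_from_bytes(raw_message)
    hits = []
    for part in msg.walk():
        filename = part.get_filename()
        if filename and filename.lower().endswith(BLOCKED_EXTENSIONS):
            hits.append(filename)
    return hits

raw = (
    b"From: a@example.org\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"see attachment\r\n"
    b"--XYZ\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b'Content-Disposition: attachment; filename="Photos.SCR"\r\n'
    b"\r\n"
    b"binary\r\n"
    b"--XYZ--\r\n"
)
print(blocked_attachments(raw))
```

Note that the comparison is made on the lowercased file name, since the case of the extension does not matter to Windows.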


2.6.6. Virus Scanners

A number of different server-side virus scanners are available. To name a few:

  *  [http://www.vanja.com/tools/sophie/] Sophie

  *  [http://www.kapersky.com/] KAVDaemon

  *  [http://clamav.elektrapro.com/] ClamAV

  *  [http://www.sald.com/] DrWeb

In situations where you are not willing to block all potentially dangerous files based on their file names alone (consider ".zip" files), such scanners are helpful. Also, they will be able to catch viruses that are not transmitted as file attachments, such as the "Bagle.R" virus that arrived in March, 2004.

In most cases, the machine performing the virus scan does not need to be your mail exchanger. Most of these anti-virus scanners can be invoked on a different host over a network connection.

Anti-virus software mainly detects viruses based on a set of signatures for known viruses, or virus definitions. These need to be updated regularly, as new viruses are developed. Also, the software itself should be kept up to date at all times for maximum accuracy.


2.6.7. Spam Scanners

Similarly, anti-spam software can be used to classify messages based on a large set of heuristics, including their content, standards compliance, and various network checks such as DNS Blacklists and Junk Mail Signature Repositories. In the end, such software typically assigns a composite "score" to each message, indicating the likelihood that the message is spam, and classifies the message as spam if the score is above a certain threshold.
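
The scoring idea can be illustrated with a toy Python sketch. The rule names, weights, and threshold below are invented for this example and do not correspond to any real product; actual scanners ship with hundreds of carefully tuned rules.

```python
# Hypothetical rules and weights, for illustration only.
RULES = {
    "missing_message_id": 1.5,
    "listed_in_dnsbl": 3.0,
    "known_junk_signature": 4.0,
}
THRESHOLD = 5.0

def spam_score(triggered, rules=RULES):
    """Sum the weights of all rules that the message triggered."""
    return sum(weight for name, weight in rules.items() if name in triggered)

def is_spam(triggered, threshold=THRESHOLD):
    return spam_score(triggered) >= threshold

# Two weak indicators stay below the threshold; a third pushes it over.
print(spam_score({"missing_message_id", "listed_in_dnsbl"}))
print(is_spam({"missing_message_id", "listed_in_dnsbl"}))
print(is_spam({"missing_message_id", "listed_in_dnsbl", "known_junk_signature"}))
```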

Two of the most popular server-side heuristic anti-spam filters are:

  *  [http://www.spamassassin.org/] SpamAssassin

  *  [http://www.brightmail.com/] BrightMail

These tools undergo a constant evolution as spammers find ways to circumvent their various checks. For instance, consider "creative" spelling, such as "GR0W lO 1NCH35". So, just like anti-virus software, if you use anti-spam software, you should update it frequently for the highest level of accuracy.

I use SpamAssassin, although to minimize impact on machine resources, it is no longer my first line of defense. Out of approximately 500 junk mail delivery attempts to my personal address per day, about 50 reach the point where they are being checked by SpamAssassin (mainly because they are forwarded from one of my other accounts, so the checks described above are not effective). Out of these 50 messages, one message ends up in my inbox approximately every 2 or 3 days.


2.7. Blocking Collateral Spam

Collateral Spam is more difficult to block with the techniques described so far, because it normally arrives from legitimate sites using standard mail transport software (such as Sendmail, Postfix, or Exim). The challenge is to distinguish these messages from valid Delivery Status Notifications returned in response to mail sent from your own users. Here are some ways that people do this:


2.7.1. Bogus Virus Warning Filter

Most of the time, collateral spam is virus warnings generated by anti-virus scanners[13]. In turn, the wording in the Subject: line of these virus warnings, and/or other characteristics, is usually provided by the anti-virus software itself. As such, you could create a list of the more common characteristics, and filter out such bogus virus warnings.

Well, aren't you in luck - someone already did this for you. :-)

Tim Jackson <tim (at) timj.co.uk> maintains a list of bogus virus warnings for use with SpamAssassin. This list is available at: [http://www.timj.co.uk/linux/bogus-virus-warnings.cf] http://www.timj.co.uk/linux/bogus-virus-warnings.cf.


2.7.2. Publish SPF info for your domain

The purpose of the Sender Policy Framework is precisely to protect against Joe Jobs; i.e. to prevent forgeries of valid e-mail addresses.

If you publish SPF records in the DNS zone for your domain, then recipient hosts that incorporate SPF checks would not have accepted the forged message in the first place. As such, they would not be sending a Delivery Status Notification to your site.


2.7.3. Envelope Sender Signature

A different approach that I am currently experimenting with myself is to add a signature in the local part of the Envelope Sender address in outgoing mail, then check for this signature in the Envelope Recipient address before accepting incoming Delivery Status Notifications. For instance, the generated sender address might be of the following format:
localpart=signature@domain

Normal message replies are unaffected. These replies go to the address in the From: or Reply-To: field of the message, which are left intact.

Sounds easy, doesn't it? Unfortunately, generating a signature that is suitable for this purpose is a bit more complex than it sounds. There are a couple of conflicting considerations to take into account:

  *  To gain any benefit from this method, the signed envelope sender address that you generate should be useless in the hands of spammers. Typically, this would imply that the signature incorporates a time stamp that would eventually expire:

     sender=timestamp=hash@domain

  *  If you send mail to a site that incorporates Greylisting, your envelope sender address should remain constant for that particular recipient. Otherwise, your mail will continuously be greylisted.

     With this in mind, you could generate an Envelope Sender based on the Envelope Recipient address:

     sender=recipient=recipient.domain=hash@domain

     Although this address does not expire, if you start seeing junk mail to it, you will at least know the source of the leak - it is incorporated in the recipient address. Moreover, you can easily block specific recipient address signatures, without affecting normal mail delivery to that same recipient.

  *  Two more issues occur with mailing list servers. Usually, replies to request mails (such as "subscribe"/"unsubscribe") are sent with no envelope sender.

       +  The first issue pertains to servers that send responses back to the Envelope Sender address of the request mail (as in the case of <discuss@en.tldp.org>). The problem is that commands for the mailing list server (such as subscribe or unsubscribe) are typically sent to one or more different addresses (e.g. <discuss-subscribe@en.tldp.org> and <discuss-unsubscribe@en.tldp.org>, respectively) than the address used for list mail. Hence, the subscriber address will be different from the sender address in messages sent to the list itself -- and in this example, also different from the address that will be generated for unsubscription requests. As a result, you may not be able to post to the list, or unsubscribe.

          The compromise would be to incorporate only the recipient domain in the sender signature. The sender address might then look like:

          subscribername=en.tldp.org=hash@subscriber.domain

       +  The second issue pertains to those that send responses back to the reply address in the message header of the request mail (such as <spam-l-request@peach.ease.lsoft.com>). Since this address is not signed, the response from the list server would be blocked by your server.

          There is not much you can do about this, other than to "whitelist" these particular servers in such a way that they are allowed to return mail to unsigned recipient addresses.

At this point, this approach starts losing some of its edge. Moreover, even legitimate DSNs are rejected unless the original mail has been sent via your server. Thus, you should only consider doing this for those of your users who do not roam, or otherwise send their outgoing mail via servers outside your control.

That said, in situations where none of the above concerns apply to you, this method gives you a good way to not only eliminate collateral spam, but also a way to educate the owners of the sites that (presumably unwittingly) generate it. Moreover, as a side benefit, sites that perform Sender Callout Verification will only get a positive response from you if the original mail was, indeed, sent from your site. In essence, you are reducing your exposure to sender address forgeries by spammers.

You could perhaps allow your users to specify whether to sign outgoing mails, and if so, specify which hosts should be allowed to return mails to the unsigned version of their address. For instance, if they have system accounts on your mail server, you could check for the existence and content, respectively, of a given file in their home directory.
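
To make the signature scheme in this section more concrete, here is a minimal Python sketch of the variant that incorporates only the recipient domain (localpart=recipient.domain=hash@domain). The function names, the choice of HMAC-SHA256, and the truncation of the hash to eight characters are all assumptions made for this example; they are not taken from any particular implementation.

```python
import hashlib
import hmac

SECRET = b"some-secret"   # placeholder; use your own secret

def _hash(local_part, recipient_domain, sender_domain):
    """Keyed hash over the parts of the address that we want to protect."""
    data = f"{local_part}={recipient_domain}@{sender_domain}".lower()
    return hmac.new(SECRET, data.encode(), hashlib.sha256).hexdigest()[:8]

def sign_sender(sender, recipient):
    """Build the signed Envelope Sender for mail from sender to recipient."""
    local_part, sender_domain = sender.rsplit("@", 1)
    recipient_domain = recipient.rsplit("@", 1)[1].lower()
    digest = _hash(local_part, recipient_domain, sender_domain)
    return f"{local_part}={recipient_domain}={digest}@{sender_domain}"

def verify_bounce_recipient(address):
    """Return the unsigned address if the signature is valid, else None."""
    local, _, domain = address.rpartition("@")
    parts = local.rsplit("=", 2)
    if len(parts) != 3:
        return None                      # not a signed address at all
    local_part, recipient_domain, digest = parts
    expected = _hash(local_part, recipient_domain, domain)
    if hmac.compare_digest(digest.encode(), expected.encode()):
        return f"{local_part}@{domain}"
    return None

signed = sign_sender("alice@example.org", "discuss@en.tldp.org")
print(signed)
print(verify_bounce_recipient(signed))
```

Incoming Delivery Status Notifications (i.e. mail with an empty Envelope Sender) would then be accepted only if verify_bounce_recipient() returns an address, or if the sending host is whitelisted as discussed above.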


2.7.4. Accept Bounces Only for Real Users

Even if you check for envelope sender signatures, there may be a loophole that allows bogus bounces to be accepted. Specifically, if your users have to opt in to the scheme, you are probably not checking for this signature in mails sent to system aliases, such as postmaster or mailer-daemon. Moreover, since these users do not generate outgoing mail, they should not receive any bounces.

You can reject mail if it is sent to such system aliases, or alternatively, if there is no mailbox for the provided recipient address.
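
Expressed as a Python sketch, the decision might look like this. The function name and arguments are hypothetical, the alias list holds only the two aliases mentioned above, and this version applies both criteria rather than choosing one of them. A bounce is recognized by its empty Envelope Sender.

```python
# System aliases that never send mail, and hence should not get bounces.
SYSTEM_ALIASES = {"postmaster", "mailer-daemon"}

def accept_recipient(envelope_sender, recipient_local_part, mailboxes):
    """Decide whether to accept a recipient, given the envelope sender."""
    if envelope_sender != "":
        return True                      # not a bounce; other checks apply
    local = recipient_local_part.lower()
    # Bounces are accepted only for real mailboxes, never for aliases.
    return local not in SYSTEM_ALIASES and local in mailboxes

mailboxes = {"alice", "bob"}
print(accept_recipient("", "alice", mailboxes))
print(accept_recipient("", "postmaster", mailboxes))
```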


Chapter 3. Considerations

Some specific considerations come into play as a result of system-wide SMTP time filtering. Here we cover some of those.


3.1. Multiple Incoming Mail Exchangers

Most domains list more than one incoming Mail Exchanger (a.k.a. "MX hosts"). If you do so, then bear in mind that in order to have any effect, any SMTP time filtering you incorporate on the primary MX has to be incorporated on all the others as well. Otherwise, the sending host would simply sidestep filtering by retrying the mail delivery through your backup server(s).

If the backup server(s) are not under your control, ask yourself whether you need multiple MXs in the first place. In this situation, chances are that they serve only as redundant mail servers, and that they in turn forward the mail to your primary MX. If so, you probably don't need them. If your host happens to be down for a little while, that's OK -- well-behaved sender hosts will retry deliveries for several days before giving up [9].

A situation where you may need multiple MXs is to perform load balancing between several servers - i.e. if you receive so much mail that one machine alone could not handle it. In this case, see if you could offload some tasks (such as virus and spam scanners) to other machines, in order to reduce or eliminate this need.

Again, if you do decide to keep using several MXs, your backup servers need to be (at least) as restrictive as the primary server; otherwise, filtering on the primary MX is useless.

See also the section on Greylisting for additional concerns related to multiple MX hosts.


3.2. Blocking Access to Other SMTP Servers

Any SMTP server that is not listed as a public Mail Exchanger in the DNS zone of your domain(s) should not accept incoming connections from the internet. All incoming mail traffic should go through your incoming mail exchanger(s).

This consideration is not unique to SMTP servers. If you have machines that only serve an internal purpose within your site, use a firewall to restrict access to these.

This is a rule, so therefore there must be exceptions. However, if you don't know what they are, then the above applies to you.


3.3. Forwarded Mail

You should take care not to reject mail as a result of spam filtering if it is forwarded from "friendly" sources, such as:

  *  Your backup MX hosts, if any. Supposedly, these have already filtered out most of the junk (see Multiple Incoming Mail Exchangers).

  *  Mailing lists, to which you or your users subscribe. You may still filter such mail (it may not be as critical if it ends up in a black hole). However, if you reject the mail, you may end up causing the list server to automatically unsubscribe the recipient.

  *  Other accounts belonging to the recipient. Again, rejections will generate collateral spam, and/or create problems for the host that forwards the mail.

You may see a logistical issue with the last two of these sources: They are specific to each recipient. How do you allow each user to specify which hosts they want to whitelist, and then use such individual whitelists in a system-wide SMTP-time filtering setup? If the message is forwarded to several recipients at your site (as may often be true in the case of a mailing list), how do you decide whose whitelist to use?

There is no magic bullet here. This is one of those situations where we just have to do a bit of work. You can decide to accept all mails, regardless of spam classification, so long as it is sent from a host in the whitelist of any one of the recipients. For instance, in response to each RCPT TO: command, we can match the sending host against the corresponding user's whitelist. If found, set a flag that will prevent a subsequent rejection. Effectively, you are using an aggregate of each recipient's whitelist.

The implementation appendices cover this in more detail.
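
Independently of any particular MTA, the aggregate whitelist logic described in this section can be sketched in Python as follows. The class and attribute names are made up for the example.

```python
class Transaction:
    """State kept for one incoming SMTP message transaction."""

    def __init__(self, sending_host, whitelists):
        self.sending_host = sending_host
        self.whitelists = whitelists     # recipient -> set of friendly hosts
        self.exempt = False              # set once any recipient matches

    def rcpt_to(self, recipient):
        # Called for each RCPT TO: command.
        if self.sending_host in self.whitelists.get(recipient, set()):
            self.exempt = True

    def should_reject(self, classified_as_spam):
        # Called after the message data has been checked.
        return classified_as_spam and not self.exempt

whitelists = {"alice@example.org": {"lists.example.net"}}
tx = Transaction("lists.example.net", whitelists)
tx.rcpt_to("bob@example.org")
tx.rcpt_to("alice@example.org")
print(tx.should_reject(classified_as_spam=True))
```

Here the message is accepted for both recipients, because the sending host appears in the whitelist of one of them.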


3.4. User Settings and Data

There are other situations where you may want to support settings and data for each user at your site. For instance, if you scan incoming mail with SpamAssassin (see Spam Scanners), you may want to allow for individual spam thresholds, acceptable languages and character sets, and Bayesian training/data.

A sticking point is that SMTP-time filtering of incoming mail is done at the system level, before mail is being delivered to a particular user, and as such, does not lend itself too well to individual preferences. A single message may have several recipients; and unlike the case with Forwarded Mail, using an aggregate of each recipient's preferences is not a good option. Consider a scenario where you have users from different linguistic backgrounds.

As it turns out, though, there is a modification to this truth. The trick is to limit the number of recipients in incoming messages to one, so that the message can be analyzed in accordance with the settings and data that belong to the corresponding user.

To do this, you would accept the first RCPT TO:, then issue an SMTP 451 (defer) response to subsequent commands. If the caller is a well-behaved MTA, it will know how to interpret this response, and try later. (If it is confused, then, well, it is probably a sender from which you don't want to receive mail in the first place).

Obviously, this is a hack. Every mail sent to several users at your site will be slowed down by 30 minutes or more per recipient. Especially in corporate environments, where it is common to see e-mail discussions involving several people on the inside and several others on the outside, and where timeliness of mail deliveries is essential, this is probably not a good solution at all.

Another issue that mainly pertains to corporate enterprises and other large sites is that incoming mail is often forwarded to internal machines for delivery, and that recipients don't normally have accounts on the mail exchanger. It may still be possible to support user-specific settings and data in these situations (e.g. via database lookups or LDAP queries), but you may also want to consider whether it's worth the effort.

That said, if you run a small site where delayed deliveries are not a concern, this may be an acceptable way to allow each user to fine tune their filtering criteria.
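
The one-recipient-per-message trick described in this section amounts to very little logic. Here is a Python sketch of it; the class name and the response texts are invented for the example, and only the 250 and 451 response codes come from SMTP itself.

```python
class SmtpTransaction:
    """Tracks the recipients accepted for the current message."""

    def __init__(self):
        self.accepted = []

    def rcpt_to(self, recipient):
        # Accept only the first recipient; defer the rest, so that a
        # well-behaved sender retries them in a later transaction.
        if self.accepted:
            return 451, "Please retry this recipient in a separate transaction"
        self.accepted.append(recipient)
        return 250, "Accepted"

tx = SmtpTransaction()
print(tx.rcpt_to("alice@example.org")[0])
print(tx.rcpt_to("bob@example.org")[0])
```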


Chapter 4. Questions & Answers

In this section I try to anticipate some of the questions that may come up, and to answer them. If you have questions that are not listed, and/or would like to provide extra input in this section, please provide feedback.

When Spammers Adapt

Q: What happens when spammers adapt and try to get around the techniques described in this document?

A: Well, that depends. :-)

Some of the checks described (such as SMTP checks and Greylisting) specifically target ratware behavior. It is certainly possible to imagine that this behavior will change if enough sites incorporate these checks. Hadmut Danisch notes:

     Ratware contains buggy SMTP protocols because they didn't need to do any better. It worked this way, so why should they have spent more time? Meanwhile "ratware" has a higher quality, and even the quality of spam messages has significantly improved. Once enough people reject spam by detecting bad SMTP protocols, spam software authors will simply improve their software.

That said, there are challenges remaining for such ratware:

  *  To get around SMTP transaction delays, they need to wait for each response from the receiving SMTP server. At that point, we have collectively accomplished a significant reduction in the rate of mail that a given spamming host is able to deliver per unit of time. Since spammers are racing against time to deliver as many mails as possible before DNS blocklists and collaborative content filters catch up, we are improving the effectiveness of these tools.

     The effect is similar to the goal of Micropayment Schemes, wherein the sender spends a few seconds working on a computational challenge for each recipient of the mail, and adds a resulting signature to the e-mail header for the recipient to validate. The main difference, aside from the complexity of these schemes, is that they require the participation of virtually everyone in the world before they can effectively be used to weed out spam, whereas SMTP transaction delays start being effective with the first recipient machine that implements them.

  *  To get around a HELO/EHLO check, they need to provide a proper greeting, i.e. identify themselves with a valid Fully Qualified Domain Name. This provides for increased traceability, especially with receiving Mail Transport Agents that do not automatically insert the results of a rDNS lookup into the Received: header of the message.

  *  To get around all of the Sender Address Checks, they need to provide their own valid sender address (or, at least, a valid sender address within their own domain). Nuff said.

  *  To get around Greylisting, they need to retry deliveries to temporarily failed recipient addresses after one hour (but before four hours). (As far as implementation goes, in order to minimize machine resources, rather than keeping a copy of each temporarily failed mail, ratware may keep only a list of temporarily failed recipients, and perform a second sweep through those addresses after an hour or two).

     Even so, greylisting will remain fairly effective in conjunction with DNS Blacklists that are fed from Spam Traps. That is because the mandatory one-hour retry delay will give these lists a chance to list the sending host.

Software tools, such as Spam Scanners and Virus Scanners, are in constant evolution. As spammers evolve, so do these (and vice versa). As long as you use recent versions of these tools, they will remain quite effective.

Finally, this document is itself subject to change. As the nature of junk mail changes, people will come up with new, creative ways to block it.


Appendix A. Exim Implementation

Here we cover the integration of techniques and tools described in this document into the Exim Mail Transport Agent.


A.1. Prerequisites

For these examples, you need the Exim Mail Transport Agent, preferably with Tom Kistner's Exiscan-ACL patch applied. Prebuilt Exim+Exiscan-ACL packages exist for the most popular Linux distributions as well as FreeBSD; see the [http://duncanthrax.net/exiscan-acl/] Exiscan-ACL home page for details[14].

The final implementation example at the end incorporates these additional tools:

  *  [http://www.spamassassin.org/] SpamAssassin - a popular spam filtering tool that analyzes mail content against a large and highly sophisticated set of heuristics.

  *  [http://packages.debian.org/unstable/mail/greylistd] greylistd - a simple greylisting solution written by yours truly, specifically with Exim in mind.

Other optional software is used in examples throughout.


A.2. The Exim Configuration File

The Exim configuration file contains global definitions at the top (we will call this the main section), followed by several other sections[15]. Each of these other sections starts with:
begin section

We will spend most of our time in the acl section (i.e. after begin acl); but we will also add and/or modify a few items in the transports and routers sections, as well as in the main section at the top of the file.


A.2.1. Access Control Lists

As of version 4.xx, Exim incorporates perhaps the most sophisticated and flexible mechanism for SMTP-time filtering available anywhere, by way of so-called Access Control Lists (ACLs).

An ACL can be used to evaluate whether to accept or reject an aspect of an incoming message transaction, such as the initial connection from a remote host, or the HELO/EHLO, MAIL FROM:, or RCPT TO: SMTP commands. So, for instance, you may have an ACL named acl_rcpt_to to validate each RCPT TO: command received from the peer.

An ACL consists of a series of statements (or rules). Each statement starts with an action verb, such as accept, warn, require, defer, or deny, followed by a list of conditions, options, and other settings pertaining to that statement. Every statement is evaluated in order, until a definitive action (besides warn) is taken. There is an implicit deny at the end of the ACL.

A sample statement in the acl_rcpt_to ACL above may look like this:

  deny
    message     = relay not permitted
    !hosts      = +relay_from_hosts
    !domains    = +local_domains : +relay_to_domains
    delay       = 1m

This statement will reject the RCPT TO: command if it was not delivered by a host in the "+relay_from_hosts" host list, and the recipient domain is not in the "+local_domains" or "+relay_to_domains" domain lists. However, before issuing the "550" SMTP response to this command, the server will wait for one minute.
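
If you find it easier to think in terms of a general-purpose language, the way an ACL is evaluated can be modelled roughly as in the following Python sketch. This is a simplified model written for this document, not Exim code: conditions are plain functions, options such as message and delay are left out, and only the five verbs mentioned above are handled.

```python
def evaluate_acl(statements, ctx):
    """Evaluate (verb, conditions) pairs in order; return the outcome."""
    for verb, conditions in statements:
        matched = all(condition(ctx) for condition in conditions)
        if verb == "warn":
            continue                     # never decides the outcome
        if verb == "require":
            if not matched:
                return "deny"            # a failed requirement rejects
            continue
        if matched:                      # accept, deny, or defer
            return verb
    return "deny"                        # implicit deny at the end

# Hypothetical stand-ins for +relay_from_hosts and the domain lists.
relay_from_hosts = {"192.0.2.10"}
our_domains = {"example.org"}

acl_rcpt_to = [
    ("deny", [lambda c: c["host"] not in relay_from_hosts,
              lambda c: c["domain"] not in our_domains]),
    ("accept", []),
]

print(evaluate_acl(acl_rcpt_to, {"host": "203.0.113.5",
                                 "domain": "elsewhere.example"}))
print(evaluate_acl(acl_rcpt_to, {"host": "203.0.113.5",
                                 "domain": "example.org"}))
```

The first call models a relay attempt from an unknown host, which the deny statement rejects; the second models ordinary incoming mail to a local domain, which falls through to the accept statement.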

To evaluate a particular ACL at a given stage of the message transaction, you need to point one of Exim's policy controls to that ACL. For instance, to use the acl_rcpt_to ACL mentioned above to evaluate the RCPT TO:, the main section of your Exim configuration file (before any begin keywords) should include:
acl_smtp_rcpt = acl_rcpt_to

For a full list of such policy controls, refer to section 14.11 in the Exim specifications.


A.2.2. Expansions

A large number of expansion items are available, including run-time variables, lookup functions, string/regex manipulations, host/domain lists, etc. etc. An exhaustive reference for the last x.x0 release (i.e. 4.20, 4.30..) can be found in the file "spec.txt"; ACLs are described in section 38.

In particular, Exim provides twenty general purpose expansion variables to which we can assign values in an ACL statement:

  *  $acl_c0 - $acl_c9 can hold values that will persist through the lifetime of an SMTP connection.

  *  $acl_m0 - $acl_m9 can hold values while a message is being received, but are then reset. They are also reset by the HELO, EHLO, MAIL, and RSET commands.


A.3. Options and Settings

The main section of the Exim configuration file (before the first begin keyword) contains various macros, policy controls, and other general settings. Let us start by defining a couple of macros we will use later:

# Define the message size limit; we will use this in the DATA ACL.
MESSAGE_SIZE_LIMIT = 10M

# Maximum message size for which we will run Spam or Virus scanning.
# This is to reduce the load imposed on the server by very large messages.
MESSAGE_SIZE_SPAM_MAX = 1M

# Macro defining a secret that we will use to generate various hashes.
# PLEASE CHANGE THIS!
SECRET = some-secret

Let us tweak some general Exim settings:

# Treat DNS failures (SERVFAIL) as lookup failures.
# This is so that we can later reject sender addresses
# within non-existing domains, or domains for which no
# nameserver exists.
dns_again_means_nonexist = !+local_domains : !+relay_to_domains

# Enable HELO verification in ACLs for all hosts
helo_try_verify_hosts = *

# Remove any limitation on the maximum number of incoming
# connections we can serve at one time.  This is so that while
# we later impose SMTP transaction delays for spammers, we
# will not refuse to serve new connections.
smtp_accept_max = 0

# ..unless the system load is above 10
smtp_load_reserve = 10

# Do not advertise ESMTP "PIPELINING" to any hosts.
# This is to trip up ratware, which often tries to pipeline
# commands anyway.
pipelining_advertise_hosts = :

Finally, we will point some Exim policy controls to five ACLs that we will create to evaluate the various stages of an incoming SMTP transaction:

acl_smtp_connect = acl_connect
acl_smtp_helo = acl_helo
acl_smtp_mail = acl_mail_from
acl_smtp_rcpt = acl_rcpt_to
acl_smtp_data = acl_data


A.4. Building the ACLs - First Pass

In the acl section (following begin acl), we need to define these ACLs. In doing so, we will incorporate some of the basic Techniques described earlier in this document, namely DNS checks and SMTP checks.

In this pass, we will do most of the checks in acl_rcpt_to, and leave the other ACLs largely empty. That is because most of the commonly used ratware does not understand rejections early in the SMTP transaction - it keeps trying. On the other hand, most ratware clients give up if the RCPT TO: fails.

We create all these ACLs, however, because we will use them later.


A.4.1. acl_connect

# This access control list is used at the start of an incoming
# connection.  The tests are run in order until the connection
# is either accepted or denied.

acl_connect:
  # In this pass, we do not perform any checks here.
  accept


A.4.2. acl_helo

# This access control list is used for the HELO or EHLO command in
# an incoming SMTP transaction.  The tests are run in order until the
# greeting is either accepted or denied.

acl_helo:
  # In this pass, we do not perform any checks here.
  accept


A.4.3. acl_mail_from

# This access control list is used for the MAIL FROM: command in an
# incoming SMTP transaction.  The tests are run in order until the
# sender address is either accepted or denied.
#

acl_mail_from:
  # Accept the command.
  accept


A.4.4. acl_rcpt_to

# This access control list is used for every RCPT command in an
# incoming SMTP message.  The tests are run in order until the
# recipient address is either accepted or denied.

acl_rcpt_to:

  # Accept mail received over local SMTP (i.e. not over TCP/IP).
  # We do this by testing for an empty sending host field.
  # Also accept mails received from hosts for which we relay mail.
  #
  # Recipient verification is omitted here, because in many
  # cases the clients are dumb MUAs that don't cope well with
  # SMTP error responses.
  #
  accept
    hosts       = : +relay_from_hosts

  # Accept if the message arrived over an authenticated connection,
  # from any host.  Again, these messages are usually from MUAs, so
  # recipient verification is omitted.
  #
  accept
    authenticated = *

  ######################################################################
  # DNS checks
  ######################################################################
  #
  # The results of these checks are cached, so multiple recipients
  # do not translate into multiple DNS lookups.
  #

  # If the connecting host is in one of a select few DNSbls, then
  # reject the message. Be careful when selecting these lists; many
  # would cause a large number of false positives, and/or have no
  # clear removal policy.
  #
  deny
    dnslists    = dnsbl.sorbs.net : \
                  dnsbl.njabl.org : \
                  cbl.abuseat.org : \
                  bl.spamcop.net
    message     = $sender_host_address is listed in $dnslist_domain\
                  ${if def:dnslist_text { ($dnslist_text)}}

  # If reverse DNS lookup of the sender's host fails (i.e. there is
  # no rDNS entry, or a forward lookup of the resulting name does not
  # match the original IP address), then reject the message.
  #
  deny
    message     = Reverse DNS lookup failed for host $sender_host_address.
    !verify     = reverse_host_lookup

  ######################################################################
  # Hello checks
  ######################################################################

  # If the remote host greets with an IP address, then reject the mail.
  #
  deny
    message     = Message was delivered by ratware
    log_message = remote host used IP address in HELO/EHLO greeting
    condition   = ${if isip {$sender_helo_name}{true}{false}}

  # Likewise if the peer greets with one of our own names
  #
  deny
    message     = Message was delivered by ratware
    log_message = remote host used our name in HELO/EHLO greeting.
    condition   = ${if match_domain{$sender_helo_name}\
                       {$primary_hostname:+local_domains:+relay_to_domains}\
                       {true}{false}}

  # Likewise if the peer did not present a greeting at all.
  #
  deny
    message     = Message was delivered by ratware
    log_message = remote host did not present HELO/EHLO greeting.
    condition   = ${if def:sender_helo_name {false}{true}}

  # If HELO verification fails, we add an X-HELO-Warning: header to
  # the message.
  #
  warn
    message     = X-HELO-Warning: Remote host $sender_host_address \
                  ${if def:sender_host_name {($sender_host_name) }}\
                  incorrectly presented itself as $sender_helo_name
    log_message = remote host presented unverifiable HELO/EHLO greeting.
    !verify     = helo

  ######################################################################
  # Sender Address Checks
  ######################################################################

  # If we cannot verify the sender address, deny the message.
  #
  # You may choose to remove the "callout" option. In particular,
  # if you are sending outgoing mail through a smarthost, it will not
  # give any useful information.
  #
  # Details regarding the failed callout verification attempt are
  # included in the 550 response; to omit these, change
  # "sender/callout" to "sender/callout,no_details".
  #
  deny
    message     = <$sender_address> does not appear to be a \
                  valid sender address.
    !verify     = sender/callout

  ######################################################################
  # Recipient Address Checks
  ######################################################################

  # Deny if the local part contains @ or % or / or | or !. These are
  # rarely found in genuine local parts, but are often tried by people
  # looking to circumvent relaying restrictions.
  #
  # Also deny if the local part starts with a dot. Empty components
  # aren't strictly legal in RFC 2822, but Exim allows them because
  # this is common. However, actually starting with a dot may cause
  # trouble if the local part is used as a file name (e.g. for a
  # mailing list).
  #
  deny
    local_parts = ^.*[@%!/|] : ^\\.

  # Drop the connection if the envelope sender is empty, but there is
  # more than one recipient address. Legitimate DSNs are never sent
  # to more than one address.
  #
  drop
    message     = Legitimate bounces are never sent to more than one \
                  recipient.
    senders     = : postmaster@*
    condition   = $recipients_count

  # Reject the recipient address if it is not in a domain for
  # which we are handling mail.
  #
  deny
    message     = relay not permitted
    !domains    = +local_domains : +relay_to_domains

  # Reject the recipient if it is not a valid mailbox.
  # If the mailbox is not on our system (e.g. if we are a
  # backup MX for the recipient domain), then perform a
  # callout verification; but if the destination server is
  # not responding, accept the recipient anyway.
  #
  deny
    message     = unknown user
    !verify     = recipient/callout=20s,defer_ok

  # Otherwise, the recipient address is OK.
  #
  accept
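
To make the dnslists condition in acl_rcpt_to a little more concrete, here is a minimal Python sketch of the lookup that a DNSbl client conventionally performs for an IPv4 address: the octets are reversed, the list's zone is appended, and the host is treated as listed if that name resolves. The function names dnsbl_query_name and is_listed are hypothetical helpers for illustration only; this is not how Exim implements the check, and the sketch does none of Exim's caching.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets plus the zone."""
    # IPv4Address() raises ValueError on anything that is not a valid address.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone publishes an address record for this host.

    Simplification (assumption): any resolver error, including a
    temporary one, is treated as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False


# 192.0.2.1 is a documentation address, used here only as an example.
print(dnsbl_query_name("192.0.2.1", "bl.spamcop.net"))
# prints: 1.2.0.192.bl.spamcop.net
```

Only dnsbl_query_name is exercised above; is_listed performs a real DNS query and is therefore left uncalled.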

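The three "Hello checks" deny statements in acl_rcpt_to can be summarized by the following Python sketch, which applies them in the same order. It is an approximation under stated assumptions: helo_rejection_reason is a hypothetical name, our_names stands in for $primary_hostname plus the local and relay domain lists, and names are compared by exact (case-insensitive) match, whereas Exim's domain lists may also contain wildcards and lookups.

```python
import ipaddress
from typing import Iterable, Optional


def helo_rejection_reason(helo_name: Optional[str],
                          our_names: Iterable[str]) -> Optional[str]:
    """Return a log message if the greeting should be rejected, else None."""
    name = (helo_name or "").strip()

    # 1. Greeting is a bare IP address (IPv4 or IPv6).
    try:
        ipaddress.ip_address(name)
        return "remote host used IP address in HELO/EHLO greeting"
    except ValueError:
        pass  # not an IP address; continue with the other checks

    # 2. Greeting is one of our own names (exact match only; assumption).
    if name and name.lower() in {n.lower() for n in our_names}:
        return "remote host used our name in HELO/EHLO greeting"

    # 3. No greeting was presented at all.
    if not name:
        return "remote host did not present HELO/EHLO greeting"

    return None


ours = {"mail.example.org", "example.org"}
for greeting in ("192.0.2.1", "MAIL.example.org", "", "client.example.net"):
    print(repr(greeting), "->", helo_rejection_reason(greeting, ours))
```

The last greeting passes all three tests, so the function returns None for it; in the ACL, such a greeting would still be subject to the later "verify = helo" warning.
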

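The two patterns in the local_parts condition of acl_rcpt_to can be tried out in isolation. The sketch below uses Python's re module with the same two expressions; local_part_is_suspicious is a hypothetical helper name. Note that the Python patterns use a single backslash: the doubled backslash in the Exim line appears to be needed only because Exim expands the list before compiling the pattern.

```python
import re

# Same patterns as in the ACL: any of @ % ! / | anywhere, or a leading dot.
FORBIDDEN_CHARS = re.compile(r"^.*[@%!/|]")
LEADING_DOT = re.compile(r"^\.")


def local_part_is_suspicious(local_part: str) -> bool:
    """Return True if the local part would be denied by either pattern."""
    return bool(FORBIDDEN_CHARS.search(local_part)
                or LEADING_DOT.search(local_part))


for candidate in ("john.doe", "user%example.com", ".forward", "a|b"):
    print(candidate, local_part_is_suspicious(candidate))
```

Here "john.doe" is accepted, while the other three candidates are flagged.
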
A.4.5. acl_data

# This access control list is used for message data received via
# SMTP. The tests are run in order until the message
# is either accepted or denied.

acl_data:

  # Add Message-ID if missing in messages received from our own hosts.
  warn
    condition   = ${if !def:h_Message-ID: {1}}
    hosts       = : +relay_from_hosts
    message     = Message-ID: <E$message_id@$primary_hostname>