Introduction
Who is it for
Portability
Downloads
Sample uses
Fan-in mail collector
Antivirus filter
Bayesian* or any other technique en vogue
Filtering against external spammers blacklist*
Challenge/response whitelist*
Industrial mail processor
Generic POP3 server
Ruleset
How it works
Samples
List of special header fields
Performance
Copyright, license and feedback
*) Included in the default ruleset.
Green Mail Filter is an all-purpose mail processor with many possible uses.
Green is a multi-user server application that processes its users' mail. Although the
default installation creates just one user, it is by no means limited to that.
Green can be thought of as a platform for developing mail filters.
In short, Green collects users' mail messages from somewhere, runs each received
message through a set of rules (a ruleset) and finally puts the message into one of
the user's mailboxes. The primary purpose of this behaviour was separating legitimate
mail (the default "mail" mailbox) from spam (the default "spam" mailbox).
Green collects each user's mail from the external POP3 accounts configured for that user.
There is also a separate spooling directory (.spool) for each user; all files that
appear there get collected as well. Spooling messages as plain files can be useful in an
environment where mail servers are already in place and filtering needs to be added.
The processing of the collected messages is fully determined by the rules that make up
the user's ruleset. Therefore the Green filter is only as good as its ruleset. See the
Ruleset section for a detailed explanation of how the ruleset works.
When message processing is complete, the message is put into one of the user's mailboxes
(each mailbox is a subdirectory of the user's directory). Green also contains a POP3
server, so the user can connect to it and read her mail as usual.
To start using Green, please follow these minimum steps:
1. Download and install. Among other things, the installer prompts for a username for
the user account being created. Let's say you enter jsmith here.
2. The Green core is installed as a system service and starts immediately.
3. Start the Green Mail Filter manager application.
4. Under Users/jsmith, configure the POP3 password to be used by Green's internal
mail server. Let's say you enter mypass here.
5. Under Users/jsmith/Accounts, create and configure one or more records for the external
POP3 accounts that mail is to be collected from.
6. Start your mail client (Outlook, TheBat, whatever) and configure it to use the POP3 server
at 127.0.0.1:110 (Green's built-in mail server). Set the login information
to jsmith (username) and mypass (password).
7. Check your mail.
Note how several mailboxes ("mail", "spam", etc.) can be read through a single
POP3 account created for a single user at 127.0.0.1:110. If the user logs in to Green's
POP3 server using her "username" and "password", she gets a view of the default "mail" mailbox.
To read another mailbox (e.g. "spam"), she logs in using a special username,
"username/spam", and the same "password". In other words, Green's POP3 server accepts usernames
of the form "username/mailbox", where the mailbox name defaults to "mail".
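A client-side sketch of the login convention above, using Python's standard poplib module. The helper names (pop3_login_name, split_login_name, count_messages) are hypothetical; split_login_name only illustrates how a "username/mailbox" login decomposes and is not Green's own code. Host, port and credentials are the ones from the steps above.

```python
import poplib


def pop3_login_name(username, mailbox="mail"):
    """Return the POP3 login name that selects `mailbox` for `username`."""
    # The default "mail" mailbox is reached with the plain username.
    return username if mailbox == "mail" else "%s/%s" % (username, mailbox)


def split_login_name(login):
    """Illustration only: split "username/mailbox", defaulting to "mail"."""
    username, slash, mailbox = login.partition("/")
    return username, (mailbox if slash else "mail")


def count_messages(username, password, mailbox="mail", host="127.0.0.1", port=110):
    """Log in to Green's built-in POP3 server and return the number of messages."""
    pop = poplib.POP3(host, port, timeout=30)
    try:
        pop.user(pop3_login_name(username, mailbox))
        pop.pass_(password)
        count, _total_size = pop.stat()
        return count
    finally:
        pop.quit()
```

For example, count_messages("jsmith", "mypass", "spam") would log in as "jsmith/spam" and report how many messages are waiting in the "spam" mailbox.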
Green is likely to be used by an enthusiastic e-mail user with significant knowledge
of how e-mail works. The more e-mail and networking knowledge, the better. Users who can
program will be able to exploit Green's full potential.
Developers can use Green as a platform for building arbitrary mail filters. As Green rulesets
can use Python scripts, knowledge of this wonderful programming language is certainly a plus.
E-mail providers can install Green on the server side and give their customers additional
value by configuring mail filtering in any way they like.
Companies that need to process lots of e-mail can use Green as a generic mail
processor.
Green runs under Windows 2000 or better.
Green contains three main modules: the server core (green.exe), the
Win32 service wrapper (greensvc.exe) and the GUI control panel (greenmgr.exe). The server core
is written in portable C++, so it can be ported to Unix more or less easily. The service
wrapper has no meaning under Unix, so it need not be ported. The GUI control panel is written
in Delphi and would therefore be problematic to port. On the other hand, (1) the control panel
is not a required component of the server, which is fully configurable through standalone XML
files, and (2) it is theoretically possible to run the GUI control panel on a separate machine
and use it to control a server core running elsewhere.
The server core should run under Windows 98 or later, but because it is installed as a
service, it is only reasonable to run it under Windows 2000 or later. Besides, no extensive
testing was performed under Windows 98.
Binary installation package (Windows 2000 or higher, requires administrative rights to install):
Version 1.4, released Sept 13, 2005 (changes)
Installer (~3.8 MB): green-1.4.exe (same thing, zipped: green-1.4.zip)
Installer signature: green-1.4.exe.sig
Listed below are a few sample applications of the Green filter. As each is nothing
but a few rules in a ruleset, they can obviously be mixed and matched in any
imaginable way. Some of these sample rulesets come bundled with the installation in
the "samples" subdirectory, and you can try them out using the "Import" popup menu item
in the GUI control panel.
1. Fan-in mail collector:
This is the simplest possible application. Green collects the user's mail from
one or more external POP3 accounts as well as from the spooling directory and puts
it all into the "mail" mailbox, so that the user can fetch it all in one place from
Green's POP3 server. Not much filtering takes place, though.
2. Antivirus filter:
If you install a third-party antivirus, possibly a free one that comes with a
command line scanner, it is easy to set up a rule that executes this
scanner against each received message and filters accordingly.
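A minimal sketch of such an antivirus action, written as a standalone Python 3 function that returns what a Green action would assign to result. The scanner command line is an assumption: clamscan is used only as an example of a command line scanner that exits with 0 for a clean file and 1 when a virus is found, and the "quarantine" mailbox is hypothetical. Because the scanner needs the message body, the first (header-only) pass returns "<contents>", as explained in the Ruleset section.

```python
import subprocess

# Assumed scanner command; the message file name is appended as the last argument.
SCANNER = ["clamscan", "--no-summary"]


def antivirus_action(headers, scanner=SCANNER):
    """Return a mailbox name, "" for "no decision", or "<contents>"."""
    path = headers.get("x-green-message-filename")
    if path is None:
        # First pass: only the header is known, ask for the full message.
        return "<contents>"
    status = subprocess.run(
        scanner + [path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    ).returncode
    if status == 1:
        return "quarantine"  # hypothetical mailbox for infected messages
    # Clean message or scanner error: leave the decision to later rules.
    return ""
```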
3. Bayesian (included in the default ruleset) or any other technique en vogue:
For a developer or an enthusiastic user with programming knowledge, Green can be
a solid prototyping and production ground. As Green's rules can contain arbitrary Python
code, there is no limit on what can be done. Want it Bayesian? Want path analysis?
Want a database of patterns? Want to try some other brand new idea? Green serves as
a generic filtering platform.
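To give an idea of how little code such an experiment takes, here is a simplified two-class naive Bayes classifier in plain Python. It is not the Bayesian rule from Green's default ruleset; the tokenizer, the add-one smoothing and the choice to score only the tokens present in a message are all assumptions of this sketch.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9$'-]+")


def tokens(text):
    """Distinct lowercase word-like tokens of a message."""
    return set(TOKEN.findall(text.lower()))


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class NaiveBayes:
    """Two-class naive Bayes over token presence, with add-one smoothing."""

    def __init__(self):
        self.docs = {"spam": 0, "mail": 0}
        self.counts = {"spam": Counter(), "mail": Counter()}

    def train(self, text, label):
        self.docs[label] += 1
        self.counts[label].update(tokens(text))

    def spam_probability(self, text):
        total = self.docs["spam"] + self.docs["mail"]
        log_score = {}
        for label in ("spam", "mail"):
            # Smoothed prior P(label).
            score = math.log((self.docs[label] + 1) / (total + 2))
            for tok in tokens(text):
                # Smoothed P(token appears | label); absent tokens are ignored.
                score += math.log((self.counts[label][tok] + 1) / (self.docs[label] + 2))
            log_score[label] = score
        # P(spam | tokens) from the two joint log-likelihoods.
        return _sigmoid(log_score["spam"] - log_score["mail"])
```

Inside Green, one could keep a NaiveBayes instance in shared_state (described with the samples) so that it survives between script invocations, train it when x-green-training-mailbox is set, and route messages whose spam_probability exceeds some threshold, say 0.9, to the "spam" mailbox.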
5. Challenge/response whitelist (included in the default ruleset):
The idea behind whitelisting is keeping a list of known senders, the only ones you expect
e-mail from. An automated whitelist is similar, but it attempts to distinguish people
sending mail to you from spam machines. The separation is done by challenging each unknown
sender with a request to prove she's not a machine. If she is not a machine, she reads
the challenge message the filter sends back to her and follows the instructions
included in it. On the other hand, no one ever reads responses to spam, and so spam is
filtered out.
Details vary (change the ruleset and you change them), but by default, in its reply message
Green asks your correspondents to send you a single message, once, with a subject line that
contains a short random number, e.g. 333. Receiving such a message proves that the author
has read the challenge and acted upon it, hence she's a human. Once such a message is received,
the sender is permanently whitelisted.
Cool as they may sound, automated whitelists have so many downsides that you may not want
to use them at all. Although I'm not strongly against challenge/response
filtering, you may check the following links to see why some people strongly believe it's bad:
http://kmself.home.netcom.com/Rants/challenge-response.html
http://tardigrade.net/challengeresponse.html
http://richi.co.uk/blog/2005/05/why-challengeresponse-is-bad.html
http://www.businessweek.com/magazine/content/03_27/b3840044.htm
To make things worse, the idea has fallen victim to patent madness: challenge/response technology
is covered by several US patents, and although a handful of products implement it
(see http://spamlinks.net/filter-cr.htm) and
a few suits have been filed, the outcome is unclear.
Keep in mind that with Green you are not limited to any given set of rules; it can do anything.
Doing even something as complex as challenge/response is a piece of cake; in fact it's a single rule.
You may think of the challenge/response whitelist as a mere example of the expressive power of
Green rulesets. Whitelisting is just one of the spam fighting ideas, but if you come up with a better
one, go right ahead, drop in a rule or two and away you go!
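The default challenge/response behaviour described above boils down to bookkeeping of pending codes and whitelisted senders. The sketch below is a hypothetical standalone function, not Green's single-rule implementation (which relies on a ruleset modifying action): it keeps its state in a plain dictionary such as shared_state, takes the code from x-green-random-short, and returns the target mailbox together with the challenge text to send, or None. Actually sending the reply and the "unconfirmed" mailbox name are assumptions left outside the sketch.

```python
import re


def challenge_response(headers, state):
    """Return (mailbox, challenge_text_or_None) for one incoming message."""
    sender = headers.get("x-green-parsed-from-mailbox", "").lower()
    whitelist = state.setdefault("whitelist", set())
    pending = state.setdefault("pending", {})  # sender -> expected code
    if sender in whitelist:
        return "mail", None
    code = pending.get(sender)
    subject = headers.get("subject", "")
    if code is not None and re.search(r"\b%s\b" % re.escape(code), subject):
        # The sender read the challenge and acted on it: whitelist permanently.
        whitelist.add(sender)
        del pending[sender]
        return "mail", None
    if code is None:
        # First contact: remember a code and challenge the sender once.
        code = headers["x-green-random-short"]
        pending[sender] = code
        return "unconfirmed", "Please send me one message with %s in the subject line." % code
    # Still waiting for the confirmation; do not challenge again.
    return "unconfirmed", None
```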
6. Industrial mail processor:
Automatically processing large volumes of incoming mail is a task that frequently appears
in different industries. Using Python scripts you can make Green connect to anything you
have and use it for any purpose - maintaining a mailing list, verifying digital signatures,
sorting support mail, and so on and so forth. As Green collects messages that are spooled
as regular files, it's easy to send it mail for processing too (although you might want to
decrease the spool reading interval in the server configuration).
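A sketch of handing a message to Green through a user's spooling directory. Writing the file under a temporary name inside .spool itself could let the collector pick up a half-written file, so this version (a precaution assumed here, not a documented Green requirement) writes into a separate staging directory on the same volume and then moves the finished file into .spool with os.replace. The staging directory and the .eml file name are hypothetical.

```python
import os
import tempfile
import uuid


def spool_message(raw_message, spool_dir, staging_dir):
    """Place one complete RFC 822 message (bytes) into spool_dir; return its path."""
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw_message)
        final_path = os.path.join(spool_dir, uuid.uuid4().hex + ".eml")
        # A rename on the same volume: the file appears in the spool complete.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```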
7. Generic POP3 server:
Finally, Green can be used as just a POP3 server, since it in fact contains one. Anything that gets
into the users' mailboxes is served out via POP3. No filtering at all here.
How it works:
The ruleset is the heart of Green Mail Filter's message processing.
A ruleset is an ordered collection of rules. Rules are applied to each message one by one, starting
with the first rule and proceeding to the next until some rule dispatches the message to a particular
mailbox or the end of the ruleset is reached.
Each rule has a textual (and otherwise meaningless) description, which is only useful to a
user reading the ruleset. Each rule can be enabled or disabled; disabled rules are
skipped. Each rule can also have an expiration date, so that the ruleset does not get clogged
with temporarily created rules. Expired rules are silently dropped from the ruleset.
Each rule contains one or more matches (something to match messages against) and one or more
actions (applied to a matched message). A rule processes a message as follows.
First, the message is shown to each match one by one, starting with the first. Given a message,
a match examines it and returns either match or no match. If any of the matches does not match the
message, processing of the rule terminates and the server proceeds to the next rule in the ruleset.
If all of the matches did match the message, the rule is said to be activated and all of its actions
are applied to the message one by one, starting with the first. Each action does whatever it deems
necessary and returns the name of the mailbox to put the message into, or an empty name if it makes no
such decision. As soon as some action returns a non-empty mailbox name, the rule activation is
considered complete and its remaining actions are not executed at all. If all actions return empty
mailbox names, the server proceeds to the next rule, hoping it can make a decision. If, on the other
hand, the rule returns a mailbox name, the message is put into that mailbox and processing stops.
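The evaluation order described above can be sketched in Python as follows. Matches and actions are modelled as plain callables over a header dictionary; the Rule class and run_ruleset function are hypothetical names for illustration, and treating a rule as still valid on its expiration date is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Optional

Headers = Dict[str, str]


@dataclass
class Rule:
    description: str                          # for humans only
    matches: List[Callable[[Headers], bool]]  # all must accept the message
    actions: List[Callable[[Headers], str]]   # each returns a mailbox or ""
    enabled: bool = True
    expires: Optional[date] = None


def run_ruleset(ruleset: List[Rule], headers: Headers, today: date) -> str:
    """Return the target mailbox, or "" if no rule made a decision."""
    # Expired rules are silently dropped from the ruleset itself.
    ruleset[:] = [r for r in ruleset if r.expires is None or today <= r.expires]
    for rule in ruleset:
        if not rule.enabled:
            continue
        # all() stops at the first match that says "no match".
        if not all(match(headers) for match in rule.matches):
            continue
        # The rule is activated: run its actions until one names a mailbox.
        for action in rule.actions:
            mailbox = action(headers)
            if mailbox:
                return mailbox
        # Every action returned "": fall through to the next rule.
    return ""
```

If run_ruleset returns an empty string, no rule made a decision; what Green does with such a message is not covered by this sketch.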
There are three kinds of matches that can be used in rules. The simplest kind of match matches any
message and as such is useful for wildcard rules that apply to all messages. The second kind
of match is a regular expression match: it compares the values of the specified header fields against
the specified regular expressions and matches if they match. The third and most flexible kind of match
is a Python script match. It is really an arbitrary snippet of Python code which is executed in the
context of the message being processed. What it does is up to its designer; it can do anything,
and it matches if it sets result = "match".
Similarly, there are three kinds of actions. The simplest kind of action does exactly what
actions are for and nothing else: it simply returns the specified mailbox name. The second kind
of action is a Python script action; again, it is a piece of Python code which can do anything,
possibly setting result = "mailbox_name" at the end. Finally, the third kind of action is the one
that allows Green to do all sorts of magic: it is an action that modifies the ruleset at runtime,
i.e. adds or removes rules on the fly.
The simpler matches and actions are best understood from the samples. The
most advanced action, the ruleset modifying action, is described here though. Such an action contains a
child rule, which remains in a latent state and never has a chance to execute by itself.
Whenever such an action is activated, it inserts a copy of the child rule somewhere in the ruleset:
at the top of the ruleset, before the current rule, after the current rule, or at the end of the ruleset.
At this moment the child rule is hatched and becomes a real rule, part of the ruleset.
Note the most important thing: the current rule (containing the action being activated) is already
executing, and before it started executing, all of its textual parts were adapted to the current
message, i.e. {{header-field-name}} entries were replaced with the appropriate values. What this
really means is that all the child rules of all its actions, all the way down to the very bottom
of the subordinate rules tree, have been adapted too. Therefore the copy of the child rule being
inserted is a copy adapted to the message being processed, not a verbatim copy of the template.
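A sketch of the adaptation and hatching steps, assuming rules are stored as nested dictionaries and lists of strings. adapt replaces every {{field}} macro in every string, recursively, so child rules nested inside actions are adapted together with their parent; hatch inserts a copy of an already adapted child rule at one of the four positions. Both names, the position keywords and the rule layout are hypothetical; case-insensitive field lookup, an empty string for unknown fields and plain textual substitution are assumptions.

```python
import copy
import re

MACRO = re.compile(r"\{\{([^{}]+)\}\}")


def adapt(node, headers):
    """Return a copy of a rule tree with every {{field}} macro replaced."""
    if isinstance(node, str):
        return MACRO.sub(lambda m: headers.get(m.group(1).strip().lower(), ""), node)
    if isinstance(node, list):
        return [adapt(item, headers) for item in node]
    if isinstance(node, dict):
        return {key: adapt(value, headers) for key, value in node.items()}
    return node  # numbers, booleans and None are left alone


def hatch(ruleset, current, child, where):
    """Insert a copy of an adapted child rule; return the index it landed at."""
    positions = {"top": 0, "before": current, "after": current + 1, "end": len(ruleset)}
    index = positions[where]
    ruleset.insert(index, copy.deepcopy(child))
    return index
```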
One more advanced topic: whenever Green reads mail from external POP3 servers, it does so
in two steps. First, only the header of the message is fetched with the TOP command. This header is
parsed and passed to the ruleset exactly as described above. Often the filtering decision
can be made based on the header alone, and often you wouldn't even want to fetch the rest of the
message once you have seen its header (e.g. worms and such). This is the first filtering pass. If
the first pass decides to dispatch the mail to a mailbox, the message is fetched with the
RETR command and simply put into the already known target mailbox. If the mailbox name returned
from the first filtering pass is "<delete>", the message is physically deleted on the server with
the DELE command without being fetched. But there are also many cases in which filtering can only be
carried out on the message body. In such a case, some rule should return the "<contents>" mailbox name.
This is another special "mailbox name", and it means: fetch the message body
and make a second pass through the ruleset. If "<contents>" is the result of the
first filtering pass, the message is fetched with the RETR command, and filtering starts again from
the top of the ruleset. This is the second filtering pass. By that time the entire message, with
header and body, physically exists in a local file, and the file's name is added to the header as the
special header field "x-green-message-filename" so that matches and actions can act upon it. Note that
only Python matches and actions receive that field, because only they are capable of doing anything with
the file.
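The two-pass logic can be sketched in Python with the POP3 commands abstracted into callables, which keeps the control flow visible and testable. This is an illustration under assumptions, not Green's implementation: a message for which no rule decides is assumed to go to "mail", a second "<contents>" is assumed to mean "mail" as well, and header parsing uses Python's standard email package.

```python
import email
import os
import tempfile


def parse_header(raw):
    """Header fields as a dict with lowercase names (repeated fields: last one wins)."""
    msg = email.message_from_bytes(raw)
    return {name.lower(): value for name, value in msg.items()}


def two_pass_filter(fetch_header, fetch_message, delete_message, run_ruleset):
    """Filter one message on a POP3 server; return (mailbox, raw message or None).

    fetch_header, fetch_message and delete_message stand for POP3 TOP n 0,
    RETR n and DELE n; run_ruleset maps a header dict to a mailbox name.
    """
    # First pass: header only.
    target = run_ruleset(parse_header(fetch_header())) or "mail"  # assumed default
    if target == "<delete>":
        delete_message()  # the body is never downloaded
        return target, None
    raw = fetch_message()
    if target == "<contents>":
        # Second pass: the whole message sits in a temporary file and the
        # ruleset is evaluated again from the top.
        fd, path = tempfile.mkstemp(suffix=".eml")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(raw)
            headers = parse_header(raw)
            headers["x-green-message-filename"] = path
            target = run_ruleset(headers) or "mail"
        finally:
            os.remove(path)
        if target == "<contents>":
            target = "mail"  # assumed: there is no third pass
        if target == "<delete>":
            delete_message()
            return target, None
    return target, raw
```

With a logged-in poplib.POP3 object pop and message number n, the three callables could be lambda: b"\r\n".join(pop.top(n, 0)[1]), lambda: b"\r\n".join(pop.retr(n)[1]) and lambda: pop.dele(n).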
Here are a few samples of both matches and actions.
Consider this mail message as an example:
Received: from some.fake.name(fakedsl-123-45-67-89.fake.name [123.45.67.89])
    by hut.user.com (8.12.10/8.12.10) with SMTP id i8FGu6Wc033206
    for <innocent@user.com> Wed, 15 Sep 2004 10:56:10 -0600 (MDT)
Received: from 98.76.54.32 by smtp.fake.name;
    Wed, 15 Sep 2004 16:56:27 +0000
From: "Believe Me" <believe.me@great.stuff>
To: innocent@user.com
Subject: Get great stuff for a great price.

We have a new offer for you. Buy cheap stuff through our online store.
- Private online ordering
- World wide shipping
Order your stuff offshore and save over 70%!
Best regards,
Bad Businessman
This message is matched by each of the following header field value matches (see the GUI control
panel application):
Regular expression match example #1:
Header field value(s) to match: from
Regular expression to match field value against:
.*Bel.*
Notes: the header field name can be written in lowercase, uppercase or mixed case; it makes no
difference. The regular expression syntax to conform to is described here.
Regular expression match example #2:
Header field value(s) to match: x-green-parsed-from-mailbox
Regular expression to match field value against:
believe.me@great\.stuff
Notes: before the message is passed to the ruleset, its header is decorated with a number
of extra header fields that simplify processing. All such extra fields begin with "x-green-".
The full list of extra fields is given in the List of special header fields section. In this
example x-green-parsed-from-mailbox is used, which is set to just the mailbox part of the From
address. If the bare "From" field were used instead, a message with
From: "Gotcha: believe.me@great.stuff" <fooled@you.com>
would have matched, and that would be a bad thing.
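The difference between the raw From field and x-green-parsed-from-mailbox can be reproduced with Python's email.utils.getaddresses, used here only as a stand-in for Green's own address parser (whose exact behaviour may differ). The helper names are hypothetical.

```python
import re
from email.utils import getaddresses

# The pattern from example #2; matched against the whole bare address.
PATTERN = re.compile(r"believe.me@great\.stuff")


def parsed_mailboxes(field_value):
    """Bare addresses from an address header, roughly like x-green-parsed-...-mailbox."""
    return [addr for _name, addr in getaddresses([field_value]) if addr]


def from_is_believe_me(from_value):
    """True only if one of the actual addresses matches, not the display name."""
    return any(PATTERN.fullmatch(addr) for addr in parsed_mailboxes(from_value))
```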
Regular expression match example #3:
Header field value(s) to match: x-green-parsed-from-mailbox
Regular expression to match field value against:
{{x-green-parsed-from-mailbox}}
Notes: before a rule is applied to a message, all the macro entries {{field-name}} anywhere in
the textual parts of the rule are replaced with the values of the corresponding fields of this
particular message. In this case, before the rule is invoked, its regular expression becomes
"believe.me@great\.stuff". Also note that such a match matches any message, as in A={{A}}.
Regular expression match example #4:
Header field value(s) to match: subject;from
Regular expression to match field value against:
.*stuff.*
.*great.*
If there is more than one field occurrence, each must match: true
Notes: more than one header field can be specified, as well as more than one regular
expression. The behaviour of such a match then depends on the "each must match" flag:
if it is set, each header field value must match each of the regular expressions...
Regular expression match example #5:
Header field value(s) to match: subject;from
Regular expression to match field value against:
.*Believe.*
If there is more than one field occurrence, each must match: false
... and if it is not set, any value may match any regular expression.
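The two flag settings can be summarised in code. In this sketch headers maps lowercase field names to lists of values, since a field such as Received may occur several times; whole-value matching with re.fullmatch (suggested by the .* around the sample patterns) and treating a missing field as no match are assumptions, and regex_match is a hypothetical name.

```python
import re


def regex_match(headers, fields, patterns, each_must_match):
    """Evaluate a regular expression match against a multi-valued header dict.

    fields is a ";"-separated list of field names, as in the GUI.
    """
    values = []
    for name in fields.split(";"):
        values.extend(headers.get(name.strip().lower(), []))
    if not values:
        return False  # assumption: a missing field never matches
    compiled = [re.compile(p) for p in patterns]
    pairs = ((rx, v) for v in values for rx in compiled)
    if each_must_match:
        # Every value must match every regular expression.
        return all(rx.fullmatch(v) for rx, v in pairs)
    # Any value matching any regular expression is enough.
    return any(rx.fullmatch(v) for rx, v in pairs)
```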
Python match example #1:
from random import randint
if randint(0, 1) == 0:
    result = "match"
else:
    result = ""
Notes: this matches messages at random. Hardly useful, except for load balancing or something.
Python match example #2:
if "received" in headers:
    for i in range(headers["received:count"]):
        if headers["received:%d" % i].find("trusted.com") >= 0:
            result = "match"
            break
    else:
        result = ""
else:
    result = ""
Notes: this is the "right" way to enumerate multiple values in the header.
Python match example #3:
assert headers.get("from", "") == "{{from}}"
result = ""
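Notes: both ways of accessing the header can be used, although headers["name"] is highly preferred
(a macro such as {{from}} is substituted as plain text, so, for example, a From value containing
double quotes would break the Python string literal). Always set result to "match" or an empty
string; if result remains None, you get a mailbox named "None".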
Python action example #1:
if headers["subject"].startswith("Spam"):
    result = "spam"
else:
    result = ""  # who knows
Notes: actions return a mailbox name. An empty mailbox name means "no decision".
Python action example #2:
if "x-green-message-filename" in headers:
    if open(headers["x-green-message-filename"]).read().find("VIRUS") >= 0:
        result = "quarantine"
    else:
        result = ""  # clean: no decision, so the result does not stay None
else:
    result = "<contents>"
Notes: "x-green-message-filename" is yet another special header field, which is set to
the full file name of the message being processed. You can create your own mailboxes at
will. <contents> is a special "mailbox name": a message put into that mailbox
is fetched in full from the originating server for filtering, not just its header.
Use this when you need to filter based on the message body.
Python action example #3:
from shutil import copy
if "x-green-message-filename" in headers:
    copy(headers["x-green-message-filename"], headers["x-green-user-copy-to-this-directory"])
    result = "<delete>"
else:
    result = ""
Notes: <delete> is a special "mailbox name"; a message put into that mailbox is
silently deleted. Header fields whose names begin with "x-green-user-" are user-specific and
are configured on the user configuration page in the GUI manager.
Python action example #4:
db_connection = shared_state["db_connection"]
db_connection.execute("insert into messages (id) values ('%s')" % headers["x-green-message-id"])
result = ""  # no mailbox decision; leaving result as None would yield a "None" mailbox
Notes: x-green-message-id is a globally unique value that can be used for identifying messages.
By a strange coincidence it is also the name of the file the filtered message is put into
at the end of the game. shared_state is a global dictionary available to all the scripts, and
its contents are preserved across script invocations. shared_state keeps its state
until the server is stopped and is useful for storing stuff for later use. Other than shared_state,
the whole execution context is cleared between script invocations.
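A sketch of how the connection used in example #4 might get into shared_state in the first place: the first invocation opens it and later invocations reuse it. sqlite3, the table layout and the record_message name are assumptions for illustration (the database behind example #4 is not specified), and the query binds the message id as a parameter instead of formatting it into the SQL text. If Green runs scripts on more than one thread, the shared connection would also need locking; sqlite3 connections refuse use from other threads by default.

```python
import sqlite3


def record_message(headers, shared_state, db_path=":memory:"):
    """Store the message id, opening the connection on first use; return ""."""
    conn = shared_state.get("db_connection")
    if conn is None:
        conn = sqlite3.connect(db_path)
        conn.execute("create table if not exists messages (id text primary key)")
        shared_state["db_connection"] = conn  # survives until the server stops
    # "?" is sqlite3's parameter style; duplicates are silently ignored.
    conn.execute("insert or ignore into messages (id) values (?)",
                 (headers["x-green-message-id"],))
    conn.commit()
    return ""  # no mailbox decision
```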
Here is the list of all the special fields added to the message header
before it is passed to the ruleset:
x-green-username:
The name of the current message recipient user.
x-green-quoted-header:
A list of "> "-prefixed lines of the message header. Useful for replies and quoting.
x-green-expires-...:
A string containing an ISO date that lies "..." in the future. See the GUI control panel for the
possible values of "...".
x-green-message-size:
An estimated size of the message in bytes. It is available on the first filtering pass
too, even though the message body hasn't been fetched yet; the output of the POP3 "LIST"
command is used for this estimate.
x-green-random-short, x-green-random-long:
Strings with random numbers, short is in range 0-999 and long is in range 0-4294967295.
x-green-message-id:
Globally unique message identifier. Contains high precision time, a message header hash
and a random component extracted from a GUID. Useful for tracking messages between rule
invocations, state saving and such. It also equals the name of the file where the
message is finally put after filtering.
x-green-message-filename:
This is only passed to Python matches/actions during the second filtering pass, and
it contains the full name of the temporary file the current message is stored in. The
file contains a copy of the message, including the header and everything else. You should not
delete or rename the file.
x-green-training-mailbox:
This is only set on a manual training pass and contains the name of the mailbox the
user has selected as the "true" destination mailbox for the message. Only trainable rules
care about it.
x-green-parsed-...:
Parsed mailbox list. "..." is any other header field name, but only a few fields can actually
be parsed: "from", "to", "cc", "sender" and "reply-to". For these, the parsed value
contains strings like "mailbox-list(mailbox(local-part(xljxuygdkhu)domain(yahoo.com)))";
for the rest it contains just a copy of the original field and can be ignored.
x-green-parsed-...-mailbox:
Similar to x-green-parsed-... but takes parsing one step further, i.e. for the parseable
fields "from", "to", "cc", "sender" and "reply-to" it contains valid mail addresses,
e.g. xljxuygdkhu@yahoo.com.
x-green-user-...:
These fields contain arbitrary user-specific configuration parameters that can be
edited on the user configuration page and reside in user.xml. E.g. x-green-user-canonical-mailbox
contains the name of the user as known to the SMTP server, which can be used for sending mail.
x-green-...:
All the server configuration options that can be edited on the options configuration
page and reside in config.xml are passed as x-green-... fields. E.g. x-green-smtp-address
is the global setting for the outgoing SMTP server.
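As an illustration of the kind of identifier x-green-message-id describes, the sketch below combines a nanosecond timestamp, a hash of the message header and random bits from a GUID. The layout, the SHA-1 hash and the make_message_id name are assumptions; Green's actual format is not documented here. Using only hex digits and dashes keeps the result usable as a file name, which matters because the identifier doubles as the name of the message file.

```python
import hashlib
import time
import uuid


def make_message_id(raw_header):
    """Hypothetical format: <ns time>-<header hash>-<GUID random part>."""
    stamp = "%020d" % time.time_ns()                    # high precision time
    digest = hashlib.sha1(raw_header).hexdigest()[:16]  # message header hash
    rand = uuid.uuid4().hex[:16]                        # random bits from a GUID
    return "%s-%s-%s" % (stamp, digest, rand)
```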
Performance of such a complex system is always difficult to measure, and with Green
it certainly depends on the particular ruleset. The results of load testing should
give at least some idea.
The load tests were run on a Pentium 4 at 3.2 GHz with 1 GB of RAM and 2 striped IDE hard disks,
under Windows 2003 with certain NTFS performance tweaks applied.
The load test configuration had 500 users with a total of 50,000 different real-life mail messages
(350 MB) in their mailboxes, all sharing the same short static ruleset containing a few regular
expressions and a few Python scripts. All activity for all users repeated periodically every 5
minutes: each user logs in at a random moment once every 5 minutes, reads her mail, filters it and
puts it into some other user's mailbox, so that the mail keeps looping forever. If my math is right,
such a test matches the load of a real-life mail server serving at least 10 times as many users.
The server quickly became congested; predictably, it ate all the CPU it could get and about 400 MB
of memory. Then disk I/O became the bottleneck, with the server spending about 30% of its CPU time
in the kernel. This test was all the more likely to be I/O bound because of the particular ruleset:
as noted before, it is essentially empty, with a regular expression match here and there and a few
Python scripts several lines long, i.e. no significant CPU processing per message.
Anyhow, the server was filtering messages at a sustained rate of slightly over 100 messages per
second, and it kept running like that for days, filtering millions of messages.
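A back-of-the-envelope check of these figures, assuming each of the 50,000 messages is re-filtered once per 5-minute cycle (the description suggests this but does not state it outright):

```latex
\frac{50\,000\ \text{messages}}{5 \times 60\ \text{s}} \approx 167\ \text{messages/s} > 100\ \text{messages/s (observed)}

100\ \text{messages/s} \times 86\,400\ \text{s/day} = 8.64 \times 10^{6}\ \text{messages/day}
```

An offered load of roughly 167 messages per second against a sustained throughput of about 100 is consistent with the server becoming congested, and a single day at 100 messages per second already amounts to over 8.6 million messages.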
Version 1.4, Sept 13, 2005:
GUI fixed. Fixed another issue with outgoing client POP3 connections: when an external POP3
server was down, the POP3 client code kept retrying again and again. Now it makes one attempt
per configured timeout.
Version 1.3, Sept 12, 2005:
Fixed an issue with multiple outgoing client POP3 connections initiated at the same
time to the same server/account (minor threading issue). This build has management
console GUI completely broken :(
Version 1.2, Sept 10, 2005:
Minor GUI fixes concerning management console with large fonts.
Version 1.1, Aug 10, 2005:
Default ruleset is no longer empty/passthrough, now it contains rules that implement
blacklisting + Bayesian + challenge/response filtering.
Version 1.0, Jul 31, 2005:
Initial release (the 2003 early prototype release doesn't count).
Green is developed and written by Dmitry Dvoinikov <dmitry@targeted.org>
(c) 2003-2005 Dmitry Dvoinikov
http://www.targeted.org/
Portions of this software:
Python (c) 1991-1995 Stichting Mathematisch Centrum, CNRI
LibXML library (c) 1998-2003 Daniel Veillard
Boost C++ libraries (c) Contributors
Boost Regex library (c) 1998-2003 Dr. John Maddock
SSLeay suite (c) 1995-1998 Eric Young
See LICENSE for more information
Use this forum to talk about Green, or mail the author at dmitry@targeted.org.