Nov 04, 2006
Author: Admin
Topic: SourceCode
Integrating SpamAssassin into hMailServer
This article explains how I integrated SpamAssassin into a very busy, large-scale email environment. If you are serious about filtering spam without blocking legitimate email, you should use SpamAssassin. SpamAssassin scores each message using a variety of methods: it considers the results of real-time blocking lists, DCC, SPF, and many content scans. Additionally, each user can set his or her own threshold.
The biggest obstacle to using SpamAssassin is performance. It is not obvious how to get enough performance out of SpamAssassin to use it at large scale. When most admins first try SpamAssassin, they install it on the email server and set up a script that launches SpamAssassin once for each email. This accomplishes the scoring, but it crushes the performance of the email server, because starting Perl and loading SpamAssassin carries a lot of overhead.
The next thing people try is to run SpamAssassin as a daemon. This is where you hear the term SpamD. SpamD loads once and then stays running. Then you can use a much smaller program called SpamC to communicate with SpamD. SpamC is a client program that was written in C. It is a reasonably efficient way to send a text file to SpamD and receive a text file back from SpamD. This method is far better than launching SpamAssassin once per email.
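To make the SpamC-to-SpamD exchange concrete, here is a minimal Python sketch of how a request and a reply are framed on the wire. It assumes the plain-text spamd protocol as I understand it: a command line such as PROCESS SPAMC/1.5, a Content-length header, an optional User header, a blank line, and then the message. The function names build_request and parse_response are my own, and the version strings may differ on your installation, so treat this as an illustration rather than a reference implementation.

```python
from typing import Dict, Optional, Tuple


def build_request(message: bytes, command: str = "PROCESS",
                  user: Optional[str] = None) -> bytes:
    """Frame a raw email as a spamd request (hypothetical helper)."""
    headers = [
        f"{command} SPAMC/1.5",               # command line; version is an assumption
        f"Content-length: {len(message)}",    # size of the message in bytes
    ]
    if user:
        headers.append(f"User: {user}")       # lets spamd apply per-user settings
    return "\r\n".join(headers).encode("ascii") + b"\r\n\r\n" + message


def parse_response(data: bytes) -> Tuple[int, str, Dict[str, str], bytes]:
    """Split a spamd reply into (code, status text, headers, body)."""
    head, _, body = data.partition(b"\r\n\r\n")
    lines = head.decode("ascii", "replace").split("\r\n")
    # The status line is expected to look like: SPAMD/1.1 0 EX_OK
    _version, code, text = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), text, headers, body
```

With PROCESS, the body of the reply is the rewritten message, which is the "text file back" described above.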
Once an admin gets SpamD up and running, the next obvious thing to try is running SpamD on a separate server from the email server. This reduces the load on the email server. With SpamD successfully running on a separate server, you can launch SpamC on the email server and have it talk to SpamD over the network. This is an excellent solution, because SpamC launches and executes quickly. The CPU- and memory-intensive scanning takes place in SpamD, and once SpamD lives on a separate box it no longer drags down the performance of the email server.
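As a rough illustration of this setup, the following Python sketch launches spamc once per message and points it at a remote SpamD with the -d (host) and -p (port) options. The host name spamd.example.internal and the helper names are placeholders of mine, and 783 is the usual default spamd port; adjust both for your own network.

```python
import subprocess
from typing import List


def spamc_command(host: str, port: int = 783) -> List[str]:
    """Build the spamc command line for a remote SpamD (hypothetical helper)."""
    return ["spamc", "-d", host, "-p", str(port)]


def scan_with_spamc(raw_message: bytes, host: str, port: int = 783) -> bytes:
    """Pipe one message through spamc and return the processed message."""
    result = subprocess.run(
        spamc_command(host, port),
        input=raw_message,        # the message goes in on stdin
        stdout=subprocess.PIPE,   # the rewritten message comes back on stdout
        check=False,              # leave exit-code handling to the caller
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder host; requires spamc installed and a reachable SpamD.
    sample = b"Subject: test\r\n\r\nHello\r\n"
    print(scan_with_spamc(sample, "spamd.example.internal").decode("utf-8", "replace"))
```

Note that this still starts one process per email, which is the cost the rest of the article sets out to remove.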
Many admins will be content to stop here. For most operations, this will offer enough performance. But I suggest we take things a step further. There are two things we can do to improve performance and scalability even more. The first thing we can do is set up a cluster of FreeBSD boxes running SpamD. We can use an open source product like pfSense to handle the load balancing duties. This makes it feasible to have many inexpensive SpamD boxes running.
The other thing we can do is integrate the SpamC code into the email server so that no external processes need to be launched. Even though SpamC is lightweight and efficient, there is still a performance penalty from launching a process every time an email is received. I suggest using a COM object to handle the usual SpamC duties in process, and then having the email server call the COM object instead of SpamC when parsing email. This avoids the need to launch a process for each email. The net result is far better performance and much greater scalability. Feel free to download my SpamC COM object and my Event Handler for hMailServer.
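The in-process idea can be sketched in a few lines of Python. This is not my COM object, only an illustration of the same technique: open a TCP connection to SpamD from inside the calling process, send a CHECK request, and read the score from the Spam header of the reply. The function names, the timeout, and the protocol version string are assumptions made for the example.

```python
import re
import socket
from typing import Tuple

# Expected reply header, e.g. "Spam: True ; 15.0 / 5.0"
SPAM_HEADER = re.compile(
    rb"^Spam:\s*(True|False)\s*;\s*(-?[\d.]+)\s*/\s*(-?[\d.]+)",
    re.MULTILINE | re.IGNORECASE,
)


def parse_check_reply(reply: bytes) -> Tuple[bool, float, float]:
    """Extract (is_spam, score, threshold) from a spamd CHECK reply."""
    match = SPAM_HEADER.search(reply)
    if match is None:
        raise ValueError("no Spam header in spamd reply")
    is_spam = match.group(1).lower() == b"true"
    score = float(match.group(2).decode("ascii"))
    threshold = float(match.group(3).decode("ascii"))
    return is_spam, score, threshold


def check_message(raw: bytes, host: str, port: int = 783,
                  timeout: float = 30.0) -> Tuple[bool, float, float]:
    """Score one message against SpamD without launching any process."""
    request = b"CHECK SPAMC/1.5\r\nContent-length: %d\r\n\r\n" % len(raw) + raw
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(request)
        conn.shutdown(socket.SHUT_WR)   # tell SpamD the request is complete
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:               # SpamD closed the connection
                break
            chunks.append(chunk)
    return parse_check_reply(b"".join(chunks))
```

An event handler in the mail server could call check_message for each incoming email and tag or reject the message based on the returned score, paying only for a socket round trip instead of a process launch.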