Feb 10

Rapidly Growing SpamAssassin Bayes Tokens File (bayes_toks)

On a mail filter I maintain, there is a site-wide Bayes database that is periodically trained by hand. It sits quietly and doesn’t change much over time. That is, until the Bayes database was moved to a new server. The SpamAssassin configuration was identical on the old and new systems, but there was one problem: on the new server, the bayes_toks file was rapidly growing until it was quite large. Huge, in fact. Its size was expanding by several gigabytes per hour.

I checked all the usual things: auto-learning was off, auto-expiration was off, the file permissions and owner were set correctly, and so on. And yet the file grew, constantly and swiftly.
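For anyone retracing those checks, here is a rough sketch of how the relevant settings could be listed. The option names (bayes_auto_learn, bayes_auto_expire, bayes_path, bayes_file_mode) are standard SpamAssassin settings, but the helper function name and the directory in the example comment are only assumptions; your config may live somewhere else.

```shell
# List Bayes-related settings found in SpamAssassin config files.
# $1 = directory to search (hypothetical example: /etc/mail/spamassassin)
show_bayes_settings() {
    # -r: recurse, -h: omit file names, -E: extended regex
    grep -rhE '^[[:space:]]*bayes_(auto_learn|auto_expire|path|file_mode)' "$1"
}

# Example (the path is an assumption, adjust for your system):
# show_bayes_settings /etc/mail/spamassassin
```

If a setting does not show up at all, SpamAssassin is using its built-in default for it, which may not be what you expect.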

After hours of searching without finding anything, and after trying various kinds of tinkering, I found the answer. I backed up and restored the Bayes database like so:

sa-learn --backup > bayes_backup.txt
sa-learn --restore bayes_backup.txt

After that, the toks file was once again left alone and didn’t grow. I suspect the problem was caused by moving from a 32-bit platform to a 64-bit platform, but that’s just speculation. It could also be some other difference in the Perl versions and libraries on the two servers.
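To confirm that the file really has stopped growing, something like the following could be used to sample its size over an interval. This is only a sketch: the function names are made up for illustration, and the path in the example comment is an assumption, so substitute whatever your bayes_path points to.

```shell
# Print the size of a file in bytes.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# Print how many bytes a file grew during a sampling interval.
# $1 = path to the file, $2 = interval in seconds (defaults to 60)
growth_over() {
    before=$(file_size "$1")
    sleep "${2:-60}"
    after=$(file_size "$1")
    echo $((after - before))
}

# Example (the path is an assumption, use your own bayes_toks location):
# growth_over /var/spool/spamassassin/bayes_toks 300
```

A result at or near zero over a few minutes on a busy server suggests the database is behaving again; a steadily large number means the problem is still there.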

In case you couldn’t tell, I was trying to use a bunch of different ways to word this problem, going off of the various Google searches I did trying to track it down. Hopefully others will hit this post in the future and it will save them some time. :-)