DSPAM v3.10.2
COPYRIGHT (C) 2002-2012 DSPAM Project
http://dspam.sourceforge.net/
LICENSE
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
CREDITS
Original Work By
Lead development till 3.8.0: Jonathan A. Zdziarski <[email protected]>
Lead development after 3.8.0: Stevan Bajic <[email protected]>
PostgreSQL driver: Rustam Aliyev <[email protected]>
External Lookup module: Hugo Monteiro <[email protected]>
Various:
Feb/2006 Cove Schneider <[email protected]>
Jan/2006 Norman Maurer <[email protected]>
Your name is missing? Let us know with a reference to your commit, and we'll
add you to the list.
COPYRIGHT
As of 12 January 2009 the copyright is owned by the DSPAM Project, represented
by a team of people, including:
Alexander Prinsier
Dov Zamir
Hugo Monteiro
Ion-Mihai Tetcu
Paul Cockings
Stevan Bajic
TABLE OF CONTENTS
General DSPAM Information
1.0 About DSPAM
1.1 Installation and Configuration
1.2 Testing
1.3 Troubleshooting
1.4 DSPAM Tools
1.5 Agent Commandline Arguments
Advanced DSPAM functionality
2.0 Linking with libdspam
2.1 Configuring groups
2.2 External Inoculation Theory
2.3 Client/Server Mode
2.4 LMTP
2.5 DSPAM User Preferences
2.6 Fallback Domains
2.7 External User Lookup
Miscellaneous
3.0 Bugs, Feature Requests
3.1 Ports / Packages
3.2 GIT Access
1.0 ABOUT DSPAM
DSPAM is an open-source, freely available anti-spam solution designed to combat
unsolicited commercial email using advanced statistical analysis. In short,
DSPAM filters spam by learning what spam is and isn't. It does this by learning
each user's individual mail behavior. This allows DSPAM to provide
highly accurate, personalized filtering for each user, even on a large system,
and makes it a solution that requires little administrative maintenance while
learning each user's email behavior with very few false positives.
While DSPAM is focused on spam filtering, many have found alternative
uses for it in all types of two-concept document classification.
DSPAM is rapidly gaining a large support forum and being used in many
large-scale implementations. Contributions to the project are welcome via the
dspam-dev mailing list or in the form of financial contributions.
Many of the foundational principles incorporated into this software were
drawn from Paul Graham's white paper on combatting spam, which can be
found at http://paulgraham.com/spam.html. Much research and development has
since added many new approaches to the DSPAM project as well, some of which
are explained in white papers on the DSPAM home page.
DSPAM can be implemented as a total solution, or as a library: developers
may link their projects to the DSPAM core engine (libdspam) in accordance with
the AGPL license agreement. This enables developers to incorporate libdspam as
a "drop-in" for instant spam filtering within their applications, such as mail
clients, other anti-spam tools, and so on.
PLEASE NOTE: DSPAM and libdspam are distributed under the AGPL license, not the
LGPL. Commercial licensing is available for those who seek to redistribute
DSPAM or some of DSPAM's components/libraries in their non-GPL products.
Please contact us for more information about commercial licensing.
The DSPAM package is split up into the following pieces:
DSPAM AGENT
The DSPAM agent is the command center for all shell and daemon operations.
If you're using DSPAM as a filtering solution, this is the 'dspam' (or dspamc)
binary you're likely going to be talking to via commandline.
LIBDSPAM: CORE ENGINE
The DSPAM core processing engine, also known as libdspam, provides all critical
spam filtering functions. The engine is embedded into other dspam components
(such as the agent) and is responsible for the actual filtering logic.
If you're not a developer, you don't need to be concerned with this component
as it is automatically compiled in with the build.
WEB UI
The Web UI (User Interface) is designed to allow end-users to review their
spam quarantine, history, and graphs, and to delete their spam permanently.
They can also optionally use the quarantine to perform all of their training.
The UI also includes some basic administrative tools to change settings and
manage user quarantines.
TOOLS
A set of basic tools is provided to manage dictionaries, automate corpus
feeding, and perform other diagnostic operations related to DSPAM. Some of
these tools are dspam_train, dspam_stats, and dspam_dump.
HISTORY OF COPYRIGHT
Original work was done by Jonathan A. Zdziarski.
In 2006 the copyright was handed over to Sensory Networks.
In 2009 Sensory Networks handed over the full copyright to the DSPAM Project,
represented by a team of people, including:
Alexander Prinsier
Dov Zamir
Hugo Monteiro
Ion-Mihai Tetcu
Paul Cockings
Stevan Bajic
1.1 INSTALLATION
IMPLEMENTATION OPTIONS
There are many different ways to deploy DSPAM onto an existing network. The
most popular approaches are:
1. As a delivery agent proxy
When your mail server gets ready to deliver mail to a user's mailbox it calls
a delivery agent of some sort. On most UNIX systems, this is procmail, maildrop,
mail.local, or a similar tool. When used as a delivery proxy, the DSPAM agent
is called in place of your existing agent - or better put, it can masquerade
as the local delivery agent. DSPAM then processes the message and will call
the /real/ delivery agent to pass the good mail into the user's mailbox,
quarantining the bad mail. DSPAM can optionally tag and deliver both spam
and legitimate mail.
In the diagram below, MTA refers to Mail Transfer Agent, or your mail server
software: Postfix, Sendmail, Exim, etc. LDA refers to the Local Delivery
Agent: Procmail, Maildrop, etc.
BEFORE:
[MTA] ---> [LDA] ---> (User's Mailbox)
AFTER:
[MTA] ---> [DSPAM] ---> [LDA] ---> (User's Mailbox)
              \
               \--> [Quarantine]
[End User] ------> [Web UI]
2. As a POP3 Proxy
If you don't want to tinker with your existing mail server setup, DSPAM can
be combined with one of a few open source programs designed to act as a POP3
proxy. This means spam is filtered whenever the user checks their mail,
rather than when it is delivered. The benefit to this is that you can set up
a small machine on your network that will connect to your existing mail server,
so no integration is needed. It also allows your users to arbitrarily point their
mail client at it if they desire filtering. The drawback to this approach is
that the POP3 protocol has no way to tell the mail client that a message is
spam, and so the user will have to download the spam (tagged, of course).
BEFORE:
[End User] ---> [POP3 Server]
AFTER:
[End User] ---> [POP3 Proxy] <--> [DSPAM]
                     \
                      \--> [POP3 Server]
3. As an SMTP Relay
Newer versions of DSPAM include features that allow it to function more
easily as an SMTP relay. An SMTP relay sits in front of your existing mail
server (requiring no integration). To use an SMTP relay, the MX records for
your domains are repointed to the relay machine running DSPAM. DSPAM then
relays the good (and optionally bad) mail to the existing SMTP server. This
allows you to use DSPAM with even a Windows-based destination mail server
as no integration is necessary. See doc/relay.txt for one example of how to
do this with Postfix.
BEFORE:
{ Internet } ---> [Company Mail Server]
AFTER:
{ Internet } ---> [ Inbound SMTP Relay ] ---> [Company Mail Server]
                  (   MTA <> DSPAM   )   SMTP
                      \                   or
                       \--> [Quarantine] LMTP
[End User] ------> [Web UI]
UPGRADING DSPAM
Please see the file UPGRADING
FRESH INSTALLATION
0. PREREQUISITES
DSPAM can use one of many different backends to store its information, and
you will need to decide on one and install the appropriate software before
you can build DSPAM. The following storage backends are presently available:
    Driver         Requirements
    -------------------------------------------------------------------------
 T  mysql_drv:     MySQL client libraries (and a server to connect to)
 T  pgsql_drv:     PostgreSQL client libraries (and a server to connect to)
    sqlite_drv:    SQLite v2.7.7 or above (scheduled for removal)
    sqlite3_drv:   SQLite v3.x
*T  hash_drv:      None (Self-Contained Hash-Based Driver)
Legend:
* Default storage driver
T Thread-safe (Required for running DSPAM in server daemon mode)
In general, MySQL is one of the faster solutions with a smaller storage
footprint and is well suited for both small and large-scale implementations.
The hash driver (inspired by Bill Yerazunis' CRM Sparse Spectra algorithm)
is the fastest solution by far and requires no dependencies. It supports
an auto-extend feature to grow the file size as needed and is very
fast and compact. It does, however, lack some features (such as merged
groups support) and uses a lot of memory because it mmap()s each user's data.
Also note that a database created with the hash driver is currently not safe
to move between 32/64 bit systems or big/little endian systems.
Documentation for any additional setup of your selected storage driver can
be found in the doc/ directory. You'll need to follow any steps outlined in
the storage driver documentation before continuing.
You can download MySQL from http://www.mysql.com.
You can download PostgreSQL from http://www.postgresql.org.
You can download SQLite from http://www.sqlite.org.
1. CONFIGURATION
DSPAM uses autoconf, so configuration is consistent with other
UNIX-based software:
./configure [options]
DSPAM supports the configuration options below. Generally, the default
configuration is more than acceptable, so it's a good idea not to tweak too
many settings unless you know what you are doing.
PATH SWITCHES
--prefix=DIR
Specify an alternative root prefix for installation. The default is
/usr/local. This does not affect the location of dspam.conf (which
defaults to /etc). Use --sysconfdir= for this.
--sysconfdir=DIR
Specify an alternative home for the dspam.conf file. The default is /etc.
--with-dspam-home=DIR
Specify an alternative DSPAM home for installation. This can alternatively
be changed in dspam.conf, but is convenient to do on the configure line.
The default is $prefix/var/dspam, or /usr/local/var/dspam.
--with-logdir=DIR
Specify an alternative log directory. The default is $dspam_home/log. Do
not set this to /var/log unless DSPAM will have permissions to write to
the directory.
FILESYSTEM SCALE
The default filesystem scale is "small-scale", and writes each user to
its own directory in the top-level DSPAM home data directory.
The following two switches change the scale to one more suitable for
larger installations.
--enable-large-scale
Switch for large-scale implementation. User data will be stored as
$HOME/data/u/s/user instead of $HOME/data/user
--enable-domain-scale
Switch for domain-scale implementation. When used, DSPAM expects
username@domain to be passed in as the user id and user data will be
stored as $HOME/data/example.org/user and $HOME/opt-in/example.org/user.dspam
instead of $HOME/data/user
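As an illustration of the three layouts above, the following C sketch maps a user id to its data directory. The function and enum names are hypothetical and not part of DSPAM. The sketch assumes the large-scale subdirectories are simply the first two characters of the user id (as the u/s/user example suggests), rejects ids shorter than two characters, and ignores the opt-in files and the --enable-homedir variant.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical scale selector; names are illustrative, not DSPAM's. */
enum fs_scale { SCALE_SMALL, SCALE_LARGE, SCALE_DOMAIN };

/*
 * Build the per-user data directory under the DSPAM home `home`.
 * Returns 0 on success, -1 on bad input or if `out` is too small.
 */
int user_data_path(const char *home, const char *user,
                   enum fs_scale scale, char *out, size_t outlen)
{
    int n = -1;

    assert(home != NULL && user != NULL && out != NULL);

    switch (scale) {
    case SCALE_SMALL:
        /* $HOME/data/user */
        n = snprintf(out, outlen, "%s/data/%s", home, user);
        break;
    case SCALE_LARGE:
        /* $HOME/data/u/s/user: first two characters become subdirectories */
        if (strlen(user) < 2)
            return -1;
        n = snprintf(out, outlen, "%s/data/%c/%c/%s",
                     home, user[0], user[1], user);
        break;
    case SCALE_DOMAIN: {
        /* user@example.org -> $HOME/data/example.org/user */
        const char *at = strchr(user, '@');
        if (at == NULL || at == user || at[1] == '\0')
            return -1;
        n = snprintf(out, outlen, "%s/data/%s/%.*s",
                     home, at + 1, (int)(at - user), user);
        break;
    }
    default:
        return -1;
    }
    /* snprintf reports the length it wanted; treat truncation as failure */
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Spreading users across two levels of single-character directories keeps any one directory from holding thousands of entries, which is the usual motivation for this kind of layout on large systems.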
INTEGRATION SWITCHES
--with-storage-driver=DRIVER[,DRIVER2[...,DRIVERN]]
Specify your storage driver selection(s). A storage driver is a driver
written specifically for DSPAM to store tokens, signature data, and
perform other proprietary operations. The default driver is hash_drv.
The following drivers have been provided:
mysql_drv: MySQL Drivers
pgsql_drv: PostgreSQL Drivers
sqlite_drv: SQLite v2.x Drivers (scheduled for removal)
sqlite3_drv: SQLite v3.x Drivers
hash_drv: Self-Contained Hash Database
If you are a packager, or wish to have multiple drivers built for any
reason, you may specify multiple drivers by separating them with commas.
This will cause the storage driver specified in dspam.conf to be
dynamically loaded at runtime rather than statically linked. If you wish
to build only one driver, but dynamically, then specify it twice as in
--with-storage-driver=mysql_drv,mysql_drv.
If you will be compiling DSPAM to operate as a server daemon or to deliver
via SMTP/LMTP, you will need to use a thread-safe driver (outlined in the
chart earlier in this document).
You may also need to use some of the driver-specific configure flags
(discussed in the DRIVER SPECIFIC CONFIGURATION OPTIONS section below).
--disable-trusted-user-security
Administrators who wish to disable trusted user security may do so by
using this configure flag. This will cause DSPAM to treat each user as
if they were "trusted" which could allow them to potentially execute
arbitrary commands on the server via DSPAM. Because of this, administrators
should only use this option on either a closed server, or configure their
DSPAM binary to be executable only by users who can be trusted. This
option SHOULD NOT be used as a solution to your MTA dropping privileges
prior to calling DSPAM. Instead, see the TRUSTED SECURITY section of this
document.
--enable-homedir
When enabled, instead of checking for $HOME/$USER/opt-in/
$USER[.dspam|.nodspam], DSPAM will check for a .dspam|.nodspam file in the
user's home directory. DSPAM will also store each user's data in ~/.dspam
when this option is enabled. Because of this, DSPAM will automatically
install and run setuid root so that it can read each user's home directory.
Note:
This function is incompatible with most implementations of the Web UI,
since it requires access to read each user's home directory. Therefore,
only use this option if you will not be using the Web UI or plan on
doing something asinine like running it as root.
--enable-daemon
Builds DSPAM with support for daemon mode, and builds associated dspamc
thin client. Pthreads is required to build for daemon mode and the
storage driver used must be thread-safe.
DRIVER SPECIFIC CONFIGURE SWITCHES
Some storage drivers have their own custom configuration switches:
mysql_drv:
--with-mysql-includes=DIR
Specify a path to the MySQL includes
--with-mysql-libraries=DIR
Specify a path to the MySQL libraries
(Currently links to -lmysqlclient, also -lcrypto on some systems)
--enable-virtual-users
Tells DSPAM to create virtual user ids. Use this if your users don't
actually exist on the system (e.g. in /etc/passwd if using a password
file)
--enable-preferences-extension
MySQL supports the preferences extension, which stores user preferences
in mysql instead of flat files (the built-in method)
--disable-mysql4-initialization
If you are compiling libdspam for use with a third party application,
and the third party application makes its own calls to libmysqlclient,
you should use this option to disable libdspam's initialization and
cleanup of libmysqlclient, and allow the application to manage this.
This option suppresses libdspam's calls to mysql_server_init and
mysql_server_end.
Note:
Please see the file doc/mysql_drv.txt for more information
about configuring the mysql_drv storage driver.
pgsql_drv:
--with-pgsql-includes=DIR
Specify a path to the PgSQL includes
--with-pgsql-libraries=DIR
Specify a path to the PgSQL libraries
(Currently links to -lpq, and netlibs on some systems)
--enable-virtual-users
Tells DSPAM to create virtual user ids. Use this if your users don't
actually exist on the system (e.g. in /etc/passwd if using a password
file)
--enable-preferences-extension
Postgres supports the preferences extension, which stores user
preferences in pgsql instead of flat files (the built-in method)
Note:
Please see the file doc/pgsql_drv.txt for more information about
configuring the pgsql_drv storage driver.
sqlite_drv:
sqlite3_drv:
--with-sqlite-includes=DIR
Specify a path to the SQLite includes
--with-sqlite-libraries=DIR
Specify a path to the SQLite libraries
DEBUGGING SWITCHES
--enable-debug
Turns on support for debugging output. This option allows you to turn on
debugging messages for all or some users by editing dspam.conf or setting
--debug on the commandline. Enabling debug in configure only adds support
for debug to be compiled in, it must still be activated using one of the
options prescribed above. Debugging support itself doesn't use up very
many additional resources, so it should be safe to leave enabled on
non-enterprise class systems.
--enable-verbose-debug
Turns on extremely verbose debugging output. --enable-debug is implied.
Never use this on production builds!
Note:
When verbose debug is compiled in, DSPAM performs many additional
mathematical calculations regardless of whether or not it's been
activated. You shouldn't use --enable-verbose-debug for production
builds unless you have serious issues you can't resolve.
FEATURE ACTIVATION
--enable-clamav
Enables support for Clam Antivirus. DSPAM can interface directly with
clamd to perform virus scanning and can be configured to react in
different ways to viruses. See dspam.conf for more information.
ADDITIONAL CONFIGURATION OPTIONS
The remainder of configuration options are located in dspam.conf, which
is installed in sysconfdir (default: /usr/local/etc) upon a make install.
It is generally a good idea to review dspam.conf and make any changes
necessary prior to using DSPAM.
2. BUILDING AND INSTALLING
After you have run configure with the correct options, build and install
DSPAM by performing:
make && make install
Note:
If you are a developer wanting to link to the core engine of dspam,
libdspam will be built during this process. Please see the
example.c file for examples of how to link to and use libdspam. Static
and dynamic libraries are built in the .libs directory. Needed headers
will be installed in $prefix/include/dspam.
3. PERMISSIONS
In the typical UNIX environment, you'll need to worry about the following
permissions:
The CGI User: This is the user your web server (most likely Apache) is
running as. This is commonly 'nobody' or 'web'. You can find this in
Apache's httpd.conf by searching for 'User'. The CGI user will need
the ability to access the following components of DSPAM:
- Ability to execute the dspam binary
- Ability to read and write to dspam_home/data/
- Trusted user permissions in dspam.conf ("Trust [username]")
- The execution 'Group' used must match the group dspam is running as
(this is typically 'mail', 'dspam', or similar)
The MTA User: This is the user your mail server software is running as when
it executes DSPAM. This is usually daemon, mail, exim, etc. To avoid security
problems, it is typically different from the user the MTA itself runs as.
Consult your MTA's documentation for more info.
The MTA user will require:
- The ability to execute the dspam binary
- Trusted user permissions in dspam.conf ("Trust [username]")
Systems Administrators: In order to perform administrative functions,
systems administrators will require:
- The ability to execute dspam-related binaries
- Trusted user permissions in dspam.conf ("Trust [username]")
Note:
If the MTA is communicating with DSPAM via LMTP (explained later), then
execution permissions are not necessary.
Note about FreeBSD:
FreeBSD's default MTA user is 'mailnull'
FreeBSD's default delivery agent also changes its uid, and so in order
to call it, dspam must be installed as setuid root to work on the
commandline properly. This is done automatically on install.
Understanding Trusted User Security
DSPAM has tighter security for untrusted users on the system to prevent
them from touching other users' data or passing arbitrary commands to the
delivery agent DSPAM calls. "Trusted User Security" is a simple system
whereby any unsafe functions are not available to a user calling dspam
unless they are within dspam.conf's trusted user list.
Local non-privileged users should be able to use DSPAM without any problems
while remaining untrusted, as long as they behave. For example, an untrusted
user cannot set their DSPAM username to any name other than their own username.
Untrusted users are also limited to the delivery options set by the
system administrator, and cannot redirect how DSPAM delivers mail.
A list of trusted users is maintained in dspam.conf. This file should
include a list of trusted users who should be allowed to set the dspam user,
passthru parameters, and other information that would be potentially
dangerous for a malicious user to be able to set. You'll need to ensure
that your CGI user, MTA user, and system administrators are on the list.
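The rule that an untrusted user cannot choose another DSPAM username can be sketched as follows. This is a hypothetical illustration, not DSPAM's code: it assumes the trusted list has already been read from the "Trust" lines of dspam.conf into an array of names, and that `requested` is whatever the caller passed with --user (or NULL if nothing was passed).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `caller` appears in the trusted list, 0 otherwise. */
int is_trusted(const char *caller, const char *const *trusted, size_t ntrusted)
{
    for (size_t i = 0; i < ntrusted; i++)
        if (strcmp(caller, trusted[i]) == 0)
            return 1;
    return 0;
}

/*
 * Choose the DSPAM user to operate on. A trusted caller may name any
 * user; an untrusted caller is always forced back to their own login
 * name, whatever they requested.
 */
const char *effective_user(const char *caller, const char *requested,
                           const char *const *trusted, size_t ntrusted)
{
    assert(caller != NULL);
    if (requested != NULL && is_trusted(caller, trusted, ntrusted))
        return requested;
    return caller;
}
```

The same gate would apply to any other "unsafe" option, such as pass-thru delivery arguments: the untrusted caller's value is ignored and the administrator's default is used instead.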
4. MAIL SERVER INTEGRATION
As previously mentioned, there are three popular ways to implement DSPAM:
As a delivery proxy:
The default approach integrates DSPAM directly with the mail server and
filters spam as mail comes in. Please see the appropriate instructions
in doc/ pertaining to your MTA.
As a POP3 proxy:
This alternative approach implements a POP3 proxy where users
connect to the proxy to check their email, and email is filtered when
being downloaded. The POP3 proxy is a much easier approach, as it
requires much less integration work with the mail server (and is ideal
for implementing DSPAM on Exchange, etcetera). Please see the file
doc/pop3filter.txt.
As an SMTP Relay:
DSPAM can be configured as an SMTP relay, a.k.a. an appliance. You
can set it up to sit in front of your real mail server and then point
your MX records at it. DSPAM will then pass along the good mail to
your real SMTP server. See doc/relay.txt for more information. The
example provided uses Postfix and MySQL.
Trusted users and the MTA
If you are using an MTA that changes its userid to match the destination
user before calling DSPAM, you won't be able to provide pass-thru
arguments to DSPAM (these are the commandline arguments that DSPAM in turn
passes to the local delivery agent, in such a configuration).
You will need to pre-configure the "default" pass-thru arguments in DSPAM.
This can be done by declaring an untrusted delivery agent in dspam.conf.
When DSPAM is called by an untrusted user, it will automatically force their
DSPAM user id and passthru delivery agent arguments specified in dspam.conf.
This information will override any passthru commandline parameters
specified by the user. For example:
UntrustedDeliveryAgent "/bin/mail -d $u"
The variable $u informs DSPAM that you would like the destination username
to be used in the position $u is specified, so when DSPAM calls your LDA
for user 'bob', it will call it with:
/bin/mail -d bob
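A minimal C sketch of the $u substitution described above. The function name is hypothetical; it handles only $u, leaves any other $ sequence untouched, and does not split the result into arguments or run the agent, both of which a real delivery path would also have to do.

```c
#include <assert.h>
#include <string.h>

/*
 * Expand every "$u" in an UntrustedDeliveryAgent-style template with
 * `user`; all other characters are copied unchanged. Returns 0 on
 * success, -1 if the result (plus its terminator) does not fit in `out`.
 */
int expand_agent(const char *tmpl, const char *user, char *out, size_t outlen)
{
    size_t ulen, o = 0;

    assert(tmpl != NULL && user != NULL && out != NULL && outlen > 0);
    ulen = strlen(user);
    for (const char *p = tmpl; *p != '\0'; p++) {
        if (p[0] == '$' && p[1] == 'u') {
            if (o + ulen >= outlen)
                return -1;
            memcpy(out + o, user, ulen);
            o += ulen;
            p++; /* also skip the 'u' */
        } else {
            if (o + 1 >= outlen)
                return -1;
            out[o++] = *p;
        }
    }
    out[o] = '\0';
    return 0;
}
```

For example, expanding "/bin/mail -d $u" for user bob yields "/bin/mail -d bob", matching the command shown above.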
5. ALIASES
There are essentially two different ways a user might train DSPAM. The first
is by using the Web UI, which allows them to retrain via the "History"
tab. This works quite well, as users must visit the Web UI occasionally
to review their quarantine anyway (and reverse any false positives). We'll
discuss this shortly in section 1.1.8.
The more common approach to training, discussed here, is to allow users to
simply forward their spam to an email address where DSPAM can analyze and
learn it. DSPAM uses a signature-based system, where a serial number of
sorts is appended to each email processed by DSPAM. DSPAM reads this serial
number when the user forwards (or bounces) a message to what is called their
"spam email address". The serial number points to temporary information
stored on the server (for 14 days by default) containing all of the
information necessary for DSPAM to relearn the message. This is necessary
in order to relearn the *exact* message DSPAM originally processed.
Note:
If you are using an IMAP based system, Web-based email, or other form of
email management where the original messages are stored on the server in
pristine format, you can turn this signature feature off by setting
"TrainPristine on" in dspam.conf. DSPAM will then use the message itself
that you provide it to train, which MUST be identical to the original
message in order to retrain properly.
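To make the signature lookup concrete, here is a hypothetical C sketch that pulls a serial number out of a forwarded message body. It assumes the signature is embedded as a text marker of the form !DSPAM:<serial>!; treat that format, and the function name, as assumptions for illustration. Looking the serial up in the storage driver and retraining are not shown.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy the serial from the first "!DSPAM:<serial>!" marker in `body`
 * into `out`. The marker format is an assumption. Returns 0 on success,
 * -1 if no complete marker is found or the serial does not fit.
 */
int find_signature(const char *body, char *out, size_t outlen)
{
    static const char tag[] = "!DSPAM:";
    const char *start, *end;
    size_t len;

    assert(body != NULL && out != NULL && outlen > 0);
    start = strstr(body, tag);
    if (start == NULL)
        return -1;
    start += sizeof tag - 1;      /* skip past "!DSPAM:" */
    end = strchr(start, '!');     /* closing delimiter */
    if (end == NULL || end == start)
        return -1;
    len = (size_t)(end - start);
    if (len >= outlen)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

Because the serial only points at data stored on the server, a forwarded message without an intact marker cannot be retrained this way, which is why TrainPristine exists for setups that keep the original message.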
Because DSPAM learns each user's specific email behavior, it's necessary
to identify the user in order to program their specific filtering database.
This can be done in one of three ways:
The Simple Way:
If you are using the MySQL or PgSQL storage drivers, the original
numeric user id can be embedded in the signature, requiring only one
central spam alias to be necessary for the entire system. To configure
this, uncomment the appropriate UIDInSignature option in dspam.conf:
# MySQLUIDInSignature on
# PgSQLUIDInSignature on
Now all you'll need is a single system-wide alias, and DSPAM will train
the appropriate user when it sees the signature. An example of an alias
might look like:
spam:"|/usr/local/bin/dspam --user root --class=spam --source=error"
Similarly, you may also wish to have a false-positive alias for users who
prefer to tag spam rather than quarantine it:
notspam:"|/usr/local/bin/dspam --user root --class=innocent --source=error"
Note:
The 'root' user represents any active dspam user. It is necessary to
supply a username on the commandline or DSPAM will bail on
an error, however the user will be changed internally once the signature
is read.
The Kind-of-Simple Way:
If you're not using one of the above storage drivers, the next easiest
way to configure aliases is to have DSPAM parse the 'To:' header of the
message and use a catch-all subdomain to direct all mail into DSPAM for
retraining. You can then instruct your users to email addresses like
'[email protected]'. The ParseToHeaders option (available
in dspam.conf) will parse the To: header of forwarded messages and
set the username to either 'bob' or '[email protected]', depending
on how it is configured. DSPAM can also set the training mode to either
"learn spam" or "learn notspam" depending on whether the user specified
a spam- or notspam- address in the To: header.
This is ideal if you don't want to set up a separate alias for each user
on your system (The Hard Way). If you're fortunate enough to have a
mail server that can perform regular expression matching, you can set up
your system without a subdomain, and just use addresses like
[email protected]. For the rest of us, it will be necessary to set up
a subdomain catch-all directly into DSPAM. For example:
@relearn.example.org "|/usr/local/bin/dspam"
Don't forget to set the appropriate ParseToHeaders and related options in
dspam.conf as well. More specific instructions can be found in dspam.conf
itself. In most cases, the following will suffice:
ParseToHeaders on
ChangeUserOnParse user
ChangeModeOnParse on
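The effect of ParseToHeaders with "ChangeUserOnParse user" and "ChangeModeOnParse on" can be sketched as below. This is not DSPAM's parser: the function and enum names are hypothetical, and it assumes the bare address has already been extracted from the To: header (no display name or angle brackets) and that the spam- and notspam- prefixes are literal.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical training mode derived from the address prefix. */
enum train_mode { MODE_NONE, MODE_SPAM, MODE_INNOCENT };

/*
 * Parse an address such as "spam-bob@relearn.example.org".
 * On success, writes the bare user ("bob") to `user` and returns the
 * training mode; returns MODE_NONE if there is no recognized prefix.
 */
enum train_mode parse_relearn_addr(const char *addr, char *user, size_t userlen)
{
    enum train_mode mode;
    const char *rest, *at;
    size_t len;

    assert(addr != NULL && user != NULL && userlen > 0);
    if (strncmp(addr, "notspam-", 8) == 0) {
        mode = MODE_INNOCENT;
        rest = addr + 8;
    } else if (strncmp(addr, "spam-", 5) == 0) {
        mode = MODE_SPAM;
        rest = addr + 5;
    } else {
        return MODE_NONE;
    }
    /* The user name runs from after the prefix up to the '@' (if any). */
    at = strchr(rest, '@');
    len = at ? (size_t)(at - rest) : strlen(rest);
    if (len == 0 || len >= userlen)
        return MODE_NONE;
    memcpy(user, rest, len);
    user[len] = '\0';
    return mode;
}
```

Keeping the domain part instead (the other ChangeUserOnParse behavior mentioned above) would only change which slice of the address is copied into `user`.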
The Old Way (A.K.A. The Hard Way)
If neither of the easy ways is possible, you're stuck with doing it
the hard way. This means you'll need a separate spam alias (and notspam
alias, if users are tagging mail) for each user. To do this, you will
need to create an email address for each user, so that DSPAM can
analyze and learn for that specific user. For example:
spam-bob: "|/usr/local/bin/dspam --user bob --class=spam --source=error"
You will end up having one alias per mail user on the system, two if you
do not use DSPAM's CGI quarantine (an additional one using notspam-). Be
sure the aliases are unique and each username matches the name after the
--user flag. A tool has been provided called dspam_genaliases. This tool
will read the /etc/passwd file and write out a dspam aliases file that can
be included in your master aliases table.
To report spam, the user should be instructed to forward each spam to
spam-user@yourhost
It doesn't really matter what you name these aliases, so long as the flags
being passed to dspam are correct for each user. It might be a good idea
to create an alias custom to your network, so that spammers don't forward
spam into it. For example, notspam-yourcompany-bob or something.
Note About Security:
You might be wondering if a user can forward a spam to another user's
address, or whether a spammer can forward a spam to another user's
notspam address. The answer is "no". The key to all mail-based retraining
is the signature embedded in each email. The signature is stored with
each user's own user id, so not only does the incoming message have
to bear a valid signature, but that signature also has to be stored on the
system under the correct user id. This prevents any kind of alias abuse.
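The ownership check described above can be modeled as a lookup keyed on both the user id and the signature id. The minimal Python sketch below is a toy model (SignatureStore and can_retrain are hypothetical names), not DSPAM's storage code. It only illustrates why a signature issued to one user cannot be used to retrain another.

```python
from typing import Dict, Optional, Tuple


class SignatureStore:
    """Toy model: each signature is stored under the user id it was issued to."""

    def __init__(self) -> None:
        self._sigs: Dict[Tuple[str, str], bytes] = {}

    def store(self, user: str, sig_id: str, data: bytes) -> None:
        self._sigs[(user, sig_id)] = data

    def lookup(self, user: str, sig_id: str) -> Optional[bytes]:
        # The key includes the user, so bob's signature is not found for alice.
        return self._sigs.get((user, sig_id))


def can_retrain(store: SignatureStore, user: str, sig_id: str) -> bool:
    """Retraining is allowed only if this user owns a matching signature."""
    return store.lookup(user, sig_id) is not None
```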
6. NIGHTLY MAINTENANCE AND HOUSEKEEPING CRONS
Non-SQL Based Nightly Purge
If you are NOT running a SQL-based solution, then you should configure
dspam_clean to run under cron nightly. This clean tool will read all
signature databases and purge signatures that are older than 14 days
(configurable), purge abandoned tokens, and remove unimportant tokens.
Without this tool, old signatures will continue to pile up.
Be sure the user running cleanup has full read/write permissions on the
DSPAM data files.
0 0 * * * /usr/local/bin/dspam_clean [options]
See the dspam_clean description for more information.
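The age-based part of this cleanup amounts to dropping every signature created before a cutoff. This hypothetical Python sketch models signatures as a dict mapping signature id to creation time and applies the 14-day default. It does not cover token purging and does not read DSPAM's actual data files.

```python
from datetime import datetime, timedelta
from typing import Dict


def purge_old_signatures(signatures: Dict[str, datetime], now: datetime,
                         max_age_days: int = 14) -> Dict[str, datetime]:
    """Return only the signatures created within the last max_age_days days."""
    cutoff = now - timedelta(days=max_age_days)
    # Anything created before the cutoff is considered stale and dropped.
    return {sig: created for sig, created in signatures.items() if created >= cutoff}
```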
SQL-Based Nightly Purge
SQL-based solutions include a nightly SQL script that performs the same basic
tasks as dspam_clean, much faster and with more finesse.
Instructions for each driver's purge functions and nightly maintenance can
be found in the driver's README (doc/[driver].txt). Most SQL drivers include
a purge script in the src/tools.[driver] directory. For example:
0 0 * * * mysql --user=[user] --pass=[pass] [db] < /path/to/purge-4.1.sql
Log Rotation
The system log and user logs can fill up fairly quickly, even though all
that's really needed to generate graphs is the last two to three weeks of data.
You can configure a nightly log cleanup using dspam_logrotate:
0 0 * * * dspam_logrotate -a 30 -d /usr/local/var/dspam/data
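As a rough model of what age-based log rotation does, the Python sketch below keeps only lines newer than a given number of days (30, as in the -a 30 example). It assumes each log line begins with a UNIX timestamp followed by a tab; check your own log files before relying on that, since this is not dspam_logrotate's code. Lines that cannot be parsed are kept rather than discarded, so a format mismatch never silently destroys data.

```python
import time
from typing import Iterable, List, Optional


def rotate_log_lines(lines: Iterable[str], max_age_days: int = 30,
                     now: Optional[float] = None) -> List[str]:
    """Keep log lines whose leading timestamp is within max_age_days of now.

    Assumes each line starts with a UNIX timestamp followed by a tab.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * 86400  # seconds per day
    kept = []
    for line in lines:
        first = line.split("\t", 1)[0]
        try:
            ts = float(first)
        except ValueError:
            kept.append(line)  # unparseable: keep it rather than lose data
            continue
        if ts >= cutoff:
            kept.append(line)
    return kept
```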
7. NOTIFICATIONS
DSPAM is capable of sending three different notifications to users:
- A "First Run" message sent to each user when they receive their first
message through DSPAM.
- A "First Spam" message sent to each user when they receive their first
spam.
- A "Quarantine Full" message sent to each user when their quarantine box
exceeds 2MB in size (note: the 2MB limit is hardcoded in DSPAM).
These notifications can be activated by copying the txt/ directory from the
distribution into DSPAM's home (by default /usr/local/var/dspam). You can
alter the location of this directory by setting "TxtDirectory" in dspam.conf.
Example:
/usr/local/var/dspam/txt/firstrun.txt
/usr/local/var/dspam/txt/firstspam.txt
/usr/local/var/dspam/txt/quarantinefull.txt
You will want to modify these templates prior to installing them to reflect the
correct email addresses and URLs (look for 'example.org').
NOTE: The quarantine warning is reset when the user clicks 'Delete All', but
is not reset if they use 'Delete Selected'. If the user doesn't wish to
receive reminders, they should use the 'Delete Selected' function instead
of 'Delete All'.
You'll also need to set "Notifications" to "on" in dspam.conf.
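The interaction between the three notifications and the two delete functions can be summarized as a small state machine. The Python sketch below is a hypothetical model (the class and method names are invented for illustration) that sends each notice at most once and re-arms only the quarantine warning on 'Delete All'. It assumes the 2MB limit means 2 * 1024 * 1024 bytes, which may not match DSPAM's exact constant.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: "2MB" taken as 2 * 1024 * 1024 bytes.
QUARANTINE_LIMIT = 2 * 1024 * 1024


@dataclass
class UserNotifyState:
    seen_first_message: bool = False
    seen_first_spam: bool = False
    quarantine_warned: bool = False
    sent: List[str] = field(default_factory=list)  # names of notices sent

    def on_message(self, is_spam: bool, quarantine_bytes: int) -> None:
        if not self.seen_first_message:
            self.seen_first_message = True
            self.sent.append("firstrun")
        if is_spam and not self.seen_first_spam:
            self.seen_first_spam = True
            self.sent.append("firstspam")
        if quarantine_bytes > QUARANTINE_LIMIT and not self.quarantine_warned:
            self.quarantine_warned = True
            self.sent.append("quarantinefull")

    def delete_all(self) -> None:
        # 'Delete All' resets the warning, so a later overflow warns again.
        self.quarantine_warned = False

    def delete_selected(self) -> None:
        # 'Delete Selected' leaves the warning flag set: no further reminders.
        pass
```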
8. THE WEB UI
The Web UI (CGI client) can be run from any executable location on
a web server, and detects its user's identity from the REMOTE_USER
environment variable. This means you'll need to use HTTP password
authentication to access the CGI (any type of authentication will work,
as long as Apache supports the module). This is also convenient in that you
can set up authentication using almost any existing system you have.
The only catch is that you'll need the usernames to match the actual
DSPAM usernames used on the system. A copy of the shadow password file
will suffice for most common installs.
The accompanying files in the webui/ folder should be copied into your
document root and cgi-bin, as specified.
Note:
Some authentication mechanisms are case insensitive and will
authenticate users regardless of the case they type their username in. DSPAM,
on the other hand, is case sensitive and the case of the username used
will need to match the case on the system. If you suffer from this
authentication problem, and are certain all of your users' usernames are
in lowercase, you can add the following line of code to the CGI right
after the call to &ReadParse...
$ENV{'REMOTE_USER'} = lc($ENV{'REMOTE_USER'});
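For a CGI written in a language other than Perl, the same identity lookup and optional lowercasing could look like the Python sketch below. The function name dspam_username and the force_lowercase switch are hypothetical. Enable lowercasing only under the same condition as the Perl line above, that is, when all of your users' usernames are lowercase.

```python
import os
from typing import Mapping, Optional


def dspam_username(environ: Mapping[str, str] = os.environ,
                   force_lowercase: bool = False) -> Optional[str]:
    """Return the DSPAM user for this CGI request, or None if unauthenticated."""
    user = environ.get("REMOTE_USER", "").strip()
    if not user:
        return None  # the web server did not authenticate this request
    # DSPAM is case sensitive, so only lowercase when that is known to be safe.
    return user.lower() if force_lowercase else user
```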
The CGI will need to run in the same group as the dspam agent in order
to work with the files in dspam_home. The best way to do this is to create
a separate virtualhost specifically for the CGI and assign it to run in the
MTA group using Apache's suexec. If you are using procmail, additional
configuration may also be necessary (see below).
Note:
Apache users do NOT take on the identity of the groups specified in
/etc/group so you will need to specifically assign the group in
httpd.conf.
Note about Procmail:
Because the DSPAM Web UI is a CGI script, DSPAM will not retain its
setuid privileges when called. If you are running procmail, this will
become a problem as procmail requires root privileges to deliver. The
easiest hack around this is to create a procmail.dspam binary and make it
setuid root, then make it executable only by the mail group (or
whatever group DSPAM and the CGI run in).
The DSPAM Web UI has a minimal configuration inside the configure.pl script.
You'll want to check and make sure all of the settings are correct. In
most cases, the only settings that will need to be changed are the large-scale
or domain-scale flags.
BEFORE PROCEEDING:
Check and make sure (again) that the CGI user from Apache's httpd.conf is
added as a trusted user in dspam.conf.
Default Preferences
Now would be a good time to set the system's default preferences. This can
be done using the dspam_admin tool. For example:
dspam_admin ch pref default trainingMode TEFT
dspam_admin ch pref default spamAction quarantine
dspam_admin ch pref default spamSubject "[SPAM]"
dspam_admin ch pref default enableWhitelist on
dspam_admin ch pref default showFactors off
The default preferences are used for any users who have not yet set their
own preferences. You can also control which preferences the user may
override by changing the "AllowOverride" settings in dspam.conf.
By default, the parameters specified on the commandline will be used (if
any). If, however, a preference is found for the particular user, that
preference will override the commandline.
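The precedence rules above can be summarized in a few lines. This is a simplified Python model, not DSPAM's code: it assumes a user's own preferences replace the defaults as a whole when any exist, and that AllowOverride acts as a set of preference names that may replace a commandline value. The function name effective_preferences is hypothetical.

```python
from typing import Dict, Mapping, Optional, Set


def effective_preferences(cmdline: Mapping[str, str],
                          user_prefs: Optional[Mapping[str, str]],
                          default_prefs: Mapping[str, str],
                          allow_override: Set[str]) -> Dict[str, str]:
    """Simplified model of how a user's effective settings are chosen."""
    effective = dict(cmdline)  # start from the commandline parameters
    # Users with no preferences of their own fall back to the defaults.
    prefs = user_prefs if user_prefs else default_prefs
    for key, value in prefs.items():
        if key in allow_override:  # only overridable settings may change
            effective[key] = value
    return effective
```

With trainingMode listed as overridable, a user with no preferences of their own picks up trainingMode TEFT from the defaults set above, whatever the commandline passed.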
GD Graphing Library
If you plan on leaving DSPAM's logging function enabled, and would like to
produce pretty graphs for your users, the graph.cgi script requires the
following be installed on your machine:
- GD Graphics Library (http://www.boutell.com/gd/)
Compile with PNG support
- The following Perl modules:
(http://www.perl.com/CPAN/modules/by-module/GD/)
. GD
. GD-Graph3d
. GDGraph
. GDTextUtil
. CGI
Typically this can be accomplished on the commandline:
perl -MCPAN -e 'install GD::Graph3d'
Configuring Administrators
Once you've configured the Web UI, you'll want to edit the 'admins' file to
contain a list of users who are permitted to use the administration suite.
Configuring Sub-Administrators / Domain Level Administrators
It is possible to delegate the management of users to a list of
sub-admins/domain level admins. To do so, edit the 'subadmins' file to
contain a list of sub-admins/domain level admins who are permitted to
switch their username while using the DSPAM control center.
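The admins and subadmins files are simple user lists. Assuming one username per line, with blank lines and '#' comments ignored (an assumption; check the files shipped in webui/), a Python sketch for reading one looks like this. The function name load_user_list is hypothetical.

```python
from typing import Iterable, Set


def load_user_list(lines: Iterable[str]) -> Set[str]:
    """Parse an admins/subadmins-style file: one username per line."""
    users = set()
    for line in lines:
        name = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if name:
            users.add(name)
    return users
```

For example, `with open("admins") as fh: is_admin = "bob" in load_user_list(fh)`.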
Opt-In/Out
If you would like your users to be able to opt in/out of DSPAM filtering,
add the correct option to the nav_preferences.html template, depending on
your configuration (for example, if you have an opt-in system, you'll want to
add the opt-in option). Note: This currently only works with the preferences
extension, and not drop files.
<INPUT TYPE=CHECKBOX NAME=optIn $C_OPTIN$>
Opt into DSPAM filtering
<INPUT TYPE=CHECKBOX NAME=optOut $C_OPTOUT$>
Opt out of DSPAM filtering
1.2 TESTING
If you've installed from an RPM, there's a good chance that the packager
went to the trouble of testing already. If you're building from sources,
however, you'll need to find a way to ensure your configuration isn't broken.
Most software packages are supplied with a test suite to determine if the
software is functioning properly. Since DSPAM's correct function relies
primarily on having the correct permissions and mail server configuration,
a test script alone cannot provide the level of testing required for such a
package. The following exercise has been provided to test dspam's correct
functioning on your system. This exercise does not test the Web UI, but only
the core dspam agent.
Before running the test, you should have completed section 1.1's instructions
for compiling and installing dspam as well as configured your mail server
to support dspam.
1. Create a new user account on your system. It is important that this be a
new account to prevent any unrelated email from being delivered during
testing. Be sure to configure a spam alias for the test account.
2. Send a short (10 words or less) email to the account, and pick it up
using your favorite mail client.
3. Run dspam_stats [username] on the server. You should see a value of 1
for "TN" ("True Negatives") as shown below:
dspam-test 0 TP 1 TN 0 FN 0 FP
If you receive an error such as "unable to open /usr/local/var/dspam... for
reading", then the dspam agent is not configured correctly. The problem
could exist in either your mail server configuration or one or more of the
permissions on the directory or agent. Check your configuration and
permissions, and repeat this step until you get the correct results.
4. Run dspam_dump [username] to get a complete list of tokens and their
statistics. Each token should have an I: (innocent) hit count of 1. The
tokens will be represented as 64-bit values, for example:
3126549390380922317 S: 0 I: 1 LH: Mon Aug 4 11:40:12 2003
13884833415944681423 S: 0 I: 1 LH: Mon Aug 4 11:40:12 2003
14519792632472852948 S: 0 I: 1 LH: Mon Aug 4 11:40:12 2003
8851970219880318167 S: 0 I: 1 LH: Mon Aug 4 11:40:12 2003
To view statistics for a particular token, run dspam_dump [username] [token]
where token is the plain-text token value. For example:
% dspam_dump bill FREE
7717766825815048192 S: 00265 I: 00068 P: 0.7358
5. Forward the test message to the spam alias you've created for the test
account. Allow enough time for the message to be processed.
6. Run dspam_stats [username] on the server again. Now, the value for TN
should be zero and the value for FN (false negatives) should be 1 as shown
below:
dspam-test 0 TP 0 TN 1 FN 0 FP
If this is not the case, check the group permissions of the dspam agent as
well as the permissions your MTA uses when piping to aliases.
7. Run dspam_dump [username] again. Make sure that _EVERY_ token now has an
I: of zero and an S: of 1:
3126549390380922317 S: 1 I: 0 LH: Mon Aug 4 11:44:29 2003
13884833415944681423 S: 1 I: 0 LH: Mon Aug 4 11:44:29 2003
14519792632472852948 S: 1 I: 0 LH: Mon Aug 4 11:44:29 2003
8851970219880318167 S: 1 I: 0 LH: Mon Aug 4 11:44:29 2003
If you have some tokens that do not have both an S: of 1 and an I: of 0, the
dspam signature was not found on the email, which can have a number of
causes.
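If you repeat this exercise often, checking the output by eye gets tedious. The Python sketch below parses the single-line dspam_stats format and the dspam_dump lines shown in the steps above, and reports any tokens that fail the check in step 7. It assumes exactly the output formats shown here; other DSPAM versions may print statistics differently. Both function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Matches the leading "<token> S: <n> I: <n>" part of a dspam_dump line.
DUMP_RE = re.compile(r"^(\d+)\s+S:\s*(\d+)\s+I:\s*(\d+)")


def parse_stats_line(line: str) -> Dict[str, int]:
    """Parse 'dspam-test 0 TP 1 TN 0 FN 0 FP' into {'TP': 0, 'TN': 1, ...}."""
    fields = line.split()[1:]  # drop the username
    return {fields[i + 1]: int(fields[i]) for i in range(0, len(fields) - 1, 2)}


def bad_tokens_after_retrain(dump_lines: Iterable[str]) -> List[str]:
    """Return tokens that do not show S: 1 and I: 0 (step 7)."""
    bad = []
    for line in dump_lines:
        m = DUMP_RE.match(line.strip())
        if m and (int(m.group(2)), int(m.group(3))) != (1, 0):
            bad.append(m.group(1))
    return bad
```

An empty list from bad_tokens_after_retrain() means every token in the dump was retrained as spam.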
1.3 TROUBLESHOOTING
Problem: No files are being created in the user directory
Solution: Check the permissions of the user directory. It must be
writable by the user the dspam agent is running as, as well as by
the CGI user.
Problem: False positives are never being delivered
Solution: Your CGI most likely doesn't have the privileges required by