2016/05/09 00:09:20
MasahikoIto committed May 8, 2016
1 parent e9d799a commit e140531
Showing 5 changed files with 627 additions and 0 deletions.
158 changes: 158 additions & 0 deletions README
@@ -0,0 +1,158 @@

sf-0.1 -- spam filter for UNIX-like systems

Copyright (C) 2006 Masahiko Ito

These programs are free software; you can redistribute them and/or modify them
under the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option) any
later version.

These programs are distributed in the hope that they will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
these programs; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA

Mail suggestions and bug reports for these programs to
"Masahiko Ito" <[email protected]>

History
=======

* 2006/10/10 Ver. 0.1 release (1st)

What's this ?
=============

The author (me) found Bayesian filters hard to understand and wondered whether
a simpler approach might also produce results. I built a filter on my own
hypothesis, and it turned out to work rather well.

Preinstall
==========

sf-0.1 uses the following software, which must be installed beforehand.

* KAKASI - kanji to kana (romaji) conversion program (http://kakasi.namazu.org/)
* SQLite home page (http://www.sqlite.org/)

Install
=======

* tar xvzf sf-0.1.tar.gz
* cd sf-0.1
* cp sf_*.sh /anywhere/bin/
* mkdir ~/.sf

Algorithm
=========

The non-spam training table (hereafter the white table) holds each "term" found
in non-spam mail together with its "occurrence count". The spam training table
(hereafter the black table) holds the same for spam mail.

create table t_white (
term text primary key,
count long long int
);
create table t_black (
term text primary key,
count long long int
);

To check a mail, first split its body into "terms" (tango1..n) and count the
"occurrences" of each one (count1..n).
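
The counting step can be sketched in plain shell as shown below. This is only
an illustration for text whose terms are already separated by whitespace.
count_terms is a hypothetical helper name, not part of sf-0.1, and the real
script additionally passes the mail through nkf and kakasi to segment Japanese
text.

```shell
#!/bin/sh
# count_terms: read text on stdin, print one "count,term" line per term.
# Hypothetical helper; assumes terms are separated by spaces or tabs.
count_terms () {
    tr -s ' \t' '\n\n' |          # put one term on each line
    sed -e '/^$/d' |              # drop empty lines
    sort |
    uniq -c |                     # prefix each distinct term with its count
    awk '{ print $1 "," $2 }'     # normalize to count,term
}

printf 'buy now buy cheap\n' | count_terms
# prints:
# 2,buy
# 1,cheap
# 1,now
```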

Look up the term (tango1) in the white table and compute white_score:

white_score = (count of the term in the white table) x count1
              / (sum of the counts of all terms in the white table)

Look up the term (tango1) in the black table and compute black_score:

black_score = (count of the term in the black table) x count1
              / (sum of the counts of all terms in the black table)

Compute (white_score / (white_score + black_score)) - 0.5 and take it as the
"score" of the term (tango1). The score ranges from -0.5 to +0.5; a negative
value means the term is more typical of spam, and a positive value means it is
more typical of non-spam.

Compute the score of each remaining term in the same way and add up the scores
of all terms (tango1..n). If the sum is negative, the mail is judged to be spam.
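
The scoring rule above can be tried out with the sketch below. It reads the
tables from plain "term count" text files instead of SQLite so that it runs
without a database. score_mail and the file layout are illustrative
assumptions, not the actual sf_check.sh, and a term found in neither table is
assumed to contribute 0, which the description above does not specify.

```shell
#!/bin/sh
# score_mail WHITE BLACK MAIL
# Each file holds "term count" lines. Prints the summed score of MAIL.
# Assumption: a term that is in neither table contributes 0.
score_mail () {
    awk '
        FILENAME == ARGV[1] { white[$1] = $2; wtotal += $2; next }
        FILENAME == ARGV[2] { black[$1] = $2; btotal += $2; next }
        {
            # $1 is the term, $2 its count in the mail (count1..n)
            ws = (wtotal > 0) ? white[$1] * $2 / wtotal : 0
            bs = (btotal > 0) ? black[$1] * $2 / btotal : 0
            if (ws + bs > 0) total += ws / (ws + bs) - 0.5
        }
        END { printf "%.4f\n", total }
    ' "$1" "$2" "$3"
}

tmp=$(mktemp -d)
printf 'hello 8\nmeeting 2\n' >"$tmp/white"
printf 'viagra 6\nhello 4\n'  >"$tmp/black"
printf 'hello 1\nviagra 2\n'  >"$tmp/mail"
score_mail "$tmp/white" "$tmp/black" "$tmp/mail"    # prints -0.3333
rm -r "$tmp"
```

In this example "hello" scores 0.8 / (0.8 + 0.4) - 0.5 = 0.1667 and "viagra"
scores 0 / (0 + 1.2) - 0.5 = -0.5, so the sum is negative and the mail would be
judged spam. Note that count1 multiplies both white_score and black_score, so
it cancels out of the ratio.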

How to use
==========

sf_init.sh
----------

$ sf_init.sh -h
Usage : sf_init.sh
Initialize database.

Initializes the database used for spam classification. Run it exactly once,
before using the system for the first time.

sf_add.sh
---------

$ sf_add.sh -h
Usage : sf_add.sh [-w|--white|-b|--black] [-v|--vacuum] [file ...]
Add data to database.
-w, --white add data to white database.
-b, --black add data to black database.
-v, --vacuum vacuum after add.

Trains the database (adds data). Of the spam you receive, feed the messages that
were not correctly judged as spam with the -b option. Likewise, of the non-spam,
feed the messages that were wrongly judged as spam with the -w option. If you
train about 100 messages each of spam and non-spam before using the system, mail
is sorted with 90% or better (?) accuracy.
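
Initial training over two directories of saved mail might look like the sketch
below. train_dir, the SF_ADD override variable and the directory names are
illustrative assumptions, not part of sf-0.1. Only the -w and -b flags and the
trailing file argument come from the usage text above.

```shell
#!/bin/sh
# train_dir FLAG DIR: feed every regular file in DIR to sf_add.sh.
# FLAG is -w (non-spam) or -b (spam). SF_ADD is a hypothetical override
# that lets the command be swapped out, for example for a dry run.
train_dir () {
    for f in "$2"/*
    do
        [ -f "$f" ] || continue          # skip subdirectories and empty globs
        "${SF_ADD:-sf_add.sh}" "$1" "$f" || return 1
    done
}

# Example (directory names are placeholders):
#   train_dir -w "$HOME/Mail/ham"
#   train_dir -b "$HOME/Mail/spam"
```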

sf_del.sh
---------

$ sf_del.sh -h
Usage : sf_del.sh [-w|--white|-b|--black] [-v|--vacuum] [file ...]
Del data from database.
-w, --white del data from white database.
-b, --black del data from black database.
-v, --vacuum vacuum after del.

Untrains the database (deletes data). Use it when something was trained by
mistake.

sf_check.sh
-----------

$ sf_check.sh -h
Usage : sf_check.sh [-w|--white|-b|--black] [file ...]
Check file.
-w, --white check white?
-b, --black check black?
return 0 when check is true.
return 1 when check is false.

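Checks the contents of the input file (or stdin), prints the check score (a real
number) to stdout, and then returns 0 if the check result is true and 1 if it is
false. The check score is negative for spam and 0.0 or greater for non-spam.
With no training data, the check score is always 0.0 or greater.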

Integration with procmail
=========================

For actually sorting out spam, I think integration with procmail works well.

$ cat ~/.procmailrc
:0 HB
* ? sf_check.sh -b
/home/hoge/Mail/spam/.

sf_check.sh checks whether the mail is spam (-b); if the check is true, the mail
is stored in /home/hoge/Mail/spam/.

BUGS
====

* If the data passed to the scripts mixes several character encodings, it may
  not be classified correctly.

128 changes: 128 additions & 0 deletions sf_add.sh
@@ -0,0 +1,128 @@
#! /bin/sh
#
# spam filter programs by [email protected]
#
#----------------------------------------------------------------------
#
# functions
#
# Defined without the "function" keyword, which is not POSIX sh.
show_help () {
    echo "Usage : $0 [-w|--white|-b|--black] [-v|--vacuum] [file ...]"
    echo "Add data to database."
    echo " -w, --white add data to white database."
    echo " -b, --black add data to black database."
    echo " -v, --vacuum vacuum after add."
}
#----------------------------------------------------------------------
#
# main routine
#
if [ "X$1" = "X-h" -o "X$1" = "X--help" ]
then
    show_help
    exit 0
fi
#
if [ "X${SFDIR}" = "X" ]
then
    SFDIR=${HOME}/.sf
fi
#
if [ "X${SFDB}" = "X" ]
then
    SFDB=sf.db
fi
#
SFDB_PATH=${SFDIR}/${SFDB}
#
maxlength=50; export maxlength
# printf is used instead of the non-portable "echo -n -e".
tab=`printf '\t'`
zsp=`printf '\241\241'`    # EUC-JP full-width space (0xA1 0xA1)
#
table=""
file=""
vacuum=""
#
while [ $# != 0 ]
do
    case $1 in
    -w|--white )
        table="t_white"
        ;;
    -b|--black )
        table="t_black"
        ;;
    -v|--vacuum )
        vacuum="vacuum;"
        ;;
    * )
        file="${file} $1"
        ;;
    esac
    shift
done
#
# A target table (-w or -b) is mandatory; treat its absence as a usage error.
if [ "X${table}" = "X" ]
then
    show_help
    exit 1
fi
#
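# Convert the input to EUC-JP, segment it into words with kakasi, put one term
# per line, drop terms longer than ${maxlength}, and strip double quotes so
# that terms can be embedded in the SQL string literals below.
# ${file} is left unquoted on purpose: it is a space-separated list, and when
# it is empty cat reads stdin.
for i in `cat ${file} |\
    nkf -e -X |\
    kakasi -w -ieuc -oeuc |\
    sed -e "s/${zsp}/ /g;s/${tab}/ /g" |\
    awk '{gsub(/ /,"\n");print}' |\
    awk 'BEGIN{ \
        maxlength = ENVIRON["maxlength"]; \
    } \
    { \
        if (length($0) <= maxlength){ \
            print; \
        } \
    }' |\
    tr -d '"'`
do
    printf '%s' "$i" | tr -d '[:cntrl:]'
    echo ""
done >/tmp/sf_add.1.$$.tmp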
#
echo "begin;" >/tmp/sf_add.2.$$.tmp
#
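# uniq -c separates the count from the term with a space or a tab depending on
# the implementation. Turn the first such separator into a comma so that each
# "count,term" pair survives word splitting as a single word.
for i in `cat /tmp/sf_add.1.$$.tmp |\
    sort |\
    uniq -c |\
    sed -e "s/^ *//;s/[ ${tab}]/,/"`
do
    count=`echo $i | cut -d, -f1`
    term=`echo $i | cut -d, -f2-`
    if [ "X${term}" = "X" ]
    then
        : # do nothing
    else
        # Insert the term if it is new, otherwise add to its existing count.
        result=`echo "select term from ${table} where term=\"${term}\";" |\
            sqlite3 ${SFDB_PATH}`
        if [ "X${result}" = "X" ]
        then
            echo "insert into ${table} values (\"${term}\",${count});" >>/tmp/sf_add.2.$$.tmp
        else
            echo "update ${table} set count=count+${count} where term=\"${term}\"; " >>/tmp/sf_add.2.$$.tmp
        fi
    fi
done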
#
echo "end;" >>/tmp/sf_add.2.$$.tmp
cat /tmp/sf_add.2.$$.tmp |\
sqlite3 ${SFDB_PATH}
#
result=`echo "select sum(count) from ${table};" |\
sqlite3 ${SFDB_PATH}`
echo "update t_total set count=${result} where tablenm=\"${table}\";" |\
sqlite3 ${SFDB_PATH}
#
echo ${vacuum} |\
sqlite3 ${SFDB_PATH}
#
rm /tmp/sf_add.*.$$.tmp
#
exit 0