
[Discussion] Collecting DSPAM usage results

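Hi all,
The purpose of this survey is to find out roughly how many people use dspam, and what results they see after using it for at least half a month.
It also gives everyone a place to compare notes, and it will be an important reference for whether the community improves dspam in the future. If you are passing by, please reply. Thank you.

Please do not post anything unrelated to dspam results in this thread. Thanks.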

2010-01-12 22:00: about 100 users, running Spam-lock + Dspam + Clamav, with the official Extmail training corpus imported.
2010-01-13 15:30: recognition rates are shown in the table below:
Metric                                   Value     Calculated as
Overall accuracy (since last reset)      90.233%   (SPAM messages caught + Good messages delivered) / Total number of messages
Spam identification (since last reset)   76.623%   (Spam catch rate only)
Spam ratio (of total processed)          44.238%   Total SPAM messages (both caught & missed) / Total number of messages

                            SPAM messages     Good messages
Since last reset            18 missed         3 missed
                            59 caught         135 delivered
                            76.623% caught    2.174% missed
Total processed by filter   481 missed        16 missed
                            2974 caught       4339 delivered
From corpus                 493 fed           10 fed
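
The percentages in this report follow from the four raw counts in each row. As a minimal sketch in Python (the function name `dspam_metrics` and its dictionary keys are made up for illustration and are not part of DSPAM), the figures can be recomputed like this:

```python
def _pct(part, whole):
    """Return part/whole as a percentage, or 0.0 when there is no mail."""
    return 100.0 * part / whole if whole else 0.0


def dspam_metrics(spam_missed, spam_caught, good_missed, good_delivered):
    """Recompute the report's percentages from one row of raw counts.

    Hypothetical helper: it only mirrors the formulas printed in the
    "Calculated as" column of the statistics page.
    """
    total_spam = spam_missed + spam_caught
    total_good = good_missed + good_delivered
    total = total_spam + total_good
    return {
        # (SPAM caught + Good delivered) / Total number of messages
        "overall_accuracy": _pct(spam_caught + good_delivered, total),
        # Spam catch rate only
        "spam_identification": _pct(spam_caught, total_spam),
        # Good messages wrongly flagged ("% missed" in the Good column)
        "good_missed_rate": _pct(good_missed, total_good),
        # Total SPAM (caught + missed) / Total number of messages
        "spam_ratio": _pct(total_spam, total),
    }


# "Since last reset" row from the 2010-01-13 report
since_reset = dspam_metrics(18, 59, 3, 135)
# "Total processed by filter" row, which is what "Spam ratio" is based on
total_processed = dspam_metrics(481, 2974, 16, 4339)

print(round(since_reset["overall_accuracy"], 3))
print(round(since_reset["spam_identification"], 3))
print(round(total_processed["spam_ratio"], 3))
```

Feeding in the "Since last reset" counts reproduces the overall accuracy and spam identification figures, while the spam ratio only matches when the "Total processed by filter" counts are used.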


2010-01-15 16:00: recognition rates
Metric                                   Value     Calculated as
Overall accuracy (since last reset)      81.81%    (SPAM messages caught + Good messages delivered) / Total number of messages
Spam identification (since last reset)   74.04%    (Spam catch rate only)
Spam ratio (of total processed)          50.01%    Total SPAM messages (both caught & missed) / Total number of messages

                            SPAM messages     Good messages
Since last reset            413 missed        13 missed
                            1178 caught       738 delivered
                            74.041% caught    1.731% missed
Total processed by filter   877 missed        26 missed
                            4093 caught       4942 delivered
From corpus                 1080 fed          13 fed

2010-01-17: delivery status over the past 24 hours was "458 SPAM, 96 Good, 5 Spam Misses, 1 False Positives", so recognition has basically reached 99%.
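
As a quick check of the 99% figure, the summary line can be turned into an accuracy number. This is a sketch in Python that assumes the four counts are disjoint (that is, "458 SPAM" and "96 Good" do not already include the misses and the false positive), which the post does not state; `accuracy_from_summary` is a hypothetical helper, not a DSPAM tool.

```python
import re


def accuracy_from_summary(summary):
    """Parse a '458 SPAM, 96 Good, ...' line and return accuracy in percent.

    Assumption: the four counts do not overlap, so the total is their sum.
    """
    counts = {
        label.strip().lower(): int(number)
        for number, label in re.findall(r"(\d+)\s+([A-Za-z ]+)", summary)
    }
    correct = counts["spam"] + counts["good"]
    wrong = counts["spam misses"] + counts["false positives"]
    return 100.0 * correct / (correct + wrong)


line = "458 SPAM, 96 Good, 5 Spam Misses, 1 False Positives"
print(f"{accuracy_from_summary(line):.2f}%")
```

Under that assumption the result is 554 correct out of 560 messages, a little under 99%.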

2010-01-31 20:00: recognition rates
Metric                                   Value     Calculated as
Overall accuracy (since last reset)      99.02%    (SPAM messages caught + Good messages delivered) / Total number of messages
Spam identification (since last reset)   99.73%    (Spam catch rate only)
Spam ratio (of total processed)          50.906%   Total SPAM messages (both caught & missed) / Total number of messages

                            SPAM messages     Good messages
Since last reset            1 missed          10 missed
                            370 caught        721 delivered
                            99.730% caught    1.368% missed
Total processed by filter   954 missed        241 missed
                            7953 caught       8349 delivered
From corpus                 1222 fed          121 fed
ExtMail mail development network
liushaobo@extmail.org
Exploring high-performance Anti-Spam combinations
A rare chance to grab the first reply:

Here are the results over 4-5 days:
                            SPAM messages     Good messages
Since last reset            31 missed         2 missed
                            2110 caught       4215 delivered
                            98.552% caught    0.047% missed
Since last reset            1 missed          1 missed
                            242 caught        29 delivered
                            99.588% caught    3.333% missed
Total processed by filter   538 missed        16 missed
                            4140 caught       4434 delivered
From corpus                 310 fed           8 fed

However, my mail headers still show nothing from dspam. Does good mail not pass through dspam at all?
I just tidied up the layout for you.

[ Last edited by liushaobo on 2009-5-8 10:50 ]
----------------------------------- 7 days of statistics from one customer (with a whitelist configured so that whitelisted mail bypasses dspam)


Metric                                   Value     Calculated as
Overall accuracy (since last reset)      93.976%   (SPAM messages caught + Good messages delivered) / Total number of messages
Spam identification (since last reset)   91.589%   (Spam catch rate only)
Spam ratio (of total processed)          48.846%   Total SPAM messages (both caught & missed) / Total number of messages


                            SPAM messages     Good messages
Since last reset            18 missed         17 missed
                            196 caught        350 delivered
                            91.589% caught    4.632% missed
Total processed by filter   395 missed        30 missed
                            2822 caught       3339 delivered
From corpus                 235 fed           11 fed

[ Last edited by rodge on 2009-5-8 10:58 ]
Performance Statistics - Fri May 8 11:49:50 2009
Metric                                   Value     Calculated as
Overall accuracy (since last reset)      99.593%   (SPAM messages caught + Good messages delivered) / Total number of messages
Spam identification (since last reset)   99.569%   (Spam catch rate only)
Spam ratio (of total processed)          89.791%   Total SPAM messages (both caught & missed) / Total number of messages

                            SPAM messages     Good messages
Since last reset            482 missed        6 missed
                            111260 caught     8114 delivered
                            99.569% caught    0.074% missed
Total processed by filter   1059 missed       82 missed
                            129389 caught     14750 delivered
From corpus                 2392 fed          58 fed


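It worked very well at first, but over the last few days it has suddenly become much worse: about 200 spam messages a day are being misclassified as normal mail (400 users).

[ Last edited by archerhu on 2009-5-9 08:18 ]
Originally posted by archerhu on 2009-5-8 11:50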
Performance Statistics - Fri May 8 11:49:50 2009
Metric                 Calculated as
Overall accuracy (since last reset)        99.593%        (SPAM messages caught + Good messages delivered) / Total number of messages
Spam ide ...
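A Bayesian filter is not something you train once and then rely on forever. Keep catching the recent, obvious spam, keep reinforcing normal mail, and stick with the training over the long term. It is worth reading up on how the Bayesian algorithm works.
You also need to combine it with SMTP behavior filtering to get reasonably complete coverage. I have also noticed that after a while dspam starts to miss some of the new spam.
If you quote or excerpt articles from this site, please keep the site link and author information to respect copyright. Thank you.

Building a high-performance, high-capacity open-source mail system - ExtMail

Official Postfix site in China
Originally posted by hzqbbc on 2009-5-11 16:58
You also need to combine it with SMTP behavior filtering to get reasonably complete coverage. I have also noticed that after a while dspam starts to miss some of the new spam.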
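Recently I noticed that spam reporting was not working well, so I separated normal mail from spam and trained with dspam_train. What I found:
1. Some spam is never learned no matter how many times it is trained, possibly the messages with an empty body and an attachment.
2. Other spam never passes after the first training run. It takes at least two runs, and most of it needs 3-4 runs to pass. When I train a single sample with the dspam command, it also has to be run a second time before the sample is learned, and even then spam of the same type (with nearly identical content) still goes undetected. This makes me seriously doubt how effective our everyday spam-report plugin is.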
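
One possible reason a single training pass is not enough, offered as a guess rather than a diagnosis: in a token-count Bayesian filter, tokens that have a long history in normal mail need several spam observations before their weight tips over. The sketch below is a deliberately simplified toy in Python. `ToyBayesFilter`, its Laplace-smoothed token score, and the 0.5 threshold are all assumptions made for illustration; this is not DSPAM's actual tokenizer or algorithm.

```python
from collections import Counter


class ToyBayesFilter:
    """Toy token-count Bayesian filter (illustrative only, not DSPAM)."""

    def __init__(self):
        self.spam_hits = Counter()  # messages per token seen in spam
        self.ham_hits = Counter()   # messages per token seen in normal mail

    def train(self, tokens, is_spam):
        """Count each distinct token once for the given class."""
        target = self.spam_hits if is_spam else self.ham_hits
        for token in set(tokens):
            target[token] += 1

    def token_spamminess(self, token):
        """Laplace-smoothed share of this token's sightings that were spam."""
        s = self.spam_hits[token]
        h = self.ham_hits[token]
        return (s + 1) / (s + h + 2)

    def score(self, tokens):
        """Combine token scores naive-Bayes style into one spam probability."""
        p_spam = 1.0
        p_ham = 1.0
        for token in set(tokens):
            p = self.token_spamminess(token)
            p_spam *= p
            p_ham *= 1.0 - p
        return p_spam / (p_spam + p_ham)


def passes_until_caught(filt, tokens, threshold=0.5, max_passes=20):
    """Train the same sample as spam until its score exceeds the threshold.

    Returns the number of training passes needed, or None if the sample
    is still not caught after max_passes.
    """
    for passes in range(max_passes + 1):
        if filt.score(tokens) > threshold:
            return passes
        filt.train(tokens, is_spam=True)
    return None


filt = ToyBayesFilter()
for _ in range(3):
    filt.train(["invoice"], is_spam=False)   # word common in normal mail
for _ in range(2):
    filt.train(["attached"], is_spam=False)

print(passes_until_caught(filt, ["invoice", "attached"]))
```

In this toy setup the sample is only flagged after the third training pass, because each pass shifts the token counts by one. A message with an empty body gives such a filter very few tokens to learn from, which would be consistent with point 1, though that too is speculation.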
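                            SPAM messages     Good messages
Since last reset            0 missed          0 missed
                            13307 caught      44141 delivered
                            100.000% caught   0.000% missed
Total processed by filter   1967 missed       91 missed
                            60561 caught      178720 delivered
From corpus                 4932 fed          150 fed

This is the record for 10 days.

I have reset several times, but there is too much mail to retrain by hand. Accuracy is probably a little over 90%.

Yes, I have also found that some messages are still not learned after many training runs, and most of them are English mail.

For Chinese spam the results are very good: it goes straight into the "Spam" folder.
It looks like it still has to be combined with behavior filtering. Here are ours: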

Metric                                   Value     Calculated as
Overall accuracy (since last reset)      97.869%   (SPAM messages caught + Good messages delivered) / Total number of messages
Spam identification (since last reset)   98.025%   (Spam catch rate only)
Spam ratio (of total processed)          65.028%   Total SPAM messages (both caught & missed) / Total number of messages

                            SPAM messages     Good messages
Since last reset            33 missed         3 missed
                            1638 caught       15 delivered
                            98.025% caught    16.667% missed
Total processed by filter   436 missed        20 missed
                            5200 caught       3011 delivered
From corpus                 233 fed           14 fed
Metric                                   Value     Calculated as
Overall accuracy (since last reset)      93.836%   (SPAM messages caught + Good messages delivered) / Total number of messages
Spam identification (since last reset)   87.205%   (Spam catch rate only)
Spam ratio (of total processed)          43.462%   Total SPAM messages (both caught & missed) / Total number of messages

                            SPAM messages     Good messages
Since last reset            390 missed        42 missed
                            2658 caught       3919 delivered
                            87.205% caught    1.060% missed
Total processed by filter   390 missed        42 missed
                            2658 caught       3923 delivered
From corpus                 215 fed           11 fed
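Metric                                   Value     Calculated as
Overall accuracy                         98.939%   (SPAM messages caught + Good messages delivered) / Total number of messages
Spam identification                      99.084%   SPAM messages caught / Total SPAM messages (both caught & missed)
Spam ratio                               67.921%   Total SPAM messages (both caught & missed) / Total number of messages

                            SPAM messages     Good messages
Since last reset            92 missed         48 missed
                            9947 caught       3110 delivered
                            99.084% caught    1.520% missed
Total processed by filter   471 missed        63 missed
                            12580 caught      6101 delivered
From corpus                 251 fed           18 fed

[ Last edited by zhaodongxi on 2009-10-10 08:11 ]
My results are not very good.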
Performance Statistics - Tue Apr 20 15:06:57 2010
Metric                                   Value     Calculated as
Overall accuracy (since last reset)      65.797%   (SPAM messages caught + Good messages delivered) / Total number of messages
Spam identification (since last reset)   50.638%   (Spam catch rate only)
Spam ratio (of total processed)          50.116%   Total SPAM messages (both caught & missed) / Total number of messages

                            SPAM messages     Good messages
Since last reset            3365 missed       1248 missed
                            3452 caught       5422 delivered
                            50.638% caught    18.711% missed
Total processed by filter   3449 missed       1259 missed
                            3452 caught       5610 delivered
From corpus                 9149 fed          335 fed