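Collecting DSPAM results
Hi all, the purpose of this survey is to find out roughly how many people are using dspam, and how well it works after at least half a month of use.
It also gives everyone a place to compare notes, and will be an important reference for whether the community keeps improving dspam. If you are passing by, please reply. Thanks, everyone.
[color=red]Please do not post anything unrelated to dspam results in this thread. Thanks.[/color]
2010-01-12 22:00: about 100 users, running Spam-lock + Dspam + Clamav, with the official Extmail training corpus imported.
[b]2010-01-13 15:30, recognition rates are shown in the table below:[/b]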
[table=98%]
[tr][td=1,1,251][b]Metric[/b][/td][td=1,1,341][b]Calculated as[/b][/td][td=1,1,144][/td][/tr]
[tr][td=1,1,251]Overall accuracy (since last reset) 90.233%[/td][td=1,1,341](SPAM messages caught + Good messages delivered) / Total number of messages[/td][td][/td][/tr]
[tr][td=1,1,251]Spam identification (since last reset) 76.623%[/td][td=1,1,341](Spam catch rate only)[/td][td][/td][/tr]
[tr][td=1,1,251]Spam ratio (of total processed) 44.238%[/td][td=1,1,341]Total SPAM messages (both caught & missed) / Total number of messages[/td][td][/td][/tr]
[tr][td=2,1] [/td][td][/td][/tr]
[tr][td=1,1,251][/td][td=1,1,341][b]SPAM messages[/b][/td][td=1,1,144][b]Good messages[/b][/td][/tr]
[tr][td=1,3]Since last reset[/td][td=1,1,341]18 missed[/td][td=1,1,144]3 missed[/td][/tr]
[tr][td=1,1,341]59 caught[/td][td=1,1,144]135 delivered[/td][/tr]
[tr][td=1,1,341]76.623% caught[/td][td=1,1,144]2.174% missed[/td][/tr]
[tr][td=1,2]Total processed by filter[/td][td=1,1,341]481 missed[/td][td=1,1,144]16 missed[/td][/tr]
[tr][td=1,1,341]2974 caught[/td][td=1,1,144]4339 delivered[/td][/tr]
[tr][td]From corpus[/td][td=1,1,341]493 fed[/td][td=1,1,144]10 fed[/td][/tr]
[/table]
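The percentages in the table follow directly from the raw counters. Here is a minimal Python sketch of that arithmetic, assuming the formulas given in the "Calculated as" column. The function and key names are illustrative and are not part of DSPAM.

```python
def pct(numerator, denominator):
    """Return a percentage, or None where the statistics page would show N/A."""
    if denominator == 0:
        return None
    return 100.0 * numerator / denominator


def dspam_metrics(spam_missed, spam_caught, good_missed, good_delivered):
    """Recompute the statistics-page percentages from the four raw counters."""
    total_spam = spam_missed + spam_caught
    total_good = good_missed + good_delivered
    total = total_spam + total_good
    return {
        "overall_accuracy": pct(spam_caught + good_delivered, total),
        "spam_identification": pct(spam_caught, total_spam),
        "good_missed_rate": pct(good_missed, total_good),
        "spam_ratio": pct(total_spam, total),
    }


# "Since last reset" counters from the 2010-01-13 table.
since_reset = dspam_metrics(spam_missed=18, spam_caught=59,
                            good_missed=3, good_delivered=135)
# "Total processed by filter" counters from the same table.
total_processed = dspam_metrics(spam_missed=481, spam_caught=2974,
                                good_missed=16, good_delivered=4339)

for name in ("overall_accuracy", "spam_identification", "good_missed_rate"):
    print(f"{name}: {since_reset[name]:.3f}%")
print(f"spam_ratio (of total processed): {total_processed['spam_ratio']:.3f}%")
```

With these inputs the sketch reproduces the 90.233%, 76.623%, 2.174% and 44.238% figures shown in the table.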
[b]2010-01-15 16:00, recognition rates[/b]
[table=98%]
[tr][td=1,1,251][b]Metric[/b][/td][td=1,1,341][b]Calculated as[/b][/td][td=1,1,144][/td][/tr]
[tr][td=1,1,251]Overall accuracy (since last reset) 81.81%[/td][td=1,1,341](SPAM messages caught + Good messages delivered) / Total number of messages[/td][td][/td][/tr]
[tr][td=1,1,251]Spam identification (since last reset) 74.04%[/td][td=1,1,341](Spam catch rate only)[/td][td][/td][/tr]
[tr][td=1,1,251]Spam ratio (of total processed) 50.01%[/td][td=1,1,341]Total SPAM messages (both caught & missed) / Total number of messages[/td][td][/td][/tr]
[tr][td=2,1] [/td][td][/td][/tr]
[tr][td=1,1,251][/td][td=1,1,341][b]SPAM messages[/b][/td][td=1,1,144][b]Good messages[/b][/td][/tr]
[tr][td=1,3]Since last reset[/td][td=1,1,341]413 missed[/td][td=1,1,144]13 missed[/td][/tr]
[tr][td=1,1,341]1178 caught[/td][td=1,1,144]738 delivered[/td][/tr]
[tr][td=1,1,341]74.041% caught[/td][td=1,1,144]1.731% missed[/td][/tr]
[tr][td=1,2]Total processed by filter[/td][td=1,1,341]877 missed[/td][td=1,1,144]26 missed[/td][/tr]
[tr][td=1,1,341]4093 caught[/td][td=1,1,144]4942 delivered[/td][/tr]
[tr][td]From corpus[/td][td=1,1,341]1080 fed[/td][td=1,1,144]13 fed[/td][/tr]
[/table]
[b]2010-01-17: the delivery statistics for the last 24 hours were "458 SPAM, 96 Good, 5 Spam Misses, 1 False Positives", so recognition has basically reached 99%.[/b]
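As a quick check of the "basically 99%" figure, the 24-hour counts can be plugged into the overall accuracy formula from the tables above. This sketch assumes the four numbers are disjoint categories (correctly caught spam, correctly delivered good mail, missed spam, false positives); the post does not say so explicitly.

```python
# 24-hour counts quoted in the post; treating them as disjoint is an assumption.
caught_spam, delivered_good, spam_misses, false_positives = 458, 96, 5, 1

correct = caught_spam + delivered_good
total = correct + spam_misses + false_positives
accuracy = 100.0 * correct / total
print(f"24-hour accuracy: {accuracy:.2f}%")
```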
[b]2010-01-31 20:00, recognition rates[/b]
[b][table=98%]
[tr][td=1,1,251][b]Metric[/b][/td][td=1,1,341][b]Calculated as[/b][/td][td=1,1,144][/td][/tr]
[tr][td=1,1,251]Overall accuracy (since last reset) 99.02%[/td][td=1,1,341](SPAM messages caught + Good messages delivered) / Total number of messages[/td][td][/td][/tr]
[tr][td=1,1,251]Spam identification (since last reset) 99.73%[/td][td=1,1,341](Spam catch rate only)[/td][td][/td][/tr]
[tr][td=1,1,251]Spam ratio (of total processed) 50.906%[/td][td=1,1,341]Total SPAM messages (both caught & missed) / Total number of messages[/td][td][/td][/tr]
[tr][td=2,1] [/td][td][/td][/tr]
[tr][td=1,1,251][/td][td=1,1,341][b]SPAM messages[/b][/td][td=1,1,144][b]Good messages[/b][/td][/tr]
[tr][td=1,3]Since last reset[/td][td=1,1,341]1 missed[/td][td=1,1,144]10 missed[/td][/tr]
[tr][td=1,1,341]370 caught[/td][td=1,1,144]721 delivered[/td][/tr]
[tr][td=1,1,341]99.730% caught[/td][td=1,1,144]1.368% missed[/td][/tr]
[tr][td=1,2]Total processed by filter[/td][td=1,1,341]954 missed[/td][td=1,1,144]241 missed[/td][/tr]
[tr][td=1,1,341]7953 caught[/td][td=1,1,144]8349 delivered[/td][/tr]
[tr][td]From corpus[/td][td=1,1,341]1222 fed[/td][td=1,1,144]121 fed[/td][/tr]
[/table][/b]
For once I get the first reply:
Here are the statistics after 4-5 days:
SPAM messages Good messages
Since last reset 31 missed 2 missed
2110 caught 4215 delivered
98.552% caught 0.047% missed
Since last reset 1 missed 1 missed
242 caught 29 delivered
99.588% caught 3.333% missed
Total processed by filter 538 missed 16 missed
4140 caught 4434 delivered
From corpus 310 fed 8 fed
However, my message headers still show no dspam entries. Do messages judged good not pass through dspam at all?
I just tidied up the layout for you :)
[[i] Last edited by liushaobo on 2009-5-8 10:50 [/i]]
-----------------------------------
Seven days of statistics from one customer (with a whitelist in place that bypasses dspam):
Metric Calculated as
Overall accuracy (since last reset) 93.976% (SPAM messages caught + Good messages delivered) / Total number of messages
Spam identification (since last reset) 91.589% (Spam catch rate only)
Spam ratio (of total processed) 48.846% Total SPAM messages (both caught & missed) / Total number of messages
SPAM messages Good messages
Since last reset 18 missed 17 missed
196 caught 350 delivered
91.589% caught 4.632% missed
Total processed by filter 395 missed 30 missed
2822 caught 3339 delivered
From corpus 235 fed 11 fed
[[i] Last edited by rodge on 2009-5-8 10:58 [/i]]
Performance Statistics - Fri May 8 11:49:50 2009
Metric Calculated as
Overall accuracy (since last reset) 99.593% (SPAM messages caught + Good messages delivered) / Total number of messages
Spam identification (since last reset) 99.569% (Spam catch rate only)
Spam ratio (of total processed) 89.791% Total SPAM messages (both caught & missed) / Total number of messages
SPAM messages Good messages
Since last reset 482 missed 6 missed
111260 caught 8114 delivered
99.569% caught 0.074% missed
Total processed by filter 1059 missed 82 missed
129389 caught 14750 delivered
From corpus 2392 fed 58 fed
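It worked very well at first, but over the last few days it has suddenly become much worse: roughly 200 spam messages a day are being misclassified as good mail (400 users).
[[i] Last edited by archerhu on 2009-5-9 08:18 [/i]]
[quote]Originally posted by [i]archerhu[/i] on 2009-5-8 11:50 [url=http://www.extmail.org/forum/redirect.php?goto=findpost&pid=59142&ptid=10552][img]http://www.extmail.org/forum/images/common/back.gif[/img][/url]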
Performance Statistics - Fri May 8 11:49:50 2009
Metric Calculated as
Overall accuracy (since last reset) 99.593% (SPAM messages caught + Good messages delivered) / Total number of messages
Spam ide ... [/quote]
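Bayesian filtering is not something you set up once and rely on forever. Keep feeding it the obvious recent spam, keep reinforcing normal mail, and stick with training over the long term. It is also worth reading up on how the Bayesian algorithm works.
In fact it also needs to be combined with filtering based on SMTP behaviour to be reasonably complete. I have also noticed that after dspam has been running for a while, some new spam starts slipping through.
[quote]Originally posted by [i]hzqbbc[/i] on 2009-5-11 16:58 [url=http://www.extmail.org/forum/redirect.php?goto=findpost&pid=59251&ptid=10552][img]http://www.extmail.org/forum/images/common/back.gif[/img][/url]
In fact it also needs to be combined with filtering based on SMTP behaviour to be reasonably complete. I have also noticed that after dspam has been running for a while, some new spam starts slipping through. [/quote]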
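Recently I found that the spam report feature was not working well, so I separated good mail from spam and trained with dspam_train. What I found:
1. Some spam is never learned no matter how many times it is trained, possibly because the body is empty and the message only carries an attachment.
2. Other spam never passes after the first training. It takes at least two passes, and most messages need 3-4 before they pass. Training a single sample with the dspam command also only takes effect on the second run, yet spam of the same type (with almost identical content) is still not detected. This makes me seriously doubt how effective our usual report-spam plugin is.
SPAM messages Good messages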
Since last reset 0 missed 0 missed
13307 caught 44141 delivered
100.000% caught 0.000% missed
Total processed by filter 1967 missed 91 missed
60561 caught 178720 delivered
From corpus 4932 fed 150 fed
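This is a 10-day record.
I have reset the statistics several times, but there is too much mail to retrain by hand. Accuracy is probably a little over 90%.
Yes, I have also found that some messages are still not learned after being trained many times, mostly English ones.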
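Several posts report that a missed spam has to be retrained more than once before it is caught. One plausible reason is that tokens with a long history in good mail need several spam observations before their weight shifts. The toy model below illustrates that effect. It is a simplified naive Bayes sketch with add-one smoothing, written only for illustration: it is not DSPAM's actual algorithm, and the class name, tokens and counts are all made up.

```python
import math
from collections import Counter


class ToyBayes:
    """Tiny naive Bayes filter that counts, per class, how many messages contain each token."""

    def __init__(self):
        self.msgs = {"spam": 0, "good": 0}
        self.tokens = {"spam": Counter(), "good": Counter()}

    def train(self, tokens, label):
        self.msgs[label] += 1
        self.tokens[label].update(set(tokens))  # count each token once per message

    def spam_probability(self, tokens):
        s_total, g_total = self.msgs["spam"], self.msgs["good"]
        # Work in log-odds space; add-one smoothing avoids zero probabilities.
        log_odds = math.log((s_total + 1) / (g_total + 1))
        for tok in set(tokens):
            p_spam = (self.tokens["spam"][tok] + 1) / (s_total + 2)
            p_good = (self.tokens["good"][tok] + 1) / (g_total + 2)
            log_odds += math.log(p_spam / p_good)
        return 1 / (1 + math.exp(-log_odds))


def retrain_until_caught(model, tokens, threshold=0.5, max_passes=20):
    """Train the same message as spam repeatedly; return the number of passes used."""
    passes = 0
    while model.spam_probability(tokens) < threshold and passes < max_passes:
        model.train(tokens, "spam")
        passes += 1
    return passes


model = ToyBayes()
for _ in range(20):
    model.train(["invoice", "payment"], "good")  # tokens with a clean history
    model.train(["pills", "cheap"], "spam")      # tokens with a spammy history

missed_spam = ["invoice", "payment", "cheap"]     # spam that resembles good mail
print(f"before retraining: {model.spam_probability(missed_spam):.3f}")
passes = retrain_until_caught(model, missed_spam)
print(f"passes needed: {passes}, after: {model.spam_probability(missed_spam):.3f}")
```

In this toy setup a single retraining pass is not enough to flip the verdict, which is similar to the behaviour described in the thread. Real DSPAM token weighting differs, so treat this only as intuition.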
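Chinese spam is handled very well and goes straight into the "Spam" folder.
It looks like we still need to combine this with behaviour-based filtering.
Here are ours: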
Metric Calculated as
Overall accuracy (since last reset) 97.869% (SPAM messages caught + Good messages delivered) / Total number of messages
Spam identification (since last reset) 98.025% (Spam catch rate only)
Spam ratio (of total processed) 65.028% Total SPAM messages (both caught & missed) / Total number of messages
SPAM messages Good messages
Since last reset 33 missed 3 missed
1638 caught 15 delivered
98.025% caught 16.667% missed
Total processed by filter 436 missed 20 missed
5200 caught 3011 delivered
From corpus 233 fed 14 fed
Metric Calculated as
Overall accuracy (since last reset) 93.836% (SPAM messages caught + Good messages delivered) / Total number of messages
Spam identification (since last reset) 87.205% (Spam catch rate only)
Spam ratio (of total processed) 43.462% Total SPAM messages (both caught & missed) / Total number of messages
SPAM messages Good messages
Since last reset 390 missed 42 missed
2658 caught 3919 delivered
87.205% caught 1.060% missed
Total processed by filter 390 missed 42 missed
2658 caught 3923 delivered
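From corpus 215 fed 11 fed
Overall accuracy 98.939% (SPAM messages caught + Good messages delivered) / Total number of messages
Spam identification 99.084% SPAM messages caught / Total SPAM messages (both caught & missed)
Spam ratio 67.921% Total SPAM messages (both caught & missed) / Total number of messages
SPAM messages Good messages
Since last reset 92 missed 48 missed
9947 caught 3110 delivered
99.084% caught 1.520% missed
Total processed by filter 471 missed 63 missed
12580 caught 6101 delivered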
From corpus 251 fed 18 fed
[[i] Last edited by zhaodongxi on 2009-10-10 08:11 [/i]]
My results are not very good:
Performance Statistics - Tue Apr 20 15:06:57 2010
Metric Calculated as
Overall accuracy (since last reset) 65.797% (SPAM messages caught + Good messages delivered) / Total number of messages
Spam identification (since last reset) 50.638% (Spam catch rate only)
Spam ratio (of total processed) 50.116% Total SPAM messages (both caught & missed) / Total number of messages
SPAM messages Good messages
Since last reset 3365 missed 1248 missed
3452 caught 5422 delivered
50.638% caught 18.711% missed
Total processed by filter 3449 missed 1259 missed
3452 caught 5610 delivered
From corpus 9149 fed 335 fed
SPAM messages Good messages
Since last reset missed missed
caught delivered
N/A% caught N/A% missed
Total processed by filter 546 missed 17 missed
2703 caught 3891 delivered
From corpus 642 fed 8 fed
xxxxx
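With a behaviour filter I developed myself working alongside DSPAM, I am very satisfied with the results.
[[i] Last edited by pencat on 2010-7-2 12:59 [/i]]
To the poster above: how about sharing it?
[i=s] Last edited by liu-zhanxian on 2011-7-22 01:02 [/i]
Results are mediocre.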