Please use this identifier to cite or link to this item: http://103.99.128.19:8080/xmlui/handle/123456789/337
Title: A Modified Naïve Bayesian-based Spam Filter using Support Vector Machine
Other Titles: 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT 2019)
ICASERT 2019
Authors: Hossain, Md. Sabir
Zubair, Md.
Rahman, Mohammad Obaidur
Patwary, Muhammad Kamrul Hossain
Rajib, Md. Golam Sarwar
Keywords: Spam
Bayesian Approach
SVM
Tokenization
Spamicity
Dataset
Issue Date: 3-May-2019
Publisher: EWU
Abstract: Spam is an ever-growing problem that threatens current e-mail systems. Spam is unsolicited bulk e-mail, frequently financial in nature, which creates the need for an anti-spam filter. Among the many spam filtering techniques, an advanced method, Naïve Bayesian filtering using a Support Vector Machine (SVM), has been implemented. Because spammers pay close attention to filtering techniques, dynamic filtering is needed, and the proposed method meets that demand. The algorithm splits a received e-mail into tokens and uses Bayes' theorem to calculate the spam probability of each token, from which it determines the total spam probability of the mail. Using an SVM instead of corpora is one of the added features of the algorithm. The most challenging part was feeding both words and whole sentences into the SVM as tokens and feature vectors. Including sentences in the training dataset increased the accuracy of detecting spam and ham. The Natural Language Toolkit (NLTK) has been used to tokenize sentences and, to some extent, to understand the meaning of similar sentences. Because a test mail is compared word by word and sentence by sentence against the training datasets to decide whether it is spam, the performance of the filter improves. With some simple modifications, the filter can be used on both the server and the client side. Its efficiency increases gradually with the number of e-mails it processes.
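The token-level step described in the abstract (tokenize the mail, estimate a per-token spam probability with Bayes' theorem, then combine the tokens into one score) can be illustrated with a minimal sketch. This is not the authors' implementation: the SVM and the sentence-level features are left out, and the function names, the regular-expression tokenizer, the equal class priors, and the add-one smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9$']+", text.lower())


def train(messages):
    """Count, per class, how many messages contain each token.

    `messages` is an iterable of (text, is_spam) pairs.
    """
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    for text, is_spam in messages:
        tokens = set(tokenize(text))  # count each token once per message
        if is_spam:
            spam_counts.update(tokens)
            n_spam += 1
        else:
            ham_counts.update(tokens)
            n_ham += 1
    return spam_counts, ham_counts, n_spam, n_ham


def spamicity(token, model):
    """P(spam | token) by Bayes' theorem, assuming equal priors and add-one smoothing."""
    spam_counts, ham_counts, n_spam, n_ham = model
    p_token_given_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_token_given_ham = (ham_counts[token] + 1) / (n_ham + 2)
    return p_token_given_spam / (p_token_given_spam + p_token_given_ham)


def spam_probability(text, model):
    """Combine per-token spamicities under the naive independence assumption."""
    log_spam = log_ham = 0.0
    for token in set(tokenize(text)):
        p = spamicity(token, model)  # strictly between 0 and 1 due to smoothing
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # prod(p) / (prod(p) + prod(1 - p)), evaluated in log space for stability
    diff = max(min(log_ham - log_spam, 700.0), -700.0)
    return 1.0 / (1.0 + math.exp(diff))


# Tiny made-up training set, for illustration only.
model = train([
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting at noon tomorrow", False),
    ("project report attached", False),
])
spam_score = spam_probability("free money", model)
ham_score = spam_probability("project meeting", model)
```

In this sketch a mail would be flagged as spam when its score exceeds a chosen threshold such as 0.5. A mail with no known tokens scores exactly 0.5, which reflects the equal-prior assumption.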
URI: http://103.99.128.19:8080/xmlui/handle/123456789/337
Appears in Collections: proceedings in CSE

Files in This Item:
File: A Modified Naïve Bayesian-based Spam Filter.pdf
Size: 1.12 MB
Format: Adobe PDF

