Please use this identifier to cite or link to this item:
http://103.99.128.19:8080/xmlui/handle/123456789/337
Title: | A Modified Naïve Bayesian-based Spam Filter using Support Vector Machine |
Other Titles: | 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT 2019) |
Authors: | Hossain, Md. Sabir; Zubair, Md.; Rahman, Mohammad Obaidur; Patwary, Muhammad Kamrul Hossain; Rajib, Md. Golam Sarwar |
Keywords: | Spam; Bayesian Approach; SVM; Tokenization; Spamicity; Dataset |
Issue Date: | 3-May-2019 |
Publisher: | EWU |
Abstract: | Spam is an ever-growing problem that threatens current e-mail systems. Spam is unsolicited bulk e-mail, frequently financial in nature, which creates the need for an anti-spam filter. Among the many spam-filtering techniques, the most advanced method, Naïve Bayesian filtering, has been implemented using the Support Vector Machine (SVM). Spammers pay close attention to filtering techniques. For that reason, dynamic filtering is needed, and the proposed method meets this demand. The algorithm splits the received e-mail into tokens and uses Bayes' theorem to calculate the spam probability of each token, from which the total spam probability of the mail is determined. Using an SVM instead of corpora is one of the added features of the algorithm. The most challenging part was feeding words as well as whole sentences into the SVM as tokens and feature vectors. Including sentences in the training dataset increased the accuracy of detecting spam and ham. The Natural Language Toolkit (NLTK) has been used to tokenize the sentences and also, to some extent, to understand the meaning of similar sentences. Because a test mail is compared word by word and sentence by sentence against the training datasets to determine whether it is spam, the performance of the filter is expected to improve. With some simple modifications, the filter can be used on both the server and the client side. The efficiency increases gradually with the number of e-mails it processes. |
URI: | http://103.99.128.19:8080/xmlui/handle/123456789/337 |
Appears in Collections: | proceedings in CSE |
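The per-token step described in the abstract (tokenize the mail, estimate a spam probability for each token with Bayes' theorem, then combine these into one score) can be illustrated with a minimal sketch. This is not the authors' implementation: it leaves out the SVM and sentence-level features entirely, uses a simple regular-expression tokenizer instead of NLTK, and assumes equal spam/ham priors and add-one smoothing. The function names `tokenize`, `train`, `spamicity`, and `spam_probability` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (assumed stand-in for an NLTK tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(spam_mails, ham_mails):
    """Count, for each token, how many spam and ham mails contain it."""
    spam_counts = Counter(t for m in spam_mails for t in set(tokenize(m)))
    ham_counts = Counter(t for m in ham_mails for t in set(tokenize(m)))
    return spam_counts, ham_counts, len(spam_mails), len(ham_mails)


def spamicity(token, model):
    """P(spam | token) via Bayes' theorem, assuming equal priors."""
    spam_counts, ham_counts, n_spam, n_ham = model
    # Add-one smoothing keeps the estimate strictly between 0 and 1.
    p_token_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_token_ham = (ham_counts[token] + 1) / (n_ham + 2)
    return p_token_spam / (p_token_spam + p_token_ham)


def spam_probability(text, model):
    """Combine token spamicities under the naive independence assumption."""
    log_spam = log_ham = 0.0
    for token in set(tokenize(text)):
        p = spamicity(token, model)
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # prod(p) / (prod(p) + prod(1 - p)), evaluated in log space.
    diff = log_ham - log_spam
    if diff > 700:  # avoid overflow in math.exp for extreme scores
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


model = train(
    ["win free money now", "free prize claim now"],
    ["meeting at noon tomorrow", "lunch tomorrow with the team"],
)
score = spam_probability("free money", model)
```

A mail would then be flagged when `score` exceeds a chosen threshold such as 0.5; the threshold, like the toy training mails above, is an illustrative assumption.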
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
A Modified Naïve Bayesian-based Spam Filter.pdf | | 1.12 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.