CUET DIGITAL REPOSITORY

A Modified Naïve Bayesian-based Spam Filter using Support Vector Machine

dc.contributor.author Hossain, Md. Sabir
dc.contributor.author Zubair, Md.
dc.contributor.author Rahman, Mohammad Obaidur
dc.contributor.author Patwary, Muhammad Kamrul Hossain
dc.contributor.author Rajib, Md. Golam Sarwar
dc.date.accessioned 2021-10-27T05:43:41Z
dc.date.available 2021-10-27T05:43:41Z
dc.date.issued 2019-05-03
dc.identifier.uri http://103.99.128.19:8080/xmlui/handle/123456789/337
dc.description.abstract Spam is an ever-growing problem that threatens current e-mail systems. Spam is unsolicited bulk e-mail, frequently of a financial nature, which creates the need for an anti-spam filter. Among the many spam filtering techniques, the most advanced method, Naïve Bayesian filtering, has been implemented using the Support Vector Machine (SVM). Spammers pay close attention to filtering techniques. For that reason, dynamic filtering is needed, and the proposed method meets this demand. The algorithm splits the received e-mail into tokens and uses Bayes' theorem to calculate the spam probability of each token, from which the total spam probability of the mail is determined. Using an SVM instead of corpora is one of the added features of the algorithm. The most challenging part was feeding words as well as whole sentences into the SVM as tokens and feature vectors. Including sentences in the training dataset increased the accuracy of detecting spam and ham. The Natural Language Toolkit (NLTK) was used to tokenize the sentences and, to some extent, to understand the meaning of similar sentences. Because a test mail is compared word by word and sentence by sentence against the training datasets to determine whether it is spam, the performance of the filter improves. With some simple modifications, the filter can be used on both the server and the client side. Its efficiency increases gradually with the number of e-mails it processes. en_US
dc.language.iso en_US en_US
dc.publisher EWU en_US
dc.subject Spam en_US
dc.subject Bayesian Approach en_US
dc.subject SVM en_US
dc.subject Tokenization en_US
dc.subject Spamicity en_US
dc.subject Dataset en_US
dc.title A Modified Naïve Bayesian-based Spam Filter using Support Vector Machine en_US
dc.title.alternative 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT 2019) en_US
dc.title.alternative ICASERT 2019 en_US
dc.type Article en_US
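
The token-level step described in the abstract (split a mail into tokens, estimate a spam probability for each token with Bayes' theorem, then combine them into one score for the mail) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the regular-expression tokenizer (a stand-in for NLTK), the equal priors, the add-one smoothing, and the log-odds combination rule are all assumptions chosen for illustration, and the SVM and sentence-level stages are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed stand-in for NLTK)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(spam_mails, ham_mails):
    """Count, per class, the number of mails in which each token appears."""
    spam_counts = Counter(t for m in spam_mails for t in set(tokenize(m)))
    ham_counts = Counter(t for m in ham_mails for t in set(tokenize(m)))
    return spam_counts, ham_counts, len(spam_mails), len(ham_mails)


def token_spamicity(token, model):
    """P(spam | token) from Bayes' theorem, assuming equal priors and add-one smoothing."""
    spam_counts, ham_counts, n_spam, n_ham = model
    p_token_given_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_token_given_ham = (ham_counts[token] + 1) / (n_ham + 2)
    return p_token_given_spam / (p_token_given_spam + p_token_given_ham)


def mail_spam_probability(text, model):
    """Combine per-token spamicities by summing log-odds (naive independence assumption)."""
    log_odds = 0.0
    for token in set(tokenize(text)):
        p = token_spamicity(token, model)
        log_odds += math.log(p) - math.log(1.0 - p)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Tiny made-up training set, for illustration only.
model = train(
    spam_mails=["win free money now", "free prize claim now"],
    ham_mails=["meeting at noon tomorrow", "project report attached"],
)
spam_score = mail_spam_probability("free money", model)
ham_score = mail_spam_probability("project meeting", model)
```

With this toy data, a mail made of tokens seen mostly in spam scores above 0.5 and one made of tokens seen mostly in ham scores below 0.5. A mail with no tokens scores exactly 0.5, which reflects the equal-prior assumption.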

