dc.description.abstract |
In today's digital era, social media platforms such as Facebook, Twitter, and YouTube play a crucial role in facilitating idea expression and interpersonal connection. Alongside this increased connectivity, however, these platforms have inadvertently enabled negative behaviors, notably cyberbullying. While extensive research has examined cyberbullying in high-resource languages such as English, resources remain scarce for low-resource languages such as Bengali, Arabic, and Tamil, particularly with respect to language modeling. This study aims to bridge this gap by developing a cyberbullying text identification system, named BullyFilterNeT, tailored to social media texts, with Bengali serving as a test case. The intelligent BullyFilterNeT system effectively tackles the challenges of Out-of-Vocabulary (OOV) words inherent in non-contextual embeddings and addresses limitations in context-aware feature representation. To provide a comprehensive analysis, three non-contextual embedding models (GloVe, FastText, and Word2Vec) are developed for feature extraction in Bengali. These embeddings are integrated into classification models based on both statistical methods (SVM, SGD, LibSVM) and deep learning architectures (CNN, VDCNN, LSTM, GRU). Furthermore, the study employs six transformer-based language models: mBERT, bELECTRA, IndicBERT, XLM-RoBERTa, DistilBERT, and BanglaBERT, to overcome the shortcomings observed in the earlier models. Notably, the BanglaBERT-based BullyFilterNeT achieves the highest accuracy of 88.04% on our test set, demonstrating its efficacy in identifying cyberbullying text in Bengali. |
en_US |