dc.description.abstract |
In today's digital era, social media platforms such as Facebook, Twitter, and YouTube play a crucial role in facilitating idea expression and interpersonal connection. Alongside this increased connectivity, however, these platforms have inadvertently enabled negative behaviors, notably cyberbullying. While extensive research has examined cyberbullying in high-resource languages such as English, resources remain scarce for low-resource languages such as Bengali, Arabic, and Tamil, particularly with respect to language modeling. This study aims to bridge this gap by developing a cyberbullying text identification system, named BullyFilterNeT, tailored to social media texts, with Bengali serving as a test case. The intelligent BullyFilterNeT system effectively tackles the challenges of Out-of-Vocabulary (OOV) words inherent in non-contextual embeddings and addresses limitations in context-aware feature representation. To provide a comprehensive analysis, three non-contextual embedding models (GloVe, FastText, and Word2Vec) are developed for feature extraction in Bengali. These embeddings are integrated into classification models based on both statistical methods (SVM, SGD, LibSVM) and deep learning architectures (CNN, VDCNN, LSTM, GRU). Furthermore, the study employs six transformer-based language models: mBERT, bELECTRA, IndicBERT, XLM-RoBERTa, DistilBERT, and BanglaBERT, to overcome the shortcomings observed in the earlier models. Notably, the BanglaBERT-based BullyFilterNeT achieves the highest accuracy of 88.04% on our test set, demonstrating its efficacy in identifying cyberbullying text in Bengali. |
en_US |