A Comparative Analysis of Text Vectorization and Machine Learning Classifiers for Fake News Detection

Ashutosh Dhamija, Mukesh Kumar

Abstract

The media landscape has shifted from print to online platforms, greatly increasing how easily information can be accessed and exchanged. This shift has also intensified a major challenge: the rapid proliferation of fake news, that is, fabricated or misleading information that is easy to produce and disseminate. This paper addresses the growing global concern of misinformation and explores potential solutions based on machine learning. The study develops a model that assesses the authenticity of news articles and compares two Bag-of-Words text vectorization methods, Count Vectorizer and TF-IDF Vectorizer. Two classification algorithms, the Multinomial Naive Bayes Classifier and the Passive Aggressive Classifier, are employed to detect fake news. The study further investigates how text pre-processing influences overall model performance. About two-thirds of the dataset is used as curated training data, while the remaining third is held out as raw data on which the model is not trained. Under optimum conditions, the model achieves an efficiency rate of 93.78%, which indicates that the proposed methodology can effectively differentiate between real and fake news.
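To make the vectorization step concrete, the following is a minimal, dependency-free Python sketch of the two Bag-of-Words representations named in the abstract (raw term counts and TF-IDF weights), each feeding a small multinomial Naive Bayes classifier. It is an illustration under stated assumptions, not the authors' implementation: the four headlines and their labels are invented toy data, the function names are hypothetical, the tokenizer is a plain lowercase whitespace split, and the smoothed IDF formula and L2 normalization are one common convention that the abstract does not specify. The Passive Aggressive Classifier is not shown.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple pre-processing step)."""
    return text.lower().split()

def build_vocab(docs):
    """Sorted list of every distinct token seen in the training documents."""
    return sorted({tok for doc in docs for tok in tokenize(doc)})

def count_vector(doc, vocab):
    """Bag-of-Words counts; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]

def fit_idf(docs, vocab):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1 (assumed convention)."""
    n_docs = len(docs)
    doc_tokens = [set(tokenize(doc)) for doc in docs]
    idf = []
    for term in vocab:
        df = sum(1 for toks in doc_tokens if term in toks)
        idf.append(math.log((1 + n_docs) / (1 + df)) + 1.0)
    return idf

def tfidf_vector(doc, vocab, idf):
    """Term counts scaled by IDF, then L2-normalized (zero vectors are left as zeros)."""
    weighted = [tf * w for tf, w in zip(count_vector(doc, vocab), idf)]
    norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
    return [x / norm for x in weighted]

def train_nb(rows, labels, alpha=1.0):
    """Multinomial Naive Bayes with additive (Laplace) smoothing alpha."""
    model = {}
    for cls in sorted(set(labels)):
        cls_rows = [r for r, y in zip(rows, labels) if y == cls]
        totals = [sum(col) + alpha for col in zip(*cls_rows)]
        denom = sum(totals)
        log_prior = math.log(len(cls_rows) / len(rows))
        model[cls] = (log_prior, [math.log(t / denom) for t in totals])
    return model

def predict_nb(model, row):
    """Return the class with the highest log-posterior score for one feature row."""
    def score(cls):
        log_prior, log_lik = model[cls]
        return log_prior + sum(x * w for x, w in zip(row, log_lik))
    return max(model, key=score)

# Invented toy headlines and labels, for illustration only.
docs = [
    "shocking miracle cure doctors hate",
    "aliens secretly control the government",
    "parliament passes annual budget bill",
    "central bank holds interest rates steady",
]
labels = ["fake", "fake", "real", "real"]

vocab = build_vocab(docs)
idf = fit_idf(docs, vocab)
count_model = train_nb([count_vector(d, vocab) for d in docs], labels)
tfidf_model = train_nb([tfidf_vector(d, vocab, idf) for d in docs], labels)

query = "miracle cure shocking"
print(predict_nb(count_model, count_vector(query, vocab)))
print(predict_nb(tfidf_model, tfidf_vector(query, vocab, idf)))
```

The two models share the same classifier and differ only in how each document is turned into a feature vector, which is the comparison the study describes. A real experiment would use a labeled news corpus, a held-out test split, and an established library rather than this toy code.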
