ORIGINAL ARTICLE
Figure from article: Arabic News Dataset...
 
ABSTRACT
The spread of fake news and AI-generated deepfakes poses a significant threat to public trust, particularly in Arabic-speaking contexts where resources for misinformation detection remain limited. Although prior studies have examined Arabic fake-news detection, existing datasets often rely on simplistic synthetic content or manually collected articles that do not fully reflect real-world journalistic writing. This study addresses this gap by constructing a high-quality dataset of 21,000 authentic news articles from Al Arabiya and CNN Arabic and generating an equal number of fabricated articles using the GPT-4.1-mini model. Four deception strategies were employed: context shifting, exaggeration, contradiction, and misattribution. The resulting dataset, comprising more than 42,000 articles, was used to fine-tune the BERT-base-uncased model (BERT: bidirectional encoder representations from transformers), achieving 96% accuracy in distinguishing real from fake news. This work contributes a large-scale Arabic news dataset, a reproducible deepfake generation methodology for Arabic news, and a transformer-based baseline for misinformation detection. However, the study is limited to Modern Standard Arabic and text-only content, highlighting the need for future research on cross-domain and dialectal robustness.
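The dataset construction summarized in the abstract can be illustrated with a minimal Python sketch. It pairs each authentic article with one fabricated counterpart and records which of the four deception strategies was applied. Everything beyond the strategy names is an assumption made for illustration: the prompt wording, the round-robin assignment of strategies, the label encoding (0 for real, 1 for fake), and the `generate` callable that stands in for a call to a text-generation model. None of it should be read as the authors' actual pipeline.

```python
from itertools import cycle

# The four deception strategies named in the abstract.
STRATEGIES = ("context shifting", "exaggeration", "contradiction", "misattribution")


def build_prompt(article: str, strategy: str) -> str:
    """Build a hypothetical generation prompt (wording is an assumption)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    return (
        f"Rewrite the following Arabic news article using {strategy}, "
        "keeping a journalistic style.\n\n" + article
    )


def build_dataset(real_articles, generate):
    """Pair each real article (label 0) with one fabricated article (label 1).

    `generate` is any callable mapping a prompt string to generated text;
    it is a placeholder for the model call. Strategies are assigned
    round-robin, which is an assumption, not a documented choice.
    """
    rows = []
    for article, strategy in zip(real_articles, cycle(STRATEGIES)):
        rows.append({"text": article, "label": 0, "strategy": None})
        fake = generate(build_prompt(article, strategy))
        rows.append({"text": fake, "label": 1, "strategy": strategy})
    return rows


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With a stub such as `generate = lambda prompt: "FAKE: " + prompt`, `build_dataset` returns a balanced list of labelled rows, and `accuracy` can score any classifier's predictions against the `label` field.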