Arabic News Dataset Collection and Generation for Deepfake-Text Detection

The spread of fake news and AI-generated deepfakes poses a significant threat to public trust, particularly in Arabic-speaking contexts, where resources for misinformation detection remain limited. Although prior studies have examined Arabic fake-news detection, existing datasets often rely on simplistic synthetic content or manually collected articles that do not fully reflect real-world journalistic writing. This study addresses this gap by constructing a high-quality dataset of 21,000 authentic news articles from Al Arabiya and CNN Arabic and generating an equal number of fabricated articles using the GPT-4.1-mini model. Four deception strategies were employed: context shifting, exaggeration, contradiction, and misattribution. The resulting dataset, comprising more than 42,000 articles, was used to fine-tune the BERT-base-uncased model (bidirectional encoder representations from transformers), which achieved 96% accuracy in distinguishing real from fake news. This work contributes a large-scale Arabic news dataset, a reproducible deepfake-generation methodology for Arabic news, and a transformer-based baseline for misinformation detection. The study is, however, limited to Modern Standard Arabic and text-only content, which highlights the need for future research on cross-domain and dialectal robustness.
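The generation step summarised above pairs each authentic article with one fabricated counterpart produced under one of the four deception strategies. The Python sketch below illustrates one way such a balanced, labelled corpus could be assembled. It is not the authors' pipeline: the strategy instructions, the prompt template, the round-robin assignment of strategies to articles, and the names `build_prompt`, `build_dataset`, and `generate` are all hypothetical, and `generate` is only a placeholder for whatever call is made to GPT-4.1-mini.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# The four deception strategies named in the study. The instruction wording is a
# hypothetical paraphrase, NOT the authors' actual prompts.
STRATEGIES = {
    "context_shifting": "Move the reported events into a different context (time, place, or setting).",
    "exaggeration": "Inflate figures, scale, or consequences beyond what the article reports.",
    "contradiction": "Reverse the central claim so the article asserts the opposite.",
    "misattribution": "Attribute the statements to a different person or organisation.",
}


@dataclass
class Example:
    text: str
    label: int      # 0 = authentic, 1 = fabricated
    strategy: str   # "none" for authentic articles


def build_prompt(article: str, strategy: str) -> str:
    """Compose a rewrite instruction for one article (hypothetical template)."""
    return (
        "Rewrite the following Arabic news article in Modern Standard Arabic, "
        "keeping a journalistic style. " + STRATEGIES[strategy] + "\n\n" + article
    )


def build_dataset(real_articles: List[str],
                  generate: Callable[[str], str],
                  seed: int = 0) -> List[Example]:
    """Pair every authentic article with exactly one fabricated counterpart.

    `generate` is a hypothetical stand-in for the language-model call.
    Strategies are assigned round-robin (an assumption) so each is used
    about equally often; the result is shuffled with a fixed seed.
    """
    names = list(STRATEGIES)
    data: List[Example] = []
    for i, article in enumerate(real_articles):
        strategy = names[i % len(names)]
        data.append(Example(article, 0, "none"))
        data.append(Example(generate(build_prompt(article, strategy)), 1, strategy))
    random.Random(seed).shuffle(data)
    return data


if __name__ == "__main__":
    # Stub generator so the sketch runs without any API access:
    # it just echoes the article (the last line of the prompt).
    stub = lambda prompt: "FAKE: " + prompt.splitlines()[-1]
    corpus = build_dataset(["article A", "article B", "article C", "article D"], stub)
    for ex in corpus:
        print(ex.label, ex.strategy, ex.text)
```

Keeping exactly one fabricated article per authentic one preserves the 1:1 class balance described in the abstract, so a majority-class guess scores only about 50% and plain accuracy stays a meaningful headline metric for the detector.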
