Authors: Kaveri Rajaram Bhosle, Prashant M Yawalkar

Abstract: Wireless Capsule Endoscopy (WCE) has revolutionized gastrointestinal (GI) diagnostics by enabling non-invasive visualization of the entire digestive tract. Despite its clinical advantages, a single WCE examination produces tens of thousands of image frames, making manual analysis time-consuming and prone to inter-observer variability. In recent years, artificial intelligence, particularly deep learning, has emerged as a powerful tool for automated GI disease classification. This review presents a comprehensive analysis of existing approaches to WCE-based GI image analysis, including traditional machine learning methods, convolutional neural networks (CNNs), transformer-based architectures, and hybrid models. The paper critically examines commonly used benchmark datasets such as Kvasir, Kvasir-Capsule, HyperKvasir, and WCECCD, along with the evaluation metrics and optimization strategies adopted in recent studies. It also discusses how explainable artificial intelligence techniques, including attention mechanisms and Grad-CAM, enhance model interpretability and clinical trust. Key challenges are identified, including class imbalance, limited annotated data, cross-dataset generalization, and real-time deployment constraints. Finally, the review outlines emerging research directions, including multimodal learning, domain adaptation, temporal video modeling, and foundation models for medical imaging. This review aims to give researchers and clinicians a structured understanding of current advancements and future opportunities in automated GI disease diagnosis.

DOI: http://doi.org/10.5281/zenodo.20848747