Diabetes mellitus (DM) is one of the most common metabolic diseases globally, and it is accompanied by secondary complications of varying severity, including diabetic retinopathy (DR), which damages the retina and can lead to vision loss. DR detection is crucial, as early treatment can effectively prevent vision loss. Image-processing systems have been developed for DR screening to partially meet the growing screening demand from the increasing diabetic population worldwide, but diagnostic inaccuracy remains a key issue. This project aims to improve diagnostic accuracy by developing a novel convolutional neural network (CNN) based on residual learning combined with a self-attention mechanism. The model achieved 81% classification accuracy. These results indicate that the model is effective at classifying the severity of diabetic retinopathy, a promising step toward more reliable diagnostic methods.
This project introduces a custom convolutional neural network (CNN) architecture enhanced with residual learning and a self-attention mechanism to classify diabetic retinopathy (DR) into multiple severity levels. The architecture combines global and local information during feature extraction, addressing a limitation of conventional CNNs in medical image interpretation. A key contribution is the handling of class imbalance through data augmentation and focal loss, which improves minority-class detection. Additionally, explainable AI (XAI) techniques such as Grad-CAM are applied to make the model's decisions transparent.
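The core architectural idea, a self-attention layer wrapped in a residual (skip) connection so each spatial position mixes global context into its local features, can be illustrated with a minimal NumPy sketch. This is not the project's actual implementation; the dimensions, projection matrices, and function names here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_residual(x, wq, wk, wv):
    """Toy single-head self-attention with a residual connection.

    x : (n, d) array of n spatial positions with d-dim features
        (e.g. a flattened CNN feature map).
    wq, wk, wv : (d, d) query/key/value projection matrices.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])   # pairwise similarities
    attn = softmax(scores, axis=-1)          # each row sums to 1
    out = attn @ v                           # global context per position
    return x + out                           # residual: local + global

rng = np.random.default_rng(0)
n, d = 16, 8                                 # e.g. a 4x4 feature map, 8 channels
x = rng.normal(size=(n, d))
wq, wk, wv = (0.1 * rng.normal(size=(d, d)) for _ in range(3))
y = self_attention_residual(x, wq, wk, wv)
print(y.shape)  # (16, 8): same shape as the input, as the skip connection requires
```

The residual addition preserves the local CNN features unchanged and adds attention-weighted global context on top, which is the usual motivation for combining the two mechanisms.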
The self-attention residual network achieved 81% accuracy in multi-class DR classification. This result supports the suitability of the architecture for real-world DR screening, improving diagnostic reliability and aiding early intervention strategies. The model represents a practical step toward scalable, AI-assisted retinal screening systems.
Despite the model's promising performance, several limitations were encountered. Although the dataset was annotated, its class imbalance required synthetic augmentation. The model also demands substantial computational resources during training, which may hinder deployment in low-resource clinical environments.
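The imbalance handling mentioned above (focal loss down-weighting easy, majority-class examples) can be sketched as follows, assuming the standard focal loss formulation of Lin et al. with focusing parameter γ = 2; the function name, γ value, and optional inverse-frequency class weights are illustrative, not the project's exact configuration.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=None):
    """Multi-class focal loss, NumPy sketch.

    p : (n, c) predicted class probabilities (rows sum to 1)
    y : (n,) integer class labels
    gamma down-weights well-classified examples; alpha (shape (c,),
    optional) re-weights classes, e.g. by inverse frequency.
    """
    pt = p[np.arange(len(y)), y]                 # probability of the true class
    w = 1.0 if alpha is None else alpha[y]
    return np.mean(-w * (1.0 - pt) ** gamma * np.log(pt))

# A confidently correct prediction contributes far less than a hard one,
# so rare-class errors dominate the gradient signal.
easy = focal_loss(np.array([[0.9, 0.1]]), np.array([0]))
hard = focal_loss(np.array([[0.1, 0.9]]), np.array([0]))
print(easy < hard)  # True
```

With γ = 0 and no class weights, this reduces to ordinary cross-entropy, which is a convenient sanity check when tuning γ.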
Future enhancements will focus on real-time mobile deployment using edge-friendly architectures such as MobileNet variants. Integrating multimodal data, including optical coherence tomography (OCT), could further improve diagnostic precision. Moreover, longitudinal analysis will be explored to predict DR progression over time.