This study addresses the challenge of food image classification on small-scale datasets by developing an ensemble Convolutional Neural Network (CNN) model. Focusing on a curated subset of the Food-101 dataset (5,000 images across 5 categories), the project evaluates three CNN architectures: ResNet, MobileNetV2, and InceptionV3. Through transfer learning and ensemble techniques, the ensemble achieves 92.7% accuracy, demonstrating that CNNs remain effective even with limited data. The research implements training strategies including learning rate scheduling and data augmentation, and provides model interpretability through Grad-CAM, LIME, and SHAP visualizations. A GUI enables practical deployment, with applications in dietary management and food quality assessment. The project highlights how carefully designed CNN ensembles can overcome data scarcity in food recognition tasks.
MobileNetV2: 87% accuracy | ResNet: 87% accuracy | InceptionV3: 85% accuracy
Ensemble Model: 92.7% accuracy (Adam optimizer, learning rate 0.001, batch size 32)
The system combines multiple CNN architectures with careful preprocessing and augmentation strategies. Images are resized to 299x299 pixels and normalized using ImageNet statistics. The ensemble model weights the predictions from each architecture based on its individual test accuracy (ResNet: 0.35, MobileNetV2: 0.33, InceptionV3: 0.32). Training uses the Adam optimizer with a learning rate scheduler that multiplies the rate by a factor of 0.1 when validation loss plateaus. Data augmentation includes random resized crops, horizontal flips, rotations, and color jitter to improve generalization from the limited dataset.
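The weighting step described above can be illustrated with a minimal sketch. It assumes the ensemble averages per-class probabilities (soft voting), which the text does not state explicitly. The function names, model keys, and example probabilities are hypothetical; only the three weights come from the text.

```python
from typing import Dict, List

# Ensemble weights as reported in the text (they sum to 1.0).
# The dictionary keys are illustrative names, not identifiers from the project.
WEIGHTS: Dict[str, float] = {
    "resnet": 0.35,
    "mobilenetv2": 0.33,
    "inceptionv3": 0.32,
}


def weighted_ensemble(
    probs: Dict[str, List[float]],
    weights: Dict[str, float] = WEIGHTS,
) -> List[float]:
    """Return the weighted average of per-model class probabilities.

    Assumption: each model outputs a probability vector over the same
    classes, in the same order.
    """
    num_classes = len(next(iter(probs.values())))
    # Renormalize in case only a subset of the models is supplied.
    total_weight = sum(weights[name] for name in probs)
    combined = [0.0] * num_classes
    for name, class_probs in probs.items():
        if len(class_probs) != num_classes:
            raise ValueError(f"{name} has a different number of classes")
        for i, p in enumerate(class_probs):
            combined[i] += weights[name] * p / total_weight
    return combined


def predict(probs: Dict[str, List[float]]) -> int:
    """Return the index of the class with the highest combined probability."""
    combined = weighted_ensemble(probs)
    return max(range(len(combined)), key=combined.__getitem__)


if __name__ == "__main__":
    # Made-up probabilities for one image over five classes.
    example = {
        "resnet": [0.10, 0.60, 0.10, 0.10, 0.10],
        "mobilenetv2": [0.50, 0.30, 0.10, 0.05, 0.05],
        "inceptionv3": [0.45, 0.35, 0.10, 0.05, 0.05],
    }
    print(weighted_ensemble(example))
    print(predict(example))
```

In this made-up example two of the three models rank class 0 highest, yet the combined prediction is class 1 because ResNet is confident and carries the largest weight. This is one way weighted soft voting differs from a simple majority vote.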
This project demonstrates that carefully designed CNN ensembles can reach high classification accuracy (92.7% on the five-category subset) even with small datasets. The combination of transfer learning, data augmentation, and model interpretability techniques provides a practical framework for food image analysis. The implemented GUI makes these capabilities accessible for applications in dietary monitoring and food quality assessment.
The current model is limited to five food categories from the Food-101 dataset. Performance may decrease on more diverse or complex food images that are not represented in the training data. The ensemble approach also requires more computational resources than a single-model solution, so more efficient ensemble methods are worth exploring.
Future enhancements could include: expanding to more food categories while maintaining performance, developing mobile-friendly versions of the model, incorporating real-time classification capabilities, and integrating nutritional information with classification results. Additional work could also explore few-shot learning techniques to further reduce data requirements.