
Project Overview
Coffee Disease Detection is a computer vision system that classifies coffee leaf images into five categories: healthy leaves and four diseases (Cercospora, Leaf Rust, Miner, and Phoma) that can significantly reduce crop yields if left untreated.
The project started as an idea I pitched during Le Wagon’s Data Science & AI bootcamp. It was selected as one of two final projects, and I led a team of four through the full lifecycle: dataset preparation, model training, API development, and cloud deployment.
The Problem
Coffee farmers often identify diseases too late, when visible damage is already extensive. The four target diseases present differently:
- Cercospora - fungal disease causing brown spots on leaves
- Leaf Rust (Roya) - orange/yellow pustules on leaf undersides
- Miner - insect larvae creating visible tunnels through leaf tissue
- Phoma - fungal blight causing leaf necrosis
Some of these look similar in their early stages, which is exactly when detection matters most. The system therefore optimizes for recall over precision: a false alarm is preferable to a missed infection.
Model Architecture
Multi-Architecture Comparison
Rather than committing to a single model, the project trains and compares three architectures:
- VGG16 Transfer Learning - pre-trained on ImageNet, fine-tuned with anti-overfitting measures (dropout, early stopping). Provides stable baseline performance.
- EfficientNetB0 - optimized for the accuracy-efficiency tradeoff. Smaller model with competitive performance.
- Custom CNN - lightweight architecture designed for smaller datasets. Useful as a benchmark and for resource-constrained deployment.
Saved models are loaded through automatic architecture detection, so the inference code does not need to be told which of the three architectures a given model file contains.
Disease-Focused Optimization
The key insight: in a medical/agricultural classification task, the cost of errors is asymmetric. Missing a disease (false negative) is far worse than flagging a healthy leaf (false positive). The training pipeline reflects this:
- Custom class weights reduce the weight of the healthy class and increase weights for rare diseases, forcing the model to learn disease features more aggressively.
- Disease recall metric - a custom TensorFlow metric that measures recall only across the four disease classes, so correct predictions on healthy leaves cannot inflate the score.
- Adaptive learning rates based on dataset size, with multi-phase training (frozen feature extraction, then full fine-tuning).
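The first two points above can be sketched in plain Python. This is a minimal illustration, not the project's TensorFlow code: the label strings, the `healthy_scale` value of 0.5, and the choice to pool recall over all diseased samples are assumptions made for the example.

```python
from collections import Counter

HEALTHY = "healthy"  # hypothetical label name for the healthy class


def disease_focused_weights(labels, healthy_scale=0.5):
    """Inverse-frequency class weights, with the healthy class scaled down.

    healthy_scale is an illustrative value, not the project's actual setting.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    weights = {}
    for name, count in counts.items():
        # "Balanced" weighting: rarer classes get proportionally larger weights.
        weight = total / (n_classes * count)
        if name == HEALTHY:
            weight *= healthy_scale
        weights[name] = weight
    return weights


def disease_recall(y_true, y_pred):
    """Recall pooled over diseased samples only.

    A diseased leaf counts as detected only when its exact disease is predicted.
    Healthy samples never enter the calculation.
    """
    diseased = [(t, p) for t, p in zip(y_true, y_pred) if t != HEALTHY]
    if not diseased:
        return 0.0
    hits = sum(1 for t, p in diseased if t == p)
    return hits / len(diseased)
```

With Keras, weights like these would typically be re-keyed by integer class index and passed to `model.fit` through its `class_weight` argument.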
Data Pipeline
- Letterbox preprocessing to 224x224, padding rather than stretching so the aspect ratio is preserved
- Augmentation (rotation, flip, brightness, contrast) that preserves disease-specific patterns
- Automatic train/validation/test splitting with stratification
- Class distribution analysis to inform weighting strategy
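As a rough sketch of two of these steps, the code below computes the resize and padding geometry for a letterbox and performs a stratified three-way split. It uses only the standard library; the 70/15/15 split fractions, the seed, and the function names are assumptions for illustration, and the actual pixel resizing would be done by an image library.

```python
import random
from collections import defaultdict


def letterbox_geometry(width, height, target=224):
    """Return (new_w, new_h, pad_left, pad_top) for a square letterbox.

    The longer side is scaled to `target`; the shorter side is padded.
    """
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def stratified_split(samples, labels, fractions=(0.7, 0.15, 0.15), seed=42):
    """Split into train/val/test so each class keeps roughly the same share.

    The fractions are illustrative defaults, not the project's configuration.
    """
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        n_train = round(len(items) * fractions[0])
        n_val = round(len(items) * fractions[1])
        train += [(s, label) for s in items[:n_train]]
        val += [(s, label) for s in items[n_train:n_train + n_val]]
        # Whatever remains after train and val goes to the test set.
        test += [(s, label) for s in items[n_train + n_val:]]
    return train, val, test
```

Splitting per class, rather than shuffling the whole dataset once, keeps rare diseases represented in all three subsets.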
Deployment
FastAPI Backend
A RESTful API accepts images via file upload or base64 encoding, runs inference, and returns the predicted class with confidence scores. In production, the API caches the loaded model so it is not reloaded on every request, and it includes comprehensive error handling.
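The request-handling logic described above might look roughly like the following framework-independent sketch. The FastAPI routes themselves are omitted, and every name here (`decode_base64_image`, `get_model`, `build_response`, the class labels, the response keys, the `model.keras` path) is hypothetical rather than taken from the project.

```python
import base64
import binascii
from functools import lru_cache

# Hypothetical label order; it must match the order of the model's outputs.
CLASS_NAMES = ["healthy", "cercospora", "leaf_rust", "miner", "phoma"]


def decode_base64_image(payload):
    """Decode a base64 image string, tolerating a 'data:...;base64,' prefix."""
    if payload.startswith("data:") and "," in payload:
        payload = payload.split(",", 1)[1]
    try:
        return base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("payload is not valid base64") from exc


@lru_cache(maxsize=1)
def get_model(path="model.keras"):
    """Load the model once per process; later calls reuse the cached object.

    Placeholder loader: a real service would deserialize the trained model here.
    """
    return {"path": path}


def build_response(probabilities):
    """Turn per-class probabilities into a predicted class plus confidence scores."""
    if len(probabilities) != len(CLASS_NAMES):
        raise ValueError("expected one probability per class")
    scores = dict(zip(CLASS_NAMES, probabilities))
    predicted = max(scores, key=scores.get)
    return {
        "predicted_class": predicted,
        "confidence": scores[predicted],
        "scores": scores,
    }
```

In an endpoint, a `ValueError` raised by these helpers would usually be translated into a 4xx HTTP response rather than allowed to surface as a server error.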
Streamlit Frontend
A separate web application provides a simple upload-and-predict interface, making the system accessible to users without technical knowledge.
Infrastructure
- Docker containerization for reproducible deployment
- Google Cloud Platform for scalable cloud-based inference
- MLflow for experiment tracking, model registry, and versioning
Team Leadership
As project lead, I was responsible for:
- Defining the project scope and technical approach
- Coordinating task distribution across four team members
- Maintaining the GitHub repository with clear structure and documentation
- Presenting the final project to the bootcamp cohort
The project was delivered end-to-end: from the initial pitch through data collection, model experimentation, API development, and production deployment.
Impact
The system demonstrates that affordable computer vision can help smallholder farmers protect their crops. By catching diseases early through a simple photo upload, farmers can apply targeted treatment before significant yield loss occurs. The modular architecture (separate API, model registry, web frontend) makes it straightforward to integrate into existing agricultural extension programs.
Project Details
Objective
Build a reliable classification system that prioritizes disease detection recall, helping coffee farmers identify plant diseases early enough to act.
Theme
Agricultural AI with focus on precision farming and disease prevention.
Date
November 15, 2025