Coffee Disease Detection

Machine Learning

Computer vision system that identifies diseases in coffee plants from leaf images using CNN architectures, deployed as a production API.

Project Overview

Coffee Disease Detection is a computer vision system that classifies coffee leaf images into five categories: healthy leaves and four diseases (Cercospora, Leaf Rust, Miner, and Phoma) that can significantly reduce crop yields if left untreated.

The project started as an idea I pitched during Le Wagon’s Data Science & AI bootcamp. It was selected as one of two final projects, and I led a team of four through the full lifecycle: dataset preparation, model training, API development, and cloud deployment.

The Problem

Coffee farmers often identify diseases too late, when visible damage is already extensive. The four target diseases present differently:

Cercospora - fungal disease causing brown spots on leaves
Leaf Rust (Roya) - orange/yellow pustules on leaf undersides
Miner - insect larvae creating visible tunnels through leaf tissue
Phoma - fungal blight causing leaf necrosis

Some of these look similar in early stages. The system needs to catch diseases early, which means optimizing for recall over precision. A false alarm is preferable to a missed infection.

Model Architecture

Multi-Architecture Comparison

Rather than committing to a single model, the project trains and compares three architectures:

VGG16 Transfer Learning - pre-trained on ImageNet, fine-tuned with anti-overfitting measures (dropout, early stopping). Provides stable baseline performance.
EfficientNetB0 - optimized for the accuracy-efficiency tradeoff. Smaller model with competitive performance.
Custom CNN - lightweight architecture designed for smaller datasets. Useful as a benchmark and for resource-constrained deployment.

At inference time, the saved model's architecture is detected automatically, so any of the three can be loaded without architecture-specific code.

Disease-Focused Optimization

The key insight: in a medical/agricultural classification task, the cost of errors is asymmetric. Missing a disease (false negative) is far worse than flagging a healthy leaf (false positive). The training pipeline reflects this:

Custom class weights - the healthy class is down-weighted and rare diseases are up-weighted, so errors on disease samples cost more during training.
Disease recall metric - a custom TensorFlow metric that measures recall across the disease classes only, ignoring how well healthy leaves are classified.
Adaptive learning rates - rates are set according to dataset size, with multi-phase training (feature extraction on a frozen base, then full fine-tuning).
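
The first two ideas can be sketched without any framework. This is an illustration under stated assumptions, not the project's training code: the class label strings, the `healthy_scale` factor, and the choice to pool all disease samples into a single recall figure are hypothetical, and the real metric is implemented as a TensorFlow metric rather than plain Python.

```python
from collections import Counter

HEALTHY = "healthy"  # assumed label name for the healthy class


def class_weights(labels, healthy_scale=0.5):
    """Inverse-frequency class weights, with the healthy class scaled down.

    healthy_scale is a hypothetical knob, not a value taken from the project.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    # Balanced weighting: rare classes end up above 1, common ones below 1.
    weights = {c: total / (n_classes * n) for c, n in counts.items()}
    if HEALTHY in weights:
        weights[HEALTHY] *= healthy_scale
    return weights


def disease_recall(y_true, y_pred):
    """Recall computed over diseased samples only.

    A diseased leaf counts as caught only when the exact disease is predicted.
    Healthy samples are skipped, so healthy accuracy has no effect.
    """
    diseased = [(t, p) for t, p in zip(y_true, y_pred) if t != HEALTHY]
    if not diseased:
        return 0.0
    hits = sum(1 for t, p in diseased if t == p)
    return hits / len(diseased)
```

A dictionary like the one returned by `class_weights` is the kind of mapping Keras accepts through the `class_weight` argument of `Model.fit`, once the labels are converted to integer class indices.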

Data Pipeline

Letterbox preprocessing to 224x224, preserving the original aspect ratio
Augmentation (rotation, flip, brightness, contrast) that preserves disease-specific patterns
Automatic train/validation/test splitting with stratification
Class distribution analysis to inform weighting strategy
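
Two of these steps come down to simple arithmetic, sketched below in plain Python. The function names, the 70/15/15 split, and the fixed seed are assumptions for illustration; the project's actual pipeline may differ. The letterbox helper only computes the resize and padding geometry, leaving the pixel work to whatever image library is in use.

```python
import random
from collections import defaultdict


def letterbox_geometry(width, height, target=224):
    """Return (new_w, new_h, pad_left, pad_top) for a target x target canvas.

    The longer side is scaled to `target`; the shorter side is padded evenly.
    """
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=42):
    """Split sample indices into train/val/test lists, one class at a time.

    Splitting per class keeps each class's share roughly equal across splits.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test
```

For example, a 448x224 photo is resized to 224x112 and padded by 56 pixels above and below, so lesions keep their shape instead of being stretched.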

Deployment

FastAPI Backend

A RESTful API accepts images via file upload or base64 encoding, runs inference, and returns the predicted class with confidence scores. Loaded models are cached so they are not reloaded on each request, and invalid inputs are handled with explicit error responses.
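
The request flow can be sketched with the standard library alone. Everything named here is hypothetical: `get_model`, `decode_image`, `predict_response`, the response keys, and the placeholder scores are stand-ins so the example runs without TensorFlow or FastAPI, and they are not the project's actual endpoint code.

```python
import base64
import binascii
from functools import lru_cache

CLASSES = ("healthy", "cercospora", "leaf_rust", "miner", "phoma")


@lru_cache(maxsize=4)
def get_model(name):
    """Load a model once per name and reuse it on later calls.

    Stand-in loader: a real service would read weights from disk or a registry.
    """
    def predict(image_bytes):
        # Placeholder scores in CLASSES order, not real model output.
        return [0.05, 0.05, 0.80, 0.05, 0.05]
    return predict


def decode_image(payload):
    """Decode a base64 string, tolerating a 'data:image/...;base64,' prefix."""
    if payload.startswith("data:") and "," in payload:
        payload = payload.split(",", 1)[1]
    try:
        return base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("payload is not valid base64") from exc


def predict_response(payload, model_name="default"):
    """Build a response dict with the top class and per-class scores."""
    image_bytes = decode_image(payload)
    if not image_bytes:
        raise ValueError("empty image payload")
    scores = get_model(model_name)(image_bytes)
    best = max(range(len(CLASSES)), key=scores.__getitem__)
    return {
        "predicted_class": CLASSES[best],
        "confidence": scores[best],
        "scores": dict(zip(CLASSES, scores)),
    }
```

In a FastAPI route, a `ValueError` raised here would typically be turned into a 4xx response, and a file upload would skip `decode_image` and pass its raw bytes to the model directly.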

Streamlit Frontend

A separate web application provides a simple upload-and-predict interface, making the system accessible to users without technical knowledge.

Infrastructure

Docker containerization for reproducible deployment
Google Cloud Platform for scalable cloud-based inference
MLflow for experiment tracking, model registry, and versioning

Team Leadership

As project lead, I was responsible for:

Defining the project scope and technical approach
Coordinating task distribution across four team members
Maintaining the GitHub repository with clear structure and documentation
Presenting the final project to the bootcamp cohort

The project was delivered end-to-end: from the initial pitch through data collection, model experimentation, API development, and production deployment.

Impact

The system shows how affordable computer vision could help smallholder farmers protect their crops. With early detection from a simple photo upload, farmers can apply targeted treatment before significant yield loss occurs. The modular architecture (separate API, model registry, web frontend) makes it straightforward to integrate into existing agricultural extension programs.

Project Details

Objective

Build a reliable classification system that prioritizes disease detection recall, helping coffee farmers identify plant diseases early enough to act.

Theme

Agricultural AI with focus on precision farming and disease prevention.

Date

November 15, 2025

Technologies

Machine Learning

Computer Vision

Deep Learning

Agriculture

CNN

TensorFlow

FastAPI

Streamlit
