Coffee Disease Detection

Machine Learning

Computer vision system that identifies diseases in coffee plants from leaf images using CNN architectures, deployed as a production API.

Machine Learning
Computer Vision
Deep Learning
Agriculture
CNN
TensorFlow
FastAPI
Streamlit

Project Overview

Coffee Disease Detection is a computer vision system that classifies coffee leaf images into five categories: healthy leaves and four diseases (Cercospora, Leaf Rust, Miner, and Phoma) that can significantly reduce crop yields if left untreated.

The project started as an idea I pitched during Le Wagon’s Data Science & AI bootcamp. It was selected as one of two final projects, and I led a team of four through the full lifecycle: dataset preparation, model training, API development, and cloud deployment.

The Problem

Coffee farmers often identify diseases too late, when visible damage is already extensive. The four target diseases present differently:

  1. Cercospora - fungal disease causing brown spots on leaves
  2. Leaf Rust (Roya) - orange/yellow pustules on leaf undersides
  3. Miner - insect larvae creating visible tunnels through leaf tissue
  4. Phoma - fungal blight causing leaf necrosis

Some of these look similar in their early stages, which is exactly when detection matters most. The system therefore optimizes for recall over precision: a false alarm is preferable to a missed infection.

Model Architecture

Multi-Architecture Comparison

Rather than committing to a single model, the project trains and compares three architectures:

  • VGG16 Transfer Learning - pre-trained on ImageNet, fine-tuned with anti-overfitting measures (dropout, early stopping). Provides stable baseline performance.
  • EfficientNetB0 - optimized for the accuracy-efficiency tradeoff. Smaller model with competitive performance.
  • Custom CNN - lightweight architecture designed for smaller datasets. Useful as a benchmark and for resource-constrained deployment.

At inference time, the loader detects which architecture a saved model uses, so any of the three can be loaded without manual configuration.

Disease-Focused Optimization

The key insight: in a medical/agricultural classification task, the cost of errors is asymmetric. Missing a disease (false negative) is far worse than flagging a healthy leaf (false positive). The training pipeline reflects this:

  • Custom class weights - the healthy class is down-weighted and rare diseases are up-weighted, so errors on disease samples contribute more to the loss and the model is pushed to learn disease features.
  • Disease recall metric - a custom TensorFlow metric that measures recall exclusively across disease classes, ignoring healthy classification accuracy.
  • Adaptive learning rates based on dataset size, with multi-phase training (frozen feature extraction, then full fine-tuning).
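
To make the weighting and the metric concrete, here is a minimal, framework-free sketch. It is not the project's TensorFlow code: the class ordering, the `healthy_scale` factor, and the choice to count a diseased leaf as detected only when its exact disease is predicted are all assumptions made for illustration.

```python
from collections import Counter

# Assumed label order; index 0 is the healthy class.
CLASSES = ["healthy", "cercospora", "leaf_rust", "miner", "phoma"]
HEALTHY = 0


def class_weights(labels, healthy_scale=0.5):
    """Inverse-frequency class weights with the healthy class scaled down.

    healthy_scale is a hypothetical knob, not a value from the project.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    # Rarer classes receive proportionally larger weights.
    weights = {c: total / (n_classes * n) for c, n in counts.items()}
    if HEALTHY in weights:
        weights[HEALTHY] *= healthy_scale
    return weights


def disease_recall(y_true, y_pred):
    """Fraction of diseased samples whose disease class was predicted correctly.

    Samples whose true label is healthy are ignored entirely.
    """
    diseased = [(t, p) for t, p in zip(y_true, y_pred) if t != HEALTHY]
    if not diseased:
        return 0.0
    hits = sum(1 for t, p in diseased if t == p)
    return hits / len(diseased)
```

A dictionary of this shape (class index to weight) is what Keras accepts through the `class_weight` argument of `model.fit`, so the same idea carries over to a TensorFlow training loop.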

Data Pipeline

  • Letterbox preprocessing to 224×224, preserving aspect ratio
  • Augmentation (rotation, flip, brightness, contrast) that preserves disease-specific patterns
  • Automatic train/validation/test splitting with stratification
  • Class distribution analysis to inform weighting strategy
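
Two of these steps can be sketched in plain Python. The function names, the 70/15/15 split, and the fixed seed below are assumptions for illustration, not values taken from the project. The first function computes only the letterbox geometry (resized size and padding offsets); the resize and padding themselves would be done by an image library.

```python
import random
from collections import defaultdict


def letterbox_geometry(width, height, target=224):
    """Return (new_w, new_h, pad_left, pad_top) for a square letterbox.

    The image is scaled so its longer side equals `target`; the shorter
    side is then centered with padding, so the aspect ratio is unchanged.
    """
    scale = min(target / width, target / height)
    new_w = min(target, max(1, round(width * scale)))
    new_h = min(target, max(1, round(height * scale)))
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=42):
    """Split sample indices into train/val/test, class by class.

    Splitting within each class keeps the class proportions roughly
    equal across the three subsets.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to test
    return train, val, test
```

For example, a 448×224 photo would be resized to 224×112 and padded with 56 pixels above and below.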

Deployment

FastAPI Backend

A RESTful API accepts images via file upload or base64 encoding, runs inference, and returns the predicted class with confidence scores. The loaded model is cached so it does not have to be reloaded on every request, and invalid inputs are handled with explicit error responses.
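
The two API-side ideas, base64 input handling and model caching, can be sketched with the standard library alone. This is an illustration under assumptions, not the project's FastAPI code: `decode_base64_image` and `get_model` are hypothetical names, and the loader returns a stand-in object where a real service would load the trained model.

```python
import base64
import binascii
from functools import lru_cache


def decode_base64_image(payload: str) -> bytes:
    """Decode a base64 image string, tolerating a data-URL prefix."""
    if payload.startswith("data:") and "," in payload:
        # Strip a prefix such as "data:image/png;base64,".
        payload = payload.split(",", 1)[1]
    try:
        return base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError) as exc:
        # Re-raise with a message an API layer could map to a 4xx response.
        raise ValueError("payload is not valid base64") from exc


@lru_cache(maxsize=1)
def get_model(path: str):
    """Load the model once per path and reuse it on later calls.

    The returned dict is a placeholder; a real service would load the
    trained model from `path` here.
    """
    return {"path": path}
```

In a request handler, calling `get_model(...)` on every request then costs a cache lookup rather than a full model load after the first call.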

Streamlit Frontend

A separate web application provides a simple upload-and-predict interface, making the system accessible to users without technical knowledge.

Infrastructure

  • Docker containerization for reproducible deployment
  • Google Cloud Platform for scalable cloud-based inference
  • MLflow for experiment tracking, model registry, and versioning

Team Leadership

As project lead, I was responsible for:

  • Defining the project scope and technical approach
  • Coordinating task distribution across four team members
  • Maintaining the GitHub repository with clear structure and documentation
  • Presenting the final project to the bootcamp cohort

The project was delivered end-to-end: from the initial pitch through data collection, model experimentation, API development, and production deployment.

Impact

The system demonstrates that affordable computer vision can help smallholder farmers protect their crops. By catching diseases early through a simple photo upload, farmers can apply targeted treatment before significant yield loss occurs. The modular architecture (separate API, model registry, web frontend) makes it straightforward to integrate into existing agricultural extension programs.

Project Details

Objective

Build a reliable classification system that prioritizes disease detection recall, helping coffee farmers identify plant diseases early enough to act.

Theme

Agricultural AI with a focus on precision farming and disease prevention.

Date

November 15, 2025

Category

Machine Learning
