HAWKEYE Crop Intelligence - Technical Report

Executive Summary

Stress Discovery

HAWKEYE Crop Intelligence performs zero‑label crop‑stress discovery by combining SimSiam self‑supervised representation learning with K‑Means clustering, identifying 8.27% stressed area in the Jessore AOI and separating four distinct land‑cover clusters.

The pipeline ingests Sentinel‑2 multispectral bands with DEM/slope, learns 512‑dim embeddings, clusters them without human labels, and exports a georeferenced stress map and overlays for agronomic triage.

8.27% Stressed Area

4 Clusters

512-dim Embeddings

30 m Resolution

Project Scope

Operational Scale

Operationalize near‑real‑time crop‑stress mapping that turns unlabeled satellite scenes into actionable field‑level guidance, emphasizing unsupervised discovery where ground truth is scarce.

Primary AOI is the Jessore agricultural region in Bangladesh; the demo operates at 30 m processing scale for Earth Engine throughput while preserving Sentinel‑2 spectral fidelity for reliable vegetation indices.

Intention & Objectives

Mission Critical

Replace manual scouting with automated, statistically validated stress discovery that requires zero human labels while remaining interpretable via vegetation/water indices.
Deliver binary stress rasters, confidence overlays, and cluster breakdowns that agronomists and insurers can use for targeted interventions and early claims triage.

Data Sources

Multi-Modal Fusion

Sentinel‑2 Optical

B2/B3/B4 (RGB), B5/B8 (red‑edge/NIR), B11/B12 (SWIR) for robust index computation and spectral separability.

SRTM DEM

Terrain normalization and context for waterlogging and drainage effects.

AOI Window

Jessore bounding box [89.2, 23.2, 89.35, 23.35], 2023‑11‑01 to 2024‑02‑28, cloud cover <15%.
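
For a sense of scale, the bounding box can be converted to an approximate ground extent and pixel grid. This is a back-of-envelope sketch, not project code: it assumes the box is ordered [west, south, east, north], a spherical Earth of radius 6,371 km, and the 30 m processing scale; `AOI_BBOX` and `approx_extent_km` are illustrative names.

```python
import math

# AOI from the report; ordering assumed to be [west, south, east, north] in degrees.
AOI_BBOX = [89.2, 23.2, 89.35, 23.35]
EARTH_RADIUS_KM = 6371.0  # spherical approximation (assumption)


def approx_extent_km(bbox):
    """Return (width_km, height_km) of a small lon/lat box on a sphere."""
    west, south, east, north = bbox
    mid_lat = math.radians((south + north) / 2.0)
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0
    width_km = (east - west) * km_per_deg * math.cos(mid_lat)
    height_km = (north - south) * km_per_deg
    return width_km, height_km


width_km, height_km = approx_extent_km(AOI_BBOX)
cols = round(width_km * 1000 / 30)   # pixels across at 30 m
rows = round(height_km * 1000 / 30)  # pixels down at 30 m
print(f"{width_km:.1f} km x {height_km:.1f} km, about {rows} x {cols} pixels at 30 m")
```

Under these assumptions the AOI is roughly 15 km wide by 17 km tall, so a single scene is a few hundred thousand pixels at 30 m.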

Target Variables

Output Specifications

Pixel‑wise classes: stressed vs non‑stressed vegetation exported as binary $$0/1$$ rasters and probability overlays.
Unsupervised clusters: four land‑surface categories (Healthy, Water, Bare Soil, Stressed) derived from the embedding space.

Methodology

Pipeline Architecture

Multi‑modal feature stack: normalized Sentinel‑2 bands plus derived NDVI/NDWI/moisture and slope features in a common grid for representation learning.

Self‑supervised stage: SimSiam trains on unlabeled tiles to produce 512‑dim embeddings that encode vegetation condition and context.

Clustering stage: K‑Means $$k=4$$ partitions embeddings; the stress cluster is selected via minimum mean NDVI and confirmed by low NIR/high SWIR signatures.

Index Rules & Thresholds

Scientific Validation

Key index rules guiding the stress detection:

NDVI

Normalized Difference Vegetation Index:

$$ \text{NDVI} = \frac{\text{NIR} - \text{Red}}{\text{NIR} + \text{Red}} $$

NDWI

Normalized Difference Water Index (G is the green band), indicating surface water and leaf water content:

$$ \text{NDWI} = \frac{G - \text{NIR}}{G + \text{NIR}} $$

Moisture Index

Captures canopy water content:

$$ \text{MI} = \frac{\text{NIR} - \text{SWIR}_{1}}{\text{NIR} + \text{SWIR}_{1}} $$

Model Architecture

SimSiam + K-Means

Representation learner: SimSiam with shared encoder, projector, and predictor; stop‑gradient on the target branch prevents collapse without negatives.

Backbone

SimpleBackbone CNN
512‑dim global descriptors
SGD/AdamW training

Clustering

scikit‑learn K‑Means $$k=4$$ over per‑pixel feature vectors; centroid assignment yields the per‑pixel cluster map.
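
One way the per‑pixel clustering step could look is sketched below. It assumes an (H, W, C) feature array and uses synthetic data in place of real features; `cluster_pixels`, the 10‑channel tile, `n_init=10`, and the fixed seed are illustrative choices rather than the project's settings.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_pixels(feature_stack, k=4, seed=0):
    """Cluster an (H, W, C) feature stack and return an (H, W) label map."""
    h, w, c = feature_stack.shape
    flat = feature_stack.reshape(-1, c)  # one row per pixel
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=seed)
    labels = km.fit_predict(flat)        # nearest-centroid assignment
    return labels.reshape(h, w), km


# Synthetic stand-in for real features (assumption: 10 channels, 32x32 tile).
rng = np.random.default_rng(0)
demo_stack = rng.normal(size=(32, 32, 10)).astype(np.float32)
cluster_map, model = cluster_pixels(demo_stack)
print(cluster_map.shape, model.cluster_centers_.shape)
```

Flattening to one row per pixel and reshaping the labels afterwards keeps the cluster map aligned with the original raster grid.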

Core Mathematics

Key Formulas

SimSiam Loss

$$ L = -\frac{1}{2}\big(\cos(p_{1}, \text{sg}(z_{2})) + \cos(p_{2}, \text{sg}(z_{1}))\big) $$

Where sg denotes stop‑gradient, and $$p_{i}$$ and $$z_{i}$$ are the predictor and projector outputs for augmented view $$i$$

K‑Means Objective

$$ \min_{\{\mu_{j}\}} \sum_{j=1}^{k} \sum_{x_{i}\in C_{j}} \lVert x_{i} - \mu_{j} \rVert^{2} $$

With K‑Means++ initialization for stability
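
To make the objective concrete, the sketch below evaluates the sum of squared distances by hand and compares it with the `inertia_` value scikit‑learn reports. The four 2‑D blobs are synthetic stand‑ins for embedding vectors, and `kmeans_objective` is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import KMeans


def kmeans_objective(x, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    diffs = x - centroids[labels]
    return float(np.sum(diffs ** 2))


rng = np.random.default_rng(42)
# Four synthetic blobs in 2-D, standing in for embedding vectors.
centers = np.array([[0, 0], [5, 5], [0, 5], [5, 0]], dtype=float)
x = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

km = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=0).fit(x)
objective = kmeans_objective(x, km.labels_, km.cluster_centers_)
print(objective, km.inertia_)  # the two values should agree closely
```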

Stress Percent

$$ \%\text{Stress} = 100 \cdot \frac{\sum \mathbf{1}[c(x)=\text{stress}]}{\text{pixels}} $$

Quantifying stressed area
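
A minimal sketch of this calculation on a toy cluster map follows. The optional `valid_mask` (for excluding nodata or cloud pixels from the denominator) is an assumption added for illustration; the formula above simply counts all pixels.

```python
import numpy as np


def stress_percent(cluster_map, stress_idx, valid_mask=None):
    """Percent of (valid) pixels assigned to the stress cluster."""
    if valid_mask is None:
        valid_mask = np.ones(cluster_map.shape, dtype=bool)
    n_valid = int(valid_mask.sum())
    if n_valid == 0:
        raise ValueError("no valid pixels")
    n_stress = int(((cluster_map == stress_idx) & valid_mask).sum())
    return 100.0 * n_stress / n_valid


demo = np.array([[0, 1, 2, 3],
                 [3, 3, 1, 0],
                 [2, 0, 0, 1]])
print(stress_percent(demo, stress_idx=3))  # 3 of 12 pixels -> 25.0
```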

Implementation

Code Snippets

Index Computation

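import numpy as np

# data is (H, W, bands); these indices imply a band order of
# B4, B3, B2, B5, B8, B11, B12 (red = 0, green = 1, NIR = 4, SWIR1 = 5).
nir, red, green, swir1 = data[:,:,4], data[:,:,0], data[:,:,1], data[:,:,5]
# The 1e-6 term avoids division by zero on dark or masked pixels.
ndvi = (nir - red) / (nir + red + 1e-6)
ndwi = (green - nir) / (green + nir + 1e-6)
moisture = (nir - swir1) / (nir + swir1 + 1e-6)
feature_stack = np.stack([ndvi, ndwi, moisture, *[data[:,:,i] for i in range(min(7, data.shape[-1]))]], axis=-1)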

SimSiam Training Loop

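import torch.nn.functional as F

# x1, x2: two augmented views of the same batch of tiles
z1, z2 = projector(backbone(x1)), projector(backbone(x2))
p1, p2 = predictor(z1), predictor(z2)
# detach() implements the stop-gradient on the target branch
loss = -0.5 * (F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
             +  F.cosine_similarity(p2, z1.detach(), dim=-1).mean())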
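
The loop above presupposes a `backbone`, `projector`, and `predictor`. The sketch below fills them in with deliberately small stand‑ins so that one training step runs end to end. The layer sizes, the 10‑channel input, the `augment` placeholder, and the SGD settings are all assumptions for illustration; they are not the report's SimpleBackbone or its hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

IN_CH, EMB, HID = 10, 512, 128  # channel count and hidden width are assumptions

# Tiny stand-in encoder: conv -> global average pool -> 512-dim vector.
backbone = nn.Sequential(
    nn.Conv2d(IN_CH, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, EMB),
)
projector = nn.Sequential(nn.Linear(EMB, EMB), nn.BatchNorm1d(EMB), nn.ReLU(), nn.Linear(EMB, EMB))
predictor = nn.Sequential(nn.Linear(EMB, HID), nn.ReLU(), nn.Linear(HID, EMB))

params = [*backbone.parameters(), *projector.parameters(), *predictor.parameters()]
optimizer = torch.optim.SGD(params, lr=0.05, momentum=0.9)


def augment(x):
    """Placeholder augmentation: random horizontal flip plus small noise."""
    if torch.rand(()) < 0.5:
        x = torch.flip(x, dims=[-1])
    return x + 0.01 * torch.randn_like(x)


def train_step(batch):
    """Run one SimSiam update on a batch of tiles and return the loss value."""
    x1, x2 = augment(batch), augment(batch)
    z1, z2 = projector(backbone(x1)), projector(backbone(x2))
    p1, p2 = predictor(z1), predictor(z2)
    # detach() is the stop-gradient on the target branch
    loss = -0.5 * (F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
                   + F.cosine_similarity(p2, z1.detach(), dim=-1).mean())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
loss_value = train_step(torch.randn(8, IN_CH, 16, 16))
print(loss_value)  # negative cosine similarity, so it lies in [-1, 1]
```

After training, the backbone output (before the projector) would typically serve as the embedding passed to clustering.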

Stress Cluster Selection

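# Mean NDVI per cluster; the lowest-NDVI cluster is labeled as stress.
means = [ndvi[cluster_map==k].mean() for k in range(4)]
stress_idx = int(np.argmin(means))
# Binary mask: 1 = stressed, 0 = everything else.
stress_map = (cluster_map == stress_idx).astype(np.uint8)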
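
The methodology also mentions confirming the selected cluster through low NIR and high SWIR signatures. The report does not give the exact rule, so the sketch below assumes a simple one: the cluster's mean NIR must fall below the scene mean and its mean SWIR1 above it. `confirm_stress_cluster` and the toy reflectance values are hypothetical.

```python
import numpy as np


def confirm_stress_cluster(cluster_map, nir, swir1, stress_idx):
    """Return True if the cluster has below-scene-mean NIR and above-scene-mean SWIR1.

    The comparison against scene-wide means is an assumed rule, not the report's.
    """
    in_cluster = cluster_map == stress_idx
    if not in_cluster.any():
        return False
    low_nir = nir[in_cluster].mean() < nir.mean()
    high_swir = swir1[in_cluster].mean() > swir1.mean()
    return bool(low_nir and high_swir)


# Toy 2x2 scene: cluster 1 has low NIR and high SWIR1 reflectance.
cluster_map = np.array([[0, 0], [1, 1]])
nir = np.array([[0.45, 0.50], [0.20, 0.25]])
swir1 = np.array([[0.15, 0.10], [0.30, 0.35]])
print(confirm_stress_cluster(cluster_map, nir, swir1, stress_idx=1))  # True
```

A check of this kind would also reject a minimum‑NDVI cluster dominated by open water, which is dark in both NIR and SWIR.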

Performance Metrics

Results Obtained

Discovery Metrics

8.27% stressed area over the Jessore AOI with four clusters automatically identified from unlabeled scenes.

Spectral Validation

Stressed cluster shows $$ \mu_{\text{NDVI}}=0.42 $$, $$ \sigma_{\text{NDVI}}=0.08 $$, and $$ \mu_{\text{NDWI}}=0.22 $$, consistent with lower vigor and water content.

Resolution & Scale

30 m processing scale for the demo's Earth Engine constraints while retaining Sentinel‑2 spectral bands for reliable differentiation.

Confidence

High: patterns are consistent across independent indices and robust under alternative tilings/augmentations.

Strategic Impact

Decision Relevance

Zero‑label discovery surfaces emergent agronomic risk without ground surveys, enabling earlier scouting, irrigation checks, and targeted agronomy at sub‑field resolution.
Quantified stress percent provides a planning KPI for procurement, logistics, and insurance partners to anticipate yield shortfalls and trigger interventions.
Embedding‑driven clusters enrich monitoring beyond a single index, isolating water bodies and bare soil to reduce false alerts and focus attention.

Technical Foundation

What Makes It Work

Self‑Supervised Invariances

Invariances learned from augmented tiles transfer to unseen parcels, improving separability of subtle stress states with minimal compute.

Multi‑Index Feature Construction

Vegetation and water indices inject plant‑physiology priors that stabilize clustering under illumination and soil‑background changes.

Post‑Hoc Statistical Labeling

Labeling clusters after the fact via NDVI minima yields an interpretable, auditable stress selection criterion.

Deployment

Operationalization & Outputs

Deliverables

GeoTIFF stress masks
Cluster rasters
Styled overlays
JSON report with stressed‑area %, cluster counts
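
As an illustration of the JSON deliverable, the sketch below summarizes a cluster map into a small report. The field names (`aoi`, `stress_cluster`, `stressed_area_pct`, `cluster_counts`) and the `build_report` helper are hypothetical; the report's actual schema is not specified.

```python
import json

import numpy as np


def build_report(cluster_map, stress_idx, aoi_name):
    """Summarize a cluster map as a JSON-serializable dict."""
    ids, counts = np.unique(cluster_map, return_counts=True)
    # Cast NumPy integers to plain int so json.dumps accepts them.
    cluster_counts = {str(int(i)): int(n) for i, n in zip(ids, counts)}
    stressed = cluster_counts.get(str(stress_idx), 0)
    return {
        "aoi": aoi_name,
        "stress_cluster": int(stress_idx),
        "stressed_area_pct": round(100.0 * stressed / cluster_map.size, 2),
        "cluster_counts": cluster_counts,
    }


demo_map = np.array([[0, 1, 2, 3], [3, 3, 1, 0]])
report = build_report(demo_map, stress_idx=3, aoi_name="Jessore")
print(json.dumps(report, indent=2))
```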

Integration

Earth Engine data pull → PyTorch SimSiam training → K‑Means clustering → vector/raster export with minimal manual supervision.

Constraints

Limitations & Risks

Cloud and Phenology

Optical reliance requires date windows with <15% cloud and comparable phenological stages for fair clustering.

Domain Shift

Crop rotations, varietal differences, and soil backgrounds can shift spectra; periodic re‑tiling and retraining mitigate drift.

Scale Trade‑Off

30 m demo scale reduces I/O cost but limits smallest detectable patches; production can revert to 10–20 m when quotas allow.
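
The trade‑off can be put in area terms. As a worked example, taking a single pixel as the smallest mappable unit (a simplifying assumption):

$$ A_{30} = (30\,\text{m})^{2} = 900\,\text{m}^{2} = 0.09\,\text{ha}, \qquad A_{10} = (10\,\text{m})^{2} = 100\,\text{m}^{2} = 0.01\,\text{ha}, \qquad \frac{A_{30}}{A_{10}} = 9 $$

Each 30 m pixel therefore aggregates nine 10 m pixels, and a stressed patch much smaller than about 0.09 ha is blended with its surroundings at demo scale.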

Roadmap

Future Scope

Temporal Tracking

Monthly embeddings and delta‑clustering to measure stress trajectories and intervention effects.
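
A simpler precursor to delta‑clustering is to compare binary stress masks from two dates on the same grid. The sketch below does only that; it is not delta‑clustering of embeddings, and `stress_delta` and the toy masks are illustrative.

```python
import numpy as np


def stress_delta(prev_mask, curr_mask):
    """Compare two binary stress masks of the same shape.

    Returns pixel counts for newly stressed, recovered, and persistent areas.
    """
    prev = prev_mask.astype(bool)
    curr = curr_mask.astype(bool)
    if prev.shape != curr.shape:
        raise ValueError("masks must share a grid")
    return {
        "new": int((~prev & curr).sum()),
        "recovered": int((prev & ~curr).sum()),
        "persistent": int((prev & curr).sum()),
    }


january = np.array([[1, 0, 0], [1, 1, 0]], dtype=np.uint8)
february = np.array([[1, 1, 0], [0, 1, 0]], dtype=np.uint8)
print(stress_delta(january, february))  # {'new': 1, 'recovered': 1, 'persistent': 2}
```

Comparing masks rather than raw cluster ids sidesteps the fact that K‑Means label numbers are not stable between independent runs.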

Yield Linkage

Calibrate stress percent against harvest outcomes to estimate expected tonnes at risk per block.

Alerts and APIs

Push notifications when stress share exceeds 10% or when stressed clusters expand week‑over‑week.
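
The alert rule could be expressed roughly as follows. This is a sketch under assumptions: stress shares arrive as a chronological list of weekly percentages, and "expanding" means the latest value is strictly greater than the previous one. `should_alert` is a hypothetical name.

```python
def should_alert(stress_history_pct, threshold_pct=10.0):
    """Decide whether to alert from a chronological list of weekly stress shares.

    Fires if the latest share exceeds the threshold or grew versus the prior week.
    """
    if not stress_history_pct:
        return False
    latest = stress_history_pct[-1]
    exceeds = latest > threshold_pct
    expanding = len(stress_history_pct) >= 2 and latest > stress_history_pct[-2]
    return exceeds or expanding


print(should_alert([8.27]))        # False: below 10% and no prior week
print(should_alert([8.27, 9.1]))   # True: share grew week over week
print(should_alert([12.0, 11.0]))  # True: still above 10%
```

In practice a minimum growth margin would likely be needed so that small week‑to‑week noise does not trigger notifications.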