Zero-label crop-stress discovery using self-supervised learning at 30 m resolution
HAWKEYE Crop Intelligence performs zero‑label crop‑stress discovery by combining SimSiam self‑supervised representation learning with K‑Means clustering, identifying 8.27% stressed area in the Jessore AOI and separating four distinct land‑cover clusters.
The pipeline ingests Sentinel‑2 multispectral bands with DEM/slope, learns 512‑dim embeddings, clusters them without human labels, and exports a georeferenced stress map and overlays for agronomic triage.
Operationalize near–real‑time crop‑stress mapping that turns unlabeled satellite scenes into actionable field‑level guidance, emphasizing unsupervised discovery where ground truth is scarce.
Primary AOI is the Jessore agricultural region in Bangladesh; the demo operates at 30 m processing scale for Earth Engine throughput while preserving Sentinel‑2 spectral fidelity for reliable vegetation indices.
Sentinel‑2 bands: B2/B3/B4 (RGB), B5/B8 (red‑edge/NIR), and B11/B12 (SWIR) for robust index computation and spectral separability.
DEM/slope: terrain normalization and context for waterlogging and drainage effects.
AOI and time window: Jessore bounding box [89.2, 23.2, 89.35, 23.35] (min lon, min lat, max lon, max lat, in degrees), 2023‑11‑01 to 2024‑02‑28, scenes with cloud cover <15%.
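As a rough sanity check on what this AOI implies at the 30 m processing scale, the sketch below converts the bounding box to an approximate ground extent and pixel grid. It assumes the box is ordered [min lon, min lat, max lon, max lat] and uses a spherical-Earth approximation of about 111.32 km per degree; the function name `aoi_grid` is illustrative and not part of the pipeline.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude (assumption)

def aoi_grid(bbox, scale_m=30.0):
    """Approximate ground extent (km) and pixel grid for a lon/lat bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    # Longitude degrees shrink with latitude, so scale by cos(mid-latitude).
    mid_lat = math.radians((min_lat + max_lat) / 2.0)
    width_km = (max_lon - min_lon) * KM_PER_DEGREE * math.cos(mid_lat)
    height_km = (max_lat - min_lat) * KM_PER_DEGREE
    cols = round(width_km * 1000.0 / scale_m)
    rows = round(height_km * 1000.0 / scale_m)
    return {"width_km": width_km, "height_km": height_km, "rows": rows, "cols": cols}

grid = aoi_grid([89.2, 23.2, 89.35, 23.35])
```

Under these assumptions the AOI is roughly 15 km wide and 17 km tall, which is on the order of 510 by 560 pixels at 30 m.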
Multi‑modal feature stack: normalized Sentinel‑2 bands plus derived NDVI/NDWI/moisture and slope features in a common grid for representation learning.
Self‑supervised stage: SimSiam trains on unlabeled tiles to produce 512‑dim embeddings that encode vegetation condition and context.
Clustering stage: K‑Means with $$k=4$$ partitions the embeddings; the stress cluster is the one with the lowest mean NDVI, and the choice is confirmed by low NIR and high SWIR reflectance.
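The selection rule in the clustering stage can be sketched as follows. This is a minimal illustration on synthetic arrays, assuming per-pixel `ndvi`, `nir`, and `swir1` rasters and a `cluster_map` with labels 0 to 3. The helper name `select_stress_cluster` and the specific confirmation test (stress cluster darker in NIR and brighter in SWIR than the scene average) are assumptions about how the spectral check might be done, not the project's confirmed logic.

```python
import numpy as np

def select_stress_cluster(cluster_map, ndvi, nir, swir1, k=4):
    """Pick the cluster with minimum mean NDVI and report a spectral cross-check."""
    ids = [c for c in range(k) if np.any(cluster_map == c)]  # skip empty clusters
    mean_ndvi = {c: float(ndvi[cluster_map == c].mean()) for c in ids}
    stress = min(mean_ndvi, key=mean_ndvi.get)
    mask = cluster_map == stress
    # Hypothetical confirmation: stressed pixels should be darker in NIR and
    # brighter in SWIR than the scene average.
    confirmed = bool(nir[mask].mean() < nir.mean() and swir1[mask].mean() > swir1.mean())
    return stress, confirmed, mean_ndvi

# Synthetic 4x4 scene: cluster 2 has low NIR and high SWIR.
cluster_map = np.array([[0, 0, 1, 1],
                        [0, 0, 1, 1],
                        [2, 2, 3, 3],
                        [2, 2, 3, 3]])
red = np.full((4, 4), 0.10)
nir = np.where(cluster_map == 2, 0.20, 0.45)
swir1 = np.where(cluster_map == 2, 0.30, 0.15)
ndvi = (nir - red) / (nir + red + 1e-6)

stress_idx, confirmed, mean_ndvi = select_stress_cluster(cluster_map, ndvi, nir, swir1)
```

Returning the per-cluster means alongside the chosen index keeps the decision auditable: a reviewer can see how far the selected cluster sits below the others.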
Key index rules guiding the stress detection:
Normalized Difference Vegetation Index (NDVI):
$$ \text{NDVI} = \frac{\text{NIR} - \text{Red}}{\text{NIR} + \text{Red}} $$
Normalized Difference Water Index (NDWI), indicating surface water and leaf water content:
$$ \text{NDWI} = \frac{G - \text{NIR}}{G + \text{NIR}} $$
Moisture Index (MI), capturing canopy water content:
$$ \text{MI} = \frac{\text{NIR} - \text{SWIR}_{1}}{\text{NIR} + \text{SWIR}_{1}} $$
Representation learner: SimSiam with shared encoder, projector, and predictor; stop‑gradient on the target branch prevents collapse without negatives.
Clustering: scikit‑learn K‑Means with $$k=4$$ over per‑pixel feature vectors; assigning each pixel to its nearest centroid yields the per‑pixel cluster map.
$$ L = -\frac{1}{2}\big(\cos(p_{1}, \text{sg}(z_{2})) + \cos(p_{2}, \text{sg}(z_{1}))\big) $$
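where $$\text{sg}(\cdot)$$ denotes stop‑gradient, $$z_{1}, z_{2}$$ are the projector outputs for the two augmented views, and $$p_{1}, p_{2}$$ are the corresponding predictor outputs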
$$ \min_{\{\mu_{j}\}} \sum_{j=1}^{k} \sum_{x_{i}\in C_{j}} \lVert x_{i} - \mu_{j} \rVert^{2} $$
With K‑Means++ initialization for stability
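To make the objective and the K-Means++ seeding concrete, here is a small NumPy sketch of both on synthetic 2-D points. It is a stand-in for illustration only: the pipeline itself uses scikit-learn, where `KMeans(n_clusters=4, init="k-means++")` provides the same seeding strategy. The function names and the toy data are assumptions.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """K-Means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance from the nearest chosen centroid."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        diffs = X[:, None, :] - np.array(centroids)[None]
        d2 = np.min((diffs ** 2).sum(-1), axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

def kmeans(X, k=4, iters=50, seed=0):
    """Lloyd's algorithm; returns labels, centroids, and the objective value."""
    rng = np.random.default_rng(seed)
    mu = kmeans_pp_init(X, k, rng)
    for _ in range(iters):
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # Keep the old centroid if a cluster happens to be empty.
        new_mu = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
    labels = d2.argmin(axis=1)
    # Objective: sum of squared distances from each point to its centroid.
    inertia = float(d2[np.arange(len(X)), labels].sum())
    return labels, mu, inertia

# Toy data: four tight, well-separated blobs of 50 points each.
rng = np.random.default_rng(42)
centers = np.array([[0, 0], [5, 0], [0, 5], [5, 5]], dtype=float)
X = np.concatenate([c + 0.1 * rng.standard_normal((50, 2)) for c in centers])
labels, mu, inertia = kmeans(X, k=4)
```

Spreading the initial centroids apart is what makes the result less sensitive to the random seed than uniform initialization would be.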
$$ \%\text{Stress} = 100 \cdot \frac{\sum \mathbf{1}[c(x)=\text{stress}]}{\text{pixels}} $$
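Quantifying stressed area as the share of AOI pixels assigned to the stress cluster, where $$c(x)$$ is the cluster label of pixel $$x$$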
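import numpy as np
# data: (H, W, bands) reflectance array; band positions used here: 0=red, 1=green, 4=NIR, 5=SWIR1
nir, red, green, swir1 = data[:, :, 4], data[:, :, 0], data[:, :, 1], data[:, :, 5]
# 1e-6 guards against division by zero on dark or masked pixels
ndvi = (nir - red) / (nir + red + 1e-6)
ndwi = (green - nir) / (green + nir + 1e-6)
moisture = (nir - swir1) / (nir + swir1 + 1e-6)
# Stack the three indices with up to seven raw bands along the channel axis
feature_stack = np.stack([ndvi, ndwi, moisture, *[data[:, :, i] for i in range(min(7, data.shape[-1]))]], axis=-1)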
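SimSiam consumes pairs of augmented views of the same tile, so the stacked raster has to be tiled and augmented first. The sketch below shows one possible way to do that with NumPy, assuming an `(H, W, C)` feature stack, a hypothetical 32-pixel tile size, and flip/rotation augmentations; the source does not specify the tile size or the augmentation set.

```python
import numpy as np

def extract_tiles(stack, tile=32):
    """Cut an (H, W, C) raster into non-overlapping (tile, tile, C) patches,
    dropping partial tiles at the right and bottom edges."""
    h, w, c = stack.shape
    rows, cols = h // tile, w // tile
    trimmed = stack[:rows * tile, :cols * tile]
    tiles = trimmed.reshape(rows, tile, cols, tile, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, tile, tile, c)

def random_view(tile, rng):
    """Apply a random 90-degree rotation and optional flips (geometry only,
    so band values and index relationships are left untouched)."""
    view = np.rot90(tile, k=int(rng.integers(4)), axes=(0, 1))
    if rng.random() < 0.5:
        view = view[:, ::-1]
    if rng.random() < 0.5:
        view = view[::-1, :]
    return np.ascontiguousarray(view)

rng = np.random.default_rng(0)
# Synthetic stand-in for the real stack: 3 indices + 7 bands = 10 channels.
feature_stack = rng.random((100, 70, 10)).astype(np.float32)
tiles = extract_tiles(feature_stack, tile=32)
x1 = np.stack([random_view(t, rng) for t in tiles])
x2 = np.stack([random_view(t, rng) for t in tiles])
```

Before feeding these views to a PyTorch model, the channel axis would typically be moved to position 1, for example with `x1.transpose(0, 3, 1, 2)`.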
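import torch.nn.functional as F
# Two augmented views x1, x2 pass through the shared backbone and projector
z1 = projector(backbone(x1))
z2 = projector(backbone(x2))
p1, p2 = predictor(z1), predictor(z2)
# Symmetric negative cosine similarity; detach() is the stop-gradient on the target branch
loss = -0.5 * (F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
               + F.cosine_similarity(p2, z1.detach(), dim=-1).mean())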
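The snippet above assumes `backbone`, `projector`, and `predictor` already exist. A minimal PyTorch sketch of how they might be wired is shown below. The small convolutional backbone, the layer widths, and the 10-channel input are placeholders chosen for illustration; only the 512-dim embedding size comes from the description. The original SimSiam paper uses a ResNet backbone with deeper MLP heads.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimSiam(nn.Module):
    def __init__(self, in_channels=10, dim=512, pred_hidden=128):
        super().__init__()
        # Placeholder backbone: two conv layers followed by global average pooling.
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.projector = nn.Sequential(
            nn.Linear(64, dim), nn.BatchNorm1d(dim), nn.ReLU(),
            nn.Linear(dim, dim), nn.BatchNorm1d(dim),
        )
        # Bottleneck predictor, applied to the online branch only.
        self.predictor = nn.Sequential(
            nn.Linear(dim, pred_hidden), nn.BatchNorm1d(pred_hidden), nn.ReLU(),
            nn.Linear(pred_hidden, dim),
        )

    def forward(self, x1, x2):
        z1 = self.projector(self.backbone(x1))
        z2 = self.projector(self.backbone(x2))
        p1, p2 = self.predictor(z1), self.predictor(z2)
        # detach() implements the stop-gradient on the target branch.
        return -0.5 * (F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
                       + F.cosine_similarity(p2, z1.detach(), dim=-1).mean())

torch.manual_seed(0)
model = SimSiam()
x1 = torch.randn(8, 10, 32, 32)  # random stand-ins for two views of 8 tiles
x2 = torch.randn(8, 10, 32, 32)
loss = model(x1, x2)
loss.backward()
```

After training, the 512-dim output of `projector(backbone(x))` (or the backbone features alone) would be the embedding passed to clustering; which of the two the project uses is not stated in the source.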
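# Mean NDVI per cluster; the lowest-vigor cluster is taken as the stress cluster
means = [ndvi[cluster_map == k].mean() for k in range(4)]
stress_idx = int(np.argmin(means))
# Binary mask: 1 where the pixel belongs to the stress cluster
stress_map = (cluster_map == stress_idx).astype(np.uint8)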
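Given the binary `stress_map`, the stressed share and an approximate area follow directly. The sketch below assumes square 30 m pixels (0.09 ha each) and an optional validity mask for nodata pixels; both the `valid` mask and the function name `stress_summary` are illustrative assumptions.

```python
import numpy as np

def stress_summary(stress_map, valid=None, pixel_size_m=30.0):
    """Return stressed share (%) and stressed area (ha) over valid pixels."""
    if valid is None:
        valid = np.ones(stress_map.shape, dtype=bool)
    n_valid = int(valid.sum())
    n_stress = int((stress_map.astype(bool) & valid).sum())
    pct = 100.0 * n_stress / n_valid if n_valid else float("nan")
    # One pixel covers pixel_size_m^2 square metres; 10,000 m^2 = 1 ha.
    hectares = n_stress * (pixel_size_m ** 2) / 10_000.0
    return pct, hectares

# Synthetic example: 8 stressed pixels out of 100.
stress_map = np.zeros((10, 10), dtype=np.uint8)
stress_map[:2, :4] = 1
pct, hectares = stress_summary(stress_map)
```

Reporting hectares next to the percentage makes the figure easier to act on, since the same share means very different areas in AOIs of different size.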
8.27% stressed area over the Jessore AOI with four clusters automatically identified from unlabeled scenes.
Stressed cluster shows $$ \mu_{\text{NDVI}}=0.42 $$, $$ \sigma_{\text{NDVI}}=0.08 $$, and $$ \mu_{\text{NDWI}}=0.22 $$, consistent with lower vigor and water content.
30 m processing scale for the demo's Earth Engine constraints while retaining Sentinel‑2 spectral bands for reliable differentiation.
Confidence is high: patterns are consistent across independent indices and robust under alternative tilings/augmentations.
Representations learned from augmented tiles transfer to unseen parcels, improving separability of subtle stress states with minimal compute.
Index features (NDVI, NDWI, moisture) inject plant physiology priors that stabilize clustering under illumination and soil‑background changes.
Selecting the stress cluster by minimum mean NDVI yields an interpretable, auditable stress selection criterion.
Earth Engine data pull → PyTorch SimSiam training → K‑Means clustering → vector/raster export with minimal manual supervision.
Optical reliance requires date windows with <15% cloud and comparable phenological stages for fair clustering.
Crop rotations, varietal differences, and soil backgrounds can shift spectra; periodic re‑tiling and retraining mitigate drift.
30 m demo scale reduces I/O cost but limits smallest detectable patches; production can revert to 10–20 m when quotas allow.
Monthly embeddings and delta‑clustering to measure stress trajectories and intervention effects.
Calibrate stress percent against harvest outcomes to estimate expected tonnes at risk per block.
Push notifications when stress share exceeds 10% or when stressed clusters expand week‑over‑week.
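The alerting rule could look something like the following sketch. It takes a chronological series of weekly stress shares (in percent) and flags either condition. The 10% threshold comes from the text, while the function name, the return shape, and the choice to compare only the two most recent weeks are assumptions.

```python
from typing import List, Sequence

def stress_alerts(weekly_pct: Sequence[float], threshold: float = 10.0) -> List[str]:
    """Return alert reasons for the most recent week, or an empty list."""
    if not weekly_pct:
        return []
    reasons = []
    latest = weekly_pct[-1]
    if latest > threshold:
        reasons.append(f"stress share {latest:.2f}% exceeds {threshold:.0f}%")
    # Expansion check: compare the latest week with the one before it.
    if len(weekly_pct) >= 2 and latest > weekly_pct[-2]:
        reasons.append(
            f"stress share grew week-over-week ({weekly_pct[-2]:.2f}% -> {latest:.2f}%)"
        )
    return reasons

# Hypothetical series: the 7.90 value is made up for illustration.
alerts = stress_alerts([7.90, 8.27])
```

In practice a small tolerance on the week-over-week comparison would likely be needed so that noise between acquisitions does not trigger alerts on its own.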