HAWKEYE Disease Intelligence - Technical Report

Executive Summary

Early‑Warning Intelligence

HAWKEYE Disease Intelligence fuses epidemiology, weather, demographics, and satellite context to forecast dengue in Dhaka with a 7‑day MAPE of 9.8%. Nightlights radiance adds a strong urban‑activity ("urban metabolism") signal, and the forecasts support policy actions up to two weeks in advance.

The system integrates 30+ datasets with rigorous preprocessing and delivers nowcasts, risk maps, and intervention windows as API outputs for health operations centers.

7‑Day MAPE: 9.8%

Fused Features: 36

Temporal Records: 1,105

Missing Data: 0.16%

Project Scope

Operational Scale

Provide city‑scale dengue incidence forecasts and risk alerts with weekly cadence, operationalized for Dhaka (2022–2025), and designed for replication across districts through standardized data adapters and harmonized feature schemas.

The deployment pairs statistical forecasting with causal signal analysis to keep models interpretable and resilient to shifting data quality across municipal sources.

Intention & Objectives

Mission Critical

Deliver reliable 7‑day dengue forecasts and risk tiers that drive resource allocation, vector control scheduling, and clinical surge planning.
Quantify leading indicators (temperature, rainfall, humidity, radiance) and produce defensible explanations for public‑health briefings and after‑action reviews.

Data Sources

Multi-Modal Fusion

Health Surveillance

Weekly dengue case counts and population‑normalized rates compiled for Dhaka wards and aggregates.

Environmental

Daily temperature, humidity, rainfall, and pressure used as lagged regressors for vector ecology dynamics.

Geospatial

VIIRS nightlights radiance as an urban activity proxy and mobility surrogate; satellite basemaps for optional cartographic context.

Administrative & Demographics

Ward boundaries and population density layers for normalization and targeting.

Data Coverage & Quality

1,105 temporal records, 36 fused features, 0.16% missing data after harmonization.

Target Variables

Output Specifications

Targets: Weekly dengue cases and incidence per 100k population; the rate normalizes for population size so that wards with different demographics remain comparable.
Predictors: Lagged weather covariates, nightlights aggregates, month/seasonality flags, and holiday/behavioral proxies where available.
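
The incidence target can be illustrated with a minimal sketch. The function name and the ward populations below are hypothetical and are not taken from the HAWKEYE codebase.

```python
def incidence_per_100k(cases: float, population: float) -> float:
    """Convert a raw weekly case count into cases per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population

# Hypothetical wards: identical case counts, very different populations.
ward_a = incidence_per_100k(cases=120, population=400_000)
ward_b = incidence_per_100k(cases=120, population=1_200_000)
```

With equal raw counts, the smaller ward shows three times the incidence of the larger one, which is the difference the normalized target is meant to expose.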

Methodology

Pipeline Architecture

Ingestion and QA: Standardize sources, fix calendar alignments, forward‑fill short gaps under strict caps, and verify monotonic ward aggregations for totals.
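
One way to forward‑fill short gaps under a strict cap is sketched below with pandas. The helper `ffill_short_gaps`, the cap of two days, and the sample temperatures are assumptions made for illustration; the report does not state the actual cap.

```python
import numpy as np
import pandas as pd

MAX_GAP = 2  # hypothetical cap: fill gaps of at most 2 consecutive days

def ffill_short_gaps(s: pd.Series, max_gap: int) -> pd.Series:
    """Forward-fill only NaN runs no longer than max_gap; leave longer runs untouched."""
    flag = s.isna().astype(int)
    # A new run starts wherever the NaN flag changes value.
    run_id = flag.diff().ne(0).cumsum()
    # Number of NaNs in each run (0 for runs of observed values).
    run_len = flag.groupby(run_id).transform("sum")
    fillable = (flag == 1) & (run_len <= max_gap)
    return s.where(~fillable, s.ffill())

# Synthetic daily temperatures with one 2-day gap and one 3-day gap.
idx = pd.date_range("2023-07-01", periods=8, freq="D")
temp = pd.Series([30.1, np.nan, np.nan, 31.0, np.nan, np.nan, np.nan, 29.5], index=idx)
filled = ffill_short_gaps(temp, MAX_GAP)
```

A plain `Series.ffill(limit=2)` would partially fill the three‑day gap; the run‑length check keeps long gaps fully missing so they can be flagged rather than silently patched.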

Feature engineering: Create 7–28‑day lags for temperature, humidity, and rainfall, plus rolling means and anomalies; aggregate nightlights monthly and align to epidemiological weeks.
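
The lag, rolling‑mean, and anomaly features described above could be built roughly as follows. This is a sketch on synthetic data: the column names, the 28‑day baseline used to define an anomaly, and the choice to shift rolling features by the forecast horizon are assumptions, not documented HAWKEYE settings.

```python
import numpy as np
import pandas as pd

HORIZON = 7  # forecast horizon in days, from the report

# Synthetic daily temperatures stand in for the real weather feed.
idx = pd.date_range("2023-01-01", periods=60, freq="D")
rng = np.random.default_rng(0)
weather = pd.DataFrame({"temp": 28 + rng.normal(0, 1.5, size=60)}, index=idx)

feats = pd.DataFrame(index=weather.index)

# Lags from 7 to 28 days in weekly steps.
for lag in (7, 14, 21, 28):
    feats[f"temp_lag_{lag}"] = weather["temp"].shift(lag)

# Trailing 7-day mean, shifted by the horizon so it only uses data
# that would already be available when the forecast is issued.
roll7 = weather["temp"].rolling(7).mean()
feats["temp_roll7"] = roll7.shift(HORIZON)

# Anomaly (assumed definition): trailing 7-day mean minus a trailing 28-day baseline.
baseline = weather["temp"].rolling(28).mean()
feats["temp_anom"] = (roll7 - baseline).shift(HORIZON)
```

The same pattern applies to humidity and rainfall; the leading rows stay NaN until enough history has accumulated and need to be dropped before model fitting.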

Forecasting layer: Prophet‑based univariate core with exogenous regressors where supported, backed by gradient‑boosted tabular models for sensitivity and ablation analysis.

Key Signals & Lags

Leading Indicators

Key statistical relationships guiding the forecasting system:

Temperature→Dengue

Pearson correlation with 14-day lag:

$$ \text{Corr}\big(T_{t-14},\,\text{Dengue}_{t}\big)=0.324 $$

Socio‑Economic Modulation

Correlation between nightlights radiance and GDP, used to inform contact patterns:

$$ r=0.88 $$

Data Sufficiency

36 features with low missingness enable robust analysis:

$$ N=1105 \text{ records, } 0.16\% \text{ missing data} $$

Modeling Framework

Prophet + Gradient Boosting

Forecast core: Prophet with weekly seasonality, holiday effects where available, and tuned changepoint prior; exogenous covariates engineered as lags and anomalies.

Benchmark Ensemble

Gradient boosting (XGBoost/LightGBM)
Tabular regressions on lagged features
Capture nonlinearities

Early‑Warning Horizon

Policy window up to 14 days to stage spraying, larval source reduction, and ward staffing changes.

Core Mathematics

Key Formulas

Mean Absolute Percentage Error

$$ \text{MAPE} = \frac{100}{N}\sum_{t=1}^{N}\left|\frac{y_{t}-\hat{y}_{t}}{y_{t}}\right| $$

Primary evaluation at 7-day horizon
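
A direct transcription of the MAPE formula, as a minimal sketch. The sample values are invented, and rejecting zero actuals is an assumption: the formula is undefined when $$ y_{t}=0 $$, and the report does not say how such weeks are handled.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((y - yhat) / y) for y, yhat in zip(actual, predicted))
    return 100.0 / len(actual) * total

# Invented weekly counts: absolute errors of 10%, 10%, and 0%.
actual = [100, 200, 400]
predicted = [110, 180, 400]
score = mape(actual, predicted)
```

Because each error is divided by the observed count, low‑incidence weeks weigh heavily on the score, which matters when evaluating off‑season periods.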

Lagged Correlation Check

$$ \text{Corr}\big(T_{t-14},\,\text{Dengue}_{t}\big)=0.324 $$

Validating thermal lead against cases
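
The lagged correlation can be computed as sketched below. The data are synthetic, constructed so that the target depends exactly on the driver three steps earlier; the helper names are hypothetical.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def lagged_corr(driver, target, lag):
    """Corr(driver[t - lag], target[t]) over the overlapping window."""
    if not 0 < lag < len(driver):
        raise ValueError("lag must be between 1 and len(driver) - 1")
    return pearson(driver[:-lag], target[lag:])

# Synthetic check: target[t] = 2 * driver[t - 3] + 1 for t >= 3.
rng = random.Random(42)
driver = [rng.gauss(28.0, 1.5) for _ in range(200)]
target = [0.0, 0.0, 0.0] + [2 * d + 1 for d in driver[:-3]]

r_true_lag = lagged_corr(driver, target, 3)
r_wrong_lag = lagged_corr(driver, target, 1)
```

Scanning `lag` over a range and picking the maximum is one way to locate a lead time such as the 14‑day thermal lag reported here.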

Growth Rate Week‑Over‑Week

$$ g_{t} = \frac{y_{t}-y_{t-1}}{y_{t-1}} $$

Used for surge detection and thresholded alerting
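
Thresholded alerting on the growth rate might look like the following sketch. The 25% threshold and the sample counts are hypothetical; the report does not publish its alert thresholds.

```python
SURGE_THRESHOLD = 0.25  # hypothetical: alert on more than 25% week-over-week growth

def growth_rates(cases):
    """Return g_t = (y_t - y_{t-1}) / y_{t-1} for t >= 1; None where y_{t-1} is 0."""
    rates = []
    for prev, curr in zip(cases, cases[1:]):
        rates.append(None if prev == 0 else (curr - prev) / prev)
    return rates

def surge_flags(cases, threshold=SURGE_THRESHOLD):
    """Flag each week whose growth rate exceeds the threshold."""
    return [g is not None and g > threshold for g in growth_rates(cases)]

weekly_cases = [100, 110, 150, 150]
rates = growth_rates(weekly_cases)
flags = surge_flags(weekly_cases)
```

Weeks following a zero count return `None` rather than an infinite rate, so they never trigger an alert on their own.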

Implementation

Code Snippets

Lag Feature Construction

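import pandas as pd

# Assumes df holds one row per day with a datetime "date" column, so shift(14) means 14 days.
df["temp_lag_14"] = df["temp"].shift(14)
df["hum_lag_14"]  = df["humidity"].shift(14)
df["rain_lag_14"] = df["rain"].shift(14)
# Aggregate nightlights to monthly means, then carry each month's value forward onto df's dates.
nl_monthly = nightlights.resample("MS").mean()
df["nl_month"] = nl_monthly.reindex(df["date"], method="ffill").to_numpy()
features = ["temp_lag_14","hum_lag_14","rain_lag_14","nl_month"]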

Prophet with Weekly Seasonality

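from prophet import Prophet

# Prophet rejects NaN regressors, so drop the warm-up rows left empty by the lags.
train = df.dropna(subset=features).rename(columns={"cases":"y","date":"ds"})
m = Prophet(weekly_seasonality=True, yearly_seasonality=True)
for col in features:
    m.add_regressor(col, standardize=True)
m.fit(train)
# df_future needs a "ds" column plus every regressor in features for the next 7 days.
forecast = m.predict(df_future)  # 7-day horizon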

Gradient Boosting Benchmark

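import xgboost as xgb

# Train only on rows where every lagged feature and the target are present.
train = df.dropna(subset=features + ["cases"])
X, y = train[features], train["cases"]
model = xgb.XGBRegressor(max_depth=6, n_estimators=400, learning_rate=0.05)
model.fit(X, y)
# Features are already lagged, so the same unshifted columns are used at prediction time.
yhat = model.predict(df_future[features])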

Performance Metrics

Results Obtained

Forecast Accuracy

7‑day MAPE $$=9.8\%$$ on Dhaka, meeting decision‑grade thresholds for weekly operational planning.

Leading Indicators

Statistically significant temperature→dengue correlation at a 14‑day lag: $$ r=0.324 $$ with $$ N=1105 $$ and $$ p\ll 0.001 $$.
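
The significance claim can be sanity‑checked with the standard t‑statistic for a Pearson correlation. This sketch assumes independent observations; daily epidemiological series are typically autocorrelated, so the effective sample size may be smaller than N.

```python
from math import sqrt

def corr_t_statistic(r: float, n: int) -> float:
    """t-statistic for testing a Pearson correlation against zero (n - 2 degrees of freedom)."""
    return r * sqrt((n - 2) / (1 - r * r))

# Values reported above: r = 0.324 at a 14-day lag, N = 1105 records.
t_stat = corr_t_statistic(0.324, 1105)  # about 11.4
```

A t‑statistic above 11 on more than a thousand degrees of freedom lies far beyond the critical value of roughly 3.3 for a two‑sided p of 0.001.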

Data Integrity

Fused urban dataset exhibits 0.16% missingness, enabling stable training and fewer imputations.

Feature Sufficiency

36 features with low missingness enable robust cross‑validation and ablation checks.
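
For time‑ordered data, cross‑validation is usually done with rolling‑origin splits so that training data always precedes the test window. The generator below is a sketch of that scheme; the initial training length of 730 records is a hypothetical choice, and the report does not describe its actual validation protocol.

```python
def rolling_origin_splits(n_obs, initial_train, horizon, step):
    """Yield (train_range, test_range) pairs with an expanding training window."""
    start = initial_train
    while start + horizon <= n_obs:
        yield range(0, start), range(start, start + horizon)
        start += step

# 1,105 records and a 7-day horizon come from the report; 730 is assumed.
splits = list(rolling_origin_splits(n_obs=1105, initial_train=730, horizon=7, step=7))
first_train, first_test = splits[0]
```

Each fold's 7‑day test window can be scored with MAPE, and ablation runs can reuse the same folds with feature groups removed.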

Strategic Impact

Decision Relevance

A sub‑10% 7‑day MAPE allows ward‑level resource staging, outpatient staffing changes, and targeted vector control before peak incidence weeks.
The 14‑day thermal lag is consistent with a causal pathway; it supports public guidance on heat/rainfall vigilance windows and aligns with entomological cycles for intervention timing.
Nightlights‑informed context connects activity intensity with exposure risk, guiding surveillance routing toward likely transmission corridors.

Technical Foundation

What Makes It Work

Harmonized Lags & Anomalies

Convert noisy weather streams into biologically meaningful regressors while preserving interpretability.

Dual‑Track Modeling

Prophet and boosted trees together provide both transparent baselines and non‑linear capture for robustness under changing behaviors.

Data Integrity

Low missingness and consistent ward delineations reduce label noise and improve week‑ahead stability.

Deployment

Operationalization & Outputs

Deliverables

Weekly nowcasts with 80/95% intervals
Ward risk tiers
Surge alerts
Feature attributions for briefing decks
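
Ward risk tiers can be derived from forecast incidence with fixed cut points, as in this sketch. The cutoffs and tier labels are hypothetical placeholders; the report does not specify how tiers are defined.

```python
from bisect import bisect_right

# Hypothetical cut points in forecast cases per 100k population.
TIER_CUTOFFS = (10.0, 25.0, 50.0)
TIER_LABELS = ("low", "moderate", "high", "severe")

def risk_tier(incidence_per_100k: float) -> str:
    """Map a forecast incidence to a tier; a value equal to a cutoff moves up a tier."""
    return TIER_LABELS[bisect_right(TIER_CUTOFFS, incidence_per_100k)]

# Hypothetical ward forecasts.
ward_forecasts = {"ward_01": 5.0, "ward_02": 10.0, "ward_03": 60.0}
ward_tiers = {ward: risk_tier(value) for ward, value in ward_forecasts.items()}
```

Keeping the cut points in one tuple makes them easy to recalibrate per city without touching the mapping logic.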

Integration

A batch ETL job refreshes sources weekly and publishes forecasts and risk tiers to the Urban Intelligence API and Vercel dashboards.

Constraints

Limitations & Risks

Reporting Artifacts

Case backlog dumps and testing policy shifts can transiently break stationarity; outlier detection and robust losses mitigate but do not remove risk.

Exogenous Shocks

Mobility restrictions or blackouts alter transmission dynamics; models require rapid recalibration using recent windows.

Transferability

Elasticities and optimal lags vary by microclimate; city‑specific validation is required when scaling nationally.

Roadmap

Future Scope

Entomological Integration

Fuse mosquito trap counts and vector suitability indices for mechanistic hybrid models.

Spatial Diffusion

Graph‑based spillover modeling between wards using mobility or road‑network priors.

Active Monitoring

Alerting when thermal/rainfall precursors cross city‑specific thresholds to trigger pre‑emptive operations.