Quantifying sample size sensitivity in machine learning models for digital mapping of soil organic matter: implications for MRV protocols

- 336914
Posters
Abstract

Tracking soil organic matter (SOM) is essential to Monitoring, Reporting, and Verification (MRV) protocols in agricultural ecosystems. However, balancing sampling costs against fit-for-purpose modeling remains a critical bottleneck for scalable digital soil mapping (DSM). This study evaluates the sensitivity of the Random Forest (RF) algorithm to training sample density when predicting SOM (0–30 cm) across a 52-ha agricultural area in Chile. The methodology integrated georeferenced SOM data with multi-source geoenvironmental covariates at 30 m resolution. Predictors included terrain attributes (elevation, slope, curvatures), Sentinel-1/2 radar and multispectral bands, and ALOS PALSAR data. To quantify model sensitivity, multiple sample size scenarios (n) were generated using conditioned Latin Hypercube Sampling (cLHS) within stratified principal components (PC1–PC5). RF models were optimized via grid search and validated against an independent dataset using RMSE, R², MAPE, and RPIQ. K-means clustering (k = 3) was applied to the performance metrics to group the scenarios into Low, Medium, and High performance tiers. Results indicated a clear performance plateau within the "Medium" cluster at an average of 26 samples, equivalent to 1.98 ha/sample. High-performance stability was achieved at approximately 1.7 ha/sample (~30 samples). With fewer samples than these thresholds, model error increased significantly and explained variance (R²) was low. These findings indicate that intensifying sampling beyond roughly one sample per 2 ha yields diminishing returns in predictive accuracy for this landscape. This study provides a data-driven framework for optimizing soil sampling designs, ensuring robust SOM predictions while minimizing operational costs. Furthermore, integrating regional datasets through similarity-based weighting can enhance local model performance, reducing the need for intensive primary and supplementary sampling.
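The four validation metrics and the ha/sample conversion named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the use of the standard library, and the quartile convention behind RPIQ (linear interpolation, the "inclusive" method) are assumptions made for illustration only, and the input values in the usage note are invented.

```python
import math
import statistics


def validation_metrics(observed, predicted):
    """Return RMSE, R2, MAPE (%) and RPIQ for paired observed/predicted values.

    Assumption: RPIQ is taken as IQR(observed) / RMSE, with quartiles computed
    by linear interpolation (statistics.quantiles, method="inclusive").
    """
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]

    # Root mean squared error
    sse = sum(r * r for r in residuals)
    rmse = math.sqrt(sse / n)

    # Coefficient of determination relative to the mean of the observations
    mean_obs = statistics.fmean(observed)
    sst = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - sse / sst

    # Mean absolute percentage error (observed values must be non-zero)
    mape = 100.0 * sum(abs(r / o) for r, o in zip(residuals, observed)) / n

    # Ratio of performance to interquartile range
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    rpiq = (q3 - q1) / rmse

    return {"rmse": rmse, "r2": r2, "mape": mape, "rpiq": rpiq}


def hectares_per_sample(area_ha, n_samples):
    """Convert a sample count into the ha/sample figure used in the abstract."""
    return area_ha / n_samples
```

For example, `hectares_per_sample(52, 30)` gives about 1.73, in line with the reported ~1.7 ha/sample, and `hectares_per_sample(52, 26)` gives 2.0, close to the reported 1.98 ha/sample (the reported figure presumably reflects a non-integer cluster average). Calling `validation_metrics` on an independent validation set for each sample size scenario would yield the per-scenario metric table that a clustering step could then group into tiers.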

Institutions
  • 1 ESALQ/USP
  • 2 Neutral Farming
  • 3 Centro de Investigación y Desarrollo Agrícola
Track
  • Monitoring, Reporting and Verification (MRV) protocols
Keywords
  • Pedometrics
  • cLHS
  • Precision agriculture
  • Carbon sequestration
  • Fit-for-purpose