MACHINE LEARNING SENSITIVITY TO RANDOM STATE LEARNING DATABASE SPLITTING: CASE STUDY OF PLOWED AGRICULTURAL LANDS ORGANIC CARBON PREDICTION WITH SENTINEL-2 IMAGES IN THE ALTIPLANO REGION

- 320305
Oral
Abstract

This study assesses the sensitivity of the Random Forest (RF) machine learning (ML) model to the randomization induced by training/testing dataset splitting when mapping soil organic carbon (SOC) content in plowed agricultural plots. The analysis combines Sentinel-2 (S2) imagery with topographic (T) information derived from the SRTM Digital Elevation Model. A dataset comprising SOC measurements from 253 soil samples of plowed lands in the Altiplano region, paired with the corresponding S2 and T data, was used to train the RF regression model across 500 distinct training/testing splits, each generated with a different random state setting. To reduce multicollinearity and identify the most influential features, Recursive Feature Elimination with 10-fold cross-validation (RFEcv 10-fold) and variance inflation factor (VIF) analyses were performed. The accuracy of the SOC predictions varied substantially across splits in both R² and RMSE, which is attributed to the imbalance inherent in randomized training/testing partitioning.
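
The repeated-split experiment described above can be illustrated with a minimal sketch. It is not the authors' pipeline: to stay within the Python standard library it swaps the RF regressor for a one-feature least-squares line, and it uses synthetic data rather than the SOC, S2, and T measurements. The function names, the 30% test fraction, and the synthetic data generator are all assumptions made for illustration; only the sample size (253) and the number of splits (500) come from the abstract.

```python
import random
import statistics


def make_synthetic(n=253, seed=0):
    """Hypothetical data: one predictor and a noisy target (not real SOC data)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in for RF)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def evaluate_split(xs, ys, random_state, test_fraction=0.3):
    """Shuffle with the given seed, split, fit on train, score on test."""
    rng = random.Random(random_state)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n_test = round(test_fraction * len(idx))  # assumed 70/30 split
    test, train = idx[:n_test], idx[n_test:]

    slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
    y_true = [ys[i] for i in test]
    y_pred = [slope * xs[i] + intercept for i in test]

    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_true = statistics.fmean(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = (ss_res / len(y_true)) ** 0.5
    return r2, rmse


def split_sensitivity(n_splits=500):
    """Repeat the split with seeds 0..n_splits-1 and summarize the metric spread."""
    xs, ys = make_synthetic()
    scores = [evaluate_split(xs, ys, rs) for rs in range(n_splits)]
    r2s = [r2 for r2, _ in scores]
    rmses = [rmse for _, rmse in scores]
    return {
        "n_splits": n_splits,
        "r2_min": min(r2s),
        "r2_mean": statistics.fmean(r2s),
        "r2_max": max(r2s),
        "rmse_min": min(rmses),
        "rmse_mean": statistics.fmean(rmses),
        "rmse_max": max(rmses),
    }


if __name__ == "__main__":
    for name, value in split_sensitivity().items():
        print(f"{name}: {value}")
```

The data and the model are identical in every run, so any spread between the minimum and maximum R² or RMSE comes from the split seed alone. This is the effect the study quantifies for RF.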

Institutions
  • 1 National Agrarian University
Track
  • 14. Artificial intelligence for earth observation
Keywords
Soil organic carbon mapping
plowed land
machine learning
Sentinel-2
splitting