This study assesses the sensitivity of the Random Forest (RF) machine learning (ML) model to the randomization induced by training/testing dataset splitting when mapping soil organic carbon (SOC) content in plowed agricultural plots. The analysis combines Sentinel-2 (S2) imagery with topographic (T) information derived from the SRTM Digital Elevation Model. A dataset comprising SOC measurements from 253 soil samples of plowed lands in the Altiplano region, paired with the corresponding S2 and T data, was used to train the RF regression model across 500 distinct training/testing splits, each generated with a different random state value. To reduce multicollinearity and identify the most influential features, Recursive Feature Elimination with 10-fold cross-validation (RFEcv 10-fold) and variance inflation factor (VIF) analyses were performed. SOC predictions displayed substantial variability in R² and RMSE, which is attributed to the imbalance inherent in randomized training/testing partitioning.
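The repeated-split experiment described above can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes scikit-learn, uses synthetic data from `make_regression` as a stand-in for the 253 SOC samples, runs 20 splits instead of 500 to keep it quick, and assumes a 30% test fraction and a fixed forest seed, none of which are stated in the abstract. The helper name `split_sensitivity` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def split_sensitivity(X, y, n_splits=20, test_size=0.3, n_estimators=50):
    """Refit an RF regressor on n_splits random train/test partitions.

    Returns arrays of test-set R2 and RMSE, one entry per split.
    """
    r2_values, rmse_values = [], []
    for seed in range(n_splits):
        # Only the partition changes with the seed. The forest's own seed
        # is held fixed (an assumption) so the spread reflects the split.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        model = RandomForestRegressor(n_estimators=n_estimators, random_state=0)
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        r2_values.append(r2_score(y_te, pred))
        rmse_values.append(np.sqrt(mean_squared_error(y_te, pred)))
    return np.array(r2_values), np.array(rmse_values)


# Synthetic stand-in with 253 samples; these are not the study's data.
X, y = make_regression(
    n_samples=253, n_features=8, n_informative=4, noise=20.0, random_state=42
)
r2_values, rmse_values = split_sensitivity(X, y)

# Summarize how much the metrics move when only the split changes.
print(f"R2   min={r2_values.min():.3f}  max={r2_values.max():.3f}")
print(f"RMSE min={rmse_values.min():.3f}  max={rmse_values.max():.3f}")
```

Reporting the range or distribution of R² and RMSE over all splits, rather than the score from a single split, is what exposes the variability the abstract refers to.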