MACHINE LEARNING SENSITIVITY TO RANDOM STATE LEARNING DATABASE SPLITTING: CASE STUDY OF PLOWED AGRICULTURAL LANDS ORGANIC CARBON PREDICTION WITH SENTINEL-2 IMAGES IN THE ALTIPLANO REGION

- 320305
Oral
Abstract

This study assesses the sensitivity of the Random Forest (RF) machine learning (ML) model to the randomization induced by training/testing dataset splitting when mapping soil organic carbon (SOC) content in plowed agricultural plots. The analysis combines Sentinel-2 (S2) imagery with topographic (T) information derived from the SRTM Digital Elevation Model. A dataset of SOC measurements from 253 soil samples of plowed lands in the Altiplano region, paired with the corresponding S2 and T data, was used to train the RF regression model across 500 distinct training/testing splits, each generated with a different random state setting. To reduce multicollinearity and identify the most influential features, Recursive Feature Elimination with 10-fold cross-validation (RFEcv 10-fold) and variance inflation factor (VIF) analyses were performed. SOC predictions displayed substantial variability in R² and RMSE across splits, which is attributed to the inherent imbalance in the randomized training/testing partitioning.
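The repeated-split experiment described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic stand-ins for the S2 and T features, and the function name `split_sensitivity`, the 70/30 split ratio, the number of trees, and the reduced number of splits (30 instead of 500, to keep the run short) are all assumptions made for illustration. The RF seed is held fixed so that the only source of variation is the train/test partition.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def split_sensitivity(X, y, n_splits=30, test_size=0.3):
    """Fit one RF per random train/test split and collect R2 and RMSE.

    Hypothetical helper: n_splits and test_size are illustrative defaults,
    not values taken from the study.
    """
    r2_scores, rmse_scores = [], []
    for seed in range(n_splits):
        # Only the split changes from one iteration to the next.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        # Fixed model seed isolates the effect of the data partition.
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        r2_scores.append(r2_score(y_te, pred))
        rmse_scores.append(np.sqrt(mean_squared_error(y_te, pred)))
    return np.array(r2_scores), np.array(rmse_scores)


# Synthetic stand-in data: 253 samples (matching the study's sample count)
# and 8 made-up predictors; the target is an arbitrary noisy linear signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(253, 8))
y = 2.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=253)

r2, rmse = split_sensitivity(X, y)
print(f"R2   min={r2.min():.3f}  max={r2.max():.3f}  std={r2.std():.3f}")
print(f"RMSE min={rmse.min():.3f}  max={rmse.max():.3f}  std={rmse.std():.3f}")
```

The spread between the minimum and maximum scores gives a rough sense of how much a single reported R² or RMSE can depend on the chosen random state, even when the model and data are otherwise identical.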

Institutions
  • 1 National Agrarian University
Thematic Axis
  • 14. Artificial intelligence for Earth observation
Keywords
Soil organic carbon mapping
plowed land
machine learning
Sentinel-2
splitting