Limited data availability and high sampling costs pose significant challenges for soil carbon modeling. While Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) are commonly employed, this study benchmarks the effectiveness of Flow Matching for modeling complex, high-dimensional distributions of soil data. We introduce an Unconditional Flow Matching framework applied to the LUCAS soil dataset. Our methodology involves training models on the global dataset without class labels, generating synthetic data, and validating performance through statistical divergence measures and adversarial validation, in which a classifier is trained to distinguish real from synthetic samples. Preliminary results demonstrate the model's high fidelity. Single-variable distributions of key soil properties (e.g., pH, Organic Carbon) were reproduced almost indistinguishably from the real data, with a mean Adversarial Validation AUC of approximately 0.53 (where 0.50 represents perfect indistinguishability). In multivariate assessments, which challenge the model to capture complex inter-variable correlations across the entire soil population, the framework achieved a mean AUC of 0.77. These findings indicate that Flow Matching preserves marginal distributions closely and captures much of the global multivariate structure of the data. Ongoing research aims to investigate the learned latent manifold to identify which specific soil-feature correlations (e.g., organo-mineral interactions) contribute most to the remaining adversarial gap. By improving the model's capture of these nonlinear dependencies, we expect to improve the fidelity of the generated data. This framework offers a scalable approach to generating realistic soil data in regions where physical sampling remains economically prohibitive.
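The adversarial validation protocol mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which classifier or train/test split the authors used, so everything below is an assumption: the function names (`auc_from_scores`, `fit_logistic`, `adversarial_auc`) are hypothetical, the classifier is a plain logistic regression written with the standard library only, and the 70/30 split is arbitrary. A linear model detects only simple shifts between real and synthetic data; a study like this one would more plausibly use a nonlinear classifier such as gradient-boosted trees.

```python
import math
import random


def auc_from_scores(scores, labels):
    """Rank-based AUC (Mann-Whitney U); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def fit_logistic(X, y, epochs=200, lr=0.1):
    """Full-batch gradient descent for logistic regression (illustrative only)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def adversarial_auc(real, synthetic, seed=0, test_fraction=0.3):
    """Label real rows 0 and synthetic rows 1, train on one part of the
    shuffled data, and report the AUC on the held-out part.
    An AUC near 0.50 means the classifier cannot tell the two sets apart."""
    rows = [(x, 0) for x in real] + [(x, 1) for x in synthetic]
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    test, train = rows[:n_test], rows[n_test:]
    w, b = fit_logistic([x for x, _ in train], [y for _, y in train])
    scores = [b + sum(wj * xj for wj, xj in zip(w, x)) for x, _ in test]
    return auc_from_scores(scores, [y for _, y in test])
```

Under this reading, the reported single-variable result corresponds to calling `adversarial_auc` on one soil property at a time and averaging the scores, while the multivariate result uses all properties together as input columns. Features are assumed to be standardised before they are passed in.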