Machine learning classifier algorithms applied for sedimentary provenance based on detrital zircon ages

CARVALHO, Manuela; DORÉ, Bernardo; FONSECA, Paulo; VALERIANO, Claudio


- 305641

Oral presentation


Abstract

High-quality statistical analysis is required to properly interpret detrital zircon geochronological data, and machine-learning algorithms make such analysis readily achievable. In this study, five machine-learning classifier algorithms (Logistic Regression, K-Nearest Neighbors, Random Forest, Extra Trees, and Histogram-based Gradient Boosting) were therefore tested to estimate the provenance apportionment of detrital zircon U-Pb ages between two source areas of the Cenozoic Resende half-graben, southeast Brazil.

The detrital zircon U-Pb age dataset consists of near-concordant (within 10 %) syn-crystallization, inherited, and metamorphic zircon U-Pb ages referenced in Carvalho et al. (2023). The sample dataset comprises between 86 and 115 published detrital zircon U-Pb ages per sandstone sample. The source dataset is a compilation of 637 and 959 ages from the basement rocks of the faulted (a) and flexural (b) margins, respectively.
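The near-concordance filter (within 10 %) can be sketched as below. This is a minimal illustration, not the authors' procedure: the abstract does not state how discordance was computed, so the common definition (1 - t206Pb/238U / t207Pb/206Pb) * 100 is assumed, and the function names and example analyses are hypothetical.

```python
def discordance_pct(age_206_238, age_207_206):
    """Percent discordance, ASSUMING (1 - t(206Pb/238U) / t(207Pb/206Pb)) * 100."""
    return (1.0 - age_206_238 / age_207_206) * 100.0


def near_concordant(analyses, max_disc=10.0):
    """Keep (t206/238, t207/206) pairs, in Ma, whose |discordance| <= max_disc %."""
    return [
        (t68, t76)
        for t68, t76 in analyses
        if abs(discordance_pct(t68, t76)) <= max_disc
    ]


# Hypothetical analyses (Ma), for illustration only.
analyses = [(590.0, 600.0), (450.0, 610.0), (2050.0, 2100.0), (700.0, 640.0)]
kept = near_concordant(analyses)
print(kept)  # [(590.0, 600.0), (2050.0, 2100.0), (700.0, 640.0)]
```

The (450, 610) analysis is about 26 % discordant and is rejected; the (700, 640) analysis is reversely discordant by about 9 % and is kept because the absolute value is used.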

A training dataset was generated from the source database by data augmentation, resulting in a table of 3000 observations labeled with 6 apportionment classes. Each observation corresponds to a random selection of 100 zircon ages from the source database, drawn according to the proportions of its apportionment class. The classes range from 0 to 100 % of each source reservoir in 20 % increments (100a-0b, 80a-20b, 60a-40b, 40a-60b, 20a-80b, 0a-100b), and each class contains 500 different combinations of zircon ages. Ages were grouped and counted into 7 features corresponding to intervals of the geological time scale (Cenozoic, Mesozoic, Paleozoic, Neoproterozoic, Mesoproterozoic, Paleoproterozoic, and Archean).

Missing values in the sample dataset were filled with scikit-learn's SimpleImputer using the median strategy, so that all columns had the same number of observations (115). Features were standardized with a StandardScaler. A 70/30 % train-test split was applied. Each test was repeated 50 times, and the final results are the mode of the class predictions and the mean of the accuracy and error values.
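The data augmentation, time-scale binning, and standardization steps described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the interval boundaries are rounded values from the International Chronostratigraphic Chart, ages are assumed to be drawn without replacement within each source, and the helpers era_counts, augment and standardize, as well as the placeholder source ages, are hypothetical. The standardize helper mirrors the z-scoring done by scikit-learn's StandardScaler (population standard deviation, constant columns mapped to 0).

```python
import random
from bisect import bisect_right

# ASSUMED interval boundaries (Ma), rounded from the International Chronostratigraphic Chart.
ERA_NAMES = ["Cenozoic", "Mesozoic", "Paleozoic", "Neoproterozoic",
             "Mesoproterozoic", "Paleoproterozoic", "Archean"]
ERA_UPPER_BOUNDS = [66.0, 251.9, 538.8, 1000.0, 1600.0, 2500.0]  # >= 2500 Ma -> Archean

# Apportionment classes as (% from source a, % from source b).
CLASSES = [(100, 0), (80, 20), (60, 40), (40, 60), (20, 80), (0, 100)]


def era_counts(ages):
    """Count how many ages (Ma) fall in each of the 7 time-scale intervals."""
    counts = [0] * len(ERA_NAMES)
    for age in ages:
        counts[bisect_right(ERA_UPPER_BOUNDS, age)] += 1
    return counts


def augment(source_a, source_b, n_per_class=500, n_grains=100, seed=0):
    """Build synthetic mixtures: for each class, draw n_grains ages split by its
    a/b proportions (without replacement within each source), then bin them."""
    rng = random.Random(seed)
    features, labels = [], []
    for pct_a, pct_b in CLASSES:
        n_a = round(n_grains * pct_a / 100)
        n_b = n_grains - n_a
        for _ in range(n_per_class):
            ages = rng.sample(source_a, n_a) + rng.sample(source_b, n_b)
            features.append(era_counts(ages))
            labels.append(f"{pct_a}a-{pct_b}b")
    return features, labels


def standardize(rows):
    """Z-score each column with the population std; constant columns map to 0."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [(sum((x - m) ** 2 for x in c) / n) ** 0.5 for c, m in zip(cols, means)]
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
            for row in rows]


# Placeholder source ages (uniform random values, NOT the compiled basement data).
demo_rng = random.Random(42)
source_a = [demo_rng.uniform(500.0, 700.0) for _ in range(637)]
source_b = [demo_rng.uniform(1800.0, 2200.0) for _ in range(959)]

X, y = augment(source_a, source_b)
X_std = standardize(X)
print(len(X), len(X[0]), y[0], y[-1])  # 3000 7 100a-0b 0a-100b
```

In a full workflow the scaler would normally be fitted on the 70 % training split only and then applied to the test split and to the sandstone samples, so that no information from the test data leaks into training.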

For the Resende Basin database, the classifier models allowed the proportions of the faulted and flexural margin contributions to be estimated with accuracy and precision both ranging from ~59 % to ~67 %. The success rate (F1-score) fell within the same range. Of the five tested algorithms, Logistic Regression performed best. The results indicate that zircon ages derive predominantly from the flexural margin reservoir, with the 0a-100b and 40a-60b apportionment classes prevailing (~67 % of the results).
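The performance metrics and the aggregation over repeated runs can be sketched as follows. The abstract does not say how precision and F1-score were averaged across classes, so unweighted (macro) averaging is assumed here; the function names and toy labels are hypothetical and do not reproduce the study's results.

```python
from collections import Counter
from statistics import mean


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_precision_f1(y_true, y_pred):
    """Unweighted (macro) mean of per-class precision and F1-score (ASSUMED averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, f1s = [], []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in pairs)
        fp = sum(t != lab and p == lab for t, p in pairs)
        fn = sum(t == lab and p != lab for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
        precisions.append(prec)
    return mean(precisions), mean(f1s)


def aggregate_runs(run_predictions, run_accuracies):
    """Per-sample mode of the predicted class across runs, plus mean accuracy.
    Ties in the mode go to the class encountered first (Counter.most_common order)."""
    n_samples = len(run_predictions[0])
    modes = [Counter(run[i] for run in run_predictions).most_common(1)[0][0]
             for i in range(n_samples)]
    return modes, mean(run_accuracies)


# Toy labels for illustration only.
y_true = ["0a-100b", "0a-100b", "40a-60b", "100a-0b"]
y_pred = ["0a-100b", "40a-60b", "40a-60b", "100a-0b"]
print(accuracy(y_true, y_pred))  # 0.75
prec, f1 = macro_precision_f1(y_true, y_pred)

# Hypothetical predictions for two sandstone samples over three repeated runs.
runs = [["0a-100b", "40a-60b"], ["0a-100b", "0a-100b"], ["40a-60b", "0a-100b"]]
modes, mean_acc = aggregate_runs(runs, [0.60, 0.65, 0.70])
print(modes)  # ['0a-100b', '0a-100b']
```

Taking the mode over runs gives one apportionment class per sample that is robust to the randomness of individual train-test splits, while the mean summarizes the spread of accuracy across those splits.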

Sample classification and model performance depend strongly on the input database: the more robust the training dataset, the higher the model accuracy. The main challenge is therefore building the best possible training dataset.

Institutions

¹ Universidade do Estado do Rio de Janeiro (UERJ)
² OptiMargin Software Co.
³ Brazil

Track

5. Isotopes in Sedimentary Systems: Stratigraphy, Provenance and Petroleum Systems

Keywords

Python

scikit-learn

Geochronology

SSAGI 2024

Proceedings of the XIII South American Symposium on Isotope Geology
