Comparing NIRS model-building paradigms: Norris vs. Martens
NIRS is used every day as a tool for predicting attributes of foods. The prediction models in use stem from two cultures. The first is spectroscopic/statistical. It was initiated by K. Norris at the USDA laboratories in Beltsville, U.S.A., and applied at full scale in the 1970s, e.g. by P. C. Williams at the Canadian Grain Commission for estimating protein in cereals. They used positive peak spectral assignments, established from earlier measurements of pure protein, to select the wavelengths included in multiple linear regression (MLR) models. The chemometric culture, which emerged in Sweden, Norway and the U.S.A. from the 1970s onwards, pragmatically (“quick and dirty”) exploited both the positive and the negative covariance in the spectra, building calibration and prediction models with Partial Least Squares Regression (PLSR) and Artificial Neural Networks (ANN). In 1983, H. Martens and S. Å. Jensen [1] first used PLS to predict protein in wheat. PLS spread widely because it required less expertise than the spectroscopic/statistical approach, even though both gave equivalent results locally.
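The contrast between the two cultures can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption for illustration, not the barley data or the authors' models. Three Gaussian bands at illustrative positions (2180, 2100 and 1940 nm) stand in for protein and two interfering constituents. Each spectrum also gets a random baseline offset. The helper names `band`, `simulate` and `pls1_fit` are hypothetical. The Norris-style model is ordinary least squares on four pre-selected wavelengths: the three band maxima plus a band-free reference at 1300 nm that absorbs the offset. The chemometric model is a textbook PLS1 (NIPALS) regression on all wavelengths.

```python
import numpy as np

rng = np.random.default_rng(0)
wl = np.arange(1100, 2500, 10)  # hypothetical 10 nm wavelength grid


def band(center, width):
    """Gaussian absorption band (illustrative, not a measured spectrum)."""
    return np.exp(-0.5 * ((wl - center) / width) ** 2)


# Row 0 plays "protein"; rows 1-2 are overlapping interferents (assumed).
S = np.vstack([band(2180, 30), band(2100, 40), band(1940, 60)])


def simulate(n):
    """Hypothetical spectra: band mixture + per-sample baseline offset + noise."""
    C = rng.uniform(size=(n, 3))
    offset = rng.normal(0, 0.05, size=(n, 1))
    X = C @ S + offset + rng.normal(0, 0.002, size=(n, wl.size))
    return X, C[:, 0]


X_cal, y_cal = simulate(100)
X_test, y_test = simulate(50)

# Norris-style MLR: least squares on a few pre-assigned wavelengths only.
idx = np.searchsorted(wl, [2180, 2100, 1940, 1300])
A = np.column_stack([np.ones(len(y_cal)), X_cal[:, idx]])
coef = np.linalg.lstsq(A, y_cal, rcond=None)[0]
y_mlr = coef[0] + X_test[:, idx] @ coef[1:]


def pls1_fit(X, y, n_comp):
    """PLS1 via NIPALS on mean-centred data; returns (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)
        t = Xc @ w
        p, q = Xc.T @ t / (t @ t), yc @ t / (t @ t)
        Xc, yc = Xc - np.outer(t, p), yc - q * t  # deflate X and y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    b = W @ np.linalg.solve(P.T @ W, np.array(Q))
    return y_mean - x_mean @ b, b


# Chemometric PLSR on the full spectrum (4 components: 3 bands + offset).
b0, b = pls1_fit(X_cal, y_cal, n_comp=4)
y_pls = b0 + X_test @ b


def rmsep(y_hat):
    return float(np.sqrt(np.mean((y_hat - y_test) ** 2)))


print(f"MLR  (4 wavelengths): RMSEP = {rmsep(y_mlr):.4f}")
print(f"PLSR (all {wl.size} wavelengths): RMSEP = {rmsep(y_pls):.4f}")
print("PLS coefficients < 0:", int(np.sum(b < 0)), "of", b.size)
```

On such idealised data both models predict well, in line with the point that the two cultures give equivalent local results. The PLS regression vector nevertheless contains negative coefficients. To cancel the baseline offset, its coefficients must sum to roughly zero, so it cannot be purely positive.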
For theoretical reasons, we aim to compare the global robustness of Norris's knowledge-based method with the more pragmatic chemometric approach. We will use a unique barley dataset (T) of 515 samples. It comprises three classes, normal barley (N), carbohydrate mutants (C) and protein mutants (P), grown in widely different environments (greenhouse and field). These barley samples vary extremely in protein (9.7-21.9%), starch (22.8-59.0%) and β-glucan (2.2-17.8%), even exceeding the ranges reported in the literature for the six major cereal species combined.
In the first approach, we will select wavelengths with acknowledged positive band assignments and verify them by their correlations with each of the three chemical components in the T, N, C and P materials. In the second approach, three strategies based on advanced chemometrics will be investigated: 1) calculating the Net Analyte Signal [2], 2) using the target-projected loading [3] and 3) using the correlations from moving-window PLS [4]. The task is to determine which of the two cultures yields the most robust calibration and prediction models on the T dataset with its three classes.
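The correlation check and the three selection strategies can be sketched with NumPy on synthetic spectra. This is a sketch under stated assumptions, not the authors' implementation. The pure-component spectra are illustrative Gaussian bands, and the helper names (`band`, `pls1_coef`, `moving_window_corr`) are hypothetical. The NAS is computed in its classical form from known pure spectra, whereas Lorber et al. [2] address the inverse-calibration case, where those spectra are not available. Target projection and the selectivity ratio (explained over residual variance per wavelength) follow [3]. The moving-window step reports the correlation between fitted and reference values per window, as the abstract describes, rather than the residual-based criterion used by Jiang et al. [4]. The window width and numbers of components are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
wl = np.arange(1100, 2500, 10)  # hypothetical 10 nm wavelength grid


def band(center, width):
    """Gaussian band; positions and widths below are illustrative only."""
    return np.exp(-0.5 * ((wl - center) / width) ** 2)


# Rows: analyte ("protein") and two interferents; not real barley spectra.
S = np.vstack([band(2180, 30), band(2100, 40), band(1940, 60)])
C = rng.uniform(size=(120, 3))
X = C @ S + rng.normal(0, 0.002, size=(120, wl.size))
y = C[:, 0]


def pls1_coef(X, y, n_comp):
    """PLS1 regression vector (NIPALS) for mean-centred X and y."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)
        t = Xc @ w
        p, q = Xc.T @ t / (t @ t), yc @ t / (t @ t)
        Xc, yc = Xc - np.outer(t, p), yc - q * t  # deflate X and y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(Q))


# First approach: correlation of every wavelength with the reference values.
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(wl.size)])

# 1) Net analyte signal (classical form, pure spectra known): the part of
#    the analyte spectrum orthogonal to the span of the interferent spectra.
S_int = S[1:].T  # wavelengths x interferents
nas = S[0] - S_int @ np.linalg.pinv(S_int) @ S[0]

# 2) Target projection: project X onto the normalised PLS regression vector;
#    selectivity ratio = explained / residual variance per wavelength.
Xc = X - X.mean(axis=0)
b = pls1_coef(X, y, n_comp=3)
t_tp = Xc @ (b / np.linalg.norm(b))  # target-projected scores
p_tp = Xc.T @ t_tp / (t_tp @ t_tp)   # target-projected loading
X_tp = np.outer(t_tp, p_tp)
sr = X_tp.var(axis=0) / (Xc - X_tp).var(axis=0)


# 3) Moving-window PLS: correlation between fitted and reference y per window.
def moving_window_corr(X, y, width=9, n_comp=2):
    out = []
    for s in range(X.shape[1] - width + 1):
        Xw = X[:, s:s + width]
        y_fit = (Xw - Xw.mean(axis=0)) @ pls1_coef(Xw, y, n_comp) + y.mean()
        out.append(np.corrcoef(y_fit, y)[0, 1])
    return np.array(out)


mw = moving_window_corr(X, y)
centres = wl[4:4 + mw.size]  # window centres for the default width of 9

print("max |correlation| at", wl[np.argmax(np.abs(corr))], "nm")
print("max selectivity ratio at", wl[np.argmax(sr)], "nm")
print("best moving window centred at", centres[np.argmax(mw)], "nm")
```

On these synthetic data, the wavelength with the highest correlation or selectivity ratio typically lies on the high-wavelength flank of the protein band, away from the overlapping 2100 nm band, rather than at the assigned band maximum. The NAS expresses the same overlap explicitly by removing it.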
References:
1. H Martens, SÅ Jensen (1983). Partial least squares regression: A new two-stage NIR calibration method. In: J Holas, J Kratochvil (Eds.), Progress in Cereal Chemistry and Technology, Vol. 5a, Elsevier, Amsterdam, 607-647.
2. A Lorber, KM Faber, BR Kowalski (1997). Net Analyte Signal Calculation in Multivariate Calibration. Analytical Chemistry, 69, 1620-1626.
3. T Rajalahti, R Arneberg, FS Berven, K-M Myhr, RJ Ulvik, OM Kvalheim (2009). Biomarker discovery in mass spectral profiles by means of selectivity ratio plot. Chemometrics and Intelligent Laboratory Systems, 95, 35-48.
4. J-H Jiang, RJ Berry, HW Siesler, Y Ozaki (2002). Wavelength Interval Selection in Multicomponent Spectral Analysis by Moving Window Partial Least-Squares Regression with Applications to Mid-Infrared and Near-Infrared Spectroscopic Data. Analytical Chemistry, 74, 3555-3565.