Method development to predict Klason lignin content in sugarcane from stalk surface using NIR spectroscopy and multivariate calibration
INTRODUCTION: The imminent shortage of oil reserves, the world's main energy supply, together with society's concern for environmental preservation, has led governments to seek strategies to increase the consumption of renewable and sustainable fuels. Biofuels such as ethanol obtained from sugarcane residues (bagasse and straw) have the potential to meet the growing global demand for renewable, low-cost and low-polluting energy. Lignocellulosic biomass is the most abundant renewable biological resource, so there is interest in increasing sugarcane biomass for the production of biofuels or chemicals. Among the components of lignocellulosic biomass, lignin is strategically important. However, lignin determination is time-consuming, expensive and not environmentally friendly. Approaches based on near-infrared (NIR) spectroscopy are simple, fast, relatively low in cost, widely applicable and environmentally friendly.
The aim of this work is to predict lignin content in sugarcane from the stalk surface using NIR spectroscopy, partial least squares (PLS) regression and feature selection methods.
EXPERIMENTAL: The samples belong to the germplasm bank of the Ridesa sugarcane genetic breeding program. Lignin content was determined in 150 genotypes by the Klason method, using dry bagasse from different parts of the sugarcane stalk (bottom, middle and top). The independent variables (NIR spectra) were acquired with a Varian FTIR 660 spectrometer in the range of 10000 to 4000 cm-1, directly from the internal surface of the stalk, using diffuse reflectance.
The Kennard-Stone algorithm was used to select 120 samples for the calibration set and 30 samples for the prediction set. The models were built using PLS regression. Different feature selection algorithms were tested: interval PLS (iPLS), genetic algorithm (GA) and the ordered predictors selection (OPS) method. All calculations were performed using home-built functions written for Matlab.
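The calibration/prediction split described above can be illustrated with a minimal Python sketch of Kennard-Stone selection. This is not the home-built Matlab code used in this work: the function name `kennard_stone`, the use of Euclidean distance on the spectra, and the toy data are assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kennard_stone(X, n_select):
    """Return (selected, remaining) sample indices.

    Hypothetical helper: starts from the two most distant samples and
    then repeatedly adds the sample whose distance to its nearest
    already-selected sample is largest.
    """
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Full pairwise distance matrix (assumed metric: Euclidean).
    dist = [[euclidean(X[i], X[j]) for j in range(n)] for i in range(n)]

    # Seed the selection with the most distant pair of samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]

    # Distance from each remaining sample to its nearest selected sample.
    min_dist = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}

    while len(selected) < n_select:
        nxt = max(remaining, key=lambda k: min_dist[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_dist[k] = min(min_dist[k], dist[k][nxt])

    return selected, remaining


# Toy one-variable "spectra"; real use would pass 150 spectra and n_select=120.
toy_X = [[0.0], [1.0], [2.0], [10.0], [5.0]]
cal_idx, pred_idx = kennard_stone(toy_X, 3)
```

With 150 spectra, calling `kennard_stone(X, 120)` would return 120 calibration indices and leave the other 30 as the prediction set, matching the split sizes reported here.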
RESULTS AND DISCUSSION: Lignin content ranged from 14.86 to 33.85%. This range is relatively wide because the lignin content varies along the stalk. Several pre-processing methods were tested, and the second derivative gave the best performance. The quality of the models was assessed by the root mean square error (RMSE), the correlation coefficient (R) and the ratio of performance to deviation (RPD). For internal validation (cross-validation, CV), the error and correlation coefficient are named RMSECV and Rcv, respectively. For external validation (prediction samples, P), they are named RMSEP and Rp, respectively. The RMSECV, Rcv, RMSEP, Rp and RPD values obtained with 7 latent variables were, respectively: full model: 2.25, 0.84, 0.93, 0.98, 1.66; OPS: 1.31, 0.95, 0.80, 0.99, 2.79; GA: 1.19, 0.96, 0.91, 0.99, 1.20; iPLS: 2.66, 0.77, 1.58, 0.94, 1.30. The relative error in the PLS-OPS prediction was around 2.9%. These results indicate that the OPS method selected a smaller number of variables with a higher predictive capacity, and that it predicts the lignin content in sugarcane with high accuracy using a simplified procedure.
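The figures of merit named above can be sketched in a few lines of Python. The abstract does not state the formulas used, so the definitions below are assumptions: RMSE as the square root of the mean squared residual, R as the Pearson correlation between reference and predicted values, and RPD as the sample standard deviation of the reference values divided by the RMSE. The function names and the toy values are hypothetical and are not data from this work.

```python
import math


def rmse(y_ref, y_pred):
    """Root mean square error between reference and predicted values."""
    n = len(y_ref)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(y_ref, y_pred)) / n)


def pearson_r(y_ref, y_pred):
    """Pearson correlation coefficient between reference and predictions."""
    n = len(y_ref)
    mean_ref = sum(y_ref) / n
    mean_pred = sum(y_pred) / n
    cov = sum((r - mean_ref) * (p - mean_pred) for r, p in zip(y_ref, y_pred))
    ss_ref = sum((r - mean_ref) ** 2 for r in y_ref)
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    return cov / math.sqrt(ss_ref * ss_pred)


def rpd(y_ref, y_pred):
    """Assumed definition: sample SD of reference values divided by RMSE."""
    n = len(y_ref)
    mean_ref = sum(y_ref) / n
    sd_ref = math.sqrt(sum((r - mean_ref) ** 2 for r in y_ref) / (n - 1))
    return sd_ref / rmse(y_ref, y_pred)


# Toy reference and predicted values, for illustration only.
toy_ref = [1.0, 2.0, 3.0, 4.0]
toy_pred = [1.5, 2.5, 3.5, 4.5]
toy_rmse = rmse(toy_ref, toy_pred)
toy_r = pearson_r(toy_ref, toy_pred)
toy_rpd = rpd(toy_ref, toy_pred)
```

Applied to cross-validation predictions, these functions would give RMSECV and Rcv; applied to the external prediction set, RMSEP, Rp and RPD.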