SUPPORT VECTOR MACHINES AND MID-INFRARED SPECTROSCOPY FOR GEOGRAPHICAL CLASSIFICATION OF GREEN ARABICA COFFEE
Coffee is an important commodity for Brazil, which is the largest producer and exporter in the world. The two most cultivated species are Coffea canephora and Coffea arabica; the latter is considered of greater economic value because it results in a better-quality beverage. Climate, species, genotype, cultivation practices and industrial processing are also critical for final beverage quality. Thus, the development of analytical methods for coffee authentication is important to verify bean origin. The objective of this work was to develop a methodology for geographical classification of different genotypes of green arabica coffee using Fourier transform mid-infrared spectroscopy (FTIR) and support vector machines (SVM). The SVM is a classification tool with excellent generalization ability. In contrast to classical artificial neural networks, the SVM formulation leads to a quadratic programming problem with linear constraints and a global minimum. Seventy-four samples of green arabica coffee from twenty genotypes, cultivated in the cities of Paranavaí, Cornélio Procópio, Mandaguari and Londrina, were analyzed. The spectra (Shimadzu, IR Affinity-1) were collected in the range of 1900 to 800 cm⁻¹. For data analysis, an SVM was built using a radial basis function as the kernel and the one-against-all multiclass approach. The C and γ parameters of the SVM were optimized by grid search, and model performance was assessed by sensitivity (SNS) and specificity (SPC) for the test samples. The samples were successfully classified with SNS of 97.5% and SPC of 96.9%. In previous work, the same samples were classified by partial least squares discriminant analysis (PLS-DA), achieving SNS of 100% and SPC of 98.6%. Using PLS-DA combined with a radial basis function network (RBF), the performance was SNS of 99.1% and SPC of 99.6%. The SVM with FTIR proved to be an effective method for geographical classification of green arabica coffee and reached performance similar to PLS-DA and PLS-DA/RBF.
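The building blocks named in the abstract (radial basis kernel, grid of candidate parameters, one-against-all sensitivity and specificity) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the kernel width is assumed to be the usual gamma in exp(-gamma * ||x - z||^2), and the labels in the usage note are toy values rather than the study's data.

```python
import math
from itertools import product


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def parameter_grid(c_values, gamma_values):
    """All (C, gamma) pairs a grid search would evaluate (candidates are assumptions)."""
    return list(product(c_values, gamma_values))


def one_vs_all_metrics(y_true, y_pred, positive):
    """Sensitivity and specificity with `positive` as the target class
    and every other class pooled as the negative class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against classes absent from the test set.
    sns = tp / (tp + fn) if (tp + fn) else float("nan")
    spc = tn / (tn + fp) if (tn + fp) else float("nan")
    return sns, spc
```

For example, with toy true labels ["A", "A", "A", "B", "B", "B", "C", "C", "C"] and predictions ["A", "A", "B", "B", "B", "B", "C", "C", "A"], calling one_vs_all_metrics(..., "A") counts 2 true positives, 1 false negative, 1 false positive and 5 true negatives, giving a sensitivity of 2/3 and a specificity of 5/6. Averaging these per-class values over all classes is one common way to report a single SNS and SPC; the abstract does not state which aggregation was used.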