Abstract
The rich flavor and antioxidant properties of Maofeng tea are mainly attributable to its numerous secondary metabolites, which may therefore be related to tea quality. Motivated by this relationship, our study presents a sparse representation (SR) scheme that analyzes the content of various secondary metabolites and identifies discriminating compounds (DCs) for modelling Maofeng tea quality. We first identified the DCs using an interpretable sparse recovery strategy based on LASSO regression, with the optimal regularization parameter estimated from a specific Karush–Kuhn–Tucker (KKT) optimality condition. Qualitative analysis models were then trained on the screened DCs and used to predict the quality of unseen Maofeng samples. For this purpose, 96 Maofeng samples of 6 different quality grades were collected, and 21 quality-related bioactive compounds and empirical quality indicators were determined by standardized stoichiometry techniques. The experimental results show that epigallocatechin (EGC), epicatechin (EC), gallocatechin gallate (GCG) and total catechins (TC) were identified as significant discriminating features, and that the k-nearest neighbors (KNN) algorithm provided the best assessment accuracy of 95.79%. Overall, the results demonstrate that the proposed scheme outperforms benchmark methods for Maofeng tea quality prediction, improving prediction accuracy while also providing an interpretable assessment.
• Discriminating compound analysis (DCA) was devised to predict Maofeng quality.
• A sparse representation strategy enhances model accuracy and interpretability.
• The Karush–Kuhn–Tucker condition was applied to enhance model robustness.
• The superiority of DCA over existing methods is demonstrated experimentally.
• The KNN model yielded the best quality prediction accuracy with the proposed DCA method.
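
To make the role of the KKT condition in LASSO-based compound screening concrete, the following is a minimal sketch and not the authors' implementation. It assumes the common objective (1/(2n))·||y − Xb||² + λ·||b||₁, solves it by plain coordinate descent, and uses the textbook KKT conditions in two ways: to obtain the smallest λ at which every coefficient is zero, and to check a fitted solution. The abstract does not state which specific KKT condition was used to choose λ, so the choice λ = 0.1·λ_max below is purely illustrative. The data are synthetic (6 hypothetical features, of which features 0 and 3 carry the signal) and are unrelated to the 96 Maofeng samples; all function names are hypothetical.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def residual_correlation(X, y, b):
    """Return g_j = (1/n) * x_j^T (y - X b) for every feature j."""
    n, p = len(X), len(X[0])
    r = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
    return [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(p)]

def lambda_max(X, y):
    """Smallest lambda for which b = 0 satisfies the KKT conditions."""
    p = len(X[0])
    return max(abs(g) for g in residual_correlation(X, y, [0.0] * p))

def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for (1/(2n))||y - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # running residual y - X b
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new
    return b

def kkt_violation(X, y, b, lam):
    """Largest violation of the LASSO KKT conditions:
    g_j = lam * sign(b_j) if b_j != 0, and |g_j| <= lam if b_j == 0."""
    worst = 0.0
    for g, bj in zip(residual_correlation(X, y, b), b):
        if bj != 0.0:
            v = abs(g - lam * (1.0 if bj > 0 else -1.0))
        else:
            v = max(0.0, abs(g) - lam)
        worst = max(worst, v)
    return worst

# Synthetic stand-in data (assumption): 60 samples, 6 features,
# response driven by features 0 and 3 plus small noise.
rng = random.Random(0)
n, p = 60, 6
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[3] + rng.gauss(0.0, 0.1) for row in X]

lam_max = lambda_max(X, y)
beta_null = lasso_cd(X, y, lam_max)   # every coefficient stays at zero
lam = 0.1 * lam_max                   # illustrative choice, not from the paper
beta = lasso_cd(X, y, lam)
selected = [j for j, bj in enumerate(beta) if bj != 0.0]

print("lambda_max:", lam_max)
print("selected feature indices:", selected)
print("max KKT violation:", kkt_violation(X, y, beta, lam))
```

In a workflow like the one summarized above, the columns indexed by `selected` would play the role of the screened DCs and would then be passed to a downstream classifier such as KNN.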