Selection indices and support vector machines in the selection of sugarcane families

Belo Afonso Muetanene; Luiz Alexandre Peternelli; Policarpo Carneiro; Felipe Lopes da Silva; Danilo Pereira Barbosa; José Ivo Ribeiro Júnior

doi:10.37856/bja.v98i1.4321

Authors

Belo Afonso Muetanene UniLúrio
Luiz Alexandre Peternelli UFV-DET
Policarpo Carneiro UFV-DET
Felipe Lopes da Silva UFV-Departamento de Agronomia
Danilo Pereira Barbosa IFG
José Ivo Ribeiro Júnior UFV-DET

Abstract

This study compared selection indices (Smith and Hazel multiplicative, Mulamba and Mock's) and the support vector machines (SVM) algorithm for the selection of sugarcane families. We took selection on the genotypic values of family means for tons of stalks per hectare per family (GVFTSH) as the ideal selection against which the other approaches were judged. We used the dataset from Moreira et al. (2021), who conducted five experiments, each evaluating 22 sugarcane families. We constructed the selection indices via a mixed models approach and selected the top 18% of families. The selection indices were used for indirect selection of tons of stalks per hectare per family (TSH) through the total number of stalks per plot (NS), stalk diameter (SD, in centimeters) and stalk height (SH, in meters). For the SVM, the explanatory traits were NS, SD and SH, and the response trait was TSH; the selection criterion was to select only sugarcane families with TSH higher than the overall mean. Because each experiment had only 22 sugarcane families, too few to train the SVM, we also produced synthetic data via multivariate simulation to improve SVM training performance. For selection via SVM, the selected families were ranked by decreasing probability of being classified as selected, and the best SVM parameters were obtained via grid search. In general, the Smith and Hazel index using broad-sense heritability as the economic weight performed best: it had the highest coincidence coefficient with the GVFTSH in 80% of the experiments.
In our study, the SVM performed worse than the selection indices, particularly when compared with the Smith and Hazel index using broad-sense heritability as the economic weight. The lower performance of the SVM is probably due to the small sample size used to estimate the correlation matrix, which affected the simulation of the dataset used to train the SVM.
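
As an illustration of how an index-based selection can be compared with the reference selection, the following minimal Python sketch builds a rank-sum index in the spirit of Mulamba and Mock, selects the top fraction of families, and computes a coincidence coefficient. It rests on several assumptions that are not taken from the study: the traits are equally weighted, the coincidence coefficient is taken as the simple proportion of reference-selected families also selected by the index (the authors may have used a chance-adjusted form), and all function names and trait values are hypothetical toy data.

```python
def rank_descending(values):
    """Rank values so the largest gets rank 1 (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def rank_sum_index(traits):
    """Sum the per-trait ranks of each family; a lower sum is better.

    Assumption: equal weights for all traits.
    """
    per_trait_ranks = [rank_descending(column) for column in traits.values()]
    return [sum(family_ranks) for family_ranks in zip(*per_trait_ranks)]


def select_top(scores, proportion, lower_is_better=False):
    """Return the positions of the families in the top `proportion`."""
    k = max(1, round(proportion * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=not lower_is_better)
    return set(order[:k])


def coincidence(selected, reference):
    """Proportion of reference-selected families also selected by the index.

    Assumption: simple overlap, with no adjustment for chance agreement.
    """
    return len(selected & reference) / len(reference)


# Hypothetical toy data for five families (not values from the study).
traits = {
    "NS": [10, 12, 8, 15, 9],           # number of stalks per plot
    "SD": [2.5, 2.7, 2.4, 2.6, 2.8],    # stalk diameter, cm
    "SH": [3.0, 3.2, 2.9, 3.1, 2.8],    # stalk height, m
}
reference_values = [60, 80, 50, 70, 75]  # stand-in for GVFTSH

index_scores = rank_sum_index(traits)
by_index = select_top(index_scores, 0.4, lower_is_better=True)
by_reference = select_top(reference_values, 0.4)
print(sorted(by_index), sorted(by_reference),
      coincidence(by_index, by_reference))
```

With 22 families and a selection percentage of 18%, the rounding used in `select_top` keeps 4 families per experiment; whether the authors rounded in the same way is not stated in the abstract.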