Genetic divergence between sweet sorghum genotypes by the WARD-MLM procedure

Sweet sorghum has the potential for use in bioethanol production as a complement to sugarcane. This work aimed to identify sweet sorghum genotypes by mixed models, using the Ward-MLM method. Trials were carried out in the 2013/2014 and 2014/2015 crop years at the experimental area of Embrapa Agropecuária Oeste, in Dourados, Mato Grosso do Sul. The experiment used a randomized block design, with three replications and 16 sweet sorghum genotypes from the breeding program of Embrapa Milho e Sorgo. The genotype effect was significant for all the variables analyzed, and the genotypes x environments interaction was significant for most of these variables. Estimates of genetic parameters indicate that gains with selection can be obtained for the dry mass yield of panicles and ºBrix. The Ward-MLM procedure is useful for detecting genetic divergence and clustering genotypes by simultaneously using morphological, agronomic, and molecular descriptors. In addition, this method showed that three was the ideal number of groups, according to the pseudo-F and pseudo-t² criteria, identifying the most divergent groups. When crossed, these groups have a higher probability of generating genotypes with high yield and variability.


INTRODUCTION
Historically, the world energy matrix has been dominated by nonrenewable energy sources. However, given the large amount of pollutant gases emitted by fossil fuels and the constant pressure for environmental sustainability, countries have been searching for and adopting new renewable energy sources (CREMONEZ et al., 2015; KUMAR et al., 2016). In this context, biofuel production emerges as a promising alternative to fossil fuels in the world energy matrix, especially in the Brazilian scenario.
The importance of diversifying the raw material used in the biofuel subgroups (ethanol and biodiesel) has been addressed in several studies. Ethanol production relies heavily on sugarcane as the primary raw material, which has made Brazil the world's largest bioethanol producer (ANP, 2016). However, alternative crops must be developed to meet the country's high energy demand. Sweet sorghum (Sorghum bicolor L.) has the potential for use in bioethanol production alongside sugarcane. S. bicolor is considered a "smart crop" because it produces fuel from biomass and fermentable sugars, which, when industrialized, are transformed into ethanol and/or food (PEREIRA FILHO et al., 2013).
Despite the great relevance of sweet sorghum in the agro-energetic scenario, only a few studies have addressed its genetic improvement. Thus, genotypes must be accurately selected, and the genetic variability in the crop must be characterized, aiming at the composition of breeding programs. One of the primary objectives of breeding programs is to identify plants with a gene of interest for the desired trait. Therefore, the study of genetic diversity is fundamental in developing the potential of sweet sorghum cultivars for ethanol production.
Among the different multivariate methods to quantify the genetic variability of a species, the Ward-MLM (Modified Location Model), proposed by Franco et al. (1998), stands out. This method is performed in two steps: first, groups are defined by the Ward clustering method (WARD, 1963), using the Gower dissimilarity matrix (GOWER, 1971); subsequently, the mean vector of each subgroup is estimated by the MLM procedure. This method also enables defining the optimum number of groups and is reliable in the identification of the best probability of each genotype to be allocated into a given group (GONÇALVES et al., 2009).
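As a minimal sketch of the dissimilarity used in the first step, and assuming only quantitative traits, the Gower coefficient between two genotypes can be computed as the mean of the range-scaled absolute differences across traits. The function name and the trait values below are hypothetical illustrations, not data or code from the trials.

```python
def gower_matrix(rows):
    """Gower dissimilarity for quantitative traits only.

    Each entry is the mean, over traits, of |x_i - x_j| / range(trait).
    """
    n_traits = len(rows[0])
    # Range of each trait across all genotypes.
    ranges = []
    for k in range(n_traits):
        column = [row[k] for row in rows]
        ranges.append(max(column) - min(column))

    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            used = 0
            for k in range(n_traits):
                if ranges[k] == 0:
                    continue  # a constant trait carries no information
                total += abs(rows[i][k] - rows[j][k]) / ranges[k]
                used += 1
            d = total / used if used else 0.0
            dist[i][j] = dist[j][i] = d
    return dist


# Hypothetical genotype means: [TFMY (kg/ha), Brix]; NOT values from the study.
genotypes = [
    [40000.0, 14.0],
    [50000.0, 18.0],
    [60000.0, 16.0],
]
D = gower_matrix(genotypes)
```

Scaling each trait by its range keeps traits measured in very different units (for example, kg ha⁻¹ and ºBrix) on a comparable 0 to 1 scale before they are averaged.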
The Ward-MLM procedure has been effective in discriminating genotypes of several crops, such as maize (COIMBRA et al., 2009; RIBEIRO et al., 2015); common bean (CABRAL et al., 2011; OLIVEIRA et al., 2013); soybean (TEODORO et al., 2016); coffee (RODRIGUES et al., 2016); sugarcane (BRASILEIRO et al., 2014); sour passion fruit (SILVA et al., 2014); guava (CAMPOS et al., 2013); and castor (OLIVEIRA et al., 2013). However, despite the great importance of using accurate techniques in the quantification of genetic diversity (such as the Ward-MLM method), to date, no studies have reported the use of this procedure in sweet sorghum. Thus, this work aimed to identify sweet sorghum genotypes by mixed models, using the Ward-MLM method.

MATERIAL AND METHODS
Trials were carried out in the 2013/2014 and 2014/2015 crop seasons at the experimental area of Embrapa Agropecuária Oeste, in Dourados, MS (22°17'S and 54°48'W, at 380 m of altitude). The soil of the experimental area was identified as a dystroferric Oxisol with clay texture. The climate of the region is Cwa, according to the Köppen classification, with hot summers and dry winters.
The following agronomic traits were evaluated: percentage of panicles (PP, %); percentage of stems (PS, %); dry mass content in the stems (DMCS, kg ha⁻¹); dry mass content in the panicles (DMCP, kg ha⁻¹); fresh mass yield of stems (FMYS, kg ha⁻¹); fresh mass yield of leaves (FMYL, kg ha⁻¹); fresh mass yield of panicles (FMYP, kg ha⁻¹); dry mass yield of panicles (DMYP, kg ha⁻¹); total fresh mass yield (TFMY, kg ha⁻¹); total dry mass yield (TDMY, kg ha⁻¹); mean dry mass (MDM, kg ha⁻¹); broth yield (BY, l ha⁻¹); and Brix (ºBrix). A subsample of five plants was collected to separate stems, leaves, and panicles in order to determine PS and PP. Subsequently, this sample was dried in an oven at 65 °C for 72 h to determine MDM. The BY was calculated as the difference between TFMY and MDM. Brix was evaluated using a portable digital refractometer, with a range of 0-45% Brix.
The statistical analysis consisted of a joint analysis of variance, considering the two years, to obtain phenotypic means. Afterward, data were analyzed simultaneously by the Ward-MLM procedure to create the groups. For the Ward clustering method, the distance matrix was generated by Gower's algorithm (GOWER, 1971). The ideal number of groups was determined according to the pseudo-F and pseudo-t² criteria. Differences between groups, the correlation between the variables, and the canonical (CAN) variables were examined graphically. The distance for the distribution of the trait proposed by Franco et al. (1998) was used to determine the dissimilarity among groups. All analyses were carried out in the SAS statistical software, using the procedures PROC-MLM and PROC-CANDISC (SAS INSTITUTE, 2009).
Genetic parameters from each trial were estimated for each variable, according to Cruz et al. (2014). Selective accuracy (SA) and the relative coefficient of variation (CVr) were estimated using the method proposed by Resende (2002) and Resende and Duarte (2007), from expressions in which Fc is the F-test value of the genotype effect in the analysis of variance, and b is the number of replications (in this case, blocks).
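To illustrate how a criterion such as pseudo-F compares candidate partitions, the sketch below computes the statistic in its usual Calinski-Harabasz form: the between-group sum of squares divided by (g - 1), over the within-group sum of squares divided by (n - g). This is an assumption about the form of the criterion, not the SAS code used in the study, and the points and labels are hypothetical.

```python
def pseudo_f(points, labels):
    """Pseudo-F (Calinski-Harabasz form) for a given partition.

    points: list of equal-length numeric lists; labels: one group label per point.
    """
    n = len(points)
    dims = len(points[0])

    # Collect the members of each group.
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    g = len(groups)

    grand_mean = [sum(p[k] for p in points) / n for k in range(dims)]

    between = 0.0  # between-group sum of squares
    within = 0.0   # within-group sum of squares
    for members in groups.values():
        size = len(members)
        centroid = [sum(p[k] for p in members) / size for k in range(dims)]
        between += size * sum((centroid[k] - grand_mean[k]) ** 2 for k in range(dims))
        for p in members:
            within += sum((p[k] - centroid[k]) ** 2 for k in range(dims))

    return (between / (g - 1)) / (within / (n - g))


# Hypothetical one-trait scores for four genotypes; NOT values from the study.
scores = [[0.0], [2.0], [10.0], [12.0]]
f_compact = pseudo_f(scores, ["a", "a", "b", "b"])  # groups match the gap in the data
f_mixed = pseudo_f(scores, ["a", "b", "a", "b"])    # groups cut across the gap
```

A partition whose groups are compact and well separated gives a larger pseudo-F, which is why peaks of this statistic across candidate numbers of groups are used to suggest where to cut the dendrogram.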
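The expressions themselves are not reproduced in the text above. As a hedged sketch, the forms commonly attributed to Resende and Duarte (2007) are SA = (1 - 1/Fc)^(1/2) and CVr = CVg/CVe = [(Fc - 1)/b]^(1/2); these are assumed here and should be checked against the original references. The function names and the example F value are hypothetical.

```python
import math


def selective_accuracy(f_c):
    """Assumed form: SA = sqrt(1 - 1/Fc); returns 0 when Fc <= 1."""
    if f_c <= 1:
        return 0.0
    return math.sqrt(1 - 1 / f_c)


def relative_cv(f_c, b):
    """Assumed form: CVr = CVg/CVe = sqrt((Fc - 1) / b), with b replications."""
    return math.sqrt(max(f_c - 1, 0.0) / b)


# Hypothetical F value for the genotype effect, with b = 3 blocks as in the trials.
sa = selective_accuracy(4.0)
cvr = relative_cv(4.0, 3)
```

Under these assumed forms and three blocks, CVr reaches unity only when Fc is 4, whereas SA already exceeds 0.70 once Fc is about 1.96, so the two statistics can lead to different readings of the same trial.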

RESULTS AND DISCUSSION
The genotype effect was significant for all analyzed variables, which indicates the presence of genetic variability among the studied genotypes (Table 1). The variables TFMY, BY, and ºBrix should be highlighted for ethanol production since cultivars with high biomass production, associated with high BY and ºBrix, are more promising for this purpose.
Significant genotypes x environments (G x E) interactions were observed for most variables. This result is related to differences in macroenvironmental conditions between environments, such as climatic variables (temperature and rainfall), which influence trait expression. For variables FMYS, FMYL, TDMY, BY, and ºBrix, the genotypes x environments interaction was not significant, that is, genotypes were not influenced by the environments. The absence of G x E interaction allows genotypes to be recommended based only on the mean across environments.
** and *: significant at P<0.01 and P<0.05, respectively, by the F test. SV: sources of variation; DF: degrees of freedom.
In order to infer more details on the genetic variability in the set of genotypes analyzed, as well as to predict genetic gains with selection, genetic parameters associated with the variables were estimated (Table 2). A marked environmental influence was detected for all variables, which is confirmed by the higher magnitudes of the phenotypic coefficients of variation (CVf) compared with the genotypic coefficients of variation (CVg) for all variables.
Estimates of the genotypic coefficient of determination (H²) were higher than 70% for variables TFMY, PP, DMYP, FMYP, BY, and ºBrix. These estimated values suggest the possibility of obtaining genetic gains and allow the phenotypic selection of superior individuals for these variables. However, since the genotypic coefficient of determination is not a property of a trait, but of the population and environmental conditions, the values obtained can be considered only as indicative of favorable heritability for the development of sweet sorghum genotypes with high ºBrix contents and high BY.
The genotypic coefficient of variation (CVg) supports a more accurate choice of breeding strategy since it describes the genotypic variability of the different traits. However, for a sound inference on each variable for breeding purposes, the genotypic and residual coefficients of variation must be analyzed jointly through the variation index (Iv). All variables presented Iv values lower than unity, indicating that most of the variation between genotypes is due to the environment.
Only DMYP and ºBrix showed values of the relative coefficient of variation (CVr) greater than unity, suggesting that selection gains can be obtained (VENCOVSKY; BARRIGA, 1992). Thus, the values observed for these variables are sufficient to provide accurate inference about the genetic value of the progenies.
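As an illustrative sketch, and assuming the usual definition Iv = CVg/CVe for a single randomized block trial, the variation index can be obtained from the genotype and residual mean squares of the analysis of variance. The function name and the mean squares below are hypothetical, not estimates from Table 2.

```python
import math


def variation_index(ms_genotype, ms_error, b, trait_mean):
    """Return (CVg, CVe, Iv) for a randomized block trial with b replications.

    Assumes sigma2_g = (MSg - MSe) / b and Iv = CVg / CVe.
    """
    sigma2_g = max((ms_genotype - ms_error) / b, 0.0)  # genotypic variance component
    cv_g = 100 * math.sqrt(sigma2_g) / trait_mean      # genotypic CV (%)
    cv_e = 100 * math.sqrt(ms_error) / trait_mean      # residual CV (%)
    return cv_g, cv_e, cv_g / cv_e


# Hypothetical mean squares and trait mean; NOT values from the study.
cv_g, cv_e, iv = variation_index(ms_genotype=13.0, ms_error=4.0, b=3, trait_mean=20.0)
```

In this example the index is below unity (about 0.87), the situation in which residual variation outweighs genotypic variation for the trait.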
The trial's precision was measured by the estimate of the selective accuracy (SA) since it reflects the quality of the information and procedures used in predicting breeding values. This measure is associated with the selection accuracy and refers to the correlation between predicted genetic values and true genetic values of individuals (PIMENTEL et al., 2014). In this context, the results represent high experimental accuracy since the estimates of selective accuracy were higher than 0.70 (RESENDE; DUARTE, 2007).
The logarithmic likelihood function was applied according to the pseudo-F and pseudo-t² criteria, associated with the likelihood ratio test, to set the optimal number of groups. Thus, three groups were formed, with a maximum increase of 78.15 (Figure 1). The indication of the number of groups is an innovative aspect of the Ward-MLM procedure compared with other hierarchical methods, and it results in more accurate and less subjective clustering. The major contributors to the diversity analysis in the first canonical variable were variables PS, FMYS, MDM, and BY. In the second canonical variable, which explained 21.17% of the variation, the greatest contributors were variables PS, PP, DMYP, FMYP, and DMCP (Table 3).
Assessing traits associated with BY and quality is essential to increase ethanol production. These results demonstrate the relevance of these traits for studies on genetic diversity, and consequently for the choice of parents to be used in crosses aimed at optimizing the development of sweet sorghum cultivars.
The largest dissimilarity distances between the groups, based on the Gower algorithm by the Ward-MLM strategy, were detected between groups I and III (573.27) and between groups II and III (479.62), while the shortest distance was observed between groups I and II (116.46) (Table 4). To avoid exhausting genetic variability and restricting the possible gains obtained with selection, hybridization between genotypes from groups I and II is not recommended. Genetically related parents tend to have many alleles in common, and therefore provide little advantage due to the low level of heterozygosity (CRUZ et al., 2014). Crosses between genotypes from the other groups and those from group III would be interesting for exploiting heterosis and obtaining transgressive individuals for the traits of interest. Sorghum breeding programs aim to increase the frequency of favorable alleles to maximize ethanol production. In this sense, using parents with larger genetic distance increases the probability of occurrence of superior segregants in advanced generations. However, genotypes should also be chosen considering their per se performance. Thus, to produce commercial hybrids with high heterotic vigor, the focus should be on divergent genotypes that have superior performance for the main traits of agronomic importance and meet the goals of the breeding program for which these hybrids are being developed.
Due to the shorter distance between the genotypes belonging to groups I and II, crosses involving genotypes from these two groups are not recommended since they can generate progenies with high similarity. However, depending on the breeding program's strategy, such a cross, considered convergent, could facilitate the breeder's work in selecting superior lines in less time since both groups have higher means for most of the analyzed variables.
The first two canonical variables (CAN) obtained using the Ward-MLM procedure explained 100% of the total variation (Figure 2). According to Cruz et al. (2012), when the first two CAN explain more than 80% of the variation, the genetic variability between genotypes can be observed by dispersing the scores in a two-dimensional plot. Thus, this high value indicates that the graphical representation of the first two canonical variables is adequate to verify the relation between groups and between individuals within a group, increasing the reliability of the results. The distance between the groups agreed with the canonical variables. Groups I and II were the closest, with a distance of 116.46 (Table 5). Group II contained the largest number of genotypes.
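The reasoning above amounts to ranking the group pairs by their dissimilarity distance. A minimal sketch using the three distances reported in the text (the variable names are illustrative only):

```python
# Dissimilarity distances between groups, as reported in the text.
group_distances = {
    ("I", "II"): 116.46,
    ("I", "III"): 573.27,
    ("II", "III"): 479.62,
}

# Rank the pairs from most to least divergent.
ranked_pairs = sorted(group_distances.items(), key=lambda item: item[1], reverse=True)

most_divergent = ranked_pairs[0][0]  # candidate pair for exploiting heterosis
closest = ranked_pairs[-1][0]        # convergent cross, expected to give similar progenies
```

Distance alone does not settle the choice of parents: as noted above, the per se performance of the genotypes within each group must also be considered.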

Figure 2. Graph of the first two canonical variables for the three groups formed by the Ward-MLM procedure.

CONCLUSIONS
The Ward-MLM procedure is useful to detect genetic divergence and cluster genotypes by simultaneously using morphological and agronomic traits. In addition, this method showed that three was the ideal number of groups, according to the pseudo-F and pseudo-t² criteria, identifying the most divergent groups.
The cross between groups I and III has a higher probability of generating genotypes with high yield and variability.