DISSIMILARITY MEASURES AND HIERARCHICAL METHODS FOR THE STUDY OF GENETIC DIVERSITY IN SOYBEAN

Genetic diversity in soybean can be analyzed using agronomic, morphological and molecular traits subjected to multivariate biometric analysis. Several multivariate methodologies are available, such as the Euclidean distance, the Mahalanobis distance and different hierarchical clustering methods. However, studies that could help in choosing among these methods are lacking. The aim of this paper was to evaluate the clustering patterns of soybean genotypes obtained with the Euclidean and Mahalanobis distances under different hierarchical methods. The experiment was conducted on the "Capim Branco" farm of the Federal University of Uberlândia in a randomized complete block design with 15 soybean genotypes (nine breeding lines and six cultivars) and four replications. The agronomic traits evaluated were: number of days to flowering and to maturity, plant height at flowering and at maturity, height of insertion of the first pod, number of nodes on the main stem at flowering and at maturity, number of grains per pod, total number of pods, severity of Asian rust, number of pustules and yield. The data were submitted to multivariate analysis in the GENES program. The Mahalanobis distance or the Euclidean distance obtained from agronomic traits allows the determination of genetic diversity in soybean. The use of the Euclidean distance in hierarchical methods allows greater group differentiation. The UPGMA method and the nearest neighbor method showed greater agreement in genotype grouping when the Mahalanobis and Euclidean distances were used.


INTRODUCTION
Soybean (Glycine max (L.) Merrill) is a legume of great importance worldwide, both as food and as an economic commodity.
The main aim of soybean breeding programs, through the selection of plants with superior characteristics, is to obtain genotypes with high grain yield and resistance to biotic and abiotic stresses, leading to new cultivars that outperform the commercial ones. The success of these programs depends on the existence of genetic variability.
Given the large number of genes involved in the control of quantitative traits, adopting strategies that provide greater selection gain is indispensable.
Accordingly, breeders have recommended forming base populations by intercrossing superior and divergent cultivars, which results in hybrid combinations with greater heterozygosity, so that the segregating generations have a greater chance of yielding superior genotypes.
In genetic diversity studies, agronomic, morphological and molecular traits can be used in multivariate biometric analysis. Among the techniques available for studying genetic divergence, the most widely applied are dissimilarity measures, clustering methods, principal components and canonical variables. These techniques unify information from a set of traits, widening the choice of divergent parents in breeding programs.
Multivariate statistical techniques are based on dissimilarity measures, such as the Euclidean distance, the Mahalanobis distance and other measures for molecular characters. These measures are useful in breeding programs because they provide information on the degree of similarity or difference between two or more genotypes (CRUZ et al., 2014).
Different dissimilarity measures, such as the Euclidean and Mahalanobis distances, can be used to quantify the genetic difference between pairs of genotypes. Both emphasize variation in morphological, agronomic and physiological characteristics. The Mahalanobis distance (D²) can be used to estimate genetic diversity when several traits are measured in distinct genotypes (ELIAS et al., 2007). However, the Euclidean distance can be computed from the trait means alone, whereas the Mahalanobis distance uses the means together with the residual covariance matrix, thereby accounting for correlations among traits. The Mahalanobis distance therefore requires an experiment with replications (CRUZ et al., 2014).
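The contrast between the two measures can be made concrete with a short sketch. It computes the plain (unstandardized) Euclidean distance from two mean vectors, and D² as d'S⁻¹d, where d is the difference of the means and S is the residual covariance matrix. The trait values and covariance matrix are hypothetical, and the helper names are illustrative. This is not the GENES implementation, and the exact Euclidean variant used in GENES (for example, standardized or averaged) is not stated here.

```python
"""Euclidean distance vs. Mahalanobis D2 between two genotype mean vectors.

Illustrative sketch only; the trait values and covariance below are hypothetical.
"""
from math import sqrt


def euclidean(mean_a, mean_b):
    """Plain (unstandardized) Euclidean distance: needs only the trait means."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(mean_a, mean_b)))


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment [A | I] and reduce the left half to the identity.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("residual covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mahalanobis_d2(mean_a, mean_b, residual_cov):
    """D2 = d' S^-1 d, with d the difference of means and S the residual covariance."""
    d = [a - b for a, b in zip(mean_a, mean_b)]
    inv = invert(residual_cov)
    p = len(d)
    return sum(d[i] * inv[i][j] * d[j] for i in range(p) for j in range(p))


# Hypothetical example: two traits (e.g. days to flowering, days to maturity).
a, b = [45.0, 120.0], [50.0, 118.0]
S = [[4.0, 1.0], [1.0, 9.0]]
print(round(euclidean(a, b), 3))          # 5.385
print(round(mahalanobis_d2(a, b, S), 3))  # 7.457
```

When S is the identity matrix, D² reduces to the squared Euclidean distance. This illustrates that the Mahalanobis distance differs only in how it rescales and decorrelates the traits using the experimental error.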
Methods that group genotypes based on a similarity or dissimilarity measure aim to divide an original set of observations into several subgroups, so that there is homogeneity within subgroups and heterogeneity among them. They are therefore also alternatives for analyzing and understanding the data (CRUZ et al., 2014).
Hierarchical methods are widely applied by breeders to determine the genetic diversity among genotypes. The genotypes are grouped by a process repeated at several levels, joining at each step those separated by the shortest distance. The result is a dendrogram, in which groups are delimited by a cut line. This cut is made subjectively, without regard to the optimal number of groups.
Several hierarchical methods are available for obtaining dendrograms from a distance matrix. These include the unweighted pair group method with arithmetic mean (UPGMA, or average linkage), the nearest neighbor method (single linkage), the furthest neighbor method (complete linkage) and Ward's minimum variance method.
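The four linkage rules differ only in how the distance from a newly merged group to every other group is recomputed. The sketch below applies the standard Lance-Williams update for each rule to a full distance matrix. It is a generic textbook implementation, not the routine used by GENES, and the example points are hypothetical. Following the usual convention for this update, Ward's rule is applied here to squared Euclidean distances. How GENES applies Ward's method to Mahalanobis distances is not stated in the text.

```python
"""Agglomerative hierarchical clustering with four linkage rules (illustrative sketch)."""


def agglomerate(dist, method="average"):
    """Cluster n items from a full symmetric distance matrix.

    Returns the merges (members_a, members_b, height) in the order they occur.
    For method="ward", pass *squared* Euclidean distances (Lance-Williams convention).
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}
    d = {(i, j): float(dist[i][j]) for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        # Join the closest pair of groups.
        (a, b), height = min(d.items(), key=lambda kv: kv[1])
        na, nb = len(clusters[a]), len(clusters[b])
        merges.append((clusters[a], clusters[b], height))
        for k in list(clusters):
            if k in (a, b):
                continue
            dak = d[tuple(sorted((a, k)))]
            dbk = d[tuple(sorted((b, k)))]
            nk = len(clusters[k])
            if method == "single":        # nearest neighbor
                new = min(dak, dbk)
            elif method == "complete":    # furthest neighbor
                new = max(dak, dbk)
            elif method == "average":     # UPGMA
                new = (na * dak + nb * dbk) / (na + nb)
            elif method == "ward":        # Ward's minimum variance
                new = ((na + nk) * dak + (nb + nk) * dbk - nk * height) / (na + nb + nk)
            else:
                raise ValueError(f"unknown method: {method}")
            d[(k, next_id)] = new         # k < next_id, so keys stay ordered
        # Drop every distance that involves one of the merged groups.
        d = {key: v for key, v in d.items() if a not in key and b not in key}
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


points = [0.0, 1.0, 4.0, 10.0]
D = [[abs(x - y) for y in points] for x in points]
for method in ("single", "complete", "average"):
    print(method, [round(h, 3) for _, _, h in agglomerate(D, method)])
# single [1.0, 3.0, 6.0]
# complete [1.0, 4.0, 10.0]
# average [1.0, 3.5, 8.333]
```

On the same matrix, the three rules merge the items in the same order but at different heights. Different heights can change where a fixed cut line falls, and therefore the groups that are read from the dendrogram.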
There are, therefore, several methodologies based on hierarchical methods for studying genetic divergence, and it is up to the researcher to choose the one that best fits the data set. Many genetic diversity studies have been conducted in soybean, and most of them use the UPGMA method (MATSUO et al., 2011; MIRANDA et al., 2007; PRIOLLI et al., 2010; SANTOS et al., 2013; SINGH et al., 2010; VAL et al., 2014; YAMANAKA et al., 2007). However, studies on the concordance of genotype grouping when different distance measures and grouping methods are adopted are lacking. The objective of this work was therefore to evaluate the genetic divergence of soybean genotypes using the Euclidean and Mahalanobis distances with different hierarchical methods.

MATERIAL AND METHODS
Experiment setup and management
The trial was carried out in the experimental area of the Soybean Breeding Program on the Capim Branco Farm (18° 52' S, 48° 20' W, 805 m altitude), which belongs to the Federal University of Uberlândia, in the municipality of Uberlândia, MG. It was installed on December 12, 2013.
The treatments consisted of 15 soybean genotypes: nine lines developed by the Soybean Breeding Program of the Federal University of Uberlândia and six cultivars.
A randomized complete block design with four replications was used. Each plot consisted of four 5 m rows spaced 0.50 m apart. The usable area comprised the two central rows, discarding 0.50 m at each end.
The experiment was installed under the conventional tillage system, in an area of dystrophic dark red Latosol that had been under soybean cultivation for several years. The soil was prepared with one plowing and two harrowings, followed by furrowing before sowing. Fertilization was performed with the NPK formula 02-20-10.
Sowing was performed manually at a density of 12 plants per linear meter. After sowing, seed treatment was carried out by a directed-jet application of a fungicide combining carboxin (a carboxanilide) and thiram (a dimethyldithiocarbamate), trade name Vitavax-Thiram 200 SC, at 250 mL of commercial product per 100 kg of seed. The insecticide thiamethoxam, trade name Cruiser 350 FS, was applied at 200 mL per 100 kg of seed. The seeds were also inoculated with Bradyrhizobium japonicum using the liquid inoculant Masterfix.
For weed control, a pre-emergence application of the herbicide Dual Gold (S-metolachlor) at 2.0 L ha-1 was made. This was followed by a post-emergence application of Cobra (lactofen, 0.4 L ha-1) + Classic® (chlorimuron-ethyl, 40 g ha-1) to control broadleaf and narrow-leaf weeds, respectively.

Traits Evaluated
The developmental stages of soybean were identified according to the scale of Fehr and Caviness (1977). All evaluations were conducted on five plants randomly sampled from each plot, and the following traits were evaluated:
a) Number of days to flowering (NDF): number of days elapsed from emergence to flowering (stage R1).
b) Number of days to maturity (NDM): number of days elapsed from emergence to the date on which 95% of the pods were mature (stage R8).
h) Number of grains per pod: after harvest, all pods on each plant were counted and classified as NP1G (number of pods with one grain), NP2G (number of pods with two grains) and NP3G (number of pods with three grains). The total number of pods per plant (TNP) was then calculated.
i) Severity of Asian rust (SEV, %): evaluations were carried out on five central trifoliolate leaflets per plot, from the appearance of the first pustules until total defoliation. The mean severity was taken as the estimate of disease in the plot, using the diagrammatic scale of Godoy et al. (2006).
k) Yield (Y): plot grain weight (g per plot) was corrected to 13% moisture and converted to kg ha-1, with the moisture correction given by
FW = IW × (100 - IM) / (100 - FM)
where FW is the final corrected weight of the sample, IW is the initial weight of the sample, IM is the initial moisture of the sample (%), and FM is the final moisture of the sample (13%).
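The trait arithmetic above can be sketched in a few lines. TNP and the moisture correction follow the text. The grains-per-pod ratio (1·NP1G + 2·NP2G + 3·NP3G) / TNP is an assumed definition, because the text does not give a formula for it. The 4 m² usable area is derived from the plot description (two central rows × 0.50 m spacing × (5 m - 2 × 0.50 m)) and is not stated explicitly in the text. All input values and function names are hypothetical.

```python
"""Plot-level arithmetic for two traits (illustrative sketch, hypothetical inputs)."""


def pod_summary(np1g, np2g, np3g):
    """Return (TNP, mean grains per pod) for one plant.

    TNP follows the text; the grains-per-pod ratio is an ASSUMED definition.
    """
    tnp = np1g + np2g + np3g
    grains = 1 * np1g + 2 * np2g + 3 * np3g
    return tnp, grains / tnp


def moisture_corrected_weight(iw, im, fm=13.0):
    """FW = IW * (100 - IM) / (100 - FM); IM and FM are moisture percentages."""
    return iw * (100.0 - im) / (100.0 - fm)


def grams_per_plot_to_kg_ha(grams, usable_area_m2=4.0):
    """Convert g per plot to kg/ha. The 4 m2 default is DERIVED from the plot layout:
    two central rows x 0.50 m spacing x (5 m - 2 x 0.50 m) = 4 m2."""
    return grams / 1000.0 * 10_000.0 / usable_area_m2


tnp, gpp = pod_summary(10, 30, 20)
print(tnp, round(gpp, 2))                      # 60 2.17
fw = moisture_corrected_weight(1000.0, 15.0)   # 1000 g harvested at 15% moisture
print(round(grams_per_plot_to_kg_ha(fw), 1))   # 2442.5
```

A sample that is already at 13% moisture is returned unchanged. Wetter samples are reduced in proportion to their extra water before conversion to kg ha-1.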
Data were subjected to analysis of variance (ANOVA). The genetic distances between all pairs of genotypes were estimated by the Mahalanobis and Euclidean distances. The following hierarchical clustering methods were applied: nearest neighbor (single linkage), furthest neighbor (complete linkage), unweighted pair group method with arithmetic mean (UPGMA) and Ward's minimum variance method. All analyses were performed with the computer program Genes (CRUZ, 2013).
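The Mahalanobis distance needs the residual covariance matrix from the replicated trial. In a randomized complete block design, this matrix can be estimated from the residuals y_ij - (genotype mean) - (block mean) + (grand mean). Their sums of products are divided by the residual degrees of freedom, (g - 1)(r - 1). The sketch below assumes balanced data with no missing plots and one value per plot per trait (for example, the mean of the five sampled plants). The data layout data[g][b][t] and the function name are hypothetical. This is not the GENES routine.

```python
"""Residual (error) covariance matrix from a randomized complete block design (sketch)."""


def residual_covariance(data):
    """data[i][j][t]: value of trait t for genotype i in block j (balanced, no missing plots)."""
    g, r, p = len(data), len(data[0]), len(data[0][0])
    geno = [[sum(data[i][j][t] for j in range(r)) / r for t in range(p)] for i in range(g)]
    block = [[sum(data[i][j][t] for i in range(g)) / g for t in range(p)] for j in range(r)]
    grand = [sum(data[i][j][t] for i in range(g) for j in range(r)) / (g * r) for t in range(p)]
    # RCBD residual: observation - genotype mean - block mean + grand mean.
    resid = [[[data[i][j][t] - geno[i][t] - block[j][t] + grand[t] for t in range(p)]
              for j in range(r)] for i in range(g)]
    df = (g - 1) * (r - 1)
    # Residual sums of products divided by the residual degrees of freedom.
    return [[sum(resid[i][j][s] * resid[i][j][t] for i in range(g) for j in range(r)) / df
             for t in range(p)] for s in range(p)]


# Hypothetical data: 2 genotypes x 2 blocks x 2 traits (trait 2 = 2 x trait 1).
data = [[[1.0, 2.0], [2.0, 4.0]],
        [[3.0, 6.0], [6.0, 12.0]]]
print(residual_covariance(data))  # [[1.0, 2.0], [2.0, 4.0]]
```

Each diagonal entry equals the residual mean square of the univariate ANOVA for that trait. The off-diagonal entries carry the residual covariances that let D² account for correlated traits.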

RESULTS AND DISCUSSION
Dissimilarity estimates provide useful information to breeders by quantifying the degree of similarity or difference between two genotypes (CRUZ; FERREIRA; PESSONI, 2011). The Mahalanobis distances ranged from 8.9 (BRSGO 7560 and UFUS 7401) to 335.7 (BRSGO 7560 and UFUS 27), indicating wide genetic diversity among the genotypes studied (Table 1). These values were higher than those obtained by Oliveira et al. (2014), who observed Mahalanobis distances from 0.00013 to 25.00 in a study of 22 soybean genotypes that also included lines developed by the UFU soybean breeding program. The Euclidean distances ranged from 0.43 (UFUS 15 and UFUS 36) to 2.91 (M-Soy 6101 and UFUS 139) (Table 2). This range is wider than that obtained by Torres et al. (2015), who found Euclidean distances between 0.96 and 2.91 in a study with 5 soybean genotypes. For the Mahalanobis distance (Table 1), BRSGO 7560 and UFUS 7401 showed the smallest distance (8.9) and were therefore considered the most similar. In contrast, BRSGO 7560 and UFUS 27 showed the largest difference (335.7) and were considered the most divergent. Rigon et al. (2012), analyzing the genetic divergence of 18 soybean cultivars, obtained values from 0.008 to 0.53. Almeida et al. (2011), working with 11 soybean cultivars, observed Mahalanobis distances of high magnitude, from 2.65 to 374.06, indicating high genetic variability, in agreement with the present study.
Based on the Euclidean distance (Table 2), genotypes UFUS 15 and UFUS 36 showed the shortest distance (0.43) and were therefore considered the most similar. In contrast, M-Soy 6101 and UFUS 139 were the most divergent, with the longest distance (2.91).
Among the Mahalanobis distances (Table 1), the pairs with the greatest distances frequently included the line UFUS 27.
In contrast, the shortest distances, which indicate lower genetic dissimilarity, were observed when BRSGO 7560 or UFUS 7401 was one of the genotypes in the pair.
Among the Euclidean distances (Table 2), genotype M-Soy 6101 showed high dissimilarity with most genotypes, except UFUS 26.
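Reading a dissimilarity table as described above involves three steps: finding the smallest and largest pairwise distances, and counting how often each genotype appears among the largest distances. These steps can be sketched as follows. The genotype names and distance values are hypothetical and are not taken from Tables 1 or 2. The choice of k for "greatest distances" is also an assumption, because the text does not define it.

```python
"""Most similar / most divergent pairs and participation among the largest distances (sketch)."""
from collections import Counter
from itertools import combinations


def pair_list(names, dist):
    """All (distance, name_i, name_j) pairs from the upper triangle of the matrix."""
    return [(dist[i][j], names[i], names[j]) for i, j in combinations(range(len(names)), 2)]


def extreme_pairs(names, dist):
    """Return (most similar pair, most divergent pair)."""
    pairs = pair_list(names, dist)
    return min(pairs), max(pairs)


def top_k_participation(names, dist, k):
    """Count how often each genotype appears among the k largest distances."""
    largest = sorted(pair_list(names, dist), reverse=True)[:k]
    return Counter(name for _, a, b in largest for name in (a, b))


# Hypothetical 4 x 4 dissimilarity matrix.
names = ["G1", "G2", "G3", "G4"]
D = [[0, 2, 9, 4],
     [2, 0, 8, 3],
     [9, 8, 0, 7],
     [4, 3, 7, 0]]
print(extreme_pairs(names, D))                            # ((2, 'G1', 'G2'), (9, 'G1', 'G3'))
print(top_k_participation(names, D, 3).most_common(1))    # [('G3', 3)]
```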
The cophenetic correlation is a coefficient that evaluates the fit between the distance matrix and the dendrogram, increasing the reliability of conclusions drawn from the dendrogram (KOPP et al., 2007). The higher the correlation, the lower the distortion introduced by the grouping. Cophenetic correlations above 0.75 can be considered high and indicate a good fit of the distances shown in the dendrogram (MC GARIGAL et al., 2000).
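The cophenetic distance between two genotypes is the dendrogram height at which they first join the same group. The cophenetic correlation is the Pearson correlation between these heights and the original distances over all pairs. The sketch below assumes the dendrogram is given as a list of merges (members_a, members_b, height). The example matrix and merges are hypothetical, and the function names are illustrative.

```python
"""Cophenetic correlation between a distance matrix and a dendrogram (sketch)."""
from math import sqrt


def cophenetic_matrix(n, merges):
    """Each pair's cophenetic distance is the height at which the pair first joins."""
    coph = [[0.0] * n for _ in range(n)]
    for members_a, members_b, height in merges:
        for i in members_a:
            for j in members_b:
                coph[i][j] = coph[j][i] = height
    return coph


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cophenetic_correlation(dist, merges):
    n = len(dist)
    coph = cophenetic_matrix(n, merges)
    upper = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return pearson([dist[i][j] for i, j in upper], [coph[i][j] for i, j in upper])


# Hypothetical example: four items on a line and their single-linkage dendrogram.
points = [0.0, 1.0, 4.0, 10.0]
D = [[abs(x - y) for y in points] for x in points]
merges = [([0], [1], 1.0), ([2], [0, 1], 3.0), ([3], [2, 0, 1], 6.0)]
print(round(cophenetic_correlation(D, merges), 3))  # 0.921
```

A value of 1 is reached only when the dendrogram reproduces the distance matrix exactly, which happens when the matrix is ultrametric. Lower values mean that the grouping distorts more of the original distances.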
For the nearest neighbor method with the Mahalanobis distance matrix, the cophenetic correlation was 0.53 (Table 3). Rigon et al. (2012), in a study with 18 soybean cultivars, found a cophenetic correlation of 0.75, which is higher than the value obtained in this study. For the furthest neighbor method, the cophenetic correlation estimates were 0.58 and 0.77 for the Mahalanobis and Euclidean distances, respectively (Table 3). Cargnelutti Filho et al. (2010), studying common bean, found values of 0.62 and 0.67 for the Mahalanobis and Euclidean distances, respectively. The Euclidean estimate obtained here was therefore higher than theirs, while the Mahalanobis estimate was lower.
For the UPGMA method with the Mahalanobis distance, the cophenetic correlation estimate was 0.67. Sousa et al. (2015), in a study of 110 soybean genotypes, reported a value of 0.70. Bertan et al. (2007), considering 17 morphological traits in a soybean genetic diversity study, obtained a cophenetic correlation coefficient of 0.80. For the UPGMA method with the Euclidean distance, the cophenetic correlation was 0.86. Gonçalves et al. (2014), in a study of 65 bean genotypes, found a cophenetic correlation of 0.81, close to the estimates reached in this study.
For Ward's method, the cophenetic correlations were 0.62 (Mahalanobis) and 0.56 (Euclidean). Cargnelutti et al. (2008), analyzing 14 bean cultivars, reported higher estimates, with a correlation of 0.90 for both the Mahalanobis and the Euclidean distances. In the dendrograms generated from the Mahalanobis and Euclidean distances, the cuts were made at 50% dissimilarity. This agrees with Santos et al. (2013) and Torres et al. (2015), who used the same percentage of dissimilarity to define groups in soybean genetic diversity studies.
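The cut can be sketched with a union-find pass over the merges. Any merge whose relative height is at or below the cut level joins its two groups, and the sketch also reports each group's share of the genotypes (for example, 11 of 15 genotypes gives 73.3%). The sketch assumes that the percentage scale is each fusion height divided by the largest fusion height, times 100. This is an assumption, because the text does not define the scale. It also assumes monotone merge heights, which holds for the four methods used. The merges and function names are hypothetical.

```python
"""Cut a dendrogram at a relative dissimilarity level and report group shares (sketch)."""


def cut_dendrogram(n, merges, cut_percent=50.0):
    """ASSUMED scale: relative height = height / largest fusion height * 100."""
    max_h = max(h for _, _, h in merges)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    # Apply every merge that lies at or below the cut line.
    for members_a, members_b, height in merges:
        if 100.0 * height / max_h <= cut_percent:
            parent[find(members_a[0])] = find(members_b[0])
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Largest groups first, ties broken by the first member.
    return sorted(groups.values(), key=lambda g: (-len(g), g[0]))


def group_shares(groups, n):
    """Percentage of the n genotypes falling in each group, to one decimal place."""
    return [round(100.0 * len(g) / n, 1) for g in groups]


merges = [([0], [1], 1.0), ([2], [0, 1], 3.0), ([3], [2, 0, 1], 6.0)]
groups = cut_dendrogram(4, merges, 50.0)
print(groups, group_shares(groups, 4))  # [[0, 1, 2], [3]] [75.0, 25.0]
```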
In the nearest neighbor dendrogram for the Mahalanobis distance (Picture A), five groups were formed, with 73.3% of the genotypes gathered in the first group (BRSGO 7560; UFUS 7401; TMG 801; UFUS 110; UFUS 11; UFUS 26; UFUS 36; UFUS 54; UFUS 24; UFUS 15; UFUS 6901). The remaining groups II, III, IV and V each contained a single genotype (6.7%): UFUS 139, UFUS Riqueza, UFUS 27 and M-Soy 6101, respectively. In the study of Almeida et al. (2011) with 12 soybean cultivars, a cut at 50% would give a result similar to the one reported here, since most of the genotypes (91.6%) would also cluster in a single group. Peluzio et al. (2012), analyzing 12 soybean genotypes, obtained two groups: one comprised all genotypes but one, which formed the second group.
Groups IV and V contained only one genotype each (6.7%), namely UFUS 139 and M-Soy 6101, respectively. For Ward's method with the Euclidean distance (Picture H), five groups were also observed. The first gathered 33.3% of the genotypes (UFUS 15; UFUS 36; UFUS 24; UFUS 11; UFUS 27), and the second comprised 20% of the genotypes (UFUS 26; UFUS 54; UFUS Riqueza). Groups III and IV consisted of UFUS 139 and M-Soy 6101, respectively. Finally, the last group gathered 33.3% of the genotypes studied (UFUS 110; UFUS 6901; TMG 801; UFUS 7401; BRSGO 7560). Arshad, Ali and Ghafoor (2006), studying 33 soybean genotypes, obtained three groups containing 42.4% (group 1), 24.2% (group 2) and 33.3% (group 3) of the genotypes.
When the hierarchical methods based on the Mahalanobis distance were compared with those based on the Euclidean distance, some genotypes were consistently grouped together.
For the complete linkage (furthest neighbor) and UPGMA methods, the coincidence among genotypes placed in the same group was equal to or higher than 50%. Regarding the number of groups formed, the Euclidean distance showed a greater capacity for group differentiation than the Mahalanobis distance. Araújo et al. (2014), comparing hierarchical methods in 11 cotton cultivars, found that the nearest neighbor, furthest neighbor, UPGMA and Ward methods resulted in 90% or more similarity among the grouped genotypes.
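The text does not state how the coincidence percentage was computed. One common way to quantify the agreement between two groupings of the same genotypes is the Rand index. The Rand index is the share of genotype pairs that both groupings treat the same way, either together in both or apart in both. The sketch below uses this swapped-in measure, which is not necessarily the authors' procedure. The group labels and function name are hypothetical.

```python
"""Pairwise agreement between two groupings of the same genotypes (Rand index sketch)."""
from itertools import combinations


def rand_agreement(labels_a, labels_b):
    """Share of genotype pairs placed together in both groupings or apart in both."""
    if len(labels_a) != len(labels_b):
        raise ValueError("groupings must cover the same genotypes")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical group labels for four genotypes under two clustering methods.
print(rand_agreement([1, 1, 2, 2], [1, 1, 2, 2]))            # 1.0
print(round(rand_agreement([1, 1, 2, 2], [1, 2, 1, 2]), 3))  # 0.333
```

Because the index compares pairs rather than group labels, it is unaffected by how the groups are numbered in each dendrogram.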
Regardless of the dissimilarity measure (Mahalanobis or Euclidean), the nearest neighbor and UPGMA methods allowed the identification of a greater number of groups. Arriel et al. (2006) also reported better differentiation of genotypes with the UPGMA method.
Genotypes BRSGO 7560 and TMG 801 remained grouped together in all methods used. Both cultivars are tolerant to Asian soybean rust (MT foundation, 2011; GLASSENAP et al., 2015; POLIZEL et al., 2010). They could therefore be potential parents in soybean breeding programs focused on resistance to Phakopsora pachyrhizi, since they also show yield potential in addition to tolerance to the fungus (PASSOS et al., 2014).
Genotypes UFUS 139 and M-Soy 6101 remained isolated in separate groups in all groupings analyzed, except for the furthest neighbor and Ward methods with the Mahalanobis distance. This isolation indicates that these genotypes could be potential parents, as emphasized by Arriel et al. (2006).
Although several studies have shown that soybean has a narrow genetic base (HIROMOTO; VELLO, 1986; PRIOLLI et al., 2004), this study verified the existence of genetic diversity even among improved genotypes. This agrees with Oda et al. (2015) and Glassenap et al. (2015), who also stated that there is substantial genetic variability in soybean.
From a plant breeding perspective, analyzing the data with several grouping methods based on different dissimilarity measures, while taking into account the particularities of each, made it possible to identify the most divergent genotypes. These genotypes are therefore useful in breeding programs.

CONCLUSIONS
The generalized Mahalanobis distance or the Euclidean distance obtained from agronomic traits allows the determination of genetic diversity in soybean. The use of the Euclidean distance in hierarchical methods allows greater differentiation of groups in soybean than the Mahalanobis distance.
The UPGMA and nearest neighbor methods showed greater agreement in the grouping of soybean genotypes between the Mahalanobis and Euclidean distances.

ACKNOWLEDGMENTS
The authors thank CNPq for financial support (postgraduate scholarship) and FAPEMIG (Foundation for Research Support of the State of Minas Gerais) for financial assistance.