Abstracts of the 55º Congresso Brasileiro de Genética • August 30 to September 2, 2009
Centro de Convenções do Hotel Monte Real Resort • Águas de Lindóia • SP • Brazil
www.sbg.org.br - ISBN 978-85-89109-06-2

Fitting Bayesian Gaussian mixture model for marker dosage estimation in autopolyploids species: an application on sugarcane

Silva, RR1; Mollinari, M1; Oliveira, KM2; Marconi, TG3; Souza, AP3; Garcia, AAF1

1Departamento de Genética, Escola Superior de Agricultura “Luiz de Queiroz”, Universidade de São Paulo (ESALQ/USP), Piracicaba, SP, Brasil.
2Centro de Tecnologia Canavieira – CTC, Piracicaba, SP, Brasil.
3Centro de Biologia Molecular e Engenharia Genética (CBMEG), Universidade Estadual de Campinas (UNICAMP), Campinas, SP, Brasil.
[email protected]

Keywords: simplex; duplex; triplex; mixture models; Bayesian

Many agronomically important plant species are autopolyploids, having more than two copies of each homologous chromosome per cell. In such species, the chromosomes pair randomly at meiosis within each homologous set. When analyzing molecular marker data, it is only possible to recognize the presence or absence of bands in an electrophoresis gel. This leads to uncertainty about the marker dosage, i.e., the number of copies of the DNA fragment present at each locus. Each marker dosage results in a different segregation pattern; hence, marker dosage must be estimated before map construction. In this work, we present a methodology based on fitting a Bayesian Gaussian mixture model to estimate marker dosage in segregating populations of autopolyploid species. In this model, the population was assumed to comprise three normal distributions with different means and variances. Each marker was allocated to one of these distributions based on posterior probabilities.
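
The allocation step can be illustrated with a minimal sketch. The abstract does not state which per-marker quantity is modeled, so the sketch assumes it is the proportion of individuals showing the band; the weights, means and standard deviations below are hypothetical placeholders, not estimates from this work, and the function names are illustrative only. Given fixed component parameters, the posterior probability that a marker belongs to each dosage class is the weighted normal density of that class divided by the sum over the three classes.

```python
import numpy as np

# Hypothetical component parameters (assumptions, not values from the abstract).
LABELS = ("simplex", "duplex", "triplex")
WEIGHTS = (0.80, 0.15, 0.05)   # assumed mixture weights
MEANS = (0.50, 0.78, 0.93)     # assumed mean band-presence proportions
SDS = (0.04, 0.04, 0.03)       # assumed standard deviations


def normal_pdf(x, mean, sd):
    """Normal density, evaluated elementwise with broadcasting."""
    z = (x - mean) / sd
    return np.exp(-0.5 * z ** 2) / (sd * np.sqrt(2.0 * np.pi))


def posterior_probabilities(x, weights, means, sds):
    """Return an (n_markers, n_components) matrix of membership probabilities."""
    x = np.asarray(x, dtype=float)[:, None]  # one row per marker
    weighted = np.asarray(weights) * normal_pdf(x, np.asarray(means), np.asarray(sds))
    return weighted / weighted.sum(axis=1, keepdims=True)


def allocate(x, weights, means, sds, labels=LABELS):
    """Assign each marker to the component with the highest posterior probability."""
    post = posterior_probabilities(x, weights, means, sds)
    return [labels[k] for k in post.argmax(axis=1)], post
```

In a full Bayesian fit, the weights, means and variances would themselves be sampled rather than fixed, and the allocation would average these probabilities over the posterior draws.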
To check the efficiency of the approach, a dataset of 150 individuals was simulated, containing 798 markers, of which 80% were simplex, 15% duplex and 5% triplex. When the Gaussian mixture model was fitted with the Metropolis-Hastings algorithm, the posterior estimates of the mixture weights were 78.95%, 16.9% and 4.14% for simplex, duplex and triplex markers, respectively. Next, a real dataset of 573 markers was analyzed, obtained from a full-sib family of 100 individuals derived from a cross between sugarcane cultivars SP80-180 and SP80-4896. With the Bayesian Gaussian mixture model, 520 markers were classified as simplex, 46 as duplex and 7 as triplex. The main advantage of this methodology over others, such as the chi-square test or confidence intervals based on the binomial distribution, is that it tests all segregation ratios simultaneously while also accounting for multiple testing. Moreover, it allows the incorporation of prior information about the proportions of each marker dosage (simplex, duplex and triplex) in the genome.

Financial Support: CNPq.