
Abstracts of the 55º Congresso Brasileiro de Genética (55th Brazilian Congress of Genetics) • August 30 to September 2, 2009
Convention Center of the Hotel Monte Real Resort • Águas de Lindóia • SP • Brazil
www.sbg.org.br - ISBN 978-85-89109-06-2
Fitting Bayesian Gaussian mixture model for marker
dosage estimation in autopolyploid species:
an application on sugarcane
Silva, RR1; Mollinari, M1; Oliveira, KM2; Marconi, TG3; Souza, AP3; Garcia, AAF1
1Departamento de Genética, Escola Superior de Agricultura “Luiz de Queiroz”, Universidade de São Paulo (ESALQ/USP), Piracicaba, SP,
Brazil. 2Centro de Tecnologia Canavieira – CTC, Piracicaba, SP, Brazil. 3Centro de Biologia Molecular e Engenharia Genética (CBMEG),
Universidade Estadual de Campinas (UNICAMP), Campinas, SP, Brazil.
[email protected]
Keywords: simplex; duplex; triplex; mixture models; Bayesian
Many agronomically important plant species are autopolyploids, having more than two copies of each homologous
chromosome per cell. In such species, the chromosomes pair randomly at meiosis within each homologous
set. When analyzing data from molecular markers, it is only possible to recognize the presence or absence of bands
on an electrophoresis gel. This leads to uncertainty about the marker dosage, i.e. the number of copies of the DNA
fragment present at each locus. Each marker dosage results in a different segregation pattern; hence, before map
construction, it is necessary to estimate marker dosage. In this work, we present a methodology based on fitting
a Bayesian Gaussian mixture model for estimating marker dosage in segregating populations of autopolyploid
species. In this model, the population was assumed to comprise three normal distributions with different means
and variances. Each marker was allocated to one of these distributions based on its posterior probabilities. To check
the efficiency of the approach, a dataset of 150 individuals was simulated, containing 798 markers, of which 80% were
simplex, 15% duplex and 5% triplex. Fitting the Gaussian mixture model with the Metropolis-Hastings algorithm gave
posterior estimates of the weight parameters of 78.95%, 16.9% and 4.14% for simplex, duplex and
triplex markers, respectively. Next, a real dataset of 573 markers from a full-sib family
of 100 individuals, derived from a cross between the sugarcane cultivars SP80-180 and SP80-4896, was analyzed.
With the Bayesian Gaussian mixture model, 520 markers were classified as simplex, 46 as duplex and 7 as triplex.
The main advantage of this methodology over others, such as the chi-square test or confidence intervals
based on the binomial distribution, is that it tests all segregation ratios simultaneously while also accounting for multiple testing.
Moreover, it allows the incorporation of prior information about the proportion of each marker dosage (simplex,
duplex and triplex) in the genome.
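The procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions not stated in the abstract: the per-marker statistic is taken to be the proportion of progeny showing the band; an octoploid level (PLOIDY = 8) with random chromosome pairing and no double reduction is assumed for the expected segregation ratios; the marker counts 638/120/40 are a rounding of 80%, 15% and 5% of 798; and the prior on the weights is a flat Dirichlet. As a further simplification, the component means and standard deviations are held fixed at their binomial values, so the random-walk Metropolis-Hastings sampler below updates the mixture weights only, whereas the model in the abstract also has unknown means and variances. All function names are hypothetical.

```python
from math import comb
import numpy as np

PLOIDY = 8               # assumption: the abstract does not state a ploidy level
N_IND = 150              # progeny size of the simulated dataset in the abstract
COUNTS = (638, 120, 40)  # assumption: ~80/15/5 % of 798 markers, rounded

def presence_prob(dosage, ploidy=PLOIDY):
    """P(band present) among progeny of a (dosage x nulliplex) cross,
    assuming random chromosome pairing and no double reduction."""
    half = ploidy // 2
    return 1.0 - comb(ploidy - dosage, half) / comb(ploidy, half)

def simulate(rng):
    """Per-marker proportion of progeny showing the band, plus true dosage."""
    dosage = np.repeat([1, 2, 3], COUNTS)
    p = np.array([presence_prob(d) for d in dosage])
    return rng.binomial(N_IND, p) / N_IND, dosage

def component_densities(x):
    """Gaussian density of each observation under each dosage class."""
    mu = np.array([presence_prob(d) for d in (1, 2, 3)])
    sd = np.sqrt(mu * (1.0 - mu) / N_IND)  # fixed at the binomial value
    z = (x[:, None] - mu) / sd
    return np.exp(-0.5 * z**2) / (sd * np.sqrt(2.0 * np.pi))

def log_post(w, dens, alpha):
    """Log posterior of the weights: Dirichlet(alpha) prior x mixture likelihood."""
    return np.sum((alpha - 1.0) * np.log(w)) + np.sum(np.log(dens @ w))

def mh_weights(dens, rng, alpha=(1.0, 1.0, 1.0), n_iter=6000, burn=2000, step=0.01):
    """Random-walk Metropolis-Hastings over (w1, w2), with w3 = 1 - w1 - w2."""
    alpha = np.asarray(alpha)
    w = np.full(3, 1.0 / 3.0)
    lp = log_post(w, dens, alpha)
    draws = []
    for it in range(n_iter):
        prop = w.copy()
        prop[:2] += rng.normal(0.0, step, size=2)  # symmetric proposal
        prop[2] = 1.0 - prop[:2].sum()
        if np.all(prop > 0.0):                     # outside the simplex: reject
            lp_prop = log_post(prop, dens, alpha)
            if np.log(rng.uniform()) < lp_prop - lp:
                w, lp = prop, lp_prop
        if it >= burn:
            draws.append(w.copy())
    return np.array(draws)

def classify(dens, w):
    """Allocate each marker to the dosage class with highest posterior probability."""
    post = dens * w
    post /= post.sum(axis=1, keepdims=True)
    return post.argmax(axis=1) + 1

rng = np.random.default_rng(0)
x, true_dosage = simulate(rng)
dens = component_densities(x)
w_hat = mh_weights(dens, rng).mean(axis=0)
print("posterior mean weights (simplex, duplex, triplex):", np.round(w_hat, 3))
print("markers per class:", np.bincount(classify(dens, w_hat), minlength=4)[1:])
```

Prior information about the dosage proportions, as mentioned above, would enter this sketch through the `alpha` argument of `mh_weights`: larger values for a class pull its posterior weight toward the prior proportion.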
Financial Support: CNPq.