Omar E. Cornejo, Washington State University, Evolutionary genomics of Theobroma cacao and the genetic basis of the fruit color

Event Dates: November 8, 2013 - 4:00pm - 5:00pm

Theobroma cacao L. (cacao; Malvaceae) is a small tree endemic to the Amazonian rain forest, where it most likely evolved. Some individuals in cacao populations are able to self-fertilize while others naturally outcross, but the relative contributions of selfing and outcrossing in nature have not been clearly determined, and the demographic history of the species is not well understood. I will present progress from an ongoing project that aims to generate full genome sequence data for 145 individual plants in order to investigate the evolutionary history of cacao and identify the genetic basis of phenotypic traits of interest. We show that cacao is highly diverse, with a heterozygosity of around five SNPs per kilobase per individual. Our estimates of inbreeding from microsatellite data suggest that different populations undergo different degrees of selfing, with rates varying from 0.25 to 0.9. These selfing estimates are consistent with the range of inbreeding coefficients estimated from individually sequenced genomes (F = 0.02 – 0.94, where F = [Hexp – Hobs]/Hexp, Hexp is the expected heterozygosity, and Hobs is the observed heterozygosity). The genome-wide distribution of variation, captured in the site frequency spectrum, suggests (via comparison with simulations in a maximum likelihood model) that cacao has undergone a population bottleneck followed by expansion. The timing of this event could have preceded the putative domestication times and is consistent with the Last Glacial Maximum. Detailed analysis of the sequenced parents of mapping populations and a diversity panel has shed light on the genetic basis of pod color variation, revealing that a single synonymous substitution associated with a miRNA could be responsible for differences between green and red pod colors. The results achieved so far in this project highlight the utility of generating genomic resources for arboreal crops.
We present the advances and challenges of this project, emphasizing the importance of identifying plants with reliable phenotype data and of genotyping them appropriately with microsatellite or single nucleotide polymorphism (SNP) data prior to sequencing.
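
As a rough illustration of the inbreeding coefficient defined in the abstract, F = (Hexp - Hobs)/Hexp, the sketch below computes F from a pair of heterozygosity values. It also includes the standard mixed-mating equilibrium relation F = s/(2 - s), which links a selfing rate s to an expected inbreeding coefficient; that relation is a textbook population-genetics result and is not stated in the abstract, so it is an assumption here about how the two sets of estimates could be compared. The function names and the input values are hypothetical and are not taken from the project.

```python
def inbreeding_coefficient(h_exp: float, h_obs: float) -> float:
    """Return F = (Hexp - Hobs) / Hexp, as defined in the abstract."""
    if h_exp <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return (h_exp - h_obs) / h_exp


def equilibrium_f_from_selfing(s: float) -> float:
    """Return the equilibrium F = s / (2 - s) for selfing rate s.

    Assumption: a mixed-mating population at equilibrium. This relation
    is not given in the abstract.
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("selfing rate must lie between 0 and 1")
    return s / (2.0 - s)


if __name__ == "__main__":
    # Hypothetical heterozygosity values, for illustration only.
    print(inbreeding_coefficient(h_exp=0.5, h_obs=0.25))

    # The lowest and highest selfing rates quoted in the abstract.
    for s in (0.25, 0.9):
        print(s, round(equilibrium_f_from_selfing(s), 3))
```

Under that equilibrium assumption, the quoted selfing rates of 0.25 and 0.9 map to F values that fall inside the F = 0.02 – 0.94 range reported from the sequenced genomes.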