Linking taxa to function through contig clustering of microbial metagenomes

43 mins 55 secs, 218.03 MB, WebM 640x360, 29.97 fps, 44100 Hz, 677.84 kbits/sec


About this item
Description: Quince, C (University of Glasgow)
Friday 28 March 2014, 13:45-14:30
 
Created: 2014-03-31 14:45
Collection: Mathematical, Statistical and Computational Aspects of the New Science of Metagenomics
Publisher: Isaac Newton Institute
Copyright: Quince, C
Language: eng (English)
Distribution: World (downloadable)
Explicit content: No
Aspect Ratio: 16:9
Screencast: No
 
Abstract: Co-authors: Johannes Alneberg (KTH Royal Institute of Technology, Stockholm, Sweden), Brynjar Smaari Bjarnason (KTH Royal Institute of Technology, Stockholm, Sweden), Ino de Bruijn (KTH Royal Institute of Technology, Stockholm, Sweden), Melanie Schirmer (University of Glasgow), Joshua Quick (University of Birmingham), Nicholas J. Loman (University of Birmingham), Anders F. Andersson (KTH Royal Institute of Technology, Stockholm, Sweden), Konstantinos Gerasimidis (University of Glasgow)

Taxonomic profiling of microbial communities can answer the question “Who is there?” This can be achieved either through marker gene sequencing or through true shotgun metagenomics. Because shotgun metagenomics sequences the functional genes of all community members, it also allows us to answer a second question: “What are they doing?” However, there is a third question that is key to understanding microbial communities: “Who is doing what?” This question has received much less attention because answering it requires the extraction of complete genomes from metagenomes. Assembly of metagenomes can generate millions of contigs (assembled genome fragments) with no information on which contig derives from which genome. Here I will present CONCOCT, a novel algorithm that combines sequence composition, coverage across multiple samples, and read-pair linkage to automatically cluster contigs into genomes. CONCOCT couples dimensionality reduction to a Gaussian mixture model fitted with a variational Bayesian algorithm, which automatically identifies the optimal number of clusters. We demonstrate high recall and precision on artificial as well as real human gut metagenome datasets. Linking contigs into genome clusters allows the frequencies of those clusters to be related to metadata, revealing function. We apply this approach to fecal metagenomes obtained from the E. coli O104:H4 epidemic (Germany, 2011) and are able to directly extract the outbreak genome. We also use it to identify organisms associated with inflammation in samples from children with Crohn’s disease.
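The abstract does not spell out how composition and coverage are turned into numbers. As a minimal sketch, assuming composition means tetranucleotide (k = 4) frequencies with reverse complements merged, and coverage means each contig's mean read depth in each sample, a per-contig feature vector could be built as below. The pseudocounts, the log transform, and all function names are illustrative assumptions, not CONCOCT's own code. The later steps (dimensionality reduction and the variational Bayesian Gaussian mixture over all contigs' vectors) and the read-pair linkage signal are not shown.

```python
import math
from itertools import product

BASES = "ACGT"
VALID = set(BASES)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k: int) -> list[str]:
    """All k-mers, with each k-mer and its reverse complement collapsed to one entry."""
    seen: set[str] = set()
    out: list[str] = []
    for tup in product(BASES, repeat=k):
        kmer = "".join(tup)
        canon = min(kmer, revcomp(kmer))
        if canon not in seen:
            seen.add(canon)
            out.append(canon)
    return out


def composition_profile(seq: str, k: int = 4, pseudocount: float = 1.0) -> list[float]:
    """Log relative frequencies of canonical k-mers (assumed transform)."""
    index = {kmer: i for i, kmer in enumerate(canonical_kmers(k))}
    counts = [pseudocount] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID:  # skip k-mers containing N or other ambiguity codes
            counts[index[min(kmer, revcomp(kmer))]] += 1
    total = sum(counts)
    return [math.log(c / total) for c in counts]


def coverage_profile(coverages: list[float], pseudocount: float = 1e-3) -> list[float]:
    """Log of the contig's coverage in each sample relative to its total coverage (assumed transform)."""
    shifted = [c + pseudocount for c in coverages]
    total = sum(shifted)
    return [math.log(c / total) for c in shifted]


def contig_features(seq: str, coverages: list[float], k: int = 4) -> list[float]:
    """Concatenate composition and coverage features for one contig."""
    return composition_profile(seq, k) + coverage_profile(coverages)


# Hypothetical contig observed in three samples: 136 composition + 3 coverage features.
features = contig_features("ACGTACGTTTGACCA" * 20, [12.0, 0.5, 30.0])
print(len(features))  # 139
```

Reverse complements are merged because the strand a contig is assembled on is arbitrary, so a contig and its reverse complement should receive the same composition vector. Coverage across samples adds a separate signal: contigs from the same genome should rise and fall together in abundance from sample to sample.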

Related Links

http://arxiv.org/abs/1312.4038 - arXiv preprint
Available Formats
Format         Quality    Bitrate            Size
MPEG-4 Video   640x360    1.94 Mbits/sec     639.53 MB
WebM           640x360    677.84 kbits/sec   218.03 MB
iPod Video     480x270    522.22 kbits/sec   167.91 MB
MP3            44100 Hz   249.75 kbits/sec   80.40 MB