Hybrid assembly of the large and highly repetitive genome. Underlying software includes the Jellyfish k-mer counter, a modified version of the Celera Assembler, the super-reads method for extending short reads and ... MaSuRCA can assemble data sets containing only short reads from Illumina sequencing, or a mixture of short reads and long reads (Sanger, 454, PacBio, and Nanopore). We use this method to produce an assembly of the large and complex genome of ... MaSuRCA (Maryland Super-Read Celera Assembler) is a whole-genome assembly package that can combine short and long reads from different sequencing hardware.
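To make the k-mer counting step concrete, here is a minimal sketch of counting canonical k-mers in a set of reads. It illustrates the general idea only and is not how Jellyfish is implemented; the function names `reverse_complement` and `count_canonical_kmers` are hypothetical, and treating a k-mer and its reverse complement as the same key is an assumption made for this example.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_canonical_kmers(reads, k):
    """Count k-mers across reads, merging each k-mer with its reverse complement.

    Hypothetical helper for illustration; k-mers containing characters
    other than A, C, G, T (for example N) are skipped.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # skip k-mers with ambiguous bases
            # The canonical form is the lexicographically smaller strand.
            counts[min(kmer, reverse_complement(kmer))] += 1
    return counts


if __name__ == "__main__":
    print(count_canonical_kmers(["ACGT", "ACNT"], 2))
```

A plain in-memory `Counter` is only practical for small inputs; dedicated k-mer counters exist precisely because real read sets are far too large for this approach.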
The theory and practice of genome sequence assembly. The MaSuRCA assembler was genome- and data-dependent, as it ... For assembly with Illumina-only data, the NGA50 contig size for the MaSuRCA assembly was twice as large as that of the ALLPATHS-LG assembly, whereas the number of errors was 62% larger. Not unexpectedly, the Mmu16 dataset was more challenging than the bacterial genome. The following is required; all current major Linux distributions include ... We call our system the Maryland Super-Read Celera Assembler (abbreviated MaSuRCA and pronounced "mazurka"). Institute for Physical Sciences and Technology, University of Maryland, College Park, MD 20742. The annotation and the genomic position are shown on the consensus sequence. Motivation: Second-generation sequencing technologies produce high coverage of the genome by short reads at a low cost, which has prompted development of new assembly methods. The best assembly of this dataset, as selected by iMetAMOS, was MaSuRCA with k = 35. The MegaReads software, which is now incorporated into the MaSuRCA assembler, can handle hybrid assemblies of almost any plant or animal genome, including genomes as large as the 22 Gbp loblolly pine. Automated ensemble assembly and validation of microbial genomes.
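The contig-size comparison above relies on N50-style statistics, which can be sketched as follows. N50 is the length of the contig at which the cumulative length of contigs, sorted longest first, reaches half of the total assembly length; NG50 uses half of the genome size instead. NGA50 applies the NG50 calculation to aligned blocks obtained after breaking contigs at misassemblies. The function name `nx_length` and the example lengths are hypothetical, and the alignment step needed for NGA50 is not shown.

```python
def nx_length(lengths, total, fraction=0.5):
    """Return the length at which the cumulative sum of the sorted lengths
    first reaches `fraction` of `total`, or 0 if it never does.

    Hypothetical helper for illustration:
      N50   -> total = sum(lengths)
      NG50  -> total = genome size
      NGA50 -> lengths = aligned block lengths, total = genome size
    """
    threshold = total * fraction
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return 0


if __name__ == "__main__":
    contigs = [100, 80, 50, 30, 20]  # made-up contig lengths
    print("N50:", nx_length(contigs, sum(contigs)))
    print("NG50 for an assumed genome size of 400:", nx_length(contigs, 400))
```

Because NG50 and NGA50 are normalised by the genome size rather than the assembly size, they stay comparable between assemblers that produce different total assembly lengths.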