Info • Branchiostoma floridae

Project Status

The genome of Branchiostoma floridae is estimated to be approximately 575 Mb contained in 19 pairs of chromosomes, and is being sequenced to approximately 8.1 X depth.

The genome assembly release v.1.0 was annotated using the JGI annotation pipeline. Gene models and associated transcripts/proteins were predicted or mapped using a variety of tools based on cDNA, protein homology and ab initio methods. The current release contains approximately 50,817 gene models, which include known Branchiostoma floridae genes and draw support from available Branchiostoma floridae EST and cDNA data.

Approximately 95% of Branchiostoma floridae full-length cDNAs mapped to the v.1.0 assembly. Average gene length is 9.1 kb and average transcript length is 1.4 kb, with the average protein containing 451 amino acids. There are approximately 7 exons per gene averaging 204 bp each with intron spacing of 1.3 kb. Gene functions have been automatically assigned based on homology to known genes. Manual curation of these annotations will start shortly.
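As a rough consistency check on the averages quoted above, the following Python sketch recombines the per-exon and per-intron figures into an implied transcript and gene length. It assumes that "intron spacing" means mean intron length and that a gene with n exons has n - 1 introns. The function name and the stop-codon allowance are illustrative choices, and products of averages only approximate the true averages.

```python
def implied_gene_structure(exons_per_gene: int, mean_exon_bp: int,
                           mean_intron_bp: int, protein_aa: int) -> dict:
    """Recombine average gene-structure figures (illustrative helper)."""
    # Spliced transcript: all exons joined end to end.
    transcript_bp = exons_per_gene * mean_exon_bp
    # Genomic span: exons plus the introns that separate them (n - 1 introns).
    gene_bp = transcript_bp + (exons_per_gene - 1) * mean_intron_bp
    # Coding sequence: three bases per amino acid, plus one stop codon (assumption).
    cds_bp = (protein_aa + 1) * 3
    return {"transcript_bp": transcript_bp, "gene_bp": gene_bp, "cds_bp": cds_bp}


# Figures taken from the paragraph above: 7 exons of 204 bp, 1.3 kb introns, 451 aa.
implied = implied_gene_structure(7, 204, 1300, 451)
print(implied)  # {'transcript_bp': 1428, 'gene_bp': 9228, 'cds_bp': 1356}
```

Under these assumptions the implied transcript length (about 1.4 kb) and gene length (about 9.2 kb) agree closely with the reported averages of 1.4 kb and 9.1 kb.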

Assembly Release

v2.0 (May, 2008): We have created a non-redundant representation (v2.0) of the genome sequence, which is a mosaic of the two haplotypes found in assembly v1.0. Both assemblies can be downloaded from the Branchiostoma portal download page.

The 1000 longest scaffolds of assembly version 1.5 were aligned to one another using MegaBLAST (Zhang, Schwartz et al. 2000) and manually curated into 398 connected sets of allelic scaffolds. In this process, 132 potential mis-joins were identified in version 1.5 scaffolds and broken. Each of the 398 sets of allelic scaffolds was merged into a non-redundant representative sequence that is a mosaic of the two haplotypes. The mosaic was created by concatenating haplotype segments, switching haplotypes only between gene models and with as few transitions between haplotypes as possible. Among the possible tilings with the minimum number of transitions, we selected the one that minimizes the total length of sequence gaps in the merged sequence. This method is similar in spirit to that applied to the Ciona savignyi genome by Small et al. (Small, Brudno et al. 2007).

Assembly v2.0 spans 522 Mb, with scaffold N/L50 = 62 / 2.6 Mb and contig N/L50 = 4916 / 28 kb. The net assembly length is slightly longer than the estimated haploid genome size, which could be accounted for by contributions from internal assembly gaps, residual allelic redundancy and haplotype-unique sequences.
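The tiling rule described for the haplotype merge (fewest transitions first, then least gap sequence) can be illustrated with a small dynamic program. The Python sketch below is not the code used for the assembly. It assumes a simplified model in which each set of allelic scaffolds is already cut into consecutive segments whose boundaries fall between gene models, and in which each haplotype either contributes a known number of gap bases to a segment or is absent from it. The function name and the input layout are hypothetical.

```python
from typing import List, Optional, Tuple

Cost = Tuple[int, int]  # (haplotype transitions, total gap bases)


def tile_haplotypes(gaps: List[List[Optional[int]]]) -> Tuple[List[int], Cost]:
    """Pick one haplotype per segment: fewest transitions first, fewest gap bases second.

    gaps[i][h] is the number of gap bases haplotype h contributes over
    segment i, or None if haplotype h has no sequence there.
    """
    if not gaps:
        return [], (0, 0)
    # best[h]: cheapest cost of tiling the segments seen so far, ending on haplotype h.
    best: List[Optional[Cost]] = [None if g is None else (0, g) for g in gaps[0]]
    back: List[List[Optional[int]]] = []  # back[i - 1][h]: haplotype chosen before h
    for row in gaps[1:]:
        new_best: List[Optional[Cost]] = []
        pointers: List[Optional[int]] = []
        for h, g in enumerate(row):
            candidates = [
                ((prev[0] + (p != h), prev[1] + g), p)
                for p, prev in enumerate(best)
                if prev is not None and g is not None
            ]
            if candidates:
                cost, p = min(candidates)  # tuples compare lexicographically
                new_best.append(cost)
                pointers.append(p)
            else:
                new_best.append(None)
                pointers.append(None)
        best = new_best
        back.append(pointers)
    reachable = [(cost, h) for h, cost in enumerate(best) if cost is not None]
    if not reachable:
        raise ValueError("no tiling covers every segment")
    cost, h = min(reachable)
    # Walk the back-pointers from the last segment to the first.
    path = [h]
    for pointers in reversed(back):
        h = pointers[h]
        path.append(h)
    path.reverse()
    return path, cost


# Haplotype 0 must start, haplotype 1 must finish; the switch point is chosen to avoid gaps.
print(tile_haplotypes([[10, None], [300, 20], [None, 5]]))  # ([0, 1, 1], (1, 35))
```

Storing the cost as a (transitions, gap bases) tuple lets ordinary tuple comparison enforce the priority order, so a tiling with more gap sequence is still preferred whenever it needs fewer haplotype switches.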

Putnam, N. H., T. Butts, et al. (2008). "The amphioxus genome and the evolution of the chordate karyotype." Nature: in press (doi:10.1038/nature06967).

Small, K. S., M. Brudno, et al. (2007). "A haplome alignment and reference sequence of the highly polymorphic Ciona savignyi genome." Genome Biol 8(3): R41.

Zhang, Z., S. Schwartz, et al. (2000). "A greedy algorithm for aligning DNA sequences." J Comput Biol 7(1-2): 203-14.

v.1.0 (December 5, 2006): Approximately 6.5 million shotgun reads were initially assembled using JAZZ. A high allelic polymorphism rate of 5-10% allowed the two haplotypes to be assembled separately at approximately 75% of genomic loci. The assembly comprises 3,032 scaffolds with a total length of 923 Mb, built from 81,073 contigs. Half of the assembly is contained in 174 scaffolds, all at least 1.6 Mb in length. The length-weighted mean contig size (L50) is 26 kb.
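The "half of the assembly" statistic quoted above can be computed from a list of scaffold or contig lengths as in the following Python sketch. It follows the convention used on this page, where the first number of "N/L50" is a sequence count and the second is a length. The function name and the toy lengths are illustrative and are not taken from the assembly.

```python
from typing import Iterable, Tuple


def n_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (count, length): the smallest number of sequences, taken from
    longest to shortest, that together hold at least half of the total
    length, and the length of the shortest sequence in that set."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return count, length
    raise AssertionError("unreachable: the running total always reaches half")


# Toy example: total length 20, so half is reached after the two longest sequences.
print(n_l50([8, 5, 4, 2, 1]))  # (2, 5)
```

Applied to the v.1.0 scaffold lengths, a calculation of this kind would be expected to reproduce the reported figures of 174 scaffolds and 1.6 Mb.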

Genome Reference(s)

Funding

The work conducted by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.