Arcobacter sp. SCGC AAA036-D18
Seq. Project name:
Arcobacter sp. SCGC AAA036-D18 ( Project ID: 403747 )
Product:
Standard Draft
Proposal Name:
Shedding Light on the Dark: Single-cell Genomics of Uncultivated Epsilonproteobacteria Inhabiting the Subseafloor Biosphere at Deep-Sea Hydrothermal Vents (Proposal ID: 275)
Project PI:
User Program:
CSP
Program Year:
2011
Scientific Program:
Microbial
Genome Portal:
Related Projects:
FD 1078543; SP 403747; AP 1423516
Release Date:
2011-12-21
Organism
Genus/species/strain/isolate: Arcobacter / Arcobacter sp. SCGC AAA036-D18 / SCGC AAA036D18 /
GOLD ID: Gp0013950
Data Submission
NCBI BioProject ID: 77949
NCBI Tax ID: 1001760
SRA accession:
  • SRP078811 (2016-07-18)
Contacts
JGI: IMG [email protected]
Request DNA:
Stefan M Sievert <[email protected]>
General Information
QD/SAG JGI SINGLE CELL QC AND ASSEMBLY REPORT - 4093941 Arcobacter sp. SCGC AAA036-D18

1) RAW DATA:
LibraryName  NumReads  ReadType  FileName
IBFP         35931236  2x150     /house/sdm/prod/illumina/seq_data/fastq/2036.6.1731.TAGCTT.fastq.gz
QC dir: /house/groupdirs/pi/project/4093941/ill.qd

2) READ FILTERING STATS:
Reads were screened against human contaminants and synthetic oligos used in the Illumina sequencing process, then normalized. Pairs of matching reads were removed from the dataset.
Total input reads: 35931236 (100%)
Num contam reads removed: 744 (0.0%)
  - human_chr1    497  0.00%
  - human_chr4     58  0.00%
  - human_chr12    28  0.00%
  - human_chr2     20  0.00%
  - human_chr3     18  0.00%
  - human_chr15    18  0.00%
  - human_chr5     18  0.00%
  - human_chr7     16  0.00%
  - human_chr11    12  0.00%
  - human_chr8     12  0.00%
  - human_chr21    10  0.00%
  - human_chr14     8  0.00%
  - human_chrX      8  0.00%
  - human_chr13     6  0.00%
  - human_chr18     6  0.00%
  - human_chr6      6  0.00%
  - human_chr9      4  0.00%
  - human_chr10     4  0.00%
  - human_chr16     4  0.00%
  - human_chr19     2  0.00%
  - human_chr20     2  0.00%
  - human_chr17     1  0.00%
Artifact reads removed: 190836 (0.5%)
Normalized reads removed: 35655424 (99.2%)
Total reads removed: 35847004 (99.8%)
Total reads remaining: 84232 (0.2%)

3) READ IDENTIFICATION STATS:
This step identifies contaminants but does not remove them from the dataset.
Total input reads: 35931236 (100%)
Num contam reads identified: 7878 (0.0%)
  - Escherichia  4336  0.01%
  - Delftia      2022  0.01%
  - Shigella     1468  0.00%
  - Cupriavidus    38  0.00%
  - Ralstonia      14  0.00%

4) ASSEMBLY STATS:
b) Velvet assembly:
Assembly stats of the Velvet assembly created by the Velvet optimizer. The input reads were filtered for contamination and artifacts, then normalized.
Avg GC Content: 29.72 +/- 9.29%
Largest Contig: 6.1 kb
Main genome scaffold total: 302
Main genome contig total: 303
Main genome scaffold sequence total: 402.6 kb
Main genome contig sequence total: 402.5 kb (0.0% gap)
Main genome scaffold N/L50: 70/1.7 kb
Main genome contig N/L50: 71/1.7 kb
Number of scaffolds > 50 kb: 0
% main genome in scaffolds > 50 kb: 0.0%

Minimum    Number of  Number of  Total Scaffold  Total Contig  Scaffold Contig
Scaffold   Scaffolds  Contigs    Length          Length        Coverage
Length
---------  ---------  ---------  --------------  ------------  ---------------
All        302        303        402,598         402,509       99.98%
1 kb       140        141        293,790         293,711       99.97%
2.5 kb     37         37         135,531         135,531       100.00%
5 kb       6          6          33,864          33,864        100.00%

c) Allpaths + Velvet simulated read pairs:
Assembly stats of the ALLPATHS assembly. The input contains simulated 1-3 kb read pairs created from the Velvet assembly, plus reads that were filtered for contamination and artifacts, then normalized.
Avg GC Content: 30.10 +/- 9.81%
Largest Contig: 10.4 kb
Main genome scaffold total: 108
Main genome contig total: 108
Main genome scaffold sequence total: 336.6 kb
Main genome contig sequence total: 336.6 kb (0.0% gap)
Main genome scaffold N/L50: 28/4.2 kb
Main genome contig N/L50: 28/4.2 kb
Number of scaffolds > 50 kb: 0
% main genome in scaffolds > 50 kb: 0.0%

Minimum    Number of  Number of  Total Scaffold  Total Contig  Scaffold Contig
Scaffold   Scaffolds  Contigs    Length          Length        Coverage
Length
---------  ---------  ---------  --------------  ------------  ---------------
All        108        108        336,554         336,554       100.00%
1 kb       106        106        334,636         334,636       100.00%
2.5 kb     54         54         251,810         251,810       100.00%
5 kb       19         19         127,369         127,369       100.00%
10 kb      2          2          20,597          20,597        100.00%

d) Merged Allpaths + Velvet simulated reads and optimized Velvet assemblies:
Assembly stats of the final assembly. This assembly is a hybrid of the ALLPATHS assembly and the optimized Velvet assemblies.
The ALLPATHS assembly used two inputs. The first was reads that had been filtered for contamination and artifacts and then normalized. The second was simulated 1-3 kb read pairs created from a Velvet assembly of the same read dataset. The optimized Velvet assemblies used reads that had been filtered for contamination and artifacts and then normalized. These assemblies were selected using the best k-mer (judged by largest contig lengths) across varying coverage cutoffs. The contigs of the Velvet assemblies and the ALLPATHS assembly were merged, and unique sequences were selected.
Avg GC Content: 32.39 +/- 11.71%
Largest Contig: 20.8 kb
Main genome scaffold total: 164
Main genome contig total: 164
Main genome scaffold sequence total: 373.6 kb
Main genome contig sequence total: 373.6 kb (0.0% gap)
Main genome scaffold N/L50: 26/4.6 kb
Main genome contig N/L50: 26/4.6 kb
Number of scaffolds > 50 kb: 0
% main genome in scaffolds > 50 kb: 0.0%

Minimum    Number of  Number of  Total Scaffold  Total Contig  Scaffold Contig
Scaffold   Scaffolds  Contigs    Length          Length        Coverage
Length
---------  ---------  ---------  --------------  ------------  ---------------
All        164        164        373,574         373,574       100.00%
1 kb       91         91         324,799         324,799       100.00%
2.5 kb     43         43         249,400         249,400       100.00%
5 kb       23         23         177,086         177,086       100.00%
10 kb      4          4          54,108          54,108        100.00%

5) KEY PIPELINE CMDS:
a) Contamination removal step:
Bwa version: 0.5.9-r16
Bwa aln params:
Bwa sampe params: -A -P -s
b) Artifact removal step:
duk params: -k 22 -s 1 -c 1
c) Normalization step:
kmernorm params: -k 21 -t 15 -c 2
d) Velvet assembly step for creating simulated read pairs:
Velvet version: 1.1.04
Velvet optimizer version: 2.1.7
Velvet optimizer params: --v --s 41 --e 71 --t 1 --f "-shortPaired -fastq $FASTQ" --o "-ins_length 250 -min_contig_lgth 500"
e) Simulated read pairing creation step:
Wgsim version: 0.3.0
Wgsim params: -e 0 -1 76 -2 76 -r 0 -R 0 -X 0
f) ALLPATHS assembly step:
ALLPATHS version: RunAllPathsLG r38445
Contents of in_libs.csv:
library_name, project_name, organism_name, type, paired, frag_size, frag_stddev, insert_size, insert_stddev, read_orientation, genomic_start, genomic_end
STD_1,project,assembly,fragment,1,200,35,,,inward,0,0
SIMREADS,project,assembly,jumping,1,,,1000,100,inward,0,0
g) Velvet assembly step (varying kmer and coverage cutoffs):
Velvet version: 1.1.04
Best Kmer: 87
Best cutoff coverages: 1 5 auto

6) WORKFLOW STEPS:
1. Removed contamination (human contaminants).
2. Removed Illumina artifacts (synthetic oligos used in the laboratory).
3. Normalized read coverage.
4. Created a Velvet assembly of the contamination-, artifact- and normalization-filtered data (using the Velvet optimizer).
5. Created simulated 1-3 kb read pairs using the Velvet contigs from step 4.
6. Created an ALLPATHS assembly using the simulated read pairs (step 5) and the contamination-, artifact- and normalization-filtered data.
7. Ran multiple Velvet assemblies at different k-mer and coverage windows using the contamination-, artifact- and normalization-filtered data. Merged the Velvet contigs with the ALLPATHS contigs from step 6 and selected only unique sequences.

7) RELEASE DATE: 11/17/2011

8) AUTHORS:
For additional information, please contact:
Kecia Duffy - [email protected]
Stephan Trong - [email protected]
James Han - [email protected]

This file was automatically generated by the single cell pipeline software (version 1.1.2).
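The read counts in sections 2 and 3 of the QC report are internally consistent. The three removal steps add up to the total removed, and normalization alone accounts for almost all of it. The short Python sketch below redoes that bookkeeping from the published counts. The dictionary and function names are illustrative only and are not part of the JGI pipeline.

```python
# Read counts copied from sections 2 and 3 of the QC report.
TOTAL_INPUT = 35_931_236

removed = {  # section 2: reads actually dropped, by step
    "contaminant": 744,
    "artifact": 190_836,
    "normalization": 35_655_424,
}

identified = {  # section 3: flagged by genus, but NOT removed
    "Escherichia": 4_336,
    "Delftia": 2_022,
    "Shigella": 1_468,
    "Cupriavidus": 38,
    "Ralstonia": 14,
}


def pct(count: int, total: int = TOTAL_INPUT) -> float:
    """Percentage of the input reads represented by count."""
    return 100.0 * count / total


total_removed = sum(removed.values())
remaining = TOTAL_INPUT - total_removed

for step, n in removed.items():
    print(f"{step:>14}: {n:>10,} ({pct(n):.1f}%)")
print(f"{'total removed':>14}: {total_removed:>10,} ({pct(total_removed):.1f}%)")
print(f"{'remaining':>14}: {remaining:>10,} ({pct(remaining):.1f}%)")
print(f"identified but kept: {sum(identified.values()):,}")
```

Run as-is, it prints 35,847,004 reads removed (99.8%) and 84,232 remaining (0.2%), which matches the report. The genus counts from section 3 are tallied separately because those reads were flagged, not removed.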
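The assembly tables report "N/L50" as count/length. For the Velvet assembly, "70/1.7 KB" reads as follows: the 70 longest scaffolds, the shortest of which is about 1.7 kb, together hold at least half of the 402.6 kb total. Many other tools call the count L50 and the length N50.

The Python sketch below shows one way to compute that field, one row of the minimum-scaffold-length table, and the gap percentage. It makes two assumptions:
- "Minimum" means length >= threshold.
- "Scaffold Contig Coverage" is total contig length divided by total scaffold length.

The `Scaffold` class and the toy lengths are hypothetical. The final cross-check uses the "All" row of the Velvet table.

```python
from dataclasses import dataclass


@dataclass
class Scaffold:
    contigs: list[int]  # lengths (bp) of the contigs placed in this scaffold
    gap_bp: int = 0     # total length of gap (N) runs between those contigs

    @property
    def length(self) -> int:
        return sum(self.contigs) + self.gap_bp


def n_l50(lengths: list[int]) -> tuple[int, int]:
    """Return (count, length) in the report's 'N/L50: count/length' order.

    count:  fewest sequences, taken longest first, whose lengths sum to at
            least half of the total
    length: length of the last (shortest) sequence in that set
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return count, length
    raise ValueError("no sequences given")


def threshold_row(scaffolds: list[Scaffold], min_len: int) -> dict:
    """One row of the minimum-scaffold-length table (assumes >= min_len)."""
    kept = [s for s in scaffolds if s.length >= min_len]
    scaffold_bp = sum(s.length for s in kept)
    contig_bp = sum(sum(s.contigs) for s in kept)
    return {
        "scaffolds": len(kept),
        "contigs": sum(len(s.contigs) for s in kept),
        "scaffold_bp": scaffold_bp,
        "contig_bp": contig_bp,
        "coverage_pct": 100.0 * contig_bp / scaffold_bp if scaffold_bp else 0.0,
    }


def gap_pct(scaffold_bp: int, contig_bp: int) -> float:
    """Share of scaffold sequence that is gap rather than contig."""
    return 100.0 * (scaffold_bp - contig_bp) / scaffold_bp


# Toy input (hypothetical lengths, not from this assembly).
toy = [Scaffold([3000]), Scaffold([2000, 1500], gap_bp=100),
       Scaffold([1200]), Scaffold([800]), Scaffold([400])]
print(n_l50([s.length for s in toy]))
print(threshold_row(toy, 1000))

# Cross-check against the 'All' row of the Velvet table in the report.
print(f"{gap_pct(402_598, 402_509):.1f}% gap")
print(f"{100 - gap_pct(402_598, 402_509):.2f}% coverage")
```

With these inputs, `n_l50` returns (2, 3000). The cross-check prints 0.0% gap and 99.98% coverage, which agree with the report's values for the Velvet assembly.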
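The artifact-removal step (section 5b) runs duk with `-k 22`, presumably a 22-mer match length. A minimal sketch of exact k-mer screening against a set of oligo sequences is shown below. It mirrors only the k-mer size. It does not model duk's `-s` and `-c` options, which the report does not explain. It also uses a made-up oligo in place of the pipeline's actual artifact library. The functions `build_index` and `is_artifact` are hypothetical names, not duk's interface.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int):
    """Yield every length-k substring of seq (nothing if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(oligos: list[str], k: int = 22) -> set[str]:
    """Collect the k-mers of each oligo on both strands."""
    index: set[str] = set()
    for oligo in oligos:
        seq = oligo.upper()
        index.update(kmers(seq, k))
        index.update(kmers(revcomp(seq), k))
    return index


def is_artifact(read: str, index: set[str], k: int = 22) -> bool:
    """Flag a read that shares at least one exact k-mer with the index."""
    return any(km in index for km in kmers(read.upper(), k))


# Hypothetical oligo for illustration; not the pipeline's artifact library.
OLIGO = "ACGTTGCAAGGCTTAACCGGTTAAGCTTGCA"
index = build_index([OLIGO])
print(is_artifact("TTTT" + OLIGO[3:25] + "GGGG", index))  # True
print(is_artifact("A" * 50, index))                       # False
```

Indexing both strands up front means each read is scanned once, with one set lookup per k-mer. A read that carries the oligo in either orientation is caught.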
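Step 5 of the workflow creates simulated read pairs from the Velvet contigs with wgsim (`-e 0 -1 76 -2 76 -r 0 -R 0 -X 0`). Those parameters give 76 bp mates with no sequencing errors, mutations or indels. The Python sketch below imitates that idea; it is not wgsim's implementation. The insert-size mean and standard deviation are assumptions taken from the SIMREADS row of in_libs.csv (1000/100). The report's prose describes the pairs as 1-3 kb. `simulate_pairs` is a hypothetical helper.

```python
import random

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_pairs(contigs, n_pairs, read_len=76, insert_mean=1000,
                   insert_sd=100, seed=0, max_failures=100_000):
    """Yield n_pairs error-free (read1, read2) pairs in inward (FR) orientation.

    read_len=76 mirrors '-1 76 -2 76'; errors, mutations and indels are simply
    not modelled, mirroring '-e 0 -r 0 -R 0 -X 0'. insert_mean/insert_sd are
    assumptions, not values passed to wgsim in the report.
    """
    rng = random.Random(seed)
    usable = [c for c in contigs if len(c) >= 2 * read_len]
    if not usable:
        raise ValueError("no contig is long enough for a read pair")
    weights = [len(c) for c in usable]  # sample contigs proportional to length
    made = failures = 0
    while made < n_pairs:
        contig = rng.choices(usable, weights=weights)[0]
        insert = round(rng.gauss(insert_mean, insert_sd))
        if insert < 2 * read_len or insert > len(contig):
            failures += 1  # fragment would not fit in this contig; draw again
            if failures > max_failures:
                raise RuntimeError("contigs too short for the requested inserts")
            continue
        start = rng.randrange(len(contig) - insert + 1)
        fragment = contig[start:start + insert]
        # read1: forward strand at the fragment's start;
        # read2: reverse strand at its end, so the mates point inward.
        yield fragment[:read_len], revcomp(fragment[-read_len:])
        made += 1


rng = random.Random(42)
contig = "".join(rng.choice("ACGT") for _ in range(5000))  # random stand-in contig
for r1, r2 in simulate_pairs([contig], 3, seed=1):
    print(r1[:20], r2[:20])
```

Reverse-complementing the second mate gives the "inward" orientation that both rows of in_libs.csv declare.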
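The final step merges the Velvet and ALLPATHS contigs and keeps "only unique sequences". The report does not say how uniqueness was judged. The Python sketch below uses one simple interpretation. Contigs are pooled and processed longest first. A contig is dropped if it, or its reverse complement, occurs exactly inside a contig already kept. `merge_unique` and the example contigs are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_unique(*assemblies: list[str]) -> list[str]:
    """Pool contigs from several assemblies and drop redundant ones.

    A contig is treated as redundant if it, or its reverse complement, occurs
    exactly inside a contig already kept (longer, or equally long but earlier).
    """
    pooled = sorted((c for asm in assemblies for c in asm), key=len, reverse=True)
    kept: list[str] = []
    for contig in pooled:
        rc = revcomp(contig)
        if any(contig in k or rc in k for k in kept):
            continue
        kept.append(contig)
    return kept


allpaths = ["ACGTACGTTTGA", "GGGCCCAAAT"]          # hypothetical contigs
velvet = ["CGTACGTT", "ATTTGGGCCC", "TTTTAAAACCCC"]
print(merge_unique(allpaths, velvet))
```

Here three sequences survive. CGTACGTT is dropped because it lies inside the first ALLPATHS contig. ATTTGGGCCC is dropped because it is the reverse complement of the second. Exact containment keeps the sketch short. A production pipeline would more likely use alignment with an identity threshold so that near-duplicates are also collapsed.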
Funding
The work conducted by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
Groups
This portal belongs to the following groups
No.  Name                   Type
1    Epsilonproteobacteria