Chitinophaga sp. JGI 0001002-D4
Seq. Project name:
Chitinophaga sp. JGI 0001002-D4 ( Project ID: 1007698 )
Product:
Microbial Minimal Draft, Single Cell
Proposal Name:
Rhizosphere Grand Challenge Single Cell Sequencing (Proposal ID: 939)
Project PI:
User Program:
Grand Challenge
Program Year:
2009
Scientific Program:
Microbial
Related Projects:
FD 1007697; SP 1007698; AP 1007699
Release Date:
2016-09-24
Organism
Genus/species/strain/isolate: Chitinophaga / Chitinophaga sp. JGI 0001002-D04 / JGI 0001002-D04 /
GOLD ID: Gp0025867
Data Submission
NCBI BioProject ID: 176042
NCBI Tax ID: 1235985
SRA accession:
  • SRP024522 (2015-07-22)
Contacts
JGI: [email protected]
Request DNA:
Susannah Tringe <[email protected]>
General Information
QD/SAG JGI SINGLE-CELL QC AND ASSEMBLY REPORT - 1007698 Chitinophaga sp. JGI 0001002-D4

1) RAW DATA:
LibraryName  NumReads  ReadType  FileName
BUAX         26218120  2x150     /house/groupdirs/pi/project/4095255/ill_dir/IT017/jigsaw/2105.4.1762.GTAGAG.2401.4.1922.GTAGAG.fastq
QC Dir: /house/groupdirs/QAQC/rqc_archive/PI/1007698/ill.qd

2) STD READ FILTERING STATS:
Reads were screened against human contaminants and against synthetic oligos used in the Illumina sequencing process, and were then normalized. Pairs of matching reads were removed from the dataset.
Total input reads: 26218120 (100%)
Num contam reads removed: 586 (0.0%)
  - human_chr8    284  0.00%
  - human_chr5    214  0.00%
  - human_chr4     82  0.00%
  - human_chr13     2  0.00%
  - human_chr2      2  0.00%
  - human_chr6      2  0.00%
Artifact reads removed: 123286 (0.5%)
Normalized reads removed: 25638178 (97.8%)
Total reads removed: 25762050 (98.3%)
Total reads remaining: 456070 (1.7%)

3) STD READ IDENTIFICATION STATS:
This step identifies contaminants but does not remove them from the dataset.
Total input reads: 26218120 (100%)
Num contam reads identified: 262 (0.0%)
  - Ralstonia     146  0.00%
  - Delftia       114  0.00%
  - Cupriavidus     2  0.00%

4) ASSEMBLY STATS:
a) Velvet assembly using VelvetOptimizer:
Assembly stats of the Velvet assembly created by VelvetOptimizer. The input reads were filtered for contamination and artifacts and then normalized.
Avg GC Content: 48.25 +/- 5.47%
Largest Contig: 59.6 KB
Main genome scaffold total: 317
Main genome contig total: 351
Main genome scaffold sequence total: 1.8 MB
Main genome contig sequence total: 1.8 MB (-> 0.2% gap)
Main genome scaffold N/L50: 28/21.7 KB
Main genome contig N/L50: 32/17.7 KB
Number of scaffolds > 50 KB: 2
% main genome in scaffolds > 50 KB: 6.4%

Minimum   Number     Number   Total      Total      Scaffold
Scaffold  of         of       Scaffold   Contig     Contig
Length    Scaffolds  Contigs  Length     Length     Coverage
--------  ---------  -------  ---------  ---------  --------
All       317        351      1,806,619  1,801,449  99.71%
1 kb      156        186      1,698,888  1,694,121  99.72%
2.5 kb    93         115      1,600,831  1,597,063  99.76%
5 kb      75         96       1,540,686  1,537,062  99.76%
10 kb     59         77       1,428,449  1,425,191  99.77%
25 kb     17         23       661,659    660,346    99.80%
50 kb     2          4        115,668    115,414    99.78%

b) Allpaths + Velvet simulated read pairs:
Assembly stats of the ALLPATHS assembly. The input contains simulated 1-3 kb read pairs created from the Velvet assembly, plus reads that were filtered for contamination and artifacts and then normalized.
Avg GC Content: 48.80 +/- 5.41%
Largest Contig: 60.8 KB
Main genome scaffold total: 153
Main genome contig total: 153
Main genome scaffold sequence total: 1.8 MB
Main genome contig sequence total: 1.8 MB (-> 0.0% gap)
Main genome scaffold N/L50: 27/23.3 KB
Main genome contig N/L50: 27/23.3 KB
Number of scaffolds > 50 KB: 1
% main genome in scaffolds > 50 KB: 3.4%

Minimum   Number     Number   Total      Total      Scaffold
Scaffold  of         of       Scaffold   Contig     Contig
Length    Scaffolds  Contigs  Length     Length     Coverage
--------  ---------  -------  ---------  ---------  --------
All       153        153      1,769,224  1,769,224  100.00%
1 kb      152        152      1,768,237  1,768,237  100.00%
2.5 kb    99         99       1,686,583  1,686,583  100.00%
5 kb      78         78       1,615,504  1,615,504  100.00%
10 kb     63         63       1,502,940  1,502,940  100.00%
25 kb     18         18       682,167    682,167    100.00%
50 kb     1          1        60,758     60,758     100.00%

c) Merged Allpaths and optimized Velvet assemblies:
Assembly stats of the final assembly.
This assembly is a hybrid of the ALLPATHS assembly (described above) and the optimized Velvet assemblies. The optimized Velvet assemblies were built from reads that were filtered for contamination and artifacts and then normalized. The Velvet assemblies were run with the best k-mer, chosen by largest contig length, at several coverage cutoffs. The contigs of the Velvet assemblies and the ALLPATHS assembly were merged, and unique sequences were selected.
Avg GC Content: 48.32 +/- 5.29%
Largest Contig: 60.8 KB
Main genome scaffold total: 293
Main genome contig total: 293
Main genome scaffold sequence total: 1.8 MB
Main genome contig sequence total: 1.8 MB (-> 0.0% gap)
Main genome scaffold N/L50: 27/24.1 KB
Main genome contig N/L50: 27/24.1 KB
Number of scaffolds > 50 KB: 2
% main genome in scaffolds > 50 KB: 6.5%

Minimum   Number     Number   Total      Total      Scaffold
Scaffold  of         of       Scaffold   Contig     Contig
Length    Scaffolds  Contigs  Length     Length     Coverage
--------  ---------  -------  ---------  ---------  --------
All       293        293      1,833,965  1,833,965  100.00%
1 kb      157        157      1,745,180  1,745,180  100.00%
2.5 kb    96         96       1,652,640  1,652,640  100.00%
5 kb      70         70       1,561,521  1,561,521  100.00%
10 kb     59         59       1,480,288  1,480,288  100.00%
25 kb     22         22       817,931    817,931    100.00%
50 kb     2          2        118,712    118,712    100.00%

5) KEY PIPELINE CMDS:
a) Contamination removal step:
Bwa version: 0.5.9-r16
Bwa aln params:
Bwa sampe params: -A -P -s
b) Artifact removal step:
duk params: -k 22 -s 1 -c 1
c) Normalization step:
kmernorm params: -k 21 -t 15 -c 2
d) Velvet assembly step for creating simulated read pairs:
Velvet version: 1.1.04
Velvet optimizer version: 2.1.7
Velvet optimizer params: --v --s 51 --e 71 --i 4 --t 1 --f "-shortPaired -fastq $FASTQ" --o "-ins_length 250 -min_contig_lgth 500"
e) Simulated read pair creation step:
Wgsim version: 0.3.0
Wgsim params: -e 0 -1 100 -2 100 -r 0 -R 0 -X 0
f) ALLPATHS assembly step:
ALLPATHS version: r41043
Contents of in_libs.csv:
library_name, project_name, organism_name, type, paired, frag_size, frag_stddev, insert_size, insert_stddev, read_orientation, genomic_start, genomic_end
STD_1,project,assembly,fragment,1,200,35,,,inward,0,0
SIMREADS,project,assembly,jumping,1,,,3000,300,inward,0,0
g) Velvet assembly step (varying kmer and coverage cutoffs):
Velvet version: 1.1.04
Best Kmer: 55
Best cutoff coverages: 1 5 10 auto

6) WORKFLOW STEPS:
1. Removed contamination (human contaminants).
2. Removed Illumina artifacts (synthetic oligos used in the laboratory).
3. Normalized read coverage.
4. Created a Velvet assembly of the contam+artifact+normalized filtered data.
5. Created simulated 1-3 kb read pairs using the Velvet contigs from step 4.
6. Created an ALLPATHS assembly using the Velvet simulated read pairs (step 5) and the contam+artifact+normalized filtered data.
7. Ran multiple Velvet assemblies at different kmer and coverage windows using the contam+artifact+normalized filtered data. Merged the Velvet contigs with the ALLPATHS contigs from step 6 and selected only unique sequences.

7) RELEASE DATE:
Tue Aug 7 16:10:34 PDT 2012 By Kecia Duffy - [email protected]

8) AUTHORS:
For additional information, please contact:
Kecia Duffy - [email protected]
Stephan Trong - [email protected]
James Han - [email protected]

This file was automatically generated by the jigsaw pipeline software (version 2.0.4).
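The read-filtering counts in section 2 of the report are internally consistent. The per-chromosome human hits sum to the contaminant total, the three removal categories sum to the total removed, and the remainder equals the reads kept. The minimal Python sketch below re-derives those figures from the reported counts. It assumes the report rounds percentages to one decimal place. The helper name `pct` is hypothetical.

```python
# Re-derive the section 2 read-filtering figures from the reported raw counts.
# Assumption: percentages are rounded to one decimal place, as they appear in the report.

def pct(part: int, total: int) -> str:
    """Format part/total as a percentage with one decimal place."""
    return f"{100 * part / total:.1f}%"


TOTAL_INPUT = 26_218_120

# Per-chromosome human hits listed under "Num contam reads removed".
human_hits = {"chr8": 284, "chr5": 214, "chr4": 82, "chr13": 2, "chr2": 2, "chr6": 2}

removed = {
    "contam": sum(human_hits.values()),  # should equal the reported 586
    "artifact": 123_286,
    "normalized": 25_638_178,
}

total_removed = sum(removed.values())
remaining = TOTAL_INPUT - total_removed

for step, n in removed.items():
    print(f"{step:>10}: {n:>10,} ({pct(n, TOTAL_INPUT)})")
print(f"{'removed':>10}: {total_removed:>10,} ({pct(total_removed, TOTAL_INPUT)})")
print(f"{'remaining':>10}: {remaining:>10,} ({pct(remaining, TOTAL_INPUT)})")
```

Running it reproduces the report's 98.3% removed and 1.7% remaining. Normalization accounts for almost all of the reduction, which is expected for amplified single-cell data with very uneven coverage.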
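The "Scaffold Contig Coverage" column in the section 4 tables appears to be total contig length divided by total scaffold length, that is, the share of scaffold sequence that is not gap. The "% main genome in scaffolds > 50 KB" line appears to be the 50 kb row's scaffold length divided by the "All" row's. Both readings are assumptions that happen to reproduce the printed values. The sketch below checks them against the Velvet table. The function names are hypothetical.

```python
# Check the "Scaffold Contig Coverage" column and the "% main genome in scaffolds > 50 KB"
# line of section 4a. Assumption: coverage = total contig length / total scaffold length,
# i.e. the share of scaffold sequence that is not gap (N) sequence.

# minimum scaffold length -> (total scaffold length, total contig length), from the 4a table
VELVET_ROWS = {
    "All": (1_806_619, 1_801_449),
    "1 kb": (1_698_888, 1_694_121),
    "50 kb": (115_668, 115_414),
}


def contig_coverage(scaffold_len: int, contig_len: int) -> float:
    """Percentage of scaffold sequence that is contig (non-gap) sequence."""
    return 100 * contig_len / scaffold_len


def share_of_genome(subset_len: int, total_len: int) -> float:
    """Percentage of the main genome contained in a subset of scaffolds."""
    return 100 * subset_len / total_len


for label, (scaffold, contig) in VELVET_ROWS.items():
    print(f"{label:>6}: {contig_coverage(scaffold, contig):.2f}%")

main_genome = VELVET_ROWS["All"][0]
large = VELVET_ROWS["50 kb"][0]
print(f"% main genome in scaffolds > 50 KB: {share_of_genome(large, main_genome):.1f}%")
```

The same `share_of_genome` calculation also gives the 3.4% and 6.5% figures for the ALLPATHS and merged assemblies. Those two assemblies have no gaps, so their coverage column is 100.00% throughout.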
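In the section 4 assembly stats, each "N/L50" value lists a count followed by a length, for example 28/21.7 KB for the Velvet scaffolds. The count is the number of sequences, taken longest first, needed to cover half of the total assembly length. The length is the length of the shortest sequence in that set. Many tools use these two names the other way round. The sketch below computes the pair in the report's order. The function name `n_l50` and the toy lengths are hypothetical. The report does not say how it treats a running sum that lands exactly on the halfway point, so the `>=` comparison is an assumption.

```python
from typing import Sequence, Tuple


def n_l50(lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (count, length) in this report's N/L50 order.

    count:  fewest sequences, taken longest first, whose lengths sum to at
            least half of the total assembly length.
    length: length of the last (shortest) sequence in that set.
    """
    if not lengths:
        raise ValueError("no sequences given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:  # assumption: ">=" rather than ">" at the midpoint
            return count, length
    raise AssertionError("unreachable: the running sum always reaches half")


# Toy scaffold lengths in bp (hypothetical, not taken from this assembly).
print(n_l50([60_000, 40_000, 30_000, 20_000, 10_000]))  # (2, 40000)
```

For the ALLPATHS and merged assemblies, the scaffold and contig N/L50 values are identical (27/23.3 KB and 27/24.1 KB). This follows from their scaffolds containing no gaps, so each scaffold is a single contig.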
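Workflow steps 5 and 6 and the SIMREADS row of in_libs.csv describe turning Velvet contigs into simulated jumping pairs. According to the wgsim flags, these pairs have 100 bp reads (`-1 100 -2 100`) with no sequencing errors or mutations (`-e 0 -r 0`). According to in_libs.csv, they face inward with an insert size of 3000 ± 300. The Python sketch below illustrates that idea only. It is not wgsim's implementation. The function names, the Gaussian insert-size model, and the random toy contig are all assumptions, and the report does not show which insert-size options were passed to wgsim.

```python
import random
from typing import Optional, Tuple

# Base complement table; N maps to N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_pair(
    contig: str,
    rng: random.Random,
    read_len: int = 100,
    mean_insert: int = 3000,
    sd_insert: int = 300,
) -> Optional[Tuple[str, str]]:
    """Draw one error-free, inward-facing read pair from a contig (hypothetical helper).

    Read 1 is the first read_len bases of a randomly placed fragment; read 2 is
    the reverse complement of its last read_len bases. Returns None when the
    sampled fragment is longer than the contig.
    """
    frag_len = max(2 * read_len, round(rng.gauss(mean_insert, sd_insert)))
    if frag_len > len(contig):
        return None
    start = rng.randrange(len(contig) - frag_len + 1)
    fragment = contig[start:start + frag_len]
    return fragment[:read_len], revcomp(fragment[-read_len:])


rng = random.Random(42)
toy_contig = "".join(rng.choice("ACGT") for _ in range(10_000))  # hypothetical contig
pair = simulate_pair(toy_contig, rng)
if pair is not None:
    print(pair[0][:20], pair[1][:20])
```

Because the reads are drawn from contigs rather than from the cell's DNA, pairs like these can only link sequence that Velvet has already assembled. This fits their role in step 6 as long-range scaffolding input for ALLPATHS, not as a source of new sequence.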
Funding
The work conducted by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
Groups
This portal belongs to the following groups:
No.  Name           Type
1    Bacteroidetes