Info - Chloroflexi bacterium SCGC AAA257-O03

Seq. Project name:

Chloroflexi bacterium SCGC AAA257-O03 ( Project ID: 404486 )

Product:

Standard Draft

Proposal Name:

Unraveling the unique microbial diversity of the Etoliko lagoon in Western Greece through a single cell genomics approach (Proposal ID: 335)

Project PI:

George Tsiamis

User Program:

CSP

Program Year:

2011

Scientific Program:

Microbial

Genome Portal:

2264265202

Related Projects:

FD 1079102; SP 404486; AP 1423586

Release Date:

2012-05-14

Organism
Genus/species/strain/isolate:	Chloroflexi bacterium SCGC AAA257-O03 / SCGC AAA257-O03 /
GOLD ID:	Gp0015571

Data Submission

NCBI BioProject ID:	165537
NCBI Tax ID:	1130362
SRA accession:	SRP079217 (2016-07-21)

Contacts
JGI:	IMG [email protected]
Request DNA:	George Tsiamis <[email protected]>

General Information

QD/SAG JGI SINGLE-CELL QC AND ASSEMBLY REPORT - 
4096653
Chloroflexi bacterium SCGC AAA257-O03

1) RAW DATA:

LibraryName	NumReads	ReadType	FileName
INWA	33309386	2x150	2259.2.1848.GGCTAC.fastq


2) STD READ FILTERING STATS:

Reads were screened against human contaminants and synthetic oligos
used in the Illumina sequencing process, then normalized. Pairs of
matching reads were removed from the dataset.

Total input reads:          33309386 (100%)
Num contam reads removed:   30 (0.0%)
  - human_chr2 	6	0.00%
  - human_chr1 	4	0.00%
  - human_chr14	4	0.00%
  - human_chr13	2	0.00%
  - human_chr11	2	0.00%
  - human_chrX 	2	0.00%
  - human_chr21	2	0.00%
  - human_chr8 	2	0.00%
  - human_chr4 	2	0.00%
  - human_chr6 	2	0.00%
  - human_chr15	2	0.00%
Artifact reads removed:     85096 (0.3%)
Normalized reads removed:   33026532 (99.2%)
Total reads removed:        33111658 (99.4%)
Total reads remaining:      197728 (0.6%)
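
The counts in this section can be cross-checked with simple arithmetic. The sketch below uses only the figures reported above; the variable and function names are illustrative and are not part of the pipeline.

```python
# Cross-check of the read-filtering counts reported in section 2.
# All numbers are copied from the report; names are illustrative only.
total_input = 33_309_386
contam_removed = 30
artifact_removed = 85_096
normalized_removed = 33_026_532

total_removed = contam_removed + artifact_removed + normalized_removed
remaining = total_input - total_removed


def pct(part: int, whole: int) -> str:
    """Format part/whole as a percentage with one decimal place."""
    return f"{100 * part / whole:.1f}%"


print("removed:  ", total_removed, pct(total_removed, total_input))
print("remaining:", remaining, pct(remaining, total_input))
```

The three removal categories sum to the reported total removed, and subtracting that from the input count gives the reported number of remaining reads.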

3) STD READ IDENTIFICATION STATS

This step identifies contaminants but does not remove them from the dataset.

Total input reads:           33309386 (100%)
Num contam reads identified: 228 (0.0%)
  - Escherichia	218	0.00%
  - Delftia    	4	0.00%
  - Shigella   	4	0.00%
  - Ralstonia  	2	0.00%

4) ASSEMBLY STATS:

a) Velvet assembly using VelvetOptimizer:
Assembly stats of the Velvet assembly created by VelvetOptimizer.
The input reads were filtered for contamination and artifacts, then
normalized.

Avg GC Content: 45.28 +/- 5.61%

Largest Contig: 86.4 KB

Main genome scaffold total: 208
Main genome contig total:   240
Main genome scaffold sequence total: 842.8 KB
Main genome contig sequence total:   839.3 KB (->  0.3% gap)
Main genome scaffold N/L50: 16/14.8 KB
Main genome contig N/L50:   18/12.5 KB
Number of scaffolds > 50 KB: 1
% main genome in scaffolds > 50 KB: 10.2%

 Minimum    Number    Number     Total        Total     Scaffold
Scaffold      of        of      Scaffold      Contig     Contig
 Length   Scaffolds  Contigs     Length       Length    Coverage
--------  ---------  -------  -----------  -----------  --------
    All       208        240      842,798      839,289    99.58%
   1 kb       107        133      773,387      770,340    99.61%
 2.5 kb        58         75      697,150      695,014    99.69%
   5 kb        40         54      636,758      634,978    99.72%
  10 kb        24         35      521,344      520,042    99.75%
  25 kb         3          4      166,659      166,456    99.88%
  50 kb         1          1       86,365       86,305    99.93%
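
The last column of this table can be reproduced from the two length columns. The sketch below assumes, since the report does not state it, that "Scaffold Contig Coverage" is total contig length divided by total scaffold length; the parsing helper and its field names are hypothetical.

```python
# Recompute the last column of the Velvet table from its two length columns.
# Assumption (not stated in the report): coverage = contig length / scaffold length.
TABLE = """\
All 208 240 842,798 839,289 99.58%
1 kb 107 133 773,387 770,340 99.61%
2.5 kb 58 75 697,150 695,014 99.69%
5 kb 40 54 636,758 634,978 99.72%
10 kb 24 35 521,344 520,042 99.75%
25 kb 3 4 166,659 166,456 99.88%
50 kb 1 1 86,365 86,305 99.93%
"""


def parse_rows(text):
    """Split each row from the right so labels such as '2.5 kb' stay intact."""
    rows = []
    for line in text.strip().splitlines():
        label, n_scaf, n_contig, scaf_len, contig_len, cov = line.rsplit(None, 5)
        rows.append({
            "label": label,
            "scaffolds": int(n_scaf),
            "contigs": int(n_contig),
            "scaffold_len": int(scaf_len.replace(",", "")),
            "contig_len": int(contig_len.replace(",", "")),
            "reported": cov,
        })
    return rows


rows = parse_rows(TABLE)
for r in rows:
    r["recomputed"] = f"{100 * r['contig_len'] / r['scaffold_len']:.2f}%"
    print(f"{r['label']:>6}  reported={r['reported']}  recomputed={r['recomputed']}")

# Share of the assembly held in scaffolds of at least 50 kb.
share_50kb = 100 * rows[-1]["scaffold_len"] / rows[0]["scaffold_len"]
print(f"in scaffolds >= 50 kb: {share_50kb:.1f}%")
```

Under that assumption the recomputed values agree with the reported column, and the 50 kb row divided by the "All" row gives the reported share of the main genome in scaffolds above 50 KB.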

b) Allpaths + Velvet simulated read pairs:
Assembly stats of the ALLPATHS assembly. The input contains simulated
1-3 kb read pairs created from the Velvet assembly, plus reads that
were filtered for contamination and artifacts, then normalized.

Avg GC Content: 46.06 +/- 5.04%

Largest Contig: 66.5 KB

Main genome scaffold total: 91
Main genome contig total:   91
Main genome scaffold sequence total: 797.4 KB
Main genome contig sequence total:   797.4 KB (->  0.0% gap)
Main genome scaffold N/L50: 14/18.6 KB
Main genome contig N/L50:   14/18.6 KB
Number of scaffolds > 50 KB: 1
% main genome in scaffolds > 50 KB:  8.3%

 Minimum    Number    Number     Total        Total     Scaffold
Scaffold      of        of      Scaffold      Contig     Contig
 Length   Scaffolds  Contigs     Length       Length    Coverage
--------  ---------  -------  -----------  -----------  --------
    All        91         91      797,420      797,420   100.00%
   1 kb        91         91      797,420      797,420   100.00%
 2.5 kb        58         58      748,995      748,995   100.00%
   5 kb        41         41      691,296      691,296   100.00%
  10 kb        28         28      593,313      593,313   100.00%
  25 kb         5          5      209,530      209,530   100.00%
  50 kb         1          1       66,498       66,498   100.00%

c) Merged Allpaths and optimized Velvet assemblies:
Assembly stats of the final assembly. This assembly is a hybrid of
the ALLPATHS assembly (described above) and the optimized Velvet
assemblies. The optimized Velvet assemblies used reads that were
filtered for contamination and artifacts, then normalized. These
assemblies were chosen using the best kmer, judged by largest contig
lengths, at varying coverage cutoffs. The contigs of the Velvet
assemblies and the ALLPATHS assembly were merged and unique sequences
selected.

Avg GC Content: 45.37 +/- 5.25%

Largest Contig: 72.0 KB

Main genome scaffold total: 185
Main genome contig total:   185
Main genome scaffold sequence total: 863.7 KB
Main genome contig sequence total:   863.7 KB (->  0.0% gap)
Main genome scaffold N/L50: 16/17.0 KB
Main genome contig N/L50:   16/17.0 KB
Number of scaffolds > 50 KB: 1
% main genome in scaffolds > 50 KB:  8.3%

 Minimum    Number    Number     Total        Total     Scaffold
Scaffold      of        of      Scaffold      Contig     Contig
 Length   Scaffolds  Contigs     Length       Length    Coverage
--------  ---------  -------  -----------  -----------  --------
    All       185        185      863,677      863,677   100.00%
   1 kb        96         96      805,767      805,767   100.00%
 2.5 kb        58         58      749,546      749,546   100.00%
   5 kb        41         41      691,123      691,123   100.00%
  10 kb        27         27      583,491      583,491   100.00%
  25 kb         5          5      215,289      215,289   100.00%
  50 kb         1          1       72,034       72,034   100.00%
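
The "N/L50" lines in these summaries pair a sequence count with a length (for example 16/17.0 KB); many other tools attach the N50 and L50 labels the other way around. The sketch below shows one common way to derive such a pair from a list of sequence lengths. The report does not describe the exact rule the pipeline uses, and the example lengths are hypothetical, not taken from this assembly.

```python
def n_l50(lengths):
    """Return (count, length) in the report's 'N/L50' order.

    count  = fewest sequences, taken longest first, whose lengths sum to at
             least half of the total;
    length = length of the last sequence added to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return count, length
    raise ValueError("no sequence lengths given")


# Hypothetical contig lengths in bp, for illustration only.
example_lengths = [50_000, 40_000, 30_000, 20_000, 10_000]
print(n_l50(example_lengths))
```

With these example lengths the two longest sequences already cover half of the 150,000 bp total, so the function returns a count of 2 and a length of 40,000.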

5) KEY PIPELINE CMDS:

a) Contamination removal step:
Bwa version: 0.5.9-r16
Bwa aln params: 
Bwa sampe params: -A -P -s

b) Artifact removal step:
duk params: -k 22 -s 1 -c 1

c) Normalization step:
kmernorm params: -k 21 -t 15 -c 2

d) Velvet assembly step for creating simulated read pairs:
Velvet version: 1.1.04
Velvet optimizer version: 2.1.7
Velvet optimizer params: --v --s 51 --e 71 --i 4 --t 1 --f "-shortPaired -fastq $FASTQ" --o "-ins_length 250 -min_contig_lgth 500"

e) Simulated read pairing creation step:
Wgsim version: 0.3.0
Wgsim params: -e 0 -1 100 -2 100 -r 0 -R 0 -X 0

f) ALLPATHS assembly step:
ALLPATHS version: r41043
Contents of in_libs.csv:
library_name, project_name, organism_name, type, paired, frag_size, frag_stddev, insert_size, insert_stddev, read_orientation, genomic_start, genomic_end
STD_1,project,assembly,fragment,1,200,35,,,inward,0,0
SIMREADS,project,assembly,jumping,1,,,3000,300,inward,0,0
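
The in_libs.csv contents above can be read with a standard CSV parser. The sketch below is an illustration only: it embeds the three lines shown in the report and uses Python's csv module. Note that the header line has a space after each comma while the data lines do not, so the reader is told to skip leading spaces.

```python
import csv
import io

# The three lines of in_libs.csv exactly as listed in the report.
IN_LIBS = """\
library_name, project_name, organism_name, type, paired, frag_size, frag_stddev, insert_size, insert_stddev, read_orientation, genomic_start, genomic_end
STD_1,project,assembly,fragment,1,200,35,,,inward,0,0
SIMREADS,project,assembly,jumping,1,,,3000,300,inward,0,0
"""

# skipinitialspace=True strips the space that follows each comma in the header.
libs = list(csv.DictReader(io.StringIO(IN_LIBS), skipinitialspace=True))

for lib in libs:
    # Fragment libraries fill frag_size/frag_stddev; jumping libraries fill
    # insert_size/insert_stddev. The unused pair is left empty.
    size = lib["frag_size"] or lib["insert_size"]
    stddev = lib["frag_stddev"] or lib["insert_stddev"]
    print(f"{lib['library_name']}: {lib['type']}, {size} +/- {stddev} bp, "
          f"{lib['read_orientation']}")
```

The STD_1 row describes the sequenced fragment library, and the SIMREADS row describes the simulated pairs as a jumping library with a 3000 bp insert size.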

g) Velvet assembly step (varying kmer and coverage cutoffs):
Velvet version: 1.1.04
Best Kmer: 61
Best cutoff coverages:
1
5
auto

6) WORKFLOW STEPS:
1. Removed contamination (human contaminants).
2. Removed Illumina artifacts (synthetic oligos used in the laboratory).
3. Normalized read coverage.
4. Created Velvet assembly of the contam+artifact+normalized filtered data.
5. Created simulated 1-3 kb read pairs using Velvet contigs from step 4.
6. Created ALLPATHS assembly using Velvet simulated read pairs (step 5) and the contam+artifact+normalized filtered data.
7. Ran multiple Velvet assemblies at different kmer and coverage windows using the contam+artifact+normalized filtered data. Merged Velvet contigs with ALLPATHS contigs from step 6 and selected only unique sequences.
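
Step 5 can be pictured with a small stand-alone sketch. This is not the wgsim program the pipeline ran, and wgsim's own sampling may differ. It only illustrates the idea of cutting error-free, inward-facing 100 bp read pairs from a contig, using the read lengths from the Wgsim params (-1 100 -2 100, with -e 0) and the insert size and standard deviation from the SIMREADS row of in_libs.csv (3000 and 300). The function names, the seed, and the random example contig are all hypothetical.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_pairs(contig, n_pairs, read_len=100,
                   insert_mean=3000, insert_sd=300, seed=0):
    """Draw error-free, inward-facing read pairs from one contig.

    Returns a list of (start, insert, read1, read2). Draws whose insert size
    does not fit the contig are skipped, so fewer than n_pairs may come back.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        insert = round(rng.gauss(insert_mean, insert_sd))
        if insert < read_len or insert > len(contig):
            continue
        start = rng.randrange(len(contig) - insert + 1)
        fragment = contig[start:start + insert]
        read1 = fragment[:read_len]                       # forward strand
        read2 = reverse_complement(fragment[-read_len:])  # opposite strand
        pairs.append((start, insert, read1, read2))
    return pairs


# Hypothetical 5 kb contig made of random bases, for illustration only.
base_rng = random.Random(42)
contig = "".join(base_rng.choice("ACGT") for _ in range(5000))
pairs = simulate_pairs(contig, n_pairs=10)
for start, insert, read1, read2 in pairs[:3]:
    print(start, insert, read1[:20], read2[:20])
```

Pairs made this way span distances that the short-insert fragment library cannot, which is presumably why step 6 supplies them to ALLPATHS as a jumping library alongside the filtered reads.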

7) RELEASE DATE:
Mon May  7 10:44:12 PDT 2012 By Alexander Spunde- [email protected]

8) AUTHORS:
For additional information, please contact:
Stephan Trong - [email protected]
James Han - [email protected]

This file was automatically generated by the jigsaw pipeline software (version 2.0.1).

Funding
The work conducted by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Groups

This portal belongs to the following groups

##	Name	Type
1	Chloroflexi