Typically, this page will provide you with information about the genome you are viewing including the project status, assembly details, a list of collaborators, and the funding sources.
NOTE: You must register in order to annotate!
Annotators
If you are a registered annotator, you can log in to annotate by the various methods listed below. The portal will prompt you to log in if you have not already done so, after which you can view, modify, and enter additional information about the transcript and protein.
An up-to-date list of annotators is generally kept at a community-maintained site. The list includes each annotator's name, affiliation, and contact information, the specific area of Organism biology that they represent, and the gene families or pathways that they intend to cover. Please contact the Principal Collaborator (contact information is maintained on the Organism portal) to obtain the list.
The Joint Genome Institute generated gene models for protein-coding genes using several algorithms. To cut down on redundancy, we filter the models based on homology (percent identity, percent coverage, and completeness of the model) and EST support to create a virtual track, called FilteredModels, which represents those choices. Models initially chosen for the FilteredModels track are automatically entered into the gene catalog, which is represented on the browser as the GeneCatalog track. This GeneCatalog track will eventually constitute our reference list of gene models, which will be submitted to GenBank. The annotation process consists of deciding whether the chosen model in the GeneCatalog track is correct and, if not, removing it from the track and promoting an alternative gene model or creating a new one. Once manual curation starts, the curators search for gene models of interest. You are in charge of curating various annotation fields, as described below, and recording interesting findings that describe the genome biology of Organism.
Manual curation involves the following activities:
Where should I start?
How do I find a gene of interest?
There are several different entry points into the portal to find your gene of interest.
More detailed information for these tools can be found in the JGI portal Help! pages.
How do I know the selected gene model is good or not?
Compare the selected model with:
Pick the best available model. If it is different from the one in the GeneCatalog, replace it by changing the disposition on the annotation page. If no available models are good enough, you can create your own in the user track using the Track Editor tool.
Ideas on how to discover bad gene models and what to fix
To begin, prioritize your annotation effort. Gene models should be addressed in order of declining interest, based on their possible functions in biological processes and on the amount of supporting data. For example, protein features on the genome scaffolds fall into three groups: (1) features with good support, including some mix of computational predictions, tBlastn matches to selected proteins, gapped (Blat) EST matches, or Blastx matches; (2) unmatched protein features, which have protein support but lack sequence similarity to an annotated protein; and (3) weak features, which lack protein support but can have numerous good EST matches.
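The three support groups above can be turned into a simple ordering of a work list. The following Python sketch is only an illustration: the `FeatureEvidence` record, its field names, and the tie-breaking by EST count are assumptions made for this example and are not part of the JGI portal.

```python
from dataclasses import dataclass


@dataclass
class FeatureEvidence:
    """Hypothetical summary of the evidence for one protein feature."""
    name: str
    has_protein_support: bool        # assumed flag: any protein-level evidence
    matches_annotated_protein: bool  # assumed flag: similarity to an annotated protein
    est_matches: int                 # number of good EST matches


def support_group(feature):
    """Return 1 (good support), 2 (unmatched) or 3 (weak), as grouped above."""
    if feature.has_protein_support and feature.matches_annotated_protein:
        return 1
    if feature.has_protein_support:
        return 2
    return 3


def prioritize(features):
    """Order features by support group, then by descending EST count."""
    return sorted(features, key=lambda f: (support_group(f), -f.est_matches))


worklist = prioritize([
    FeatureEvidence("model_c", False, False, 12),
    FeatureEvidence("model_a", True, True, 3),
    FeatureEvidence("model_b", True, False, 0),
])
print([f.name for f in worklist])
```

Biological interest, the other criterion named above, is a judgment call and is deliberately left out of this sketch.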
Next, determine the best available track in the genome browser. For a protein feature with sequence similarity matches to other proteomes, this process consists of identifying the best Blastx match to the scaffold sequence or the best Blastp match to a predicted protein sequence. The best track is used to determine or modify the boundaries (coordinates) of the gene feature, which in turn makes it necessary to re-determine the best track in an iterative process (probe reversal). If there is a reasonable homologue to the gene feature in a tandem or near-tandem position, this is very likely the true match to the Organism gene sequence under investigation.
Using the best tracks as guides, modify the boundaries of the FilteredModel.
(1) [Should the gene model be merged with, or split from, another locus?]
A note on pseudogenes
It is far easier to identify a pseudogene than to demonstrate that a locus is a functional gene. Real genes are often outnumbered by their pseudogenes. However, a very recent pseudogene may be hard to distinguish from a working gene; there are cases of long ORFs with an ATG start and a single stop codon that are frequently transcribed yet have defective translation initiation.
Pseudogenes of moderate age are usually easily recognizable. They have accumulated many defects across the coding sequence but not so many that homology to the parent gene is blurred. A single internal stop codon is not completely persuasive as there are cases of mRNA editing. Plus, draft genome sequences do contain errors.
Older pseudogenes may have drifted to the point of marginal recognizability. Determining a good parental sequence and event boundaries becomes problematic. Nonetheless, ancient pseudogenes can still establish whether a given protein domain was present at the time of formation – these are still identifiable long after point mutations have largely obliterated alignments. (Absence of a domain might also mean retro-insertion of an alternatively spliced mRNA.)
How do I replace the model in GeneCatalog?
Use the "disposition" field on the annotation page to replace models in the GeneCatalog. You can enter:
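* "catalog" if the model is correct
* "tentative" if you are unsure but it is still the best available model
* "delete" if this is not a gene or you want to replace it with a model from another track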
Please do not forget to "delete" the incorrect models, otherwise they will appear concurrently with the correct ones. In some cases, you may want to leave several concurrent gene models in the catalog at the same locus if you cannot decide which is best or in the case of alternative splicing.
NOTE: The "delete" option does not delete the model or its annotation from the database. It simply removes it from the Catalog track.
How do I create a new/better model?
First, expand all model tracks and evaluate all models at the same locus. It is likely that a better model was already generated by one of the gene predictors but, for some reason, was not promoted to the GeneCatalog. If none of the models is good, create your own in the user track using the user-friendly interface described at http://genome.jgi-psf.org/help/track_editor.jsf
The Track Editor tool allows you to:
Once editing is finished, protein analysis will automatically be run on the newly edited model only if the model has been released. Releasing a model does not automatically add it to the GeneCatalog. You must go to the annotation page (or model web page) and set the disposition to "catalog". The automatically catalogued gene model will remain in the gene catalog at that locus unless it is manually deleted.
How do I access the annotation page?
There are multiple ways of accessing annotation pages:
1) Through the gene model's Protein page
2) Via Advanced Searching directly against annotations
3) Through the GO/KEGG/KOG functional tools
How do I annotate a gene?
Once you are on the annotation page, you must log in with your individual annotator username and password to be able to edit the fields listed on that page (to get a username and password, you must be a registered annotator).
The "transcript" page offers several fields that can be filled in individually by clicking on the add button (or edit, if you have already entered annotation for this gene). When completing these fields, bear in mind that your annotation will eventually be translated into a Gene entry in NCBI. Click here for an example of an NCBI Gene entry.
Gene name – This is equivalent to the GenBank Gene Symbol. Gene name nomenclature is organism specific, so for internal consistency refer to your community's standards. Good examples of a well-defined naming convention are Daphnia and Chlamydomonas. Please do not deviate from the gene naming rules. If there are concerns, please send your comments to the Principal Collaborator by email. To avoid duplicating gene names, please check the list of roots. In case of conflicting names, as in every case where you have a question about annotation entered by someone else, we encourage you to communicate directly with the other annotators, while copying (cc) the Principal Collaborator. Do not assign a name if the gene function is unknown. If no names are assigned to a gene, a numerical identifier (locus_id) will be assigned to every gene at the time of GenBank submission (e.g. GENSP_12300).
Defline – Unlike the gene name, this field is mandatory. It will be part of the field "Name" in the "General protein information" "Product/Function" section of the Gene entry, and will accompany the sequence in the Fasta format, right after the identifier, at all databases including NCBI. It should be a short (<85 characters), precise description of the gene and gene product and, if possible, should include the gene's main function(s). It should include the full standard name of the protein, but no acronyms, EC numbers, or species names, all of which can be listed in the description field. Very often, the defline of a related entry in Swiss-Prot can be used as is.
To avoid duplicating synonym names, you can use the "Advanced Search" page (type for example: "FBP*" before calling a Frataxin-Binding Protein FBP1). In case of conflicting names, as in every case where you have a question regarding annotation entered by someone else, we encourage you to communicate directly with the other annotators.
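A similar wildcard check can be run locally against a list of names you have already collected. The sketch below uses Python's standard `fnmatch` module; the `existing_names` list is made up for illustration, and the portal's own Advanced Search remains the authoritative check.

```python
from fnmatch import fnmatchcase


def names_matching(pattern, names):
    """Return the names that match a shell-style wildcard pattern such as 'FBP*'."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical list of names already in use.
existing_names = ["FBA1", "FBP1", "FBP2", "FTR1"]
candidate = "FBP1"

clashes = names_matching("FBP*", existing_names)
if candidate in clashes:
    print(f"{candidate} is already in use; existing FBP names: {clashes}")
```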
Defline rules and examples
No formatting (italics, bold face, underline) is possible in Defline entries. The species and gene name (e.g. Genus species GENSP_010204) will be added automatically to create the final defline, so please do not enter these designations.
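Some of the defline rules given in this guide can be checked mechanically before an entry is pasted into the portal. The Python sketch below is a hedged illustration, not a JGI tool: the function name and the regular expressions are assumptions, and it covers only the rules that are easy to test (mandatory, shorter than 85 characters, no EC numbers, no markup). Acronyms and species names are not checked because they would need curated word lists.

```python
import re

MAX_DEFLINE_LENGTH = 85  # the guide asks for fewer than 85 characters

# Assumed patterns: "EC 3.1.3.11"-style numbers and HTML-like formatting tags.
EC_NUMBER = re.compile(r"\bEC[\s:]*\d+(?:\.(?:\d+|-)){1,3}")
MARKUP_TAG = re.compile(r"</?\w+[^>]*>")


def check_defline(defline):
    """Return a list of problems found in a defline (empty list if none)."""
    text = defline.strip()
    if not text:
        return ["defline is mandatory"]
    problems = []
    if len(text) >= MAX_DEFLINE_LENGTH:
        problems.append(f"defline has {len(text)} characters; keep it under {MAX_DEFLINE_LENGTH}")
    if EC_NUMBER.search(text):
        problems.append("move the EC number to the description field")
    if MARKUP_TAG.search(text):
        problems.append("formatting markup is not possible in deflines")
    return problems


for entry in ["Frataxin-binding protein", "Fructose-1,6-bisphosphatase (EC 3.1.3.11)"]:
    print(entry, "->", check_defline(entry) or "ok")
```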
Gene Descriptions, Model notes, Dispositions and examples
Description can be as detailed as needed, provided that the information is accurate and useful to researchers not familiar with the type of protein. Include information about the functions of the protein, its domains, splicing variants, interactions or subcellular location, comments about its phylogenetic origin, relationship to paralogs and orthologs, clustering with genes of related function, or overlap with neighboring genes, etc. With proper caution, you can input your
Example of a gene description:
Literature as bibliographic references documenting the gene can also be placed here. Rather than whole reference, use the Pubmed identifiers (PMID) directly copied from the bottom of the Entrez page. For example: PMID: 12185496 . This will appear as a clickable link on the JGI transcript page, and later in the Bibliography section of the Gene entry.
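If you keep draft descriptions in a text file, the PMID references can be pulled out for checking before you paste them into the portal. This is a minimal sketch; the pattern follows the `PMID: 12185496` example above, and the function name is an assumption of this example.

```python
import re

# Matches "PMID: 12185496" as well as "PMID:12185496".
PMID_PATTERN = re.compile(r"\bPMID:\s*(\d+)")


def extract_pmids(description):
    """Return PubMed identifiers in order of first appearance, without duplicates."""
    seen = []
    for pmid in PMID_PATTERN.findall(description):
        if pmid not in seen:
            seen.append(pmid)
    return seen


text = "Reference: PMID: 12185496. Repeated reference: PMID:12185496."
print(extract_pmids(text))
```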
Model notes are not indexed for searching, but are useful for any detailed analysis. If the model appears correct, choose "no issue" from the pull-down menu. Otherwise, you can either change the model (see above), or simply place your comments here, for future use by yourself or others. Indicate whether you suspect misassembly, e.g. "C-terminus probably represented by xxx-xxx on scaffold zzz". State how the model needs to be modified, e.g. "6th intron not supported by EST data", "probably misses an N-terminal extension of 120-150 residues", or "should be fused with upstream gene model", etc. You can also place here a FASTA (<80 character lines) of your version of the transcript or predicted protein. In case of splicing variants / alternative transcription starts, you should generate a new model AND describe variants in the model notes (if biologically significant, enter also in the description). As usual, be concise and precise.
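To keep a pasted sequence within the line-length limit mentioned above, the sequence can be wrapped before it is copied into the model notes. A minimal Python sketch follows; the function name, the example identifier, and the default width of 60 are illustrative choices. The only requirement taken from this guide is that lines be shorter than 80 characters.

```python
def to_fasta(identifier, sequence, width=60):
    """Format a sequence as FASTA text with sequence lines shorter than 80 characters."""
    if not 0 < width < 80:
        raise ValueError("width must be between 1 and 79")
    # Remove any whitespace or line breaks already present in the sequence.
    cleaned = "".join(sequence.split())
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return "\n".join([">" + identifier] + lines)


# Hypothetical identifier and sequence, for illustration only.
print(to_fasta("my_edited_model", "MSTA" * 40))
```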
The Disposition field is used to decide whether the gene model should appear in the Catalog. You can enter "catalog" if the model is correct, "tentative" if you are unsure but it is still the best available model, or "delete" if this is not a gene or you want to replace it with a model from another track. Do not forget to "delete" the erroneous models, otherwise they will appear concurrently with the correct ones. In some cases, you may want to leave several concurrent gene models in the catalog, for example if you cannot decide which is best, or in the case of alternative splicing.
NOTE: "delete" does not delete the model or its annotation; it just removes it from the Catalog track.
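The effect of the three disposition values on the Catalog track can be pictured with a small model. This Python sketch is not the portal's implementation: the set-based catalog, the function name, the model identifiers, and the assumption that a "tentative" model stays in the catalog are illustrative only.

```python
VALID_DISPOSITIONS = {"catalog", "tentative", "delete"}


def apply_disposition(catalog_track, model_id, disposition):
    """Return a new set of model IDs shown in the Catalog track.

    "delete" only removes the model from the track; nothing here touches
    the model itself or its annotation.
    """
    if disposition not in VALID_DISPOSITIONS:
        raise ValueError(f"unknown disposition: {disposition!r}")
    updated = set(catalog_track)
    if disposition == "delete":
        updated.discard(model_id)
    else:
        # Assumption: "catalog" and "tentative" both keep the model in the track.
        updated.add(model_id)
    return updated


# Replace an incorrect model with a better one at the same locus.
track = {"auto_model_1"}
track = apply_disposition(track, "user_model_7", "catalog")
track = apply_disposition(track, "auto_model_1", "delete")
print(sorted(track))
```

Skipping the second call would leave both models in the track, which is the concurrent display the text warns about.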
EST evidence: If, in your opinion, the gene model is supported by available ESTs, please choose "Yes" in this field.
How do I keep track of curated models?
Search page: If you have already worked with a gene in the Portal and know its model name or its protein or transcript identifier, you can enter it here.
Make a note of your most interesting or bizarre findings, as they may be useful when it comes to writing your manuscript and may be selected for inclusion in the main genome paper. Examples include a new gene family, an unusual domain structure, evidence for horizontal gene transfer, an unsuspected metabolic pathway, functional gene clustering, the shortest exon, alternatively spliced variants with biological significance, overlapping genes, or a gene in an intron: whatever deserves attention!
Keep a list of interesting gene model names, protein page URLs or links created for the browser. Using the following link, you can access all models you have curated so far:
The functional annotations of genes are based on the Gene Ontology and KOG classifications. Please refer to http://www.geneontology.org and http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=12969510&dopt=Citation for details. When we transfer the annotated genome sequence to NCBI in December, the information will be taken automatically from the Automatic Ontology field unless you fill in this field. Enter data here only if the automatic assignment has failed (false positives are not infrequent). Although this categorization may seem crude or even pointless at times, it is our only tool to automatically group genes by function, pathway, etc. So please consider the functional annotation an important part of your effort to make the genome a useful tool in the long run.
NOTE: No online tool is available for annotating RNA genes at the moment.
A large body of automatic annotation is already available on the protein and transcript pages. Carefully read the JGI Help! pages. They give an excellent overview of the many tools available to identify the function of a gene. The Search page allows you to search all models (it searches the annotation of the protein hits). It also allows you to search the items in the various alignment tracks and the gene models by name. The Advanced Search page searches only the Catalog and Filtered Model tracks. Fields searched include the automatic annotation and model name, plus the user-entered description, defline, and gene name. Use the GO, KEGG, and KOG pages for lists of genes that have been automatically assigned to a pathway or function.