Theobroma cacao public gene set pub3i
08 March 2012, D. Gilbert, gilbertd at indiana edu

Gene class counts
  21806 Class:Strong     : >= 66% expression/homology evidence
   5101 Class:Medium     : >= 33% expression/homology evidence
   2691 Class:Weak       : >= 5% evidence, worth considering if more evidence turns up
  13035 Class:Transposon : >= 33% transposon and no/weak expression
   3465 Class:Poor       : mixed bag of partial models
    550 Class:None       : no evidence
  14977 Class:AltStrong  : alternate transcripts from EST/rna assemblies
     20 Class:AltMedium

pub3i.good.ids main=29408 alts=14996
  include Class:(Strong|Medium|Weak) and Alt transcripts
  First transcript ID ends with 't1' but isn't always the best of alternates.

pub3i.good is corrected from pub3h,pub3g for CDS-exon errors:
  off-by-1, missing strand, partly mangled proteins, from
  transcript-gene-assembly software weak on CDS/protein methods.

  pub3h>3i: nupdate=570, ndrop=108
     34 changemrna (CDS/exon/protein changes)
     32 renamelocus ; 8 renamealt
    491 newgoodlocus ; 5 other = newgoodlocus (shifted from notgood to good subset)
     73 dropoverlaplocus; 21 droplocus ; 14 other = drop

  pub3g>3h: 2826 updated transcripts:
    785 with CDS exon changes (333 main, 452 alts), 400 altered proteins,
    1641 strand additions, 41 dropped records.

  CDS sequences in good set translate to protein sequences.
  There are CDS mismatches in non-good set.

------------------------------------------------------------
The cacao mitochondrial genome and associated genes, M16_mito_v1.0,
have been withdrawn from public use for now (6 Dec 2020).
This includes 214 genes mostly of Class Strong (112) or Medium,
about 40 are 1-1 orthologs to other tested plant gene sets.
The IDs for these are in cacao11genes_pub3g.mitoremoved.ids
------------------------------------------------------------

Gene data files:
  cacao11genes_pub3i.aa                  protein fasta
  cacao11genes_pub3i.cds                 coding dna transcript dna
  cacao11genes_pub3i.attr.txt            gene annotation table (tabbed)
  cacao11genes_pub3i.gff                 gene location/annotation format
  cacao11genes_pub3i.good.ids            IDs of Class:Strong|Medium|Weak (Alt included)
  cacao11genes_pub3i.good.{aa,tr,cds}    fasta subset of Class:Strong|Medium|Weak

Annotation fields in gene.attr.txt. Same values are in mRNA lines of gene GFF.

  transcriptID  Thecc1EG000002t1               Thecc1EG000005t1
  geneID        Thecc1EG000002                 Thecc1EG000005
  isoform       1                              1
  quality1      Class:Strong                   Class:Strong
  quality2      Express:Strong                 Express:Strong
  quality3      Homology:OrthologStrong        Homology:OrthologStrong
  quality4      Intron:Strong                  Intron:Strong
  quality5      Protein:complete               Protein:complete
  aaSize        205                            1269
  cdsSize1      62%                            77%
  cdsSize2      618/977                        3810/4930
  Name1         Cystathionine beta-synthase..  Kinesin-like calmodulin-binding..
  Name2         82%T                           74%T
  oname1        Uncharacterized protein        Uncharacterized protein
  oname2        87%U                           77%U
  groupname     Cystathionine beta-synthase    Kinesin-like calmodulin-binding..
  Dbxref1       TAIR:AT5G10860.1               TAIR:AT5G65930.2
  Dbxref2       82%                            74%
  ortholog1     frave:gene01181                ricco:29682.m000589
  ortholog2     87%                            83%
  paralog1      Thecc1EG034062t1               Thecc1EG000957t1
  paralog2      51%                            12%
  uniprot1      UniRef50_B9I794                UniRef50_B9GJK9
  uniprot2      87%                            77%
  genegroup1    PLA9_G6641                     PLA9_G3639
  genegroup2    1/11/9                         1/13/9
  cacaoGD09     CGD0000016/C99.77              na
  cacaoTCR1     na                             Tc01_t000060/C99.83
  intron1       100%                           100%
  intron2       10/10                          46/46
  express1      94%                            82%
  express2      75                             99
  estgroup      LeafPistil                     LeafPistil
  location      scaffold_1:7897-10405:+        scaffold_1:17413-27097:+
  oid           rna8b:r8L_g13025t00001         mar7g.mar11f:AUGepir7p1s1g7t1
  score         7946                           40120

Guide to cacao Evigene annotation table columns and GFF mRNA attributes:
  transcriptID (ID in gff mRNA)
  geneID       (gene in gff mRNA, is Parent= to mRNA)
  isoform      : alternate transcript number if > 1, matches ID suffix (t2,t3...)
  quality      : evidence quality values for Expression Homology Intron Protein
  aaSize       : protein aa length
  cdsSize      : percent of transcript, cds length / transcript length
  Name         : homology-derived gene name, P:Plant9 family, U:UniProt or T:TAIR,
                 with percent align (88%P, 62%T, 74%U)
  oname        : other name (from next best classifier above)
  Dbxref       : cross reference gene IDs to TAIR, UniProt
  express      : expressed span as percent of transcript
  estgroup     : has significant expression from tissue groups Leaf,Pistil and/or Bean
  ortholog     : protein orthology percent identity, and protein IDs
  paralog      : protein paralogy percent identity, and gene ID
  genegroup    : gene family ID from Orthomcl grouping of 9 plants
  genegroup2   : 1/11/9 found 1 cacao gene / 11 plant genes / 9 plant species (of 9 max)
  cacaoGD09    : equivalent Cacao CGD (Mars v0.9) gene
  cacaoTCR1    : equivalent Cacao Tc (Cirad v1.0) gene
  intron       : evidence intron splices matched (10/10 for 5 matched introns)
  location     : genome location
  oid          : original model ID
  score        : evidence score sum
  scorevec     : evidence score vector

Quality notes: Values are generally Strong/Medium/Weak/None
  Class:    gene quality class as sum of evidence parts; Transposon, Poor special classes
  Express:  Strong/Medium/Weak for percent of transcript with expression
  Homology: Ortholog if best match is other species, Paralog for this species
  Protein:  complete or partial
  Intron:   Strong/Medium/Weak depending on % and total of splice sites matched
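
Example: reading a few annotation values

The sketch below is not part of the gene set. It only illustrates how some of
the field values shown above (the Class quality value, the location string,
and count fields such as intron 10/10 or genegroup 1/11/9) might be picked
apart in Python. The function names are hypothetical, and it assumes that
alternate transcripts carry Class:AltStrong or Class:AltMedium in the table,
as in the class counts; how sub-values are delimited inside one column of
gene.attr.txt is not assumed here.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Classes that make up the "good" subset, per the notes above (Alt included).
GOOD_CLASSES = {"Strong", "Medium", "Weak"}


class Location(NamedTuple):
    scaffold: str
    start: int
    end: int
    strand: Optional[str]  # "+", "-", "." or None when no strand is given


def is_good_class(quality_class: str) -> bool:
    """True for Class:Strong|Medium|Weak, with or without an Alt prefix.

    Assumption: the value looks like 'Class:Strong' or 'Class:AltStrong'.
    """
    name = quality_class.split(":", 1)[-1]
    if name.startswith("Alt"):
        name = name[len("Alt"):]
    return name in GOOD_CLASSES


def parse_location(text: str) -> Location:
    """Split a value such as 'scaffold_1:7897-10405:+' into its parts."""
    match = re.fullmatch(r"([^:]+):(\d+)-(\d+)(?::([+.\-]))?", text)
    if match is None:
        raise ValueError(f"unrecognized location: {text!r}")
    scaffold, start, end, strand = match.groups()
    return Location(scaffold, int(start), int(end), strand)


def parse_counts(text: str) -> Tuple[int, ...]:
    """Split slash-separated counts such as '10/10' or '1/11/9'."""
    return tuple(int(part) for part in text.split("/"))


if __name__ == "__main__":
    loc = parse_location("scaffold_1:7897-10405:+")
    print(loc.scaffold, loc.start, loc.end, loc.strand)
    print(is_good_class("Class:Strong"), is_good_class("Class:Transposon"))
    matched, total = parse_counts("10/10")
    # Each intron has two splice sites, so 10 matched sites are 5 introns.
    print(matched // 2, "introns matched of", total // 2)
```

The strand part of the location is treated as optional because earlier
releases had records with missing strand; whether any such records remain
in pub3i is not stated above.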