The Thiamine diphosphate dependent Enzyme Engineering Database: A tool for the systematic analysis of sequence and structure relations
© Widmann et al; licensee BioMed Central Ltd. 2010
Received: 16 October 2009
Accepted: 1 February 2010
Published: 1 February 2010
Thiamine diphosphate (ThDP)-dependent enzymes form a vast and diverse class of proteins, catalyzing a wide variety of enzymatic reactions including the formation or cleavage of carbon-sulfur, carbon-oxygen, carbon-nitrogen, and especially carbon-carbon bonds. Although very diverse in sequence and domain organisation, they share two common protein domains, the pyrophosphate (PP) and the pyrimidine (PYR) domain. For the comprehensive and systematic comparison of protein sequences and structures the Thiamine diphosphate (ThDP)-dependent Enzyme Engineering Database (TEED) was established.
The TEED (http://www.teed.uni-stuttgart.de) contains 12048 sequence entries, which were assigned to 9443 different proteins, and 379 structure entries. Proteins were assigned to 8 different superfamilies and 63 homologous protein families. For each family, the TEED offers multisequence alignments, phylogenetic trees, and family-specific HMM profiles. The conserved pyrophosphate (PP) and pyrimidine (PYR) domains have been annotated, which allows the analysis of sequence similarities for a broad variety of proteins. Human ThDP-dependent enzymes are known to be involved in many diseases. Twenty different proteins and over 40 single nucleotide polymorphisms (SNPs) of human ThDP-dependent enzymes were identified in the TEED.
The online accessible version of the TEED has been designed to serve as a navigation and analysis tool for the large and diverse family of ThDP-dependent enzymes.
Since the discovery of the first thiamine diphosphate (ThDP)-dependent enzyme in 1937, many such enzymes have been described and their catalytic mechanism has been analysed intensively [1–3]. ThDP-dependent enzymes catalyze a wide variety of enzymatic reactions and have therefore been assigned to the families of oxidoreductases, transferases, or lyases. The formation or cleavage of carbon-sulfur, carbon-oxygen, carbon-nitrogen, and especially carbon-carbon bonds is of utmost interest for bioorganic synthesis and organocatalysis [5, 6]. Because of their ability to form asymmetric C-C bonds, ThDP-dependent enzymes are versatile catalysts for a variety of biotransformations [7–12]. In addition, the ThDP-dependent enzyme family has been shown to possess a wide substrate spectrum ranging from small compounds such as formaldehyde to bulky hydroxyl-phytanoyl-CoA molecules [13, 14]. For pharmacology, ThDP-dependent enzymes of human origin are of special interest. They have been identified as being involved in a variety of diseases such as Alzheimer's disease and diabetes, and also play a role in tumor proliferation. Their highly diverse substrate specificity and catalytic activity are reflected in their sequence and structure, which differ significantly between different families of ThDP-dependent enzymes. During the course of evolution, shuffling, rearrangement, and fusion of domains, as well as mutations and gene duplications, have led to the enormous diversity of ThDP-dependent enzymes [17, 18]. However, all ThDP-dependent enzymes contain at least two conserved domains, the pyrophosphate (PP) and the pyrimidine (PYR) domain, which have a similar structure and are essential for binding and activating ThDP. The PYR domain has a conserved catalytic glutamic acid while the PP domain contains a conserved GDX25-30N motif [17, 20–22].
In addition to these two domains, further domains were found, such as the transhydrogenase dIII domain (TH3) and the transketolase C-terminal domain (TKC) [17, 18, 23]. These additional domains are often not well characterised, and in many cases their function in the catalytic process remains obscure. A unified classification scheme for ThDP-dependent enzymes based on a comprehensive analysis of sequence and structure does not yet exist. Based on a structural comparison, it was suggested that a total of 4 families should be sufficient to describe ThDP-dependent enzymes: DC (decarboxylases), TK (transketolases), OR (oxidoreductases), and KD (2-ketoacid dehydrogenase). A sequence-based evolutionary analysis suggested at least 6 different families, namely TK (transketolases)-like, PFRD (pyruvate ferredoxin reductase), 2OXO (2-oxoisovalerate dehydrogenase)-like, PDC (pyruvate decarboxylase)-like, SPDC (sulfopyruvate decarboxylase), and PPDC (phosphopyruvate decarboxylase).
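A conserved sequence motif of this kind can be located with a simple pattern search. The following is a minimal sketch, not the annotation procedure used for the TEED. It assumes that GDX25-30N reads as a glycine and an aspartate, followed by 25 to 30 arbitrary residues, followed by an asparagine. The function name find_pp_motif and the toy sequence are hypothetical.

```python
import re

# Assumed reading of GDX25-30N: G, D, then 25-30 arbitrary residues, then N.
# The pattern is wrapped in a lookahead so that a candidate is reported for
# every start position, even when candidates overlap.
PP_MOTIF = re.compile(r"(?=(GD[A-Z]{25,30}N))")

def find_pp_motif(sequence):
    """Return (0-based start, matched text) for each candidate motif."""
    return [(m.start(), m.group(1)) for m in PP_MOTIF.finditer(sequence.upper())]

# Toy sequence (hypothetical, not a real enzyme): GD + 26 alanines + N
toy = "MK" + "GD" + "A" * 26 + "N" + "LL"
hits = find_pp_motif(toy)
for start, text in hits:
    print(start, len(text))
```

A match only marks a candidate: a pattern this permissive can also occur by chance, so hits would still need to be checked against the annotated PP domain.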
We established the Thiamine diphosphate dependent Enzyme Engineering Database (TEED) as a tool for a comprehensive and systematic comparison of ThDP-dependent enzymes from different protein families and annotated the conserved PP and PYR domains. Thus, the TEED is the first data resource of ThDP-dependent enzymes which combines information on the individual protein families, sequence alignments, and a consistent annotation of the conserved PYR and PP domains.
Construction and content
The Thiamine diphosphate (ThDP)-dependent Enzyme Engineering Database (TEED) was established using the data warehouse system DWARF. The DWARF system is a collection of tools for the automated retrieval of protein sequences and structures from different source databases and their integration into a local data warehouse system. The initial step in the construction of the database was the selection of seed sequences of 62 proteins which represent members of the different ThDP-dependent protein families (Table A1, Additional file 1). Seed sequences were selected based on the enzymatic activity of the protein and the structural arrangement of protein domains. This selection was based on previous work [17, 18] which divided the ThDP-dependent enzymes into different protein families.
Sequence entries with more than 98% sequence identity which shared the same source organism were assigned to the same protein entry. If more than one sequence was assigned to the same protein, the longest sequence was set as the reference sequence. If structural information was available for protein entries, structural monomers were downloaded from the Protein Data Bank and stored as structure entries. Secondary structure information was calculated by DSSP and displayed in the annotated multisequence alignments, which were generated by ClustalW (v1.83) with default parameters. Additional annotations of structurally or functionally relevant residues (active site, disulfide bridges, signal peptide) were extracted from the NCBI entry, and the respective residues were annotated in the TEED. Abbreviations for the established protein families are available in tabular form (Table A1, Additional file 1).
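The assignment rule for sequence entries can be illustrated with a short sketch. It is not the DWARF implementation: the greedy grouping strategy, the function names, and the position-by-position identity measure are assumptions made for illustration. A real pipeline would derive identity from a pairwise alignment.

```python
from collections import defaultdict

def percent_identity(a, b):
    """Identity over the shorter sequence, compared position by position.

    Simplification (assumption): ungapped sequences that start at the same
    residue. A real pipeline would use a pairwise alignment instead.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n

def group_entries(entries, threshold=98.0):
    """Greedily group (id, organism, sequence) tuples into protein entries."""
    by_organism = defaultdict(list)
    for entry in entries:
        by_organism[entry[1]].append(entry)

    proteins = []
    for organism, members in by_organism.items():
        groups = []
        # Longest first, so the first member of each group is its reference.
        for entry in sorted(members, key=lambda e: len(e[2]), reverse=True):
            for group in groups:
                if percent_identity(entry[2], group[0][2]) > threshold:
                    group.append(entry)
                    break
            else:
                groups.append([entry])
        for group in groups:
            proteins.append({
                "organism": organism,
                "reference": group[0][0],
                "members": [e[0] for e in group],
            })
    return proteins

# Toy data (hypothetical identifiers and sequences)
entries = [
    ("a", "Homo sapiens", "M" * 100),
    ("b", "Homo sapiens", "M" * 60),
    ("c", "Escherichia coli", "M" * 100),
    ("d", "Homo sapiens", "K" * 100),
]
proteins = group_entries(entries)
```

In this toy data, entries a and b share one protein entry with a as the reference sequence, while c stays separate despite an identical sequence because its source organism differs.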
Features and functionalities
The online version of the TEED offers pre-calculated multisequence alignments and can be browsed by families, organisms, or structures. Phylogenetic trees are visualized by the program PHYLODENDRON. The PP and PYR domains of each ThDP-dependent protein family were manually annotated. If structural information for a homologous protein family was available, a structural alignment of the available structures was performed using STAMP.
Human ThDP-dependent enzymes
Sixty-six sequence entries from the TEED are of human origin (excluding sequences from crystal structure chains). Due to their medical importance, they were systematically analysed. All human ThDP-dependent enzymes belong to only three superfamilies, the DC, TK, and K2 superfamilies (Table A2, Additional file 2). The 66 sequences belong to 20 different proteins with several isoforms. The transketolase (gi: 205277463), which has the most isoforms (12), is implicated in the latent genetic disease Wernicke-Korsakoff syndrome and has been found to be differentially expressed in the dorsolateral prefrontal cortex of patients with schizophrenia. Another human ThDP-dependent enzyme with many isoforms (7) is the 2-oxoisovalerate dehydrogenase subunit alpha (gi: 548403), also known as branched-chain alpha-keto acid dehydrogenase. This protein is involved in the catabolism of amino acids such as isoleucine, leucine, and valine, and a defect causes the accumulation of these amino acids, which leads to maple syrup urine disease. One third of all sequence entries were labelled as 'putative' or 'unnamed' in GenBank and were assigned to a specific protein or protein family based on sequence similarity (Table A2, Additional file 2). However, because function and substrate specificity can vary considerably even between homologous proteins, the assignment of a biochemical property based on sequence similarity alone should be regarded as putative. All sequence entries were compared to the respective full sequence and were subsequently classified as either fragments or SNPs. Fragments consist of parts of the full sequence but show no exchange of amino acids, while SNPs always show an exchange of amino acids (Table A2, Additional file 2).
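The distinction between fragments and SNPs can be sketched as a comparison against the full sequence. This is an illustrative simplification, not the procedure used for the TEED: the function name classify_entry and the category labels are hypothetical, only contiguous fragments are recognised, and anything that would require an alignment to decide (for example an internal deletion) is returned as 'other'.

```python
def classify_entry(full, entry):
    """Classify a sequence entry relative to the full reference sequence.

    'identical': same sequence as the reference
    'fragment' : contiguous part of the reference, no amino acid exchange
    'variant'  : same length as the reference, at least one exchange
    'other'    : anything else (an alignment would be needed to decide)
    """
    if entry == full:
        return "identical", []
    if entry in full:
        return "fragment", []
    if len(entry) == len(full):
        # Exchanges as (1-based position, reference residue, entry residue).
        exchanges = [
            (i + 1, r, e)
            for i, (r, e) in enumerate(zip(full, entry))
            if r != e
        ]
        return "variant", exchanges
    return "other", []

# Toy sequences (hypothetical, not taken from the TEED)
full = "MKTAYIAKQR"
print(classify_entry(full, "TAYIAK"))      # N- and C-terminally truncated
print(classify_entry(full, "MKTAYLAKQR"))  # single amino acid exchange
```

A fragment that also carries an amino acid exchange falls into 'other' here; handling such cases would require aligning the entry to the full sequence first.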
Utility and discussion
The analysis of the human ThDP-dependent enzymes led to a reliable classification of several previously unclassified proteins and demonstrates the advantage of a highly enriched database of a specific protein family. Because SNPs have been shown to play an important role in tumor development [34, 35], a complete analysis for SNPs was included in the analysis of human ThDP-dependent enzymes. This analysis of SNPs is limited to sequences retrieved from GenBank and thus complements specialised SNP repositories such as dbSNP. Our analysis demonstrates that GenBank annotations are often incomplete and unreliable for the identification of proteins or protein variants. The transketolase (gi: 205277463) includes 12 different isoforms, of which 6 have been designated as protein fragments. Of these, only one sequence (gi: 193787540) shows an internal deletion, suggesting a truly altered protein product. The other 5 isoforms only show truncated N-termini and therefore could be sequencing artefacts of the original protein.
This kind of analysis is not limited to proteins from a specific organism but can be expanded to cover protein superfamilies or specific homologous families. It has been shown previously that a systematic classification of protein families can be used as a reliable framework for systematic analyses of protein families [38, 39] and for the engineering of protein mutants with improved biochemical properties [40, 41]. With the implemented domain annotation, an analysis is not limited to the whole protein sequence but protein families can also be specifically analyzed for differences and conserved features in the PP and PYR domains.
The database can be accessed on the level of sequence, structure, or organism. All protein entries link to the respective NCBI entries. Annotated multiple sequence alignments and phylogenetic trees are provided via the online accessible version of the TEED at http://www.teed.uni-stuttgart.de. For each family, the level of amino acid conservation is calculated by PLOTCON. BLAST searches can be performed against the TEED using a local BLAST interface. Updates for the TEED will be performed regularly using an automated scripting system. For new sequence entries referring to a new structure in the Protein Data Bank (PDB), structure information is updated as well. New sequence and structure entries are assigned to existing homologous families and superfamilies based on their sequence similarity.
The Thiamine diphosphate dependent Enzyme Engineering Database (TEED) has been designed to serve as a navigation and analysis tool for the large and diverse family of ThDP-dependent enzymes. The annotation of the conserved pyrophosphate (PP) and pyrimidine (PYR) domains allows for a direct comparison and analysis of these domains between different families. Thus the TEED is a valuable tool for the study of the protein families of ThDP-dependent enzymes.
Availability and requirements
The Thiamine diphosphate dependent Enzyme Engineering Database (TEED) is accessible online at http://www.teed.uni-stuttgart.de. All information on families, sequence and structure data, as well as alignments and phylogenetic trees, can be accessed by manual download.
List of abbreviations
BLAST: Basic Local Alignment Search Tool
DSSP: Define Secondary Structure of Proteins
DWARF: Data warehouse system for analyzing protein families
HMM: Hidden Markov Model
TEED: Thiamine diphosphate dependent Enzyme Engineering Database
We acknowledge the valuable contribution of Demet Sirim to the development of the domain annotation approach. We also thank Florian Wagner for support in the technical maintenance of the database. This work was supported by the DFG (PL145/6-1).
- Schellenberger A: Sixty years of thiamin diphosphate biochemistry. Biochimica Et Biophysica Acta-Protein Structure and Molecular Enzymology. 1998, 1385 (2): 177-186. 10.1016/S0167-4838(98)00067-3.
- Jordan F: Current mechanistic understanding of thiamin diphosphate-dependent enzymatic reactions. Natural Product Reports. 2003, 20 (2): 184-201. 10.1039/b111348h.
- Frank RA, Leeper FJ, Luisi BF: Structure, mechanism and catalytic duality of thiamine-dependent enzymes. Cell Mol Life Sci. 2007, 64 (7-8): 892-905. 10.1007/s00018-007-6423-5.
- Bairoch A, Bougueleret L, Altairac S, Amendolia V, Auchincloss A, Puy GA, Axelsen K, Baratin D, Blatter MC, Boeckmann B: The Universal Protein Resource (UniProt). Nucleic Acids Research. 2008, 36: D190-D195. 10.1093/nar/gkn141.
- Enders D, Niemeier O, Henseler A: Organocatalysis by N-heterocyclic carbenes. Chemical Reviews. 2007, 107 (12): 5606-5655. 10.1021/cr068372z.
- Zeitler K: Extending mechanistic routes in heterazolium catalysis-promising concepts for versatile synthetic methods. Angewandte Chemie-International Edition. 2005, 44 (46): 7506-7510. 10.1002/anie.200502617.
- Demir AS, Ayhan P, Sopaci SB: Thiamine pyrophosphate dependent enzyme catalyzed reactions: Stereoselective C-C bond formations in water. Clean-Soil Air Water. 2007, 35 (5): 406-412. 10.1002/clen.200720003.
- Mueller M, Gocke D, Pohl M: Thiamin diphosphate in biological chemistry: exploitation of diverse thiamin diphosphate-dependent enzymes for asymmetric chemoenzymatic synthesis. FEBS Journal. 2009, 276 (11):
- Pohl M, Sprenger GA, Muller M: A new perspective on thiamine catalysis. Curr Opin Biotechnol. 2004, 15 (4): 335-342. 10.1016/j.copbio.2004.06.002.
- Berthold CL, Gocke D, Wood D, Leeper FJ, Pohl M, Schneider G: Structure of the branched-chain keto acid decarboxylase (KdcA) from Lactococcus lactis provides insights into the structural basis for the chemoselective and enantioselective carboligation reaction. Acta Crystallographica Section D-Biological Crystallography. 2007, 63: 1217-1224. 10.1107/S0907444907050433.
- Iding H, Siegert P, Mesch K, Pohl M: Application of alpha-keto acid decarboxylases in biotransformations. Biochimica Et Biophysica Acta-Protein Structure and Molecular Enzymology. 1998, 1385 (2): 307-322. 10.1016/S0167-4838(98)00076-4.
- Stillger T, Pohl M, Wandrey C, Liese A: Reaction engineering of benzaldehyde lyase from Pseudomonas fluorescens catalyzing enantioselective C-C bond formation. Organic Process Research & Development. 2006, 10 (6): 1172-1177.
- Casteels M, Foulon V, Mannaerts GP, Van Veldhoven PP: Alpha-oxidation of 3-methyl-substituted fatty acids and its thiamine dependence. European Journal of Biochemistry. 2003, 270 (8): 1619-1627. 10.1046/j.1432-1033.2003.03534.x.
- Bornemann S, Crout DHG, Dalton H, Hutchinson DW, Dean G, Thomson N, Turner MM: Stereochemistry of the Formation of Lactaldehyde and Acetoin Produced by the Pyruvate Decarboxylases of Yeast (Saccharomyces Sp) and Zymomonas-Mobilis - Different Boltzmann Distributions between Bound Forms of the Electrophile, Acetaldehyde, in the 2 Enzymatic-Reactions. Journal of the Chemical Society-Perkin Transactions 1. 1993, 309-311. 10.1039/p19930000309. 3
- Shils ME: Modern Nutrition in Health and Disease. 2006, Lippincott Williams & Wilkins.
- Zhao J, Zhong CJ: A review on research progress of transketolase. Neurosci Bull. 2009, 25 (2): 94-99. 10.1007/s12264-009-1113-y.
- Costelloe SJ, Ward JM, Dalby PA: Evolutionary analysis of the TPP-dependent enzyme family. J Mol Evol. 2008, 66 (1): 36-49. 10.1007/s00239-007-9056-2.
- Duggleby RG: Domain relationships in thiamine diphosphate-dependent enzymes. Acc Chem Res. 2006, 39 (8): 550-557. 10.1021/ar068022z.
- Wang JJL, Martin PR, Singleton CK: Aspartate 155 of human transketolase is essential for thiamine diphosphate magnesium binding, and cofactor binding is required for dimer formation. Biochimica Et Biophysica Acta-Protein Structure and Molecular Enzymology. 1997, 1341 (2): 165-172. 10.1016/S0167-4838(97)00067-8.
- Candy JM, Duggleby RG: Structure and properties of pyruvate decarboxylase and site-directed mutagenesis of the Zymomonas mobilis enzyme. Biochimica Et Biophysica Acta-Protein Structure and Molecular Enzymology. 1998, 1385 (2): 323-338. 10.1016/S0167-4838(98)00077-6.
- Fang R, Nixon PF, Duggleby RG: Identification of the catalytic glutamate in the E1 component of human pyruvate dehydrogenase. Febs Letters. 1998, 437 (3): 273-277. 10.1016/S0014-5793(98)01249-6.
- Hawkins CF, Borges A, Perham RN: A Common Structural Motif in Thiamin Pyrophosphate-Binding Enzymes. Febs Letters. 1989, 255 (1): 77-82. 10.1016/0014-5793(89)81064-6.
- Cromartie TH, Walsh CT: Escherichia-Coli Glyoxalate Carboligase - Properties and Reconstitution with 5-Deazafad and 1,5-Dihydrodeazafadh2. Journal of Biological Chemistry. 1976, 251 (2): 329-333.
- Fischer M, Thai QK, Grieb M, Pleiss J: DWARF--a data warehouse system for analyzing protein families. BMC Bioinformatics. 2006, 7: 495. 10.1186/1471-2105-7-495.
- Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, Jain S: The Protein Data Bank. Acta Crystallogr D Biol Crystallogr. 2002, 58 (Pt 6 No 1): 899-907. 10.1107/S0907444902003451.
- Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983, 22 (12): 2577-2637. 10.1002/bip.360221211.
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22 (22): 4673-4680. 10.1093/nar/22.22.4673.
- PHYLODENDRON [http://iubio.bio.indiana.edu/treeapp/].
- Russell RB, Barton GJ: Multiple Protein-Sequence Alignment from Tertiary Structure Comparison - Assignment of Global and Residue Confidence Levels. Proteins-Structure Function and Genetics. 1992, 14 (2): 309-323. 10.1002/prot.340140216.
- Zdobnov EM, Apweiler R: InterProScan--an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001, 17 (9): 847-848. 10.1093/bioinformatics/17.9.847.
- Eddy SR: Profile hidden Markov models. Bioinformatics. 1998, 14 (9): 755-763. 10.1093/bioinformatics/14.9.755.
- Wang JJL, Martin PR, Singleton CK: A transketolase assembly defect in a Wernicke-Korsakoff syndrome patient. Alcoholism-Clinical and Experimental Research. 1997, 21 (4): 576-580.
- Podebrad F, Heil M, Reichert S, Mosandl A, Sewell AC, Bohles H: 4,5-dimethyl-3-hydroxy-2[5H]-furanone (sotolone) - The odour of maple syrup urine disease. Journal of Inherited Metabolic Disease. 1999, 22 (2): 107-114. 10.1023/A:1005433516026.
- Martin JI, Broaddus WC, Fillmore HI: A transcription factor decoy oligonucleotide that mimics the MMP-1 functional single nuclear polymorphism: A novel therapeutic for the inhibition of MMP-1 expression. Neuro-Oncology. 2004, 6 (4): 333-333.
- Mimori K, Inoue H, Shiraishi T, Ueo H, Mafune K, Tanaka Y, Mori M: A single-nucleotide polymorphism of SMARCB1 in human breast cancers. Genomics. 2002, 80 (3): 254-258. 10.1006/geno.2002.6829.
- Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW: GenBank. Nucleic Acids Research. 2009, 37: D26-D31. 10.1093/nar/gkn723.
- Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K: dbSNP: the NCBI database of genetic variation. Nucleic Acids Research. 2001, 29 (1): 308-311. 10.1093/nar/29.1.308.
- Fischer M, Pleiss J: The Lipase Engineering Database: a navigation and analysis tool for protein families. Nucleic Acids Res. 2003, 31 (1): 319-321. 10.1093/nar/gkg015.
- Knoll M, Hamm TM, Wagner F, Martinez V, Pleiss J: The PHA Depolymerase Engineering Database: A systematic analysis tool for the diverse family of polyhydroxyalkanoate (PHA) depolymerases. BMC Bioinformatics. 2009, 10: 89. 10.1186/1471-2105-10-89.
- Seifert A, Pleiss J: Identification of selectivity-determining residues in cytochrome P450 monooxygenases: A systematic analysis of the substrate recognition site 5. Proteins-Structure Function and Bioinformatics. 2009, 74 (4): 1028-1035. 10.1002/prot.22242.
- Seifert A, Vomund S, Grohmann K, Kriening S, Urlacher VB, Laschat S, Pleiss J: Rational Design of a Minimal and Highly Enriched CYP102A1 Mutant Library with Improved Regio-, Stereo- and Chemoselectivity. Chembiochem. 2009, 10 (5): 853-861. 10.1002/cbic.200800799.
- Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000, 16 (6): 276-277. 10.1016/S0168-9525(00)02024-2.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.