Previously we identified, by homology to the E. coli gene, both the mouse and human cDNAs for 2-amino-3-ketobutyrate coenzyme A ligase, the second enzyme in the biochemical pathway that converts L-threonine to glycine [12]. In a search for the mouse cDNA of L-threonine dehydrogenase, which is the first enzyme in this pathway, I initially used the same approach. However, only expressed sequence tags (ESTs) belonging to sorbitol dehydrogenase (and, with a much lower degree of homology, numerous isoforms of alcohol dehydrogenase) were identified. Nor were other candidate genes found in the human genomic sequence. Fortunately, Kao and Davis (1994) [14] had previously purified and characterised the porcine L-threonine dehydrogenase protein, which they had isolated from liver mitochondria and partially sequenced at the peptide level. This peptide sequence was used to identify mouse ESTs with significant homology by a back-translation search against nucleotide sequences. The program ESTblast [16] was used to construct a tentative mouse contiguous sequence from the EST sequences. PCR primers were designed to match the 5' and 3' ends of the electronic contiguous sequence and used to amplify the gene from mouse liver and lung cDNA. After agarose gel electrophoresis, each primer set produced a single amplicon, indicating that the gene is not alternatively spliced. The PCR products were cloned and sequenced. A BLAST search of the pig EST database with the murine threonine dehydrogenase cDNA sequence identified similar 5' and 3' ESTs (accession Nos. BE233801 and BI400146, respectively), and the sequences of these ESTs were used to design primers to amplify the pig L-threonine dehydrogenase from hepatocytes by RT-PCR.
Analysis of murine and porcine L-threonine dehydrogenase cDNAs
The 1508 bp mouse sequence has an ORF that encodes a 373-residue protein and has an ATTAAA polyadenylation signal at 1460–1465 (GenBank accession No. AY116662) (Fig. 1). A second clone (accession No. AF134346) includes 63 bp of 5'UTR and utilises a more 5' ATTAAA polyadenylation signal at 1350–1355. The predicted protein has a molecular mass of 41,461 Da and an isoelectric point of 8.45. The mouse genomic sequence for this cDNA is located on chromosome 14, band C (accession No. NW_000100, The Sanger Institute, UK). The gene spans 16.4 kb and consists of 9 exons. There is a 329 bp CpG island (64% G+C) spanning the 5' untranslated exon (Fig. 2).
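The positions quoted for the polyadenylation signals and the base composition of the CpG island are simple sequence scans. A minimal sketch of such a scan is given here in Python. The function names and the toy sequence are hypothetical and are not taken from the deposited sequences; positions are reported as 1-based inclusive ranges, the convention used for 1460–1465 above.

```python
def find_motif(seq: str, motif: str = "ATTAAA") -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) positions of every motif occurrence."""
    seq = seq.upper()
    motif = motif.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        # Convert the 0-based index to a 1-based inclusive range.
        hits.append((start + 1, start + len(motif)))
        start = seq.find(motif, start + 1)
    return hits


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (e.g. a candidate CpG island)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical toy sequence, built in pieces so the motif positions are evident.
toy = "GCGCGGCC" + "ATGAAACCC" + "ATTAAA" + "GGGTTT" + "ATTAAA" + "CC"

print(find_motif(toy))        # two ATTAAA hits, reported 1-based
print(gc_fraction(toy[:8]))   # G+C fraction of the first 8 bases
```

Applied to a real cDNA, the same scan would list every ATTAAA occurrence; deciding which of them act as polyadenylation signals still depends on the position of the poly(A) tail in the clones.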
The pig sequence (GenBank accession No. AY095535) also has an ORF that encodes a 373-residue protein, with a molecular mass of 41,432 Da and an isoelectric point of 7.67 (Fig. 3). The porcine and mouse ORFs have 78% identity at the nucleotide level, and 81% identity and 94% similarity at the protein level. The potential polyadenylation signal on the pig sequence is homologous to the most 5' signal on the mouse sequence.
Comparison of the porcine L-threonine dehydrogenase ORF with sequenced peptides from the porcine L-threonine dehydrogenase enzyme
Evidence that the porcine cDNA encodes L-threonine dehydrogenase comes from its high degree of similarity to sequenced peptides from the purified and structurally characterised porcine L-threonine dehydrogenase protein isolated from liver mitochondria [14]. The sequences of 13 porcine peptides were aligned with the protein predicted from the porcine ORF and have 98% identity over 212 residues (Fig. 4). The 5 mismatched residues are probably due to errors in peptide sequencing, since they are located towards the ends of the peptide sequences.
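The identity figure follows directly from the counts given: 5 mismatches over 212 aligned residues leaves 207 matches, or 97.6%, which rounds to 98%. A short Python sketch of the calculation is shown here; the function name and the toy alignment are hypothetical, and gapped columns are simply ignored, which is one convention among several.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over ungapped columns of two equal-length aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap character.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Counts reported in the text: 212 aligned residues, 5 mismatches.
aligned_residues = 212
mismatches = 5
reported = 100.0 * (aligned_residues - mismatches) / aligned_residues
print(f"{reported:.1f}%")

# Hypothetical toy alignment: 4 of 5 positions match.
print(percent_identity("ACDEF", "ACDQF"))
```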
Import into mitochondria
Mammalian L-threonine dehydrogenase is encoded by a nuclear gene; the protein is synthesised in the cytoplasm and imported into mitochondria. The amino terminus of the mature porcine L-threonine dehydrogenase protein isolated from mitochondria [14] corresponds to amino acid residue 51 of the porcine L-threonine dehydrogenase ORF (Fig. 4), which suggests that the pro-protein is cleaved to produce a 36 kDa mature enzyme. This value is close to that expected, since the mature porcine enzyme has a subunit molecular mass of 37 kDa on SDS-PAGE [14]. Although the amino-terminal region is the region of lowest similarity between the mouse, fly and nematode L-threonine dehydrogenase proteins, in each species it has the characteristics of a mitochondrial targeting sequence (Fig. 5): a high content of basic amino acids and few acidic amino acids [17].
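The size of the mature enzyme can be checked roughly from figures already given. The following Python sketch scales the 41,432 Da mass of the 373-residue porcine pro-protein by the fraction of residues retained after cleavage before residue 51. It assumes, purely for illustration, that every residue contributes the same average mass; the true value depends on the composition of the removed presequence, so this is an approximation and not the calculation used to obtain the 36 kDa figure.

```python
PRO_PROTEIN_RESIDUES = 373      # porcine ORF length
PRO_PROTEIN_MASS_DA = 41_432    # predicted mass of the full porcine ORF product
MATURE_START = 51               # first residue of the mature protein


def estimate_mature_mass(total_residues: int, total_mass_da: float, mature_start: int) -> float:
    """Approximate mature mass, assuming a uniform average mass per residue."""
    mature_residues = total_residues - (mature_start - 1)
    mean_residue_mass = total_mass_da / total_residues
    return mature_residues * mean_residue_mass


estimate = estimate_mature_mass(PRO_PROTEIN_RESIDUES, PRO_PROTEIN_MASS_DA, MATURE_START)
print(f"{estimate / 1000:.1f} kDa")
```

Under this assumption the 323-residue mature protein comes to roughly 36 kDa, in line with the value quoted above and close to the 37 kDa seen on SDS-PAGE.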
Sequence homology in other species
A database search revealed the presence of L-threonine dehydrogenase genes in the genomes of other organisms. The fly, Drosophila melanogaster, has a 6-exon gene located on chromosome 3L which gives a cDNA of 1288 bp, encoding the 367-residue CG5955 protein (accession No. AAF51607) (Fig. 5). The nematode, Caenorhabditis elegans, has a 5-exon gene located on chromosome V (encompassing only the first 5 exons of the predicted 10 exons of the hypothetical gene product F08F3.4, accession No. AAB04871). Extending the fifth exon to the next polyadenylation site gives a 1217 bp cDNA encoding a 359-residue protein (Fig. 5). The cDNA sequences of both these genes are supported by EST data. The fly and nematode proteins have over 52% identity and 88% similarity to the mammalian proteins in the 306-residue central core of the enzyme. Four exon/exon boundaries are conserved in two or more of the genes (Fig. 5). The search also revealed similar L-threonine dehydrogenase ESTs in amphibians, bony fishes, tunicates, flies, moths, mites, nematodes and trypanosomes, but not in higher plants or yeasts. Similarly, the gene for the second enzyme in this pathway, KBL, is also absent from the yeast Saccharomyces cerevisiae [12], and no L-threonine dehydrogenase activity has been found in S. cerevisiae [18].
L-threonine dehydrogenase sequences have been evolutionarily conserved between Gram-positive bacteria and mammals. This is shown by the homology between the mouse protein and the amino-terminal peptide sequence of the threonine dehydrogenase from the Gram-positive Firmicutes bacterium Clostridium sticklandii, which has 54% identity and 93% similarity over 28 residues [15] (Fig. 6). C. sticklandii is an amino acid fermenting anaerobic bacterium that can grow on threonine as a sole substrate. Together, the mouse and C. sticklandii sequences enabled the identification of putative L-threonine dehydrogenase genes in a number of prokaryotic species, such as Thermoplasma acidophilum, T. volcanium and Staphylococcus epidermidis. An alignment with the putative L-threonine dehydrogenase sequence from S. aureus (the SAV0542 gene) [19], which has 41% identity and 75% similarity to the mouse protein, is shown in Fig. 6.
Mammalian threonine dehydrogenases have an NAD+ binding domain
A search of the protein structural database revealed that the closest matches, with 19% identity, were the UDP-galactose 4-epimerases (GALE) from E. coli and Homo sapiens [20, 21]. GALE is a mixed alpha-helix/beta-sheet protein with an N-terminal NAD+ binding Rossmann fold and belongs to the tyrosine-dependent oxidoreductase protein family (also known as short-chain dehydrogenases). The characteristic Tyr-X-X-X-Lys couple (residues 195 and 199) found in all family members is important for catalysis, with the conserved tyrosine serving as the active-site base [21]. By comparison with the crystal structures of the GALE proteins, two domains were identified, and it is likely that the substrate, L-threonine, is located in the cleft between them. The larger amino-terminal domain (residues 58–231 on the mouse sequence) has an NAD+ binding motif. There are 12 conserved residues in the murine L-threonine dehydrogenase protein that are likely to contact the nicotinamide cofactor (Gly-62, Gly-65, Gly-68, Asp-88, Ile-107, His-127, Leu-131, Asn-147, Ser-169, Tyr-195, Lys-199 and Tyr-222) (Fig. 5). The smaller carboxy-terminal domain (residues 232–335) has little similarity to GALE and is likely to be involved in substrate binding.
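The Tyr-X-X-X-Lys couple is a fixed-spacing pattern (a tyrosine followed four positions later by a lysine, as with Tyr-195 and Lys-199), so candidate sites can be listed with a simple pattern scan. The Python sketch below is illustrative only: the function name and the toy peptide are hypothetical, and a match to the pattern alone does not show that a site is catalytic, since the spacing occurs by chance in many proteins.

```python
import re


def find_yxxxk(protein: str) -> list[tuple[int, int]]:
    """Return 1-based (Tyr, Lys) positions for every Tyr-X-X-X-Lys occurrence."""
    protein = protein.upper()
    # A lookahead lets overlapping occurrences be reported as well.
    return [(m.start() + 1, m.start() + 5) for m in re.finditer(r"(?=Y.{3}K)", protein)]


# Hypothetical toy peptide in one-letter code, not a real dehydrogenase sequence.
toy_peptide = "MAYGGSKLLYAAAKW"

for tyr_pos, lys_pos in find_yxxxk(toy_peptide):
    print(f"Tyr-{tyr_pos} ... Lys-{lys_pos}")
```

In practice such hits would be filtered by conservation across the aligned family members and by position relative to the NAD+ binding domain.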
Expression of L-threonine dehydrogenase mRNA in mouse tissues
To identify which tissues are likely to contribute to L-threonine dehydrogenase activity in the mouse, reverse-transcriptase real-time PCR was used to examine the tissue distribution of L-threonine dehydrogenase mRNA. L-threonine dehydrogenase expression was found in all tissues examined, being highest in liver, high in testis and spleen, and lowest in skeletal muscle, relative to the expression of β-actin (Fig. 7A). Similar results were obtained with another set of L-threonine dehydrogenase primers (located on exons 6 and 7) (data not shown). The expression of 2-amino-3-ketobutyrate coenzyme A ligase was also found in all tissues examined, being highest in liver and high in kidney. The expression level of the housekeeping gene β-actin was similar in all tissues examined. Another housekeeping gene, glyceraldehyde-3-phosphate dehydrogenase (G3PDH), was also used to standardise expression levels, but G3PDH expression showed greater variation between tissues, with higher expression in heart and skeletal muscle and lower expression in testis relative to β-actin (data not shown). After 40 cycles of PCR amplification, the amplicons were specific, as verified by melting curve analysis (data not shown) and agarose gel electrophoresis (Fig. 7B).
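Expression relative to a housekeeping gene is often derived from real-time PCR threshold cycles (Ct) by the comparative Ct approach. The text does not state which calculation was used here, so the Python sketch below is an assumption: it supposes that product doubles each cycle (efficiency of 2) for both primer sets. The function name and all Ct values are hypothetical and are not data from this study.

```python
def relative_expression(ct_target: float, ct_reference: float, efficiency: float = 2.0) -> float:
    """Target expression relative to a reference gene: efficiency ** (Ct_ref - Ct_target).

    Assumes the same amplification efficiency for target and reference primers.
    """
    return efficiency ** (ct_reference - ct_target)


# Hypothetical Ct values (target gene, reference gene) for illustration only.
hypothetical_ct = {
    "liver": (24.0, 20.0),
    "skeletal muscle": (29.0, 20.0),
}

for tissue, (ct_target, ct_reference) in hypothetical_ct.items():
    ratio = relative_expression(ct_target, ct_reference)
    print(f"{tissue}: {ratio:.4f} relative to the reference gene")
```

A lower Ct for the target means more starting template, so with equal reference Ct values the tissue with the smaller target Ct has the higher relative expression. The calculation is only as reliable as the reference gene is stable across tissues, which is why the greater variation seen with G3PDH matters.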