- Research article
- Open Access
The FTO (fat mass and obesity associated) gene codes for a novel member of the non-heme dioxygenase superfamily
BMC Biochemistry volume 8, Article number: 23 (2007)
Genetic variants in the FTO (fat mass and obesity associated) gene have been associated with an increased risk of obesity. However, the function of its protein product has not been experimentally studied and previously reported sequence similarity analyses suggested the absence of homologs in existing protein databases. Here, we present the first detailed computational analysis of the sequence and predicted structure of the protein encoded by FTO.
We performed a sequence similarity search using the human FTO protein as query and then generated a profile using the multiple sequence alignment of the homologous sequences. Profile-to-sequence and profile-to-profile based comparisons identified remote homologs of the non-heme dioxygenase family.
Our analysis suggests that human FTO is a member of the non-heme dioxygenase (Fe(II)- and 2-oxoglutarate-dependent dioxygenases) superfamily. Amino acid conservation patterns support this hypothesis and indicate that both 2-oxoglutarate and iron should be important for FTO function. This computational prediction of the function of FTO should suggest further steps for its experimental characterization and help to formulate hypotheses about the mechanisms by which it relates to obesity in humans.
Two recent reports [1, 2] described a strong association between a number of single nucleotide polymorphisms (SNPs) in intron 1 of the human FTO gene and an increased risk of obesity, characterized by an increase in body mass index due to fat mass rather than lean mass that is seen in children as early as age seven.
However, the mechanisms by which this genetic variability relates to obesity remain obscure. These publications indicate that the function of FTO is unknown and that its protein has no identified structural domain or link to other proteins that could be used to predict its function. Knowledge of the function of FTO is crucial to guide the search for a mechanism relating this gene to obesity.
Here we report evidence obtained by computational analysis indicating that the protein coded by FTO is a member of the non-heme dioxygenase (Fe(II)- and 2-oxoglutarate-dependent dioxygenases) superfamily.
Results and Discussion
In the course of the computational characterization of the FTO family (see Methods) we identified sequences homologous to human FTO in different eukaryote groups including vertebrates (from fish to mammals), green algae (Ostreococcus) and diatoms (Phaeodactylum and Thalassiosira) (see Figure 1 and Table 1).
Searches with sequence profiles of the N-terminal conserved region of the FTO family (corresponding to amino acid positions 57–324 of the human FTO sequence) identified members of the non-heme dioxygenase family. Additionally, the secondary structure predictions of the FTO family showed high similarity with the known structures of AlkB, a member of the non-heme dioxygenase family [3–5]. We were unable to find significant homology between the C-terminal region of the FTO family and other genes.
To investigate whether fold recognition analysis would generate supporting results, we submitted the FTO N-terminal region as a query to an independent fold assignment system based on profile-profile comparisons (see Methods). The profiles generated for the human and E. coli AlkB proteins (PDB entries 2iuw and 2fdi) matched the FTO N-terminal region with E-values of 3.2 × 10⁻²¹ and 3.1 × 10⁻¹², respectively (estimated error rate < 3%), despite their low level of sequence identity to the human FTO protein (approximately 17%). The next match corresponded to the hypothetical protein TM0957 from Thermotoga maritima; however, it was considered unreliable given its short length (28 amino acids) and high E-value (0.02).
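A sequence identity figure such as the approximately 17% quoted above depends on how identity is counted. As a minimal sketch, assuming identity is measured only over alignment columns where neither sequence has a gap (one of several conventions, and not necessarily the one used in this study), the calculation could look like this in Python. The function name and the toy sequences are hypothetical and are not data from this work.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where both aligned sequences have a residue.

    Assumption: gapped columns are excluded from the denominator.
    Other conventions (e.g. dividing by alignment length) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy pairwise alignment (hypothetical): 3 of the 4 compared columns match.
print(percent_identity("ACDEF", "ACD-Y"))
```

Because the denominator changes with the convention, identity values from different tools are only roughly comparable.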
Given the E-values of the HMMer searches, the reliability of secondary structure predictions, and the fold assignment results, we are confident that the proteins of the FTO family (including the protein coded by the FTO human gene) are members of the non-heme dioxygenase superfamily.
Proteins of this superfamily catalyze different oxidative reactions on multiple substrates, producing varied biological effects, and are characterized by a number of conserved amino acids involved in the binding of iron and 2-oxoglutarate (as a cofactor and co-substrate, respectively). We found these amino acids in human FTO and in its homologs (see Figure 1 and Table 1), suggesting that 2-oxoglutarate and iron are essential for the normal function of the FTO protein.
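A simple way to look for candidate iron-binding residues in a sequence is a motif scan. The sketch below assumes the His-x-Asp/Glu ... His arrangement that is commonly described for Fe(II)/2-oxoglutarate-dependent dioxygenases. That motif is general background knowledge rather than something stated in this article, and the gap limits, function name and toy sequence are hypothetical. It is not the profile-based method used in this study and will produce many false positives on real sequences.

```python
def find_iron_triads(seq, min_gap=20, max_gap=150):
    """Return 1-based (His, Asp/Glu, distal His) positions for H-x-[DE]...H.

    Assumption: the distal His lies min_gap..max_gap residues after the
    acidic residue; these limits are illustrative, not from the article.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        # First His followed two positions later by Asp or Glu.
        if seq[i] == "H" and seq[i + 2] in "DE":
            first = i + 2 + min_gap
            last = min(i + 2 + max_gap, len(seq) - 1)
            for j in range(first, last + 1):
                if seq[j] == "H":
                    hits.append((i + 1, i + 3, j + 1))
    return hits


# Toy sequence (hypothetical), scanned with short gap limits for readability.
print(find_iron_triads("AAHADAAAAAHAA", min_gap=3, max_gap=10))
```

Candidates from such a scan only become meaningful when they coincide with conserved columns of a multiple sequence alignment, which is the kind of evidence the conservation analysis above relies on.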
The FTO family is not a unique case, as other families of the non-heme dioxygenase superfamily are also very divergent and their detection required non-trivial computational analysis. Due to the divergence of the FTO family from already known non-heme dioxygenases, we were unable to predict the target of the family's catalytic action.
The ubiquitous expression of FTO throughout many human tissues indicates that it has an important function. The phylogenetic distribution of FTO homologs (consistently present in organisms from fish to mammals) suggests that this gene appeared during the evolution of vertebrates. Intriguingly, FTO homologs can be found in the green algae Ostreococcus and in diatoms, whereas they are apparently absent in insects, worms and fungi (see Figure 1 and Table 1). The most parsimonious explanation of this fact is the existence of independent events of horizontal gene transfer from vertebrates to protists. Horizontal gene transfer has previously been related to the evolution of several eukaryotic regulatory systems that function in development, differentiation and apoptosis. In short, horizontal transfer of FTO indicates that the FTO protein has a function that confers a selective advantage but is not indispensable, which agrees with a possible regulatory role.
For comparison, the hypoxia-inducible factor (HIF), a known member of the non-heme dioxygenase family, also has a wide phylogenetic distribution (from worms to mammals) and is ubiquitously expressed in all human tissues. HIF acts as a sensor of oxygen level and affects the expression of over one hundred genes. This molecule performs its activity by shuttling between the cytoplasm in normoxic conditions and the nucleus in hypoxic conditions.
To investigate whether FTO could be acting in a similar manner, we studied its sequence using an algorithm for prediction of protein cellular localization (WoLF PSORT). The results gave similar scores for a cytoplasmic and a nuclear-cytoplasmic localization of this protein. This is consistent with human FTO's possible function as a metabolic sensor and nuclear effector.
The human FTO gene product has a predicted molecular mass of 50 kDa. With that mass it would need a Nuclear Localization Signal (NLS) in order to act in the nucleus. Analysis of FTO's sequence using an algorithm that includes the prediction of NLS (PSORT II) suggested a 17 amino acid long bipartite NLS from positions 2 to 18 (Figure 2A), noted previously but not experimentally verified. Further analysis of the family indicated that this region stands out as a K/R-rich region in comparison to the rest of the sequence, and that it is located in an N-terminal extension that is conserved in close homologs of human FTO from fish to mammals but not in the other FTO homologs we found in algae or diatoms.
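The observation that a segment is K/R rich relative to the rest of a sequence can be checked with a sliding-window count. The sketch below is a minimal illustration of that idea and does not reproduce the PSORT II prediction. The window width of 17 simply mirrors the length of the predicted signal, and the function names and toy sequence are hypothetical.

```python
def basic_fraction_windows(seq, width=17):
    """Fraction of Lys/Arg residues in every window of the given width.

    Returns a list of (1-based start position, fraction) tuples.
    """
    seq = seq.upper()
    if width <= 0 or len(seq) < width:
        return []
    windows = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        n_basic = sum(1 for aa in window if aa in "KR")
        windows.append((start + 1, n_basic / width))
    return windows


def richest_window(seq, width=17):
    """Return the first window with the highest K/R fraction, or None."""
    windows = basic_fraction_windows(seq, width)
    return max(windows, key=lambda w: w[1]) if windows else None


# Toy sequence (hypothetical) with a basic stretch near the N-terminus.
print(richest_window("MKRKRAAAAAAAAAA", width=4))
```

Comparing the top-scoring window with the average fraction over the whole sequence gives a rough sense of how strongly a region stands out.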
In light of these computational results we hypothesize that FTO is a sensor of the cell's metabolic state and that, when dysfunctional, it can result in an obese phenotype. We identify the N-terminal region of human FTO as having a high likelihood of determining its cellular localization, which could be verified by mutational analysis.
Here we have provided valuable information about FTO by indicating its possible catalytic function, and we have pointed to the amino acids involved in cofactor (Fe) and co-substrate (2-oxoglutarate) binding in human FTO as well as in its homologous proteins in other organisms, which could be used as models for the study of the human disease. This insight should help to guide experiments to clarify the mechanisms by which FTO relates to obesity and to accelerate the discovery of novel molecular therapies for this condition.
We first performed BLAST sequence similarity searches using the human FTO protein as query against different sequence database resources: NCBI, ENSEMBL and JGI. Multiple sequence alignments of protein sequences homologous to human FTO were generated with the program T-Coffee using default parameters, slightly refined manually and visualized with the Belvu program (Figure 1, top).
Profiles of the alignment as global hidden Markov models (HMMs) were generated using HMMer. Profile-based sequence searches were performed against the Uniref50 and Uniref90 protein sequence databases using HMMsearch. We used NAIL to view and analyze the HMMsearch results, which provided a formatted view with hyperlinks to related web resources and coloring related to taxonomic information, thus facilitating the interpretation of the results.
Fold recognition analyses were performed using profile-to-profile comparisons of the HMM profile of the FTO family to profiles generated for each sequence of known structure with its homologues (HHpred server; [23, 24]). The significance of sequence-to-sequence, profile-to-sequence, and profile-to-profile matches was evaluated in terms of an E-value, which estimates the number of matches with an equal or better score that would be expected by chance. Secondary structure predictions were performed using the PredictProtein Server [25, 26]. AlkB active center illustrations (Figure 1, bottom) were generated with Pymol.
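An E-value can be related to a chance probability under a standard assumption. The sketch below assumes that the number of chance matches follows a Poisson distribution, so that the probability of at least one chance match is P = 1 - exp(-E). This relation is commonly used for database searches but is an assumption here, not something stated by the tools' outputs in this study. The function name is hypothetical.

```python
import math


def evalue_to_pvalue(evalue):
    """Probability of at least one chance match, assuming Poisson statistics.

    Uses -expm1(-E), which equals 1 - exp(-E) but stays accurate for tiny E.
    """
    if evalue < 0:
        raise ValueError("E-value must be non-negative")
    return -math.expm1(-evalue)


# E-values of the kind reported for fold recognition matches.
for e in (3.2e-21, 3.1e-12, 0.02):
    print(f"E = {e:g} -> P = {evalue_to_pvalue(e):.3g}")
```

For small E-values the two quantities are nearly identical, which is why they are often used interchangeably. They diverge once E approaches or exceeds 1.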
AlkB (Alkylated DNA repair protein)
ESTs (Expressed sequence tags)
(Find Genes using HMM)
FTO (fat mass and obesity associated)
HMMs (Hidden Markov Models)
JGI (Joint Genome Institute)
NCBI (National Center for Biotechnology Information)
NLS (Nuclear Localization Signal)
SNPs (Single nucleotide polymorphisms)
Dina C, Meyre D, Gallina S, Durand E, Korner A, Jacobson P, Carlsson LM, Kiess W, Vatin V, Lecoeur C, et al.: Variation in FTO contributes to childhood obesity and severe adult obesity. Nat Genet. 2007, 39 (6): 724-726. 10.1038/ng2048.
Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, Lindgren CM, Perry JR, Elliott KS, Lango H, Rayner NW, et al.: A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science. 2007, 316 (5826): 889-894. 10.1126/science.1141634.
Aravind L, Koonin EV: The DNA-repair protein AlkB, EGL-9, and leprecan define new families of 2-oxoglutarate- and iron-dependent dioxygenases. Genome Biol. 2001, 2 (3): RESEARCH0007. 10.1186/gb-2001-2-3-research0007.
Sundheim O, Vagbo CB, Bjoras M, Sousa MM, Talstad V, Aas PA, Drablos F, Krokan HE, Tainer JA, Slupphaug G: Human ABH3 structure and key residues for oxidative demethylation to reverse DNA/RNA damage. EMBO J. 2006, 25 (14): 3389-3397. 10.1038/sj.emboj.7601219.
Yu B, Edstrom WC, Benach J, Hamuro Y, Weber PC, Gibney BR, Hunt JF: Crystal structures of catalytic complexes of the oxidative DNA/RNA repair enzyme AlkB. Nature. 2006, 439 (7078): 879-884. 10.1038/nature04561.
Ozer A, Bruick RK: Non-heme dioxygenases: cellular sensors and regulators jelly rolled into one?. Nat Chem Biol. 2007, 3 (3): 144-153. 10.1038/nchembio863.
Iyer LM, Aravind L, Coon SL, Klein DC, Koonin EV: Evolution of cell-cell signaling in animals: did late horizontal gene transfer from bacteria have a role?. Trends Genet. 2004, 20 (7): 292-299. 10.1016/j.tig.2004.05.007.
Semenza GL: Targeting HIF-1 for cancer therapy. Nat Rev Cancer. 2003, 3 (10): 721-732. 10.1038/nrc1187.
Kallio PJ, Okamoto K, O'Brien S, Carrero P, Makino Y, Tanaka H, Poellinger L: Signal transduction in hypoxic cells: inducible nuclear translocation and recruitment of the CBP/p300 coactivator by the hypoxia-inducible factor-1alpha. EMBO J. 1998, 17 (22): 6573-6586. 10.1093/emboj/17.22.6573.
Horton P, Park KJ, Obayashi T, Fujita N, Harada H, Adams-Collier CJ, Nakai K: WoLF PSORT: protein localization predictor. Nucleic Acids Res. 2007, 35 (Web Server): W585-W587. 10.1093/nar/gkm259.
Lusk CP, Blobel G, King MC: Highway to the inner nuclear membrane: rules for the road. Nat Rev Mol Cell Biol. 2007, 8 (5): 414-420. 10.1038/nrm2165.
Peters T, Ausmeier K, Ruther U: Cloning of Fatso (Fto), a novel gene deleted by the Fused toes (Ft) mouse mutation. Mammalian Genome. 1999, 10 (10): 983-986. 10.1007/s003359901144.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
NCBI's BLAST server. [http://www.ncbi.nlm.nih.gov/BLAST/]
DOE Joint Genome Institute. [http://www.jgi.doe.gov/]
Notredame C, Higgins DG, Heringa J: T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000, 302 (1): 205-217. 10.1006/jmbi.2000.4042.
Sonnhammer EL, Hollich V: Scoredist: a simple and robust protein sequence distance estimator. BMC Bioinformatics. 2005, 6: 108. 10.1186/1471-2105-6-108.
Eddy SR: Profile hidden Markov models. Bioinformatics. 1998, 14 (9): 755-763. 10.1093/bioinformatics/14.9.755.
Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH: UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics. 2007, 23 (10): 1282-1288. 10.1093/bioinformatics/btm098.
Janelia farm Hmmer web site. [http://hmmer.janelia.org/]
Sanchez-Pulido L, Yuan YP, Andrade MA, Bork P: NAIL-Network Analysis Interface for Linking HMMER results. Bioinformatics. 2000, 16 (7): 656-657. 10.1093/bioinformatics/16.7.656.
HHpred web server. [http://toolkit.tuebingen.mpg.de/hhpred]
Soding J: Protein homology detection by HMM-HMM comparison. Bioinformatics. 2005, 21 (7): 951-960. 10.1093/bioinformatics/bti125.
PredictProtein web server. [http://www.predictprotein.org/]
Rost B: PHD: predicting one-dimensional protein structure by profile-based neural networks. Methods Enzymol. 1996, 266: 525-539.
Pymol web site. [http://pymol.sourceforge.net/]
Holm L, Sander C: Dali: a network tool for protein structure comparison. Trends Biochem Sci. 1995, 20 (11): 478-480. 10.1016/S0968-0004(00)89105-7.
Supplementary material web server. [http://www.pdg.cnb.uam.es/FTO]
Huska MR, Buschmann H, Andrade-Navarro MA: BiasViz: Visualization of amino acid biased regions in protein alignments. Bioinformatics. 2007, 23 (22): 3093-3094. 10.1093/bioinformatics/btm489.
Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, et al.: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2007, 35 (Database): D5-D12. 10.1093/nar/gkl1031.
MAA is a recipient of a Canada Research Chair in Bioinformatics.
LSP carried out the initial sequence and structural analysis of the domain. LSP and MAA interpreted the data and prepared the manuscript. All authors read and approved the final manuscript.