The acidic domains of the Toc159 chloroplast preprotein receptor family are intrinsically disordered protein domains

Background: The Toc159 family of proteins serves as receptors for chloroplast-destined preproteins. They directly bind to transit peptides, and exhibit preprotein substrate selectivity conferred by an unknown mechanism. The Toc159 receptors each include three domains: C-terminal membrane, central GTPase, and N-terminal acidic (A-) domains. Although the function(s) of the A-domain remains largely unknown, the amino acid sequences are most variable within these domains, suggesting they may contribute to the functional specificity of the receptors.
Results: The physicochemical properties of the A-domains are characteristic of intrinsically disordered proteins (IDPs). Using CD spectroscopy we show that the A-domains of two Arabidopsis Toc159 family members (atToc132 and atToc159) are disordered at physiological pH and temperature and undergo conformational changes at temperature and pH extremes that are characteristic of IDPs.
Conclusions: Identification of the A-domains as IDPs will be important for determining their precise function(s), and suggests a role in protein-protein interactions, which may explain how these proteins serve as receptors for such a wide variety of preprotein substrates.


Background
Most chloroplast proteins are encoded in the nucleus and translated in the cytosol with an N-terminal transit peptide that facilitates recognition by receptors of the Toc complex. In Arabidopsis, two families of GTPases are responsible for preprotein recognition: the Toc34 and Toc159 receptors [1][2][3][4][5][6]. Toc159 interacts with transit peptides [6] at early stages of import [2,7], suggesting that it is the primary preprotein receptor. However, it is unknown precisely how this receptor recognizes preproteins, and its function in subsequent preprotein translocation remains unclear. There are four Toc159-related proteins in Arabidopsis: atToc159, -132, -120 and -90 [8,9]. These receptors are able to distinguish between semi-distinct classes of substrates; atToc159 is implicated in the import of photosynthetic proteins, while atToc132 and atToc120 appear to be functionally redundant, and are primarily involved in the import of non-photosynthetic proteins [6,8,10-12]. The Toc159 receptors have three distinguishable regions: an N-terminal acidic (A-) domain and a central GTPase (G-) domain, which extend into the cytosol, and a C-terminal membrane (M-) domain that anchors the protein to the outer chloroplast membrane [8,13]. The G- and M-domains of the Arabidopsis family members share ~65% sequence identity [8,10]. The G-domain is involved in targeting Toc159 to the chloroplast during initial Toc complex assembly [14][15][16], comprises at least part of a transit peptide binding site [6], and acts as part of a GTP-regulated switch for preprotein recognition [7,17]. Less information is available regarding the A- and M-domains. The A-domain is highly variable in amino acid sequence between species and among the Toc159 family members in Arabidopsis (~20% identity) and has no known conserved functional motifs [8,10].
Although it appears to be non-essential for Toc159 function [13,17,18], the A-domain has been hypothesized to confer differential substrate recognition, owing to the variability in amino acid sequence among family members [8], and evidence has recently been presented that the Toc159 A-domain interacts with actin [19]. Despite reports on its dispensability for Toc159 function, the size of the A-domain (it accounts for almost 50% of the length of Toc159) suggests that it is likely to confer some important function(s) to the receptor.
Based on hydrophobic cluster analysis of its A-domain, Toc159 has been proposed to belong to a growing class of natively unstructured or intrinsically disordered proteins (IDPs) [20], which lack globular structure over their entire length or contain large unstructured regions [21], and have been estimated to account for up to ~30% of all proteins in higher eukaryotes [21,22]. Several notable characteristics of the Toc159 family A-domains are consistent with their classification as IDPs. They possess a high number of charged (acidic) amino acid residues, have a repetitive amino acid sequence, demonstrate aberrant mobility during SDS-PAGE and are highly sensitive to proteolysis [4,13,20,23-26]. In addition, IDPs are known to undergo extensive post-translational modification, and in particular, are enriched in phosphorylation sites [21]. Consistent with this observation, the A-domain of Toc159 was recently identified in a proteomic survey of phosphorylated Arabidopsis proteins [27].
IDP domains are involved in highly dynamic protein-protein interactions [21,22], often of high specificity and low affinity, and may interact with many different binding partners, including IDP regions of other proteins. During such interactions, IDPs often undergo induced folding, which has been proposed to explain how they are able to achieve specific, yet low affinity interactions with multiple binding partners [21,28]. In the current study, CD spectroscopy was used to demonstrate that the A-domains of two members of the Arabidopsis Toc159 family, atToc159 and atToc132, are IDPs. This represents the first investigation into the structure of the A-domains of the Toc159 family, and has implications for future studies aimed at understanding the function of these domains, and the role of Toc159 receptors in general, in chloroplast protein import.

A-domains are predicted to be natively unfolded
A recent study led to the suggestion that the A-domain of atToc159 may be natively unfolded [20]. In the current study, disorder within atToc132 (AGI# At2g16640) and atToc159 (AGI# At4g02510) was predicted using FoldIndex [29] and IUPred [30]. The A-domain boundaries were delineated as previously described [10]. Both programs predict the A-domains of atToc132 and atToc159 (residues 1-455 and 1-727, respectively) to be mainly unfolded (Figure 1). The A-domains of atToc120 (AGI# At3g16620) and Toc159 from Pisum sativum (psToc159) (Accession AAF75761) are also predicted to be largely disordered (data not shown). A-domains of atToc159 and atToc132 were selected as representatives for further examination.
Figure 1 The A-domains of atToc159 and atToc132 are predicted to be largely disordered. IUPred [30] (top panels) and FoldIndex [29] (lower panels) were used for disorder predictions of full-length atToc132 (A) and atToc159 (B). The amino acid numbers of the A-, G- and M-domain boundaries are indicated. The regions predicted to be disordered are shaded in dark grey.
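To illustrate the kind of calculation that underlies charge-hydropathy disorder predictors such as FoldIndex, the following is a minimal sketch, not the implementation used in this study. It assumes the published FoldIndex relation (score = 2.785<H> - |<R>| - 1.151, where <H> is the mean Kyte-Doolittle hydropathy rescaled to 0-1 and <R> is the mean net charge per residue), with negative scores suggesting an unfolded region. The function names, the window size, and both example sequences are hypothetical; the sequences are toy inputs, not the A-domain sequences.

```python
# Minimal sketch of a FoldIndex-style charge-hydropathy score.
# Assumption: score = 2.785*<H> - |<R>| - 1.151; negative suggests "unfolded".

# Kyte-Doolittle hydropathy scale
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fold_index(seq):
    """Return the charge-hydropathy score for one sequence segment."""
    seq = seq.upper()
    n = len(seq)
    # Mean hydropathy, rescaled from [-4.5, 4.5] to [0, 1]
    mean_h = sum((KD[aa] + 4.5) / 9.0 for aa in seq) / n
    # Absolute mean net charge per residue (K, R = +1; D, E = -1)
    net = sum(aa in "KR" for aa in seq) - sum(aa in "DE" for aa in seq)
    mean_r = abs(net) / n
    return 2.785 * mean_h - mean_r - 1.151


def windowed_fold_index(seq, window=51):
    """Score every full-length sliding window (window size is an assumption)."""
    return [fold_index(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


# Hypothetical toy sequences for illustration only
ACIDIC_TOY = "EEDSDEEGEEKSDDEETSEE"
HYDROPHOBIC_TOY = "LIVLAVLIVAFLLIVAG"

if __name__ == "__main__":
    print("acidic toy:", fold_index(ACIDIC_TOY))
    print("hydrophobic toy:", fold_index(HYDROPHOBIC_TOY))
```

In this sketch, a segment rich in acidic residues and poor in hydrophobic residues scores below zero, which is the qualitative behaviour expected for a highly acidic domain.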
Expression and purification of 132A His and 159A His
E. coli-expressed A-domains of atToc132 and atToc159 possessing N-terminal His6 tags (132A His and 159A His ) were purified using Ni2+-charged resin (Figure 2A, lanes 2 and 5). To gain a level of purity suitable for CD spectroscopy, the proteins were further purified by ion exchange (Figure 2A, lanes 3 and 6), and the identities of the ion-exchange purified proteins were confirmed by Western blotting (Figure 2B). The theoretical molecular weights of 132A His and 159A His are ~50 kDa and ~76 kDa, respectively; however, these proteins migrate at an apparent molecular weight approximately 50 kDa larger than expected during SDS-PAGE (Figure 2A). The same phenomenon has been observed for full-length Toc159 [13,18]; however, when the A-domain is proteolytically degraded, the remainder of the protein (G+M domains) migrates as expected [13,18]. Aberrant electrophoretic mobility is characteristic of acidic proteins [31], and is a common property of IDPs [26].
Structural analysis of 132A His and 159A His using CD spectroscopy
CD spectroscopy was used to assess the secondary structure content of 132A His and 159A His . Under non-denaturing conditions both proteins show far-UV spectra typical of unfolded proteins, characterized by the presence of a deep minimum in the vicinity of 200 nm and a relatively low ellipticity at ~220 nm (Figure 3A) [32]. Spectra were deconvoluted, revealing the presence of 76% and 63% random coil secondary structure in 132A His and 159A His , respectively (Table 1). This indicates that at physiological temperature and pH, 132A His and 159A His are mainly disordered, supporting the hypothesis that the A-domains are IDPs.
Figure 2 Expression and purification of recombinant A-domains of atToc159 and atToc132.
Figure 3 Purified recombinant A-domains of atToc159 and atToc132 were analysed using circular dichroism (CD) and fluorescence spectroscopy. (A) Far-UV CD spectra of 132A His and 159A His at 25°C and pH 8.0. Temperature-dependent and pH-dependent far-UV CD spectra of 132A His (B and C) and 159A His (D and E) are also shown. Summary of the deconvoluted data is shown in Table 1. Intrinsic fluorescence of 132A His excited at 295 nm was measured at pH 3.2 and 7.5 (inset, panel C).

Effects of temperature and pH on A-domain structure
To further characterize the structural properties of the A-domains, the effects of temperature and pH on the conformation of 132A His and 159A His were investigated. Both 132A His and 159A His exhibit a modest temperature-induced gain in secondary structure, as shown by an increase in negative ellipticity at ~220 nm with increasing temperature (Figure 3B, D). Spectra deconvolution reveals that the α-helical content of 159A His increases from 5% to 7%, and β-sheet content increases from 32% to 38% at 65°C as compared to 25°C, coinciding with a decrease in random coil content from 63% to 55% (Table 1). Random coil content of 132A His also decreases from 76% (at 25°C) to 49% (at 65°C), and again, there is a concomitant gain in β-sheet content from 22% to 43%, and in α-helical content from 3% to 7% (Table 1). Such gains in secondary structure with increasing temperature are characteristic of IDPs, and are in contrast to the typical loss of structure associated with the heating of globular proteins [26,32]. In addition, both proteins gain considerable overall secondary structure at low pH (Figure 3C, E). Specifically, 159A His contains 51% random coil, 8% α-helix and 41% β-sheet at pH ~3, compared to 63% random coil, 5% α-helix and 32% β-sheet at neutral pH. Likewise, 132A His contains 55% random coil, 25% α-helix and 20% β-sheet at pH ~3 compared to 76% random coil, 3% α-helix and 22% β-sheet at neutral pH (Table 1). These increases in secondary structure at low pH may be attributed to a decrease in net charge at a pH below their respective theoretical pI values of 4.25 (132A His ) and 4.0 (159A His ). Presumably, a decrease in net charge leads to a decrease in electrostatic repulsion between negatively charged residues, allowing for partial folding. An increase in structure for 132A His at low pH can also be detected using fluorescence spectroscopy (Figure 3C, inset).
132A His contains two Trp residues (residues 225 and 234) that fluoresce when excited at 295 nm (159A His does not contain Trp, so was not analyzed using fluorescence spectroscopy). The fluorescence maximum of 132A His shifts to a lower wavelength at pH 3, suggesting that the Trp residues are less exposed to solvent as a result of partial folding at low pH, a commonly observed phenomenon of acidic IDPs [26,32].
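The argument that net charge collapses as the pH approaches and drops below the pI can be made concrete with a simple Henderson-Hasselbalch estimate. The sketch below is illustrative only: the pKa values are an assumed, commonly tabulated set (not those used to compute the pI values reported here), the function name is hypothetical, and the sequence is a toy acidic peptide rather than an A-domain sequence.

```python
# Minimal sketch: estimated net charge of a peptide as a function of pH.
# Assumption: independent ionizable groups with generic textbook-style pKa values.

PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}            # protonated form carries +1
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}   # deprotonated form carries -1
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(seq, ph):
    """Estimate net charge at a given pH using Henderson-Hasselbalch."""
    seq = seq.upper()
    # Free N-terminus (positive) and C-terminus (negative)
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for aa in seq:
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


# Hypothetical toy acidic sequence for illustration only
ACIDIC_TOY = "EEDSDEEGEEKSDDEETSEE"

if __name__ == "__main__":
    for ph in (7.5, 4.0, 3.0):
        print(f"pH {ph}: net charge {net_charge(ACIDIC_TOY, ph):+.2f}")
```

For an acidic toy sequence of this kind, the estimated charge is strongly negative near neutral pH and close to zero around pH 3, which is the reduction in electrostatic repulsion invoked above.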

Trifluoroethanol induces structure of A-domains
In the presence of trifluoroethanol (TFE) both 132A His and 159A His show a notable increase in secondary structure from ~3-5% α-helix, 22-32% β-sheet in the absence of TFE to 28% α-helix, 28% β-sheet in 50% TFE (Figure 4, Table 1). These results, as well as the behaviour of the proteins at temperature and pH extremes, highlight the conformational flexibility of the A-domains and indicate they have the ability to form secondary structure depending on their environment. This conformational flexibility may reflect an ability to undergo conformational changes as part of their physiological function, for example during ligand binding. Several IDPs are noted for their ability to undergo significant conformational changes upon binding to their substrates (reviewed in [21]).
Interestingly, the A-domains showed differences in the amount and types of secondary structure gained at low pH and in the presence of 50% TFE. In particular, 132A His gained more structure than 159A His under these conditions. These differences in conformational flexibility could reflect functional differences between the A-domains of atToc132 and atToc159.

Discussion
As part of an evolutionary study into the origin of Toc159, it was suggested that the A-domain of atToc159 might be natively unfolded [20]. In the current study, we took a structural approach to investigate this possibility in more detail and to gain insight into the function of the A-domain. We started by using disorder prediction programs, which strongly predicted the A-domains of atToc132 and atToc159 (as well as atToc120 and psToc159) to be unstructured. In agreement with these predictions, the A-domains were shown experimentally to be disordered under non-denaturing conditions, and underwent structural changes characteristic of IDPs at extremes of temperature and pH. Furthermore, in the presence of 50% TFE, both A-domains gained considerable structure, which together with the effects of extreme temperature and pH, shows that the proteins have the propensity to shift to a more ordered state under certain conditions, which could include association with binding partners. Overall, the data presented here are consistent with the classification of the A-domains as intrinsically disordered protein domains. To date, the function of the A-domains remains largely unknown, with the exception of the recently suggested role in binding to actin filaments [19]; thus, the identification of the Toc159 family A-domains as IDPs has several potential implications for their function. In general, IDPs have a large surface area under physiological conditions, allowing them to interact with several binding partners simultaneously [26]. Indeed, the A-domain accounts for almost 50% of the total length of atToc159, which represents a large surface area available for multiple protein-protein interactions. Finally, perhaps the most intriguing potential function for the A-domain that emerges from the finding that it is an IDP is a role in transit peptide recognition.
Transit peptides are variable in length (typically 50-70 amino acids) and sequence, are rich in hydroxylated amino acids, scarce in acidic amino acids, and lack a defined three-dimensional structure in aqueous solution [36][37][38][39]. It is unknown precisely how they are recognized by receptors of the Toc complex; however, it has been shown that subgroups of transit peptides contain distinct motifs that affect their import efficiency and receptor specificity [12,40-43]. Therefore, it is interesting to speculate that the disordered nature of the A-domains may facilitate interactions with multiple motifs within transit peptides, allowing for differential recognition of preproteins. The differences in structural dynamics observed between 159A His and 132A His in this study may be reflective of such an ability to discriminate between preproteins. In addition, it has been proposed that IDPs with large surface area may act in a "fly-casting" mechanism to increase the speed of low affinity protein-protein interactions [44], which would allow for preproteins to be efficiently passed to downstream components of the chloroplast protein import apparatus. Such transient, low affinity interactions would be consistent with the reversible, energy-independent binding of preproteins to chloroplasts at the initial stages of import [2,5], and may also partially explain the inability to detect A-domain-preprotein interactions in vitro [6], as well as discrepancies observed in the order of preprotein binding to Toc34 and Toc159 [1,45]. Analogously, the Tom70 receptor of the yeast mitochondrial protein import apparatus contains a disordered region that possesses multiple interaction sites for its mitochondrial protein substrates [46,47].
Figure 4 132A His and 159A His gain structure in the presence of trifluoroethanol (TFE). Far-UV CD spectra of 159A His (A) and 132A His (B) in the absence (-TFE) and presence of 10% and 50% TFE at 25°C. A summary of the deconvoluted data is shown in Table 1.
While an interaction between the Toc159 family A-domains and transit peptides is an attractive mechanism for transit peptide recognition, detection of transient protein-protein interactions is technically challenging, and it is not yet clear whether the techniques employed here will be sufficient to detect potential interactions between the A-domain and transit peptides. CD may prove useful in this regard if association is accompanied by a disorder-to-order transition [26]; the acquisition of α-helical structure in the presence of TFE suggests that the A-domain has a propensity to do so (Figure 4). Techniques such as surface plasmon resonance, isothermal titration calorimetry, and nuclear magnetic resonance may also prove useful in future attempts to test whether such (transient) interactions take place; some of these techniques have been used in studies on one of the best characterized IDPs, the phosphorylated kinase inducible activation domain (pKID), and others [26,48,49].

Conclusions
In summary, the A-domain represents a large portion of the Toc159 receptors and differs significantly among members of this family. The function(s) of this domain, however, has remained elusive. In this study, the structure of the A-domains has been investigated for the first time.
The finding that the A-domains are intrinsically disordered has implications for understanding their function(s), and future studies on the Toc159 receptors will be aimed at identifying A-domain binding partners to help elucidate the role of this domain in chloroplast protein import.

Cloning, expression, and purification of 132A His and 159A His
The first 1365 or 2181 basepairs of the atTOC132 and atTOC159 cDNAs, which correspond to the A-domains of atToc132 (132A) and atToc159 (159A), respectively, were sub-cloned by PCR using cDNA clones as templates [10,15]. Primer-adapters were used to incorporate an N-terminal His6  Signal was captured using a Bio-Rad Fluor-S MultiImager in high sensitivity mode, equipped with a Nikkor AF 50 mm lens (Nikon), using an f-stop of 1.4 and an exposure time of 2 to 4 min. The images were analyzed using Quantity One 1-D Analysis software v4.6 (Bio-Rad Laboratories Inc.).
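As a quick consistency check on the construct sizes, the sub-cloned cDNA lengths can be converted to residue counts and compared with the A-domain boundaries (residues 1-455 and 1-727). The sketch below also gives a rough mass estimate; the average residue mass of 110 Da is an assumed rule-of-thumb value, so the estimate ignores the His tag and the actual amino acid composition and is not expected to reproduce the reported theoretical masses (~50 kDa and ~76 kDa) exactly. The function names are hypothetical.

```python
# Minimal sketch: coding length (bp) -> residue count -> rough mass estimate.
# Assumption: ~110 Da average residue mass (rule of thumb, composition-independent).

AVG_RESIDUE_MASS_DA = 110.0


def residues_from_bp(n_bp):
    """Number of encoded residues for an in-frame coding region of n_bp basepairs."""
    if n_bp % 3 != 0:
        raise ValueError("coding length is not a multiple of three")
    return n_bp // 3


def rough_mass_kda(n_residues):
    """Very rough molecular mass estimate in kDa."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    for name, n_bp in (("132A", 1365), ("159A", 2181)):
        n_res = residues_from_bp(n_bp)
        print(f"{name}: {n_bp} bp -> {n_res} residues, "
              f"roughly {rough_mass_kda(n_res):.0f} kDa")
```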

CD measurements and analysis
Far-UV CD spectra were measured on an Aviv 215 spectropolarimeter (Aviv Biomedical). Measurements were performed using rectangular quartz cells with 0.1 cm pathlength. 132A His and 159A His were measured at concentrations of 5 μM or 2.5 μM in CD buffer. Samples were equilibrated at the indicated temperature for 10 min prior to measurements, and pH was adjusted immediately prior to measurement for pH-dependent experiments. Spectra of protein samples and the buffer baseline were measured with a 0.5 nm/s scanning speed at 0.5 nm intervals, and were an average of four scans. Averaged buffer baseline spectra were subtracted from averaged protein sample spectra and the resultant corrected spectra were converted to mean residue ellipticity. Spectra were deconvoluted on the Dichroweb website [51] using the K2D method [52].
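The processing steps described above (scan averaging, baseline subtraction, and conversion to mean residue ellipticity) can be sketched as follows. This is an illustrative outline, not the software used in this study. It assumes the standard conversion [θ] = θ_obs(mdeg) / (10 × c(mol/L) × l(cm) × n), giving units of deg cm^2 dmol^-1, where n is the number of residues (some protocols use the number of peptide bonds instead). All function names and the numerical inputs in the example are hypothetical.

```python
# Minimal sketch: average scans, subtract buffer baseline, convert to
# mean residue ellipticity (MRE, deg cm^2 dmol^-1).


def average_scans(scans):
    """Point-by-point average of repeated scans (lists of equal length, in mdeg)."""
    return [sum(values) / len(values) for values in zip(*scans)]


def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Convert an observed ellipticity (mdeg) to mean residue ellipticity."""
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)


def process_spectrum(sample_scans, buffer_scans, conc_molar, path_cm, n_residues):
    """Average, baseline-correct, and convert a spectrum to MRE."""
    sample = average_scans(sample_scans)
    baseline = average_scans(buffer_scans)
    return [
        mean_residue_ellipticity(s - b, conc_molar, path_cm, n_residues)
        for s, b in zip(sample, baseline)
    ]


if __name__ == "__main__":
    # Hypothetical two-point spectra (mdeg); real spectra have one value per 0.5 nm.
    sample_scans = [[-10.0, -10.0], [-12.0, -8.0]]
    buffer_scans = [[-1.0, -1.0]]
    # 5 uM protein, 0.1 cm pathlength, 455 residues (tag residues not counted here)
    print(process_spectrum(sample_scans, buffer_scans, 5e-6, 0.1, 455))
```

Expressing the spectra per residue makes constructs of different length and concentration, such as 132A His and 159A His, directly comparable before deconvolution.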