Pattern similarity study of functional sites in protein sequences: lysozymes and cystatins
© Nakai et al; licensee BioMed Central Ltd. 2005
Received: 08 July 2004
Accepted: 18 May 2005
Published: 18 May 2005
Although it is generally agreed that topography is more conserved than sequences, proteins sharing the same fold can have different functions, while there are protein families with low sequence similarity. An alternative method for profile analysis of characteristic conserved positions of the motifs within the 3D structures may be needed for functional annotation of protein sequences. Using the approach of quantitative structure-activity relationships (QSAR), we have proposed a new algorithm for postulating functional mechanisms on the basis of pattern similarity and average of property values of side-chains in segments within sequences. This approach was used to search for functional sites of proteins belonging to the lysozyme and cystatin families.
Hydrophobicity and β-turn propensity of reference segments with 3–7 residues were used for the homology similarity search (HSS) for active sites. Hydrogen bonding was used as the side-chain property for searching the binding sites of lysozymes. The profiles of similarity constants and average values of these parameters as functions of their positions in the sequences could identify both active and substrate binding sites of the lysozyme of Streptomyces coelicolor, which has been reported as a new fold enzyme (Cellosyl). The same approach was successfully applied to cystatins, especially for postulating the mechanisms of amyloidosis of human cystatin C as well as human lysozyme.
Pattern similarity and average index values of structure-related properties of side chains in short segments of three residues or longer were, for the first time, successfully applied for predicting functional sites in sequences. This new approach may be applicable to studying functional sites in un-annotated proteins, for which complete 3D structures are not yet available.
In their recent review of protein sequence analysis in silico, Michalovich et al.  described the methodology for transferring functional annotation of known proteins to a novel protein. Computer-assisted technology is used to search for and assign the similarity from databases of well-maintained and previously annotated sources. Sequence-based and profile-based searches are conducted using BLAST and PSI-BLAST, respectively. Meanwhile, the Hidden Markov model is more efficient in searching for a distant family. Furthermore, structure-based annotation conducted by using a combination of PSI-BLAST and GenThreader (matching of substitution energy in evolution) may facilitate rapid functional annotation from structure . However, proteins sharing the same fold can have different functions, and structure determination and analysis will not always mean that function can be derived . There are examples of protein families, such as the four-helical cytokine and cytochrome super families, whose sequence similarities are either very low or not detectable . Instead, their topography is more conserved than their sequences. This is rational, since protein functions are classified based on function per se, regardless of whether their sequences or 3D structures are similar or different. An example is the classification of a protein as possessing the function of lysozyme activity, as long as the protein possesses the ability of hydrolyzing peptidoglycans.
Another direct approach, peptide QSAR, has been investigated in parallel for peptide sequence analysis. A critical difference between these two approaches, namely bioinformatics and QSAR, is that the former requires 3D structure information on the basis of evolutionary conservation, whereas for the latter 3D information is helpful but not always indispensable, because simpler steric parameters can be substituted to account for the functional mechanism. For example, Hellberg et al. used altogether 29 properties of side chains of bioactive peptides. After dimension reduction using principal components analysis (PCA), the resultant three main PC scores, i.e., z1, z2 and z3, representing hydrophobicity, molecular size and electronic parameter, respectively, were used as independent variables in regression analysis on the dependent variable of functionality.
Meanwhile, by using the homology similarity analysis (HSA), we have found the importance of functional segments within 15-residue sequences of lactoferricin derivatives to correlate with the minimum inhibitory concentration (MIC) . Pattern similarity constant (a correlation coefficient) of the pattern of segments within a test derivative, in comparison to the reference pattern of the corresponding segment and the average of property values of the amino acid side-chains in the most potent derivative, was computed and correlated with MIC of the derivatives. In order to obtain the best (lowest) MIC, the pattern similarity should be close to 1.0 and the average property value should be close to that of the reference potent peptide (template). In the case of the above lactoferricin derivatives, higher correlation coefficients were obtained for log MIC predicted by HSA vs. measured log MIC computed as the output variables of regression ANN (artificial neural networks) than by sequence analysis based on the Hellberg approach . More recently, a different approach, namely "additive QSAR" obtained by substituting with other amino acid residues at different positions in the same sequences, was reported to correlate well with peptide functions .
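The two HSA quantities described above, the pattern similarity constant and the average property value, can be sketched as follows. This is a minimal illustration rather than the published software: it assumes the similarity constant is a Pearson correlation between the side-chain property profiles of two aligned, equal-length segments, and it uses the Kyte-Doolittle hydropathy scale as a stand-in for the property indices of the study. The function names and the example segments are hypothetical.

```python
from math import sqrt

# Kyte-Doolittle hydropathy values: a stand-in index chosen for this sketch,
# not necessarily the scale used in the study.
KD = dict(zip("ARNDCQEGHILKMFPSTWYV",
              [1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
               3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2]))


def average_property(segment, scale=KD):
    """Mean side-chain index value over all residues of a segment."""
    values = [scale[aa] for aa in segment]
    return sum(values) / len(values)


def similarity_constant(test, reference, scale=KD):
    """Pearson correlation between the property profiles of two segments."""
    x = [scale[aa] for aa in test]
    y = [scale[aa] for aa in reference]
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("segments must have equal length of at least 3")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a flat profile has no defined correlation
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def hsa(test, reference, scale=KD):
    """Return (similarity constant, average of test, average of reference)."""
    return (similarity_constant(test, reference, scale),
            average_property(test, scale),
            average_property(reference, scale))


# Hypothetical 5-residue segments, not taken from the lactoferricin data.
sim, avg_test, avg_ref = hsa("IRRIV", "LKKLL")
```

A derivative would then be judged favourable when `sim` approaches 1.0 and `avg_test` approaches `avg_ref`, which mirrors the two criteria stated in the text.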
Lejon et al.  reported that in PCA analysis of peptide sequences, information of the positions of side chains in the sequence should be included for improving R2X value compared to the results obtained by computing without side chain position data (0.99 vs. 0.60, respectively). Our HSA approach, by segregating segments with and without α-helix propensity, was in good agreement with theirs (R2X of 0.90–0.94 compared to corresponding value of 0.60 but with a much larger number of derivatives). We have further extended this approach to infer the mechanism of emulsifying capacity of peptides with 10–32 residues as a function of hydrophobic periodicity . For the study of emulsification function, a new homology similarity search (HSS) was introduced to plot similarity constants and average property values of segments (3–7 residues) by shifting the segment stepwise from N-terminus towards C-terminus of the sequences; the reference segment used was ELE, i.e., alternate cycle of charged (E), hydrophobic (L) and charged (E) residues. However, emulsification ability is a rather general function of peptides that is not dependent on specific active sites within the sequences; overall, the emulsification ability of peptides was highly correlated with hydrophobic periodicity of their entire sequences.
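The stepwise search described above can be sketched as a sliding window. The code below is an illustrative reading of the HSS idea, not the authors' program: it assumes a Pearson correlation as the similarity constant, uses Kyte-Doolittle hydropathy as a stand-in property index, and moves the window one residue at a time from the N-terminus. The name `hss_profile` is hypothetical, and gapped sequences would need the gap character handled before lookup.

```python
from math import sqrt

# Kyte-Doolittle hydropathy values, used only as a stand-in property index.
KD = dict(zip("ARNDCQEGHILKMFPSTWYV",
              [1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
               3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2]))


def pearson(x, y):
    """Pearson correlation of two equal-length lists (nan if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sqrt(sxx * syy)


def hss_profile(sequence, reference, scale=KD):
    """Slide a window the size of `reference` from N- to C-terminus.

    Returns a list of (start position, similarity constant, window average),
    with 1-based start positions.
    """
    ref = [scale[aa] for aa in reference]
    width = len(reference)
    rows = []
    for start in range(len(sequence) - width + 1):
        window = [scale[aa] for aa in sequence[start:start + width]]
        rows.append((start + 1, pearson(window, ref), sum(window) / width))
    return rows


# Reference segment ELE from the text, applied to a hypothetical toy peptide.
profile = hss_profile("AELEA", "ELE")
```

Plotting the second and third elements of each row against the first gives the kind of position profile described in the text.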
Some peptides have no definitive functional sites but still require specific segments, whereas some functions require neither specific sites nor specific segments. The lactoferricin derivatives described in the above study are an example of the former, since all of the mutants were prepared as derivatives of the corresponding wild-type lactoferricin 15-residue sequence, which has distinct helical and cationic segments. In contrast, the peptide emulsions are an example of the latter. Typical examples of proteins with definitive functional sites are enzymes, for which the positions of active sites are critical to elucidate the functional mechanisms. Defective protein folding leading to amyloid fibril formation has been associated with various human diseases, such as Alzheimer's and Creutzfeldt-Jakob diseases. In 1993, hereditary non-neuropathic systemic amyloidosis was reported to be caused by naturally occurring variants of human lysozyme that aggregated in the liver. Similarly, cystatin C mutation in an elderly man was reported to be the cause of amyloid angiopathy and intracerebral hemorrhage.
The recent discovery of a new-fold enzyme named Cellosyl  led us to select the lysozyme family in this study as an important one to use for validating the HSS approach to search for functional sites . Meanwhile, loss of papain inhibitory activity in recombinant human cystatin C was reported to be due to insolubilization . Assuming that this loss was induced by an amyloidosis, changes of helix-to-strand in the inhibitory sites as well as the binding sites with papain could also be used for a rational example of application of the HSS approach in this study.
The objective of this paper was to extend application of this new HSS approach to search for functional sites, such as active and substrate binding sites in lysozyme and amyloidosis of cystatin families, to verify the reliability of our new method. The intention was to validate the hypothesis that the evaluation of pattern similarity of short segments with 3–7 residues or even slightly longer in protein sequences is useful in predicting functionality, assuming that they are within allowable topographical units. Accordingly, it is not our intention to replace the 3D approach by the new peptide QSAR proposed in this study; rather, it is anticipated to be supplemental.
Segment pattern similarity search for active sites
Determination of active sites in sequences of lysozymes in different families
HSS search for substrate binding sites
Determination of substrate binding sites in amino acid sequences of lysozyme families
For the goose lysozyme, six potential binding positions were detected (Table 2); I113 and G163 appear more likely to be the binding sites than the other positions, considering the location of the cleft in the molecule. Similarly, nine potential sites were found for T4 lysozyme, especially three positions, i.e. M6, L66 and S136 (Table 2). In the case of Cellosyl, eight positions were identified as the potential binding sites (Table 2). Probably due to the considerable difference in 3D structure between this lysozyme and those of other lysozyme families, the alignment positions 40–45 (T34EGTNY) instead of hen's 84–89 (I55LQINS) were used as a reference segment for obtaining more rational search results. Instead of hydrogen-bonding motivated interactions, less polar van der Waals interaction with the aromatic side chains in CH-lysozyme may be regarded as the second most important stereochemical force in substrate binding.
In the literature, the most frequently cited substrate-binding sites in the c-type lysozyme family have been W62 and D101 of hen lysozyme. Since Figure 2 includes the distant family of CH-type, the segment similarity search was repeated within the c-type lysozymes alone to restrict the search to a similar fold. The results are shown as "hen 2" and "human 2" in Table 2. Those results almost perfectly match the substrate binding mechanism based on X-ray crystallographic analysis, e.g. D101, N103, N104, A107, V109, E35, N46, V110, E52, N59, and W63. Almost all of these side-chains are very close or adjacent to the segments listed in "hen 1" and "hen 2" of Table 2.
Substrate binding sites reported by site-directed mutagenesis
Among three mutants obtained by replacing W62 with Y, F or H, the W62H mutant, and especially the double mutant W62H/D101G, reduced substrate binding drastically. This change can be explained by a decrease in the hydrogen bond average value from 0.58 to 0.54 and from 0.46 to 0.29 in W62H and D101G, respectively, when the index values employed in this study were used in computation. The double mutant changed the substrate-binding mode while maintaining an overall protein structure almost identical to that of the wild type. An extensive cluster of hydrophobic structure involves distinct regions of the sequence, but is entirely disrupted by the single point mutation W62G, located at the interface of the two major structural domains in the native lysozyme. Similar effects were observed in mutants Y63L and D102E of human lysozyme. The double mutants R41N/R101S and V74R/Q126R of human lysozyme were better catalysts for lysis of Micrococcus lysodeikticus. The average hydrogen bond value of both R41N and R102S was shown to increase in our HSS search, but similar effects could not be observed for V74R/Q126R. An interesting finding is that these two mutations have both resulted in the side chains being identical to those of hen lysozyme. R41 and V74 are near A42 and A73, respectively. The importance of R115 in substrate binding of human lysozyme was reported, which is in good agreement with W112 within the same subsite F (Table 2).
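The before-and-after comparison of segment averages used in this paragraph can be sketched as below. The hydrogen-bonding index values of the study are not reproduced here, so the example uses Kyte-Doolittle hydropathy purely as a stand-in scale and a hypothetical six-residue toy sequence; the numbers it yields are therefore unrelated to the 0.58 and 0.54 values quoted above. The helper names are assumptions made for illustration.

```python
import re

# Kyte-Doolittle hydropathy values: a stand-in for the hydrogen-bonding index,
# which is not reproduced in this sketch.
KD = dict(zip("ARNDCQEGHILKMFPSTWYV",
              [1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
               3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2]))


def apply_mutation(sequence, mutation):
    """Apply a point mutation written as wild type, 1-based position, new residue."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild, pos, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(sequence) or sequence[pos - 1] != wild:
        raise ValueError(f"{mutation}: residue {pos} is not {wild}")
    return sequence[:pos - 1] + new + sequence[pos:]


def segment_average(sequence, start, end, scale):
    """Mean index value over residues start..end (1-based, inclusive)."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("segment lies outside the sequence")
    values = [scale[aa] for aa in sequence[start - 1:end]]
    return sum(values) / len(values)


# Hypothetical toy sequence with W at position 4.
wild_type = "ACDWGK"
mutant = apply_mutation(wild_type, "W4H")
before = segment_average(wild_type, 3, 5, KD)
after = segment_average(mutant, 3, 5, KD)
```

Checking the wild-type residue before substituting guards against numbering mismatches, which matter here because hen and human lysozyme positions are offset by one in this region.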
Active and substrate binding sites of cystatins
Binding site 1
Binding site 2
Substrate binding sites
An HSA study similar to our previous paper was conducted at the active and two binding sites of cystatins, yielding results (Table 3) which are in good agreement with Turk et al. Substrate-binding site 1 had the pattern Q-x(3)-V-[S,A]-G, while substrate-binding site 2 had the pattern [L,I,V]-P-x(3)-x(3)-[N,G]. Similarity constants of binding loop 2 of egg white cystatin (EWC) and HCA are lower than that of HCC, whereas not only similarity but also average hydrophobicity are lower in HCB. Similarity constants at the active site (against 1.0 for HCC) using hydrophobicity index were >0.8 for cystatins A, D, F and hen, ~0.5 for E and M, and 0.1–0.2 for cystatins B, S, SA and SN. Similarity constants at binding loop 2, when strand propensity was used for PCS computation, were lower for stefins A and B, with values of 0.8 and 0.6, respectively, than the values of >0.9 for other cystatins. On the other hand, similarity constants for strand at binding site 1 were not much different among different cystatins, with values >0.9 (not included in Table 3).
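The two site patterns quoted above can be turned into regular expressions for scanning other sequences. The sketch below assumes the notation means what it does in PROSITE-style patterns (x(n) for n arbitrary residues, bracketed letters for alternatives) and writes the alternatives with commas as in the text. The function names and the toy sequence are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate a pattern such as 'Q-x(3)-V-[S,A]-G' into a regular expression."""
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        gap = re.fullmatch(r"x(?:\((\d+)\))?", element)
        choice = re.fullmatch(r"\[([A-Z](?:,[A-Z])*)\]", element)
        if gap:
            # 'x' is one arbitrary residue, 'x(n)' is n arbitrary residues
            parts.append("." if gap.group(1) is None else ".{%s}" % gap.group(1))
        elif choice:
            # '[S,A]' becomes the character class '[SA]'
            parts.append("[" + choice.group(1).replace(",", "") + "]")
        elif re.fullmatch(r"[A-Z]", element):
            parts.append(element)
        else:
            raise ValueError(f"unrecognised pattern element: {element!r}")
    return "".join(parts)


def find_sites(sequence, pattern):
    """Return (1-based start, matched segment) for every hit, overlaps included."""
    # The lookahead keeps each match zero-width so overlapping hits are reported.
    regex = re.compile("(?=(" + pattern_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Hypothetical toy sequence, not a real cystatin.
hits = find_sites("MKQAAAVSGLL", "Q-x(3)-V-[S,A]-G")
```

A regular-expression hit only says that the residues fit the pattern; the similarity constants and averages discussed in the text are still needed to rank candidate sites.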
It is interesting to note that stefins A and B do not have the PW pair which is in the binding site 2 of HCC and EWC; instead they have PG and PH pairs, respectively (Table 3). The W → G replacement increased strand propensity, while W → H replacement did so moderately. The values shown in Table 3 were almost inversely proportional to the equilibrium inhibition constant Ki except for human cystatin S that was weak in the inhibitory activity, which might have been due to the difference in phosphorylation of serine at N-terminal region. Although stefins A and B are classified differently from other groups on PCS scattergram (Fig. 5), the weak binding at the binding site 2 may not have considerable effects on the Ki values.
Turk et al. have stated that the differences in the binding constants between cystatins and various cysteine proteases arise primarily from differences in the structure of enzyme active site clefts. The inhibition of endopeptidases, i.e. papain and cathepsins S and L, by cystatins is extremely tight and rapid, whereas the inhibition of exopeptidases, i.e. cathepsins B and H, is considerably weaker. The active site cleft of known endopeptidases is free to accommodate inhibitors, while in the case of exopeptidases, the active site cleft contains extra residues. In the N-terminal region of cystatins, it was observed that the affinity for target proteases decreased with both size and charge of substituting residues. These observations are in good agreement with the results obtained when bulkiness of side chains was used for the HSS computation at binding site 1: the "SimConst/Av.bulk" values were 1.00/12.3, 0.92/14.2, 0.98/11.4 and 0.91/11.34 for HCC, EWC, HCA and HCB, respectively. As expected, stefins A and B were less bulky. Furthermore, HCC and EWC include longer chains at the N-terminal sides with bulkier residues than those of stefins. These findings are in good agreement with the effect of bulkiness of G4 in the stefin A sequence, implying that the bulkier the residue at position 4, the weaker the papain inhibitory activity.
HSA computation for I55T lysozyme
Of 86 residues in the strand domain (positions 36–121) of HCC, 21 residues (positions 36–120) were mutated in the 23 double-site mutants using the RCG program . Thirty-seven residues were used for the PCS computation of single-site mutations as described above. Hydrophobicity appeared to be playing an important role in thermostability, while strand propensity was important for inhibitory activity (data not shown). Strand and helix propensities in the strand domain were influential to the papain inhibitory activity of HCC (Fig. 7). The figures show 12 data points only by eliminating data from single site mutation in the helix domain, which did not show distinct trends with broader scatter in these figures. The second mutations in addition to the above single mutations were conducted at the strand domain of the enzyme . Coefficient of determination of 1.0 and slope of 1 indicate perfect match with the reference sample (5*) that is mutant G12W/H86V with the lowest strand propensity along with highest helix propensity in the strand domain among 23 double mutants. It is worth noting that the PCS is a classification program comparing pattern similarity without demonstrating quantitative relationships with functions but providing the information of the extent of involvement of side chain properties in the functions of interest.
For mutant G12W/H86V, which gained the greatest activity increase of 4.98 ± 0.09 times (mean ± SD at n = 3) that of the recombinant wild type, the strand propensity decreased from 0.78 to 0.69 (H86V) with a slight increase in the helix propensity (corresponding to a decrease in the index values). The same was true for mutant D15P/H86I with 2.65 ± 0.30 times activity increase. A similar result was observed in mutant G4L/D40I with 2.11 ± 0.29 times activity increase, due to strand decrease along with almost no change in helix (D40I). However, mutant V10S/R93G with 4.50 ± 0.07 times activity increase behaved differently, with increased strand and a simultaneous decrease in helix. It is worth noting that the single mutation of V10S alone increased the activity 2.96 ± 0.06 times; therefore, changes in the helix domain may have a more predominant effect on the inhibitory activity than mutations in the strand domain. The activity change due to mutation helix → strand in the strand domain in the sequence may be slight in this case.
It is well known that a single-point mutation of human lysozyme, namely I56T, has been identified as the origin of hereditary systemic amyloidosis . The amyloidogenic nature of the lysozyme variants arises from a decrease in the stability of the native fold relative to partially folded intermediates. Accordingly, in a low population of soluble, partially folded species, the protein can aggregate in a slow and controlled manner to form amyloid fibrils. Similarly, sporadic amyloid angiopathy and intracerebral hemorrhage was reported in an elderly man due to cystatin C mutation . In the case of human cystatin C, the decrease in strand along with an increase in helix might have prevented amyloidosis, despite the fact that helix change was not always as evident as in the case of lysozyme. Some inconsistency in the amyloidosis as a cause of inhibitory activity of human cystatin C in our mutation optimization  may be due to lack of the data of single-site mutation in the strand domain of the cystatin sequence. Unfortunately, the objective of that study  was for mutation optimization and not for investigation of the mechanism of amyloidosis. It has been reported that stefin B (HCB) readily formed amyloid , which may imply declined importance of the role being played by the binding site 2 in amyloidosis of HCC.
In a review on the quest to deduce protein function from sequences , the author stated that the searching of pattern databases would be more sensitive and selective than searching of sequence database. It was predicted that the sequence pattern databases, especially by comparing the pattern similarity, would play an increasingly important role, as the post-genome quest to assign functional information to raw sequence data gains pace . Pattern similarity computation requires at least three residues in segments to represent a nonlinear curve, which is unlikely to be due to the effect of a single point mutation per se.
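A supplementary way to see the three-residue minimum, assuming the similarity constant is the Pearson correlation coefficient r between the property profiles x and y of two segments of n residues (as stated in the Methods), is to write r out for n = 2. The derivation below is an illustration added under that assumption; it shows that a two-residue segment can only return +1 or -1 and so carries no information about the shape of the pattern.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

% For n = 2 the deviations are fixed by the difference of the two values:
% x_1 - \bar{x} = \tfrac{1}{2}(x_1 - x_2), \qquad x_2 - \bar{x} = -\tfrac{1}{2}(x_1 - x_2)

r_{n=2} = \frac{\tfrac{1}{2}(x_1 - x_2)(y_1 - y_2)}
               {\tfrac{1}{\sqrt{2}}\,|x_1 - x_2| \cdot \tfrac{1}{\sqrt{2}}\,|y_1 - y_2|}
        = \operatorname{sgn}\!\big[(x_1 - x_2)(y_1 - y_2)\big] = \pm 1
```

With three or more residues the value of r can fall anywhere between -1 and +1, so it begins to reflect how closely the two profiles follow the same curve.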
With regard to an apparent effect of the single residue mutations of hen lysozyme on substrate binding, the structural analysis by NMR of the position-62 mutant of hen lysozyme [18, 30] found major changes in the chemical shift of backbone protons, especially in a loop region (positions 61–78), which contains W62 influencing the local folding. Similarly, Muraki et al. reported that compared to the wild-type human lysozyme, the N-acetylglucosamine residue at subsite B of the L63 mutant markedly moved away from the 63rd residue, with substantial loss of hydrogen-bonding interaction. In Figure 5 of Ref 17, involvement of not only Y63 but also W64 is evident. These results support the importance of pattern similarity of ≥ 3 residues, which are affected by single-residue mutation.
The predictability of the active and binding sites solely on the basis of protein sequences  may be useful for investigating the underlying mechanisms of unknown functions of human genes after translation to protein sequences. Usually, two or three essential residues are directly involved in the bond making and breaking steps leading to formation of enzyme catalysis; however, the removal of an essential group often does not abolish activity, but can significantly alter the catalytic mechanism . T4 lysozyme was cited by Peracchi  as an example of the alteration of catalytic mechanisms; the lytic activity of lysozyme changed to that of a transglucosidase.
An approach utilizing the property of side chains in a sequence for identifying functional motifs has already been utilized in the computer-assisted selection of antigenic peptide sequences . The authors stated that an antibody produced in response to a simple linear peptide with 7–9 residues in a protein would most likely recognize a linear epitope. Furthermore, this epitope must be solvent-exposed to be accessible to the antibody. In a large scale data mining study, Binkowski et al.  described the importance of local sequence and spatial surface patterns in inferring functional relationships of proteins. The general feature of protein structure that would correspond to these criteria could be turns or loop structures, which are generally found on the molecular surface connecting to other elements of secondary structure, and the area of high hydrophobicity, especially for those containing charged residues.
For multiple sequence alignment of an uncharacterized protein or peptide, many Web alignment servers are available, such as BLAST and NPSA, as used in this study. For classification of uncharacterized sequences, the PCS scatterplots are also useful, as shown in Figures 1, 5 and 7. The PCA demonstrated classifying capacity superior to that of distance-based cluster analysis. The PCS is more flexible than cluster analysis, as different similarity patterns can be drawn by rotating the reference segment used for searching. This implies that similarity is not always [1 – dissimilarity]. This difference made it possible to select outliers, which is critical in deriving true classes or ranking. Most of the currently available peptide QSAR methods, such as that of Hellberg et al., are based on whole sequence data. The new HSS approach reported in this study could be just the beginning of more detailed, reliable peptide QSAR to be developed in the future. Analysis of a variety of bioactive proteins contributing to human health is a potential future application of the HSS software package as well as multifunctional PCS. Considering the multifunctional nature of human diseases, the functionality of food proteins also can be manipulated based on combinations of bioactive segments in different or even single natural protein sequences. Therefore, for an uncharacterized protein or peptide, a new plan is proposed: (1) A reference sequence is chosen from multiple sequence alignment (MSA) as discussed above; PCS scattergrams would assist this selection in addition to BLAST search. (2) Based on segments with high similarity in MSA, segments to be used for search are selected within the reference sequence. (3) HSS search is conducted to identify functional segments in the uncharacterized sequence. (4) From the above PCS computation, important PC scores are screened (PCA is a subroutine of PCS). (5) Regression neural networks are conducted using selected PC scores as input variables, as exemplified in our lactoferricin derivative study. (6) RCG would be useful for confirming the HSS data and also for finding the best segment or sequence, as exemplified in our HCC mutation study.
One of the original purposes of our new approach in unsupervised data mining was to verify the hypothesis that there might be adaptability of different human cystatins to better inhibit different human cathepsins . This hypothesis has not been fully pursued in the past, probably because of costly separation of pure cystatins and cathepsins. An advantage of our approach is to derive potential hypothesis for enzyme/substrate interactions exclusively from their sequence data. Although the verification of those hypotheses may need to await future 3D-structure study, it is important that most of the useful QSAR data could become available, which would promote the functional mechanism study based on 3D structure. However, we admit that more examples of application should be performed in the future to more thoroughly verify and establish this method for predicting functions based on sequences. This work is underway in our laboratory.
Although the importance of pattern similarities of motifs with 20–30 residues as a whole has been reported for peptide QSAR in the past, the importance of a search for segments with three or more residues as functional sites of protein sequences has not been investigated. Lysozymes and cystatins were used as examples of proteins to demonstrate the capacity of segment pattern similarity analysis to predict functions, such as active and binding sites, amyloidosis and thermostability as a tool for quantitative functional sequence analysis.
Amino acid sequences of proteins
Multiple sequence alignments of lysozymes were conducted using the Network Protein Sequence Analysis of Pôle Bio-Informatique Lyonnais  based on Clustal W. Similarly, multiple sequence alignments were obtained for human cystatins A (HCA), B (HCB) and C (HCC) and hen egg white cystatin (EWC) as well as for papain as host proteases of cysteine protease inhibitors, i.e. cystatins. For PCS analysis, a total of 17 cystatins were used: human A, B, C, D, E, F, M, S, SA, SN, hen (EWC), bovine, ratC, mouseC, Chum salmon, Rainbow trout and carp.
Principal components similarity analysis of protein sequences
The method described in the previous papers [9, 39] was followed. Principal components analysis (PCA) was modified to principal components similarity (PCS) by incorporating linear regression of PC scores to be able to account for more than three PC scores on a 2D scatter plot. The PCS was then modified to apply to peptide sequences.
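The regression step that turns PCA into PCS can be sketched as follows. This is an assumption-laden illustration, not the published program: it takes PC scores that have already been computed by any PCA routine, regresses one sample's scores on those of the reference sample by ordinary least squares, and returns the slope and the coefficient of determination that would be plotted on the 2D scatter plot. Any weighting or deviation transform applied in the original software is not reproduced, and the name `pcs_point` and the example scores are hypothetical.

```python
def pcs_point(sample_scores, reference_scores):
    """Regress a sample's PC scores on the reference sample's PC scores.

    Returns (slope, r_squared); the reference itself maps to (1.0, 1.0).
    """
    x = list(reference_scores)
    y = list(sample_scores)
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need the same number (at least 3) of PC scores")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("reference PC scores are constant")
    slope = sxy / sxx
    # r squared is undefined when the sample's scores do not vary at all
    r_squared = float("nan") if syy == 0 else sxy ** 2 / (sxx * syy)
    return slope, r_squared


# Hypothetical PC scores for a reference and one test sample (four PCs each).
reference = [1.0, -2.0, 0.5, 3.0]
sample = [2.0, -4.0, 1.0, 6.0]
slope, r_squared = pcs_point(sample, reference)
```

Because each sample collapses to one (slope, r squared) pair regardless of how many PC scores are used, more than three PCs can be shown on a single 2D plot, which is the motivation given in the text.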
Homology similarity search
Homology similarity search (HSS) was conducted as reported previously. The similarity constant used in this study is essentially a correlation coefficient. A preliminary study was carried out by changing the size of the segment (normally 3–7 residues) flanking the potential functional position to determine the most appropriate segment size for differentiating the functional site from other segments within the sequences of lysozymes and cystatins. The property indices used for amino acid side chains were hydrophobicity, charge, propensities of α-helix, β-strand and β-turn, hydrogen bonding, and bulkiness as reported previously [14, 44]. Segments with pattern similarity close to 1.0 and average values similar to those of the reference segment were sought within each gapped sequence.
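The preliminary scan over segment sizes can be sketched by enumerating every window of 3 to 7 residues that contains a chosen position. The interpretation of "flanking" as "any window containing the position" is an assumption of this sketch, as are the function name and the toy sequence; each window produced this way would then be scored by similarity constant and average value as described above.

```python
def flanking_segments(sequence, position, sizes=range(3, 8)):
    """Yield (size, 1-based start, segment) for each window containing `position`."""
    idx = position - 1
    if not 0 <= idx < len(sequence):
        raise ValueError("position lies outside the sequence")
    for size in sizes:
        first = max(0, idx - size + 1)           # leftmost start still covering idx
        last = min(idx, len(sequence) - size)    # rightmost start that fits
        for start in range(first, last + 1):
            yield size, start + 1, sequence[start:start + size]


# Hypothetical toy sequence; position 5 is the residue of interest.
candidates = list(flanking_segments("ACDEFGHIKL", 5))
```

Windows that would run past either terminus are skipped rather than padded, so positions near the ends yield fewer candidates.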
All software used in this study, along with instructions on how to use the programs, is available as ftp files on the Web for download to a PC.
List of abbreviations
- EWC Egg white cystatin or hen cystatin.
- HCC Human cystatin C.
- HCA Human cystatin A or stefin A.
- HCB Human cystatin B or stefin B.
- HSA Homology similarity analysis: the PCS software was modified to compute pattern similarity constants and average side-chain property index values of segments in sequences.
- HSS Homology similarity search: a step-wise search program initiated from the N-terminus of query sequences by shifting the search unit (reference segment) towards the C-terminus, based on similar segments in terms of pattern similarity constant and average property values compared to those of template sequences.
- MIC Minimum inhibitory concentration.
- PCA Principal components analysis.
- PCS Principal components similarity: PCA modified for multi-functional variables using linear regression of deviation of PC scores on the reference PC scores. Scatter plot is drawn as slope vs. coefficient of determination (r2).
- RCG Random-centroid optimization of site directed mutagenesis.
- QSAR Quantitative structure-activity relationships.
This work was financially supported by a Multidisciplinary Network Grant entitled "Structure-function of food biopolymers" (Dr. Rickey Y. Yada of University of Guelph as the principal investigator) from the Natural Sciences and Engineering Research Council of Canada. The authors acknowledge the collaboration of all co-authors listed in our past publications as shown in the following references. Especially, the drawing of 3D structures to compare lysozyme and α-lactalbumin by Dr. Yasumi Horimoto is highly appreciated.
- Michalovich D, Overington J, Fagam R: Protein sequence analysis in silico : application of structure-based bioinformatics to genomic initiatives. Cur Opinion Pharmacol. 2002, 2: 574-580. 10.1016/S1471-4892(02)00202-3. [Blast (www.ncbi.nlm.nih.gov/blast), NPSA (http://npsa-pbil.ibcp.fr/)]View ArticleGoogle Scholar
- Norin M, Sundsröm M: Structural proteomics: developments in structure-to-function predictions. Trends Biotechnol. 2002, 20: 79-84. 10.1016/S0167-7799(01)01884-4.View ArticlePubMedGoogle Scholar
- Hill EE, Morea VM, Chothia C: Sequence conservation in families whose members have little or no sequence similarity: the four-helical cytokines and cytochromes. J Mol Biol. 2002, 322: 205-233. 10.1016/S0022-2836(02)00653-8.View ArticlePubMedGoogle Scholar
- Hellberg S, Sjöström M, Skagerberg B, Wold S: Peptide quantitative structure-activity relationships, a multivariate approach. J Med Chem. 1987, 30: 1126-1135. 10.1021/jm00390a003.View ArticlePubMedGoogle Scholar
- Giliani A, Benigni R, Zbilut JP, Webber CL, Sirabella P, Colosimo A: Nonlinear signal analysis methods in the elucidation of protein sequence-structure relationships. Chem Rev. 2002, 102: 1471-1491. 10.1021/cr0101499.View ArticleGoogle Scholar
- Nakai S, Chan JCK, Li-Chan EC, Dou J, Ogawa M: Homology similarity analysis of sequences of lactoferricin and its derivatives. J Agric Food Chem. 2003, 51: 1215-1223. 10.1021/jf0206062.View ArticlePubMedGoogle Scholar
- Doytchinova IA, Walshe VA, Jones NA, Closter SE, Borrow P, Flower DR: Coupling in silico and in vitro analysis of peptide-MHC binding: A bioinformatics approach enabling prediction of superbinding peptides and anchorless epitopes. J Immunol. 2004, 172: 7495-7502.View ArticlePubMedGoogle Scholar
- Lejon T, Ström MB, Svendsen JS: Is information about peptide sequence necessary in multivariate analysis?. Chemom Intell Lab Syst . 2001, 57: 93-95. 10.1016/S0169-7439(01)00126-5.View ArticleGoogle Scholar
- Nakai S, Alizadeh-Pasdar N, Dou J, Buttimor R, Rousseau D, Paulson A: Pattern similarity analysis of amino acid sequences for peptide emulsification. J Agric Food Chem. 2004, 52: 927-934. 10.1021/jf034744i.View ArticlePubMedGoogle Scholar
- Pepys MB, Hawkins PN, Booth DR, Vigushin DM, Tennent GA, Soutar AK, Totty N, Nguyen O, Blake CCF, Feest TG, Zalin AM, Hsuan JJ: Human lysozyme gene mutations cause hereditary systemic amyloidosis. Nature. 1993, 362: 553-557. 10.1038/362553a0.
- Graffagnino C, Herbstreith MH, Schmechel DE, Levy E, Roses AD, Alberts MJ: Cystatin C mutation in an elderly man with sporadic amyloid angiopathy and intracerebral hemorrhage. Stroke. 1995, 26: 2190-2193.
- Rau A, Hogg R, Marquardt R, Hilgenfeld R: A new lysozyme fold: crystal structure of the muramidase from Streptomyces coelicolor at 1.65 Å. J Biol Chem. 2001, 276: 31994-31999. 10.1074/jbc.M102591200.
- Fastrez J, Höltje J-V: Phage lysozyme, Bacterial lysozymes. Lysozymes: Model Enzymes in Biochemistry and Biology. Edited by: Jollès P. 1996, Berlin: Birkhäuser Verlag, 35-74.
- Ogawa M, Nakamura S, Scaman CH, Jing H, Kitts DD, Dou J, Nakai S: Enhancement of proteinase inhibitory activity of recombinant human cystatin C using random-centroid optimization. Biochim Biophys Acta. 2002, 1599: 115-124.
- Strynadka NCJ, James MNG: Lysozyme: a model enzyme in protein crystallography. Lysozymes: Model Enzymes in Biochemistry and Biology. Edited by: Jollès P. 1996, Berlin: Birkhäuser Verlag, 185-221.
- Imoto T: Engineering of lysozyme. Lysozymes: Model Enzymes in Biochemistry and Biology. Edited by: Jollès P. 1996, Berlin: Birkhäuser Verlag, 163-181.
- Song HW, Inaka K, Maenaka K, Matsushima M: Structure-change of active site cleft and different saccharide binding modes in human lysozyme co-crystallized with hexa-N-acetyl-chitohexaose at pH 4.0. J Mol Biol. 1994, 244: 522-540. 10.1006/jmbi.1994.1750.
- Maenaka K, Matsushima M, Kawai G, Kidera A, Watanabe K, Kuroki R, Kumagai I: Structural and functional effect of Trp-62 → Gly and Asp-101 → Gly substitutions on substrate-binding modes of mutant hen egg-white lysozyme. Biochem J. 1998, 333: 71-76.
- Klein-Seetharaman J, Oikawa M, Grimshaw SB, Wirmer J, Duchardt E, Ueda T, Imoto T, Smith LJ, Dobson CM, Schwalbe H: Long-range interactions within a nonnative protein. Science. 2002, 295: 1719-1722. 10.1126/science.1067680.
- Muraki M, Harada K, Sugita N, Sato K: Protein-carbohydrate interaction in human lysozyme probed by combining site-directed mutagenesis and affinity labelling. Biochemistry. 2000, 39: 292-299. 10.1021/bi991402q.
- Muraki M, Morikawa M, Jigami Y, Tanaka H: Engineering of human lysozyme as a polyelectrolyte by the alteration of molecular surface charge. Protein Eng. 1988, 2: 49-54.
- Turk B, Turk V, Turk D: Structural and functional aspects of papain-like cysteine proteinases and their protein inhibitors. Biol Chem. 1997, 378: 141-150.
- Abrahamson M: Cystatins. Methods Enzymol. 1994, 244: 685-700.
- Shibuya K, Kaji H, Ohyama Y, Tate S, Kainosho M, Inagaki R, Samejima T: Significance of the highly conserved Gly-4 residue in human cystatin A. J Biochem. 1995, 118: 635-642.
- Ekiel I, Abrahamson M: Folding-related dimerization of human cystatin C. J Biol Chem. 1996, 271: 1314-1321. 10.1074/jbc.271.3.1314.
- Hall A, Håkansson K, Mason RW, Grubb A, Abrahamson M: Structural basis for the biological specificity of cystatin C. J Biol Chem. 1995, 270: 5115-5121. 10.1074/jbc.270.10.5115.
- Canet D, Sunde M, Last AM, Miranker A, Spencer A, Robinson CV, Dobson CM: Mechanistic studies of the folding of human lysozyme and the origin of amyloidogenic behavior in its disease-related variants. Biochemistry. 1999, 38: 6419-6427. 10.1021/bi983037t.
- Žerovnik E, Pompe-Novak M, Škarabot M, Ravnikar M, Muševič I, Turk V: Human stefin B readily forms amyloid fibrils in vitro. Biochim Biophys Acta. 2002, 1594: 1-5.
- Attwood TK: The quest to deduce protein function from sequence: the role of pattern databases. Int J Biochem Cell Biol. 2000, 32: 139-155. 10.1016/S1357-2725(99)00106-5.
- Kumagai I, Maenaka K, Sunada F, Takeda S, Miura K: Effect of subsite alterations on substrate-binding mode in the active site of hen egg-white lysozyme. Eur J Biochem. 1993, 212: 151-156. 10.1111/j.1432-1033.1993.tb17645.x.
- Karplus K, Barrett C, Cline M, Diekhans M, Grate L, Hughey R: Predicting protein structure using only sequence information. Proteins. 1999, Suppl 3: 121-125. 10.1002/(SICI)1097-0134(1999)37:3+<121::AID-PROT16>3.0.CO;2-Q.
- Peracchi A: Enzyme catalysis: removing chemically 'essential' residues by site-directed mutagenesis. Trends Biochem Sci. 2001, 26: 497-503. 10.1016/S0968-0004(01)01911-9.
- Grant GA: Synthetic peptides for production of antibodies that recognize intact proteins. Current Protocols in Molecular Biology. Edited by: Ausubel FM. 2002, New York: Wiley Interscience, 2: 11.16.1-11.16.9.
- Binkowski TA, Adamian L, Liang J: Inferring functional relationships of proteins from local sequence and spatial surface patterns. J Mol Biol. 2003, 332: 505-526. 10.1016/S0022-2836(03)00882-9.
- Alvarez-Fernandez M, Barrett AJ, Gerhartz B, Dando PM, Ni J, Abrahamson M: Inhibition of mammalian legumain by some cystatins is due to a novel second reactive site. J Biol Chem. 1999, 274: 19195-19203. 10.1074/jbc.274.27.19195.
- Todd AE: From protein structure to function. Bioinformatics: Genes, Proteins and Computers. Edited by: Orengo C, Jones D, Thornton J. 2003, Oxford: BIOS Scientific Publishers, 151-174.
- Stubbs MT, Laber B, Bode W, Huber R, Jerala R, Lenarčič B, Turk V: The refined 2.4 Å X-ray crystal structure of recombinant human stefin B in complex with the cysteine protease papain: a novel type of proteinase inhibitor interaction. EMBO J. 1990, 9: 1939-1947.
- Abrahamson M, Ritonja A, Brown MA, Grubb A, Machleidt W, Barrett AJ: Identification of the probable inhibitory reactive site of the cysteine proteinase inhibitors human cystatin C and chicken cystatin. J Biol Chem. 1987, 262: 9688-9694.
- Nakai S, Amantea G, Nakai H, Ogawa M, Kanagawa S: Definition of outliers using unsupervised principal component similarity analysis for sensory evaluation of foods. Int J Food Prop. 2002, 5: 289-306. 10.1081/JFP-120005786.
- Nakai S, Dou J, Richards JF: New multivariate strategy for panel evaluation using principal component similarity. Int J Food Prop. 2000, 3: 149-164.
- Brömme D, Kaleta J: Thiol-dependent cathepsins: pathophysiological implications and recent advances in inhibitor design. Curr Pharm Des. 2002, 8: 1639-1658. 10.2174/1381612023394179.
- Network Protein Sequence Analysis of Pôle Bio-Informatique Lyonnais. [http://npsa-pbil.ibcp.fr]
- Krzanowski WJ: Principles of Multivariate Analysis. 1988, Oxford: Oxford Science Publications, 26-
- Nakai S, Ogawa M, Nakamura S, Dou J, Funane K: A computer-aided strategy for structure-function study of food proteins using unsupervised data mining. Int J Food Prop. 2003, 6: 25-47. 10.1081/JFP-120016622.
- Nakai S: Computer software used in this study. 2003, [ftp://ftp.agsci.ubc.ca/foodsci/]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.