What determines the degree of compactness of a calcium-binding protein?

Liliane Mouawad1, Adriana Isvoran2, Eric Quiniou1 and Constantin T Craescu1

1 Inserm U759 / Institut Curie-Recherche, Centre Universitaire Paris-Sud, Orsay, France
2 Department of Chemistry, West University of Timisoara, Romania

Keywords: calcium-binding proteins; centrin; EF-hand; hydrophobicity; predicted form

Correspondence: L Mouawad, Inserm U759 / Institut Curie-Recherche, Centre Universitaire Paris-Sud, Bâtiment 112, 91405 Orsay Cedex, France
Fax: +33 69 07 53 27
Tel: +33 69 86 71 51
E-mail: liliane.mouawad@curie.u-psud.fr

(Received September 2008, revised December 2008, accepted 10 December 2008)

doi:10.1111/j.1742-4658.2008.06851.x

The EF-hand calcium-binding proteins may exist either in an extended or a compact conformation. This conformation is sometimes correlated with the function of the calcium-binding protein. For those proteins whose structure and function are known, calcium sensors are usually extended and calcium buffers compact; hence, there is interest in predicting the form of the protein starting from its sequence. In the present study, we used two different procedures: one that already exists in the literature, the SOSUIDUMBBELL algorithm, mainly based on the charges of the two EF-hand domains, and a novel procedure that is based on linker average hydrophilicity. The linker consists of the residues that connect the domains. The two procedures were tested on 17 known-structure calcium-binding proteins and then applied to 59 unknown-structure centrins. The SOSUIDUMBBELL algorithm yielded the correct conformations for only 15 of the known-structure proteins and predicted that all centrins should be in a closed form. The linker average hydrophilicity procedure discriminated well between all the extended and non-extended forms of the known-structure calcium-binding proteins, and its prediction concerning centrins reflected well their phylogenetic classification. The
linker average hydrophilicity criterion is a simple and powerful means to discriminate between extended and non-extended forms of calcium-binding proteins. What is remarkable is that only a few residues that constitute the linker (between two and 20 in our tested sample of proteins) are responsible for the form of the calcium-binding protein, showing that this form is mainly governed by short-range interactions.

Abbreviations: CaBP, calcium-binding protein; LAH, linker average hydrophilicity; PDB, Protein Data Bank

FEBS Journal 276 (2009) 1082–1093. Journal compilation © 2009 FEBS. No claim to original French government works.

Calcium transport and/or regulation are important events for the normal morphology and metabolism of the cell and play significant roles in the mechanisms of many disease processes [1]. The proteins that interact with the calcium ions involved in these events are called calcium-binding proteins (CaBPs). They form two main subfamilies: the EF-hand CaBPs and the non-EF-hand CaBPs. EF-hand CaBPs, whose prototype is calmodulin [2], are characterized by the presence of structural motifs called 'EF-hands'. Non-EF-hand CaBPs do not use this structural motif to bind calcium; they may be found in the cytoplasm (similar to C2 domain proteins) [3], in the extracellular medium [4] or associated with the membrane (similar to annexins) [5].

For the EF-hand CaBPs, each EF-hand motif contains two helices connected by the calcium-binding loop, a highly conserved region that binds the metal ion. Many CaBPs exhibit two domains, each containing two EF-hand motifs; the N-terminal (helices A, B, C and D) and C-terminal (helices E, F, G and H) domains are connected by a linker region (Fig 1).

Fig 1. The EF-hand protein schematic representation. [Schematic labels: N-domain with helices A-D and loops I and II; linker; C-domain with helices E-H and loops III and IV.] Each EF-hand motif consists of two helices linked by a calcium loop (black dots represent calcium ions). Two motifs
constitute one EF-hand domain. The N- and C-domains are bound by a linker (bold line).

EF-hand CaBPs are divided into two broad classes [6]: those that bind calcium to regulate its concentration (calcium-buffering and calcium-transporting proteins) and those that bind calcium to decode its signal (calcium-sensor proteins). The two functional classes also have different structural features: calcium-buffering and calcium-transporting proteins, such as parvalbumin [7] or the Nereis diversicolor sarcoplasmic calcium-binding protein [8], usually have a compact tertiary structure and are not conformationally sensitive to calcium-binding, whereas calcium sensor proteins, such as calmodulin [2] and troponin C [9], have extended tertiary structures and show important conformational changes upon calcium-binding. In the extended form, the linker between the two domains may be structured in a straight helix, whereas, in the non-extended form, the linker is unstructured, leading to either a floppy conformation or a very compact one (Fig 2) [10].

It is important to understand the physical reasons for these differences. This would provide tools to predict the form of the CaBPs from their sequences, and therefore indicate their biological function.

Fig 2. View of the 3D structures of two CaBPs: (A) the extended form of calmodulin (PDB code: 1CLL) and (B) the non-extended form of guanylate cyclase activating protein (PDB code: 1JBA). The helices are in cyan, the β-sheets are in yellow and the linker is in red. The linker in 1CLL is structured, whereas it is a loop in 1JBA. The view was drawn using VMD software [10].

Recently, a protein classification tool, SOSUIDUMBBELL [11], was developed to predict the degree of compactness of proteins starting from their amino acid sequences. This tool is based on studies undertaken on all the monomers of the Protein Data Bank (PDB) [12], and not just CaBPs, indicating that the electrostatic repulsion between the domains is a dominant factor in the
stabilization of the extended structures, in addition to the amphiphilic character of the central flexible region. By contrast, globular proteins are predicted to be stabilized by a hydrophobic core built by residues from the two domains.

Using the SOSUIDUMBBELL algorithm, we have analyzed 17 CaBPs with known 3D structures (Table 1). Fifteen of them were predicted in the correct form but two structures were incorrectly predicted: human calmodulin-like protein (1GGZ) [13] and human centrin (2GGM) [14], which are extended proteins, were predicted to be compact. These exceptions represent a non-negligible percentage (12%) and they emphasize the need for a more detailed analysis of the sequence–structure relationship in the case of CaBPs.

In the present study, we have developed a novel procedure based on the linker average hydrophilicity (LAH), which we applied to our sample of 17 known-structure CaBPs and to unknown structures of centrins.

Centrins, a subfamily of CaBPs, are essential components of microtubule-organizing centers in organisms ranging from algae and yeast to humans [15,16]. They are EF-hand calcium-binding proteins with a sequence similarity to calmodulin but distinct calcium-binding properties [15]. They were shown to be involved in centrosome duplication [17] and the contraction of centrin-based fiber systems [18] and to play a functional role in nuclear export pathways [19]. The Ca2+ dependence of the centrin interactions with their targets suggests that centrins play a regulatory role by activating or changing the conformation of various target proteins. Analyses of amino acid sequences of centrins from different organisms reveal at least four phylogenetic families and several phylogenetic subfamilies [20,21]. The centrins that we consider in the present study are listed in Table 2: (a) the Chlamydomonas reinhardtii-like family (CrCen-like), which contains centrins from the subfamilies of green algae and vertebrate isoforms Cen1 and
Cen2; (b) the higher plants Arabidopsis-like family (AtCen-like); (c) the yeast Saccharomyces cerevisiae-like family (Cdc31-like), which contains mainly two subfamilies, fungal centrins and the vertebrate isoform Cen3; and (d) the Paramecium tetraurelia infraciliary lattice family (PtICL1-like), organized in ten subfamilies that contain 35 identified isoforms [22]. The 3D structure of the entire protein in complex with its target polypeptide is known for only two centrins: the human centrin, HsCen2 (2GGM) [14], and the Saccharomyces cerevisiae centrin, ScCdc31 (2DOQ) [23].

The functional diversity of centrins should depend on their sequence and their Ca2+ binding properties. However, we may ask whether the global conformation or the conformational preference of individual centrin molecules also plays a role in the target recognition and the plasticity of heteromolecular complexes. This idea is supported by the recent observation that yeast ScCdc31 bound to a ScSfi1 fragment shows a bent conformation [23], whereas human HsCen2 in complex with an XPC peptide is completely extended [14]. In the present study, we present a new and simple theoretical procedure for the global shape prediction of EF-hand proteins that allows us to analyze the possible shape diversity of centrins presented in Table 2.

Table 1. Features of the known-structure CaBPs used in the present study, showing the name of the protein, its code in the PDB, its code in the SwissProt data bank, its form as determined experimentally and its form as predicted by the SOSUIDUMBBELL algorithm (http://bp.nuap.nagoya-u.ac.jp/sosui/sosuidumbbell/dumbbell_submit.html). CIB, calcium-and-integrin-binding protein; SCBP, sarcoplasmic calcium-binding protein.

Protein                               PDB code  SwissProt code  Experimental structure  Predicted by SOSUIDUMBBELL
Chicken troponin C                    4TNC      P02588          Extended                Extended
Rabbit troponin C                     1TN4      P02586          Extended                Extended
Human calmodulin                      1CLL      P62158          Extended                Extended
Paramecium calmodulin                 1OSA      P07463          Extended                Extended
Potato calmodulin                     1RFJ      Q42478          Extended                Extended
Human calmodulin-like protein         1GGZ      P27482          Extended                Non-extended
Human centrin                         2GGM      P41208          Extended                Non-extended
Yeast centrin                         2DOQ      P06704          Non-extended            Non-extended
Yeast myosin light chain              1GGW(a)   Q09196          Non-extended            Non-extended
Calcineurin B homologous protein      2CT9      P61023          Non-extended            Non-extended
Bovine recoverin                      1REC      P21457          Non-extended            Non-extended
Guanylate cyclase activating protein  1JBA(a)   P51177          Non-extended            Non-extended
Bovine neurocalcin δ                  1BJF      P61602          Non-extended            Non-extended
Amphioxus SCBP                        2SAS      P04570          Non-extended            Non-extended
Sandworm SCBP                         2SCP      P04571          Non-extended            Non-extended
Bacterial calerythrin                 1NYA(a)   P06495          Non-extended            Non-extended
Human CIB                             1DGU(a)   Q99828          Non-extended            Non-extended

(a) Structure determined by NMR.

Results and Discussion

Utilization of the SOSUIDUMBBELL algorithm

We first applied the SOSUIDUMBBELL algorithm (http://bp.nuap.nagoya-u.ac.jp/sosui/sosuidumbbell/dumbbell_submit.html) to all the CaBPs with known 3D structures (Table 1). In this algorithm, a structure is predicted to be extended if it obeys four criteria: (a) the absolute value of the net charge of the entire protein is higher than 20 (|Qprot| > 20); (b) the absolute net charge density (|Qprot| / N, where N is the total number of residues) is higher than 0.14 (dQ > 0.14); (c) there is a charge balance between the two domains (|QNQC| > 100); and (d) there is a high amphiphilicity at the center of the linker region and a high hydropathy at its termini [11]. Based on these four criteria, the results yielded 15 correctly predicted structures and two incorrectly predicted ones. The latter are human calmodulin-like protein (1GGZ) and human centrin (2GGM), the structures of which are extended but predicted as non-extended. Therefore the question
remained as to which of the four criteria described above is responsible for this misprediction.

To address this question, we first verified the first two criteria. For this purpose, we calculated the absolute net charge and the charge density of the entire protein for all the investigated CaBPs (Table 3), with known and unknown structures (Tables 1 and 2). First, we followed exactly the procedure described by Uchikoga et al [11], namely that histidine residues were considered as positively charged (although at the pH values corresponding to the great majority of the experiments, they are deprotonated) and the calcium ions that might bind to the protein were omitted. The results (Fig 3A,B and Table 3) show that, as indicated above, only five known-structure proteins are predicted to be extended instead of the seven expected (1GGZ and 2GGM are mispredicted) and all centrins with unknown structures are predicted in a non-extended form. In a second step, the histidines were considered neutral (CaBPs usually contain very few His residues) and the Ca2+ ions were added, but the results were even worse (data not shown) because the net charge was diminished and therefore the structures were predicted to be even more compact. The first two criteria appear to be responsible for the misprediction of the form of 1GGZ and 2GGM. Moreover, concerning centrins with unknown structures, some experimental results (C T Craescu & S Miron, unpublished data) in addition to the phylogenetic classification indicate that at least the CrCen family proteins should be in an extended form, which is not the case in the prediction based on the first two criteria.

Table 2. Phylogenetic classification of centrins. All centrins considered in the present study (with known and unknown structures) are classified by families and subfamilies. The PDB codes of the known structures of fragments (*) or the entire protein are given.

Protein name                       Abbreviation  SwissProt code / PDB code

CrCen family (subfamilies Cen1, Cen2 and algae centrins)
Human centrin                      HsCen1        Q12798
Mouse centrin                      MmCen1        P41209
Bovine centrin                     BtCetn1       Q32LE3
Human centrin                      HsCen2        P41208 / 2GGM / 2OBH / 1M39* / 1ZMZ* / 2A4J*
Mouse centrin                      MmCen2        Q9R1K9
Pig centrin                        SsCen2        Q4U4N2
Xenopus laevis centrin             XlCen2        Q7SYA4
Xenopus tropicalis centrin         XtCen2        Q28HC5
Dunaliella salina centrin          DsCen         P54213
Chlamydomonas reinhardtii centrin  CrCen         P05434 / 1OQP* / 2AMI*
Tetraselmis striata centrin        TsCen         P43646
Scherffelia dubia centrin          SdCen         Q06827
Micromonas pusilla centrin         MpCen         Q40303
Marsilea vestita centrin           MvCen         O49999
Spermatozopsis similis centrin     SsCen         P43645
Pterosperma cristatum centrin      PcCen         Q40791

AtCen family (higher plant centrins)
Arabidopsis thaliana centrin       AtCen         O82659
Nicotiana tabacum centrin          NtCen         Q9SQI5
Atriplex nummularia centrin        AnCen         P41210

Cdc31 family (including the Cen3 subfamily)
Human centrin                      HsCen3        O15182
Rat centrin                        RnCen3        Q91ZZ8
Mouse centrin                      MmCen3        O35648
Xenopus laevis centrin             XlCen3        Q9DEZ4
Euplotes octocarinatus centrin     EoCen         Q9XZV2 / 2JOJ*
Yeast centrin                      ScCdc31       P06704 / 2DOQ / 2GV5
Xenopus tropicalis centrin         XtCen3        Q28GW2

PtICLs family (subfamilies ICL1a, ICL1e, ICL3a, ICL3b, ICL5, ICL7, ICL8, ICL9, ICL10 and ICL11)
                                   PtICL1a       Q27177
                                   PtICL1b       Q27179
                                   PtICL1c       Q27178
                                   PtICL1d       Q94726
                                   PtICL1f       Q3SEK2
                                   PtICL1e       Q3SEK0
                                   PtICL1g       Q3SEJ9
                                   PtCen8        Q3SEJ6
                                   PtCen10       Q3SEJ7
                                   PtCen12       Q6BFB6
                                   PtCen15       Q3SEJ0
                                   PtCen18       A0CTY5
                                   PtICL3a       Q3SDB8
                                   PtICL3c       Q3SDA6
                                   PtICL3d       Q3SEI1
                                   PtICL3e       Q3SEI3
                                   PtICL3f       Q3SEI4
                                   PtICL3b       Q3SEI0
                                   PtICL3g       A0BUT1
                                   PtICL5a       Q3SEH8
                                   PtICL5b       Q3SEH7
                                   PtICL6a       Q3SEH9
                                   PtICL6b       Q3SCX3
                                   PtICL7a       A0DZH6
                                   PtICL7b       A0DZH
                                   PtICL8a       A0BTY0
                                   PtICL8b       A0C3G3
                                   PtICL9a       Q3SEI2
                                   PtICL9b       A0BE66
                                   PtICL9c       A0D3D5
                                   PtICL9d       A0D6A4
                                   PtICL10a      A0DZD2
                                   PtICL10b      A0BJD5
                                   PtICL11a      A0BI27
                                   PtICL11b      A0BQH1

The last two criteria in the SOSUIDUMBBELL algorithm are strongly dependent on the definition of the domains and the inter-domain linker. The delimitation of this linker is not always obvious: in the extended structures, it forms a helix in the continuity of helices D and E, whereas, in some compact conformations, it is a very short unstructured region (Fig 2). In the SOSUIDUMBBELL algorithm, the linker considered may be too long and, consequently, the domains too short, as for calmodulin, where helices D and E, which belong to the N- and C-domains, respectively, are considered as parts of the linker [11]. In the present study, to determine the linker, we first identified the calcium-binding loops (Fig 1), then we counted ten residues after loop II (corresponding to helix D) and ten residues before loop III (corresponding to helix E), and the remaining residues in between were considered as the inter-domain linker. Ten residues were considered for helices D and E because the experimental structural data show that a helix belonging to an EF-hand motif contains ten residues on average. Consequently, in the proteins investigated in the present study, the linker was between two and 20 residues long (Table 3), corresponding to 0.96% and 10.26%, respectively, of the protein sequence length.

Based on this definition of the linker, the charges of the N- and C-domains were calculated without considering the calcium ions. In Fig 3C, we report the absolute value of the product of these charges, |QNQC|, which represents the charge balance between the domains. With the exception of troponins, all the investigated proteins are characterized by products |QNQC| lower than 100, and therefore are predicted to be non-extended. From these results, it is clear that, for CaBPs, the charges of the entire protein or of the separated domains are not responsible for the extended or compact
form of the protein. This assertion is obvious in the case of human centrin (HsCen2). In this protein, the first 25 amino acids, corresponding to a disordered region, are highly charged [24,25], with the net charge of this peptide being equal to +6 (it contains seven basic residues and one acidic residue). The X-ray structure of this protein was obtained in the presence [14] and in the absence [25] of these residues (PDB codes 2GGM and 2OBH, respectively). In both cases, HsCen2 adopts an extended conformation, showing that the charge balance of the domains does not play an important role for this protein. Nevertheless, in both cases, the SOSUIDUMBBELL algorithm predicts a non-extended form, which is not correct. Moreover, the structure of all the extended forms of the CaBPs considered in the present study was determined experimentally in the presence of calcium ions. Knowing that these ions reduce significantly the charges of the domains and therefore their electrostatic repulsions, calcium-binding should favor the compact structure of CaBPs, which is not the case.

The fourth criterion of the SOSUIDUMBBELL tool refers to the hydrophobicity of the central linker region, which is calculated using the Kyte & Doolittle Scale [26]. Uchikoga et al [11] described the linker region of an extended protein as having an important negative hydrophobicity in its center (i.e. to be significantly hydrophilic), whereas its edges (helices D and E) are hydrophobic. In the present study, the same calculations were applied to all known-structure proteins, and it was observed that, in some cases, non-extended proteins (e.g. recoverin; 1REC) present the same hydropathy profile around the linker as extended proteins, such as calmodulin or troponin C (1OSA and 4TNC; Fig 3D). Therefore, none of the criteria retained in the SOSUIDUMBBELL algorithm are completely reliable to predict the form of the CaBPs. This motivated our search for other criteria.

Table 3. Results of all our calculations on CaBPs, showing the PDB code or the abbreviation of the protein name, the number of the protein as used in Figs 3 and 4, the net charge of the N- and C-domains (QN, QC), the net charge of the linker (Qlink) and of the entire protein (Qprot), the charge balance of the domains (|QNQC|), the absolute charge density (dQ = |Qprot| / N), the value of the LAH, the number of Pro, Gly, Trp and Phe in the linker region plus three residues from each side of the sequence (length equals n + 6), the residues belonging to the linker as defined in the text (interval ]J + 10, K - 10[), the total number of residues (N), the linker length (n) and, finally, the percentage of linker length with respect to the protein length (n / N × 100).

No.  Protein   Qprot  dQ     LAH     Linker region  N    n   % linker

Extended structures
1    4TNC      -28    0.172  1.662   89–96          163  8   4.91
2    1TN4      -28    0.175  1.662   86–93          160  8   5.00
3    1CLL      -23    0.154  1.822   79–83          149  5   3.36
4    1OSA      -22    0.147  1.782   79–83          149  5   3.36
5    1RFJ      -22    0.147  1.822   79–83          149  5   3.36
6    1GGZ      -18    0.121  1.811   79–83          149  5   3.36
7    2GGM      -8     0.046  1.713   99–103         172  5   2.91

Non-extended structures
8    2DOQ      -16    0.099  1.174   91–95          161  5   3.11
9    1GGW      -10    0.071  0.516   70–76          141  7   4.96
10   2CT9      -6     0.031  0.965   93–112         195  20  10.26
11   1REC      -3     0.015  0.477   96–99          202  4   1.98
12   1JBA      -10    0.049  0.317   91–94          204  4   1.96
13   1BJF      -3     0.015  1.016   95–98          193  4   2.07
14   2SAS      -10    0.054  -0.202  92–104         185  13  7.03
15   2SCP      -13    0.075  0.532   86–93          174  8   4.60
16   1NYA      -13    0.073  0.003   90–102         177  13  7.34
17   1DGU      -10    0.052  -0.404  93–97          191  5   2.62

Unknown structures
18   HsCen1    -10    0.058  1.713   99–103         172  5   2.91
19   MmCen1    -10    0.058  1.642   99–103         172  5   2.91
20   BtCen1    -9     0.052  1.642   99–103         172  5   2.91
21   MmCen2    -8     0.046  1.713   99–103         172  5   2.91
22   SsCen2    -11    0.079  1.713   66–70          139  5   3.60
23   XlCen2    -8     0.046  1.731   99–103         172  5   2.91
24   XtCen2    -8     0.046  1.731   99–103         172  5   2.91
25   DsCen     -9     0.053  1.767   96–100         169  5   2.96
26   CrCen     -12    0.071  1.749   96–100         169  5   2.96
27   TsCen     -17    0.115  1.760   75–79          148  5   3.38
28   SdCen     -11    0.065  1.760   95–99          168  5   2.98
29   MpCen     -16    0.108  1.760   75–79          148  5   3.38
30   MvCen     -11    0.065  1.762   97–101         170  5   2.94
31   SsCen     -18    0.121  1.760   75–79          148  5   3.38
32   PcCen     -17    0.128  1.760   67–71          133  5   3.76
33   AtCen     -7     0.041  1.669   94–98          169  5   2.96
34   NtCen     -12    0.068  1.698   103–107        177  5   2.82
35   AnCen     -8     0.048  1.649   93–97          167  5   2.99
36   HsCen3    -14    0.084  0.927   96–100         167  5   2.99
37   RnCen3    -13    0.082  0.927   88–92          159  5   3.14
38   MmCen3    -13    0.078  0.927   96–100         167  5   2.99
39   XlCen3    -14    0.084  1.076   96–100         167  5   2.99
40   EoCen     -12    0.071  1.215   95–99          168  5   2.98
41   XtCen3    -14    0.084  1.076   96–100         167  5   2.99
42   PtICL1a   -13    0.072  1.476   108–112        181  5   2.76
43   PtICL1b   -13    0.071  1.476   109–113        182  5   2.75
44   PtICL1c   -13    0.071  1.476   110–114        183  5   2.73
45   PtICL1d   -13    0.072  1.476   108–112        181  5   2.76
46   PtICL1f   -13    0.071  1.476   110–114        183  5   2.73
47   PtICL1e   -10    0.057  1.196   102–107        174  6   3.45
48   PtICL1g   -10    0.054  1.196   110–115        182  6   3.30
49   PtCen8    -9     0.051  1.107   104–109        176  6   3.41
50   PtCen10   -10    0.057  1.196   102–107        174  6   3.45
51   PtCen12   -10    0.057  1.196   102–107        174  6   3.45
52   PtCen15   -10    0.057  1.185   104–109        176  6   3.41
53   PtCen18   -10    0.057  1.196   102–107        174  6   3.45
54   PtICL3a   -14    0.072  1.542   117–121        192  5   2.60
55   PtICL3c   n/r    0.015  1.522   117–121        192  5   2.60
56   PtICL3d   -11    0.057  1.542   117–121        192  5   2.60
57   PtICL3e   -11    0.057  1.542   115–119        190  5   2.63
58   PtICL3f   -10    0.050  1.542   122–126        197  5   2.54
59   PtICL3b   -9     0.046  0.820   119–123        193  5   2.59
60   PtICL3g   -9     0.047  0.820   118–122        192  5   2.60
61   PtICL5a   -7     0.038  0.918   104–109        182  6   3.30
62   PtICL5b   -7     0.038  0.918   104–109        182  6   3.30
63   PtICL6a   -6     0.032  0.865   106–111        184  6   3.26
64   PtICL6b   -6     0.033  0.865   106–111        184  6   3.26
65   PtICL7a   -3     0.016  0.527   100–104        184  5   2.72
66   PtICL7b   -2     0.011  0.527   100–104        184  5   2.72
67   PtICL8a   -4     0.022  0.409   99–104         184  6   3.26
68   PtICL8b   -3     0.016  0.332   99–104         184  6   3.26
69   PtICL9a   -16    0.077  0.206   132–133        208  2   0.96
70   PtICL9b   -16    0.077  0.206   132–133        208  2   0.96
71   PtICL9c   -16    0.078  0.206   130–131        206  2   0.97
72   PtICL9d   -16    0.077  0.206   132–133        208  2   0.96
73   PtICL10a  -5     0.024  1.218   129–133        205  5   2.44
74   PtICL10b  -5     0.024  1.280   129–133        205  5   2.44
75   PtICL11a  n/r    n/r    1.878   164–167        240  4   1.67
76   PtICL11b  n/r    n/r    1.878   164–167        240  4   1.67

n/r, entry not readable in the available text. The per-row entries for QN, QC, Qlink, |QNQC| and the Pro, Gly, Trp and Phe counts are incomplete in the available text and are not listed.

Fig 3. Test of the four criteria used in the SOSUIDUMBBELL algorithm. (A) The absolute net charge (|Qprot|) of investigated proteins without calcium ions versus the protein number from Table 3. The horizontal line corresponds to the limit of net charge between extended (|Qprot| > 20) and non-extended structures (|Qprot| ≤ 20) as considered by Uchikoga et al [11]. Vertical lines delimit between the known extended structures (filled circles), the known non-extended structures (open diamonds) and the unknown structures of centrins (filled triangles). It can be seen that two extended structures are mispredicted (1GGZ and 2GGM) and that all the unknown-structure centrins are predicted to be non-extended. (B) The absolute net charge density (dQ) with a horizontal line limit at 0.14. (C) The absolute value of the product of the two domain charges (|QNQC|) in the absence of calcium ions with a horizontal line limit at 100. In this case, only troponin C molecules are predicted to be extended. (D) The hydrophobicity profile of the linker region and its surroundings using the Kyte & Doolittle Scale for two extended structures (dotted line, 1OSA and dashed line, 4TNC) and for a non-extended one (solid line, 1REC). For convenience of comparison, the three sequences were renumbered and centered on the linker. The zero point corresponds to residue number 92 in 4TNC, 81 in 1OSA and 98 in 1REC, which represents the center of the linker in each case.

Utilization of other criteria

Contact area

We analyzed the contact area between the domains of known-structure non-extended CaBPs. As expected, most of the residues at the interface were found to be hydrophobic. In most compact structures, a tryptophan (or less frequently a phenylalanine) located in one domain was buried in a hydrophobic cavity in the other domain,
which would stabilize the compact structure. Unfortunately, this observation cannot be used as a predictive tool starting from the sequence because the aromatic residue is not located in a specific part of it. Indeed, the sequence of the linker and its close vicinity (three more residues from each side of the linker) does not always contain tryptophan or phenylalanine residues for compact forms (see 1REC, 1JBA, 1BJF and 2SCP in Table 3).

The presence of helix breakers

Prolines and, to a lesser extent, glycines, are well-known helix breakers. We investigated the presence of such residues in the linker or its vicinity (i.e. plus three residues from each side of the linker). The results presented in Table 3 show that, as expected, the presence of a Pro yields a non-extended form by breaking the central helix that constitutes the linker, but the reverse is not true because not all the compact CaBPs contain a Pro in the linker. Therefore, this criterion cannot constitute a predictive rule. Moreover, concerning glycines, it was observed that, in both troponin C proteins (4TNC and 1TN4), which are extended, there is one Gly in the linker, as in bovine recoverin (1REC), guanylate cyclase activating protein (1JBA) and bovine neurocalcin δ (1BJF), which present very compact structures.

Net electric charge of the linker

It might be assumed that the net electric charge of the linker plays a role if there is repulsion between this linker and the adjacent domains. Thus, this property was investigated (Table 3) but did not yield a good discriminating criterion because, in HsCen2 (2GGM), which is extended, the linker is neutral, as in bovine neurocalcin δ (1BJF) or amphioxus sarcoplasmic calcium-binding protein (2SAS), which are non-extended structures.

Hydrophilicity of the linker

The criterion that yielded the best results was based on the hydrophilicity of the linker. It was obtained by the procedure detailed below. First, the hydrophilicity (h_i) of each residue i of the protein was calculated using the Hopp & Woods Scale [27] with a nine-residue sliding window. In this scale, positive values correspond to hydrophilic positions. Second, the linker was determined as described above: if the last residue of the calcium-binding loop II is denoted J and the first residue of the calcium-binding loop III is denoted K, the linker consists of all residues comprised in the interval ]J + 10, K - 10[. Finally, the LAH was calculated:

\[
\mathrm{LAH} = \frac{1}{n} \sum_{i \,\in\, ]J+10,\; K-10[} h_i
\]

where n is the number of residues in the linker and h_i is the hydrophilicity at position i of the linker. This procedure was applied to all proteins in Tables 1 and 2. The results are presented in Fig 4. Remarkably, the LAH values discriminated well between the extended and non-extended forms of the known structures of the CaBPs, with two distinct sets of points, where LAH was greater than 1.6 for the extended forms and < 1.2 for the others. Therefore, an average value of 1.4 was considered as the threshold above which a two-domain EF-hand protein is extended.

Fig 4. The LAH for the investigated proteins (LAH versus protein number). The horizontal line delimits between the predicted extended structures (LAH > 1.4) and the predicted non-extended ones (LAH ≤ 1.4). Vertical lines delimit between the known extended structures (filled circles), the known non-extended structures (open diamonds) and the unknown structures of centrins (filled triangles). For the unknown-structure centrins, we indicate the phylogenetic subfamilies (Cen1, Cen2, Algae, AtCen, Cdc31, ICL1a, ICL1e, ICL3a, ICL3b, ICL5, ICL7, ICL8, ICL9, ICL10 and ICL11).

Moreover, one of the reviewers of the present study suggested the case of calcineurin B-like protein from Arabidopsis (SwissProt code: Q8LAS7, PDB code: 1UHN), which we omitted to consider in our sample. The protein consists
of 226 residues and the linker of five residues (residues 117–121). The calculated LAH value is 0.2978, predicting a compact structure, in good agreement with the 3D structure of the protein.

Considering centrins with unknown structures, it can be seen that the LAH values reflect well the phylogenetic classification, although this classification is based on the entire sequence, whereas LAH is based on only a few residues in the linker region.

To determine whether the discriminating power of the linker average hydrophilicity is fortuitous, LAH values were plotted against the radius of gyration of the known structures in Fig . A clear correlation is observed between these two features, with a correlation coefficient of 0.82 and a Student coefficient of 36.98 (for 16 degrees of freedom that correspond to 17 points), indicating that the probability of this correlation being random is < 0.001. The LAH algorithm is available at: http://u759.curie.u-psud.fr/modelisation/LAH

Fig . The radius of gyration (Å) of the known-structure CaBPs versus their LAH. The straight line shows the linear fit of the points. The correlation coefficient is 0.82.

The predictive potency of the present method depends on the determination of the linker limits, which must be defined objectively. To find such a definition, several delimitations were tested, including the one used in the sosuidumbbell tool. We observed that long linkers, which overlap adjacent helices, do not allow discrimination between the different forms of CaBPs because the results were polluted by the nature of the extra residues, whereas the shortest possible linkers provided the most reliable way to discriminate between the extended and compact forms. However, it must be noted that the influence of four neighboring residues at both ends of the linker is taken into account indirectly because of the nine-residue window used in the calculation of hydrophilicity. Raw hydrophilicity data (equivalent to a one-residue window) were also tested to check the importance of this influence. The results were qualitatively similar to those obtained with the nine-residue window with respect to the prediction of the form of the protein, but the correlation between LAH and the radius of gyration was less evident.

Moreover, this discrimination was possible only when LAH was calculated with the Hopp & Woods Scale for hydropathy. Three other scales were tested (Kyte & Doolittle [26], Miyazawa & Jernigan [28] and Janin [29]) but did not provide satisfactory results. This is mainly due to the scores attributed to the Asn, Gln and Trp residues, which are considered to be much more hydrophilic in these scales than in the Hopp & Woods Scale.

Applying the LAH method to centrins showed that the CrCen-like proteins are predicted to be extended, which is in good agreement with the known structure of one member of this family, HsCen2 [14,25]. The Cdc31-like family is predicted to be in the non-extended form, which is also in good agreement with the known structure of ScCdc31 [23]. There is no experimental information about the other centrins, but we predict that members of the AtCen family are in an extended form, similar to the CrCen family, and that the PtICL family is divided into two sets: the extended proteins (ICL1a, ICL3a and ICL11 subfamilies) and the non-extended ones (ICL1e, ICL3b, ICL5, ICL7, ICL8, ICL9 and ICL10 subfamilies).

Conclusions

The results obtained in the present study indicate that the extended and compact forms of EF-hand proteins do not necessarily depend on the electric charge of the domains, but are mainly determined by the hydrophilicity (as determined by the Hopp & Woods Scale) of the residues that link the two domains. The definition of the linker is very
important and should not include residues from the adjacent helices. What is remarkable is that, once the linker is defined objectively, the nature of its residues appears to determine the form of the CaBP, whatever the length of this linker; it can be as long as 20 residues, as in calcineurin B homologous protein 1, 2CT9 (representing approximately 10% of the protein length; Table 3), or as short as two residues, as in P. tetraurelia infraciliary lattice centrin 9, PtICL9 (< 1% of the protein length). However, the length of the linker in the set of proteins considered in the present study is approximately five residues on average, which is rather short. This indicates that the form of CaBPs is likely governed by short-distance interactions.

Experimental procedures

Seventeen CaBPs with known structures, two of them centrins, in addition to 59 centrins with unknown structures, were considered in the present study.

Choice of the proteins

CaBPs with known structures were taken from the PDB [12]. Only proteins containing four EF-hand motifs were considered. The chosen structures had to satisfy several criteria. First, the proteins had to be in their unbound state (i.e. not in complex with their target peptides, because peptide binding may cause conformational changes of the entire protein). There were, however, two exceptions: human centrin (2GGM) and yeast centrin (2DOQ), in which the peptide interacts with only one domain (the C-domain) and therefore does not modify the relative position of the two domains. In addition, these two structures were the only ones available in the PDB for this family of proteins. Second, the EF-hand proteins that had an extended structure resolved by NMR were discarded because they did not provide enough information concerning the relative positions of their domains. Third, only three
families of known extended structures of CaBPs were found in the PDB: troponin C, calmodulin and human centrin. The chosen structures in each family had to share the least possible sequence identity. The most divergent ones shared between 78% and 90% identity. However, the sequence identity between the three families did not exceed 50%. Fourth, all the non-extended structures consisting of two domains, with a linker containing more than one residue, were kept. They share between 5% and 50% sequence identity.

This left a set of 17 CaBPs with known structures: seven extended and ten non-extended forms. It should be noted that one of these structures is a mutant protein, the rabbit troponin C (1TN4) [30], in which Cys98 was replaced by Leu. This residue is located in helix E and does not modify the extended structure of the protein.

Concerning the unknown CaBP structures, we considered the three well-characterized phylogenetic families of centrin, in addition to all the PtICLs. Inside each family (or subfamily for the PtICL), the sequence identity was in the range 60–98%, whereas, between different families, it was in the range 11–50%.

Sequence alignments

Sequence alignments were used to identify the calcium loops and therefore to delimit the linker as described in the Results and Discussion. They were performed with clustalw [31] (http://www.ebi.ac.uk/Tools/clustalw2/index.html). Known structures were aligned together using the default settings. Unknown structures were aligned separately by families (four distinct families and four distinct alignments). Each alignment included all possible subfamilies, in addition to calmodulin to help identify the calcium loops. No structural alignments were taken into account.

Form prediction

To predict the form of the structures (extended or not), all the sequences of our CaBP sample were submitted to the sosuidumbbell algorithm [11]. Then, to analyze the reasons for the misprediction of some structures, we used bespoke software based on the same criteria as the sosuidumbbell algorithm. As in the latter, only the electric charge of basic and acidic residues (in addition to His) was taken into account, but not the charge of the N- and C-termini of the protein.

Calculation of the hydrophilicity

The hydrophilicity of each protein was calculated using the Hopp & Woods Scale [27] available on the ExPASy server [32] (http://us.expasy.org/tools/protscale.html). The default parameters were retained (i.e. the window size was equal to nine residues, with a relative weight of the window edges compared to its center equal to 100%). The scale was not normalized. In more detail, the 'smoothed' hydrophilicity $h_i$ was calculated for each residue i of the protein by averaging the raw hydrophilicities over the residues of the sliding window (here i − 4 to i + 4). Then, to obtain LAH, only the values for the linker residues were taken into account and averaged again. Several other hydrophobicity scales (available on the ExPASy server) were also used, either for comparison with the Hopp & Woods Scale or for verification of the sosuidumbbell criteria. The Kyte & Doolittle [26], Miyazawa & Jernigan [28] and Janin [29] scales were used with the same default parameters.

Acknowledgements

This work was supported by the Institut National de la Santé et de la Recherche Médicale (INSERM) and the Institut Curie. We acknowledge the financial support of the EGIDE (ECO-NET project 16342RH) and a FEBS short-term fellowship for A Isvoran.

References

1 Carafoli E (2002) Calcium signaling: a tale for all seasons. PNAS 99, 1115–1122.
2 Babu YS, Bugg CE & Cook WJ (1988) Structure of calmodulin refined at 2.2 Å resolution. J Mol Biol 204, 191–204.
3 Nalefski EA & Falke JJ (1996) The C2 domain calcium-binding motif: structural and functional diversity. Protein Sci 5, 2375–2390.
4 Krebs J & Heizmann CW (2007) Calcium binding proteins and the EF-hand principles. In Calcium: A Matter of Life or Death (Krebs J & Michalak M, eds),
pp 49–132. Elsevier BV, Oxford.
5 Raynal P & Pollard HB (1994) Annexins: the problem of assessing the biological role for a gene family of multifunctional calcium- and phospholipid-binding proteins. Biochim Biophys Acta 1197, 63–93.
6 Carafoli E (2003) The calcium signaling saga: tap water and protein crystals. Nat Rev Mol Cell Biol 4, 326–332.
7 Chard PS, Bleakman D, Christakos S, Fullmer CS & Miller RJ (1993) Calcium buffering properties of calbindin D28k and parvalbumin in rat sensory neurones. J Physiol 472, 341–357.
8 Christova P, Cox JA & Craescu CT (2000) Ion-induced conformational and stability changes in Nereis sarcoplasmic calcium binding protein: evidence that the APO state is a molten globule. Proteins Struct Funct Bioinformatics 40, 177–184.
9 Satyshur KA, Rao ST, Pyzalska D, Drendel W, Greaser M & Sundaralingam M (1988) Refined structure of chicken skeletal muscle troponin C in the two-calcium state at 2 Å resolution. J Biol Chem 263, 1628–1647.
10 Humphrey W, Dalke A & Schulten K (1996) VMD – visual molecular dynamics. J Mol Graph 14, 33–38.
11 Uchikoga N, Takahashi SY, Ke R, Sonoyama M & Mitaku S (2005) Electric charge balance mechanism of extended soluble proteins. Protein Sci 14, 74–80.
12 Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN & Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res 28, 235–242.
13 Han BG, Han M, Sui H, Yaswen P, Walian PJ & Jap BK (2002) Crystal structure of human calmodulin-like protein: insights into its functional role. FEBS Lett 521, 24–30.
14 Thompson JR, Ryan ZC, Salisbury JL & Kumar R (2006) The structure of the human centrin 2-xeroderma pigmentosum group C protein complex. J Biol Chem 281, 18746–18752.
15 Schiebel E & Bornens M (1995) In search of a function for centrins. Trends Cell Biol 5, 197–201.
16 Zamora I & Marshall WF (2005) A mutation in the centriole-associated protein centrin causes genomic instability via increased chromosome loss in Chlamydomonas reinhardtii. BMC Biol 3, 15–22.
17 Salisbury JL, Suino KM, Busby R & Springett M (2002) Centrin-2 is required for centriole duplication in mammalian cells. Curr Biol 12, 1287–1292.
18 Wiech H, Geier BM, Paschke T, Spang A, Grein K, Steinkotter J, Melkonian M & Schiebel E (1996) Characterization of green algae, yeast and human centrins. J Biol Chem 271, 22453–22461.
19 Resendes KK, Rasala BA & Forbes DJ (2008) Centrin localizes to the vertebrate nuclear pore and plays a role in mRNA and protein export. Mol Cell Biol 28, 1755–1769.
20 Wolfrum U, Giebl A & Pulvermuller A (2002) Centrins, a novel group of Ca2+-binding proteins in vertebrate photoreceptor cells. In Photoreceptors and Calcium (Baehr W & Palczewski K, eds), pp 155–178. Kluwer Academic ⁄ Plenum Publishers, New York, NY.
21 Azimzadeh J & Bornens M (2004) The centrosome in evolution. In Centrosome in Development and Disease (Nigg EA, ed.), pp 93–112. Wiley-VCH Verlag GmbH & Co KGaA, Weinheim.
22 Gogendeau D, Beisson J, Garreau de Loubresse N, Le Caer JP, Ruiz F, Cohen J, Sperling L, Koll F & Klotz C (2007) A sfi1p-like centrin-binding protein mediates centrin based Ca2+ dependent contractility in Paramecium. Eukaryot Cell 6, 1992–2000.
23 Li S, Sandercock AM, Conduit PT, Robinson CV, Williams RL & Kilmartin JV (2006) Structural role of Sfi1p-centrin filaments in budding yeast spindle pole body duplication. J Cell Biol 173, 867–877.
24 Yang A, Miron S, Duchambon P, Assairi L, Blouquit Y & Craescu CT (2006) The N-terminal domain of human centrin has a closed structure, binds calcium with a very low affinity and plays a role in the protein self-assembly. Biochemistry 45, 880–889.
25 Charbonnier JB, Renaud E, Miron S, Le Du MH, Blouquit Y, Duchambon P, Christova P, Shosheva A, Rose T, Angulo JF et al. (2007) Structural, thermodynamic, and cellular characterization of human centrin interaction with xeroderma pigmentosum group C protein. J Mol Biol 373, 1032–1046.
26 Kyte J & Doolittle RF (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol 157, 105–132.
27 Hopp TP & Woods KR (1981) Prediction of protein antigenic determinants from amino acid sequences. PNAS 78, 3824–3828.
28 Miyazawa S & Jernigan RL (1985) Estimation of effective inter-residue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 18, 534–552.
29 Janin J (1979) Surface and inside volumes in globular proteins. Nature 277, 491–492.
30 Houdusse A, Love ML, Dominguez R, Grabarek Z & Cohen C (1997) Structures of four Ca2+-bound troponin C at 2.0 Å resolution: further insights into the Ca2+-switch in the calmodulin superfamily. Structure 5, 1695–1711.
31 Thompson JD, Higgins DG & Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.
32 Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins MR, Appel RD & Bairoch A (2005) Protein identification and analysis tools on the ExPASy server. In The Proteomics Protocols Handbook (Walker JM, ed), pp 571–607. Humana Press, Totowa, NJ.
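
The LAH procedure described under "Hydrophilicity of the linker" and "Calculation of the hydrophilicity" can be illustrated with the following minimal Python sketch. It is not the authors' program (which the text locates at the URL given in the Results). The function names are hypothetical. The scale values are the Hopp & Woods values as commonly tabulated. Residue positions are assumed to be 1-based, and positions without a full nine-residue window are assumed to be left undefined. The 1.4 threshold is the one stated in the text.

```python
# Hopp & Woods (1981) hydrophilicity values, as commonly tabulated
# (positive = hydrophilic). One-letter amino acid codes.
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
    "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
    "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
    "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}


def smoothed_hydrophilicity(sequence, window=9):
    """Average the raw scale over a sliding window centred on each residue.

    Assumption: residues too close to either terminus to have a full
    window get None instead of a value.
    """
    half = window // 2
    raw = [HOPP_WOODS[aa] for aa in sequence.upper()]
    smoothed = [None] * len(raw)
    for i in range(half, len(raw) - half):
        smoothed[i] = sum(raw[i - half:i + half + 1]) / window
    return smoothed


def linker_average_hydrophilicity(sequence, loop2_end, loop3_start, window=9):
    """LAH over the open interval ]J + 10, K - 10[ (1-based positions).

    loop2_end   -- J, last residue of calcium-binding loop II
    loop3_start -- K, first residue of calcium-binding loop III
    """
    first = loop2_end + 11    # first linker residue (J + 10 is excluded)
    last = loop3_start - 11   # last linker residue (K - 10 is excluded)
    if last < first:
        raise ValueError("loops are too close: the linker is empty")
    h = smoothed_hydrophilicity(sequence, window)
    values = h[first - 1:last]  # convert 1-based inclusive range to a slice
    if any(v is None for v in values):
        raise ValueError("linker lies too close to a terminus for this window")
    return sum(values) / len(values)


def predict_form(lah, threshold=1.4):
    """Apply the threshold from the text: extended if LAH > 1.4."""
    return "extended" if lah > threshold else "non-extended"
```

For a real protein, J and K would come from the sequence alignment used to locate the calcium-binding loops. For example, a five-residue linker spanning residues 117–121 corresponds to J = 106 and K = 132 under this definition.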
