Revisiting the prediction of protein function at CASP6

Marialuisa Pellegrini-Calace 1,*, Simonetta Soro 1,* and Anna Tramontano 1,2

1 Department of Biochemical Sciences ‘A. Rossi Fanelli’, University ‘La Sapienza’, Rome, Italy
2 Istituto Pasteur, Fondazione ‘Cenci-Bolognetti’, University ‘La Sapienza’, Rome, Italy
*These authors contributed equally to this work.

Keywords
Critical Assessment of Techniques for Protein Structure Prediction (CASP); function prediction; protein function; structural genomics

Correspondence
A. Tramontano, Department of Biochemical Sciences ‘A. Rossi Fanelli’, University ‘La Sapienza’, P.le Aldo Moro 5, 00185 Rome, Italy
Fax: +39 06 4440062
Tel: +39 06 49910556
E-mail: anna.tramontano@uniroma1.it

(Received 21 February 2006, revised 11 April 2006, accepted 5 May 2006)
doi:10.1111/j.1742-4658.2006.05309.x

FEBS Journal 273 (2006) 2977–2983 © 2006 The Authors. Journal compilation © 2006 FEBS

The ability to predict the function of a protein, given its sequence and/or 3D structure, is an essential requirement for exploiting the wealth of data made available by genomics and structural genomics projects. It is therefore attracting increasing interest in the computational biology community. To foster developments in the area, and to establish the state of the art of present methods, a function prediction category was tentatively introduced in the 6th edition of the worldwide Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment. At least two factors made it difficult to assess the performance of the methods: (a) the experimentally determined function of the targets was not available at the time of assessment; (b) the experiment is run blind, so it was not possible to verify whether different predictions converged on the same functional annotation because the methods were similar or because a genuine signal was detectable by different methodologies. In this work, we collected information about the methods used by the various predictors and revisited the results of the experiment. We checked how often, and in which cases, a convergent prediction was obtained by methods based on different rationales. We propose a method for classifying the type and redundancy of the methods. We also analyzed the cases in which a function for the target protein has since become available. Our results show that predictions derived from a consensus of different methods can reach an accuracy as high as 80%. It follows that some of the predictions submitted to CASP6 can provide very useful information to researchers interested in the function of the target proteins, once they are reanalyzed in the light of the type of converging methods.

Abbreviations
BIND, binding feature number indicating the nature of putative interacting partners; BS, binding site residue identifier; CASP, Critical Assessment of Techniques for Protein Structure Prediction; GOC, GO cellular component number; GOF, GO molecular function number; GOP, GO biological process number; PT, post-translational modification number; RESIDUE-ROLE, free comment on residues with a putative peculiar role.

Modern biology relies on rapid progress both in the generation of experimental data and in the development of computational methods. Proteomics projects identify and characterize proteins on a genome-wide scale. Computational biology plays a pivotal role in determining their structure–function relationships and in predicting their biological functions [1]. In the last few years, a number of computational and experimental methods have been developed to address the function prediction problem. These methods use the available information, ranging from sequence homology to orthology, gene context and structural features [2,3]. The relationship between protein fold and protein function is complex, and some proteins have multiple different functions [4,5]. Even so, global structure similarity can often help in assigning a function to proteins [6,7]. This ability is important for the many ongoing structural genomics projects. These projects will provide more and more structures of proteins with very low sequence identity to proteins of known structure.

The task of predicting the function of a protein is exceptionally interesting but very challenging. The existence of paralogous relationships implies that a common evolutionary origin does not guarantee a common function [8,9]. Moreover, the discovery of moonlighting proteins, which can perform different functions in different conditions or environments, makes the problem even more complex [10,11].

The CASP community [12] recognized the relevance of this issue. It tried to foster novel developments by setting up a function prediction category in addition to the well-known structure prediction categories. The question addressed was whether, and in which cases, computational methods can provide useful information about the molecular or biological function of an unknown protein. The aim was to offer potentially valuable hints to the researchers working on these proteins [12]. This category is intrinsically different from the other CASP categories because the function of the target protein is likely to remain unknown at the end of the experiment.
However, the analysis of the submitted predictions led the assessors to conclude that [13]:
(a) groups predicting the 3D structure of a protein only rarely used this information to predict its function as well, and vice versa;
(b) in a substantial fraction of cases, different groups submitted the same prediction, and therefore a ‘prediction consensus’ could be derived for some targets.

CASP is run blind. The assessor therefore does not know the identity of the predicting groups and cannot take into account the method used to derive a prediction. It was thus suggested, and accepted, that in subsequent editions of the experiment a general description of each method should be made available to the assessor.

After the experiment was concluded, we revisited the data collected and reassessed the results, taking into account the methods used. We took advantage of knowing the identity of the predicting groups. We also used the functional annotations that have become available in the meantime. Our results show that a basic knowledge of the methods is important for understanding how far predictions can be trusted and for assessing the reliability of the predicted functions. They also show that predictions derived from a consensus among groups using different, non-redundant methods can reach an accuracy of 80%.

Results

At the beginning of CASP6, the protein set contained 87 targets. Of these, 23 were discarded during the experiment because of practical issues, such as early or late release of the 3D protein structure. The set considered therefore contained 64 protein targets, 29 of which had no functional annotation in any database at the time of the experiment. Of the 172 predictors in total, 23 submitted a function prediction for at least one of the 64 targets.
Within the CASP6 experiment, seven classes of function predictions were considered:
- GO molecular function numbers (GOFs);
- GO biological process numbers (GOPs);
- GO cellular component numbers (GOCs);
- binding feature numbers indicating the nature of putative interacting partners (BINDs);
- binding site residue identifiers (BSs);
- free comments on residues with a putative peculiar role (RESIDUE-ROLEs);
- post-translational modification numbers (PTs).

In the present study, the BS and RESIDUE-ROLE subsets were not analyzed because the submitted predictions varied widely in type and format.

We considered a total of 1590 function predictions submitted by 18 groups for 64 protein targets: 568 GOFs, 445 GOPs, 363 GOCs, 150 BINDs and 64 PTs.

Method classification

The first step was to recover information about the method used by each predictor. We inspected the abstracts submitted to CASP, searched the literature and, in some cases, contacted the predictors directly. Each method was assigned to one or more of five categories (F1 to F5), here called features, according to the type of information the predictors used. F1 denotes the use of sequence information. F2 denotes the use of structural features. F3 denotes methods using the GO database for any reason other than deriving GO numbers for submission. F4 and F5 denote literature-based methods and manual methods, respectively (Table 1). Each method could therefore be classified by a five-bit binary code indicating the presence (1) or absence (0) of each of the five features. For instance, the 10000 code corresponds to completely automated (F5 = 0) methods based on sequence information (F1 = 1). These methods do not take advantage of structural (F2 = 0), GO (F3 = 0) or literature (F4 = 0) information.

Fig. 1A and Fig. 1B show the distribution of single features and of binary combinations, respectively. Five binary features allow $2^5 = 32$ theoretical combinations. However, all 18 predictor groups used the F1 feature, which reduces the possible combinations to $2^4 = 16$. Only 8 binary codes were observed, so the information used was combined with less variability than expected. Half of the possible combinations never occurred, and most predictors used canonical feature combinations. In fact, all groups used sequence information, nine used literature information and six used structure information. Only four predictors exploited the GO database, and only one of them combined it with structural features. Moreover, predictors using GO never took literature information into account, and vice versa. It is worth highlighting that 11 groups used an approach developed in-house for features F1, F2 or F3.

For each of the function prediction classes, we computed a consensus value and a redundancy value within the consensus [F(red)] (Table 2 and Fig. 2). A consensus is a prediction submitted identically by at least two predictors. The consensus value is the number of such predictions. The redundancy value F(red) measures the method variability within a consensus, in terms of feature combinations. It is calculated as follows:

$$F(\mathrm{red}) = \frac{N(\mathrm{red})}{N(\mathrm{tot})}$$

where N(red) is the number of methods with the same binary identifier, i.e. the number of redundant methods, and N(tot) is the total number of methods generating the consensus. Lower F(red) values reflect a higher variability in the type of methods generating the consensus.
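The five-bit identifiers and the redundancy value can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' Perl scripts. The pairing of Fig. 1B identifiers with method counts is read from the order in which they appear in the figure. N(red) is assumed to count every contributing method whose identifier is shared with at least one other contributing method, so a consensus built only from distinct identifiers has F(red) = 0. The helper names (`encode`, `feature_tally`, `possible_codes`, `f_red`) are hypothetical.

```python
from collections import Counter
from itertools import product

# Bit positions follow the feature order: F1 F2 F3 F4 F5.
FEATURES = ("F1", "F2", "F3", "F4", "F5")  # sequence, structure, GO, literature, manual

# Methods per binary identifier as read from Fig. 1B (pairing assumed from listed order).
FIG1B = {"11011": 4, "10011": 5, "10100": 2, "10001": 2,
         "10000": 2, "11100": 1, "10101": 1, "11001": 1}


def encode(used_features):
    """Return the five-bit identifier for a set of feature names."""
    return "".join("1" if f in used_features else "0" for f in FEATURES)


def feature_tally(code_counts):
    """Count how many methods use each feature, given {identifier: n_methods}."""
    tally = Counter()
    for code, n in code_counts.items():
        for feature, bit in zip(FEATURES, code):
            if bit == "1":
                tally[feature] += n
    return tally


def possible_codes(require_f1=False):
    """Enumerate all five-bit identifiers, optionally only those with F1 = 1."""
    codes = ["".join(bits) for bits in product("01", repeat=len(FEATURES))]
    return [c for c in codes if c[0] == "1"] if require_f1 else codes


def f_red(contributing_codes):
    """F(red) = N(red) / N(tot) for the methods behind one consensus.

    Assumption: N(red) counts the methods whose identifier also occurs
    for at least one other contributing method.
    """
    counts = Counter(contributing_codes)
    n_red = sum(n for n in counts.values() if n > 1)
    return n_red / len(contributing_codes)


if __name__ == "__main__":
    print(encode({"F1"}))  # fully automated, sequence-only method
    print(len(possible_codes()), len(possible_codes(require_f1=True)))
    print(dict(feature_tally(FIG1B)))
    print(f_red(["10011", "10011", "11011"]), f_red(["10000", "10100"]))
```

Enumerating the identifiers gives 32 codes in total, 16 of them with F1 = 1. With the Fig. 1B counts, the tally gives 18 methods using sequence, 9 using literature, 6 using structure and 4 using GO information. No identifier sets both F3 and F4.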
For the GOF, GOP and GOC classes, we also computed a consensus value among the GO parents of the submitted predictions, for up to three levels of the GO ontology (P1, P2 and P3). The aim was to verify whether different methods could reach a less specific consensus. The results are shown in Table 2.

A consensus was never found for prediction categories other than GOF, GOP and BIND. Even in these classes, the number of GOF and GOP consensus predictions was much lower than the number of submitted predictions (70 out of 450 and 19 out of 457 for GOF and GOP, respectively). The consensus fraction is highest for the BIND class, where consensus predictions account for about one fifth of the submitted predictions (31 out of 150). About half of the consensus predictions were generated by only two predictions. The exception is the GOP class, where about 40% of the consensus predictions were obtained by three independent methods. Note that some of the consensus predictions consisted of annotations such as ‘unknown molecular function’ or ‘unknown biological process’; their number is shown in parentheses in Table 2. Once the ‘unknown biological process’ predictions were excluded, only two of the 19 GOP consensus predictions remained. Both were generated by three submissions from three different methods.

Table 1. List of features used to classify the methods exploited for the predictions.
Number  Description             Included tools
F1      Sequence information    BLAST, PFAM, INTERPRO, CHOP, CHIEFC, PROSITE, SMART, PRINTS, PHYRE, others
F2      Structure information   3D-JURY, HMAP, PROFUNC, COLUMBA, PHUNCTIONER
F3      GO database
F4      Literature information
F5      Manual intervention

[Fig. 1 image: (A) bar chart of the number of methods N(met) for F1 (seq), F2 (str), F3 (GO), F4 (lit), F5 (man), Own and Unknown; (B) number of methods per binary identifier: 11011 (4), 10011 (5), 10100 (2), 10001 (2), 10000 (2), 11100 (1), 10101 (1), 11001 (1).]
Fig. 1. Distribution of method features. (A) Number of methods including F1 (sequence information), F2 (structure information), F3 (use of the GO database for any reason other than deriving GO numbers for submission), F4 (literature information) and F5 (manual intervention); number of methods developed in-house (own); and number of methods for which no description is available (unknown). (B) Distribution of binary method class identifiers among the 18 methods submitting predictions (in-house developed methods and methods for which no information is available are not included).

Table 2. Number of GO identifiers predicted by at least two groups. GOF, number of GO function predictions; GOP, number of GO process predictions; GOC, number of GO cellular component predictions; BIND, number of binding predictions; PT, number of post-translational modification predictions; NA, not applicable. The number of predictions corresponding to the ‘unknown’ annotation is shown in parentheses.
                  GOF      GOP       GOC   BIND   PT
Consensus         70 (4)   19 (17)   0     31     0
Consensus P1      72 (4)   28 (21)   1     NA     NA
Consensus P1-P2   74 (4)   33 (25)   10    NA     NA
Consensus P1-P3   78 (4)   35 (26)   27    NA     NA

Figure 2B shows a histogram of the fraction of redundancy for the three functional classes. Interestingly, redundancy values between 0 and 0.2 were often observed. These values correspond to a variability of at least 80% in the combinations of features generating the consensus. We also grouped annotations by the parent levels of the respective GO terms (one level, P1; two levels, P2; three levels, P3). Neither the number of consensus predictions nor the fraction of redundancy changed significantly (supplementary material, Fig. S1). This is most likely due to the somewhat limited depth of the GO graph: a common node shared by two predictions does not necessarily provide additional information.
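Parent-level consensus can be sketched in the same way. The Python below is a hypothetical illustration. It uses a tiny hand-written parent map, whereas in GO proper a term can have several parents in a directed acyclic graph. The exact lifting rule used to build Table 2 is not described in the text. The sketch assumes that each prediction also votes for its ancestors up to k levels. Two predictors then agree at level k if their vote sets for the same target share a term. The target and group names are placeholders.

```python
from collections import defaultdict

# Hypothetical, simplified single-parent map; real GO terms may have several parents.
PARENT = {
    "GO:0003677": "GO:0003676",  # DNA binding -> nucleic acid binding
    "GO:0003723": "GO:0003676",  # RNA binding -> nucleic acid binding
    "GO:0003676": "GO:0005488",  # nucleic acid binding -> binding
}


def ancestors_up_to(term, levels):
    """Return the term plus its ancestors up to `levels` parent steps."""
    found = {term}
    for _ in range(levels):
        term = PARENT.get(term)
        if term is None:  # reached a root of the toy map
            break
        found.add(term)
    return found


def consensus(predictions, levels=0, min_predictors=2):
    """Group (predictor, target, term) triples by (target, term).

    With levels > 0 each prediction also votes for its ancestors, so
    levels = 1, 2, 3 loosely mirror the P1, P1-P2 and P1-P3 rows of Table 2.
    Returns {(target, term): predictors} for terms reaching min_predictors.
    """
    voters = defaultdict(set)
    for predictor, target, term in predictions:
        for t in ancestors_up_to(term, levels):
            voters[(target, t)].add(predictor)
    return {key: who for key, who in voters.items() if len(who) >= min_predictors}


if __name__ == "__main__":
    preds = [("groupA", "T0001", "GO:0003677"), ("groupB", "T0001", "GO:0003723")]
    for k in range(3):
        print(k, sorted(consensus(preds, levels=k)))
```

In this toy case, the two predictions disagree at level 0. They agree on nucleic acid binding at level 1, and at level 2 they also agree on the very broad term binding. Such a shared term carries less information than the original predictions, which matches the remark about the limited depth of the GO graph.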
Target function annotation versus time

At the beginning of the CASP6 experiment, 42 targets had a molecular function annotation, 32 a biological process annotation and 9 a cellular component annotation. In 23 cases, information about interaction partners was available, whereas the databases contained no annotation about post-translational modifications. One year later (October 2005), the available annotations had decreased by 5% to 10% (supplementary material, Fig. S2). This shows that knowing the 3D structure of a protein allows its function annotation to be improved, even if this may simply mean removing a previous annotation. In fact, for 11 targets at least one molecular function annotation was either modified or deleted between the end of 2004 and the end of 2005 (supplementary material, Table S1). In the same period, the number of non-annotated targets decreased by 8, 6, 2 and 1 for GOF, GOP, GOC and BIND, respectively. These data confirm that assigning a function to proteins is still a very difficult task for both experimental and computational biologists. They also suggest that there is still a long way to go before genome-scale data are fully exploited.

Interestingly, predictions agreed with function assignments that were subsequently removed in only about half of the cases. This suggests that taking into account different functional predictions for a given protein might help to avoid errors in database annotation.

Predictions versus target function annotation

We can reliably assess the correctness of a prediction only for targets that had no annotation at the time of CASP6 and were annotated afterwards (supplementary material, Fig. S2). This subset consisted of only 11 targets and included 24 GOF, 7 GOP, 2 GOC and 1 BIND annotations.
Because the data are sparse, the analysis was limited to GOF functional assignments. It included the eight targets listed in Table 3: five belong to the comparative modeling (CM) class and three to the fold recognition (FR) class. In total, 85 predictions were submitted for these targets by 68 predictors, each predictor being counted once per target it predicted. Of these predictions, 32 converged into 12 consensus predictions. These predictions were compared with the current target function annotation (October 2005) in the UniProt [13], Entrez Gene [14] and InterPro [15] databases (Tables 3 and 4).

[Fig. 2 image: (A) F(preds) versus N(c-preds), from 2 to 10; (B) F(c-preds) versus F(red), binned 0.0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8 and 0.8-1.0; series GOFs, GOPs and BINDs.]
Fig. 2. (A) Fraction of consensus of prediction [F(preds)] as a function of the number of contributing predictions [N(c-preds)]. (B) Fraction of consensus of prediction [F(c-preds)] as a function of the redundancy of the contributing methods [F(red)], i.e. of the number of methods exploiting the same source of information for the prediction (see text for a detailed definition).

Of the 85 submitted predictions, about 20% (18) were correct, i.e. overlapped with the current annotation of the corresponding targets. Interestingly, 17 of these were consensus predictions. Moreover, CM targets yielded many more consensus predictions than FR targets (11 versus 1). Even so, the only FR consensus (target T0243) matched the corresponding function annotation. The dataset is too small to draw general conclusions. Nevertheless, we believe that the success rate for these predictions supports both the usefulness of the experiment and the validity of our method for deriving the consensus prediction.
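The totals quoted above can be cross-checked against the columns of Table 3. The Python sketch below simply re-sums the table. The rows are transcribed from Table 3, and the count of 18 correct predictions is taken from the text rather than from the table. The sketch also shows that 68 is the sum of the per-target NP column, so a predictor is counted once for every target it predicted. The names `TABLE3`, `CORRECT`, `column_totals` and `consensus_by_class` are illustrative.

```python
# Rows transcribed from Table 3:
# (target, class, Pred(Sub), NP, N(Cons), Pred(Cons))
TABLE3 = [
    ("T0196", "CM", 16, 12, 2, 8),
    ("T0205", "CM", 15, 10, 3, 7),
    ("T0211", "CM", 8, 8, 1, 2),
    ("T0215", "FR", 3, 3, 0, 0),
    ("T0226", "CM", 12, 10, 2, 5),
    ("T0243", "FR", 10, 8, 1, 2),
    ("T0263", "FR", 9, 7, 0, 0),
    ("T0268", "CM", 12, 10, 3, 8),
]
CORRECT = 18  # correct predictions, as reported in the text


def column_totals(rows):
    """Sum Pred(Sub), NP, N(Cons) and Pred(Cons) over all targets."""
    return tuple(sum(row[i] for row in rows) for i in range(2, 6))


def consensus_by_class(rows):
    """Number of consensus predictions per target class (CM or FR)."""
    by_class = {}
    for _target, cls, _sub, _np, n_cons, _pred_cons in rows:
        by_class[cls] = by_class.get(cls, 0) + n_cons
    return by_class


if __name__ == "__main__":
    submitted, predictors, n_consensus, in_consensus = column_totals(TABLE3)
    print(submitted, predictors, n_consensus, in_consensus)
    print(consensus_by_class(TABLE3))
    print(f"correct: {CORRECT / submitted:.1%}")
```

The column sums reproduce 85 predictions, 68 predictor counts, 12 consensus predictions and 32 predictions in consensus. The per-class split gives 11 CM versus 1 FR consensus predictions. A success rate of 18 out of 85 is about 21%.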
The comparison between CM and FR targets also suggests that an ‘easy’ structure prediction does not necessarily correspond to an ‘easy’ function prediction. How to define the ‘difficulty’ of a target has been much debated in the structure prediction field. Defining the difficulty of a function prediction is clearly needed as well, and is equally complex. Here we took the view that a target not annotated in any database at the time of prediction is difficult to predict. Given the time constraints imposed by CASP, most of the methods used in the experiment are automatic. Unlike curated database entries, the predictions are therefore unlikely to include much human intervention.

We feel it is inappropriate to try to rank the various methods on the basis of such a limited dataset. If, as we expect, participation in future experiments increases, it will become possible to draw conclusions about the quality of different methods. More importantly, the number of consensus predictions will also increase, and this will allow a substantial number of correct functional predictions to be produced.

The CASP6 assessor highlighted five cases where a consensus could be derived by comparing the different submitted predictions. At the time, however, the design of the experiment did not allow the redundancy of the methods to be taken into account. Three of these targets (T0226, T0243 and T0263) were annotated between the end of CASP6 and the time of this analysis. For T0243 and T0263, the newly deposited annotations match the consensus prediction. T0243 was predicted to bind DNA, which has since been confirmed. T0263 was predicted to have oxidoreductase activity and is indeed annotated as 3-isopropylmalate dehydrogenase. In both cases, predictions from two different methods formed the consensus (types 10100 and 10011 for T0243, and types 11011 and 11100 for T0263). For T0226, there were three consensus predictions: isomerase, transferase and sugar binding. The current annotation suggests that the protein has a structural role, which may or may not be compatible with sugar binding.

Table 3. List of targets annotated after October 2004 and the corresponding submitted predictions. Target, CASP6 target identifier; Class, target classification (CM, comparative modeling; FR, fold recognition; NF, new fold); Ann DB, annotation database (EG, Entrez Gene; IP, InterPro; UP, UniProt); GOF, GO molecular function identifier, according to the GO database definition; Pred(Sub), number of predictions submitted; NP, number of predictors; N(Cons), number of consensus predictions found; Pred(Cons), number of predictions generating the consensus. Bold, annotations correctly predicted by at least one group.
Target  Class  Ann DB  GOF                                           Pred(Sub)  NP  N(Cons)  Pred(Cons)
T0196   CM     EG      3746, 8135, 3676                              16         12  2        8
T0205   CM     IP      3676, 5488                                    15         10  3        7
T0211   CM     UP      16787, 3824                                   8          8   1        2
T0215   FR     IP      4766, 16765, 8757, 8168, 16741, 16740, 3824   3          3   0        0
T0226   CM     IP      5198                                          12         10  2        5
T0243   FR     UP      3677, 3676, 5488                              10         8   1        2
T0263   FR     UP      3862, 16491                                   9          7   0        0
T0268   CM     UP-IP   8168, 16741, 16740, 3824                      12         10  3        8

Table 4. GO function predictions by method class (targets annotated by October 2005 only). Bin, class binary identifier; Sub Pred, number of predictions submitted by methods belonging to the class; Sub GOFN, number of GOF numbers submitted by methods belonging to the class; Exact Pred, number of predictions (out of a total of 14) corresponding to annotations found in the UniProt, Entrez Gene and InterPro databases; Exact GOFN, number of predicted GOF numbers (out of a total of 12) corresponding to annotations found in the UniProt, Entrez Gene and InterPro databases. Numbers in parentheses indicate the predictions that can be clustered in terms of common GO parents (up to three levels).
Bin     Sub Pred (ConsP1-P3)  Sub GOFN (ConsP1-P3)  Exact Pred (ConsP1-P3)  Exact GOFN (ConsP1-P3)
10000   1 (1)                 1 (1)                 1 (1)                   1 (1)
10001   9 (3)                 9 (3)                 2 (2)                   2 (2)
10011   17 (10)               15 (8)                7 (6)                   5 (4)
11001   1 (1)                 1 (1)                 1 (1)                   1 (1)
11011   8 (5)                 8 (5)                 1 (1)                   1 (1)
10100   24 (3)                22 (2)                0 (0)                   0 (0)
10101   1 (1)                 1 (1)                 1 (1)                   1 (1)
11100   1 (1)                 1 (1)                 1 (1)                   1 (1)

In the light of these findings, we can confidently conclude that consensus predictions, normalized on the basis of the redundancy of the methods, could help researchers in two ways. They could narrow down the number of biological assays needed to assign protein functions, and they could speed up the difficult and challenging process of functional annotation. As we anticipate that this will prove useful to our research colleagues, Table 5 reports the consensus predictions for targets with no current function annotation.

Conclusions

The prediction of protein function is one of the major challenges of protein bioinformatics. The growing number of completely sequenced genomes has allowed the development of a number of new approaches complementary to sequence analysis. These approaches can be combined to elucidate complete functional networks and biochemical pathways.

The CASP community set up a new function prediction category with two aims. The first was to understand whether, and in which cases, computational methods can provide useful information about the molecular or biological function of an unknown protein. The second was to provide useful information to researchers working on the target proteins.
Here we revisited the results of this CASP6 category with two aims: (a) to verify how relevant knowledge of the methods used by predictors is for assessing the results; (b) to see to what extent information made available after the end of the experiment could help us identify the best strategies for providing useful information to researchers.

Our results show that consensus predictions generated by diverse methods, i.e. methods exploiting different sources of information, are more reliable than predictions obtained by a single method. Such consensus can therefore serve as an indicator of the reliability of a prediction. A general knowledge of the methods used is thus important for understanding how far predictions can be trusted, and should be made available to the CASP assessor in the next round of the experiment. These conclusions rest on a small number of cases. The correctness of a prediction can only be properly assessed for the few targets that are annotated now but were not annotated at the time of the CASP experiment. Suppose, however, that we trust the submitted predictions not to have used the existing annotations, and therefore include the predictions for already annotated targets in our analysis. In that case, a molecular function and a biological process were correctly predicted for about half of the protein targets, and more than 80% of the exact predictions were within a consensus (data not shown). On the other hand, more than 30% of the 64 CASP protein targets still have no molecular function annotation in any database, and more than half of them have no biological process annotation. Clearly, therefore, assigning functional data to proteins is still a very difficult task, not only computationally but also experimentally.
We hope that the present analysis will encourage predictors to take part in the next CASP function prediction experiments. We especially hope that the observed high reliability of consensus predictions will also convince researchers to take the results into account when designing their experiments. In our opinion, the CASP function prediction experiment can make a significant contribution and promote important and useful developments in the area of protein function prediction.

Table 5. Consensus predictions for non-annotated targets (as of October 2005). Target, target identifier; GOF, predicted GO molecular function identifier; Function, molecular function description; NP, number of predictors; N(comb), number of different method binary identifiers, i.e. of non-redundant methods, that contribute to the consensus.
Target  GOF    Function                                                    NP  N(comb)
T0212   5515   Selective protein binding                                   3   3
T0214   3677   DNA binding                                                 2   2
T0216   8237   Metallopeptidase activity                                   2   1
T0222   50825  Ice binding (antifreeze activity)                           4   1
T0227   3677   DNA binding                                                 2   1
T0232   4364   Glutathione transferase activity                            9   4
T0237   3793   Defense (immunity protein activity)                         2   1
T0249   5515   Selective protein binding                                   2   2
        30528  Transcription regulator activity                            4   3
        3700   Transcription factor activity                               3   2
        3677   DNA binding (functional hypothesis: transcription factor)   3   2
T0251   5489   Electron transporter                                        2   2
T0275   5524   ATP binding                                                 4   3

Experimental procedures

Submitted predictions are available at the CASP web site (http://www.predictioncenter.org). All analyses were performed using in-house scripts written in Perl.

References

1 Wolfson HJ, Shatsky M, Schneidman-Duhovny D, Dror O, Shulman-Peleg A, Ma B & Nussinov R (2005) From structure to function: methods and applications. Curr Protein Pept Sci 6, 171–183.
2 Jones S & Thornton JM (2004) Searching for functional sites in protein structures. Curr Opin Chem Biol 8, 3–7.
3 Gabaldon T & Huynen MA (2004) Prediction of protein function and pathways in the genome era. Cell Mol Life Sci 61, 930–944.
4 Todd AE, Orengo CA & Thornton JM (1999) Evolution of protein function from a structural perspective. Curr Opin Chem Biol 3, 548–556.
5 Thornton JM, Todd AE, Milburn D, Borkakoti N & Orengo CA (2000) From structure to function: approaches and limitations. Nat Struct Biol 7, 991–994.
6 Dietmann S & Holm L (2001) Identification of homology in protein structure classification. Nat Struct Biol 8, 953–957.
7 Orengo CA, Jones DT & Thornton JM (1994) Protein superfamilies and domain superfolds. Nature 372, 631–634.
8 Devos D & Valencia A (2000) Practical limits of function prediction. Proteins 41, 98–107.
9 Rost B (2002) Enzyme function less conserved than anticipated. J Mol Biol 318, 595–608.
10 Jeffery CJ (2003) Moonlighting proteins: old proteins learning new tricks. Trends Genet 19, 415–417.
11 Jeffery CJ (2003) Multifunctional proteins: examples of gene sharing. Ann Med 35, 28–35.
12 Moult J, Fidelis K, Rost B, Hubbard T & Tramontano A (2005) Proteins Suppl 7, 3–7.
13 Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, et al. (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res 33, D154–D159.
14 Maglott D, Ostell J, Pruitt KD & Tatusova T (2005) Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res 33, D54–D58.
15 Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bradley P, Bork P, Bucher P, Cerutti L, et al. (2005) InterPro, progress and status in 2005. Nucleic Acids Res 33, D201–D205.

Supplementary material

The following supplementary material is available online:
Fig. S1. Percentage of consensus predictions as a function of the redundancy [F(red)] of the contributing methods. (A) GOF functional class; (B) GOP functional class; (C) GOP functional class, ‘unknown biological process’ predictions excluded; (D) GOC functional class.
Fig. S2. (A) Number of annotated targets versus time; the dotted blue line indicates the number of annotated targets whose annotation did not change between December 2004 and October 2005. (B) Number of non-annotated targets versus time.
Table S1. Submitted predictions for targets that had at least one GOF annotation in October 2004 that was subsequently removed.
This material is available as part of the online article from http://www.blackwell-synergy.com
