Scientific report: Domain deletions and substitutions in the modular protein evolution

Domain deletions and substitutions in the modular protein evolution

January Weiner 3rd, Francois Beaussart and Erich Bornberg-Bauer

Division of Bioinformatics, School of Biological Sciences, The Westfalian Wilhelms University of Münster, Germany

Keywords: domain loss; fission; fusion; protein domains; protein evolution

Correspondence: E. Bornberg-Bauer, Division of Bioinformatics, School of Biological Sciences, The Westfalian Wilhelms University of Münster, Schlossplatz 4, D48149 Münster, Germany. Fax: +49 251 8321631; Tel: +49 251 8321630; E-mail: ebb@uni-muenster.de

(Received December 2005, revised 13 February 2006, accepted March 2006)

doi:10.1111/j.1742-4658.2006.05220.x

The main mechanisms shaping the modular evolution of proteins are gene duplication, fusion and fission, recombination and loss of fragments. While a large body of research has focused on duplications and fusions, in this study we concentrated on how domains are lost. We investigated motif databases and introduced a measure of protein similarity that is based on domain arrangements. Proteins are represented as strings of domains, and comparison was based on the classic dynamic programming alignment scheme. We found that domain losses and duplications were more frequent at the ends of proteins. We showed that losses can be explained by the introduction of start and stop codons that render the terminal domains nonfunctional, so that further shortening, until the whole domain is lost, is not evolutionarily selected against. We demonstrated that domains that also occur as single-domain proteins are less likely to be lost at the N terminus and in the middle than at the C terminus. We conclude that fission/fusion events with single-domain proteins occur mostly at the C terminus. We found that domain substitutions are rare, in particular in the middle of proteins. We also showed that many apparent substitutions or losses result from erroneous annotations, but we were also able to find courses of evolutionary events
where domains vanish over time. This is illustrated by a case study on the bacterial formate dehydrogenases.

Abbreviations: Domain ID, domain identification number; EGF, epidermal growth factor; FDHF, formate dehydrogenase H.

FEBS Journal 273 (2006) 2037–2047 © 2006 The Authors. Journal compilation © 2006 FEBS

Proteins are well known to evolve not only by point mutations but also by modular rearrangements [1–3]. By and large, these rearrangements occur at the level of domains, which are independent folding units and have been proposed to represent the unit of modular evolution [3,4]. Most domains always form the same combinations; that is, they are always found next to the same neighbours. For example, domains found in ribosomal proteins are not found elsewhere and always occur in the same context. It has also been reported that many domains appear in a highly conserved order (supradomains) [5], and that the frequent occurrence of certain modular arrangements (arrangements of modules along a sequence) across phyla is the result of conservation [6]. While only a few domains co-occur with many others at least once in the same protein, most domains have few partner domains, or are even always singletons [3,7–9]. Well-known examples of highly linked domains occurring in many different combinations are the P-loop nucleotide triphosphate hydrolase domain, the epidermal growth factor (EGF) domain, the SH3 domain, the P-kinase domain and the domains involved in the blood clotting cascade [1,10]. The phenomenon of differential arrangements has often been termed domain mobility [11]. However, this term may be misleading, as it implies that single modules or small arrangements are transferred from one protein to another. Because two modules, or even larger arrangements, are often fused into one protein as a whole, it becomes difficult to define which of the modules is 'mobile' and which is 'static'. Therefore, it has
been suggested that the term versatility should be used instead of domain mobility [3,12]. Independently of the perspective taken, the underlying mechanisms of modular rearrangements are mostly gene fusion and domain loss and, probably to a lesser extent, exon shuffling and recombination [13–17]. While the emergence of domain combinations is well documented [4,6,7,18–21], relatively little is known about domain losses. In this article, we focus on how domains are lost. Ultimately, domain loss is difficult to distinguish from domain recruitment because, when two proteins are compared, phylogenetic analysis is required to determine whether a domain has been recruited in one protein or lost in the other. To deal with this problem, we investigated the possible genetic mechanisms that can cause a domain to be lost or gained. As usual in sequence analysis, the evolutionary history can only be inferred a posteriori, which means that disadvantageous mutations (frameshifts, domain deletions, etc.)
have been weeded out by negative selection. Thus, we only observe modular rearrangements that are either beneficial or neutral. For the sake of comprehensiveness, we used the ProDom database [22], which records conserved sequence fragments. However, these fragments are not always identical to structural domains. To conform to the general definition of domains [3], all key results were confirmed using Pfam, which largely agrees with structural domain definitions [23].

In the following study, we first investigated whether the relative frequencies of deletions (or recruitments) depend on whether a domain is at an end or in the middle of a protein. Unless explicitly stated otherwise, we use the term 'deletion' to cover both deletions and recruitments. We then investigated whether eliminations are more frequently observed at the boundaries of domains and whether or not domain substitutions are frequent. For that purpose, we categorized and described misannotations of domains to distinguish them from real substitutions or deletions of domains. Next, we studied whether some domains are lost more often than others and whether the frequency of domain deletion depends on domain versatility. Finally, we discuss the implications of our results for a wider understanding of modular protein evolution and the possibilities for constructing a model in which modular protein evolution is formally described in terms of module edit operations and cost functions.

Results and Discussion

Single domain deletions

The first question we asked was whether the probability of a domain deletion is evenly distributed throughout a protein. The null hypothesis was that the genetic mechanisms that lead to domain deletions (for example, deletions and insertions of sequence fragments, intron recombinations, etc.)
do not depend on the position within the sequence. However, two factors could cause a bias. First, any point mutation that creates a premature stop codon will cause a C-terminal deletion of a protein; likewise, a mutation that creates an alternative transcription or translation start will cause an N-terminal deletion. Second, a fission producing two genes from one will result in the deletion of a terminal fragment from a protein, and, conversely, the fusion of two smaller proteins into one will produce the same terminal pattern when the fused protein is compared with either of its precursors.

We first grouped proteins by the number of domains they contain (see Materials and methods). For each protein, we searched for deletion events, that is, for another protein that has exactly the same domain arrangement except for a single domain missing anywhere in the arrangement. We then calculated the frequency of deletion at each domain position within the group of proteins containing a given number of domains. We found that domain deletions are more common at either of the protein termini, and that their occurrence is slightly higher at one of the termini, depending on the number of domains in the protein and on the database selected (Fig. 1). The prevalence of terminal deletions did not depend on the number of domains in proteins, and the results for the Pfam and ProDom databases were similar. Only in a few cases were slightly increased frequencies of domain deletions observed at a central position. In line with our predictions, this suggests that the genetic mechanisms of domain deletion act predominantly on sequence termini. Therefore, we tentatively propose that the emergence of new start sites and stop codons, as well as gene fusion and fission, is more likely to underlie domain deletions than, for example, intron mobility caused by exon shuffling.

Multiple domain deletions

We supported the previous findings by analysing cases where one or more domains were deleted from a
protein. We considered only deletions in which at least half of the domains of the full-length arrangement were preserved, to ensure that homologous arrangements were being compared. The results were similar to those for single domain deletions, in that terminal deletions were prevalent (see the Supplementary Material). In many cases, a deleted domain is part of a larger deleted fragment. We found that fragments deleted at either terminus are, in general, much longer than fragments deleted within a protein sequence: deletions within the protein are much more often single-domain deletions (Fig. 2). The total number of deletions that involve only a single domain is higher for the positions between the termini, whereas the number of major deletions (deletions that span more than one domain) is higher at terminal positions. This supports the view that deletions generally involve the protein termini.

[Figure 1 (plot): y-axis, proportion of domains deleted; x-axis, domain position in the protein; one panel per protein length.]
Fig. 1. Statistics of single domain deletions in the whole SwissProt/TrEMBL set of proteins. The figure shows the relative proportion of domain deletions at different positions within proteins of length 4, 6, 10 and 11 domains. Dark grey, Pfam; light grey, ProDom.

[Figure 2 (plot): y-axis, number of occurrences; x-axis, length of the deleted fragment (in domains).]
Fig. 2. Number of occurrences of domain deletions as a function of the length (in domains) of the deleted fragment. Diamonds, N-terminal deletions; squares, deletions within the protein; circles, C-terminal deletions. Single domain losses occur preferentially at one of the middle positions, whereas longer fragments tend to be deleted at the termini.

Detailed analysis of the deletion events

During our analyses, we noted that some of the apparent domain deletions are actually just misannotations. A lack of
a domain identifier at a given position in a protein annotation does not necessarily mean that the corresponding domain has been physically deleted. Likewise, a different identifier does not necessarily signify a physical substitution. To address this problem, we constructed clusters of similar proteins that contained at

Table. Criteria used to distinguish between various types of sequence rearrangements and annotation artefacts that result in the disappearance of a domain from the domain string of a protein.

Evolutionary events
Physical deletion: a domain is physically deleted from the protein sequence, and only a short (

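The abstract describes comparing proteins as strings of domains with a classic alignment scheme, and the introduction anticipates a model built from module edit operations and cost functions. A minimal Python sketch of that idea is global alignment by dynamic programming (Needleman–Wunsch style) over domain identifiers. Here a gap stands for a domain deleted from, or recruited into, one of the two proteins. The function name align_domains and the unit costs MATCH_COST, SUBSTITUTION_COST and GAP_COST are hypothetical choices for illustration, not the authors' scoring scheme.

```python
from typing import List, Sequence, Tuple

# Hypothetical unit costs; the authors' actual cost function is not given here.
MATCH_COST = 0
SUBSTITUTION_COST = 1
GAP_COST = 1  # deleting or inserting one whole domain


def align_domains(a: Sequence[str], b: Sequence[str]) -> Tuple[int, List[Tuple[str, str]]]:
    """Globally align two domain strings; return (minimal cost, aligned pairs).

    '-' marks a domain present in one arrangement but absent from the other.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the minimal cost of aligning a[:i] with b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP_COST
    for j in range(1, m + 1):
        dp[0][j] = j * GAP_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = MATCH_COST if a[i - 1] == b[j - 1] else SUBSTITUTION_COST
            dp[i][j] = min(dp[i - 1][j - 1] + step,   # match or substitution
                           dp[i - 1][j] + GAP_COST,   # domain missing from b
                           dp[i][j - 1] + GAP_COST)   # domain missing from a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = MATCH_COST if a[i - 1] == b[j - 1] else SUBSTITUTION_COST
            if dp[i][j] == dp[i - 1][j - 1] + step:
                pairs.append((a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + GAP_COST:
            pairs.append((a[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", b[j - 1]))
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs
```

For example, align_domains(["PF1", "PF2", "PF3"], ["PF1", "PF2"]) returns a cost of 1 and pairs PF3 with a gap, which corresponds to a single C-terminal domain deletion. The identifiers PF1 to PF3 are placeholders, not real Pfam accessions.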
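The single-domain deletion search described under "Single domain deletions" can be sketched as follows, assuming each protein has already been reduced to a tuple of domain identifiers in N- to C-terminal order. The function single_deletion_profile is hypothetical. It counts distinct arrangements rather than individual proteins. Within a run of identical neighbouring domains, it attributes the deletion to the leftmost copy. Both are simplifying assumptions, not the authors' procedure.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# A domain arrangement: domain identifiers in N- to C-terminal order.
Arrangement = Tuple[str, ...]


def single_deletion_profile(arrangements: Iterable[Arrangement]) -> Dict[int, List[float]]:
    """Group arrangements by length n and, for each n, return the proportion
    of detected single-domain deletions at each position 0..n-1."""
    known = set(arrangements)  # distinct arrangements only
    counts: Dict[int, Counter] = defaultdict(Counter)
    for full in known:
        n = len(full)
        if n < 2:  # removing the only domain leaves nothing to compare
            continue
        for pos in range(n):
            # Every copy in a run of identical domains yields the same shorter
            # string, so count the deletion only at the leftmost copy.
            if pos > 0 and full[pos] == full[pos - 1]:
                continue
            shorter = full[:pos] + full[pos + 1:]
            if shorter in known:
                counts[n][pos] += 1
    profile: Dict[int, List[float]] = {}
    for n, by_pos in counts.items():
        total = sum(by_pos.values())
        profile[n] = [by_pos[pos] / total for pos in range(n)]
    return profile
```

With the arrangements ABC, BC, AB, AC, XYZ and XY (as tuples), the function returns {3: [0.25, 0.25, 0.5]}. Four deletions are found among the three-domain arrangements, and two of them are at the C-terminal position.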
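The "Multiple domain deletions" analysis uses two steps: a filter requiring that at least half of the full-length arrangement is preserved, and a split of deletions into N-terminal, internal and C-terminal cases. Both can be sketched as below. The sketch assumes that the shorter arrangement arises by removing a single contiguous fragment. The name classify_fragment_deletion is hypothetical. Reporting the most N-terminal placement when repeated domains make the position ambiguous is an arbitrary choice.

```python
from typing import Optional, Sequence, Tuple


def classify_fragment_deletion(
    full: Sequence[str], short: Sequence[str]
) -> Optional[Tuple[str, int]]:
    """If `short` equals `full` with one contiguous fragment removed, return
    (location, fragment_length); otherwise return None."""
    full_t, short_t = tuple(full), tuple(short)
    n, k = len(full_t), len(short_t)
    deleted = n - k
    # Require an actual deletion that preserves at least half of the
    # full-length arrangement, so that homologous arrangements are compared.
    if deleted < 1 or 2 * k < n:
        return None
    # Try every start of the removed fragment, N terminus first; with repeated
    # domains several starts may fit, and the most N-terminal one is reported.
    for start in range(k + 1):
        if full_t[:start] + full_t[start + deleted:] == short_t:
            if start == 0:
                location = "N-terminal"
            elif start + deleted == n:
                location = "C-terminal"
            else:
                location = "internal"
            return location, deleted
    return None
```

For example, classify_fragment_deletion("ABCDE", "ABE") returns ("internal", 2). In contrast, classify_fragment_deletion("ABCDE", "AB") returns None because fewer than half of the five domains are preserved. Tallying the returned (location, length) pairs over many protein pairs would give counts of the kind plotted against fragment length in Fig. 2.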
Posted: 19/02/2014, 07:20
