Prediction of coenzyme specificity in dehydrogenases/reductases
A hidden Markov model-based method and its application on complete genomes

Yvonne Kallberg 1,2 and Bengt Persson 1,2
1 IFM Bioinformatics, Linköping University, Sweden
2 Centre for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden

Keywords: bioinformatics; coenzyme specificity; hidden Markov model; prediction; Rossmann fold

Correspondence: B. Persson, IFM Bioinformatics, Linköping University, S-581 83 Linköping, Sweden. Fax: +46 13 137568. Tel: +46 13 282983. E-mail: bpn@ifm.liu.se

(Received 13 December 2005, revised 17 January 2006, accepted 23 January 2006)
doi:10.1111/j.1742-4658.2006.05153.x

FEBS Journal 273 (2006) 1177–1184 © 2006 The Authors Journal compilation © 2006 FEBS

Dehydrogenases and reductases are enzymes of fundamental metabolic importance that often adopt a specific structure known as the Rossmann fold. This fold, consisting of a six-stranded β-sheet surrounded by α-helices, is responsible for coenzyme binding. We have developed a method to identify Rossmann folds and predict their coenzyme specificity (NAD, NADP or FAD) using only the amino acid sequence as input. The method is based upon hidden Markov models and sequence pattern analysis. The prediction sensitivity is 79% and the selectivity close to 100%. The method was applied on a set of 68 genomes, representing the three kingdoms archaea, bacteria and eukaryota. In prokaryotes, 3% of the genes were found to code for Rossmann-fold proteins, while the corresponding ratio in eukaryotes is only around 1%. In all genomes, NAD is the most preferred cofactor (41–49%), followed by NADP with 30–38%, while FAD is the least preferred cofactor (21%). However, the NAD preponderance over NADP is most pronounced in archaea, and least in eukaryotes. In all three kingdoms, only 3–8% of the Rossmann proteins are predicted to have more than one membrane-spanning segment, which is much lower than the frequency of membrane proteins in general. Analysis of the major protein types in eukaryotes reveals that the most common type (26%) of the Rossmann proteins are short-chain dehydrogenases/reductases. In addition, the identified Rossmann proteins were analyzed with respect to further protein types, enzyme classes and redundancy. The described method is available at http://www.ifm.liu.se/bioinfo, where the preferred coenzyme and its binding region are predicted given an amino acid sequence as input.

Abbreviations: ORF, open reading frame; SDR, short-chain dehydrogenase/reductase; TM, transmembrane.

Dehydrogenases and reductases are enzymes of fundamental metabolic importance that utilize coenzymes for electron transport (NAD(H), NADP(H) or FAD(H2), herein denoted NAD, NADP and FAD). The enzymes bind the coenzyme through a double βαβαβ fold, resulting in a six-stranded β-sheet surrounded by α-helices, known as the Rossmann fold [1]. This domain is often found in combination with other domains of different folding types either on the N-terminal side, C-terminal side, or interrupting the Rossmann fold [2]. For example, glutathione reductases have two domains of the Rossmann-fold type, one FAD-binding domain that is interrupted by an NAD(P)-binding domain (PDB code 3grs [3]). 6-Phosphogluconate dehydrogenases have an NADP-binding domain of the Rossmann-fold type followed by a C-terminal catalytic domain consisting of α-helices only (PDB code 2pgd [4]).

In the first part of the Rossmann fold (β1α1β2), there are three glycine residues surrounded by hydrophobic residues, with the first glycine at the end of the β1 strand and the other two at the beginning of the α1 helix (Fig. 3, top right, Experimental procedures). The first two glycine residues are involved in dinucleotide binding, while the third is involved in the close packing of the β-strands and the α-helix [5].
Most of the early characterized dehydrogenases/reductases showed a spacing of these glycine residues in a GxGxxG pattern, where ‘x’ denotes any residue [5,6]. However, as new members of this fold have been recognized, the general pattern is now described as Gx(x)Gx(x)G [7], i.e. the spacing between the glycine residues can be one or two residues. The members of the extended short-chain dehydrogenase/reductase (SDR) family have this GxxGxxG pattern, whereas the classical SDRs still do not fit into the description, since they instead have a GxxxGxG pattern ([8] and references therein).

The residues at the end of the β2 strand normally guide identification of the nature of the coenzyme, i.e. whether an enzyme binds FAD, NAD or NADP. In general, the presence of a negatively charged residue indicates that FAD or NAD is the preferred cofactor [5], since such a residue sterically hinders accommodation of the additional 2′-phosphate found in NADP. NADP-preferring enzymes typically have a basic residue one position down-chain instead [5]. Among the classical SDRs, a basic residue at the position preceding the second glycine residue in the Gly-pattern also indicates that the enzyme prefers NADP over NAD [8].

A more difficult task is to distinguish between the coenzyme types FAD and NAD. Most NAD-preferring enzymes have an aspartic acid residue at the end of the β2-strand, while FAD-preferring enzymes instead have a glutamic acid residue at this position. However, there are exceptions in both cases that prevent this feature from being used to differentiate between the two types.

We have now developed a method that, from the amino acid sequence alone, identifies a protein with coenzyme binding of the Rossmann type and predicts the coenzyme specificity. The method is applied to all eukaryotic and archaeal genomes and a representative set of bacterial genomes.
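The glycine patterns described in the introduction can be written as regular expressions. The sketch below is only an illustration of those patterns, not the HMM-based method of this paper: the function name `find_glycine_motifs` and the two toy sequences are hypothetical, and a bare pattern scan of this kind would be expected to produce many spurious hits on real proteins.

```python
import re

# Lookahead wrappers let overlapping motif occurrences be reported.
# Gx(x)Gx(x)G: one or two arbitrary residues between consecutive glycines.
GENERAL_PATTERN = re.compile(r"(?=(G.{1,2}G.{1,2}G))")
# GxxxGxG: the spacing described in the text for classical SDRs.
CLASSICAL_SDR_PATTERN = re.compile(r"(?=(G.{3}G.G))")


def find_glycine_motifs(sequence):
    """Return (pattern_name, start_index, matched_text) tuples sorted by start.

    Hypothetical helper for illustration; at most one hit per pattern is
    reported for each start position.
    """
    seq = sequence.upper()
    hits = []
    for name, pattern in (
        ("Gx(x)Gx(x)G", GENERAL_PATTERN),
        ("GxxxGxG", CLASSICAL_SDR_PATTERN),
    ):
        for match in pattern.finditer(seq):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Toy sequences invented for illustration only (not real proteins).
    print(find_glycine_motifs("MKIAVIGAGVIGLSTA"))  # GxGxxG spacing
    print(find_glycine_motifs("TGASSGIGKA"))        # GxxxGxG spacing
```

The start indices are 0-based. Residues near the end of the β2 strand, which the text uses to separate NADP from NAD and FAD, are deliberately left out because their position relative to the motif is not given here.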
Results and discussion

We have developed a method for prediction of coenzyme specificity, based upon hidden Markov models (HMMs) and sequence motifs (see Experimental procedures). To the best of our knowledge there is no prediction method available with the same applicability as the one presented here. A search in InterPro [9] using key words such as ‘Rossmann’, ‘NAD’, ‘NADP’ and ‘FAD’ reveals many entries but there is no single entry which can be used to identify the motifs of interest. While most entries are on protein family level, there are some on domain level as well, e.g. ‘NAD_BS’ (identifier IPR000205) which identifies NAD binding sites. However, this motif only identifies 29 gene products in the human Ensembl [10] database, a number far below what could be expected.

Rossmann fold in completed genomes

The new method was applied to a selection of 68 completed genomes, representing archaea, bacteria and eukaryota. In total, around 9200 Rossmann proteins were identified in these genomes. The median numbers of Rossmann proteins in each organism within eukaryotes, bacteria and archaea are 196, 67 and 59, respectively, corresponding to 1% of the eukaryotic proteins and 3% of the prokaryotic proteins. As expected, the number of predicted coenzyme binding proteins within a genome increases with its size (Fig. 1). The number of Rossmann folds increases steeply for genomes with up to 10 000 open reading frames (ORFs), while it levels out for larger genomes. Among eukaryotes, Oryza sativa is at the top with 655 predicted Rossmann proteins, and Trypanosoma brucei is at the bottom with only three Rossmann proteins. In bacteria, the corresponding extremes are Mycobacterium tuberculosis (185 proteins) and Chlamydophila caviae (13 proteins), while in archaea the top and bottom are represented by Haloarcula marismortui (146 proteins) and Nanoarchaeum equitans (five proteins). The genomes of Oryza sativa and Xenopus tropicalis have many more coenzyme binding proteins than the others (655 and 646, respectively), but given the size of their genomes (~61 000 and ~53 000 ORFs) the proportions are still within the same range as for other eukaryotes. There are four eukaryotic parasites (Plasmodium falciparum, Plasmodium yoelii, Leishmania major and Entamoeba histolytica) for which the ratio of coenzyme binding proteins is much lower than expected, possibly due to their ability to rely on the dehydrogenase/reductase systems of the host organism.

Fig. 1. Number of coenzyme binding proteins in each genome plotted versus number of open reading frames (horizontal axis: open reading frames (ORFs); vertical axis: Rossmann folds; series: Archaea, Bacteria, Eukaryota). The number of Rossmann folds increases steeply for genomes with up to 10 000 ORFs, while it levels out for larger genomes.

Redundancy

Prokaryotic species, with a typical maximum genome size of 5000 ORFs, have a moderate sequence redundancy among their coenzyme binding proteins. Using a threshold of maximum 60% pair-wise sequence identity, 0–10% of the sequences are redundant. Most of the small eukaryotic genomes have a comparable level of redundancy. In general, the redundancy of Rossmann proteins is similar to that of other proteins in the genomes. However, there are five genomes which do not follow this pattern. In Thermoplasma volcanium, Pyrococcus horikoshii, Thermococcus kodakaraensis, Candida glabrata and Yarrowia lipolytica, the Rossmann proteins are two to three times more redundant than proteins in general. The redundancy among eukaryotes increases with genome size and is 30–40% for genome sizes around 30 000 ORFs. There are some outliers, e.g.
Apis mellifera, with a very high redundancy level of 54% in spite of a rather small genome (~17 000 ORFs), but the redundancy in general in this genome is 46%. Comparing the two plant genomes, Arabidopsis thaliana and Oryza sativa, we find different redundancy in general (33% vs. 46%), while the numbers are much closer considering Rossmann proteins only (40% vs. 37%).

Prediction of coenzyme specificity

In general, for all kingdoms, NAD is the most preferred specificity, while FAD is the least (Table 1). Irrespective of kingdom, FAD preference constitutes 21% on average, while the NAD and NADP ratios vary somewhat. For nearly all prokaryotic organisms, the NAD-preferring Rossmann folds are more numerous than the NADP-preferring (Fig. 2). The only exceptions are Lactobacillus acidophilus, Staphylococcus aureus, Aeropyrum pernix, Pyrobaculum aerophilum, Sulfolobus tokodaii and Thermococcus kodakaraensis. However, among eukaryotes it can be seen that for most species the NAD- and NADP-preferring enzymes are close to equal in numbers. In plant, worm and insect, there is a majority of NADP-preferring enzymes while mammals and chicken have a majority of NAD-preferring enzymes. In a previous study of short-chain dehydrogenases/reductases (SDRs) it was found that NADP is more frequent than NAD in human, mouse, fruit fly, worm, plant and yeast [8]. As mentioned above, this is still valid when including all Rossmann-fold proteins for the lower organisms, but in human and mouse the balance is shifted and NAD is the most frequent coenzyme.

Dual coenzyme sites

Some proteins have two Rossmann binding sites; for example, the flavin monooxygenases with both an FAD and an NAD binding site. Out of the ~9200 proteins predicted to have a Rossmann fold, almost 700 have more than one such fold. For all kingdoms, the fraction of Rossmann proteins with dual sites amounts to 0–10%, with some exceptions.
Among the eukaryotes Entamoeba histolytica, Plasmodium falciparum and Plasmodium yoelii the proportion is 15, 18 and 15%, respectively. The bacterial genome of Chlamydophila caviae also shows a dual-site proportion of 15%, while the archaeal genomes of Thermococcus kodakaraensis and Nanoarchaeum equitans show 17 and 20%, respectively. These high ratios are partly caused by the low number of Rossmann-fold proteins.

Table 1. Average coenzyme preference among archaeal, bacterial, and eukaryotic genomes.

Kingdom    FAD   NAD   NADP
Archaea    0.21  0.49  0.30
Bacteria   0.21  0.46  0.33
Eukaryota  0.21  0.41  0.38

Fig. 2. Coenzyme preferences in all investigated genomes from eukaryota, bacteria and archaea. The left axis shows numbers of coenzyme binding proteins, and the right axis shows numbers of ORFs. Species names are given on the horizontal axis.

Protein families

Among the annotated human Rossmann proteins, most proteins have EC numbers within main group 1 (oxidoreductases). However, there are several SDRs and multifunctional enzymes also within groups 3 (hydrolases), 4 (lyases), and 5 (isomerases), reflecting the versatility of the Rossmann fold.

Among the eukaryotic genomes annotated by Ensembl, 60% of the Rossmann-fold proteins are found to belong to 10 major groups. The SDR superfamily contributes 26%, and is by far the largest group (Table 2). The three next largest groups are various flavin-binding oxidoreductases with proportions each of around 6%. Closely related species show approximately the same number of proteins within each family, but there are a few notable exceptions. Rat aldehyde dehydrogenases, for instance, are almost twice as frequent as mouse aldehyde dehydrogenases, and FAD-dependent pyridine nucleotide-disulphide oxidoreductases are also more numerous in rat compared to mouse. Another species which deviates from the general pattern is yeast. In this species, the fifth major group, zinc-containing alcohol dehydrogenases, has almost as many members as the SDRs (Table 2).

Table 2. The 10 most common types of Rossmann-fold proteins in eukaryotic genomes. The types are listed according to annotation of Pfam families as given in the Ensembl entries. The fish genome is represented by Danio rerio, the fly by Drosophila melanogaster, the worm by Caenorhabditis elegans, and the yeast by Saccharomyces cerevisiae. The total column gives the percentage of all proteins of all types and all species included in the study. The species columns give the number of proteins of each type.

Type                                                          Total proportion (%)  Human  Chimp  Mouse  Rat  Fish  Fly  Worm  Yeast  Sum
Short-chain dehydrogenases/reductases                         26                    71     62     68     67   79    57   75    13     492
FAD-dependent pyridine nucleotide-disulphide oxidoreductases  7                     17     13     16     23   11    11   8     6      105
Flavin-containing amine oxidases                              5                     18     17     12     17   5     8    5     0      82
FAD-dependent oxidoreductases                                 5                     15     14     12     10   5     9    8     1      74
Zinc-containing alcohol dehydrogenases                        4                     12     12     8      11   7     5    6     11     72
Lactate/malate dehydrogenases                                 3                     7      7      8      13   8     3    1     2      49
UBA/THIF-type NAD/FAD binding fold                            3                     10     8      8      8    1     4    3     3      45
Flavin-containing monooxygenases                              3                     7      7      10     10   3     2    5     1      45
D-isomer specific 2-hydroxyacid dehydrogenases                2                     6      5      4      5    7     6    1     6      40
Aldehyde dehydrogenases                                       2                     2      2      6      11   1     2    3     3      30

Transmembrane regions

A number of dehydrogenases and reductases are membrane-attached. The transmembrane (TM) helix can be found in either the N-terminal part of the protein, as in 11-beta hydroxysteroid dehydrogenase type 1 [11], or in the C-terminal part, as in monoamine oxidase B [12]. There can also be multiple TM helices as, e.g.
in the proton-pumping nicotinamide nucleotide transhydrogenase, a three-domain protein with the first and third domains binding NAD and NADP, respectively, and the second domain consisting of 13–14 TM helices [13].

For all Rossmann proteins found in the genomes, transmembrane regions were predicted (see Experimental procedures). Rossmann-fold regions are sometimes falsely predicted as TM regions, due to the hydrophobic nature of the fold. In this study, over half (57%) of the predicted membrane-bound proteins were found to have at least one TM region predicted in the Rossmann fold. These predicted TM segments were therefore excluded in this analysis. As the TM prediction ambiguities are considerable, Rossmann-fold predictions could be used to increase the reliability of TM predictions.

While the average proportion of membrane proteins with two transmembrane segments or more is about 15–30% in all kingdoms [14,15], the proportion of membrane-bound Rossmann-fold proteins only amounts to 3–8% (Table 3). The proportion of membrane-bound proteins with Rossmann fold is about twice as high in eukaryotes as in prokaryotes. It was also noticed that the organisms, even closely related ones, showed considerable variations in how many Rossmann proteins had TM regions. There are three parasites with a very high proportion of Rossmann membrane proteins: Plasmodium falciparum and Plasmodium yoelii with one-third each, and Encephalitozoon cuniculi with as many as five of its six predicted Rossmann proteins also being predicted as membrane proteins.

The majority of proteins were found to harbor one or two TM segments (~800 proteins vs. ~350 proteins with more than two TM helices), with one TM segment being the most usual (~600 proteins). Positioning of the TM segments C-terminally of the coenzyme binding site was twice as common as N-terminal positioning.
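The rounded counts quoted for the transmembrane segments can be re-derived from the per-coenzyme numbers in Table 4. The sketch below assumes the Table 4 values as transcribed here; the names `TABLE_4`, `column_total` and `summarize` are hypothetical and chosen only for this illustration.

```python
# Counts transcribed from Table 4 (all 68 investigated genomes).
# 1N/2N: one or two TM segments N-terminal of the coenzyme binding site.
# 1C/2C: one or two TM segments C-terminal of the coenzyme binding site.
# >2: more than two TM segments, irrespective of location.
TABLE_4 = {
    "FAD":  {"1N": 25,  "2N": 7,  "1C": 87,  "2C": 19, ">2": 41},
    "NAD":  {"1N": 71,  "2N": 27, "1C": 115, "2C": 28, ">2": 135},
    "NADP": {"1N": 100, "2N": 17, "1C": 142, "2C": 60, ">2": 186},
    "Dual": {"1N": 17,  "2N": 6,  "1C": 56,  "2C": 9,  ">2": 8},
}


def column_total(column):
    """Sum one Table 4 column over all coenzyme rows."""
    return sum(row[column] for row in TABLE_4.values())


def summarize():
    """Aggregate Table 4 into the quantities discussed in the text."""
    n_terminal = column_total("1N") + column_total("2N")
    c_terminal = column_total("1C") + column_total("2C")
    more_than_two = column_total(">2")
    total = n_terminal + c_terminal + more_than_two
    return {
        "single_tm": column_total("1N") + column_total("1C"),
        "one_or_two_tm": n_terminal + c_terminal,
        "more_than_two_tm": more_than_two,
        "n_terminal": n_terminal,
        "c_terminal": c_terminal,
        "total": total,
        # Share of each coenzyme class among all Rossmann membrane proteins.
        "shares": {name: sum(row.values()) / total for name, row in TABLE_4.items()},
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

With these inputs the tallies come out as 786 proteins with one or two TM segments, 370 with more than two, 613 with a single segment, and 516 C-terminal versus 270 N-terminal placements, in line with the rounded figures in the text.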
Looking at differences in TM attachment between the various coenzyme specificities, it was found that NADP-preferring enzymes are the most common type to be membrane bound. Around 44% (~500 proteins) of the Rossmann membrane proteins are NADP-preferring, which is a larger proportion than Rossmann NADP-preferring proteins in general (~36%, Table 4). Inversely, NAD-preferring membrane proteins amount to 33% (~400 proteins), which is lower than the frequency in general (~43%, Table 4). Finally, FAD preference is 15% (close to 200 proteins), also below the general occurrence (~21%). Thus, NADP preference is overrepresented, while NAD and FAD preferences are underrepresented. Protein sequences predicted to have two or more coenzyme binding sites were the least common to be membrane bound, with only ~100 sequences out of ~670 predicted to have TM helices.

In the human genome, there are 45 Rossmann proteins with predicted TM regions. The three main families found among them are the SDRs (27%), flavin-containing monooxygenases (13%) and F420-dependent oxidoreductases (11%).

Table 3. Proportion of Rossmann-fold membrane proteins, with more than one predicted transmembrane region, compared to membrane proteins in general.

                        Archaea  Bacteria  Eukaryota
Rossmann-fold proteins  0.04     0.03      0.08
All proteins [16]       0.14     0.15      0.14

Table 4. Distribution of various types of Rossmann-fold transmembrane proteins with different coenzyme specificities. 1N and 2N indicate 1 and 2 transmembrane segments N-terminally of the coenzyme binding site. Similarly, 1C and 2C denote 1 and 2 transmembrane segments C-terminally of the coenzyme binding site. >2 TM indicates more than two transmembrane segments, irrespective of the coenzyme binding site location. The numbers include all 68 investigated genomes.

Coenzyme  1N TM  2N TM  1C TM  2C TM  >2 TM  All
FAD       25     7      87     19     41     179
NAD       71     27     115    28     135    376
NADP      100    17     142    60     186    505
Dual      17     6      56     9      8      96
Total     213    57     400    116    370    1156

Proteins of the Rossmann-fold type constitute a considerable group with many members. These proteins display great versatility in terms of functions and sequence compositions. In spite of these differences, our study demonstrates the power of sequence-based predictions. It is our hope and belief that the presented prediction tool will be a welcome addition to the arsenal of analysis methods available for large scale protein function exploration. The prediction tool is available via http://www.ifm.liu.se/bioinfo, where a web form allows the user to enter one or several amino acid sequence(s) and in return get the Rossmann-fold prediction with estimated coenzyme preference and position.

Fig. 3. Overview of the novel prediction method. Sample sequences of the Rossmann-fold motif are shown (top right). α and β denote secondary structure elements. Arrows indicate positions of critical importance for coenzyme specificity prediction. In the flow chart, the boxes describe the different steps of the method.

Experimental procedures

We have developed a method which identifies coenzyme binding regions in proteins, and also predicts if the specificity is FAD, NAD or NADP. The method is based upon a combination of HMMs and sequence motif matching as outlined in Fig. 3. The HMMs are used to extract a number of potential hits which subsequently are exposed to a filtering process followed by prediction of coenzyme specificity. During the development phase, different combinations of HMMs were tried: one for each type of specificity, one for all, and one for FAD-binding combined with one for NAD(P)-binding proteins.
The latter was found to be the best solution in terms of specificity and selectivity. All HMMs were developed using the hmmbuild command in HMMer [17], with the parameters -F and --fast, followed by the hmmcalibrate command.

The ASTRAL database [18], version 1.65 with maximum 30% sequence identity, was used to obtain a trustworthy test set. The selected proteins belong to the folds ‘NAD(P)-binding Rossmann-fold domain’, ‘FAD/NAD(P)-binding domain’ and ‘Nucleotide-binding domain’. The dataset was scrutinized and only proteins utilizing FAD or NAD(P) in a typical manner were used, i.e. only selecting sequences with Gly, Ser or Ala in the key positions g1, g2 and g3 in Fig. 1. A total of 16 proteins were removed, of which five do not bind the coenzymes of interest and the others deviate in their coenzyme-binding manner. The resulting data set, with 120 members, was manually aligned based upon their three-dimensional structures, and divided into six groups with an even distribution of the three coenzyme specificities in each group (Supplement Tables 1–3). These groups were then included in a six-fold jack-knife test, iteratively training the two HMMs, one with FAD-binding sequences and one with NAD(P)-binding sequences, using sequences from five of the groups and testing against the remaining group and a false data set. The false data sets were created by dividing the remaining sequences in the ASTRAL data set (4701 sequences) into six equally sized groups.

As the method is divided into two steps, true coenzyme binding proteins can be lost either during the database search or during the classification. Only two FAD-binding proteins are lost (false negatives): one is classified as NADP-binding and the other is classified as false, i.e. non-Rossmann fold. Among the NAD-binding proteins a total of 10 are false negatives: four are lost during the database search, five are classified as false, and one is classified as NADP-binding. The group with most failures is the NADP-binding proteins, with a total of 13 false negatives: eight are lost during database search, three are classified as false, and two are falsely predicted to be NAD-binding.

False positives, i.e. protein sequences falsely predicted to have certain coenzyme specificities, can be of two types: either they do not bind the coenzymes of interest or they do but the coenzyme preference is not correctly predicted. Initially, during the database search, 62 proteins were picked up which do not bind any of the coenzymes of interest. However, only three of them remain as false positives after the classification step: molybdenum cofactor biosynthesis protein (1jw9, MoeB), glycinamide ribonucleotide transformylase (1kjq, PurT), and a cell division protein (1ofu, FtsZ). Common to all three is a Rossmann-fold-like structure at the predicted coenzyme binding site. MoeB and PurT are ATP-binding proteins, but while the predicted coenzyme binding region in MoeB is in contact with ATP, in PurT it is the substrate (glycinamide ribonucleotide) which is in contact with the corresponding region. FtsZ is a GTPase and its coenzyme is in contact with the region falsely predicted to be NADP-bound. In addition to these three there are four Rossmann-fold proteins where the wrong coenzyme is predicted, rendering a total of seven false positives.

Table 5. Prediction sensitivity and specificity of the novel prediction method as judged against the ASTRAL database. TP = true positives, FP = false positives, FN = false negatives, TN = true negatives. The sensitivity was calculated as $\frac{TP}{TP+FN}$, the specificity as $1-\frac{FP}{FP+TN}$, and the Matthews correlation coefficient as $\frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$.
Coenzyme  TP  FP  FN  TN    Sensitivity  Specificity  Database size  Matthews correlation coefficient
FAD       26  0   2   4793  0.929        1.000        4821           0.96
NAD       38  4   10  4769  0.792        0.999        4821           0.84
NADP      31  3   13  4774  0.705        0.999        4821           0.80
Total     95  7   25  4694  0.792        0.999        4821           0.86

All in all, for 95 of 120 sequences the correct coenzyme specificity was predicted and only seven of 4701 sequences were false positives, yielding an overall prediction sensitivity of 79.2%, a specificity of 99.9% and a Matthews correlation coefficient of 0.86 (Table 5).

The method, using HMMs trained on all six groups, was applied on 68 genomes: all available among eukaryotes (30) and archaea (18), and a representative selection of 20 bacterial genomes. Genome sequences were downloaded from ENSEMBL (ftp://ftp.ensembl.org/pub/release-30/), NCBI (ftp://ftp.ncbi.nih.gov/genomes/) and TIGR (ftp://ftp.tigr.org/pub/data/).

TM regions were predicted using Phobius [19], a tool based on HMMs, with the ability to differentiate between signal sequences and true transmembrane sequences. The TM regions were subsequently scrutinized, and in those cases where they overlap with a predicted Rossmann-fold region (coenzyme binding site plus 65 residues), the transmembrane prediction was ignored.

References

1 Rossmann MG, Liljas A, Brändén C-I & Banaszak LJ (1975) In (Boyer, P D, eds), The Enzymes, Vol. 11, 3rd edn. pp. 61–102. Academic Press, New York.
2 Brenner SE, Chothia C, Hubbard TJP & Murzin AG (1996) Understanding protein structure: using scop for fold interpretation. Methods Enzymol 266, 635–643.
3 Schulz GE, Schirmer RH, Sachsenheimer W & Pai EF (1978) The structure of the flavoenzyme glutathione reductase. Nature 273, 120–124.
4 Adams MJ, Ellis GH, Gover S, Naylor CE & Phillips C (1994) Crystallographic study of coenzyme, coenzyme analogue and substrate binding in 6-phosphogluconate dehydrogenase: implications for NADP specificity and the enzyme mechanism. Structure 2, 651–668.
5 Wierenga RK, De Maeyer MCH & Hol WGJ (1985) Interaction of pyrophosphate moieties with α-helixes in dinucleotide binding proteins. Biochemistry 24, 1346–1357.
6 Wierenga RK, Terpstra P & Hol WGJ (1986) Prediction of the occurrence of the ADP-binding beta alpha beta-fold in proteins, using an amino acid sequence fingerprint. J Mol Biol 187, 101–107.
7 Carugo O & Argos P (1997) NADP-dependent enzymes. I: Conserved stereochemistry of cofactor binding. Proteins 28, 10–28.
8 Kallberg Y, Oppermann U, Jörnvall H & Persson B (2002) Short-chain dehydrogenases/reductases (SDRs). Eur J Biochem 269, 4409–4417.
9 Mulder NJ, Apweiler R, Attwood TK, et al. (2005) InterPro, progress and status in 2005. Nucleic Acids Res 33, D201–205.
10 Hubbard T, Andrews D, Caccamo M, et al. (2005) Ensembl 2005. Nucleic Acids Res 33, D447–453.
11 Odermatt A, Arnold P, Stauffer A, Frey BM & Frey FJ (1999) The N-terminal anchor sequences of 11beta-hydroxysteroid dehydrogenases determine their orientation in the endoplasmic reticulum membrane. J Biol Chem 274, 28762–28770.
12 Binda C, Hubalek F, Li M, Edmondson DE & Mattevi A (2004) Crystal structure of human monoamine oxidase B, a drug target enzyme monotopically inserted into the mitochondrial outer membrane. FEBS Lett 564, 225–228.
13 Jackson JB, Peake SJ & White SA (1999) Structure and mechanism of proton-translocating transhydrogenase. FEBS Lett 464, 1–8.
14 Liu J & Rost B (2001) Comparing function and structure between entire proteomes. Protein Sci 10, 1970–1979.
15 Krogh A, Larsson B, von Heijne G & Sonnhammer EL (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 305, 567–580.
16 Nilsson J, Persson B & von Heijne G (2005) Comparative analysis of amino acid distributions in integral membrane proteins from 107 genomes. Proteins 60, 606–616.
17 Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14, 755–763 (http://hmmer.wustl.edu).
18 Chandonia JM, Hon G, Walker NS, Lo Conte L, Koehl P, Levitt M & Brenner SE (2004) The ASTRAL Compendium in 2004. Nucleic Acids Res 32, 189–192.
19 Käll L, Krogh A & Sonnhammer EL (2004) A combined transmembrane topology and signal peptide prediction method. J Mol Biol 338, 1027–1036.

Supplementary material

The following supplementary material is available online:
Table S1. All enzymes used in the development of the prediction method.
Table S2. Alignment of NAD- and NADP-preferring enzymes used in the development of the prediction method.
Table S3. Alignment of FAD-preferring enzymes used in the development of the prediction method.
This material is available as part of the online article from http://www.blackwell-synergy.com
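As a worked check of the performance measures defined in the Table 5 caption, the sensitivity, specificity and Matthews correlation coefficient can be recomputed from the TP, FP, FN and TN counts. This is a minimal sketch that assumes the counts as transcribed from Table 5; the names `TABLE_5`, `sensitivity`, `specificity` and `matthews_cc` are hypothetical and not part of the published tool.

```python
import math

# (TP, FP, FN, TN) per coenzyme class, transcribed from Table 5.
TABLE_5 = {
    "FAD": (26, 0, 2, 4793),
    "NAD": (38, 4, 10, 4769),
    "NADP": (31, 3, 13, 4774),
    "Total": (95, 7, 25, 4694),
}


def sensitivity(tp, fp, fn, tn):
    """TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tp, fp, fn, tn):
    """1 - FP / (FP + TN)."""
    return 1 - fp / (fp + tn)


def matthews_cc(tp, fp, fn, tn):
    """(TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 by convention when a marginal count is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


if __name__ == "__main__":
    for name, counts in TABLE_5.items():
        print(
            name,
            round(sensitivity(*counts), 3),
            round(specificity(*counts), 3),
            round(matthews_cc(*counts), 2),
        )
```

Rounded to the precision used in Table 5, the recomputed values agree with the tabulated ones, for example 0.792, 0.999 and 0.86 for the Total row.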
