Department of Life Science, 3rd year, second semester
Molecular Cell Biology I
Comparative Analysis of Homologous Amino Acid Sequences
Hiroyuki Toh (藤 博幸)
Department of Biomedical Chemistry, School of Science and Technology, Kwansei Gakuin University

Outline
1. Introduction
2. Topology inversion of membrane proteins
3. Chemokine receptors

Evolutionary fate and functional consequence
[Figure: Gene A (Function A) is duplicated into Gene A and Gene A'. Three fates of the duplicate are shown.]
・Non-functionalization (pseudogenization): Gene A + non-processed pseudogene A; Function A
・Neofunctionalization: Gene A + Gene B; Functions A + B
・Subfunctionalization: Gene A' + Gene A''; Functions A' + A'' = A
Which amino acid sites are related to the functional divergence?

Classical approach to identify the critical sites for functional divergence
- Evolution of Prostaglandin D2 Synthase -
Nagata, A., Suzuki, Y., Igarashi, M., Eguchi, N., Toh, H., Urade, Y., Hayaishi, O. Proc. Natl. Acad. Sci. USA 88, 4020-4024 (1991).
Igarashi, M., Nagata, A., Toh, H., Urade, Y., Hayaishi, O. Proc. Natl. Acad. Sci. USA 89, 5376-5380 (1992).

PGD synthase
[Figure: PGD synthase converts PGH2 to PGD2.]
PGD synthase: about 190 a.a.
Amino acid sequence database → database searching → lipocalins
[Figure: mouse PGD synthase shown alongside human neutrophil gelatinase-associated lipocalin.]

Lipocalin family
A diverse family of secretory proteins involved in binding and transport of small hydrophobic molecules.
[Figure: a lipocalin carries small hydrophobic molecules from a secretory tissue to a target cell.]
・PGD synthases: enzyme; vertebrates
・Lipocalins: transporter (= non-enzyme); from bacteria to eukaryotes
Which sites are involved in acquisition of the catalytic activity?
PGD synthase is inactivated by treatment with an SH-modifier.
Cys residues may be involved in the catalytic reaction of the enzyme.
[Figure: positions of Cys (C) residues marked along the aligned sequences.]

Site-directed mutagenesis (Cys → Ala, Ser)
・Cys residues forming a disulfide bond (S-S): the mutants showed activity comparable to that of their parent enzyme.
・Cys residue with a free SH group: the mutants lost the enzyme activity.

A more systematic and automatic method to detect the critical sites for functional divergence
・Deep insight into the evolution of protein function
・Clue for design and/or alteration of protein function

[Figure: amino acid sequence alignment consisting of groups with different functions (Substrate A, Substrate B), examined in terms of conservation, amino acid composition, and evolutionary rate.]

Criteria: conservation, evolutionary rate, amino acid composition
Methods: Evolutionary Trace, Quantitative Evolutionary Trace, Hierarchical Conservation Analysis, Diverge, Cumulative Relative Entropy, Relative Entropy among Paralogs, Level Entropy, Branch Length

2. Topology inversion of membrane proteins

Aquaporin
Fu et al. Science 290, 481-486 (2000).
Murata et al. Nature 407, 599-605 (2000).
ClC chloride ion channel
Dutzler et al. Nature 415, 287-294 (2002).
[Figure: N-terminal domain (segments a, b, c) and C-terminal domain (segments a, b, c) arranged in opposite orientations in the membrane.]
・The N-terminal domain is homologous to the C-terminal domain.
・The two domains are arranged in opposite orientations to each other.

Membrane proteins have a bias in amino acid composition at the membrane boundary.
ex) positive-inside rule
The amino acid composition of the cytoplasmic regions differs from that of the extracellular regions.
[Figure: extracellular region, membrane, and cytoplasmic region, with positive charges (+) on the cytoplasmic side.]
von Heijne, Nature 341, 456-458 (1989)
Nakashima & Nishikawa, FEBS Lett. 303, 141-146 (1992)
Nakashima & Nishikawa, J. Mol. Biol. 238, 54-61 (1994)

・The N-terminal domain is homologous to the C-terminal domain.
・The two domains are arranged in opposite orientations to each other.
The two domains are expected to have evolved under different constraints.
Different constraints are considered to have been working on the interfaces facing the extracellular and cytoplasmic environments.
[Figure: extracellular region, membrane, pore surface, and cytoplasmic region.]

3. Methods
(1) Multiple alignment
(a) Collection of homologous amino acid sequences by database searching
(b) Cleavage of the obtained sequences into the N- and the C-terminal domains
(c) Multiple alignment of each domain
(d) Profile alignment between the alignments of the N- and the C-terminal domains

Evaluation of the difference between two domains at each alignment site
[Figure: the amino acid residue frequencies at alignment site i (Site 1 ... Site L) are tabulated separately for each of two groups of aligned sequences. The slide labels the groups as chemokine receptors (Proteins 1 to M; e.g. Ala 0.05, Arg 0.03, ..., Trp 0.01) and a cluster of decoy or viral receptors (Proteins 1 to N; e.g. Ala 0.02, Arg 0.05, ..., Trp 0.03).]

Estimation of amino acid composition at each alignment site
・Taxonomic bias: Henikoff & Henikoff weight
・Unobserved residues: pseudocounts adopted in PSI-BLAST
※ This is the same method used for the calculation of the PSSM in PSI-BLAST (β = 0.1).
※ The BLAST parameter λ_u was obtained by the Newton-Raphson method at each calculation.
※ CRE uses a Dirichlet mixture as a prior instead of pseudocounts.
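The taxonomic-bias correction named above can be illustrated with a small sketch of position-based sequence weights in the style of Henikoff & Henikoff. This is a minimal illustration, not the lecture's own code: the function names `henikoff_weights` and `weighted_site_frequencies`, the toy alignment, and the choice to treat a gap as just another symbol are all assumptions made here, and the PSI-BLAST pseudocount step is not reproduced.

```python
from collections import Counter

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # residue order used in the slides' tables


def henikoff_weights(alignment):
    """Position-based sequence weights, normalized to sum to 1.

    In each column, a sequence receives 1 / (k * n), where k is the number
    of distinct symbols in the column and n is how many sequences share this
    sequence's symbol. Gaps are treated as one more symbol (an assumption).
    """
    weights = [0.0] * len(alignment)
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        counts = Counter(column)
        k = len(counts)
        for idx, symbol in enumerate(column):
            weights[idx] += 1.0 / (k * counts[symbol])
    total = sum(weights)
    return [w / total for w in weights]


def weighted_site_frequencies(alignment, weights, col):
    """Weighted residue frequencies at one alignment site (gaps ignored)."""
    freq = {aa: 0.0 for aa in AMINO_ACIDS}
    for seq, w in zip(alignment, weights):
        if seq[col] in freq:
            freq[seq[col]] += w
    total = sum(freq.values())
    if total == 0.0:
        return freq  # column contains only gaps
    return {aa: f / total for aa, f in freq.items()}


# Hypothetical toy alignment: two near-identical sequences and one outlier.
toy_alignment = ["AAG", "AAG", "ACW"]
toy_weights = henikoff_weights(toy_alignment)
toy_freq = weighted_site_frequencies(toy_alignment, toy_weights, 1)
print(toy_weights)
print(toy_freq["A"], toy_freq["C"])
```

In the toy alignment the divergent third sequence receives a larger weight than either of the two identical sequences, which is the intended effect when closely related sequences are over-represented.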
The Kullback-Leibler information between the two groups is calculated at each alignment site.

The difference between two probability distributions can be quantitatively evaluated with the Kullback-Leibler information (KLI).

(1) Definition of KLI
For two amino acid distributions p and q at a site, with $p(1)+p(2)+\dots+p(20)=1.0$ and $q(1)+q(2)+\dots+q(20)=1.0$:
$$\sum_{i=1}^{20} p(i)\,\log\frac{p(i)}{q(i)}$$

(2) Asymmetry of KLI
$$\sum_{i=1}^{20} p(i)\,\log\frac{p(i)}{q(i)} \;\neq\; \sum_{i=1}^{20} q(i)\,\log\frac{q(i)}{p(i)}$$

(3) Modified KLI used in this study
$$\sum_{i=1}^{20} p(i)\,\log\frac{p(i)}{q(i)} \;+\; \sum_{i=1}^{20} q(i)\,\log\frac{q(i)}{p(i)}$$

Sites with the top 5% of KLI values are selected.

Different constraints → difference in amino acid composition or conservation pattern
If an alignment site shows a large difference between the N- and the C-terminal domains, the site is considered to have been subject to different constraints in the two domains.
(1) How should the difference between the two domains be evaluated?
(2) How large a difference is considered significant?

Two problems in estimating the residue frequency at each alignment site
・Taxonomic bias: Henikoff & Henikoff weight
・Unobserved residues: pseudocounts adopted in PSI-BLAST
※ The BLAST parameter λ is obtained by the Newton-Raphson method at each calculation.
※ As the background composition of amino acid residues, both the composition obtained from database analysis and the one obtained from the multiple alignment under consideration were examined. However, no significant difference was observed.
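The modified (symmetric) KLI and the top-5% selection described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: the names `symmetric_kli` and `top_fraction` are invented here, natural logarithms are assumed because the slides do not state the base, and a simple additive pseudocount stands in for the PSI-BLAST pseudocount scheme so that zero frequencies do not make the logarithm diverge. The two example vectors are the N (V81) and C (R197) aquaporin compositions tabulated in the Results, in the order A R N D C Q E G H I L K M F P S T W Y V.

```python
import math


def symmetric_kli(p, q, pseudocount=1e-3):
    """Sum of KLI(p, q) and KLI(q, p) for two frequency vectors.

    The additive pseudocount is an assumption of this sketch; it only keeps
    every term finite when a residue is unobserved in one group.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    ps = [x + pseudocount for x in p]
    qs = [x + pseudocount for x in q]
    p_total, q_total = sum(ps), sum(qs)
    ps = [x / p_total for x in ps]
    qs = [x / q_total for x in qs]
    return sum(
        pi * math.log(pi / qi) + qi * math.log(qi / pi)
        for pi, qi in zip(ps, qs)
    )


def top_fraction(scores, fraction=0.05):
    """Indices of the sites whose scores fall in the top `fraction`."""
    n_keep = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Compositions copied from the aquaporin table (N (V81) and C (R197)).
n_v81 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0.02, 0.03, 0, 0.05, 0.05, 0, 0, 0, 0, 0, 0.86]
c_r197 = [0, 0.95] + [0] * 17 + [0.05]

kli_pair = symmetric_kli(n_v81, c_r197)
kli_same = symmetric_kli(n_v81, n_v81)
print(kli_pair, kli_same)

# Hypothetical per-site scores: only index 7 stands out among 20 sites.
example_scores = [0.1] * 20
example_scores[7] = 2.5
print(top_fraction(example_scores))
```

Because the sum of the two directed terms is symmetric in p and q, the order in which the N- and C-terminal compositions are passed does not matter.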
・Number of sequences used for the analysis
aquaporin family: 50 sequences (the alignment consists of 100 sequences)
ClC chloride channel family: 50 sequences (the alignment consists of 100 sequences)
・The distribution of the KL information follows the Γ distribution.
[Figure: histograms of observed and expected frequencies of the KL information for the aquaporin family and for the ClC chloride channel family.]
aquaporin family: χ2 = 4.176, χ2(6, 0.01) = 16.819
ClC chloride channel family: χ2 = 5.625, χ2(7, 0.01) = 18.473

4. Results & Discussion

The residues of 1J4N and the amino acid compositions corresponding to the sites selected from the alignment of aquaporins

Site       A    R    N    D    C    Q    E    G    H    I    L    K    M    F    P    S    T    W    Y    V
N (S28)   0.07 0.00 0.00 0.00 0.00 0.00 0.00 0.64 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.10 0.19 0.00 0.00 0.00
C (V155)  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.39 0.14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.47
N (F58)   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.14 0.00 0.00 0.00 0.00 0.53 0.00 0.00 0.00 0.28 0.00 0.00
C (I174)  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.65 0.00 0.00 0.00 0.07 0.06 0.00 0.00 0.00 0.00 0.22
N (H76)   0.05 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.90 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
C (G192)  0.20 0.00 0.00 0.00 0.05 0.00 0.00 0.21 0.00 0.00 0.00 0.00 0.00 0.00 0.17 0.37 0.00 0.00 0.00 0.00
N (V81)   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.03 0.00 0.05 0.05 0.00 0.00 0.00 0.00 0.00 0.86
C (R197)  0.00 0.95 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05
N (L85)   0.07 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.49 0.00 0.14 0.19 0.00 0.00 0.00 0.00 0.05 0.04
C (S201)  0.14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.78 0.05 0.00 0.00 0.00 0.03
N (I97)   0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.25 0.22 0.00 0.00 0.06 0.23 0.00 0.00 0.00 0.00 0.22
C (W212)  0.00 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.06 0.00 0.00 0.00 0.04 0.00 0.00 0.83 0.00 0.00
N (Q103)  0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
C (P218)  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.97 0.00 0.03 0.00 0.00 0.00
N (L114)  0.11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.45 0.00 0.03 0.03 0.00 0.00 0.05 0.00 0.00 0.31
C (Y229)  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.15 0.03 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.04 0.73 0.00

The residues of 1KPL and the amino acid compositions corresponding to the sites selected from the alignment of ClC chloride ion channels

Site       A    R    N    D    C    Q    E    G    H    I    L    K    M    F    P    S    T    W    Y    V
N (R147)  0.03 0.38 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.52 0.00 0.00 0.04 0.00 0.00 0.00 0.00 0.00
C (I356)  0.06 0.00 0.00 0.00 0.00 0.03 0.04 0.02 0.00 0.27 0.23 0.00 0.03 0.00 0.00 0.00 0.00 0.00 0.06 0.27
N (E148)  0.00 0.03 0.00 0.00 0.00 0.00 0.91 0.00 0.00 0.00 0.00 0.00 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.02
C (F357)  0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.00 0.00 0.83 0.00 0.00 0.00 0.00 0.05 0.07
N (Q153)  0.00 0.00 0.00 0.00 0.00 0.36 0.04 0.00 0.55 0.00 0.02 0.00 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00
C (A362)  0.24 0.00 0.05 0.00 0.00 0.00 0.00 0.02 0.00 0.03 0.14 0.00 0.03 0.13 0.00 0.07 0.00 0.00 0.06 0.23
N (R174)  0.07 0.60 0.00 0.00 0.00 0.03 0.00 0.07 0.00 0.00 0.00 0.15 0.00 0.00 0.00 0.02 0.02 0.00 0.04 0.00
C (A386)  0.13 0.00 0.00 0.00 0.00 0.07 0.05 0.00 0.00 0.08 0.00 0.00 0.00 0.00 0.53 0.06 0.00 0.00 0.00 0.08
N (H175)  0.05 0.59 0.03 0.00 0.00 0.00 0.00 0.02 0.10 0.00 0.00 0.05 0.04 0.04 0.00 0.02 0.04 0.00 0.03 0.00
C (G387)  0.10 0.00 0.00 0.00 0.00 0.00 0.02 0.73 0.00 0.00 0.00 0.00 0.00 0.00 0.06 0.03 0.07 0.00 0.00 0.00
N (G185)  0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
C (L397)  0.00 0.00 0.00 0.00 0.09 0.07 0.00 0.00 0.00 0.00 0.17 0.00 0.05 0.29 0.00 0.02 0.09 0.00 0.13 0.08
N (F190)  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
C (V402)  0.11 0.00 0.00 0.00 0.03 0.00 0.00 0.07 0.00 0.02 0.00 0.00 0.13 0.00 0.00 0.06 0.30 0.00 0.00 0.28
N (gap)   0.03 0.07 0.09 0.00 0.06 0.04 0.00 0.24 0.05 0.03 0.02 0.00 0.00 0.09 0.00 0.02 0.00 0.00 0.05 0.21
C (gap)   0.00 0.00 0.00 0.68 0.00 0.00 0.26 0.00 0.06 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

(For the calculation of amino acid composition, the Henikoff & Henikoff weight is used but no pseudocount is introduced.)

Clustering of the residues corresponding to the selected alignment sites is statistically significant.

Aquaporin (1J4N): Out of 249 residues, 50 residues constitute the pore surface. Out of the 16 residues corresponding to the selected alignment sites, 11 residues are present on the pore surface.
Pore-surface residues: M21, F24, I25, S28, I29, A32, L33, F35, H36, Q43, F58, I62, A75, H76, L77, N78, A80, V81, L85, S88, Q90, T111, L114, T118, L121, N124, S125, G127, N129, T148, L151, V152, V155, L156, 159, T160, R161, R162, I174, V178, H182, G190, C191, G192, I193, N194, R197, S201, V226, R236
$$\sum_{i=11}^{16} \frac{16!}{i!\,(16-i)!}\left(\frac{50}{249}\right)^{i}\left(\frac{249-50}{249}\right)^{16-i} = 0.00003393$$

ClC chloride channel (1KPL): Out of 451 residues, 36 residues constitute the pore surface. Out of the 14 residues corresponding to the selected alignment sites, 5 residues are present on the pore surface.
Pore-surface residues: V51, E54, S107, I109, P110, G146, R147, E148, G149, P150, T151, V152, A188, A189, F190, F229, N233, G234, A236, I238, N270, V273, L274, Q277, D278, F317, F348, G355, I356, F357, A358, P359, M360, L444, Y445, I448
$$\sum_{i=5}^{14} \frac{14!}{i!\,(14-i)!}\left(\frac{36}{451}\right)^{i}\left(\frac{451-36}{451}\right)^{14-i} = 0.00351064$$

Different constraints are considered to have been working on the interfaces facing the extracellular and cytoplasmic environments.
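The two clustering probabilities quoted above are upper tails of a binomial distribution, and they can be re-evaluated with a few lines of code. The sketch below uses only the counts given on the slides (249 residues with 50 on the pore surface and 11 of 16 selected sites for aquaporin; 451 residues with 36 on the pore surface and 5 of 14 selected sites for ClC). The function name `binomial_upper_tail` is an assumption introduced for this illustration.

```python
from math import comb


def binomial_upper_tail(n, k_min, p):
    """P(X >= k_min) for X ~ Binomial(n, p).

    n:     number of selected residues
    k_min: number of selected residues observed on the pore surface
    p:     fraction of all residues that lie on the pore surface
    """
    return sum(
        comb(n, i) * p ** i * (1.0 - p) ** (n - i)
        for i in range(k_min, n + 1)
    )


# Counts taken from the slides.
aquaporin_p = binomial_upper_tail(16, 11, 50 / 249)
clc_p = binomial_upper_tail(14, 5, 36 / 451)

print(f"aquaporin (1J4N): {aquaporin_p:.8f}")
print(f"ClC (1KPL):       {clc_p:.8f}")
```

The printed values should agree closely with the figures on the slides. Note that this model treats each selected site as an independent draw with the same pore-surface probability, which is the assumption implicit in the formula itself.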
[Figure: extracellular region, membrane, pore surface, and cytoplasmic region.]

Positive-inside rule
[Figure: extracellular environment and cytosolic environment, with positive charges (+) on the cytosolic side.]
The efficiency of detecting the residues related to the positive-inside rule was not good, so the method was modified to increase the efficiency.

Reorganization of amino acid composition
At each alignment site (site 1 ... site i ... site L), the 20-residue composition of each domain is reduced to two classes, Lys+Arg and the remaining residues.
N-terminal domain (Proteins 1 to N): Ala 0.01, Arg 0.05, ..., Trp 0.03 → Lys+Arg 0.012, remaining 0.988
C-terminal domain (Proteins 1 to N): Ala 0.02, Arg 0.05, ..., Trp 0.03 → Lys+Arg 0.134, remaining 0.866
The KLI between the two domains is calculated at each alignment site.

Distributions of KLI
[Figure: histograms of KLI for the aquaporin family and for the ClC chloride ion channel family.]
aquaporin family: χ2 = 28.110, χ2(2, 0.01) = 9.21
ClC chloride ion channel family: χ2 = 36.717, χ2(3, 0.01) = 11.34
・The distributions do not follow the Γ distribution.
・We used the kernel method to estimate the distribution non-parametrically. A quartic kernel is used for the estimation.

Selected sites
[Table: Arg (R) and Lys (K) frequencies of the N- and C-terminal domains at each selected site. Only the site pairs and their locations are listed here.]
aquaporin
N (R12) c / C (Q139) e
N (F35) e / C (R162) c
N (V81) p / C (R197) p
N (L86) c / C (S202) e
N (C89) c / C (T205) e
N (R95) c / C (D210) e
ClC chloride ion channel
N (L77) / C (gap) e
N (P80) e / C (G285) c
N (A90) ? / C (G295) ?
N (E113) c / C (P321) e
N (R126) c / C (F335) e
N (R147) p / C (I356) p
N (R174) c / C (A386) e
N (R175) c / C (G387) e
N (L212) c / C (P424) e
N (K216) c / C (T428) e
N (G234) e / C (gap)
N: N-terminal domain, C: C-terminal domain
c: cytosolic region, e: extracellular region, p: pore surface

What we have learned from the study
Different glasses are required to see different constraints.
But how do we select the proper glasses? Ordinarily, we do not have any prior knowledge about the constraints.
Is it possible to make flexible glasses for any constraints?

3. Chemokine receptors

GPCRs
• Membrane proteins
• Bind neurotransmitters (physiologically active peptides, amines, nucleic acids, etc.).
• Ligand binding to GPCRs causes conformational changes.
• This leads to several signal transductions coupled to trimeric G-proteins.

GPCRs
• About 1000 genes in the human genome
• Target for ~45% of clinically marketed drugs
• Divided into 5 classes based on sequence similarity (Class A-E, the other)
• Atomically resolved structure in class A GPCR: bovine rhodopsin

                          Decoy receptor   Chemokine receptor   Viral receptor
Ligand-binding ability          ○                  ○                  △
Signaling ability               ×                  ○                  ○

Can differences in functional constraints be detected by sequence comparison?
Toward elucidating the pathway from ligand binding to signaling.

Evaluation of the difference between two groups at each alignment site
[Figure: the amino acid residue frequencies at alignment site i (Site 1 ... Site L) are tabulated for the chemokine receptors (Proteins 1 to M; e.g. Ala 0.05, Arg 0.03, ..., Trp 0.01) and for a group of decoy or viral receptors (Proteins 1 to N; e.g. Ala 0.02, Arg 0.05, ..., Trp 0.03).]
The Kullback-Leibler information between the two groups is calculated at each alignment site.
The difference between two probability distributions can be quantitatively evaluated with the Kullback-Leibler information (KLI).

(1) Definition of KLI
With $p(1)+p(2)+\dots+p(20)=1.0$ and $q(1)+q(2)+\dots+q(20)=1.0$:
$$\sum_{i=1}^{20} p(i)\,\log\frac{p(i)}{q(i)}$$

(2) The KLI representing the deviation of p from q is different from that of q from p.
$$\sum_{i=1}^{20} p(i)\,\log\frac{p(i)}{q(i)} \;\neq\; \sum_{i=1}^{20} q(i)\,\log\frac{q(i)}{p(i)}$$

(3) A modified KLI is used in this study.
$$\sum_{i=1}^{20} p(i)\,\log\frac{p(i)}{q(i)} \;+\; \sum_{i=1}^{20} q(i)\,\log\frac{q(i)}{p(i)}$$

[Figure: results for decoy receptors and viral receptors.]

Conclusion
By examining the functional differences between duplicated genes, we can obtain more functional information than by examining motifs alone.

Analysis of biological function based on molecular evolution
Protein Informatics, Molecular Evolutionary Genetics, Bioinformatics, Information Science, Developmental Biology, Genomics, Molecular Biology, Statistics, Structural Biology, Physics

Collaborators
1. Topology inversion of membrane proteins
Hisako Ichihara (市原寿子), Kazusa DNA Research Institute
Hiromi Daiyasu (大安裕美), Osaka University
2. Chemokine receptors
Hiromi Daiyasu (大安裕美), Osaka University
Wataru Nemoto (根本 航), Tokyo Denki University