
Document: Indexing a Protein Structure Database


INDEXING A PROTEIN STRUCTURE DATABASE

Phan Mạnh Thường (1), Lâm Thị Hoà Bình (1), Đặng Như Toàn (1), Đoàn Thiện Minh (1), Trần Văn Lăng (2)

(1) Faculty of Information Technology, Lạc Hồng University, 10 Huỳnh Văn Nghệ, Biên Hòa, Đồng Nai
{thuong,binh,dangnhutoan,dtminh}@lhu.edu.vn
(2) Vietnam Academy of Science and Technology, […] Mạc Đĩnh Chi, District […], Hồ Chí Minh City
[email protected]

Abstract. Searching for tertiary-structure similarity among the proteins of a large protein structure database is a complex problem that demands considerable processing time. The number of known protein structures is growing rapidly, and indexing the proteins in a structure database makes structure search and comparison faster and more efficient. This paper presents a method for indexing a protein structure database: each structure is analysed to extract feature vectors, and a tree built on those feature vectors serves as the index. Once the database is indexed, searching for a protein structure, or for a substructure within a protein, becomes faster and more accurate.

Keywords: protein tertiary structure, protein database indexing.

1. Introduction

A protein is a polypeptide chain built from amino acids. Studying proteins is important because they take part in every biological process, including enzymatic catalysis (nearly all chemical reactions in living cells are catalysed by protein enzymes), the transport of substances such as oxygen and ions, and signalling.
To understand the relationship between protein structure and function, researchers need to retrieve structures from protein structure databases and classify them into protein families. Grouping proteins by structural similarity serves the following goals:
- Detecting evolutionary relationships.
- Identifying motifs (recurring segments), which are structures formed by the arrangement of amino acids in three-dimensional space.
- Detecting relationships between protein structure and function.
- Supporting drug design.
- Detecting sequences related to cancer and other diseases.

Thanks to technological innovation and the rapid development of structure determination methods such as X-ray crystallography and NMR spectroscopy, the three-dimensional structures of a large number of new protein molecules have been determined. These structures are stored in many databases on the internet and are freely available to researchers, including:
- The Protein Data Bank, PDB [1], maintained by the RCSB (Research Collaboratory for Structural Bioinformatics): 73153 structures.
- SCOP, Structural Classification of Proteins [2]: 38221 structures.
- CATH, Protein Structure Classification [3]: 104238 structures.
- ModBase, Database of Comparative Protein Structure Models (Sali Lab, UCSF): 41140 structures.

Searching an ever-growing protein structure database for structures similar to the tertiary structure of a protein, or to a substructure of any protein, is difficult and time-consuming. Biologists therefore need a tool that searches protein structure databases quickly and efficiently, in the way BLAST [5] searches sequence databases. Protein search and classification usually proceeds in two stages: extracting features that describe each protein, and measuring the similarity of those features in order to classify the proteins.

Many algorithms exist for extracting features from a protein structure. CTSS [6] approximates the Cα backbone of a protein by a smooth spline of minimal curvature, then stores the curvature, torsion angle and secondary structure of each Cα atom in a hash-based index. ProGreSS [5] is a newer method that extracts features from structure combined with sequence through a sliding window over the protein backbone. Its structural features are similar to those of CTSS (curvature, torsion angle and secondary-structure information), and its sequence features are computed with scoring matrices such as PAM or BLOSUM. Like those of CTSS, the features extracted by ProGreSS are not local features.

PSIST [7] is one of the more effective algorithms because of its relatively high accuracy. Its approach is to convert the local structural information of a protein into a "sequence" and to build a suffix tree over the set of such "sequences" to support search. Compared with extracting local features from a single amino acid, the sliding-window extraction used by PSIST is better because each feature vector captures both translational and rotational information. After the feature vectors are normalized, the protein structure is converted into a string of discretized symbols (called the structure-feature sequence).

However, search on a suffix tree is still not fast enough. PSISA [8] extracts feature vectors in the same way as PSIST but indexes them with a suffix array instead of a suffix tree in order to speed up search. The experimental results reported for PSISA indicate that suffix-array indexing speeds up search but also increases memory use, by a factor of more than […] compared with the suffix tree used in PSIST.
This paper presents a method for indexing a protein structure database. It inherits the feature vector extraction of PSIST and, from the resulting set of feature vectors, builds an index tree by merging the branches of the feature vector sequences. This tree both limits memory use and allows search over the whole space of structures belonging to different protein families, which makes searching for a protein structure, or for a substructure within a protein, faster and more accurate.

The rest of the paper is organized as follows. Section 2 presents the indexing method: how feature vectors are extracted and normalized, and how the index tree is built. Section 3 reports experiments on protein structure data sources and queries over them. The final section gives an assessment and conclusions.

2. Indexing protein structure data

a) Feature vector extraction

Each protein is an ordered chain of amino acids (residues) joined by peptide bonds. Each residue contains a Cα atom together with N and C atoms, among others. The bond lengths, bond angles and torsion angles completely determine the conformation and geometry of the protein. A bond length is the distance between two bonded atoms, measured in ångströms (Å), and a bond angle is the angle between two covalent bonds of the same atom. For example, the bond length between the N and C atoms is 1.33 Å, and the bond angle between Cα-N and N-C is 122°.
Figure 1. Bond lengths and bond angles between atoms.

Torsion angles describe how a structure can rotate about its bonds. Suppose four atoms are connected through three bonds $B_{i-1}$, $B_i$ and $B_{i+1}$. The torsion angle of bond $B_i$ is defined as the smallest angle between the projections of $B_{i-1}$ and $B_{i+1}$ onto the plane perpendicular to $B_i$.

Figure 2. The torsion angles φ, ψ and ω between atoms.

To capture local features more accurately, they must be extracted from a set of neighbouring residues. To build a local feature vector, we first describe each residue individually and then determine the relationship between a pair of residues and among a set of residues. For every residue, the Cα-N bond length is 1.46 Å, the Cα-C bond length is 1.51 Å, and the angle between Cα-N and Cα-C is 116°. All triangles formed by the N-Cα-C atoms of the residues are therefore congruent, and each residue can be represented by one triangle.

The distance d between a pair of residues is the Euclidean distance between their Cα atoms, computed with formula (1), where $(x_i, y_i, z_i)$ are the coordinates of the Cα atom of residue $p_i$:

$d(p_i, p_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}$   (1)

The angle θ between a pair of residues is the angle between the two planes defined by the three atoms N-Cα-C of each residue.

Figure 3. Distance and angle between two residues.

Distance and angle are invariant under translation and rotation of the protein. The Euclidean distance between two Cα atoms is computed directly from their three-dimensional coordinates. The angle between the two planes defined by the N-Cα-C triples is computed from the angle between the pair of normal vectors anchored at the Cα atom of each plane. The normal vector is given by formula (2):

$\vec{n} = \overrightarrow{C_\alpha N} \times \overrightarrow{C_\alpha C}$   (2)

The angle between two normal vectors $\vec{n}_1$ and $\vec{n}_2$ is given by formula (3):

$\cos\theta = \dfrac{\vec{n}_1 \cdot \vec{n}_2}{\lVert \vec{n}_1 \rVert \, \lVert \vec{n}_2 \rVert}$   (3)

To describe local features over a set of residues, the authors slide a window of size w along the Cα backbone of the protein. The distances and angles between the first residue of the window and each remaining residue in the window are computed and appended to the feature vector. Each window position yields one feature vector.
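The extraction step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' C++ implementation: the residue representation (a dict holding the `N`, `CA` and `C` coordinates) and all function names are assumptions made for this sketch, and the toy chain is synthetic.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def distance(res_i, res_j):
    """Euclidean distance between the C-alpha atoms of two residues."""
    return norm(sub(res_i["CA"], res_j["CA"]))

def plane_normal(res):
    """Normal of the N-CA-C plane, taken as (N - CA) x (C - CA)."""
    return cross(sub(res["N"], res["CA"]), sub(res["C"], res["CA"]))

def cos_theta(res_i, res_j):
    """Cosine of the angle between the plane normals of two residues."""
    n1, n2 = plane_normal(res_i), plane_normal(res_j)
    return dot(n1, n2) / (norm(n1) * norm(n2))

def feature_vectors(residues, w):
    """One vector of 2*(w-1) values per window position (n-w+1 vectors)."""
    vectors = []
    for i in range(len(residues) - w + 1):
        v = []
        for j in range(i + 1, i + w):
            v.append(distance(residues[i], residues[j]))
            v.append(cos_theta(residues[i], residues[j]))
        vectors.append(v)
    return vectors

# Synthetic chain (hypothetical data): identical triangles spaced 3.8 apart on x.
def toy_residue(x):
    return {"N": (x - 1.0, 1.0, 0.0), "CA": (x, 0.0, 0.0), "C": (x + 1.0, 1.0, 0.0)}

chain = [toy_residue(3.8 * k) for k in range(5)]
fv = feature_vectors(chain, w=3)
```

For this toy chain all residue planes are parallel, so every cosine is 1 and the distances are multiples of the spacing. Real input would come from the N, CA and C atom coordinates of each residue in a structure file.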
Let the set $P = \{p_1, p_2, \ldots, p_n\}$ represent a protein, where $p_i$ is the i-th residue of the protein backbone. The feature vector set of the protein is defined as $P^v = \{p^v_1, p^v_2, \ldots, p^v_{n-w+1}\}$, where w is the width of the sliding window and each $p^v_i$ is the feature vector

$p^v_i = (d(p_i, p_{i+1}), \cos\theta(p_i, p_{i+1}), \ldots, d(p_i, p_{i+w-1}), \cos\theta(p_i, p_{i+w-1}))$

where $d(p_i, p_j)$ is the distance between residues i and j, and $\cos\theta(p_i, p_j)$ is given by the angle between the two residues. For a window of size w, each feature vector $p^v_i$ has dimension 2(w-1).

b) Feature vector normalization

Because the feature vectors contain distances and angles measured in different units, they must be normalized. Normalization also restricts the range of values of the vector components. The angle θ lies in [0, π], so cos θ ∈ [-1, 1]. To normalize a distance we need an upper bound on the distance between residue i and residue (i+w-1) of the protein.

All distances and angles are normalized to integers in the range [0, b-1], where b is a given parameter.

Each distance d in a feature vector is normalized with formula (4):

$\hat{d} = \left\lfloor \dfrac{d \cdot b}{4.025 \, (w - 1)} \right\rfloor$   (4)

In formula (4) the constant 4.025 is the average distance between two Cα atoms and w is the width of the sliding window.

Each angle in a feature vector is normalized with formula (5):

$\widehat{\cos\theta} = \left\lfloor \dfrac{(\cos\theta + 1) \cdot b}{2} \right\rfloor$   (5)

After normalization, a protein structure is represented by a "sequence" of discrete values, one per feature vector, in which the i-th vector describes the features of the i-th residue of the protein backbone.

c) Building the index tree

To index a set of protein structures, the paper proposes building a multi-branch tree with the algorithm shown in Figure 4.

The algorithm first reads the structure data of each protein in the database and extracts features as described above, so that the three-dimensional structure of each protein is turned into a "sequence" of feature vectors corresponding to its backbone. After the feature vectors are normalized, each protein structure "sequence" is inserted into the index tree to support lookup.

Figure 4. Algorithm for building the index tree from the structural features of proteins.
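A minimal Python sketch of formulas (4) and (5) and of the tree insertion step is given below. It rests on two assumptions that the paper does not state: values that reach the upper bound are clamped to b-1 so that the result stays in [0, b-1], and the index tree merges common prefixes in the manner of a trie. The names `Node`, `insert`, `normalize_vector` and `CA_STEP` are hypothetical, and the letter sequences are those of the six-protein example that follows, where each letter stands for one normalized feature vector.

```python
import math

CA_STEP = 4.025  # constant used in formula (4)

def clamp(q, b):
    # Assumption of this sketch: keep the result inside [0, b-1].
    return min(max(q, 0), b - 1)

def normalize_distance(d, w, b):
    """Formula (4): floor(d * b / (4.025 * (w - 1)))."""
    return clamp(math.floor(d * b / (CA_STEP * (w - 1))), b)

def normalize_cos(c, b):
    """Formula (5): floor((cos(theta) + 1) * b / 2)."""
    return clamp(math.floor((c + 1.0) * b / 2.0), b)

def normalize_vector(v, w, b):
    """v alternates distance, cosine; the returned tuple acts as one symbol."""
    return tuple(normalize_distance(x, w, b) if k % 2 == 0 else normalize_cos(x, b)
                 for k, x in enumerate(v))

class Node:
    def __init__(self):
        self.children = {}     # symbol -> Node
        self.proteins = set()  # ids of proteins whose sequence passes through here

def insert(root, protein_id, symbols):
    """Insert one protein 'sequence', merging it with existing branches."""
    node = root
    for s in symbols:
        node = node.children.setdefault(s, Node())
        node.proteins.add(protein_id)

# One normalized symbol for a window of size w = 3 with b = 10.
symbol = normalize_vector([3.8, 1.0, 7.6, 1.0], w=3, b=10)

# The six example sequences, written as strings of symbols.
example = {"P1": "abdfah", "P2": "badbd", "P3": "abcbdsf",
           "P4": "cababc", "P5": "cabccb", "P6": "acbad"}
root = Node()
for pid, seq in example.items():
    insert(root, pid, seq)
```

In this sketch P1, P3 and P6 share the root branch `a`, and P4 and P5 share the path `c`, `a`, `b`, so the memory cost grows with the number of distinct prefixes rather than with the total sequence length.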
Example: build the index tree from a set of six sequenced protein structures. Here each protein sequence is represented by a set of characters, and each character corresponds to one normalized feature vector.

P1={a,b,d,f,a,h}; P2={b,a,d,b,d}; P3={a,b,c,b,d,s,f}; P4={c,a,b,a,b,c}; P5={c,a,b,c,c,b}; P6={a,c,b,a,d};

The resulting tree is shown in Figure 5.

Figure 5. Index tree based on the structural features of the proteins.

d) Querying the index tree

Given a query Q, the feature vectors of the structure Q are first extracted and converted into a "sequence" as described in Sections 2a and 2b. The lookup then proceeds in three stages: search, ranking and optimal selection. The search stage lists the structures in the database that match Q within a distance threshold ε between vectors. The second stage ranks all proteins that contain a matching string. The last stage uses the Smith-Waterman algorithm [9] to find the best local structural similarity between the query Q and the selected proteins.

The algorithm that searches for a query pattern Q in the index tree is as follows.

Input: protein structure fragment Q; minimum matching threshold ε
Output: the set of protein structures that satisfy the search condition, sorted by decreasing number of matched residues

Function Search (tree Root, level i, query string Q, threshold ε) {
  While (i ≤ tree height - length of Q) {
    - Merge branches at level i
    - For each node at level i
        o If (node N[j] matches Q[0])
            For each child branch of N[j]: if its match against the rest of Q satisfies threshold ε then:
              - Add the branch to the result set
              - Remove the branch from the tree
            Return Search (Root, i+1, Q[0], ε);
        o Otherwise
            Return Search (N[j], i+1, Q[i+1], ε);
  } end while
} end function

Function Query (tree Root, query pattern Q, top k patterns to select, threshold ε) {
  - Initialize an empty result set
  - Extract features and build the structure sequence for query Q
  - Build the index tree
  - Search (Root, i=0, Q, ε);
  - Sort the result set by decreasing number of matches m;
  - Select the k best patterns in the result set and apply the Smith-Waterman algorithm to find the best local structural alignment
} end function

Example: search for the query pattern Q={b,c,d,b} in the index tree built from a set of sequenced protein structures, with threshold ε=3.
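The search stage can be sketched in Python under a simplified reading, which is an assumption of this sketch and not necessarily the authors' implementation: at each level i the query is compared position by position with every remaining sequence starting at offset i, a sequence is reported when at least ε symbols match, and reported sequences are removed before deeper levels are examined. For brevity the sketch scans a flat dictionary rather than a merged tree and omits the Smith-Waterman stage. The names `count_matches`, `search` and `query_top_k` are hypothetical. The data are those of the worked example that continues below.

```python
def count_matches(seq, query, offset):
    """Number of positions where query equals seq, starting at offset."""
    window = seq[offset:offset + len(query)]
    return sum(1 for s, q in zip(window, query) if s == q)

def search(proteins, query, eps):
    """Return (protein id, level, matches) for every sequence reaching eps."""
    remaining = dict(proteins)
    height = max((len(s) for s in remaining.values()), default=0)
    hits = []
    for level in range(height - len(query) + 1):
        found = [(pid, level, count_matches(seq, query, level))
                 for pid, seq in remaining.items()]
        found = [h for h in found if h[2] >= eps]
        for pid, _, _ in found:
            del remaining[pid]  # prune matched sequences before the next level
        hits.extend(found)
    return hits

def query_top_k(proteins, query, eps, k):
    """Rank hits by decreasing number of matches and keep the best k."""
    hits = search(proteins, query, eps)
    hits.sort(key=lambda h: (-h[2], h[0]))
    return hits[:k]

proteins = {"P1": "abdfah", "P2": "badbc", "P3": "abcdbsf",
            "P4": "cbcabc", "P5": "cbccdb", "P6": "acbad"}
hits = search(proteins, "bcdb", 3)
best = query_top_k(proteins, "bcdb", 3, 2)
```

Under this reading the sketch reports P2 at level 0, P3 and P4 at level 1, and P5 at level 2, which agrees with the per-level result sets of the worked example, and it gives P3 four matches.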
The set consists of P1={a,b,d,f,a,h}; P2={b,a,d,b,c}; P3={a,b,c,d,b,s,f}; P4={c,b,c,a,b,c}; P5={c,b,c,c,d,b}; P6={a,c,b,a,d}.
- Query at the root level (level 0) → result set = {P2 (number of matches m=[…])}
- Query at level 1 → result set = {P4 (number of matches m=[…]), P3 (m=4)}
- Query at level 2 → result set = {P5 (number of matches m=[…])}

3. Experimental results

a) Sources of protein structure data

Protein tertiary structures are stored mainly in the Protein Data Bank (PDB) [1], the primary archive of experimentally determined protein tertiary structures. The PDB was created in 1971 at Brookhaven National Laboratory (BNL) in the United States. Most of its structures were determined by crystallography. The PDB archive currently holds more than 73153 protein structures, and more than […] new entries are deposited each year.

The SCOP database [2] is maintained at the Laboratory of Molecular Biology of the Medical Research Council (MRC) in Cambridge, England, and describes the structural and evolutionary relationships between known protein structures. SCOP is accepted as the most suitable and most reliable classification data set because its classification decisions are based on visual inspection of the structural elements of proteins by experts. Proteins are classified hierarchically to reflect their structural and evolutionary relationships. The main levels of the hierarchy are family (based on the evolutionary relationships of the proteins), superfamily (based on some shared structural characteristics) and fold (based on secondary-structure elements).

The CATH database [3] is maintained at UCL, London, and currently holds 104238 structures. It classifies proteins with automatic methods, supplemented by expert curation when the automatic methods do not give reliable results. CATH is built by applying the structure comparison tool SSAP. SSAP uses a two-level dynamic programming technique to align two proteins and find their optimal structural alignment.
The FSSP database [4] was created with the DALI classification method and is maintained at the European Bioinformatics Institute (EBI). It provides a sophisticated classification of protein structures. The similarity between two proteins is determined from their secondary structure. Evaluating every pair of proteins is time-consuming, so comparing one macromolecule with all macromolecules in the databases can take a whole day. A representative protein is therefore chosen for each class, and each new protein only needs to be aligned with the representative of each class.

b) Storage organization

Protein tertiary structures are usually stored in formats such as MMDB, "Molecular Modeling DataBank" (a standard format that describes peptide bond information), mmCIF, "Chemical Interchange Format" (a relational-database style format), and PDB, "Protein Data Bank" (column-oriented text with many integrated information fields).

Of these formats, PDB is the most widely used. A PDB file stores the coordinates of the atoms in three-dimensional Euclidean space, together with information about the authors, references and the experimental results of the structure determination.

Figure 6. Part of the structure of a PDB file.

The authors collected published structures from these sources in PDB format and stored them in a relational database to make indexing and lookup convenient. The proposed relational database model is shown in Figure 7.

Figure 7. Relational database model for storing protein structure information.

c) Some experimental results

Some experimental results follow. The data set D used for the experiments was extracted from the SCOP database [2] and contains proteins from all four classes: α, β, α+β and α/β. The data set contains […] proteins from each of […] superfamilies, out of the […] superfamilies in SCOP, giving […] proteins in total. Query patterns are drawn at random from D. The experiments use five parameters: w is the window width, b is the normalization value, ε is the minimum distance threshold between two vectors, l is the minimum length that the longest matching string must reach, and k is the number of proteins taken from the top of the ranking by score. The algorithm was implemented in C++ and tested on Windows on a machine with a dual-core 1.6 GHz CPU and 2 GB of RAM.
The number of proteins shown in the plots is the average number of proteins found in the same superfamily over the experiments.

Figure 8. Number of proteins found in the same superfamily as a function of the cutoff k (w=3, b=10, ε=3 and l=10).

Figure 9. Number of proteins found in the same superfamily as a function of the window size w (b=10, ε=3 and l=15).

Figure 10. Number of proteins found in the same superfamily as a function of the distance threshold ε (w=3, b=10 and l=15).

Figure 11. Number of proteins found in the same superfamily as a function of the normalization value b (w=3, ε=2.5 and l=15).

d) Assessment and remarks

Figure 8 shows that the number of proteins found in the same superfamily averages about […] for cutoffs from […] to […], which indicates search effectiveness close to that of PSIST. Figure 9 shows that the algorithm is stable for window sizes from about 3 to […]. Beyond this range effectiveness drops noticeably because of errors introduced during feature extraction and vector normalization. This can be improved by increasing the normalization value (as shown in Figure 11), but doing so increases both the processing time and the storage needed for the feature vectors.

The results show that the performance of the algorithm is close to that of PSIST and somewhat better than that of ProGreSS. In terms of storage, PSIST needs more space for its suffix tree on large data sets and its search operation is more complex, but it is more accurate than the algorithm proposed here.

The proposed algorithm has the following strengths:
- The index tree is built once and adjusted many times during search. The complexity of searching for a string Q of length l in an index tree of height h is O(k*(h-l)*b), where k is the average number of branches with the same value at level i and b is the number of branches at the root.
- Merging branches while adjusting the tree allows several structures that satisfy the query to be found at once. A branch that has been found is removed from the tree to reduce the search space at higher levels.
- The algorithm allows search over the whole space of structure data.

4. Conclusion

This paper presented an approach to indexing a database of protein tertiary structures, based on extracting protein features as in the PSIST algorithm, and proposed a search algorithm on the index tree. The paper also reviewed the sources of protein tertiary structure data and proposed a database model for storing these structures in support of indexing and lookup. The experimental data were extracted from […] SCOP superfamilies, and the results show relatively high accuracy and efficiency when the proposed algorithms are applied to the test data.

References

[1] H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, and P.E. Bourne, "The Protein Data Bank," Nucleic Acids Research, vol. 28, 2000, pp. 235-242.
[2] A.G. Murzin, S.E. Brenner, T. Hubbard, and C. Chothia, "SCOP: A Structural Classification of Proteins Database for the Investigation of Sequences and Structures," J. Mol. Biol., vol. 247, 1995, pp. 536-540.
[3] C.A. Orengo, A.D. Michie, D.T. Jones, M.B. Swindells, and J.M. Thornton, "CATH - A Hierarchic Classification of Protein Domain Structures," Structure, vol. 5, no. 8, 1997, pp. 1093-1108.
[4] L. Holm and C. Sander, "The FSSP Database: Fold Classification Based on Structure-Structure Alignment of Proteins," Nucleic Acids Research, vol. 24, 1996, pp. 206-210.
[5] A. Bhattacharya, T. Can, T. Kahveci, A.K. Singh, and Y.F. Wang, "ProGreSS: Simultaneous searching of protein databases by sequence and structure," Pacific Symposium on Biocomputing, 2004, pp. 264-275.
[6] T. Can and Y. Wang, "CTSS: a robust and efficient method for protein structure alignment based on local geometrical and biological features," IEEE Computer Society Bioinformatics Conference (CSB), 2003, pp. 169-179.
[7] F. Gao and M.J. Zaki, "PSIST: Indexing Protein Structures using Suffix Trees," IEEE Computational Systems Bioinformatics Conference, Palo Alto, CA, August 2005.
[8] A. Salah, T.F. Gharib, and A.-B.M. Salem, "PSISA: an Algorithm for Indexing and Searching Protein Structure using Suffix Arrays," WSEAS International Conference on Computers, 2008, pp. 775-780.
[9] T.F. Smith and M.S. Waterman, "Identification of common molecular subsequences," J. Mol. Biol., vol. 147, 1981, pp. 195-197.