Multiple sequence alignment (MSA)


From pairwise to multiple
A T _ A T C A...
A _ C A T _ A...
A T _ G C G _...
A _ C G T _ A...
A T C A C _ A...
_ T C G A G A...
Relationship of sequences (Tree)

NODE: a node represents a taxonomic unit. This can be a taxon (an existing species) or an ancestor (an unknown species that represents the ancestor of two or more species).
BRANCH: defines the relationship between the taxa in terms of descent and ancestry.
TOPOLOGY: the branching pattern.
BRANCH LENGTH: often represents the number of changes that have occurred in that branch.
ROOT: the common ancestor of all taxa.
DISTANCE SCALE: a scale that represents the number of differences between sequences (e.g. 0.1 means a 10% difference between two sequences).
Source: https://users.ugent.be/~avierstr/principles/phylogeny.html

MSA is useful for bioinformatics
Phylogenetic tree
Motifs
Structure prediction (RNA, protein)
Gene logo
Conserved sequence elements

ClustalW procedure
The progressive method, e.g. ClustalW
Step 1. Pairwise alignments
Step 2. Build guide tree
Step 3. Progressive alignment guided by the tree

http://ai.stanford.edu/~chuongdo/papers/alignment_review.pdf

Step 1. Pairwise alignments
Pairwise sequence alignments
Scoring matrix (e.g. the BLOSUM62 matrix)
Gap penalties
Global/Local alignments
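The ingredients listed for Step 1 (a scoring scheme, a gap penalty, and a global alignment) can be illustrated with a minimal Needleman-Wunsch sketch. The function name nw_score and the match/mismatch/gap values are assumptions chosen for illustration; a real protein alignment would look up substitution scores in a matrix such as BLOSUM62, and ClustalW's actual gap model is more elaborate than the linear penalty used here.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score (Needleman-Wunsch) with a linear gap penalty.

    The default scores are illustrative assumptions, not ClustalW defaults.
    """
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


print(nw_score("ACGT", "ACGT"))
print(nw_score("ACGT", "AGT"))
```

Only the score is computed; recovering the alignment itself would require keeping the full table and tracing back from the last cell.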

Step 2. Build guide tree
Neighbor-Joining algorithm
UPGMA
(Figure: guide tree with leaves 1, 3, 2, 4)

Neighbor-Joining Algorithm
Step 1: Prepare three matrices: P, T, and Q.

P (pairwise distances):
        A    B    C    D
   A    0    8    4    6
   B    8    0    8    8
   C    4    8    0    6
   D    6    8    6    0

T (total distance, the row sums of P; e.g. T_A = 0 + 8 + 4 + 6 = 18):
   A   18
   B   24
   C   18
   D   20

Q, where Q_{i,j} = (n-2) * P_{i,j} - T_i - T_j:
        A    B    C    D
   A    0  -26  -28  -26
   B  -26    0  -26  -28
   C  -28  -26    0  -26
   D  -26  -28  -26    0

Example: Q_{A,D} = (4-2)*6 - 18 - 20 = -26
https://www.youtube.com/watch?v=agsudxq7gp8
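As a quick check of Step 1, the following sketch computes T and Q from P using the formula Q_{i,j} = (n-2) * P_{i,j} - T_i - T_j. The function name nj_totals_and_q is hypothetical; the matrix is the 4x4 example from the slide, with rows and columns in the order A, B, C, D.

```python
def nj_totals_and_q(P):
    """Return (T, Q) for a symmetric distance matrix P given as a list of lists."""
    n = len(P)
    # T_i is the sum of row i of P.
    T = [sum(row) for row in P]
    # Q_{i,j} = (n-2) * P_{i,j} - T_i - T_j; the diagonal is left at 0.
    Q = [[0 if i == j else (n - 2) * P[i][j] - T[i] - T[j] for j in range(n)]
         for i in range(n)]
    return T, Q


P = [[0, 8, 4, 6],
     [8, 0, 8, 8],
     [4, 8, 0, 6],
     [6, 8, 6, 0]]
T, Q = nj_totals_and_q(P)
print(T)
for row in Q:
    print(row)
```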

Neighbor-Joining Algorithm
Step 2: Find the minimum value in Q and choose one of the minimal pairs to merge.
With P, T, and Q from Step 1, the minimum of Q is -28, reached by two pairs: A,C and B,D. Here A,C is chosen.
https://www.youtube.com/watch?v=agsudxq7gp8

Neighbor-Joining Algorithm
Step 3: Merge A,C and generate the new P, T, and Q.

1) From the previous P and T, compute the branch length for A,C, with n = 4 (because P is a 4x4 matrix):
   P_{A,C}/2 + (T_A - T_C) / (2(n-2)) = 4/2 + (18-18)/(2(4-2)) = 2
2) Subtract 2 from every entry of the previous P that involves A or C; entries that are not affected stay the same. This gives the new P. (In general the new distance to a taxon k is (P_{A,k} + P_{C,k} - P_{A,C})/2, which equals P_{A,k} - 2 here because P_{A,k} = P_{C,k}.) Then compute the new T and Q from the new P.

New P:
         A,C    B    D
   A,C     0    6    4
   B       6    0    8
   D       4    8    0

New T:
   A,C   10
   B     14
   D     12

New Q (n = 3):
         A,C    B    D
   A,C     0  -18  -18
   B     -18    0  -18
   D     -18  -18    0

https://www.youtube.com/watch?v=agsudxq7gp8
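The merge in Step 3 can be sketched as follows. The function nj_merge and its argument layout are assumptions made for illustration. It uses the general Neighbor-Joining update (P_{i,k} + P_{j,k} - P_{i,j})/2 for the distance from the new node to each remaining taxon k, which for this example gives the same numbers as subtracting 2.

```python
def nj_merge(labels, P, i, j):
    """Merge taxa i and j; return (new_labels, new_P, (len_i, len_j)).

    Requires at least three taxa, since the branch-length formula divides by n - 2.
    """
    n = len(P)
    T = [sum(row) for row in P]
    # Branch lengths from i and j to the new internal node.
    len_i = P[i][j] / 2 + (T[i] - T[j]) / (2 * (n - 2))
    len_j = P[i][j] - len_i
    keep = [k for k in range(n) if k not in (i, j)]
    # Distance from the new node to every remaining taxon k.
    to_new = [(P[i][k] + P[j][k] - P[i][j]) / 2 for k in keep]
    new_labels = [labels[i] + "," + labels[j]] + [labels[k] for k in keep]
    # The merged node becomes row/column 0; the other distances are copied unchanged.
    new_P = [[0] + to_new]
    for pos, k in enumerate(keep):
        new_P.append([to_new[pos]] + [P[k][m] for m in keep])
    return new_labels, new_P, (len_i, len_j)


P = [[0, 8, 4, 6],
     [8, 0, 8, 8],
     [4, 8, 0, 6],
     [6, 8, 6, 0]]
new_labels, new_P, branch_lengths = nj_merge(["A", "B", "C", "D"], P, 0, 2)
print(new_labels, branch_lengths)
for row in new_P:
    print(row)
```

The new T and Q follow from new_P by applying the Step 1 formulas again with n = 3.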

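(Figure: partial tree after merging A and C. Branch length to A = 2; branch length to C = 4 - 2 = 2. B and D are not yet joined.)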

Repeat Step 2 and merge A,C with B. Exercise: compute the new P, T, and Q yourself.

UPGMA Algorithm

        A    B    C    D    E
   A    0    8    4    6    8
   B    8    0    8    8    4
   C    4    8    0    6    8
   D    6    8    6    0    8
   E    8    4    8    8    0

(Figure: leaves A, B, C, D, E, not yet joined)
https://www.youtube.com/watch?v=c2y9s_e2184

UPGMA Algorithm
Find the minimum value and choose one of the minimal pairs to merge.
Merge A,C (the distance is simply split in half: 4/2 = 2).

         A,C    B    D    E
   A,C     0    8    6    8
   B       8    0    8    4
   D       6    8    0    8
   E       8    4    8    0

(Figure: A and C joined, each with branch length 2; B, D, E not yet joined)

Choose the minimum again and merge B and E (the distance is simply split in half: 4/2 = 2).

         A,C  B,E    D
   A,C     0    8    6
   B,E     8    0    8
   D       6    8    0

(Figure: A and C joined, each with branch length 2; B and E joined, each with branch length 2; D not yet joined)
https://www.youtube.com/watch?v=c2y9s_e2184
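The UPGMA walk-through can be sketched in a few lines. The function upgma and its return format are assumptions for illustration; ties are broken by taking the first minimal pair in list order, which is one arbitrary choice among several. Cluster distances are updated with the size-weighted average that defines UPGMA.

```python
from itertools import combinations


def upgma(labels, dist):
    """Return the merges as a list of (cluster_a, cluster_b, height) tuples."""
    # Each cluster is a tuple of leaf labels; d maps an unordered cluster pair to a distance.
    clusters = [(name,) for name in labels]
    d = {}
    for i, j in combinations(range(len(labels)), 2):
        d[frozenset((clusters[i], clusters[j]))] = dist[i][j]
    merges = []
    while len(clusters) > 1:
        # Closest pair; on a tie the first pair in list order is taken.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2  # the distance is split in half
        merged = a + b
        rest = [c for c in clusters if c not in (a, b)]
        for c in rest:
            # Size-weighted average of the two old distances.
            d[frozenset((merged, c))] = (
                len(a) * d[frozenset((a, c))] + len(b) * d[frozenset((b, c))]
            ) / (len(a) + len(b))
        merges.append((a, b, height))
        clusters = [merged] + rest
    return merges


dist = [[0, 8, 4, 6, 8],
        [8, 0, 8, 8, 4],
        [4, 8, 0, 6, 8],
        [6, 8, 6, 0, 8],
        [8, 4, 8, 8, 0]]
for a, b, height in upgma(list("ABCDE"), dist):
    print(a, b, height)
```

On the example matrix the first two merges are A with C and B with E, each at height 2, matching the slides.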

UPGMA: sometimes two or more pairs tie for the minimum. Image source: https://www.youtube.com/watch?v=c2y9s_e2184

UPGMA: sometimes the branch lengths cannot be assigned perfectly. Image source: https://www.youtube.com/watch?v=c2y9s_e2184

Step 3. Progressive alignment
(Figure: pairwise alignments 1-2, 1-3, 1-4, 2-3, 2-4, 3-4, and a guide tree with leaves 1, 3, 2, 4)

Problems with progressive alignments
Dependence on the initial pairwise sequence alignments.
Errors from the initial alignments propagate into the final alignment.

Example
This and the following figures are from the T-Coffee paper: Notredame, Higgins, Heringa, JMB 2000, 302, 205-217

MUSCLE
Robert C. Edgar, Nucleic Acids Research, 2004, Vol. 32, No. 5, 1792-1797
There are three main stages:
Stage 1. Draft progressive
Stage 2. Improved progressive
Stage 3. Refinement


http://ai.stanford.edu/~chuongdo/papers/alignment_review.pdf

https://www.ebi.ac.uk/tools/msa/

MUSCLE https://www.ebi.ac.uk/tools/msa/muscle/

MAFFT https://www.ebi.ac.uk/tools/msa/mafft/

Newick format
Topology only: (B,(A,C,E),D);
With branch lengths: (B:2,(A:1,C:1,E:1),D:3);
More details: http://evolution.genetics.washington.edu/phylip/newicktree.html
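To make the notation concrete, here is a small sketch that serializes a tree to a Newick string. The nested (label_or_children, length) representation and the function name to_newick are assumptions for illustration, and only the basic subset of the format used on the slide (leaf names, nesting, optional branch lengths) is covered.

```python
def to_newick(node):
    """Serialize a (label_or_children, length) tree to a Newick string.

    A leaf is (name, length); an internal node is (list_of_children, length).
    Use None as the length to omit it.
    """
    def fmt(n):
        label, length = n
        if isinstance(label, str):
            text = label
        else:
            text = "(" + ",".join(fmt(child) for child in label) + ")"
        return text if length is None else f"{text}:{length}"
    return fmt(node) + ";"


with_lengths = ([("B", 2), ([("A", 1), ("C", 1), ("E", 1)], None), ("D", 3)], None)
topology_only = ([("B", None), ([("A", None), ("C", None), ("E", None)], None), ("D", None)], None)
print(to_newick(with_lengths))   # (B:2,(A:1,C:1,E:1),D:3);
print(to_newick(topology_only))  # (B,(A,C,E),D);
```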

MEGA https://www.megasoftware.net/

Homework
APOBEC ("apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like") is a family of evolutionarily conserved cytidine deaminases. A mechanism of generating protein diversity is mRNA editing. Members of this family are C-to-U editing enzymes. The N-terminal domain of APOBEC-like proteins is the catalytic domain, while the C-terminal domain is a pseudocatalytic domain. More specifically, the catalytic domain is a zinc-dependent cytidine deaminase domain and is essential for cytidine deamination. RNA editing by APOBEC-1 requires homodimerisation, and this complex interacts with RNA-binding proteins to form the editosome. In humans and other mammals they help protect from viral infections. These enzymes, when misregulated, are a major source of mutation in numerous cancer types. (from Wikipedia)
https://en.wikipedia.org/wiki/apobec
http://cgmmrc.cgu.edu.tw/files/14-1064-53824,r34-1.php?lang=zh-tw

Chen et al., APOBEC3A is an oral cancer prognostic biomarker in Taiwanese carriers of an APOBEC deletion polymorphism. Nature Communications 8:465, 2017

Please prepare the assignment in Microsoft Word or PDF format. Name the file with your student ID and name, e.g. u9934123_name. Deadline: upload to iLMS before 15:30 on 3/28.
(1) How many human APOBEC family members are deposited in NCBI?
(2) Perform MSA and tree visualization analysis using MAFFT, T-Coffee, and MUSCLE. (DNA)
(3) Are the MSA results the same? If not, why?