Mutual Information & Genotype-Phenotype Association. Norman MacDonald January 31, 2011 CSCI 4181/6802
2 2 Overview What is information (specifically Shannon Information)? What are information entropy and mutual information? How are they used? In-depth example: Genotype-Phenotype Association
3 Which message has more information? 3
4 4 What is information? There are many definitions of (different types of) information. Here, we are talking about Shannon Information. Shannon Information is not knowledge.
5 5 Aside: A little history on Claude Shannon. Research Fellow at Princeton's Institute for Advanced Study, Claude Shannon worked as a wartime cryptanalyst at Bell Labs. His work led to his influential "A mathematical theory of communication," published in 1948. Some say he had already produced the most famous master's thesis of the century at MIT, laying the groundwork of electronic communication with Boolean algebra.
6 6 What is information? Information is often defined in terms of communication. It depends only on the probability of a message. The more improbable a message, the more information it contains.
7 What is information? We can measure information in bits. Less probable events have more information. Can intuitively be thought of as surprisal *. *Coined by Myron Tribus in Thermostatics and Thermodynamics (1961). 7
8 Drawings by David Mosher from M Mitchell s Complexity: A Guided Tour (2009) 8
9 9 Intuitively, either outcome of a fair coin flip has 1 bit of information. e.g. Let Heads=1, Tails=0 Each outcome equally probable, thus, each outcome equally informative.
10 The result of each possible roll of a fair six-sided die has 2.58 bits of information. 2.58 bits?? Huh? Practically we need 3 bits, but theoretically only 2.58 bits (log2 6) are needed; 3 bits can represent up to 8 states. 10
11 11 How do we measure information? The surprisal of an outcome is I(ω_n) = -log2 P(ω_n), where ω_n is a given outcome and P(·) is the probability mass function for ω. With a log base of two, the units are bits. The more unlikely an event, the more information is received when it occurs. Definite events (P=1.0) have 0 bits of information.
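The surprisal I(ω) = -log2 P(ω) can be checked numerically. A minimal Python sketch (the function name `surprisal` is mine):

```python
import math

def surprisal(p):
    """Shannon information (surprisal) of an outcome with probability p, in bits."""
    return -math.log2(p)

print(surprisal(0.5))            # fair coin outcome: 1.0 bit
print(round(surprisal(1/6), 2))  # fair die outcome: 2.58 bits
print(surprisal(1.0))            # definite event: no information
```

The coin and die values match the slides: each coin outcome carries exactly 1 bit, each die outcome log2(6) ≈ 2.58 bits.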
12 12 Another example: Winning the lottery. Let M be a language with two messages: W: "Yay! I won!" and L: "Boo! I lost!" Let P(M=W) = 10^-7 and P(M=L) = 1 - 10^-7. Then W has 23.3 bits of information, while L has only 1.44 x 10^-7 bits.
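The slide's numbers imply P(W) ≈ 10^-7, since -log2(10^-7) ≈ 23.3. A quick check:

```python
import math

p_win = 1e-7
bits_win = -math.log2(p_win)       # "Yay! I won!"  -> about 23.3 bits
bits_lose = -math.log2(1 - p_win)  # "Boo! I lost!" -> about 1.44e-7 bits
print(bits_win, bits_lose)
```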
13 13 Information Entropy Now that we can measure the information of actual messages received, we can think about overall information content of a random variable. A useful measure of this is the Expected Value of the Information of a random variable, otherwise known as the Information Entropy.
14 14 Information Entropy. The entropy is the expected surprisal: H(X) = E[I(X)] = -Σ P(x) log2 P(x), where E is the expected value function and H is the information entropy.
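Entropy as expected surprisal, H(X) = -Σ P(x) log2 P(x), sketched in Python (the helper name `entropy` is mine):

```python
import math

def entropy(probs):
    """H = -sum p*log2(p) in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))        # fair coin: 1.0 bit
print(entropy([1/6] * 6))         # fair die: ~2.58 bits
print(entropy([1e-7, 1 - 1e-7]))  # lottery: near zero
```

Note the contrast from the slides: the lottery's win message has huge surprisal, yet the variable's entropy is near zero because that message almost never occurs.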
15 15 Information Entropy. [Plot: entropy H(X) of a two-state variable as a function of P(X).] For a coin flip (p=0.5) the entropy (average surprisal) is 1 bit; the lottery example (p=1.0 x 10^-7) has near-zero entropy.
16 16 More examples with entropy A flip of a fair coin: Initially: Low prior information, thus high uncertainty A roll of a six-sided die: Initially: Low prior information, thus high uncertainty A lottery ticket: Initially: High prior information, thus low uncertainty Note that this has to do with the uncertainty. Uncertainty deals with the future. The actual information contained in a message depends upon the probability of the actual event that occurred!
17 17 Conditional Entropy H(X|Y): What uncertainty is left in X if we know Y? E.g. X: {grass wet, grass dry}, Y: {rainy, sunny}. In this case, very little uncertainty remains.
18 18 Conditional Entropy. If the entropy in a system is H(Y,X), and we remove the entropy of X, then we have H(Y|X) = H(Y,X) - H(X). Note: H(Y|X) = H(Y) iff X and Y are independent (knowing one gives no information about the other).
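The chain-rule identity H(Y|X) = H(X,Y) - H(X) can be demonstrated on the weather/grass example; the joint probabilities below are invented for illustration:

```python
import math

def H(probs):
    """Entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Invented joint P(weather, grass): rain and wet grass usually co-occur.
joint = {('rainy', 'wet'): 0.45, ('rainy', 'dry'): 0.05,
         ('sunny', 'wet'): 0.05, ('sunny', 'dry'): 0.45}

h_joint = H(joint.values())                  # H(X,Y)
h_weather = H([0.5, 0.5])                    # marginal H(X)
h_grass_given_weather = h_joint - h_weather  # H(Y|X) ~ 0.47 bits
print(h_grass_given_weather)                 # much less than H(Y) = 1 bit
```

Knowing the weather leaves under half a bit of uncertainty about the grass, versus a full bit without it.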
19 19 So far We now have a sense of: The information (surprisal) of a specific state. The expected information over all states, known as the entropy. What about the information shared between two random variables?
20 20 Mutual Information Given two random variables, we can formally define the level of relationship between them by the average mutual information. A couple of extremes: Zero mutual information: The variables are independent. Mutual information ~= Information: The variables are potentially redundant. Can be thought of as agreement
21 21 Mutual Information. Formally: I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y). Equivalently, I(X;Y) = Σ_x Σ_y p(x,y) log2 [ p(x,y) / ( p(x) p(y) ) ].
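A direct implementation of I(X;Y) = H(X) + H(Y) - H(X,Y), covering the two extremes from the previous slide (a sketch; the function names are mine):

```python
import math

def H(probs):
    """Entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """joint: dict {(x, y): probability}. Returns I(X;Y) in bits."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return H(px.values()) + H(py.values()) - H(joint.values())

independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
redundant = {(0, 0): 0.5, (1, 1): 0.5}
print(mutual_information(independent))  # 0.0: the variables are independent
print(mutual_information(redundant))    # 1.0: X fully determines Y
```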
22 22 Mutual Information Important point: Mutual information is ignorant of the message itself. Each value contributes to the information. e.g. the absence and presence of a feature equally contribute to the information. Agreement Reminder: Information is dependent only on the probability of an outcome, not on any meaning attributed to the outcome.
23 Entropy Relationships 23
24 Application areas: lossless data compression (e.g. Huffman encoding); theoretical channel capacity; corpus linguistics (word collocation); RNA secondary structure prediction (covarying sites); feature selection (relevance and redundancy; microarray expression); measuring cluster quality; genotype-phenotype association. 24
25 Genotype-Phenotype Association 25
26 26 The problem: Gene A, Gene B -> Trait? [Image: Champagne hydrothermal vent.]
27 27 We can create two random variables: X ∈ {1, 0}, the presence or absence of a gene; Y ∈ {1, 0}, the presence or absence of a trait. With this encoding, we can measure the agreement between X and Y to determine if they may be related.
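In practice X and Y are observed across a set of organisms, so I(X;Y) is estimated from counts. A hedged sketch (the eight-organism presence/absence vectors are invented):

```python
import math
from collections import Counter

def H(probs):
    """Entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mi_from_samples(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired 0/1 observations."""
    n = len(xs)
    pxy = [c / n for c in Counter(zip(xs, ys)).values()]
    px = [c / n for c in Counter(xs).values()]
    py = [c / n for c in Counter(ys).values()]
    return H(px) + H(py) - H(pxy)

# Invented presence/absence of one gene (X) and one trait (Y) in 8 organisms.
gene = [1, 1, 1, 1, 0, 0, 0, 0]
trait = [1, 1, 1, 0, 0, 0, 0, 1]
print(mi_from_samples(gene, trait))  # moderate agreement: ~0.19 bits
print(mi_from_samples(gene, gene))   # perfect agreement: 1.0 bit
```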
28 Genotype Phenotype 28
29 Genotype Phenotype 29
30 NETCAR. Tamura and D'haeseleer, 2008, Bioinformatics. 30
31 31 So we need examples of organisms with and without genes and traits to analyze. We can get our examples from complete genomes available for download online.
32 32 However, some of these microbes will be distantly related, carrying genes with similar function that are not identical. We need to group genes based on orthology.
33 33 Clusters of Orthologous Groups Homologous genes: Set of genes that share a last common ancestor. Orthologous genes: Homologous genes that are separated by a speciation event. COGs used here are from NCBI and the STRING databases.
34 34 Once we have our genomes, COGs, and traits, we can build phylogenetic profiles (Pellegrini et al. 1999). [Table: one column per organism (α, β, γ); rows Gene A, Gene B, Gene C, and Trait Y, recording presence or absence.] We can analyze patterns of presence and absence.
35 35 Associative rule models. Gene A and Gene B and Gene C -> Trait. If we were to exhaustively search all possible interactions of size three in a 26,290 gene set, we would have a search space of size 3.03 x 10^12. Association rule mining allows us to prune this search space.
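The search-space figure is just a binomial coefficient; a one-liner confirms the order of magnitude:

```python
from math import comb

# Number of distinct 3-gene combinations among 26,290 orthologous clusters.
n_triples = comb(26290, 3)
print(f"{n_triples:.2e}")  # 3.03e+12
```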
36 36 Associative rule models (Agrawal et al. 1993). A classical example is a set of grocery store sales transactions.
37 NETCAR (Association rule mining algorithm). 1. Find parent features strongly associated with the phenotype. [Figure: network of orthologous gene clusters linked to Thermophily.] Tamura and D'haeseleer, Bioinformatics, 2008, 24(13).
38 NETCAR (Association rule mining algorithm). 2. Find all child features within x steps of a parent in terms of mutual information. [Figure: network of orthologous gene clusters linked to Thermophily.] Tamura and D'haeseleer, Bioinformatics, 2008, 24(13).
39 NETCAR (Association rule mining algorithm). 3. Generate candidate rules with at least one parent. Candidate rules for Thermophily: [A E], [F E G], [F E], [F G C], [F G], [F G K], [F], [F C K], [A]. Tamura and D'haeseleer, Bioinformatics, 2008, 24(13).
40 NETCAR (Association rule mining algorithm). 4. Save rules with high mutual information with the phenotype. Thermophily rules: [A E], [F E G], [F E], [F G C], [F G], [F G K], [F], [F C K], [A]. Tamura and D'haeseleer, Bioinformatics, 2008, 24(13).
41 Classification based on Predictive Association Rules (CPAR). Example search: F, Q (none above gain threshold); F; F, Z (none above gain threshold); A (none above gain threshold). Rules discovered: 1. F, Q -> POSITIVE; 2. F, Z -> POSITIVE; 3. A -> POSITIVE. Covered samples get their weight reduced before the next iteration. Yin and Han, Proceedings of the Third SIAM International Conference on Data Mining (SDM03), 2003.
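A grossly simplified sequential-covering sketch in the spirit of CPAR's weight decay. This is my illustration, not the Yin & Han algorithm: real CPAR grows multi-literal rules by Foil gain, while this toy picks one feature per round and then down-weights the covered positives.

```python
def covering_rules(samples, labels, decay=0.3, rounds=3):
    """samples: list of feature sets; labels: 1 for positive examples.
    Greedily emits one single-feature rule per round, then reduces the
    weight of the positives that rule covers (CPAR-style decay)."""
    weights = [1.0 if y == 1 else 0.0 for y in labels]
    rules = []
    for _ in range(rounds):
        # Score each unused feature by the total weight of positives it covers.
        scores = {}
        for feats, w in zip(samples, weights):
            for f in feats:
                if f not in rules:
                    scores[f] = scores.get(f, 0.0) + w
        if not scores or max(scores.values()) <= 0:
            break
        best = max(scores, key=scores.get)
        rules.append(best)
        # Covered samples get their weight reduced before the next iteration.
        weights = [w * decay if best in feats else w
                   for feats, w in zip(samples, weights)]
    return rules

samples = [{'F', 'Q'}, {'F', 'Z'}, {'A'}, {'B'}]
labels = [1, 1, 1, 0]
print(covering_rules(samples, labels, rounds=2))  # ['F', 'A']
```

F is chosen first because it covers two positives; after those are down-weighted, A covers the most remaining weight.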
42 42 Data: 427 organisms (STRING 8); unique orthologous gene cluster patterns; 10 phenotypes (focus on thermophily; JGI IMG); taxonomy (NCBI).
43 43 CPAR versus NETCAR. [Table: accuracy and runtime (s) for each method.]
44 Dependent Samples 44
45 45 Dependence among samples. Both A and B have a strong association with the phenotype (measured with mutual information). Gene A's association can be explained by shared ancestry; Gene B's cannot, and should be highlighted.
46 46 Dependent Samples. 29 of the 40 correctly classified thermophiles are homogeneous to taxonomic rank order. [Figure: Phenotype, Gene A, and Gene B profiles; light: non-thermophiles; dark: thermophiles and hyperthermophiles.]
47 Accounting for shared ancestry with conditional mutual information 47
48 48 Confoundment. H: Shannon entropy; I: mutual information. Conditioning the gene-trait mutual information on shared ancestry Z gives the conditional mutual information I(X;Y|Z) = H(X|Z) - H(X|Y,Z).
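Conditional mutual information can be computed via the identity I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z). A sketch with invented data in which a gene tracks the trait only through shared ancestry (function and variable names are mine):

```python
import math
from collections import Counter

def H(counts, n):
    """Entropy in bits from raw counts over n samples."""
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def cmi(xs, ys, zs):
    """Plug-in estimate of I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)."""
    n = len(xs)
    return (H(Counter(zip(xs, zs)).values(), n)
            + H(Counter(zip(ys, zs)).values(), n)
            - H(Counter(zs).values(), n)
            - H(Counter(zip(xs, ys, zs)).values(), n))

# Invented 8 organisms: gene and trait are both fixed within each taxon,
# so all of their apparent agreement is explained by ancestry.
trait = [1, 1, 1, 1, 0, 0, 0, 0]
gene = [1, 1, 1, 1, 0, 0, 0, 0]
taxon = [0, 0, 0, 0, 1, 1, 1, 1]

print(cmi(gene, trait, [0] * 8))  # plain MI (constant Z): 1.0 bit
print(cmi(gene, trait, taxon))    # conditioned on ancestry: 0.0 bits
```

With a constant Z the expression reduces to the ordinary mutual information, so the two printed values show the confounding directly: a full bit of apparent association vanishes once ancestry is conditioned on.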
49 49 Results: CWMI vs. MI. There is no difference in accuracy, but there is a difference in the genes that are selected.
50 Thermophily: top genes by MI vs. top genes by CWMI. [Table of top-ranked clusters.] X: A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Makarova et al., Nucleic Acids Research, 2002, 30(2).
51 51 Misclassifications. Some organisms are classified correctly with one score and not the other. For example, over ten replicates of 5-fold cross-validation on thermophily:
52 52 Misclassifications (10 replicates). [Table: counts by organism under CPAR, MI, and CWMI.] Organisms: Streptococcus_thermophilus_LMG_, Streptococcus_thermophilus_CNRZ, Carboxydothermus_hydrogenoformans_Z, Geobacillus_kaustophilus_HTA, Synechococcus_sp._JA-3-3Ab, Methanocaldococcus_jannaschii_DSM_, Acidothermus_cellulolyticus_11B, Deinococcus_geothermalis_DSM_, Clostridium_thermocellum_ATCC_, Chlorobium_tepidum_TLS.
53 53 Thermophilic streptococci Rules applying to Thermophilic streptococci
54 54 Discussion: CPAR vs MI. CPAR uses an approximation of the conditional probability P(Trait|Gene): when we see gene G, what is the probability of trait P? Mutual information is a measure of agreement: how well does the presence & absence of G match the presence & absence of P?
55 55 Discussion CPAR mines rules 100x faster than NETCAR, and those rules are better predictors. Shared ancestry confounds gene to trait association problems. Some of the rules weighted with CMI are already known to biologically influence the target traits. We may be subtracting predictive features in favor of those that defy ancestry.
56 56 References
1. Tamura and D'haeseleer. Microbial genotype-phenotype mapping by class association rule mining. Bioinformatics, 24(13), 2008.
2. Steuer, Kurths et al. The mutual information: detecting and evaluating dependencies between variables. Bioinformatics, 18(2):S231-S.
3. Yin X and Han J. CPAR: Classification based on predictive association rules. In Proceedings of the Third SIAM International Conference on Data Mining (SDM03), San Francisco, CA, 2003.
4. G. Kastenmuller, M. Schenk, J. Gasteiger, and H.-W. Mewes. Uncovering metabolic pathways relevant to phenotypic traits of microbial genomes. Genome Biol, 10(3):R28.
5. Cover and Thomas. Elements of information theory. Wiley, New Jersey.
More informationA different perspective. Genes, bioinformatics and dynamics. Metaphysics of science. The gene. Peter R Wills
Genes, bioinformatics and dynamics A different perspective Peter R Wills Department of Physics University of Auckland Supported by the Alexander von Humboldt Foundation Metaphysics of science The Greeks
More informationConditional Probability
Conditional Probability When we obtain additional information about a probability experiment, we want to use the additional information to reassess the probabilities of events given the new information.
More informationGrundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson
Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)
More informationIntroduction to Data Science Data Mining for Business Analytics
Introduction to Data Science Data Mining for Business Analytics BRIAN D ALESSANDRO VP DATA SCIENCE, DSTILLERY ADJUNCT PROFESSOR, NYU FALL 2014 Fine Print: these slides are, and always will be a work in
More informationSUPPLEMENTARY INFORMATION
Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)
More information4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information
4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information Ramji Venkataramanan Signal Processing and Communications Lab Department of Engineering ramji.v@eng.cam.ac.uk
More informationMicrobial Taxonomy. Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible.
Microbial Taxonomy Traditional taxonomy or the classification through identification and nomenclature of microbes, both "prokaryote" and eukaryote, has been in a mess we were stuck with it for traditional
More informationA brief review of basics of probabilities
brief review of basics of probabilities Milos Hauskrecht milos@pitt.edu 5329 Sennott Square robability theory Studies and describes random processes and their outcomes Random processes may result in multiple
More informationProperties of Probability
Econ 325 Notes on Probability 1 By Hiro Kasahara Properties of Probability In statistics, we consider random experiments, experiments for which the outcome is random, i.e., cannot be predicted with certainty.
More information3F1 Information Theory, Lecture 1
3F1 Information Theory, Lecture 1 Jossy Sayir Department of Engineering Michaelmas 2013, 22 November 2013 Organisation History Entropy Mutual Information 2 / 18 Course Organisation 4 lectures Course material:
More informationOutline. Computer Science 418. Number of Keys in the Sum. More on Perfect Secrecy, One-Time Pad, Entropy. Mike Jacobson. Week 3
Outline Computer Science 48 More on Perfect Secrecy, One-Time Pad, Mike Jacobson Department of Computer Science University of Calgary Week 3 2 3 Mike Jacobson (University of Calgary) Computer Science 48
More informationDiscovering Correlation in Data. Vinh Nguyen Research Fellow in Data Science Computing and Information Systems DMD 7.
Discovering Correlation in Data Vinh Nguyen (vinh.nguyen@unimelb.edu.au) Research Fellow in Data Science Computing and Information Systems DMD 7.14 Discovering Correlation Why is correlation important?
More informationEE376A: Homework #2 Solutions Due by 11:59pm Thursday, February 1st, 2018
Please submit the solutions on Gradescope. Some definitions that may be useful: EE376A: Homework #2 Solutions Due by 11:59pm Thursday, February 1st, 2018 Definition 1: A sequence of random variables X
More informationIntroduction to Information Theory. Part 2
Introduction to Information Theory Part 2 1 A General Communication System CHANNEL Information Source Transmitter Channel Receiver Destination 2 Information: Definition Information is quantified using
More informationComputational Structural Bioinformatics
Computational Structural Bioinformatics ECS129 Instructor: Patrice Koehl http://koehllab.genomecenter.ucdavis.edu/teaching/ecs129 koehl@cs.ucdavis.edu Learning curve Math / CS Biology/ Chemistry Pre-requisite
More informationSome Concepts in Probability and Information Theory
PHYS 476Q: An Introduction to Entanglement Theory (Spring 2018) Eric Chitambar Some Concepts in Probability and Information Theory We begin this course with a condensed survey of basic concepts in probability
More informationUnit of Study: Genetics, Evolution and Classification
Biology 3 rd Nine Weeks TEKS Unit of Study: Genetics, Evolution and Classification B.1) Scientific Processes. The student, for at least 40% of instructional time, conducts laboratory and field investigations
More informationMachine Learning
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 13, 2011 Today: The Big Picture Overfitting Review: probability Readings: Decision trees, overfiting
More information