Mutual Information & Genotype-Phenotype Association. Norman MacDonald January 31, 2011 CSCI 4181/6802


1 Mutual Information & Genotype-Phenotype Association Norman MacDonald January 31, 2011 CSCI 4181/6802

2 Overview What is information (specifically Shannon Information)? What are information entropy and mutual information? How are they used? In-depth example: Genotype-Phenotype Association

3 Which message has more information?

4 What is information? There are many definitions of (different types of) information. Here, we are talking about Shannon Information. Shannon Information is not knowledge.

5 Aside: A little history on Claude Shannon. After a stint as a Research Fellow at Princeton's Institute for Advanced Study, Claude Shannon worked as a wartime cryptanalyst at Bell Labs. His work there led to his influential "A mathematical theory of communication," published in 1948. Some say he already had the most famous master's thesis of the century at MIT, laying the groundwork of electronic communication with Boolean algebra.

6 What is information? Information is often defined in terms of communication. It depends only on the probability of a message. The more improbable a message, the more information it contains.

7 What is information? We can measure information in bits. Less probable events have more information. It can intuitively be thought of as "surprisal".* (*Coined by Myron Tribus in Thermostatics and Thermodynamics, 1961.)

8 [Figure: drawings by David Mosher, from M. Mitchell's Complexity: A Guided Tour (2009)]

9 Intuitively, either outcome of a fair coin flip has 1 bit of information. E.g. let Heads=1, Tails=0. Each outcome is equally probable; thus, each outcome is equally informative.

10 The result of each possible die roll has 2.58 bits of information. 2.58 bits?? Huh? OK, practically we need 3 bits, but theoretically only 2.58 bits are needed (3 bits can represent up to 8 states).

11 How do we measure information? The surprisal of an outcome is I(ω_n) = −log₂ P(ω_n), where ω_n is a given outcome and P(·) is the probability mass function for ω. With a log base of two, the units are bits. The more unlikely an event, the more information is received when it occurs. Definite events (P=1.0) have 0 bits of information.
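
For concreteness, a minimal Python sketch of the surprisal formula above (not from the original slides); the probabilities for the coin, die, and lottery come from the surrounding slides:

```python
import math

def surprisal(p, base=2):
    """Information content (surprisal) of an outcome with probability p, in bits."""
    return -math.log(p, base)

print(surprisal(1/2))        # 1.0 bit -- either side of a fair coin
print(surprisal(1/6))        # ~2.585 bits -- one face of a fair die
print(surprisal(1.0))        # 0.0 bits -- a definite event carries no information
print(surprisal(1e-7))       # ~23.25 bits -- winning the lottery
print(surprisal(1 - 1e-7))   # ~1.44e-7 bits -- losing the lottery
```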

12 Another example: winning the lottery. Let M be a language with two messages, W: "Yay! I won!" and L: "Boo! I lost!" Let P(M=W) = 1.0 × 10⁻⁷ and P(M=L) = 1 − 10⁻⁷. Then L has 1.44 × 10⁻⁷ bits of information, and W has 23.3 bits of information.

13 Information Entropy. Now that we can measure the information of actual messages received, we can think about the overall information content of a random variable. A useful measure of this is the expected value of the information of a random variable, otherwise known as the Information Entropy.

14 Information Entropy: H(X) = E[I(X)] = −Σₓ P(x) log₂ P(x), where E is the expected value function and H is the information entropy — the expected surprisal.

15 Information Entropy [Plot: entropy H(X) of a two-state variable as a function of P(X), peaking at 1 bit at p=0.5] For a coin flip, the distribution is uniform (p=0.5) and the entropy (average surprisal) is 1 bit. The lottery example (p=1.0 × 10⁻⁷) has near-zero entropy.
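
A matching sketch for entropy as expected surprisal (again Python, not from the slides), reproducing the 1-bit coin and near-zero lottery values on this slide:

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H = -sum(p * log p): the expected surprisal, in bits."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))        # 1.0 bit -- fair coin
print(entropy([1/6] * 6))         # ~2.585 bits -- fair die
print(entropy([1e-7, 1 - 1e-7]))  # ~2.5e-6 bits -- lottery, near zero
```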

16 More examples with entropy. A flip of a fair coin: initially, low prior information, thus high uncertainty. A roll of a six-sided die: initially, low prior information, thus high uncertainty. A lottery ticket: initially, high prior information, thus low uncertainty. Note that entropy has to do with uncertainty, and uncertainty deals with the future. The actual information contained in a message depends upon the probability of the actual event that occurred!

17 Conditional Entropy H(X|Y): what uncertainty is left in X if we know Y? E.g. X: {grass wet, grass dry}, Y: {rainy, sunny}. In this case, very little uncertainty remains.

18 Conditional Entropy. If the entropy in the whole system is the joint entropy H(Y,X), and we remove the entropy of X, then we have H(Y|X) = H(Y,X) − H(X). Note: H(Y|X) = H(Y) iff X and Y are independent (knowing one gives no information about the other).
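
A minimal sketch of that chain rule, H(Y|X) = H(Y,X) − H(X), estimated from samples; the rainy/sunny grass counts are made up to echo the previous slide:

```python
import math
from collections import Counter

def entropy_of(samples):
    """Plug-in Shannon entropy (bits) of a list of observed outcomes."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def conditional_entropy(pairs):
    """H(Y|X) = H(X,Y) - H(X), from a list of (x, y) samples."""
    return entropy_of(pairs) - entropy_of([x for x, _ in pairs])

# Weather mostly determines the grass, so little uncertainty remains in Y.
samples = [("rainy", "wet")] * 45 + [("sunny", "dry")] * 45 + [("sunny", "wet")] * 10
print(conditional_entropy(samples))  # ~0.38 bits, much less than H(Y) ~ 0.99
```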

19 So far. We now have a sense of: the information (surprisal) of a specific state; the expected information over all states, known as the entropy. What about the information shared between two random variables?

20 Mutual Information. Given two random variables, we can formally define the level of relationship between them by the average mutual information. A couple of extremes: zero mutual information means the variables are independent; mutual information ≈ entropy means the variables are potentially redundant. Mutual information can be thought of as agreement.

21 Mutual Information. Formally: I(X;Y) = Σₓ Σ_y P(x,y) log₂ [ P(x,y) / (P(x) P(y)) ]. Other quantities: I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X,Y).
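
A plug-in estimator matching the formal definition above (a sketch, not code from the talk), checked against the two extremes on the previous slide:

```python
import math
from collections import Counter

def mutual_information(pairs):
    """I(X;Y) = sum over (x,y) of p(x,y) * log2( p(x,y) / (p(x) p(y)) )."""
    n = len(pairs)
    pxy = {k: c / n for k, c in Counter(pairs).items()}
    px = Counter(x for x, _ in pairs)  # raw marginal counts
    py = Counter(y for _, y in pairs)
    return sum(p * math.log2(p * n * n / (px[x] * py[y]))
               for (x, y), p in pxy.items())

# Perfect agreement: I(X;Y) = H(X) = 1 bit for a balanced binary variable.
print(mutual_information([(0, 0)] * 50 + [(1, 1)] * 50))          # 1.0
# Independence: I(X;Y) = 0.
print(mutual_information([(0, 0), (0, 1), (1, 0), (1, 1)] * 25))  # 0.0
```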

22 Mutual Information. Important point: mutual information is ignorant of the message itself. Each value contributes to the information; e.g. the absence and the presence of a feature contribute equally. Reminder: information depends only on the probability of an outcome, not on any meaning attributed to the outcome.

23 Entropy Relationships [Figure: diagram relating H(X), H(Y), H(X,Y), H(X|Y), H(Y|X), and I(X;Y)]

24 Application areas: lossless data compression (e.g. Huffman encoding); theoretical channel capacity; corpus linguistics (word collocation); RNA secondary structure prediction (covarying sites); feature selection (relevance and redundancy, microarray expression); measuring cluster quality; genotype-phenotype association.

25 Genotype-Phenotype Association

26 The problem: Gene A, Gene B → Trait. [Image: hydrothermal vent ("champagne vent") photograph]

27 We can create two random variables: X ∈ {1, 0}, the presence or absence of a gene, and Y ∈ {1, 0}, the presence or absence of a trait. With this encoding, we can measure the agreement between X and Y to determine whether they may be related, as in the sketch below.
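
With that encoding, the agreement between a gene and a trait can be scored directly. A sketch with made-up presence/absence vectors; the estimator below is a 2x2 specialization of mutual information, not code from the talk:

```python
import math

def binary_mi(x, y):
    """I(X;Y) in bits for two equal-length 0/1 vectors (plug-in estimate)."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pxy = sum(1 for xi, yi in zip(x, y) if (xi, yi) == (a, b)) / n
            px, py = x.count(a) / n, y.count(b) / n
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px * py))
    return mi

# Hypothetical profiles over eight organisms: 1 = present, 0 = absent.
gene  = [1, 1, 0, 1, 0, 0, 1, 0]
trait = [1, 1, 0, 1, 0, 0, 0, 0]
print(binary_mi(gene, trait))  # ~0.55 bits of agreement (H(trait) ~ 0.95)
```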

28 Genotype vs. Phenotype [Figure: presence/absence profile of a gene compared with a phenotype across organisms]

29 Genotype vs. Phenotype [Figure: a second gene/phenotype profile comparison]

30 NETCAR (Tamura and D'haeseleer, 2008, Bioinformatics)

31 So we need examples of organisms with and without genes and traits to analyze. We can get our examples from complete genomes available for download online.

32 However, some of these microbes will be distantly related, having genes that are similar in function but not identical in sequence. We need to group genes based on orthology.

33 Clusters of Orthologous Groups (COGs). Homologous genes: a set of genes that share a last common ancestor. Orthologous genes: homologous genes that are separated by a speciation event. The COGs used here are from the NCBI and STRING databases.

34 Once we have our genomes, COGs, and traits, we can build phylogenetic profiles (Pellegrini et al. 1999). [Table: phylogenetic profile with organisms α, β, γ as columns and Genes A, B, C plus Trait Y as rows, entries marking presence or absence] We can analyze patterns of presence and absence.

35 Associative rule models: Gene A and Gene B and Gene C → Trait. If we were to exhaustively search all possible interactions of size three in a 26,290-gene set, we would have a search space of size C(26290, 3) ≈ 3.03 × 10¹². Association rule mining allows us to prune this search space.
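
The quoted search-space size is just the number of three-way gene combinations; a one-line check in Python:

```python
import math

# All size-3 combinations from 26,290 orthologous gene clusters.
print(math.comb(26290, 3))  # 3028105124880 -- about 3.03 x 10^12
```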

36 Associative rule models (Agrawal et al. 1993). A classical example is a set of grocery store sales transactions.

37 NETCAR (association rule mining algorithm), step 1: find parent features strongly associated with the phenotype. [Figure: network of orthologous gene clusters linked to the thermophily phenotype] (Tamura and D'haeseleer, Bioinformatics, 2008, 24(13))

38 NETCAR (association rule mining algorithm), step 2: find all child features within x steps of a parent in terms of mutual information. [Figure: network of orthologous gene clusters with parent and child features marked] (Tamura and D'haeseleer, Bioinformatics, 2008, 24(13))

39 NETCAR (association rule mining algorithm), step 3: generate candidate rules with at least one parent. [Figure: network of orthologous gene clusters] Candidate rules for thermophily: [A E], [F E G], [F E], [F G C], [F G], [F G K], [F], [F C K], [A]. (Tamura and D'haeseleer, Bioinformatics, 2008, 24(13))

40 NETCAR (association rule mining algorithm), step 4: save rules with high mutual information with the phenotype. Rules for thermophily: [A E], [F E G], [F E], [F G C], [F G], [F G K], [F], [F C K], [A]. (Tamura and D'haeseleer, Bioinformatics, 2008, 24(13))
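
Scoring a candidate rule is the same agreement measurement as before: AND the member clusters' columns together, then take mutual information with the phenotype. A sketch reusing binary_mi() from the earlier snippet; the cluster profiles below are invented:

```python
# Invented presence/absence profiles for clusters F and G over eight organisms.
profiles = {
    "F": [1, 1, 1, 0, 1, 0, 0, 0],
    "G": [1, 1, 0, 0, 1, 0, 1, 0],
}
thermophily = [1, 1, 0, 0, 1, 0, 0, 0]

# The rule [F G] holds wherever both clusters are present.
rule_fg = [f & g for f, g in zip(profiles["F"], profiles["G"])]
print(binary_mi(rule_fg, thermophily))  # perfect agreement here: = H(thermophily)
```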

41 Classification based on Predictive Association Rules (CPAR). [Figure: rule-growth trace — each rule (F, Q; F, Z; A) is extended until no literal is above the gain threshold] Rules discovered: 1. F, Q -> POSITIVE; 2. F, Z -> POSITIVE; 3. A -> POSITIVE. Covered samples get their weight reduced before the next iteration. (Yin and Han, Proceedings of the Third SIAM International Conference on Data Mining (SDM03))
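
A toy illustration of that weight-reduction step (CPAR's actual rule growth via FOIL gain is omitted; the decay factor and data below are assumptions for the sketch):

```python
DECAY = 1 / 3  # assumed weight multiplier for covered positive examples

examples = [
    {"genes": {"F", "Q"}, "positive": True,  "w": 1.0},
    {"genes": {"F", "Z"}, "positive": True,  "w": 1.0},
    {"genes": {"A"},      "positive": True,  "w": 1.0},
    {"genes": {"B"},      "positive": False, "w": 1.0},
]
rules = [{"F", "Q"}, {"F", "Z"}, {"A"}]  # the rules discovered on the slide

for rule in rules:  # after each rule is accepted...
    for ex in examples:
        if rule <= ex["genes"] and ex["positive"]:
            ex["w"] *= DECAY  # ...its covered positives count less next iteration

print([ex["w"] for ex in examples])  # [0.333..., 0.333..., 0.333..., 1.0]
```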

42 Data: 427 organisms (STRING 8); unique orthologous gene cluster patterns; 10 phenotypes (focus on thermophily; JGI IMG); taxonomy (NCBI).

43 CPAR versus NETCAR [Chart: accuracy and runtime (s) for each method]

44 Dependent Samples

45 Dependence among samples. Both Gene A and Gene B have a strong association with the phenotype (measured with mutual information). [Figure: phylogenetic tree annotated with phenotype, Gene A, and Gene B presence] Gene A's association can be explained by shared ancestry; Gene B's cannot, and it should be highlighted.

46 Dependent Samples. 29 of the 40 correctly classified thermophiles are homogeneous to taxonomic rank order. [Figure: tree with phenotype, Gene A, and Gene B columns; light: non-thermophiles, dark: thermophiles and hyperthermophiles]

47 Accounting for shared ancestry with conditional mutual information

48 Confoundment. Conditioning on a confounding variable Z (here, shared ancestry) discounts association that Z explains: I(X;Y|Z) = H(X|Z) − H(X|Y,Z) = H(X,Z) + H(Y,Z) − H(X,Y,Z) − H(Z). (H: Shannon entropy; I: mutual information.)
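
A sketch of conditional mutual information estimated from samples (plain CMI, not the exact CWMI weighting used in the talk); the toy data show a gene whose association with the phenotype is entirely explained by lineage:

```python
import math
from collections import Counter

def entropy_of(samples):
    """Plug-in Shannon entropy (bits) of a list of observed outcomes."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def conditional_mi(xs, ys, zs):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z), from parallel samples."""
    return (entropy_of(list(zip(xs, zs))) + entropy_of(list(zip(ys, zs)))
            - entropy_of(list(zip(xs, ys, zs))) - entropy_of(zs))

gene      = [1, 1, 1, 1, 0, 0, 0, 0]   # perfectly tracks the lineage
phenotype = [1, 1, 1, 0, 1, 0, 0, 0]
lineage   = [0, 0, 0, 0, 1, 1, 1, 1]   # two taxonomic groups

print(conditional_mi(gene, phenotype, [0] * 8))  # ~0.19 bits: plain MI (constant Z)
print(conditional_mi(gene, phenotype, lineage))  # 0.0: fully explained by ancestry
```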

49 Results of CWMI compared with MI: there is no difference in accuracy, but there is a difference in the genes that are selected.

50 [Table: top-ranked gene clusters for thermophily under MI vs. CWMI] X: a DNA repair system specific for thermophilic Archaea and bacteria, predicted by genomic context analysis (Makarova et al., Nucleic Acids Research, 2002, 30(2)).

51 Misclassifications. Some organisms are classified correctly with one score and not the other; for example, over ten replicates of 5-fold cross-validation on thermophily (next slide).

52 Misclassifications (10 replicates). [Table: per-organism misclassification counts under CPAR, MI, and CWMI for: Streptococcus_thermophilus_LMG_, Streptococcus_thermophilus_CNRZ, Carboxydothermus_hydrogenoformans_Z, Geobacillus_kaustophilus_HTA, Synechococcus_sp._JA-3-3Ab, Methanocaldococcus_jannaschii_DSM_, Acidothermus_cellulolyticus_11B, Deinococcus_geothermalis_DSM_, Clostridium_thermocellum_ATCC_, Chlorobium_tepidum_TLS]

53 Thermophilic streptococci [Figure: rules applying to thermophilic streptococci]

54 Discussion: CPAR vs MI. CPAR uses an approximation of the conditional probability P(Trait | Gene): when we see gene G, what is the probability of trait P? Mutual information is a measure of agreement: how well does the presence & absence of G match the presence & absence of P?

55 Discussion. CPAR mines rules 100x faster than NETCAR, and those rules are better predictors. Shared ancestry confounds gene-to-trait association problems. Some of the rules weighted with CMI are already known to biologically influence the target traits. We may be subtracting predictive features in favor of those that defy ancestry.

56 References
1. Tamura M and D'haeseleer P. 2008. Microbial genotype-phenotype mapping by class association rule mining. Bioinformatics, 24(13).
2. Steuer R, Kurths J, et al. 2002. The mutual information: detecting and evaluating dependencies between variables. Bioinformatics, 18(Suppl 2):S231-S240.
3. Yin X and Han J. 2003. CPAR: Classification based on predictive association rules. In Proceedings of the Third SIAM International Conference on Data Mining, San Francisco, CA.
4. Kastenmüller G, Schenk M, Gasteiger J, and Mewes H-W. Uncovering metabolic pathways relevant to phenotypic traits of microbial genomes. Genome Biology, 10(3):R28.
5. Cover TM and Thomas JA. Elements of Information Theory. Wiley, New Jersey.
