QB LECTURE #4: Motif Finding


Slide 1: QB Lecture #4: Motif Finding. Adam Siepel, Nov. 20, 2015.

Slide 2: Plan for Today
- Probability models for binding sites
- Scoring and detecting binding sites
- De novo motif finding

Slide 3: Transcription Initiation. [Figure: chromatin with distal TFBSs grouped in a cis-regulatory module (CRM), proximal TFBSs, a co-activator complex, and the transcription initiation complex at the site of transcription initiation.]

Slide 4: Binding Sites. [Figure: (a) eight aligned source binding sites (Site 1 through Site 8); (b) the derived consensus sequence BRMCWAWHRWGGBM, written in IUPAC ambiguity codes.]

Slide 5: Probability Model for Motifs. Let x = (x_1, ..., x_k) be a sequence possibly representing a binding site of length k. We represent the motif as a sequence of position-specific multinomial models, \pi = (\pi_{1,A}, \pi_{1,C}, \pi_{1,G}, \pi_{1,T}, \pi_{2,A}, ..., \pi_{k,T}), such that \pi_{i,j} is the probability of base j at position i. The likelihood is:

L(x \mid \pi) = \prod_{i=1}^{k} P(x_i \mid \pi_{i,\cdot}) = \prod_{i=1}^{k} \pi_{i,x_i}

Slide 6: Background Model. Assume an iid multinomial background model \theta = (\theta_A, \theta_C, \theta_G, \theta_T), so that L(x \mid \theta) = \prod_{i=1}^{k} \theta_{x_i}. As with alignment, classical theory says a good statistic for discrimination is the log-odds score:

S(x) = \log \frac{L(x \mid \pi)}{L(x \mid \theta)} = \sum_{i=1}^{k} \left( \log \pi_{i,x_i} - \log \theta_{x_i} \right) = \sum_{i=1}^{k} s_{i,x_i}, \quad \text{where } s_{i,a} = \log \pi_{i,a} - \log \theta_a
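As a rough illustration of these two slides, the sketch below computes L(x | pi), L(x | theta), and the log-odds score S(x) for a short motif. The probabilities in `pwm`, the uniform background, and the test sequence are illustrative assumptions, not the lecture's example.

```python
import math

# Hypothetical position-specific probabilities pi_{i,j} for a motif of width 4
# (one dict per position); values are illustrative only.
pwm = [
    {'A': 0.1, 'C': 0.4, 'G': 0.4, 'T': 0.1},
    {'A': 0.7, 'C': 0.1, 'G': 0.1, 'T': 0.1},
    {'A': 0.1, 'C': 0.7, 'G': 0.1, 'T': 0.1},
    {'A': 0.2, 'C': 0.2, 'G': 0.2, 'T': 0.4},
]
background = {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}  # iid multinomial theta

def likelihood(x, model):
    """L(x | pi) = product over positions i of pi_{i, x_i}."""
    return math.prod(model[i][base] for i, base in enumerate(x))

def log_odds(x, model, bg):
    """S(x) = sum_i ( log2 pi_{i,x_i} - log2 theta_{x_i} )."""
    return sum(math.log2(model[i][b]) - math.log2(bg[b]) for i, b in enumerate(x))

x = "GACT"
print(likelihood(x, pwm))                   # L(x | pi)
print(math.prod(background[b] for b in x))  # L(x | theta)
print(log_odds(x, pwm, background))         # log-odds score S(x), in bits
```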

Slide 7: Weight Matrix. [Figure: (a) the eight source binding sites; (b) the consensus sequence BRMCWAWHRWGGBM; (c) the position frequency matrix (PFM) of per-column base counts; (d) the position weight matrix (PWM) of log-odds scores s_{i,a}.]

Slide 8: Estimating the Model. If we have several training examples, we can estimate the parameters in the usual way for multinomial models; from the position frequency matrix of the eight example sites, the maximum-likelihood estimates include \pi_{1,A} = 0, \pi_{1,C} = 3/8, \pi_{1,G} = 1/4, ..., \pi_{14,T} = 0. Problem: sparse data; bases never observed at a position get probability zero.

Slide 9: Example of Estimates with Pseudocounts. Adding one pseudocount per base (so each column sums to 8 + 4 = 12 observations) gives \pi_{1,A} = 1/12, \pi_{1,C} = 4/12, \pi_{1,G} = 3/12, ..., \pi_{14,T} = 1/12.
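The pseudocount recipe is mechanical enough to sketch in a few lines: count bases per column, add one pseudocount per base, and normalize. The aligned sites below are placeholders, not the eight sites from the slides; with eight sites and one pseudocount per base, the denominators of 12 above fall out of the same code.

```python
from collections import Counter

# Placeholder aligned binding sites (equal length); not the lecture's example data.
sites = ["GACCAAAT", "TGACTATA", "TGCCAAAA", "CCAACTAT", "CCTCCTTA"]
BASES = "ACGT"

def estimate_pwm(sites, pseudocount=1.0):
    """Column-wise multinomial estimates pi_{i,j} with add-one pseudocounts."""
    k = len(sites[0])
    pwm = []
    for i in range(k):
        counts = Counter(s[i] for s in sites)
        total = len(sites) + pseudocount * len(BASES)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm

pwm = estimate_pwm(sites)
print(pwm[0])  # pi_{1,A}, pi_{1,C}, ...; no zero entries thanks to the pseudocounts
```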

Slide 10: Prediction of Binding Sites. We predict a binding site if and only if

S(x) = \sum_{i=1}^{k} s_{i,x_i} \geq T,

where T is chosen to achieve the desired tradeoff between sensitivity and specificity. Sensitivity is the fraction of true sites that are predicted (1 - false negative rate). Specificity is the fraction of false sites that are not predicted (1 - false positive rate).
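Applied to a longer sequence, this decision rule becomes a sliding-window scan that reports every window with S(x) >= T. A minimal sketch, with an arbitrary threshold and a weight matrix whose entries are rounded base-2 log-odds consistent with the probabilities in the earlier sketch:

```python
# Hypothetical log-odds weight matrix s_{i,a} (motif width 4), in bits; illustrative values.
S = [
    {'A': -1.3, 'C': 0.7, 'G': 0.7, 'T': -1.3},
    {'A': 1.5, 'C': -1.3, 'G': -1.3, 'T': -1.3},
    {'A': -1.3, 'C': 1.5, 'G': -1.3, 'T': -1.3},
    {'A': -0.3, 'C': -0.3, 'G': -0.3, 'T': 0.7},
]
T = 2.0  # threshold for the desired sensitivity/specificity tradeoff (arbitrary here)

def scan(seq, S, T):
    """Slide the weight matrix along seq; report windows with score >= T."""
    k = len(S)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        score = sum(S[i][b] for i, b in enumerate(window))
        if score >= T:
            hits.append((start, window, score))
    return hits

print(scan("TTGACTACATTACATG", S, T))  # e.g. the window "GACT" at position 2 scores above T
```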

Slide 11: [Figure: overlapping distributions of the score S(x) under the background (null) model and the binding-site (alternative) model; with decision threshold T, the null mass above T gives false positives and the alternative mass below T gives false negatives.]

Slide 12: Sorting Out Terms.

                     True condition
Prediction outcome   Positive   Negative
Positive             TP         FP        (row: PPV)
Negative             FN         TN        (row: NPV)

Sens = TP / (TP + FN); Spec = TN / (FP + TN)
FP rate = type I error = \alpha = FP / (FP + TN) = 1 - Spec
FN rate = type II error = \beta = FN / (TP + FN) = 1 - Sens
Power = 1 - \beta (for bounded \alpha)
A p-value is an estimate of \alpha for a given observation.
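Because these terms are easy to mix up, a small helper that computes them from confusion-matrix counts; the counts below are illustrative only:

```python
def rates(tp, fp, fn, tn):
    """Sensitivity, specificity, and error rates from confusion-matrix counts."""
    sens = tp / (tp + fn)   # true positive rate = power = 1 - beta
    spec = tn / (fp + tn)   # true negative rate
    fpr = fp / (fp + tn)    # type I error rate alpha = 1 - spec
    fnr = fn / (tp + fn)    # type II error rate beta = 1 - sens
    ppv = tp / (tp + fp)    # positive predictive value
    npv = tn / (tn + fn)    # negative predictive value
    return dict(sens=sens, spec=spec, fpr=fpr, fnr=fnr, ppv=ppv, npv=npv)

print(rates(tp=80, fp=30, fn=20, tn=970))  # illustrative counts
```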

Slide 13: How to Choose T?
- If known positive and negative examples are available, we can estimate sensitivity and specificity directly and adjust T accordingly.
- We can control the false positive rate only, using a reasonable proxy for the background (sometimes permuted data).
- We can generate synthetic data from the background model and use it to simulate from the null distribution of S(x), as sketched below.
- In some cases, we can compute the exact null distribution by dynamic programming.
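The simulation option referenced above might look like this: draw many length-k sequences from the background model, score them, and take T as the appropriate upper quantile of the simulated null scores. Everything here, including the 1% target false positive rate and the motif probabilities, is an illustrative assumption.

```python
import math
import random

random.seed(0)
BASES = "ACGT"
background = {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}

# Hypothetical motif probabilities (width 4); same illustrative values as the earlier sketch.
pwm = [
    {'A': 0.1, 'C': 0.4, 'G': 0.4, 'T': 0.1},
    {'A': 0.7, 'C': 0.1, 'G': 0.1, 'T': 0.1},
    {'A': 0.1, 'C': 0.7, 'G': 0.1, 'T': 0.1},
    {'A': 0.2, 'C': 0.2, 'G': 0.2, 'T': 0.4},
]

def log_odds(x):
    return sum(math.log2(pwm[i][b]) - math.log2(background[b]) for i, b in enumerate(x))

def sample_background(k):
    return "".join(random.choices(BASES, weights=[background[b] for b in BASES], k=k))

# Simulated null distribution of S(x) under the background model.
null_scores = sorted(log_odds(sample_background(len(pwm))) for _ in range(100_000))

# Choose T so that the false positive rate is approximately 1%.
alpha = 0.01
T = null_scores[int((1 - alpha) * len(null_scores))]
print("T for ~1% FPR:", T)
```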

Slide 14: Computing p-values. Similar methods can be used to compute p-values for predicted motifs. First characterize the null distribution f of log-odds scores, empirically or analytically. Then assign a p-value to a prediction by computing

p = \sum_{y \geq S(x)} f(y),

which must be corrected for multiple testing.
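Given a simulated or tabulated null distribution, the empirical p-value is just the fraction of null scores at least as large as the observed score. The null sample, observed score, and number of tests below are made up, and the Bonferroni factor stands in for whatever multiple-testing correction is appropriate.

```python
from bisect import bisect_left

def empirical_p_value(score, null_scores):
    """p = (# null scores >= observed) / (# null scores); null_scores sorted ascending."""
    n = len(null_scores)
    return (n - bisect_left(null_scores, score)) / n

# Illustrative use with a made-up null sample and observed score.
null_scores = sorted([-2.1, -1.5, -0.8, -0.3, 0.2, 0.9, 1.4, 2.2, 3.0, 3.8])
p = empirical_p_value(2.5, null_scores)
print("p-value:", p)

# Bonferroni-style correction when the matrix is applied at n_tests positions.
n_tests = 1_000
print("corrected:", min(1.0, p * n_tests))
```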

Slide 15: Improving the Background Model. Bases are not independent: CpG dinucleotides, poly-A runs, simple sequence repeats, transposons, etc. In some cases, this nonindependence will inflate false positive rates, so a better background model is needed. Typically, higher-order Markov models are used.

Slide 16: Markov Models. We are interested in the joint distribution of X_1, ..., X_k and for convenience have assumed independence:

P(X_1, ..., X_k) = P(X_1) \cdots P(X_k).

It may be slightly less egregious to assume

P(X_1, ..., X_k) = P(X_1) P(X_2 \mid X_1) P(X_3 \mid X_2) \cdots P(X_k \mid X_{k-1}).

This is a 1st-order Markov model. In an Nth-order model, each X_i depends on X_{i-N}, ..., X_{i-1}.

Slide 17: Markov Scores. Now the background model is \theta = (\theta_{A|A}, \theta_{A|C}, ..., \theta_{T|G}, \theta_{T|T}), where \theta_{x_1 | x_0} is specially defined to denote the marginal probability of x_1. The log-odds score is

S(x) = \log \frac{L(x \mid \pi)}{L(x \mid \theta)} = \sum_{i=1}^{k} \left( \log \pi_{i,x_i} - \log \theta_{x_i | x_{i-1}} \right) = \sum_{i=1}^{k} s_{i, x_i | x_{i-1}}, \quad \text{where } s_{i, a|b} = \log \pi_{i,a} - \log \theta_{a|b}.
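A sketch of both steps, fitting theta_{a|b} from dinucleotide counts in a background sequence and then scoring a candidate site against it; the training sequence and motif probabilities are illustrative assumptions, not the lecture's data.

```python
import math
from collections import Counter

BASES = "ACGT"

def fit_first_order(seq, pseudocount=1.0):
    """theta_{a|b}: probability of base a given previous base b, from dinucleotide counts."""
    pairs = Counter(zip(seq, seq[1:]))
    theta = {}
    for prev in BASES:
        total = sum(pairs[(prev, a)] for a in BASES) + pseudocount * len(BASES)
        theta[prev] = {a: (pairs[(prev, a)] + pseudocount) / total for a in BASES}
    # Marginal distribution used for the first position (theta_{x1 | x0} in the slides).
    marg = Counter(seq)
    theta[None] = {a: (marg[a] + pseudocount) / (len(seq) + pseudocount * len(BASES))
                   for a in BASES}
    return theta

def markov_log_odds(x, pwm, theta):
    """S(x) = sum_i [ log2 pi_{i,x_i} - log2 theta_{x_i | x_{i-1}} ]."""
    score, prev = 0.0, None
    for i, b in enumerate(x):
        score += math.log2(pwm[i][b]) - math.log2(theta[prev][b])
        prev = b
    return score

# Illustrative background sequence and motif model.
theta = fit_first_order("ACGCGCGTATATACGCGATATCGCGAT")
pwm = [{'A': 0.1, 'C': 0.4, 'G': 0.4, 'T': 0.1},
       {'A': 0.7, 'C': 0.1, 'G': 0.1, 'T': 0.1},
       {'A': 0.1, 'C': 0.7, 'G': 0.1, 'T': 0.1},
       {'A': 0.2, 'C': 0.2, 'G': 0.2, 'T': 0.4}]
print(markov_log_odds("GACT", pwm, theta))
```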

Slide 18: Effect of Better Background Model. [Figure: two panels, "First Model" and "Better Model", each showing the background (null) and binding-site (alternative) distributions of S(x) with threshold T and the resulting false negatives and false positives.]

Slide 19: An Aside on Information Theory.
- Invented by Claude Shannon in the late 1940s, at the dawn of the digital age.
- Motivated by problems in information transmission, especially data compression.
- Has deep connections with probability theory, computer science, statistical mechanics, gambling and investment, etc.
- You benefit from it every time you gzip a file or look at a JPEG image!

Slide 20: Entropy. The entropy of a (discrete) random variable X is

H(X) = -\sum_{x} p(x) \log p(x) = E\left[ \log \frac{1}{p(X)} \right].

Interpretations of H(X):
- minimum average length of a binary encoding of X
- average information gained by observing X
- minimum average number of yes/no questions needed to find out X
- minimum average number of fair coin flips required to generate X
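For concreteness, a tiny entropy function in bits, evaluated on a few assumed distributions:

```python
import math

def entropy(p):
    """H(X) = -sum_x p(x) log2 p(x), in bits; terms with p(x) = 0 contribute 0."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

print(entropy({'H': 0.5, 'T': 0.5}))                          # fair coin: 1.0 bit
print(entropy({'H': 0.2, 'T': 0.8}))                          # biased coin: ~0.722 bits
print(entropy({'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}))  # random DNA: 2.0 bits
```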

Slide 21: Encoding Example. Suppose we want to encode n coin tosses as a binary sequence. If the coin is fair, we can do no better than to use one bit for each toss (e.g., for TTHTHHHT); it will always take n bits to encode the sequence. Suppose, however, that the coin has weight \theta = 0.2. Can we do better? It turns out we can (for large enough n), by encoding subsequences and giving shorter codes to more probable subsequences.

Slide 22: Encoding Example, cont. [Table: the eight three-toss outcomes TTT, TTH, THT, HTT, THH, HTH, HHT, HHH with their probabilities and binary codewords; the codewords did not survive transcription.] The expected code length is 2.184 bits per block of three tosses, so 2.184/3 = 0.728 bits per coin toss are needed. For the naive code: 1 bit per toss. The entropy is about 0.722 bits per toss.
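These numbers can be checked by actually building a Huffman code over the eight three-toss blocks for a coin with P(heads) = 0.2. The construction below is a generic Huffman sketch (not necessarily the specific codewords shown in the lecture), but its expected length should come out to 2.184 bits per block, i.e. 0.728 bits per toss, just above the entropy of about 0.722 bits per toss.

```python
import heapq
import itertools
import math

p_heads = 0.2
outcomes = ["".join(t) for t in itertools.product("HT", repeat=3)]
prob = {o: math.prod(p_heads if c == "H" else 1 - p_heads for c in o) for o in outcomes}

# Standard Huffman construction: heap of (probability, tiebreak, {symbol: partial code}).
heap = [(p, i, {o: ""}) for i, (o, p) in enumerate(prob.items())]
heapq.heapify(heap)
counter = len(heap)
while len(heap) > 1:
    p1, _, c1 = heapq.heappop(heap)
    p2, _, c2 = heapq.heappop(heap)
    merged = {o: "0" + code for o, code in c1.items()}
    merged.update({o: "1" + code for o, code in c2.items()})
    heapq.heappush(heap, (p1 + p2, counter, merged))
    counter += 1
code = heap[0][2]

expected_len = sum(prob[o] * len(code[o]) for o in outcomes)
entropy_per_toss = -(p_heads * math.log2(p_heads) + (1 - p_heads) * math.log2(1 - p_heads))
print(expected_len, expected_len / 3, entropy_per_toss)  # ~2.184, ~0.728, ~0.722
```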

Slide 23: Entropy for a Bernoulli Random Variable with Parameter p. [Figure: the binary entropy function H(X) plotted against p.] H(X) is always concave and nonnegative.

Slide 24: Perfect Code. Suppose X has pmf p(a) = 1/2, p(b) = 1/4, p(c) = 1/8, p(d) = 1/8. An optimal binary encoding is a -> 0, b -> 10, c -> 110, d -> 111. Expected length = H(X) = 1.75 bits; the naive encoding uses 2 bits.

Slide 25: Entropy and Information. Before an event X, your uncertainty about it is measured by H(X); therefore, when you observe X, your average gain in information is measured by H(X). However, you may not observe X directly; after observing a noisy message Y, there may still be uncertainty about X. We can measure the (average) information content of Y as H_before(X) - H_after(X).

Slide 26: Relative Entropy. The relative entropy of distribution p with respect to distribution q is

D(p \| q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}.

It represents the average additional bits needed to encode X if it comes from p but the code was optimized for q:

D(p \| q) = H_{pq}(X) - H_{pp}(X) = -\sum_{x} p(x) \log q(x) + \sum_{x} p(x) \log p(x) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}.

It is useful as a measure of divergence between distributions.

Slide 27: Mutual Information. The mutual information of random variables X and Y is

I(X; Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}.

I(X;Y) is the relative entropy of P(X,Y) with respect to P(X)P(Y); it represents the reduction in uncertainty about X due to knowledge of Y. Mutual information can be thought of as a test statistic for independence (connected with the \chi^2 test and the G test).
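Both D(p || q) and I(X;Y) reduce to short sums; a sketch with made-up distributions, using log base 2 so the results are in bits:

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) log2( p(x) / q(x) ); assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

def mutual_information(joint):
    """I(X;Y) = D( P(X,Y) || P(X)P(Y) ), from a dict {(x, y): p(x, y)}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)

# Illustrative numbers only.
print(kl_divergence({'A': 0.7, 'B': 0.3}, {'A': 0.5, 'B': 0.5}))
print(mutual_information({(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}))
```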

Slide 28: Likelihood Connection. Suppose X_1, ..., X_n are n iid random variables. The expected log likelihood is

\sum_{i=1}^{n} \sum_{x} p(x) \log p(x) = -n H(X).

Similarly, the expected log-odds score of model 1 with respect to model 2, if the variables are drawn from model 1, is

\sum_{i=1}^{n} \sum_{x} p_1(x) \log \frac{p_1(x)}{p_2(x)} = n D(p_1 \| p_2),

and if they are drawn from model 2,

\sum_{i=1}^{n} \sum_{x} p_2(x) \log \frac{p_1(x)}{p_2(x)} = -n D(p_2 \| p_1).

Slide 29: Motif Information Content. The entropy of the distribution at each position determines the information content of that position:

IC_i = 2 - H(X_i).

This can be interpreted as the average reduction in uncertainty relative to random DNA, and equally as the relative entropy with respect to random DNA:

\sum_{b} p(X = b) \log \frac{p(X = b)}{1/4} = \sum_{b} p(X = b) \log p(X = b) - \sum_{b} p(X = b) \log \frac{1}{4} = 2 - H(X).

Information content is also related to the binding energy and to evolutionary constraint, and it is visualized in widely used sequence logos.
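The per-position information content behind a sequence logo follows directly from the entropy formula; a sketch with an assumed three-column motif (not the lecture's matrix):

```python
import math

def information_content(column):
    """IC_i = 2 - H(X_i) in bits, for a column of base probabilities over {A,C,G,T}."""
    h = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - h

# Hypothetical position-specific probabilities (width 3), illustrative only.
pwm = [
    {'A': 0.97, 'C': 0.01, 'G': 0.01, 'T': 0.01},  # nearly fixed: IC close to 2 bits
    {'A': 0.4, 'C': 0.1, 'G': 0.4, 'T': 0.1},      # partially constrained
    {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25},  # like random DNA: IC = 0
]
for i, col in enumerate(pwm, start=1):
    print(f"position {i}: IC = {information_content(col):.2f} bits")
```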

Slide 30: [Figure: (a) the eight source binding sites; (b) the consensus sequence; (c) the position frequency matrix (PFM); (d) the position weight matrix (PWM); (e) site scoring of the candidate sequence TTACATAAGTAGTC, giving a total score of 5.23, 78% of the maximum; (f) the sequence logo, with information content in bits by position.]

Slide 31: Motif Discovery. Consider the problem of estimating a motif model from N sequences, each believed to have a binding site for some TF. As before, we assume a motif model of width k with a multinomial distribution \theta_l at each position l, and an iid multinomial background model \theta_{bg}. The goal in this case is to learn the parameters of the motif model. The location of the binding site in each sequence i, denoted z_i, is a latent variable.

Slide 32: Illustration. [Figure: the iterative scheme: initialize the motif parameters \theta_1, \theta_2, ..., \theta_k, then repeatedly sample or average over candidate binding-site positions and update the parameters.]

Slide 33: EM vs. Gibbs Sampling.
- In EM, we average over potential positions; in Gibbs sampling, we sample positions.
- In EM, you estimate parameters that maximize the likelihood (locally); in Gibbs sampling, you sample both binding sites and parameters, allowing for uncertainty in both.
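To make the contrast concrete, here is a heavily simplified Gibbs-sampling sketch for motif discovery: one site per sequence, a fixed assumed width, a uniform background, and made-up input sequences. It is meant to show the structure of the site-sampling step, not to reproduce the algorithm as presented in the lecture.

```python
import math
import random
from collections import Counter

random.seed(1)
BASES = "ACGT"
K = 6  # assumed motif width

# Made-up input sequences, each assumed to contain one binding site.
seqs = ["TTGACGTGCATTAGC", "CCATGACGTGTTACG", "GGTTTGACGTGATCC", "ATGACGTGCCATTGA"]

def estimate(seqs, starts, exclude, pseudocount=1.0):
    """Motif model from the current site positions in all sequences except `exclude`."""
    cols = []
    for i in range(K):
        counts = Counter(seqs[n][starts[n] + i] for n in range(len(seqs)) if n != exclude)
        total = sum(counts.values()) + pseudocount * len(BASES)
        cols.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return cols

def site_weights(seq, pwm, bg=0.25):
    """Relative likelihood (motif vs. background) of each possible start position."""
    return [math.prod(pwm[i][seq[s + i]] / bg for i in range(K))
            for s in range(len(seq) - K + 1)]

# Random initialization of one site position z_i per sequence.
starts = [random.randrange(len(s) - K + 1) for s in seqs]

for _ in range(200):                     # Gibbs sweeps
    for n in range(len(seqs)):
        pwm = estimate(seqs, starts, exclude=n)       # model from the other sequences
        w = site_weights(seqs[n], pwm)                # score every start in sequence n
        starts[n] = random.choices(range(len(w)), weights=w, k=1)[0]  # sample z_n

print(starts, [s[z:z + K] for s, z in zip(seqs, starts)])
```

The EM version would replace the sampling line with a weighted average over all positions when re-estimating the model, rather than committing to a single sampled position.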

Slide 34: That's All.
