QB LECTURE #4: Motif Finding
Slide 1: QB LECTURE #4: Motif Finding (Adam Siepel, Nov. 20, 2015)

Slide 2: Plan for Today
- Probability models for binding sites
- Scoring and detecting binding sites
- De novo motif finding

Slide 3: Transcription Initiation
[Figure: chromatin, distal TFBS, CRM, proximal TFBS, co-activator complex, and the transcription initiation complex at the site of transcription initiation]

Slide 4: Binding Sites
(a) Source binding sites:
Site 1: GACCAAATAAGGCA
Site 2: GACCAAATAAGGCA
Site 3: TGACTATAAAAGGA
Site 4: TGACTATAAAAGGA
Site 5: TGCCAAAAGTGGTC
Site 6: CAACTATCTTGGGC
Site 7: CAACTATCTTGGGC
Site 8: CTCCTTACATGGGC
(b) Consensus sequence: BRMCWAWHRWGGBM

Slide 5: Probability Model for Motifs
Let x = (x_1, ..., x_k) be a sequence possibly representing a binding site of length k.
We represent the motif as a sequence of position-specific multinomial models, π = (π_{1,A}, π_{1,C}, π_{1,G}, π_{1,T}, π_{2,A}, ..., π_{k,T}), such that π_{i,j} is the probability of base j at position i.
The likelihood is:
L(x | π) = ∏_{i=1}^{k} P(x_i | π_{i,·}) = ∏_{i=1}^{k} π_{i,x_i}

Slide 6: Background Model
Assume an iid multinomial background model, θ = (θ_A, θ_C, θ_G, θ_T), so that
L(x | θ) = ∏_{i=1}^{k} θ_{x_i}
As with alignment, classical theory says a good statistic for discrimination is the log-odds score:
log [ L(x | π) / L(x | θ) ] = ∑_{i=1}^{k} (log π_{i,x_i} − log θ_{x_i}) = ∑_{i=1}^{k} s_{i,x_i}
where s_{i,a} = log π_{i,a} − log θ_a

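As an illustration (not part of the original slides), here is a minimal Python sketch of this log-odds scoring, assuming a made-up motif of width 3 and a uniform background:

```python
import math

# Toy position-specific probabilities (width k = 3); rows are positions,
# values are P(base | position). These numbers are invented for illustration.
motif = [
    {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # iid multinomial

def log_odds_score(x, motif, background):
    """S(x) = sum_i [log pi_{i,x_i} - log theta_{x_i}] for a candidate site x of length k."""
    return sum(math.log2(motif[i][b]) - math.log2(background[b])
               for i, b in enumerate(x))

print(log_odds_score("CAT", motif, background))  # strong match -> positive score
print(log_odds_score("TGA", motif, background))  # poor match   -> negative score
```
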
Slide 7: Weight Matrix
[Figure: (a) the eight source binding sites from slide 4; (b) consensus sequence BRMCWAWHRWGGBM; (c) position frequency matrix (PFM), one row per base A/C/G/T; (d) position weight matrix (PWM) of scores {s_{i,a}}]

Slide 8: Estimating the Model
If we have several training examples, we can estimate the parameters π_{i,j} in the usual way for multinomial models, e.g., from the position frequency matrix (PFM) of the eight source binding sites:
π_{1,A} = 0, π_{1,C} = 3/8, π_{1,G} = 1/4, ..., π_{14,T} = 0
Problem: sparse data (zero counts produce zero probabilities).

Slide 9: Example of Estimates with Pseudocounts
Adding a pseudocount of 1 to each cell of the PFM (8 sites plus 4 pseudocounts per column):
π_{1,A} = 1/12, π_{1,C} = 4/12, π_{1,G} = 3/12, ..., π_{14,T} = 1/12

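To make the pseudocount estimate concrete, here is a short sketch (not from the slides) that builds the position-specific probabilities from the eight example sites; a pseudocount of 1 reproduces the 1/12 and 4/12 values above:

```python
from collections import Counter

# The eight example binding sites from slide 4 (width k = 14)
sites = [
    "GACCAAATAAGGCA", "GACCAAATAAGGCA", "TGACTATAAAAGGA", "TGACTATAAAAGGA",
    "TGCCAAAAGTGGTC", "CAACTATCTTGGGC", "CAACTATCTTGGGC", "CTCCTTACATGGGC",
]
BASES = "ACGT"

def estimate_motif(sites, pseudocount=1.0):
    """Return a list of per-position dicts pi[i][b], estimated with pseudocounts."""
    k, n = len(sites[0]), len(sites)
    pi = []
    for i in range(k):
        counts = Counter(s[i] for s in sites)
        denom = n + pseudocount * len(BASES)
        pi.append({b: (counts[b] + pseudocount) / denom for b in BASES})
    return pi

pi = estimate_motif(sites)
print(pi[0])   # A: 1/12, C: 4/12, G: 3/12, T: 4/12
print(pi[13])  # position 14: T has probability 1/12
```
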
Slide 10: Prediction of Binding Sites
We predict a binding site if and only if:
S(x) = ∑_{i=1}^{k} s_{i,x_i} ≥ T
where T is chosen to achieve the desired tradeoff between sensitivity and specificity.
Sensitivity is the fraction of true sites that are predicted (1 − false negative rate).
Specificity is the fraction of false sites that are not predicted (1 − false positive rate).

Slide 11: [Figure: distributions of S(x) under the background (null) model and the binding-site (alternative) model, with threshold T; true sites scoring below T are false negatives, background sequences scoring above T are false positives]

Slide 12: Sorting Out Terms

Prediction \ True condition    True    False
Positive                       TP      FP      (PPV)
Negative                       FN      TN      (NPV)

Sens = TP / (TP + FN)
Spec = TN / (FP + TN)
FP rate = type I error = α = FP / (FP + TN) = 1 − Spec
FN rate = type II error = β = FN / (TP + FN) = 1 − Sens
Power = 1 − β (for bounded α)
A p-value is an estimate of α for a given observation.

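These definitions translate directly into code; a small sketch with made-up counts, just to exercise the formulas:

```python
def confusion_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and error rates from a 2x2 confusion table."""
    sens = tp / (tp + fn)    # true positive rate = 1 - beta
    spec = tn / (fp + tn)    # true negative rate
    alpha = fp / (fp + tn)   # type I error rate = 1 - spec
    beta = fn / (tp + fn)    # type II error rate = 1 - sens
    return {"sens": sens, "spec": spec, "alpha": alpha, "beta": beta, "power": 1 - beta}

# Hypothetical counts for illustration only
print(confusion_metrics(tp=90, fp=40, fn=10, tn=960))
```
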
Slide 13: How to Choose T?
- If known positive and negative examples are available, we can estimate sensitivity and specificity directly and adjust T accordingly.
- We can control the false positive rate only, using a reasonable proxy for the background (sometimes permuted data).
- We can generate synthetic data from the background model and use it to simulate from the null distribution of S(x).
- We can compute the exact null distribution by dynamic programming in some cases.

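A sketch of the simulation option (not from the slides): draw k-mers from the background model, score them, and take T as a high quantile of the resulting null scores. The motif, uniform background, and target false positive rate are all made-up choices here:

```python
import math, random

random.seed(0)
BASES = "ACGT"

# Toy motif (width 3) and uniform background; in practice pi would come from
# the pseudocount estimate shown earlier.
pi = [
    {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
background = {b: 0.25 for b in BASES}

def score(x):
    return sum(math.log2(pi[i][b]) - math.log2(background[b]) for i, b in enumerate(x))

def choose_threshold(target_fpr=0.01, n_sim=100_000):
    """Simulate k-mers from the background model and return the score cutoff T
    whose empirical false positive rate is at most target_fpr."""
    k = len(pi)
    weights = [background[b] for b in BASES]
    null_scores = sorted(score("".join(random.choices(BASES, weights=weights, k=k)))
                         for _ in range(n_sim))
    # T is the (1 - target_fpr) quantile of the null score distribution
    return null_scores[min(n_sim - 1, math.ceil((1 - target_fpr) * n_sim))]

print(choose_threshold())
```
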
Slide 14: Computing p-values
Similar methods can be used to compute p-values for predicted motifs.
First characterize the null distribution of log-odds scores, f(S(x)), empirically or analytically.
Now assign a p-value to a prediction by computing:
p = ∑_{y ≥ S(x)} f(y)
Must be corrected for multiple testing.

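The exact-null-distribution option mentioned on the previous slide can be sketched by dynamic programming over motif positions. This is an illustrative sketch, not the slides' code: it reuses the toy motif above and discretizes scores to integer units (the SCALE constant is an implementation choice) so the DP table stays small:

```python
import math
from collections import defaultdict

BASES = "ACGT"
pi = [
    {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
background = {b: 0.25 for b in BASES}
SCALE = 1000  # scores rounded to integer units of 1/SCALE bits

# Integer-scaled PWM entries s_{i,a}
s_int = [{b: round(SCALE * (math.log2(pi[i][b]) - math.log2(background[b])))
          for b in BASES} for i in range(len(pi))]

def null_score_distribution():
    """Exact distribution of the (integer-scaled) score under the iid background,
    computed by dynamic programming over motif positions."""
    dist = {0: 1.0}
    for col in s_int:
        nxt = defaultdict(float)
        for s, p in dist.items():
            for b in BASES:
                nxt[s + col[b]] += p * background[b]
        dist = nxt
    return dist

def pvalue(x):
    """P(S >= S(x)) under the background model."""
    obs = sum(s_int[i][b] for i, b in enumerate(x))
    dist = null_score_distribution()
    return sum(p for s, p in dist.items() if s >= obs)

print(pvalue("CAT"))  # 2 of the 64 equally likely 3-mers score this high -> 0.03125
```
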
Slide 15: Improving the Background Model
- Bases are not independent: CpG dinucleotides, poly-A runs, simple sequence repeats, transposons, etc.
- In some cases, nonindependence will inflate false positive rates.
- A better background model is needed; typically, higher-order Markov models are used.

Slide 16: Markov Models
We are interested in the joint distribution of X_1, ..., X_k and for convenience have assumed independence:
P(X_1, ..., X_k) = P(X_1) ··· P(X_k)
It may be slightly less egregious to assume:
P(X_1, ..., X_k) = P(X_1) P(X_2 | X_1) P(X_3 | X_2) ··· P(X_k | X_{k−1})
This is a 1st-order Markov model. In an Nth-order model, each X_i depends on X_{i−N}, ..., X_{i−1}.

Slide 17: Markov Scores
Now the background model consists of conditional probabilities of each base given the preceding base, θ = (θ_{A|A}, θ_{A|C}, ..., θ_{T|G}, θ_{T|T}), where θ_{x_1|x_0} is specially defined to denote the marginal probability of x_1.
The log-odds scores are:
log [ L(x | π) / L(x | θ) ] = ∑_{i=1}^{k} (log π_{i,x_i} − log θ_{x_i|x_{i−1}}) = ∑_{i=1}^{k} s_{i,x_i|x_{i−1}}
where s_{i,a|b} = log π_{i,a} − log θ_{a|b}

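A sketch of this idea (not from the slides): fit a first-order Markov background from a background sequence and score a candidate site against it. The training sequence, pseudocounts, and toy motif are made up for illustration:

```python
import math
from collections import Counter

BASES = "ACGT"

def fit_markov1(seq, pseudocount=1.0):
    """First-order Markov background: marginal base probabilities and
    P(next base | previous base), estimated from a background sequence."""
    marg = Counter(seq)
    trans = Counter(zip(seq, seq[1:]))
    theta0 = {b: (marg[b] + pseudocount) / (len(seq) + 4 * pseudocount) for b in BASES}
    theta = {prev: {b: (trans[(prev, b)] + pseudocount) /
                       (sum(trans[(prev, c)] for c in BASES) + 4 * pseudocount)
                    for b in BASES}
             for prev in BASES}
    return theta0, theta

def markov_log_odds(x, pi, theta0, theta):
    """sum_i [log pi_{i,x_i} - log theta_{x_i | x_{i-1}}]; the first position
    uses the marginal probability theta0."""
    score = 0.0
    for i, b in enumerate(x):
        bg = theta0[b] if i == 0 else theta[x[i - 1]][b]
        score += math.log2(pi[i][b]) - math.log2(bg)
    return score

# Toy usage: short made-up background sequence, toy 3-position motif
pi = [
    {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
theta0, theta = fit_markov1("ACGTACGGTCAGCTAGCTAGGCTAACGTTGCA")
print(markov_log_odds("CAT", pi, theta0, theta))
```
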
Slide 18: Effect of Better Background Model
[Figure: background (null) and binding-site (alternative) distributions of S(x), with threshold T and the resulting false negatives and false positives, shown side by side for the first model and for the better background model]

Slide 19: An Aside on Information Theory
- Invented by Claude Shannon in the late 1940s, at the dawn of the digital age
- Motivated by problems in information transmission, especially data compression
- Has deep connections with probability theory, computer science, statistical mechanics, gambling and investment, etc.
- You benefit from it every time you gzip a file or look at a JPEG image!

Slide 20: Entropy
The entropy of a (discrete) rv X is:
H(X) = −∑_x p(x) log p(x) = E[ log (1 / p(x)) ]
Interpretations of H(X):
- Minimum average length of a binary encoding of X
- Average information gained by observing X
- Minimum average number of yes/no questions to find out X
- Minimum average number of fair coin flips required to generate X

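A one-function sketch of the definition (not from the slides), in bits:

```python
import math

def entropy(p):
    """H(X) = -sum_x p(x) log2 p(x); terms with p(x) = 0 contribute 0."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

print(entropy({"H": 0.5, "T": 0.5}))                           # fair coin: 1.0 bit
print(entropy({"H": 0.2, "T": 0.8}))                           # biased coin: ~0.722 bits
print(entropy({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}))   # random DNA: 2 bits
```
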
Slide 21: Encoding Example
Suppose we want to encode n coin tosses as a binary sequence. If the coin is fair, we can do no better than to use one bit for each coin toss, e.g., for TTHTHHHT; it will always take n bits to encode the sequence.
Suppose, however, that the coin is weighted, with θ = P(heads) = 0.2. Can we do better? It turns out we can (for large enough n), by encoding subsequences and giving shorter codes to more probable subsequences.

Slide 22: Encoding Example, cont.
Encode the tosses in blocks of three, with θ = 0.2 (one optimal prefix code, summarized by its codeword lengths):

X     P(X)     Codeword length
TTT   0.512    1
TTH   0.128    3
THT   0.128    3
HTT   0.128    3
THH   0.032    5
HTH   0.032    5
HHT   0.032    5
HHH   0.008    5

Expected length: 0.512(1) + 3(0.128)(3) + 3(0.032)(5) + 0.008(5) = 2.184 bits per block.
Therefore, 2.184/3 = 0.728 bits/coin are needed. For the naive code: 1 bit/coin. Entropy: H(X) ≈ 0.722 bits/coin.

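A short sketch (not from the slides) that builds a Huffman code over the 3-toss blocks and checks the 2.184-bit expected length; the block probabilities follow directly from θ = 0.2:

```python
import heapq
from itertools import product

theta = 0.2  # probability of heads

# Probability of each 3-toss block under the biased coin
blocks = {}
for t in product("HT", repeat=3):
    s = "".join(t)
    blocks[s] = theta ** s.count("H") * (1 - theta) ** s.count("T")

def huffman_lengths(probs):
    """Codeword length of each symbol in an optimal (Huffman) prefix code."""
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    counter = len(heap)  # tie-breaker so tuples never compare lists
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1          # every merge adds one bit to these symbols
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

lengths = huffman_lengths(blocks)
expected = sum(blocks[s] * lengths[s] for s in blocks)
print(expected, expected / 3)  # ~2.184 bits per block, ~0.728 bits per toss
```
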
Slide 23: Entropy for a Bernoulli rv with Parameter p
[Figure: the binary entropy function H(X) as a function of p]
H(X) is always concave and nonnegative.

Slide 24: Perfect Code
Suppose X has distribution:
p(x) = 1/2 for x = a, 1/4 for x = b, 1/8 for x = c, 1/8 for x = d
An optimal binary encoding is:
a → 0, b → 10, c → 110, d → 111
Expected length = H(X) = 1.75 bits. Naive encoding: 2 bits.

Slide 25: Entropy and Information
- Before an event X, your uncertainty about it is measured by H(X).
- Therefore, when you observe X, your average gain in information is measured by H(X).
- However, you may not observe X directly; after observing a noisy message Y, there may still be uncertainty about X.
- We can measure the (average) information content of Y as H_before(X) − H_after(X).

Slide 26: Relative Entropy
The relative entropy of a distribution p with respect to a distribution q is:
D(p ‖ q) = ∑_x p(x) log [ p(x) / q(x) ]
It represents the average additional bits needed to encode X if it comes from p but the code was optimized for q:
D(p ‖ q) = H_pq(X) − H_pp(X) = −∑_x p(x) log q(x) + ∑_x p(x) log p(x) = ∑_x p(x) log [ p(x) / q(x) ]
Useful as a measure of divergence between distributions.

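A direct translation of the definition (not from the slides), with an invented motif-position example:

```python
import math

def relative_entropy(p, q):
    """D(p || q) = sum_x p(x) log2 [p(x)/q(x)], in bits.
    Assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

p = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}      # e.g., one motif position
q = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # random DNA background
print(relative_entropy(p, q))  # extra bits/symbol if we coded for q but data follow p
```
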
Slide 27: Mutual Information
The mutual information of rv's X and Y is:
I(X; Y) = ∑_x ∑_y p(x, y) log [ p(x, y) / (p(x) p(y)) ]
I(X;Y) is the relative entropy of P(X,Y) with respect to P(X)P(Y). It represents the reduction in uncertainty about X due to knowledge of Y.
Mutual information can be thought of as a test statistic for independence (connected with the χ² test and the G test).

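The same formula as code (not from the slides), computed from a joint distribution given as a dict; the two example joints are made up to show the extreme cases:

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum_{x,y} p(x,y) log2 [p(x,y) / (p(x) p(y))] from a dict {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Perfectly correlated bits: knowing Y removes all uncertainty about X (1 bit)
print(mutual_information({(0, 0): 0.5, (1, 1): 0.5}))
# Independent bits: zero mutual information
print(mutual_information({(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}))
```
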
Slide 28: Likelihood Connection
Suppose we have n iid random variables X_1, ..., X_n. What is the expected log likelihood?
∑_{i=1}^{n} ∑_x p(x) log p(x) = −n H(X)
Similarly, what is the expected log-odds score of model 1 wrt model 2, if the variables are drawn from model 1?
∑_{i=1}^{n} ∑_x p_1(x) log [ p_1(x) / p_2(x) ] = n D(p_1 ‖ p_2)
If drawn from model 2?
∑_{i=1}^{n} ∑_x p_2(x) log [ p_1(x) / p_2(x) ] = −n D(p_2 ‖ p_1)

Slide 29: Motif Information Content
The entropy of the distribution at each position determines the information content of that position:
IC_i = 2 − H(X_i)
Can be considered the average reduction in uncertainty wrt random DNA. It is also the relative entropy wrt random DNA:
∑_b p(X=b) log [ p(X=b) / (1/4) ] = ∑_b p(X=b) log p(X=b) − ∑_b p(X=b) log (1/4) = 2 − H(X)
Also related to the binding energy and the evolutionary constraint.
Visualized in widely used sequence logos.

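A sketch (not from the slides) computing IC_i for each column of the eight example sites; the pseudocount of 1 mirrors slide 9 and is an assumption, not part of the IC definition:

```python
import math
from collections import Counter

BASES = "ACGT"
sites = [
    "GACCAAATAAGGCA", "GACCAAATAAGGCA", "TGACTATAAAAGGA", "TGACTATAAAAGGA",
    "TGCCAAAAGTGGTC", "CAACTATCTTGGGC", "CAACTATCTTGGGC", "CTCCTTACATGGGC",
]

def information_content(sites, pseudocount=1.0):
    """IC_i = 2 - H(X_i) in bits for each motif position (logo column heights)."""
    n = len(sites)
    ics = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        probs = [(counts[b] + pseudocount) / (n + 4 * pseudocount) for b in BASES]
        h = -sum(p * math.log2(p) for p in probs if p > 0)
        ics.append(2.0 - h)
    return ics

ics = information_content(sites)
print([round(ic, 2) for ic in ics])  # values near 2 bits mark highly constrained positions
```
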
Slide 30: [Figure: (a) source binding sites; (b) consensus sequence; (c) position frequency matrix (PFM); (d) position weight matrix (PWM); (e) site scoring of the sequence TTACATAAGTAGTC, Σ = 5.23, 78% of the maximum score; (f) sequence logo, bits vs. position]

Slide 31: Motif Discovery
Consider the problem of estimating a motif model from N sequences, each believed to have a binding site for some TF.
As before, we assume a motif model of width k with a multinomial distribution at each position l, with parameters θ_l, and an iid multinomial background model θ_bg.
The goal in this case is to learn the parameters of the motif model.
The location of the binding site in each sequence i, denoted z_i, is a latent variable.

Slide 32: Illustration
[Figure: iterate between using the current motif parameters θ_1, θ_2, ..., θ_k to sample or average over binding-site positions, and using those positions to re-estimate θ_1, θ_2, ..., θ_k, starting from an initialization]

Slide 33: EM vs. Gibbs Sampling
- In EM, we average over potential positions; in Gibbs sampling, we sample positions.
- In EM, you estimate parameters that maximize the likelihood (locally).
- In Gibbs, you sample both binding sites and parameters, allowing for uncertainty in both.

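As an illustration of the EM variant (a minimal sketch, not the slides' own algorithm and far simpler than tools like MEME), assuming exactly one site of width k per sequence and an iid uniform background; the planted motif and toy sequences are invented for the example:

```python
import random
from collections import defaultdict

random.seed(1)
BASES = "ACGT"

def em_motif(seqs, k, n_iter=50, pseudocount=0.5, bg=None):
    """EM for de novo motif discovery with one site per sequence (latent start z_i)."""
    bg = bg or {b: 0.25 for b in BASES}
    # Random initialization of the position-specific distributions theta_1 ... theta_k
    theta = [{b: random.random() + 0.5 for b in BASES} for _ in range(k)]
    theta = [{b: v / sum(col.values()) for b, v in col.items()} for col in theta]
    for _ in range(n_iter):
        counts = [defaultdict(lambda: pseudocount) for _ in range(k)]
        for s in seqs:
            # E-step: posterior over the latent start position z for this sequence
            weights = []
            for z in range(len(s) - k + 1):
                w = 1.0
                for i in range(k):
                    w *= theta[i][s[z + i]] / bg[s[z + i]]
                weights.append(w)
            total = sum(weights)
            # M-step accumulation: expected base counts at each motif position
            for z, w in enumerate(weights):
                for i in range(k):
                    counts[i][s[z + i]] += w / total
        theta = [{b: col[b] / sum(col[c] for c in BASES) for b in BASES} for col in counts]
    return theta

def random_seq(n):
    return "".join(random.choice(BASES) for _ in range(n))

# Toy data: the motif TATAAT planted inside otherwise random sequences
seqs = [random_seq(20) + "TATAAT" + random_seq(20) for _ in range(30)]
theta = em_motif(seqs, k=6)
print("".join(max(col, key=col.get) for col in theta))  # often recovers TATAAT (a local optimum)
```

The Gibbs-sampling alternative replaces the E-step's averaging with a random draw of each z_i from the same posterior and re-estimates θ from the sampled sites, which lets the search escape some local optima.
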
Slide 34: That's All