Finding a Needle in a Haystack: Conditions for Reliable Detection in the Presence of Clutter


Bruno Jedynak and Damianos Karakos

October 23, 2006

This research was partially supported by ARO DAAD19/ and general funds from the Center for Imaging Science at The Johns Hopkins University. Bruno Jedynak: USTL and Department of Applied Mathematics, Johns Hopkins University; mail should be addressed to Bruno Jedynak, Clark 302b, Johns Hopkins University, Baltimore, MD. Damianos Karakos: Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD.

Abstract

We study conditions for the detection of an $N$-length iid sequence with unknown pmf $p_1$ among $M$ $N$-length iid sequences with unknown pmf $p_0$. We show how the quantity $M\,2^{-N D(p_1, p_0)}$ determines the asymptotic probability of error.

Keywords: reliable detection, probability of error, Sanov's theorem, Kullback-Leibler distance, phase transition.

1 Introduction

Our motivation for this paper has its origins in Geman et al. (1996), where an algorithm for tracking roads in satellite images was experimentally studied. Below a certain clutter level, the algorithm could track a road accurately; suddenly, with an increased clutter level, tracking would become impossible. This phenomenon was studied theoretically in Yuille et al. (2000, 2001). Using a simplified statistical model, the authors show that, in an appropriate asymptotic setting, the number of false detections is subject to a phase transition.

Our objective in this paper is to generalize these results. First, we demonstrate, in the same setting, that the phase transition phenomenon occurs for the error rate of the maximum likelihood estimator. Second, we consider the situation where the underlying statistical model is unknown; i.e., there is a special object among many others, but it is not known how it is special; it is an outlier, in some sense. We show that the same phase transition phenomenon occurs in this case as well. Moreover, we propose a target detector that has the same asymptotic performance as the maximum likelihood estimator, had the model been known. Simulations illustrate these results.

Let
$$X = \begin{pmatrix} X_1^1 & X_1^2 & \cdots & X_1^N \\ X_2^1 & X_2^2 & \cdots & X_2^N \\ \vdots & \vdots & & \vdots \\ X_{M+1}^1 & X_{M+1}^2 & \cdots & X_{M+1}^N \end{pmatrix} \qquad (1)$$
be an $(M+1) \times N$ matrix made of independent random variables (rvs) taking values in a finite set. We denote by $X_m = (X_m^1, \ldots, X_m^N)$ the rvs in line $m$ and by $X_{-m}$ the ones that are not in line $m$. There is a special line, the target, with index $t$. All the other lines will be called distractors. The rvs $X_t$ are identically distributed with point mass function (pmf) $p_1$. The other ones, $X_{-t}$, are identically distributed with pmf $p_0 \ne p_1$. The goal is to estimate $t$, the target, from a single realization of $X$. If $p_0$ and $p_1$ are close, the target does not differ much from the distractors, a situation akin to finding a needle in a haystack.
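To make the setup concrete, the following sketch (our own illustration, not code from the paper; the function name, the binary alphabet, and the parameter values are assumptions chosen for the example) draws one realization of $X$ with a single target line. The target index is drawn uniformly at random here, which the paper does not require.

```python
import numpy as np

def generate_data(M, N, p0, p1, rng=None):
    """Draw one (M+1) x N matrix X: one uniformly chosen target line is iid p1,
    the remaining M distractor lines are iid p0 (pmfs on {0, ..., K-1})."""
    rng = np.random.default_rng(rng)
    K = len(p0)
    t = int(rng.integers(M + 1))              # index of the target line
    X = rng.choice(K, size=(M + 1, N), p=p0)  # start with every line drawn from p0
    X[t] = rng.choice(K, size=N, p=p1)        # overwrite the target line with p1 samples
    return X, t

# Example: M = 1000 distractors, N = 200 samples per line, binary alphabet.
X, t = generate_data(M=1000, N=200, p0=[0.9, 0.1], p1=[0.8, 0.2], rng=0)
```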

2 Known distributions

Let $x$ be a realization of $X$. Then, the log-likelihood of $x$ is
$$\ell(x) = \sum_{n=1}^{N} \log p_1(x_t^n) + \sum_{m=1,\, m\ne t}^{M+1} \sum_{n=1}^{N} \log p_0(x_m^n) \qquad (2)$$
$$= \sum_{n=1}^{N} \log\frac{p_1(x_t^n)}{p_0(x_t^n)} + \sum_{m=1}^{M+1} \sum_{n=1}^{N} \log p_0(x_m^n). \qquad (3)$$
The maximum likelihood estimator (mle) for $t$ is then
$$\hat t(x) = \arg\max_{1\le m\le M+1} \sum_{n=1}^{N} \log\frac{p_1(x_m^n)}{p_0(x_m^n)}. \qquad (4)$$
We call the reward of line $m$ the quantity
$$\sum_{n=1}^{N} \log\frac{p_1(x_m^n)}{p_0(x_m^n)}. \qquad (5)$$
(Logarithms are base 2 throughout the paper.) The mle entails choosing the line with the largest reward. The quantity of interest is the probability that the mle differs from the target:
$$e(M, N) = \mathbb{P}\big(\hat t(X) \ne t\big), \qquad (6)$$
which is the probability that a distractor gets a reward which is greater than the reward of the target. If $M$ is fixed, letting $N \to \infty$ and using the law of large numbers, we obtain, almost surely,
$$\frac{1}{N}\sum_{n=1}^{N}\log\frac{p_1(X_t^n)}{p_0(X_t^n)} \to D(p_1, p_0) \qquad (7)$$
and
$$\frac{1}{N}\sum_{n=1}^{N}\log\frac{p_1(X_m^n)}{p_0(X_m^n)} \to -D(p_0, p_1) \quad \text{for every } m \ne t. \qquad (8)$$
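Concretely, the mle (4) scores each line by the reward (5) and returns the argmax; a minimal sketch (ours, not the authors' code), assuming a data matrix such as the one produced by the previous snippet and known pmfs $p_0$, $p_1$:

```python
import numpy as np

def mle_target(X, p0, p1):
    """Maximum likelihood estimate of the target line when p0 and p1 are known.
    The reward of line m is sum_n log2(p1(X_m^n) / p0(X_m^n)), as in (4)-(5)."""
    p0 = np.asarray(p0, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    llr = np.log2(p1) - np.log2(p0)   # per-symbol log-likelihood ratio, base 2
    rewards = llr[X].sum(axis=1)      # one cumulative reward per line
    return int(np.argmax(rewards)), rewards
```

On a single realization, the returned index equals the true target $t$ with probability $1 - e(M, N)$.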

Here,
$$D(p, q) = \sum_x p(x)\,\log\frac{p(x)}{q(x)} \qquad (9)$$
is the Kullback-Leibler distance between $p$ and $q$. Hence, as long as $p \ne q$, $D(p, q) > 0$, and the reward of the target converges to a positive value while the reward of each distractor converges to a negative value, which allows us to show that the error of the mle goes to zero. One can even bound $e(M, N)$ from above, for any fixed $M$ and $N$, as follows.

Theorem 1
$$e(M, N) \le M\left(\sum_x \sqrt{p_0(x)\,p_1(x)}\right)^{2N}. \qquad (10)$$

Note that
$$\sum_x \sqrt{p_0(x)\,p_1(x)} = 1 - \tfrac{1}{2}\,\mathrm{Hellinger}^2(p_0, p_1), \qquad (11)$$
where $\mathrm{Hellinger}(p_0, p_1) = \big(\sum_x (\sqrt{p_0(x)} - \sqrt{p_1(x)})^2\big)^{1/2}$ is the Hellinger distance between $p_0$ and $p_1$. The proof, using classical large deviations techniques, is in Section 6.

Note that if the right-hand side of (10) goes to 0 as $M \to \infty$ and $N \to \infty$, the probability that the mle differs from the target goes to 0. This condition, however, is not necessary. As we show below, there is a maximum rate at which $M$ can go to infinity in order for the probability of error to go to zero; if $M$ increases faster, then the probability of error goes to one. A similar result, namely that the number of distractors for which the reward is larger than the reward of the target follows a phase transition, was also shown by Yuille et al. We present below the same analysis for the convergence of the mle. The phase transition, or, in other words, the dependence of the probability of error on the rate at which $M$ goes to infinity, is expressed in the following theorem.

Theorem 2 If there exists $\varepsilon > 0$ such that $\lim_{M,N\to\infty} M\,2^{-N(D(p_1,p_0)-\varepsilon)} = 0$, then
$$\lim_{M,N\to\infty} e(M, N) = 0, \qquad (12)$$
and if there exists $\varepsilon > 0$ such that $\lim_{M,N\to\infty} M\,2^{-N(D(p_1,p_0)+\varepsilon)} = +\infty$, then
$$\lim_{M,N\to\infty} e(M, N) = 1. \qquad (13)$$

The intuition is as follows. First, as $N$ goes to infinity, if $M$ remains fixed, the probability of error goes to zero exponentially fast, following a large deviation phenomenon, since the reward of the target line converges to a positive value while the reward of the distractors converges to a negative value, as was mentioned earlier. On the other hand, as the number $M$ of distractors increases, when $N$ remains fixed, the probability that there exists a distractor with a reward larger than the reward of the target increases as well. These are two competing phenomena, whose interaction gives rise to the critical rate $D(p_1, p_0)$. The detailed proof appears in Section 6.

Note: In order for the limits of functions of $(M, N)$ to be well-defined as $M, N \to \infty$, we assume that $M$ is, in general, a function of $N$. Hence, all limits $\lim_{M,N\to\infty}$ should be interpreted as $\lim_{N\to\infty}$, with the proviso that $M$ increases according to some function of $N$. We kept the notation $\lim_{M,N\to\infty}$ for simplicity.

3 Unknown Distributions

We now look at the case where $p_0$ and $p_1$ are unknown. It is clear that the error rate of any estimator in this context cannot be lower than the error rate of the mle with known $p_0$ and $p_1$. Hence, (13) holds even when $e(M, N)$ is the error rate of any estimator. Can one build an estimator of the target whose error rate satisfies (12)? The answer is yes, as we shall see now. A simple way of building an estimator of the target when $p_0$ and $p_1$ are unknown is to plug in estimators of $p_0$ and $p_1$ into the mle estimator (4). Hence, let us define

$$\tilde t(x) = \arg\max_{1\le m\le M+1} \sum_{n=1}^{N} \log\frac{\hat p_m(x_m^n)}{\hat p_{-m}(x_m^n)}, \qquad (14)$$
where $\hat p_m$ and $\hat p_{-m}$ are the empirical distributions of the rvs in line $m$ and in all the other lines, respectively. That is,
$$\hat p_m(x) = \frac{1}{N}\sum_{n=1}^{N} \mathbf{1}\{X_m^n = x\} \qquad (15)$$
and
$$\hat p_{-m}(x) = \frac{1}{MN}\sum_{n=1}^{N}\sum_{j=1,\, j\ne m}^{M+1} \mathbf{1}\{X_j^n = x\}. \qquad (16)$$
Note that
$$\tilde t(x) = \arg\max_{1\le m\le M+1} D(\hat p_m, \hat p_{-m}). \qquad (17)$$
Hence, $\tilde t$ is the line that differs the most, in the Kullback-Leibler sense, from the average distribution of the other lines. The reader may be more familiar with the variant
$$\dot t(x) = \arg\max_{1\le m\le M+1} D(\hat p_m, \hat p), \qquad (18)$$
where $\hat p$ is the empirical distribution over all rvs, including line $m$; both $\dot t$ and $\tilde t$ are similar in the sense that they pick the sequence which differs the most from the rest. It turns out that the error rate of $\tilde t$, that is,
$$\tilde e(M, N) = \mathbb{P}\big(\tilde t(X) \ne t\big), \qquad (19)$$
where, as before, $t$ denotes the target, has the same asymptotic behavior as the mle (4) in the case of known distributions.

Theorem 3 If there exists $\varepsilon > 0$ such that $\lim_{M,N\to\infty} M\,2^{-N(D(p_1,p_0)-\varepsilon)} = 0$, then
$$\lim_{M,N\to\infty} \tilde e(M, N) = 0, \qquad (20)$$
and if there exists $\varepsilon > 0$ such that $\lim_{M,N\to\infty} M\,2^{-N(D(p_1,p_0)+\varepsilon)} = +\infty$, then
$$\lim_{M,N\to\infty} \tilde e(M, N) = 1. \qquad (21)$$
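The plug-in detector only needs the empirical distributions (15)-(16); a minimal sketch (ours, not the authors' code), which adds a small constant inside the logarithms to guard against empty bins, a practical detail the paper does not need to address:

```python
import numpy as np

def plugin_target(X, K, eps=1e-12):
    """Estimate the target as the line maximizing D(p_hat_m, p_hat_{-m}),
    i.e., the detector (17); logarithms are base 2, eps guards against log(0)."""
    Mp1, N = X.shape
    counts = np.stack([np.bincount(X[m], minlength=K) for m in range(Mp1)])
    p_hat = counts / N                               # \hat p_m of (15)
    totals = counts.sum(axis=0)                      # symbol counts over all lines
    p_hat_rest = (totals - counts) / ((Mp1 - 1) * N) # \hat p_{-m} of (16)
    kl = np.sum(p_hat * (np.log2(p_hat + eps) - np.log2(p_hat_rest + eps)), axis=1)
    return int(np.argmax(kl))
```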

Figure 1: Estimates of the probability of error for various $N$, for the case $p_0 = (0.9, 0.1)$, $p_1 = (0.8, 0.2)$, $M = 1000$. The two plots correspond to the cases of known and unknown distributions, respectively. The red line represents the upper bound established by Theorem 1.

The proof uses the same large deviations techniques as the proof of Theorem 2 but is slightly more complex, due to the fact that the rewards are not independent anymore. The proof appears in Section 6.

4 Simulations

We now provide simulations that show Theorems 1, 2 and 3 in action. We generated $M = 1000$ binary sequences with probabilities $p_0 = (0.9, 0.1)$ and $p_1 = (0.8, 0.2)$ for the background and the target, respectively. We varied the number $N$ from 0 to 500, and we observed the probability of error decreasing to zero. We performed the random experiment 100 times for each value of $N$. The procedure was replicated 20 times in order to compute error bars.
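A scaled-down version of this experiment can be run with the sketch below (our own code; it reuses the generate_data, mle_target, and plugin_target functions from the earlier sketches and uses far fewer repetitions than the paper, so the estimates are rougher):

```python
import numpy as np

def error_estimate(M, N, p0, p1, reps=20, seed=0):
    """Monte Carlo estimates of the error probabilities of the two detectors."""
    rng = np.random.default_rng(seed)
    err_known = err_unknown = 0
    for _ in range(reps):
        X, t = generate_data(M, N, p0, p1, rng)
        err_known += (mle_target(X, p0, p1)[0] != t)
        err_unknown += (plugin_target(X, K=len(p0)) != t)
    return err_known / reps, err_unknown / reps

for N in (50, 100, 200, 400):
    print(N, error_estimate(1000, N, [0.9, 0.1], [0.8, 0.2]))
```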

Figure 2: Estimates of the probability of error for various $N$, for the case $p_0 = (0.9, 0.1)$, $p_1 = (0.7, 0.3)$, $M = 1000$. The two plots correspond to the cases of known and unknown distributions, respectively. The red line represents the upper bound established by Theorem 1.

The plots in Figure 1 show the estimated probability of error versus $N$, for the two maximum likelihood detectors (known and unknown distributions, respectively), along with standard deviation error bars. As expected, the error for the case of unknown distributions is somewhat higher, as there is an additional error due to the inaccuracy in estimating the two distributions. The KL divergence is $D(p_1, p_0) \approx 0.064$. The dashed line shows the phase transition boundary, i.e., the value of $N$ such that $M = 2^{N D(p_1, p_0)}$; for $M = 1000$, this value is approximately $N = 156$. For the known-distributions plot, the red line corresponds to the upper bound established by Theorem 1.

Similar plots for the case $p_0 = (0.9, 0.1)$ and $p_1 = (0.7, 0.3)$ are shown in Figure 2. As expected, the error curves of Figure 1 are higher than the ones in Figure 2, since the former detection case is harder than the latter. The phase transition boundary is depicted in Figure 2 with the dashed line, at approximately $N = 45$. The upper bound of Theorem 1 is again shown as the red line.
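The quantities quoted above follow directly from the stated distributions; the small sketch below (ours) recomputes the KL divergence, the phase-transition boundary, i.e., the $N$ solving $M = 2^{N D(p_1, p_0)}$, and the coefficient $\sum_x \sqrt{p_0(x) p_1(x)}$ that drives the Theorem 1 bound, all with base-2 logarithms as in the paper.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler distance D(p, q) in bits."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log2(p) - np.log2(q))))

M, p0 = 1000, [0.9, 0.1]
for p1 in ([0.8, 0.2], [0.7, 0.3]):
    D = kl(p1, p0)
    N_star = np.log2(M) / D                                       # N solving M = 2^{N D(p1, p0)}
    bc = float(np.sum(np.sqrt(np.asarray(p0) * np.asarray(p1))))  # Bhattacharyya coefficient
    print(f"p1={p1}: D={D:.4f}, boundary N={N_star:.1f}, sum_x sqrt(p0 p1)={bc:.4f}")
# This gives D ~ 0.064 and a boundary near N = 156 for p1 = (0.8, 0.2),
# and D ~ 0.222 with a boundary near N = 45 for p1 = (0.7, 0.3).
```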

5 Conclusions

We have considered a statistical model with $M + 1$ sequences of independent random variables, each of length $N$. All random variables have the same point mass function $p_0$, except for one sequence, the target, for which the common point mass function is $p_1$. The error of the maximum likelihood estimator for the target converges to 0 if there exists an $\varepsilon > 0$ such that $M\,2^{-N(D(p_1,p_0)-\varepsilon)} \to 0$, and it converges to 1 if there exists an $\varepsilon > 0$ such that $M\,2^{-N(D(p_1,p_0)+\varepsilon)} \to +\infty$. Moreover, when $p_0$ and $p_1$ are unknown, we are able to build an estimator of the target with the same performance; this allows us to study the important practical problem of outlier detection. We conjecture that these results can be generalized to the case of ergodic Markov chains, and we plan to report the more general results in a subsequent publication.

6 Proofs

Without loss of generality, we assume that the target line is line number 1.

Proof of Theorem 1:
$$e(M, N) = \mathbb{P}\left(\max_{2\le m\le M+1}\sum_{n=1}^{N}\log\frac{p_1(X_m^n)}{p_0(X_m^n)} > \sum_{n=1}^{N}\log\frac{p_1(X_1^n)}{p_0(X_1^n)}\right) \qquad (22)$$
$$\le M\,\mathbb{P}\left(\sum_{n=1}^{N}\log\frac{p_1(X_2^n)}{p_0(X_2^n)} > \sum_{n=1}^{N}\log\frac{p_1(X_1^n)}{p_0(X_1^n)}\right) \qquad (23)$$
$$= M\,\mathbb{P}\left(\prod_{n=1}^{N}\left(\frac{p_1(X_2^n)\,p_0(X_1^n)}{p_0(X_2^n)\,p_1(X_1^n)}\right)^{s} > 1\right), \quad\text{for all } s>0 \qquad (24)$$
$$\le M\,\mathbb{E}\left[\prod_{n=1}^{N}\left(\frac{p_1(X_2^n)\,p_0(X_1^n)}{p_0(X_2^n)\,p_1(X_1^n)}\right)^{s}\right] \qquad (25)$$
$$= M\left(\mathbb{E}\left[\left(\frac{p_1(X_2^1)\,p_0(X_1^1)}{p_0(X_2^1)\,p_1(X_1^1)}\right)^{s}\right]\right)^{N}, \qquad (26)$$
where (25) is due to the Markov inequality.

Let us define
$$f(s) \equiv \mathbb{E}\left[\left(\frac{p_1(X_2^1)\,p_0(X_1^1)}{p_0(X_2^1)\,p_1(X_1^1)}\right)^{s}\right] \quad\text{and}\quad g(s) \equiv \ln f(s). \qquad (27)$$
One can check that $f'(1/2) = g'(1/2) = 0$. Moreover, using Hölder's inequality (Grimmett et al., 1992), it is easy to show that, for any $s, t > 0$ and $0 \le \alpha \le 1$,
$$\mathbb{E}\left[\left(\frac{p_1(X_2^1)\,p_0(X_1^1)}{p_0(X_2^1)\,p_1(X_1^1)}\right)^{\alpha s + (1-\alpha)t}\right] \le \left(\mathbb{E}\left[\left(\frac{p_1(X_2^1)\,p_0(X_1^1)}{p_0(X_2^1)\,p_1(X_1^1)}\right)^{s}\right]\right)^{\alpha} \left(\mathbb{E}\left[\left(\frac{p_1(X_2^1)\,p_0(X_1^1)}{p_0(X_2^1)\,p_1(X_1^1)}\right)^{t}\right]\right)^{1-\alpha}. \qquad (28)$$
By taking the log on both sides, we deduce that $g$ is a convex function of $s$. Hence, it achieves its minimum value at $s = 1/2$; therefore, $f$ achieves its minimum value at $s = 1/2$. This leads to the tightest upper bound in (26), i.e.,
$$e(M, N) \le M\,f(1/2)^{N} = M\left(\sum_x \sqrt{p_0(x)\,p_1(x)}\right)^{2N}. \qquad (29)$$

In order to prove Theorems 2 and 3, we start with two technical lemmas that will be useful later on.

Lemma 1 Let $U$ and $R$ be two rvs, and $y \in \mathbb{R}$. Then, for any $\varepsilon > 0$,
$$\mathbb{P}(U > y + \varepsilon) - \mathbb{P}(R < -\varepsilon) \le \mathbb{P}(U + R > y) \le \mathbb{P}(U > y - \varepsilon) + \mathbb{P}(R > \varepsilon). \qquad (30)$$

Proof of Lemma 1:
$$\mathbb{P}(U + R > y) = \mathbb{P}(U + R > y,\, R \le \varepsilon) + \mathbb{P}(U + R > y,\, R > \varepsilon) \qquad (31)$$
$$\le \mathbb{P}(U > y - \varepsilon) + \mathbb{P}(R > \varepsilon), \qquad (32)$$

and
$$\mathbb{P}(U + R \le y) = \mathbb{P}(U + R \le y,\, R < -\varepsilon) + \mathbb{P}(U + R \le y,\, R \ge -\varepsilon) \qquad (33)$$
$$\le \mathbb{P}(U \le y + \varepsilon) + \mathbb{P}(R < -\varepsilon), \qquad (34)$$
which allows us to obtain the lower bound by computing the complementary event.

Lemma 2 Let $V_1^N, \ldots, V_M^N$ be a sequence of $M$ independent, identically distributed, discrete random variables. Moreover, assume that the following large deviation property holds for some $z \in \mathbb{R}$:
$$\mathbb{P}(V_1^N > z) \doteq 2^{-N I(z)}, \qquad (35)$$
where $I(z) > 0$ and $a_N \doteq b_N$ means $\lim_{N\to+\infty} \frac{1}{N}\log\frac{a_N}{b_N} = 0$. Then, if there exists $\varepsilon > 0$ s.t. $\lim_{M,N\to+\infty} M\,2^{-N(I(z)-\varepsilon)} = 0$, then
$$\lim_{M,N\to+\infty} \mathbb{P}\Big(\max_{1\le m\le M} V_m^N > z\Big) = 0. \qquad (36)$$
Also, if there exists $\varepsilon > 0$ s.t. $\lim_{M,N\to+\infty} M\,2^{-N(I(z)+\varepsilon)} = +\infty$, then
$$\lim_{M,N\to+\infty} \mathbb{P}\Big(\max_{1\le m\le M} V_m^N > z\Big) = 1. \qquad (37)$$

Proof of Lemma 2: Let $\varepsilon_1 > 0$ be arbitrarily small. Then, there exists $N_0(\varepsilon_1) > 0$ such that, for all $N > N_0$,
$$\Big|\frac{1}{N}\log \mathbb{P}(V_1^N > z) + I(z)\Big| < \varepsilon_1. \qquad (38)$$
To prove the first part, we start with the following claim (the negation of the conclusion in (36)): there exists $\varepsilon_2 > 0$ such that, for every $N_1 > 0$, there is some $N > N_1$ with $\mathbb{P}\big(\max_{1\le m\le M} V_m^N > z\big) > \varepsilon_2$.

Then, using the union bound, we obtain that, for every $N_1 > 0$, there is some $N > N_1$ with
$$\sum_{m=1}^{M} \mathbb{P}(V_m^N > z) = M\,\mathbb{P}(V_1^N > z) > \varepsilon_2. \qquad (39)$$
By picking $N_1 > N_0(\varepsilon_1)$, (39) becomes: for every $N_1 > N_0(\varepsilon_1)$, there is some $N > N_1$ with $M\,2^{-N(I(z)-\varepsilon_1)} > \varepsilon_2$. Hence, $M\,2^{-N(I(z)-\varepsilon_1)}$ does not converge to zero for any $\varepsilon_1 > 0$, as required.

To prove the second part, we first assume that $N > N_0(\varepsilon_1)$, as above. Then,
$$\mathbb{P}\Big(\max_{1\le m\le M} V_m^N > z\Big) = 1 - \mathbb{P}\Big(\max_{1\le m\le M} V_m^N \le z\Big) \qquad (40)$$
$$= 1 - \big(\mathbb{P}(V_1^N \le z)\big)^{M} \qquad (41)$$
$$= 1 - 2^{\,M\log\big(1 - \mathbb{P}(V_1^N > z)\big)} \qquad (42)$$
$$\ge 1 - 2^{-M\,\mathbb{P}(V_1^N > z)} \qquad (43)$$
$$\ge 1 - 2^{-M\,2^{-N(I(z)+\varepsilon_1)}}, \qquad (44)$$
where the first inequality is a consequence of the inequality $\log(1-x) \le -x$, and the second inequality arises from (38). Note that (44) is true for any arbitrary $\varepsilon_1 > 0$. Hence, if there exists $\varepsilon > 0$ such that $M\,2^{-N(I(z)+\varepsilon)} \to +\infty$, then necessarily $\mathbb{P}(\max_{1\le m\le M} V_m^N > z) \to 1$. This concludes the proof of the second part, and the proof of the lemma.

We are now ready for the proof of Theorem 2.

Proof of Theorem 2:
$$e(M, N) = \mathbb{P}\left(\max_{2\le m\le M+1} \frac{1}{N}\sum_{n=1}^{N}\log\frac{p_1(X_m^n)}{p_0(X_m^n)} > \frac{1}{N}\sum_{n=1}^{N}\log\frac{p_1(X_1^n)}{p_0(X_1^n)}\right) \qquad (45)$$
$$= \mathbb{P}\big(U_M^N + R^N > D(p_1, p_0)\big), \text{ where} \qquad (46)$$

$$U_M^N = \max_{2\le m\le M+1} \frac{1}{N}\sum_{n=1}^{N}\log\frac{p_1(X_m^n)}{p_0(X_m^n)} \qquad (47)$$
and
$$R^N = D(p_1, p_0) - \frac{1}{N}\sum_{n=1}^{N}\log\frac{p_1(X_1^n)}{p_0(X_1^n)}. \qquad (48)$$
From the law of large numbers, $R^N \to 0$ in probability. Hence, for all $\eta > 0$ and $\alpha > 0$, and for $N$ sufficiently large, using Lemma 1,
$$\mathbb{P}\big(U_M^N > D(p_1, p_0) + \eta\big) - \alpha \;\le\; e(M, N) \;\le\; \mathbb{P}\big(U_M^N > D(p_1, p_0) - \eta\big) + \alpha. \qquad (49)$$
Let us define
$$V_m^N \equiv \frac{1}{N}\sum_{n=1}^{N}\log\frac{p_1(X_m^n)}{p_0(X_m^n)}. \qquad (50)$$
Now, using Sanov's theorem (Dembo et al., 1998),
$$\mathbb{P}\big(V_2^N \ge D(p_1, p_0)\big) \doteq 2^{-N D(p_1, p_0)}. \qquad (51)$$
Indeed,
$$\mathbb{P}\big(V_2^N \ge D(p_1, p_0)\big) \doteq 2^{-N D^{*}}, \qquad (52)$$
where $D^{*} = \inf_{p \in C} D(p, p_0)$, with
$$C = \Big\{p;\; E_p \log\frac{p_1}{p_0} \ge D(p_1, p_0)\Big\}, \qquad (53)$$
and, for $p \in C$,
$$D(p, p_0) = E_p \log\frac{p}{p_0} = D(p, p_1) + E_p \log\frac{p_1}{p_0} \qquad (54)$$
$$\ge D(p, p_1) + D(p_1, p_0) \ge D(p_1, p_0). \qquad (55)$$
Now, by continuity of the rate function, there exists $\varepsilon_1 > 0$ such that
$$\mathbb{P}\big(V_2^N > D(p_1, p_0) - \eta\big) \doteq 2^{-N(D(p_1, p_0) - \varepsilon_1)} \qquad (56)$$

and there exists $\varepsilon_2 > 0$ such that
$$\mathbb{P}\big(V_2^N > D(p_1, p_0) + \eta\big) \doteq 2^{-N(D(p_1, p_0) + \varepsilon_2)}. \qquad (57)$$
Finally, since
$$U_M^N = \max_{2\le m\le M+1} V_m^N \qquad (58)$$
and the rvs $V_2^N, \ldots, V_{M+1}^N$ are iid, we obtain the required result from Lemma 2.

Proof of Theorem 3: We proceed along the same lines as for the proof of Theorem 2.
$$\tilde e(M, N) = \mathbb{P}\left(\max_{2\le m\le M+1} \frac{1}{N}\sum_{n=1}^{N}\log\frac{\hat p_m(X_m^n)}{\hat p_{-m}(X_m^n)} > \frac{1}{N}\sum_{n=1}^{N}\log\frac{\hat p_1(X_1^n)}{\hat p_{-1}(X_1^n)}\right) \qquad (59)$$
$$\le \mathbb{P}\big(U_M^N + R_M^N > D(p_1, p_0)\big), \text{ with} \qquad (60)$$
$$U_M^N = \max_{2\le m\le M+1} \frac{1}{N}\sum_{n=1}^{N}\log\frac{\hat p_m(X_m^n)}{p_0(X_m^n)}, \qquad (61)$$
$$R_M^N = A_M^N + B^N + C^N, \qquad (62)$$
$$A_M^N = \max_{2\le m\le M+1} \frac{1}{N}\sum_{n=1}^{N}\log\frac{p_0(X_m^n)}{\hat p_{-m}(X_m^n)}, \qquad (63)$$
$$B^N = \frac{1}{N}\sum_{n=1}^{N}\log\frac{\hat p_{-1}(X_1^n)}{p_0(X_1^n)}, \qquad (64)$$
$$C^N = D(p_1, p_0) - \frac{1}{N}\sum_{n=1}^{N}\log\frac{\hat p_1(X_1^n)}{p_0(X_1^n)}. \qquad (65)$$
For all $\eta > 0$, from Lemma 1,
$$\tilde e(M, N) \le \mathbb{P}\big(U_M^N > D(p_1, p_0) - \eta\big) + \mathbb{P}\big(R_M^N > \eta\big). \qquad (66)$$
Let
$$V_m^N = \frac{1}{N}\sum_{n=1}^{N}\log\frac{\hat p_m(X_m^n)}{p_0(X_m^n)}, \quad 2\le m\le M+1. \qquad (67)$$

Using Sanov's theorem,
$$\mathbb{P}\big(V_2^N \ge D(p_1, p_0)\big) \doteq 2^{-N D(p_1, p_0)}. \qquad (68)$$
Indeed,
$$\mathbb{P}\big(V_2^N \ge D(p_1, p_0)\big) \doteq 2^{-N D^{*}}, \qquad (69)$$
where $D^{*} = \inf_{p \in C} D(p, p_0)$, with
$$C = \Big\{p;\; E_p \log\frac{p}{p_0} \ge D(p_1, p_0)\Big\}. \qquad (70)$$
And, for $p \in C$,
$$D(p, p_0) = E_p \log\frac{p}{p_0} \ge D(p_1, p_0). \qquad (71)$$
Now, by continuity of the rate function, there exists $\varepsilon_1 > 0$ such that
$$\mathbb{P}\big(V_2^N > D(p_1, p_0) - \eta\big) \doteq 2^{-N(D(p_1, p_0) - \varepsilon_1)}. \qquad (72)$$
To show that (66) approaches zero as $M, N \to \infty$ with $M\,2^{-N(D(p_1,p_0)-\varepsilon)} \to 0$, it suffices to prove that $R_M^N \to 0$, since the term $\mathbb{P}(U_M^N > D(p_1, p_0) - \eta)$ of (66) goes to zero by virtue of (72), Lemma 2, and the fact that
$$U_M^N = \max_{2\le m\le M+1} V_m^N, \qquad (73)$$
and the rvs $V_2^N, \ldots, V_{M+1}^N$ are iid.

Using the law of large numbers, $C^N \to 0$ in probability. Also,
$$\mathbb{P}(A_M^N > \eta) \le M\,\mathbb{P}\left(\frac{1}{N}\sum_{n=1}^{N}\log\frac{p_0(X_2^n)}{\hat p_{-2}(X_2^n)} > \eta\right) \qquad (74)$$
$$\le M\,\mathbb{P}\left(\max_{1\le n\le N}\log\frac{p_0(X_2^n)}{\hat p_{-2}(X_2^n)} > \eta\right) \qquad (75)$$
$$\le MN\,\mathbb{P}\left(\log\frac{p_0(X_2^1)}{\hat p_{-2}(X_2^1)} > \eta\right) \qquad (76)$$

$$\le MN \max_x \mathbb{P}\left(\log\frac{p_0(x)}{\hat p_{-2}(x)} > \eta\right) \qquad (77)$$
$$= MN \max_x \mathbb{P}\big(\hat p_{-2}(x) < 2^{-\eta}\,p_0(x)\big) \qquad (78)$$
$$\le MN \max_x 2^{-MN\,I(x,\eta)} = MN\,2^{-MN\,J(\eta)}, \qquad (79)$$
where $I(x, \eta) > 0$ is a rate function, and $J(\eta) = \min_x I(x, \eta)$. The last inequality comes from the fact that $\hat p_{-2}(x) \to p_0(x)$ in probability. A similar argument shows that $B^N \to 0$ in probability as well.

Note that the result would still hold if we replaced $\hat p_{-m}$ with $\hat p$, i.e., with the empirical distribution over the full data.

References

[1] Amir Dembo and Ofer Zeitouni. Large Deviations Techniques and Applications. Springer Verlag, 1998.

[2] D. Geman and B. Jedynak. An active testing model for tracking roads from satellite images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(1):1-14, January 1996.

[3] G. R. Grimmett and D. R. Stirzaker. Probability and Random Processes. Oxford Science Publications, 1992.

[4] Alan L. Yuille and James M. Coughlan. Fundamental limits of Bayesian inference: Order parameters and phase transitions for road tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(2):160-173, February 2000.

[5] Alan L. Yuille, James M. Coughlan, Yingnian Wu, and Song Chun Zhu. Order parameters for detecting target curves in images: When does high level knowledge help? International Journal of Computer Vision, 41(1/2):9-33, 2001.
