Supplementary Material: A Robust Approach to Sequential Information Theoretic Planning

Sue Zheng*   Jason Pacheco*   John W. Fisher III

(*Equal contribution. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Boston, MA, USA. Correspondence to: Sue Zheng <szheng@csail.mit.edu>. Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the authors.)

1. Proofs of Estimator Properties

1.1. Proof of Prop. 1

Here we show how the bias of the empirical estimator depends on M. Since we use a plug-in estimate \hat{p}(y_i) of the predictive posterior p(y_i | Y) to estimate I, we first determine the mean and variance of the samples \theta_i used for estimating I, which incorporate the plug-in estimate:

    \theta_i = \log \frac{p(y_i | x_i)}{\hat{p}(y_i)}.

Let \{(x_i, y_i)\}_{i=1}^{N} \sim p(x, y | Y) and, for each i, let \{x_{ij}\}_{j=1}^{M} \sim p(x | Y) be independent samples. Going forward we drop the explicit dependence on the observed data Y for clarity, but all expressions are conditioned on it. The empirical estimate of the posterior predictive takes the form \hat{p}(y_i) = \frac{1}{M}\sum_{j} p(y_i | x_{ij}).

From Thm. 3.9 of DasGupta (2008), if we have iid observations X_1, X_2, \ldots, X_M with finite fourth moment, mean \mu, and variance \sigma^2, and a scalar transformation g with four uniformly bounded derivatives, then

    E[g(\bar{X})] = g(\mu) + \frac{\sigma^2}{2M} g''(\mu) + O(M^{-2}),
    var[g(\bar{X})] = \frac{\sigma^2}{M} g'(\mu)^2 + O(M^{-2}),        (1)

where \bar{X} is the empirical mean \bar{X} = \frac{1}{M}\sum_i X_i. In our case, the samples p(y_i | x_{ij}) are iid when conditioned on y_i, with conditional mean p(y_i) and conditional variance

    \sigma^2(y_i) = \int p(y_i | x)^2 p(x)\, dx - p(y_i)^2.

We are interested in the transformation g = \log. Consequently, we have

    E[\log \hat{p}(y_i) \mid y_i] = \log p(y_i) - \frac{\sigma^2(y_i)}{2M\, p(y_i)^2} + O(M^{-2}),
    var[\log \hat{p}(y_i) \mid y_i] = \frac{\sigma^2(y_i)}{M\, p(y_i)^2} + O(M^{-2}).

Using the law of total expectation to remove the conditioning on y_i in the mean, and dropping the superscript i since measurements are sampled iid, we have

    E[\log \hat{p}(y)] = -H(Y) - \frac{1}{2M} E\!\left[\frac{\sigma^2(y)}{p(y)^2}\right] + O(M^{-2}).        (2)

Expanding the second term on the RHS, we obtain

    E\!\left[\frac{\sigma^2(y)}{p(y)^2}\right]
      = \int p(y)\, \frac{\int p(y|x)^2 p(x)\,dx - p(y)^2}{p(y)^2}\, dy
      = \int\!\!\int \frac{p(y|x)^2\, p(x)}{p(y)}\, dx\, dy - 1
      = \chi^2\big(p(y,x) \,\|\, p(y)p(x)\big),

where \chi^2(p(y,x) \| p(y)p(x)) denotes the chi-square divergence of p(y,x) from p(y)p(x). Plugging this back into (2), we obtain

    E[\log \hat{p}(y)] = -H(Y) - \frac{1}{2M}\, \chi^2\big(p(y,x) \,\|\, p(y)p(x)\big) + O(M^{-2}).        (3)

From Eqn. (3) we can now obtain the mean of each sample \theta_i = \log[p(y_i|x_i)/\hat{p}(y_i)] used in the empirical estimate \hat{I} = \frac{1}{N}\sum_i \theta_i, and therefore the mean of the estimate:

    E[\hat{I}] = E[\theta] = E[\log p(y|x)] - E[\log \hat{p}(y)]
              = -H(Y|X) + H(Y) + \frac{1}{2M}\, \chi^2\big(p(y,x) \,\|\, p(y)p(x)\big) + O(M^{-2})
              = I(X;Y) + \frac{1}{2M}\, \chi^2\big(p(y,x) \,\|\, p(y)p(x)\big) + O(M^{-2}).        (4)

We have thus shown how the bias of the empirical estimator depends on M.
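To make the quantities above concrete, the following is a minimal sketch (not part of the paper) of the plug-in estimator \hat{I} for a toy linear-Gaussian model in which I(X;Y) = -0.5 log(1 - rho^2) is known in closed form. The toy model, sample sizes, and all names in the code are illustrative assumptions of the sketch.

# Nested Monte Carlo (plug-in) estimate of I(X;Y) for a toy Gaussian model.
# Illustrative sketch only; the toy model and all names are assumptions.
import numpy as np

def nested_mi_estimate(rho, N, M, rng):
    """I_hat = (1/N) sum_i log[p(y_i|x_i) / p_hat(y_i)],
    with p_hat(y_i) = (1/M) sum_j p(y_i | x_ij) and x_ij ~ p(x)."""
    s2 = 1.0 - rho**2
    # Joint samples (x_i, y_i): y = rho*x + sqrt(1-rho^2)*eps, x, eps ~ N(0,1)
    x = rng.standard_normal(N)
    y = rho * x + np.sqrt(s2) * rng.standard_normal(N)
    # Conditional density p(y_i | x_i) = N(y_i; rho*x_i, 1-rho^2)
    log_p_y_given_x = -0.5 * (y - rho * x)**2 / s2 - 0.5 * np.log(2 * np.pi * s2)
    # Inner samples x_ij ~ p(x), used to form p_hat(y_i) for every i
    x_inner = rng.standard_normal((N, M))
    p_y_given_xin = np.exp(-0.5 * (y[:, None] - rho * x_inner)**2 / s2) / np.sqrt(2 * np.pi * s2)
    log_p_hat = np.log(p_y_given_xin.mean(axis=1))
    theta = log_p_y_given_x - log_p_hat          # per-sample terms theta_i
    return theta.mean()

rng = np.random.default_rng(0)
rho = 0.8
I_true = -0.5 * np.log(1.0 - rho**2)             # closed form for the Gaussian toy
print(I_true, nested_mi_estimate(rho, N=2000, M=200, rng=rng))

With moderate M the estimate already sits close to the true value; the residual gap is the O(1/M) bias quantified in Prop. 1.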

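Continuing the same sketch: for this particular Gaussian toy the chi-square divergence \chi^2(p(y,x) \| p(y)p(x)) works out to rho^2/(1 - rho^2) (a property of the toy model stated here as an assumption of the illustration, not a quantity computed in the paper), so the observed bias can be compared against the predicted leading term \chi^2/(2M):

# Check that the bias of the plug-in estimator decays like chi2/(2M).
# chi2 = rho^2/(1-rho^2) is specific to the Gaussian toy above (an
# assumption of the illustration, not a result from the paper).
chi2 = rho**2 / (1.0 - rho**2)
for M in [10, 40, 160, 640]:
    reps = [nested_mi_estimate(rho, N=4000, M=M, rng=rng) for _ in range(20)]
    bias = np.mean(reps) - I_true
    print(f"M={M:4d}  observed bias={bias:.4f}  predicted chi2/(2M)={chi2/(2*M):.4f}")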
1.2. Proof of Prop. 2

We first show that the empirical plug-in estimator is consistent and asymptotically normal. We will then show that the influence function of the robust estimator converges to the influence function of the empirical estimator, so that the asymptotic analysis of the empirical estimator holds for the robust estimate as well.

We begin by finding expressions for the mean and variance of the iid samples \theta_i used in estimating I. Then, using these expressions, we show that their empirical mean converges to a normal distribution. In Sec. 1.1 we obtained an expression for the variance of the log predictive posterior conditioned on y_i. We can remove the conditioning on y_i through the law of total variance. Again dropping the superscript i for clarity, we have

    var[\log \hat{p}(y)] = E\big[var(\log \hat{p}(y) \mid y)\big] + var\big[E(\log \hat{p}(y) \mid y)\big]        (5)
                         = \frac{1}{M} E\!\left[\frac{\sigma^2(y)}{p(y)^2}\right] + O(M^{-2}) + var\big[E(\log \hat{p}(y) \mid y)\big].

We can simplify the first term on the RHS through the chi-square identity used for Eqn. (3). Keeping track of only the terms of order 1/M and higher, we expand the last term on the RHS as

    var\big[E(\log \hat{p}(y) \mid y)\big]
      = E\!\left[\Big(\log p(y) - \frac{\sigma^2(y)}{2M\, p(y)^2}\Big)^2\right] - \big(E[\log \hat{p}(y)]\big)^2 + O(M^{-2})
      = \int p(y) \log^2 p(y)\,dy - \frac{1}{M}\int \frac{\sigma^2(y)\log p(y)}{p(y)}\,dy
        - \Big(H(Y)^2 + \frac{H(Y)}{M}\chi^2\big(p(y,x)\,\|\,p(y)p(x)\big)\Big) + O(M^{-2})
      = var[\log p(y)] - \frac{1}{M}\Big[\int \frac{\sigma^2(y)\log p(y)}{p(y)}\,dy + H(Y)\,\chi^2\big(p(y,x)\,\|\,p(y)p(x)\big)\Big] + O(M^{-2}).

We plug the above back into the variance expression (5) to get the following dependence on M:

    var[\log \hat{p}(y)] = var[\log p(y)]
        + \frac{1}{M}\Big[\big(1 - H(Y)\big)\,\chi^2\big(p(y,x)\,\|\,p(y)p(x)\big) - \int \frac{\sigma^2(y)\log p(y)}{p(y)}\,dy\Big] + O(M^{-2}).        (6)

We can now find the variance of the samples used in estimating I:

    var(\theta) = var[\log p(y|x)] - 2\,cov\big(\log p(y|x),\, \log \hat{p}(y)\big) + var[\log \hat{p}(y)].        (7)

To simplify our analysis, let us assume that we use independent sets of samples \{(x_i, y_i)\}_{i=1}^{N} and \{(x_i, y_i)\}_{i=N+1}^{2N} for estimating E[\log p(y|x)] and E[\log \hat{p}(y)], respectively. Thus, the covariance term in Eqn. (7) is zero. Substituting the log posterior predictive variance (6) into the sample variance (7) yields the full expression for the variance of the I samples:

    var(\theta) = var[\log p(y|x)] + var[\log p(y)]
        + \frac{1}{M}\Big[\big(1 - H(Y)\big)\,\chi^2\big(p(y,x)\,\|\,p(y)p(x)\big) - \int \frac{\sigma^2(y)\log p(y)}{p(y)}\,dy\Big] + O(M^{-2})
              = var\!\left[\log \frac{p(y|x)}{p(y)}\right]
        + \frac{1}{M}\Big[\big(1 - H(Y)\big)\,\chi^2\big(p(y,x)\,\|\,p(y)p(x)\big) - \int \frac{\sigma^2(y)\log p(y)}{p(y)}\,dy\Big] + O(M^{-2}),        (8)

where in the compact form the numerator and denominator are evaluated on independent draws, consistent with the sample-splitting assumption above. We now have the mean and the variance of the \theta_i, the samples that compose the plug-in empirical estimate of I; a small numerical sketch of these moments is given below. We next show that this estimator admits a CLT.
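The per-sample variance just derived can be probed numerically as well. The continuation below (an illustrative sketch reusing numpy as np, rng, and rho from the sketches above) estimates var(\theta_i) for increasing M; note that it reuses the same pair (x_i, y_i) in the numerator and in the plug-in denominator, so its large-M limit is var[\log p(y|x) - \log p(y)] computed on common draws, rather than the sample-splitting variant used in the proof.

# Continuation of the Gaussian toy sketch: per-sample variance of theta_i.
# The large-M reference value is var[log p(y|x) - log p(y)], estimated by
# plain Monte Carlo from the exact toy densities (illustration only).
def theta_samples(rho, N, M, rng):
    s2 = 1.0 - rho**2
    x = rng.standard_normal(N)
    y = rho * x + np.sqrt(s2) * rng.standard_normal(N)
    log_p_y_given_x = -0.5 * (y - rho * x)**2 / s2 - 0.5 * np.log(2 * np.pi * s2)
    x_in = rng.standard_normal((N, M))
    p_in = np.exp(-0.5 * (y[:, None] - rho * x_in)**2 / s2) / np.sqrt(2 * np.pi * s2)
    return log_p_y_given_x - np.log(p_in.mean(axis=1))

# Large-M reference: var[log p(y|x) - log p(y)] with p(y) = N(y; 0, 1)
x = rng.standard_normal(200000)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(200000)
log_py_given_x = -0.5 * (y - rho * x)**2 / (1 - rho**2) - 0.5 * np.log(2 * np.pi * (1 - rho**2))
log_py = -0.5 * y**2 - 0.5 * np.log(2 * np.pi)
print("large-M limit:", (log_py_given_x - log_py).var())
for M in [10, 100, 1000]:
    print(M, theta_samples(rho, N=5000, M=M, rng=rng).var())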

Let us define \tilde{\theta}_i = \theta_i - I(X;Y) and let each \tilde{\theta}_i have cdf F(\tilde{\theta}_i) = P(\tilde{\Theta}_i \le \tilde{\theta}_i). Let us assume that it has finite moments and that its moment generating function \varphi_{\tilde{\theta}}(t) = E[\exp(t\tilde{\theta}_i)] exists. We will show that the moment generating function of

    Z = \sqrt{N}\,\big(\hat{I} - I(X;Y)\big) = \frac{1}{\sqrt{N}} \sum_i \tilde{\theta}_i

approaches that of a zero-mean normal, so that the estimator is consistent and a CLT holds. Because the \tilde{\theta}_i are iid, the mgf of Z can be written as

    \varphi_Z(t) = \prod_{i=1}^{N} \varphi_{\tilde{\theta}_i/\sqrt{N}}(t) = \varphi_{\tilde{\theta}}(t/\sqrt{N})^N = \big(E[\exp(t\tilde{\theta}/\sqrt{N})]\big)^N.        (9)

Using Taylor's theorem on the exponential, we obtain

    \varphi_Z(t) = \Big(E\Big[1 + \frac{t}{\sqrt{N}}\tilde{\theta} + \frac{t^2}{2N}\tilde{\theta}^2 + \frac{t^3}{6 N^{3/2}}\tilde{\theta}^3 + \cdots\Big]\Big)^N
                 = \Big(1 + \frac{t}{2M\sqrt{N}}\,\chi^2\big(p(y,x)\,\|\,p(y)p(x)\big) + \frac{t^2}{2N}\, E[\tilde{\theta}^2] + O\Big(\frac{t^3}{N^{3/2}}\Big) E[\tilde{\theta}^3] + \cdots\Big)^N.        (10)

Note that, since \tilde{\theta} = \theta - I(X;Y), we have var(\tilde{\theta}) = var(\theta) and E[\tilde{\theta}] = O(M^{-1}), so that (E[\tilde{\theta}])^2 = O(M^{-2}). We now expand the third term in Eqn. (10) using our expression (8) for the variance of \theta:

    E[\tilde{\theta}^2] = var(\tilde{\theta}) + (E[\tilde{\theta}])^2 = var(\theta) + O(M^{-2})
        = var\!\left[\log \frac{p(y|x)}{p(y)}\right]
          + \frac{1}{M}\Big[\big(1 - H(Y)\big)\,\chi^2\big(p(y,x)\,\|\,p(y)p(x)\big) - \int \frac{\sigma^2(y)\log p(y)}{p(y)}\,dy\Big] + O(M^{-2}).        (11)

Furthermore, note that since all the moments of \tilde{\theta} are finite, we can ignore the N^{-3/2} and lower-order terms, since they disappear as N \to \infty. For the limit to exist, the coefficient of the 1/\sqrt{N} term must decay at a rate of at least 1/\sqrt{N}; thus we must have M = \Omega(\sqrt{N}). Finally, recall that the mgf of a Gaussian with mean \mu and variance \sigma^2 is \exp(t\mu + \sigma^2 t^2/2). Therefore, to have a consistent estimator (Z with zero mean in the limit), we require the coefficient on the t/\sqrt{N} term to vanish in the limit. This requires that M grow strictly faster than \sqrt{N}: M = \omega(\sqrt{N}). Letting M = \omega(\sqrt{N}), the limit of Eqn. (10) is

    \lim_{N \to \infty} \varphi_Z(t) = \exp\!\left(var\!\left[\log \frac{p(y|x)}{p(y)}\right] \frac{t^2}{2}\right).        (12)

Since the mgf approaches that of a zero-mean Gaussian with variance var[\log(p(y|x)/p(y))], we can conclude that the empirical plug-in estimator of I is consistent and satisfies the following CLT when M = \omega(\sqrt{N}):

    \frac{\hat{I} - I(X;Y)}{\sqrt{var(\hat{I})}} \;\to\; N(0, 1) \quad \text{as } N \to \infty.        (13)

Lastly, we demonstrate that as N \to \infty the robust influence function approaches that of the empirical estimator, so that the consistency and CLT guarantees also hold for our robust estimator. Let \alpha denote the scale parameter of the robust estimator (see Sec. 1.3), which shrinks as N grows. Note that the influence function corresponding to an empirical estimate is \psi_{emp}(x) = cx, where c is any constant. The robust cost function, parameterized by \alpha, is given as

    \psi(x; \alpha) = \begin{cases} \log\big(1 + \alpha x + (\alpha x)^2/2\big), & x \ge 0 \\ -\log\big(1 - \alpha x + (\alpha x)^2/2\big), & x < 0. \end{cases}

For simplicity of analysis we will only consider x \ge 0; the proof for x < 0 follows similarly. Note that \lim_{N\to\infty} \alpha x = 0. Therefore, we take the Taylor series of \log(1 + u + u^2/2) at u = 0, where u = \alpha x, and obtain

    \psi(x; \alpha) = \alpha x + 0\cdot(\alpha x)^2 - \frac{(\alpha x)^3}{3!} + \cdots

As N \to \infty the first term dominates, and the robust influence function approaches the empirical influence function with c = \alpha.
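The robust estimator is an M-estimator of location built from this influence function: \hat{I} solves \sum_i \psi(\alpha(\theta_i - \hat{I})) = 0, the ROOT operation used in Sec. 1.3 below. The sketch that follows is a minimal illustration of that computation; the default choice of alpha (shrinking like 1/\sqrt{N}) is an assumption of the example, not the exact constant used in the paper.

# Catoni-style robust mean: solve sum_i psi(alpha*(theta_i - m)) = 0 for m.
# The influence function matches Sec. 1.2; alpha is an illustrative choice.
import numpy as np
from scipy.optimize import brentq

def psi(u):
    # log(1 + u + u^2/2) for u >= 0, and -log(1 - u + u^2/2) for u < 0
    return np.where(u >= 0.0,
                    np.log1p(u + 0.5 * u**2),
                    -np.log1p(-u + 0.5 * u**2))

def robust_mean(theta, alpha=None):
    theta = np.asarray(theta, dtype=float)
    if alpha is None:
        # Illustrative scale that shrinks like 1/sqrt(N), as assumed in Sec. 1.2
        alpha = np.sqrt(2.0 / (len(theta) * (theta.var() + 1e-12)))
    f = lambda m: psi(alpha * (theta - m)).sum()   # decreasing in m
    lo, hi = theta.min() - 1.0, theta.max() + 1.0  # f(lo) > 0 > f(hi)
    return brentq(f, lo, hi)

# Heavy-tail demonstration: a few corrupted theta_i barely move the estimate
rng = np.random.default_rng(1)
theta = rng.normal(1.0, 0.5, size=500)
theta[:5] += 50.0                                  # gross outliers
print("empirical mean:", theta.mean(), " robust mean:", robust_mean(theta))

Because \psi grows only logarithmically, a handful of grossly corrupted \theta_i moves the robust root far less than it moves the empirical mean, which is the qualitative behaviour behind the tighter deviation bound of Sec. 1.3.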

1.3. Proof of Prop. 3

Here we first show the finite-sample deviation bound on the robust estimator, and then the bound on the empirical estimator. Let \{(x_i, y_i)\}_{i=1}^{N} \sim p(x, y | Y) and, for each i, let \{x_{ij}\}_{j=1}^{M} \sim p(x | Y) be independent samples. We define \bar{x}_i = \{x_{ij}\}_{j=1}^{M}. Recall that the robust estimate \hat{I} is the root of \sum_i \psi\big(\alpha(\theta_i - \hat{I})\big) = 0, where \theta_i = \log[p(y_i|x_i)/\hat{p}(y_i; \bar{x}_i)] and \hat{p}(\cdot\,; \bar{x}_i) is the posterior predictive estimate. Assuming \alpha = \sqrt{2/(N v)} and N > 2(1 + \log(1/\epsilon)), we have from Prop. 2.4 of Catoni (2012) that the robust estimator satisfies, with probability at least 1 - 2\epsilon,

    -c \le \hat{I} - m \le c,        (14)

where m = E[\theta_i] = E\big[\log\big(p(y|x)/\hat{p}(y; \bar{x})\big)\big], v denotes the variance of the \theta_i, and

    c = \frac{\sqrt{2 v (1 + \log(1/\epsilon))/N}}{1 - 2(1 + \log(1/\epsilon))/N}.

Since the samples are identically distributed, m does not depend on the sample index i. Note that, because the denominator contains the estimated posterior predictive, this does not yet bound the deviation from the true value I. We can rewrite m as follows:

    m = E\Big[\log \frac{p(y|x)\, p(y|Y)}{\hat{p}(y; \bar{x})\, p(y|Y)}\Big]
      = E\Big[\log \frac{p(y|x)}{p(y|Y)}\Big] + E\Big[\log \frac{p(y|Y)}{\hat{p}(y; \bar{x})}\Big]
      = I + E\Big[\log \frac{p(y|Y)}{\hat{p}(y; \bar{x})}\Big]
      = I + E_{\bar{x}}\big[KL\big(p(y|Y)\,\|\,\hat{p}(y; \bar{x})\big)\big].        (15)

Note that the above derivation holds for both the robust and the empirical estimators. Letting b = E_{\bar{x}}[KL(p(y|Y) \| \hat{p}(y; \bar{x}))], we obtain the presented deviation bounds on the robust estimator:

    b - c \le \hat{I} - I \le b + c.        (16)

Similarly, the empirical estimate can be bounded as in Eqn. (14), but with c = \sqrt{v/(N\epsilon)} by Chebyshev's inequality. Since the same derivation (15) relating the true mean to the plug-in mean holds for the empirical estimator, Eqn. (16) carries through as well. Although the high-level expression stays the same, we must be careful to note that both b and c differ for the two estimators. The difference in b is a bit more subtle: b is the expected KL divergence between the posterior predictive and the estimated posterior predictive, and the estimated posterior predictive differs between the robust and empirical estimators.

1.4. Proof of Prop. 4

We now show how we approximate the probability that the correct action is selected. Without loss of generality, let I_1 \ge I_2 \ge \cdots \ge I_{|A|}, and let f_a(\hat{I}_a) and F_a(\hat{I}_a) denote the pdf and cdf of the estimate \hat{I}_a. Since independent samples are drawn to estimate I under each candidate action, the estimates are independent. This allows us to decompose the probability of selecting the correct action as

    P(a^* = 1) = \int_{-\infty}^{+\infty} f_1(\hat{I}_1) \prod_{a=2}^{|A|} \Big[\int_{-\infty}^{\hat{I}_1} f_a(\hat{I}_a)\, d\hat{I}_a\Big]\, d\hat{I}_1
               = \int_{-\infty}^{+\infty} f_1(\hat{I}_1) \prod_{a=2}^{|A|} F_a(\hat{I}_1)\, d\hat{I}_1.        (17)

Approximating each estimate by its limiting Gaussian distribution (Prop. 2), we have

    P(a^* = 1) = \int_{-\infty}^{+\infty} N(\hat{I}_1;\, I_1, \sigma_1^2) \prod_{a=2}^{|A|} \Phi\Big(\frac{\hat{I}_1 - I_a}{\sigma_a}\Big)\, d\hat{I}_1,

where \sigma_a^2 = \frac{1}{N}\, var\big[\log\big(p_a(y|x)/p_a(y|Y)\big)\big] and \Phi is the cumulative distribution function of the standard normal distribution.

[Figure 1: Memory retention model. Posterior probability of the correct retention model over 500 random trials with 500 particles, under random design, sequential SMC design, and our robust approach. Data are sampled from POW (left) and EXP (right); plots show the median (solid) and quartiles (dashed) over the same set of runs.]

2. Experiment: Memory Retention Model Selection

We apply our method to sequential experiment design for Bayesian model discrimination. We adopt the analysis posed in Cavagnaro et al. (2010), where the aim is to discriminate among candidate models of memory retention. At each stage an experiment is conducted in which a simulated participant is given a list of words to memorize. After a chosen lag time d_t, the participant is asked to recall the words. At stage t the aim is to select the lag time d_t which maximally discriminates between two retention models, power (POW) and exponential (EXP):

    p_0(d_t; \theta) = \theta_0 (d_t + 1)^{-\theta_1}, \qquad p_1(d_t; \theta) = \theta_0 e^{-\theta_1 d_t}.

The joint distribution combines the retention model with a response variable y_t, where y_t = 1 indicates that the participant remembers the words and y_t = 0 otherwise:

    m \sim Unif\{0, 1\}, \quad \theta_0 \sim Beta(\alpha_0, \beta_0), \quad \theta_1 \sim Beta(\alpha_1, \beta_1), \quad y_t \mid m, \theta; d_t \sim Bernoulli\big(p_m(d_t; \theta)\big).

At each stage we select the lag time which maximizes the mutual information between the model and the observation, \arg\max_{d_t} I_{d_t}(M; Y_t \mid y_{1:t-1}). Expectations over the model M and the response Y_t are discrete, and can be computed efficiently. Furthermore, under a uniform prior the model posterior is proportional to the evidence,

    Z_{m,t}(y_{1:t}, d_{1:t}) = \int p(\theta \mid m) \prod_{i=1}^{t} p(y_i \mid \theta, m; d_i)\, d\theta.        (18)

We compare our integrated inference and planning approach to the sequential Monte Carlo (SMC) approach of Drovandi et al. (2014) and Chopin (2002), which utilizes the SMC estimate of the evidence for planning: \log \hat{Z}_{m,t} = \log \hat{Z}_{m,t-1} + \log \sum_i w_i^{m,t}. We observe moderately improved median performance and tighter quartiles, as shown in Fig. 1.
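The lag-time selection in this experiment reduces to a small discrete mutual-information computation: with particles representing \theta under each model, I(M; Y_t | d) involves only the binary model indicator and the binary response. The sketch below illustrates the first design step (particles drawn from the prior); the priors, particle count, and lag grid are assumptions of the illustration, while the retention models follow Cavagnaro et al. (2010).

# Sketch of the lag-time selection step: choose d maximizing I(M; Y | d)
# for the POW/EXP retention models. Priors, grid, and particle count are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
K = 500                                   # particles per model
theta0 = rng.beta(2.0, 1.0, size=(2, K))  # theta_0 ~ Beta(alpha_0, beta_0), per model
theta1 = rng.beta(1.0, 4.0, size=(2, K))  # theta_1 ~ Beta(alpha_1, beta_1), per model

def recall_prob(m, d):
    """p(y=1 | theta, m; d) for each particle of model m (0 = POW, 1 = EXP)."""
    if m == 0:
        return theta0[0] * (d + 1.0) ** (-theta1[0])   # power model
    return theta0[1] * np.exp(-theta1[1] * d)          # exponential model

def entropy_bernoulli(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def mutual_information(d, prior_m=(0.5, 0.5)):
    # p(y=1|m) marginalizes theta via the particle approximation
    p_y_m = np.array([recall_prob(0, d).mean(), recall_prob(1, d).mean()])
    p_y = prior_m[0] * p_y_m[0] + prior_m[1] * p_y_m[1]
    # I(M;Y) = H(Y) - sum_m p(m) H(Y|m)
    return entropy_bernoulli(p_y) - sum(pm * entropy_bernoulli(pym)
                                        for pm, pym in zip(prior_m, p_y_m))

lags = np.arange(0, 41)                    # candidate lag times d_t
mi = np.array([mutual_information(d) for d in lags])
print("selected lag:", lags[np.argmax(mi)], " I(M;Y):", mi.max())

At later stages the same computation applies with the prior draws replaced by particles weighted by the Bernoulli likelihoods of the responses observed so far.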

Improvements over SMC are modest, likely due to the low-dimensional (2-D) integration over \theta. We also find that when data are sampled from the exponential model, the posterior concentrates quickly under both methods, which confirms the findings of the original authors. Both our method and SMC significantly outperform random design.

References

Catoni, O. Challenging the empirical mean and empirical variance: A deviation study. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 48(4):1148-1185, 2012. doi: 10.1214/11-AIHP454.

Cavagnaro, D. R., Myung, J. I., Pitt, M. A., and Kujala, J. V. Adaptive design optimization: A mutual information-based approach to model discrimination in cognitive science. Neural Computation, 22(4), 2010.

Chopin, N. A sequential particle filter method for static models. Biometrika, 89(3), 2002.

DasGupta, A. Asymptotic Theory of Statistics and Probability. Springer Texts in Statistics. Springer, New York, 2008.

Drovandi, C. C., McGree, J. M., and Pettitt, A. N. A sequential Monte Carlo algorithm to incorporate model uncertainty in Bayesian sequential design. Journal of Computational and Graphical Statistics, 23:3-24, 2014.
