Supplementary Material: A Robust Approach to Sequential Information Theoretic Planning
Sue Zheng*    Jason Pacheco*    John W. Fisher, III

*Equal contribution. Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Boston, MA, USA. Correspondence to: Sue Zheng <szheng@csail.mit.edu>. Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the authors.

1. Proofs of Estimator Properties

1.1. Proof of Prop. 1

Here we show how the bias of the empirical estimator depends on $M$. Since we use a plug-in estimate of the posterior predictive, $\hat{p}(y_i) \approx p(y_i \mid Y)$, to estimate $I$, we first determine the mean and variance of the samples $\theta_i$ used for $I$ estimation, which incorporate the plug-in estimate:

    \theta_i = \log \frac{p(y_i \mid x_i)}{\hat{p}(y_i)}.    (1)

Let $\{x_i, y_i\}_{i=1}^N \sim p(x, y \mid Y)$ and for each $i$ let $\{x_{ij}\}_{j=1}^M \sim p(x \mid Y)$ be independent samples. Going forward, we will drop the explicit dependence on the observed data, $Y$, for clarity, but all expressions are conditioned on them. The empirical estimate of the posterior predictive takes the form

    \hat{p}(y_i) = \frac{1}{M} \sum_{j=1}^M p(y_i \mid x_{ij}).

From Thm. 3.9 of DasGupta (2008), if we have iid observations $X_1, X_2, \ldots, X_M$ with finite fourth moment, mean $\mu$, and variance $\sigma^2$, and a scalar transformation $g$ with four uniformly bounded derivatives, then

    E[g(\bar{X})] = g(\mu) + \frac{\sigma^2}{2M} g''(\mu) + O(M^{-2}),
    \mathrm{var}[g(\bar{X})] = \frac{\sigma^2}{M} g'(\mu)^2 + O(M^{-2}),

where $\bar{X}$ is the empirical mean $\bar{X} = \frac{1}{M}\sum_i X_i$. In our case, we have samples $p(y_i \mid x_{ij})$, which are iid when conditioned on $y_i$, with conditional mean $p(y_i)$ and conditional variance

    \sigma^2_{y_i \mid X} \equiv \int_x p(y_i \mid x)^2\, p(x)\, dx - p(y_i)^2.

We are interested in the transformation $g(\cdot) = \log(\cdot)$. Consequently, we have

    E[\log \hat{p}(y_i) \mid y_i] = \log p(y_i) - \frac{\sigma^2_{y_i \mid X}}{2M\, p(y_i)^2} + O(M^{-2}),
    \mathrm{var}[\log \hat{p}(y_i) \mid y_i] = \frac{\sigma^2_{y_i \mid X}}{M\, p(y_i)^2} + O(M^{-2}).

Using the law of total expectation to remove the conditioning on $y_i$ in the mean, and dropping the subscript $i$ since measurements are sampled iid, we have:

    E[\log \hat{p}(y)] = -H(Y) - \frac{1}{2M}\, E\!\left[\frac{\sigma^2_{y \mid X}}{p(y)^2}\right] + O(M^{-2}).    (2)

Expanding the second term on the RHS, we obtain:

    E\!\left[\frac{\sigma^2_{y \mid X}}{p(y)^2}\right]
    = \int p(y)\, \frac{\int_x p(y \mid x)^2 p(x)\, dx - p(y)^2}{p(y)^2}\, dy
    = \int \frac{\int_x p(y \mid x)^2 p(x)\, dx}{p(y)}\, dy - 1
    = \int\!\!\int \frac{p(x, y)^2}{p(y)\, p(x)}\, dx\, dy - 1
    = \chi^2\big(p(x, y),\ p(y)p(x)\big),

where $\chi^2(p(x,y),\, p(y)p(x))$ denotes the chi-square divergence of $p(x, y)$ from $p(y)p(x)$. Plugging this back into Eqn. (2), we obtain

    E[\log \hat{p}(y)] = -H(Y) - \frac{1}{2M}\, \chi^2\big(p(x,y),\ p(y)p(x)\big) + O(M^{-2}).    (3)

From Eqn. (3) we can now obtain the mean of each sample $\theta_i = \log[p(y_i \mid x_i)/\hat{p}(y_i)]$ used in the empirical $I$ estimate $\hat{I} = \frac{1}{N}\sum_i \theta_i$, and therefore the mean of the $I$ estimate:

    E[\hat{I}] = E[\theta] = E[\log p(y \mid x)] - E[\log \hat{p}(y)]
    = -H(Y \mid X) + H(Y) + \frac{1}{2M}\, \chi^2\big(p(x,y),\ p(y)p(x)\big) + O(M^{-2})
    = I(X; Y) + \frac{1}{2M}\, \chi^2\big(p(x,y),\ p(y)p(x)\big) + O(M^{-2}).    (4)

We have thus shown how the bias of the empirical estimator depends on $M$.
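As a concrete illustration of the plug-in estimator analyzed above (this is not code from the paper), the sketch below implements $\hat{I} = \frac{1}{N}\sum_i \theta_i$ with the nested posterior-predictive estimate of Eqn. (1) for a toy linear-Gaussian model in which $I(X;Y)$ has a closed form. The model, function names, and parameter values are assumptions made only for the example.

```python
import numpy as np

def normal_logpdf(y, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at y."""
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2

def plug_in_mi(N, M, sigma_x=1.0, sigma_n=0.5, rng=None):
    """Plug-in (nested Monte Carlo) estimate of I(X; Y) for y = x + noise.

    Outer samples: N joint draws (x_i, y_i) ~ p(x, y).
    Inner samples: M fresh prior draws x_ij ~ p(x) per y_i, used to form
    p_hat(y_i) = (1/M) sum_j p(y_i | x_ij) as in Eqn. (1).
    Returns the estimate and the per-sample terms theta_i.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.normal(0.0, sigma_x, size=N)
    y = x + rng.normal(0.0, sigma_n, size=N)
    log_lik = normal_logpdf(y, x, sigma_n)                          # log p(y_i | x_i)
    x_inner = rng.normal(0.0, sigma_x, size=(N, M))                 # x_ij ~ p(x)
    log_p_hat = np.log(np.exp(normal_logpdf(y[:, None], x_inner, sigma_n)).mean(axis=1))
    theta = log_lik - log_p_hat                                     # theta_i
    return theta.mean(), theta

if __name__ == "__main__":
    sigma_x, sigma_n = 1.0, 0.5
    true_mi = 0.5 * np.log(1.0 + sigma_x**2 / sigma_n**2)           # closed form for this model
    rng = np.random.default_rng(0)
    for M in (5, 50, 500):
        i_hat, _ = plug_in_mi(N=20000, M=M, sigma_x=sigma_x, sigma_n=sigma_n, rng=rng)
        print(f"M={M:4d}  I_hat={i_hat:.4f}  true={true_mi:.4f}  bias~{i_hat - true_mi:+.4f}")
```

With $N$ fixed and $M$ increasing, the reported bias should stay positive and shrink roughly in proportion to $1/M$, consistent with Eqn. (4).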
1.2. Proof of Prop. 2

We first show that the empirical plug-in estimator is consistent and asymptotically normal. We will then show that the influence function of the robust estimator converges to the influence function of the empirical estimator, such that the asymptotic analysis of the empirical estimator holds for the robust estimate as well.

We begin by finding expressions for the mean and variance of the iid $\theta_i$ samples used in $I$ estimation. Then, using these expressions, we show that the empirical mean converges to a Normal distribution. In Sec. 1.1 we derived an expression for the variance of the log posterior predictive conditioned on $y_i$. We can remove the conditioning on $y_i$ through the law of total variance. Again dropping the subscript $i$ for clarity, we have the following expression for the variance:

    \mathrm{var}[\log \hat{p}(y)] = E\big[\mathrm{var}[\log \hat{p}(y) \mid y]\big] + \mathrm{var}\big[E[\log \hat{p}(y) \mid y]\big]
    = E\!\left[\frac{\sigma^2_{y \mid X}}{M\, p(y)^2}\right] + O(M^{-2}) + \mathrm{var}\big[E[\log \hat{p}(y) \mid y]\big].    (5)

The first term on the RHS simplifies to $\chi^2(p(x,y),\, p(y)p(x))/M$ by the expansion following Eqn. (2). Keeping track of only the $O(M^{-1})$ and higher-order terms, we expand the last term on the RHS as

    \mathrm{var}\big[E[\log \hat{p}(y) \mid y]\big]
    = E\big[(E[\log \hat{p}(y) \mid y])^2\big] - \big(E[\log \hat{p}(y)]\big)^2
    = E\!\left[\Big(\log p(y) - \frac{\sigma^2_{y \mid X}}{2M\, p(y)^2}\Big)^2\right] - \Big(H(Y) + \frac{1}{2M}\chi^2\big(p(x,y),\ p(y)p(x)\big)\Big)^2 + O(M^{-2})
    = E\big[(\log p(y))^2\big] - H(Y)^2 - \frac{1}{M}\left[\int \frac{\sigma^2_{y \mid X}\, \log p(y)}{p(y)}\, dy + H(Y)\, \chi^2\big(p(x,y),\ p(y)p(x)\big)\right] + O(M^{-2})
    = \sigma^2_{\log p(y)} - \frac{1}{M}\left[\int \frac{\sigma^2_{y \mid X}\, \log p(y)}{p(y)}\, dy + H(Y)\, \chi^2\big(p(x,y),\ p(y)p(x)\big)\right] + O(M^{-2}).

We plug the above back into the variance expression Eqn. (5) to get the following dependence on $M$:

    \mathrm{var}[\log \hat{p}(y)] = \sigma^2_{\log p(y)} + \frac{1}{M}\left[\big(1 - H(Y)\big)\, \chi^2\big(p(x,y),\ p(y)p(x)\big) - \int \frac{\sigma^2_{y \mid X}\, \log p(y)}{p(y)}\, dy\right] + O(M^{-2}).    (6)

We can now find the variance of the samples used in $I$ estimation:

    \mathrm{var}[\theta] = \sigma^2_{\log p(y \mid x)} - 2\, \mathrm{cov}\big(\log p(y \mid x),\ \log \hat{p}(y)\big) + \mathrm{var}[\log \hat{p}(y)].    (7)

To simplify our analysis, let us assume that we use independent sets of samples $\{x_i, y_i\}_{i=1}^N$ and $\{x_i, y_i\}_{i=N+1}^{2N}$ for estimating $E[\log p(y \mid x)]$ and $E[\log \hat{p}(y)]$, respectively. Thus, the covariance term in Eqn. (7) is zero. Substituting the log posterior predictive variance (Eqn. 6) into the $I$ sample variance (Eqn. 7) yields the full expression for the $I$ sample variance:

    \mathrm{var}[\theta] = \sigma^2_{\log p(y \mid x)} + \sigma^2_{\log p(y)} + \frac{1}{M}\left[\big(1 - H(Y)\big)\, \chi^2\big(p(x,y),\ p(y)p(x)\big) - \int \frac{\sigma^2_{y \mid X}\, \log p(y)}{p(y)}\, dy\right] + O(M^{-2}).    (8)
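The small helper below (not from the paper) illustrates how Eqn. (8) is typically used in practice: the population variance is replaced by the sample variance of the $\theta_i$, giving a standard error and an approximate normal interval for $\hat{I}$. The normal approximation is only justified by the CLT established next, and requires $M$ to grow faster than $\sqrt{N}$; the function name and the reuse of plug_in_mi from the earlier sketch are assumptions.

```python
import numpy as np

def mi_estimate_with_se(theta):
    """Summarize the plug-in MI estimate from the per-sample terms theta_i.

    The sample variance of theta stands in for the leading-order population
    variance in Eqn. (8); the 1.96 interval presumes approximate normality.
    """
    theta = np.asarray(theta, dtype=float)
    N = theta.size
    i_hat = theta.mean()
    se = theta.std(ddof=1) / np.sqrt(N)      # sigma_hat_I = sigma_theta / sqrt(N)
    return i_hat, se, (i_hat - 1.96 * se, i_hat + 1.96 * se)

# Example, reusing the earlier sketch:
# _, theta = plug_in_mi(N=10000, M=200, rng=np.random.default_rng(1))
# print(mi_estimate_with_se(theta))
```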
We now have the variance and mean of the $\theta_i$, which compose the samples for the plug-in empirical estimate of $I$. We will now show that this $I$ estimate admits a CLT. Let us define $\bar{\theta}_i = \theta_i - I(X;Y)$ and let each $\bar{\theta}_i$ have cdf $F_{\bar{\theta}}(\bar{\theta}_i) = P(\bar{\Theta}_i \le \bar{\theta}_i)$. Let us assume that it has finite moments and that its moment generating function $m_{\bar{\theta}}(t) = E[\exp(t\bar{\theta}_i)]$ exists. We will show that the moment generating function of

    Z_N \equiv \sqrt{N}\,\big(\hat{I} - I(X;Y)\big) = \frac{1}{\sqrt{N}} \sum_i \bar{\theta}_i

approaches that of a zero-mean normal, so that the estimator is consistent and a CLT holds. Because the $\bar{\theta}_i$ are iid, the mgf for $Z_N$ can be written as

    m_{Z_N}(t) = \prod_{i=1}^N m_{\bar{\theta}}(t/\sqrt{N}) = \big[m_{\bar{\theta}}(t/\sqrt{N})\big]^N = \big(E[\exp(t\bar{\theta}/\sqrt{N})]\big)^N.    (9)

Using Taylor's theorem on the exponential, we obtain

    m_{Z_N}(t) = \left(E\!\left[1 + \frac{t\bar{\theta}}{\sqrt{N}} + \frac{t^2\bar{\theta}^2}{2N} + \frac{t^3\bar{\theta}^3}{3!\, N^{3/2}} + \cdots\right]\right)^N
    = \left(1 + \frac{t}{\sqrt{N}}\, E[\bar{\theta}] + \frac{t^2}{2N}\, E[\bar{\theta}^2] + O\!\Big(\frac{t^3}{6 N^{3/2}}\, E[\bar{\theta}^3]\Big)\right)^N.    (10)

Note that, since $\bar{\theta} = \theta - I(X;Y)$, we have $\mathrm{var}[\bar{\theta}] = \mathrm{var}[\theta]$ and, from Eqn. (4), $E[\bar{\theta}] = O(M^{-1})$. We now expand the third term in Eqn. (10) using our expression for the variance of $\theta$ (Eqn. 8):

    E[\bar{\theta}^2] = \mathrm{var}[\bar{\theta}] + E[\bar{\theta}]^2 = \mathrm{var}[\theta] + O(M^{-2})
    = \sigma^2_{\log p(y \mid x)} + \sigma^2_{\log p(y)} + \frac{1}{M}\left[\big(1 - H(Y)\big)\, \chi^2\big(p(x,y),\ p(y)p(x)\big) - \int \frac{\sigma^2_{y \mid X}\, \log p(y)}{p(y)}\, dy\right] + O(M^{-2}).    (11)

Furthermore, note that since all the moments of $\bar{\theta}$ are finite, we can ignore the $N^{-3/2}$ and lower-order terms, since they disappear as $N \to \infty$. For the limit to exist, the coefficient on the $t/\sqrt{N}$ term must decay at a rate of at least $1/\sqrt{N}$; thus, we must have $M = \Omega(\sqrt{N})$. Finally, recall that the mgf for a Gaussian with mean $\mu$ and variance $\sigma^2$ is $\exp(t\mu + \sigma^2 t^2/2)$. Therefore, to have a consistent estimator ($Z_N$ has zero mean in the limit), we require the coefficient on the $t$ term to disappear in the limit. This requires that $M$ grow strictly faster than $\sqrt{N}$: $M = \omega(\sqrt{N})$. Letting $M = \omega(\sqrt{N})$, the limit of Eqn. (10) is:

    \lim_{N \to \infty} m_{Z_N}(t) = \exp\!\left(\big(\sigma^2_{\log p(y \mid x)} + \sigma^2_{\log p(y)}\big)\, \frac{t^2}{2}\right).    (12)

Since the mgf approaches that of a zero-mean Gaussian with variance $\sigma^2_{\log p(y \mid x)} + \sigma^2_{\log p(y)}$, we can conclude that the empirical plug-in $I$ estimator is consistent and satisfies the following CLT when $M = \omega(\sqrt{N})$:

    \lim_{N \to \infty} \frac{\hat{I} - I(X;Y)}{\sigma_{\hat{I}}} \sim \mathcal{N}(0, 1),    (13)

where $\sigma^2_{\hat{I}} = \mathrm{var}[\theta]/N$ denotes the variance of the estimate.

Lastly, we demonstrate that as $N \to \infty$ the robust influence function approaches that of the empirical estimator, such that the consistency and CLT guarantees also hold for our robust estimator. Note that the influence function corresponding to an empirical estimate is $\psi_{\mathrm{empir}}(x) = cx$, where $c$ is any constant. The robust cost function, parameterized by $\alpha$, is given as

    \psi(x; \alpha) = \begin{cases} \log\!\big(1 + \alpha x + (\alpha x)^2/2\big), & x \ge 0, \\ -\log\!\big(1 - \alpha x + (\alpha x)^2/2\big), & x < 0. \end{cases}

For simplicity of analysis we will only consider $x \ge 0$; however, the proof for $x < 0$ follows similarly. Note that, since $\alpha \to 0$ as $N \to \infty$, we have $\lim_{N \to \infty} \alpha x = 0$. Therefore, we take the Taylor series of $\log(1 + z + z^2/2)$ at $z = 0$, where $z = \alpha x$, and obtain

    \psi(x; \alpha) = \alpha x + 0 \cdot (\alpha x)^2 - \frac{(\alpha x)^3}{3!} + \cdots

As $N \to \infty$, the first term dominates and we get that the robust influence function approaches the empirical influence function with $c = \alpha$.
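For concreteness (again, not the authors' code), the sketch below implements the influence function $\psi$ above and the root-finding (ROOT) step of the robust estimator described in Prop. 3 below: the estimate is taken as the root in $m$ of $\sum_i \psi(\alpha(\theta_i - m))$, found here by bisection. The function names and the specific scale parameter in the usage comment are assumptions; the paper's Prop. 3 specifies its own choice of $\alpha$.

```python
import numpy as np

def psi_catoni(x):
    """Catoni-style influence function: log(1 + x + x^2/2) for x >= 0, odd-extended."""
    return np.where(x >= 0.0,
                    np.log1p(x + 0.5 * x**2),
                    -np.log1p(-x + 0.5 * x**2))

def robust_mean(theta, alpha, iters=100):
    """ROOT step: solve sum_i psi(alpha * (theta_i - m)) = 0 for m by bisection.

    The sum is non-increasing in m and changes sign on [min(theta), max(theta)],
    so bisection converges to the root.
    """
    theta = np.asarray(theta, dtype=float)
    f = lambda m: psi_catoni(alpha * (theta - m)).sum()
    lo, hi = theta.min(), theta.max()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative usage with a scale parameter that shrinks as N grows:
# theta = ...                                            # per-sample terms theta_i
# alpha = 1.0 / (np.sqrt(theta.size) * theta.std())      # assumed scaling, not the paper's exact alpha
# i_hat_robust = robust_mean(theta, alpha)
```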
1.3. Proof of Prop. 3

Here we first show the finite-sample deviation bound on the robust estimator, and then the bound on the empirical estimator. Let $\{x_i, y_i\}_{i=1}^N \sim p(x, y \mid Y)$ and for each $i$ let $\{x_{ij}\}_{j=1}^M \sim p(x \mid Y)$ be independent samples. We define $x_i^M \equiv \{x_{ij}\}_{j=1}^M$. Recall that the robust estimate is

    \hat{I} = \mathrm{ROOT}\Big(\textstyle\sum_i \psi\big(\alpha(\theta_i - \hat{I})\big)\Big),

i.e., the value $m$ solving $\sum_{i=1}^N \psi(\alpha(\theta_i - m)) = 0$, where $\theta_i = \log[p(y_i \mid x_i)/\hat{p}(y_i; x_i^M)]$ and $\hat{p}(\cdot\,; x_i^M)$ is the posterior predictive estimator. Assuming $\alpha = \sqrt{2}/(N \sigma_{\hat{I}})$ and $N > 2(1 + \log \epsilon^{-1})$, we have from Prop. 2.4 of Catoni (2012) that the robust estimator satisfies, with probability at least $1 - 2\epsilon$,

    -c \le \hat{I} - m \le c,    (14)

where

    m = E[\theta_i] = E\!\left[\log \frac{p(y \mid x)}{\hat{p}(y; x^M)}\right]
    \qquad \text{and} \qquad
    c = \frac{\sigma_{\hat{I}}\, \sqrt{2\,(1 + \log \epsilon^{-1})}}{1 - (1 + \log \epsilon^{-1})/N}.

Since the samples are identically distributed, $m$ does not depend on the sample index $i$. Note that, because the denominator contains the estimated posterior predictive, this does not yet bound the deviation from the true value of the mutual information, $I$. We can rewrite $m$ as follows:

    m = E\!\left[\log \frac{p(y \mid x)\, p(y \mid Y)}{\hat{p}(y; x^M)\, p(y \mid Y)}\right]
    = E\!\left[\log \frac{p(y \mid x)}{p(y \mid Y)}\right] + E\!\left[\log \frac{p(y \mid Y)}{\hat{p}(y; x^M)}\right]
    = I + E\!\left[\log \frac{p(y \mid Y)}{\hat{p}(y; x^M)}\right]
    = I + E_{x^M}\!\big[\mathrm{KL}\big(p(y \mid Y)\, \|\, \hat{p}(y; x^M)\big)\big].    (15)

Note that the above derivation holds for both the robust and empirical estimators. Letting $b = E_{x^M}[\mathrm{KL}(p(y \mid Y) \| \hat{p}(y; x^M))]$, we obtain the presented deviation bounds on the robust estimator:

    b - c \le \hat{I} - I \le b + c.    (16)

Similarly, the empirical estimate can be bounded as in Eqn. (14), but with $c = \sigma_{\hat{I}}/\sqrt{2\epsilon}$ by Chebyshev's inequality. Since the same derivation relating the true mean to the plug-in mean in Eqn. (15) holds for the empirical estimator, Eqn. (16) also carries through. Although the high-level expression stays the same, we must be careful to note that both $b$ and $c$ differ for the two estimators. The difference in $b$ is a bit more subtle: $b$ is the expected KL divergence between the posterior predictive and the estimated posterior predictive, and the estimated posterior predictive differs between the robust and empirical estimators.

1.4. Proof of Prop. 4

We now show how we approximate the probability that the correct action is selected. Without loss of generality, let $I_1 \ge I_2 \ge \ldots \ge I_{|A|}$, and let $f_a(\hat{I}_a)$ and $F_a(\hat{I}_a)$ denote the pdf and cdf of the estimate under action $a$. Since independent samples are drawn to estimate $I$ under each candidate action, the estimates are independent. This allows us to decompose the probability of selecting the correct action as:

    P(a^* = 1) = \int_{-\infty}^{+\infty} f_1(\hat{I}_1) \prod_{a=2}^{|A|} \left[\int_{-\infty}^{\hat{I}_1} f_a(\hat{I}_a)\, d\hat{I}_a\right] d\hat{I}_1
    = \int_{-\infty}^{+\infty} f_1(\hat{I}_1) \prod_{a=2}^{|A|} F_a(\hat{I}_1)\, d\hat{I}_1.    (17)

For estimates that are (approximately) normally distributed, as in Prop. 2, we have that

    P(a^* = 1) = \int_{-\infty}^{+\infty} \mathcal{N}\big(\hat{I}_1;\ I_1, \sigma_1^2\big) \prod_{a=2}^{|A|} \Phi\!\left(\frac{\hat{I}_1 - I_a}{\sigma_a}\right) d\hat{I}_1,

where $\sigma_a^2$ is the variance of $\hat{I}_a$ under action $a$, determined by the variance of the per-action terms $\log[p_a(y \mid x)/p_a(y \mid Y)]$ as in Prop. 2, and $\Phi$ is the cumulative distribution function of the standard normal distribution.
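As a worked example of Eqn. (17) under the normal approximation (not from the paper), the sketch below evaluates the probability of selecting the best action by one-dimensional quadrature. The function name, grid choice, and example numbers are assumptions.

```python
import numpy as np
from scipy.stats import norm

def prob_correct_action(I, sigma, n_grid=2001):
    """P(correct action selected) under I_hat_a ~ N(I_a, sigma_a^2), per Eqn. (17).

    I, sigma: true MI values and estimator standard deviations per action,
    ordered so the best action comes first (I[0] = max_a I_a).
    """
    I = np.asarray(I, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Grid covering the bulk of the leading estimate's sampling density.
    z = np.linspace(I[0] - 8.0 * sigma[0], I[0] + 8.0 * sigma[0], n_grid)
    integrand = norm.pdf(z, loc=I[0], scale=sigma[0])
    for I_a, s_a in zip(I[1:], sigma[1:]):
        integrand *= norm.cdf((z - I_a) / s_a)       # P(I_hat_a < z) for each competitor
    return float(np.sum(integrand) * (z[1] - z[0]))  # rectangle-rule quadrature

# Example: well-separated vs. overlapping candidate actions.
print(prob_correct_action([1.00, 0.80, 0.60], [0.05, 0.05, 0.05]))  # close to 1
print(prob_correct_action([1.00, 0.95, 0.90], [0.10, 0.10, 0.10]))  # noticeably smaller
```

Tighter estimator variances (smaller sigma), or larger gaps between the leading MI value and its competitors, both push the probability of selecting the correct action toward one.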
[Figure 1: Memory retention model. Posterior probability of the correct retention model over 500 random trials with 500 particles, comparing Random, SMC, and Sequential Robust designs. Data are sampled from POW (left panel) and EXP (right panel). Plots show the median (solid) and quartiles (dashed) over the same set of runs.]

2. Experiment: Memory Retention Model Selection

We apply our method to sequential experiment design for Bayesian model discrimination. We adopt the analysis posed in Cavagnaro et al. (2010), where the aim is to discriminate among candidate models of memory retention. At each stage an experiment is conducted in which a simulated participant is given a list of words to memorize. After a chosen lag time $d_t$, participants are asked to recall the words. At stage $t$ the aim is to select the lag time $d_t$ which maximally discriminates between two retention models, power (POW) and exponential (EXP):

    p_0(d_t; \theta) = \theta_0 (d_t + 1)^{-\theta_1},
    \qquad
    p_1(d_t; \theta) = \theta_0 e^{-\theta_1 d_t}.

The joint distribution combines the retention model with a response variable $y_t$, where $y_t = 1$ indicates that the participant remembers the words and $y_t = 0$ otherwise:

    m \sim \mathrm{Unif}\{0, 1\}, \quad \theta_0 \sim \mathrm{Beta}(\alpha_0, \beta_0), \quad \theta_1 \sim \mathrm{Beta}(\alpha_1, \beta_1), \quad y_t \mid m, \theta;\ d_t \sim \mathrm{Bernoulli}\big(p_m(d_t; \theta)\big).

At each stage we select the lag time which maximizes the mutual information between the model and the observation, $\arg\max_{d_t} I_{d_t}(M; Y_t \mid Y_{1:t-1})$. Expectations over the model $m$ and the response $Y_t$ are discrete and can be computed efficiently. Furthermore, under a uniform prior the model posterior is proportional to the evidence,

    Z_{m,t} \equiv p(y_{1:t} \mid m, d_{1:t}) = \int p(\theta \mid m) \prod_{i=1}^{t} p(y_i \mid \theta, m;\ d_i)\, d\theta.    (18)

We compare our integrated inference and planning approach to the sequential Monte Carlo (SMC) approach of Drovandi et al. (2014) and Chopin (2002), which utilizes the SMC estimate of the evidence for planning: $\log \hat{Z}_{m,t} = \log \hat{Z}_{m,t-1} + \log \sum_i w^i_{m,t}$. We observe slightly improved median performance and tighter quartiles, as shown in Fig. 1. Improvements over SMC are modest, likely due to the low-dimensional (2D) integration over $\theta$. We also find that when data are sampled from the exponential model, the posterior concentrates quickly under both methods, which confirms the findings of the original authors. Both our method and SMC significantly outperform random design.

References

Catoni, O. Challenging the empirical mean and empirical variance: A deviation study. Ann. Inst. H. Poincaré Probab. Statist., 48(4):1148-1185, 2012. doi: 10.1214/11-AIHP454.

Cavagnaro, D. R., Myung, J. I., Pitt, M. A., and Kujala, J. V. Adaptive design optimization: A mutual information-based approach to model discrimination in cognitive science. Neural Computation, 22(4):887-905, 2010.

Chopin, N. A sequential particle filter method for static models. Biometrika, 89(3):539-551, 2002.

DasGupta, A. Asymptotic Theory of Statistics and Probability. Springer Texts in Statistics. Springer New York, 2008.

Drovandi, C. C., McGree, J. M., and Pettitt, A. N. A sequential Monte Carlo algorithm to incorporate model uncertainty in Bayesian sequential design. Journal of Computational and Graphical Statistics, 23(1):3-24, 2014.