Lecture 13 and 14: Bayesian estimation theory
Spring, EE 194 Networked estimation and control (Prof. Khan), March

I. BAYESIAN ESTIMATORS

Mother Nature conducts a random experiment that generates a parameter θ from a probability density function p(θ). This parameter θ then codes (or parameterizes) the conditional (or measurement) density f(x|θ). A random experiment generates a measurement x from f(x|θ). The problem is to estimate θ from x. We denote the estimate by θ̂(x). The Bayesian setup consists of the following notions.

Loss function: The quality of the estimate θ̂(x) is measured by a real-valued loss function. Some examples are:

Quadratic loss function: L(θ, θ̂(x)) = [θ − θ̂(x)]ᵀ [θ − θ̂(x)].

Binary (0-1) loss function: L(θ, θ̂(x)) = 0 if θ̂(x) = θ, and 1 otherwise.

Risk: The risk is defined as the loss function averaged over the density f(x|θ); it addresses the question of what average loss is associated with the estimate θ̂(x). Mathematically,

    R(θ, θ̂) = E_x[L(θ, θ̂(x))] = ∫ L(θ, θ̂(x)) f(x|θ) dx.

The notation E_x indicates that the expectation is over the distribution of the random measurement x (with θ fixed).

Bayes risk: The Bayes risk is the risk averaged over the prior distribution on θ:

    R(p, θ̂) = E_θ[R(θ, θ̂)] = ∫ R(θ, θ̂) p(θ) dθ = ∫∫ L(θ, θ̂(x)) f(x|θ) p(θ) dx dθ,

where f(x|θ) p(θ) = f(x, θ) is the joint density.

Bayes risk estimator: The Bayes risk estimator minimizes the Bayes risk,

    θ̂_B = arg min_θ̂ R(p, θ̂),

i.e., it is the value of θ̂ that minimizes the Bayes risk.
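To make the distinction between risk and Bayes risk concrete, the following Python snippet estimates the Bayes risk of two candidate estimators by Monte Carlo. It is a minimal sketch, not part of the original notes: the scalar Gaussian model θ ~ N(0, σ_θ²), x|θ ~ N(θ, σ_x²) and the two estimators are assumptions chosen for illustration.

```python
import numpy as np

# Assumed scalar model (illustration only): theta ~ N(0, s_th^2), x | theta ~ N(theta, s_x^2)
rng = np.random.default_rng(0)
s_th, s_x = 2.0, 1.0
M = 200_000

theta = s_th * rng.standard_normal(M)      # Mother Nature draws theta ~ p(theta)
x = theta + s_x * rng.standard_normal(M)   # measurement x ~ f(x | theta)

def bayes_risk(estimator):
    """Monte Carlo estimate of the Bayes risk E_theta E_x [theta - est(x)]^2 (quadratic loss)."""
    return np.mean((theta - estimator(x)) ** 2)

shrink = s_th**2 / (s_th**2 + s_x**2)      # posterior-mean weight for this Gaussian model
print(bayes_risk(lambda x: x))             # ~ s_x^2 = 1.0
print(bayes_risk(lambda x: shrink * x))    # ~ s_th^2 s_x^2 / (s_th^2 + s_x^2) = 0.8, smaller
```

The shrinkage rule is the posterior mean for this model, and accordingly it attains the smaller Bayes risk.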
The Bayes risk estimator is a rule for mapping the observations x into estimates θ̂_B(x). It depends on the conditional distribution of the measurements and on the prior distribution of the parameter. When this prior is not known, the mini-max principle may be used.

Mini-max estimator: Suppose an experimentalist (E) chooses an estimate and Mother Nature (M) is allowed to choose her prior after the experimentalist has made his/her choice. If Mother Nature does not like the experimentalist, she will try to maximize the average risk for any choice θ̂:

    max_p R(p, θ̂).

We can turn this into a game between M and E by allowing E to observe the resulting average risk and permitting him/her to choose a decision rule that minimizes this maximum average risk:

    min_θ̂ max_p R(p, θ̂).

The estimator that does this is called the mini-max estimator:

    θ̂_mm = arg min_θ̂ max_p R(p, θ̂).

There are other variants of this setup, and they lead to very fundamental questions in game theory.

II. COMPUTING BAYES RISK ESTIMATORS

Recall that the Bayes risk is given by

    R(p, θ̂) = ∫∫ L(θ, θ̂(x)) f(x, θ) dx dθ,

where f(x, θ) = f(x|θ) p(θ). From Bayes rule we have f(x, θ) = f(θ|x) f(x), where f(θ|x) is the posterior density of θ given x and f(x) is the marginal density for x:

    f(θ|x) = f(x|θ) p(θ) / f(x),
    f(x) = ∫ f(x, θ) dθ = ∫ f(x|θ) p(θ) dθ.

There is an important physical interpretation of the first formula. The prior density is mapped to the posterior density by the ratio of the conditional measurement density to the marginal density:

    p(θ)  --x-->  f(θ|x),
i.e., the data x is used to map the prior into the posterior. The Bayes risk estimator is thus

    θ̂_B(x) = arg min_θ̂ ∫∫ L(θ, θ̂(x)) f(x, θ) dx dθ
            = arg min_θ̂ ∫∫ L(θ, θ̂(x)) f(θ|x) f(x) dx dθ
            = arg min_θ̂ ∫ ( ∫ L(θ, θ̂(x)) f(θ|x) dθ ) f(x) dx
            = arg min_θ̂ ∫ L(θ, θ̂(x)) f(θ|x) dθ    (conditional Bayes risk),

since the marginal density f(x) is non-negative. The result says that the Bayes risk estimator is the estimator that minimizes the conditional risk; the conditional risk is the loss averaged over the conditional distribution of θ given x. Now, to compute a particular estimator, we need to consider some typical loss functions.

Quadratic loss function: When the loss function is quadratic, L(θ, θ̂(x)) = [θ − θ̂(x)]ᵀ[θ − θ̂(x)], we may write the conditional Bayes risk as

    ∫ L(θ, θ̂(x)) f(θ|x) dθ = ∫ [θ − θ̂(x)]ᵀ[θ − θ̂(x)] f(θ|x) dθ.

The gradient of this risk with respect to θ̂ is

    ∂/∂θ̂ ∫ [θ − θ̂(x)]ᵀ[θ − θ̂(x)] f(θ|x) dθ = ∫ ∂/∂θ̂ ( [θ − θ̂(x)]ᵀ[θ − θ̂(x)] ) f(θ|x) dθ = −2 ∫ [θ − θ̂(x)] f(θ|x) dθ,

and the second derivative (the Hessian) is 2I > 0, so the stationary point is a minimum. Setting the gradient to zero,

    −2 ∫ [θ − θ̂(x)] f(θ|x) dθ = 0  ⟹  ∫ θ f(θ|x) dθ = ∫ θ̂(x) f(θ|x) dθ = θ̂(x),

so the Bayes risk estimator becomes

    θ̂_B(x) = ∫ θ f(θ|x) dθ = E(θ|x).
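This result is easy to check numerically. The sketch below (with an assumed Gaussian posterior on a grid; the numbers are arbitrary) tabulates the conditional Bayes risk ∫ (θ − c)² f(θ|x) dθ over candidate estimates c and confirms that the minimizer coincides with the posterior mean.

```python
import numpy as np

# Assumed posterior f(theta | x) on a grid (illustration only)
theta = np.linspace(-5, 5, 2001)
post = np.exp(-0.5 * (theta - 1.3) ** 2 / 0.7)   # unnormalized N(1.3, 0.7) posterior
post /= np.trapz(post, theta)                    # normalize so it integrates to 1

# Conditional Bayes risk of each candidate estimate c under quadratic loss
cands = np.linspace(-5, 5, 2001)
risk = [np.trapz((theta - c) ** 2 * post, theta) for c in cands]

best = cands[np.argmin(risk)]
mean = np.trapz(theta * post, theta)
print(best, mean)   # both ~ 1.3: the minimizer is the posterior mean E(theta | x)
```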
We say that the Bayes risk estimator under the quadratic loss function is the conditional mean of θ given x. In a nutshell, Bayes estimation under quadratic loss comes down to computing the mean of the conditional density f(θ|x). Nonlinear filtering is a generic term for this calculation, because in general the result is a nonlinear function of the measurement x.

Uniform loss function: Assume that the loss function is

    L(θ, θ̂(x)) = 0 if |θ − θ̂(x)| ≤ ε,  and  1 if |θ − θ̂(x)| > ε,

where ε > 0. Under this loss function, the expected posterior loss becomes

    E[L(θ, θ̂(x)) | x] = 1 · P(|θ − θ̂(x)| > ε) + 0 · P(|θ − θ̂(x)| ≤ ε)
                      = 1 − P(|θ − θ̂(x)| ≤ ε)
                      = 1 − ∫_{θ̂(x)−ε}^{θ̂(x)+ε} f(θ|x) dθ.

The above is minimized when the negative term is maximized:

    θ̂(x) = arg max_θ̂ ∫_{θ̂−ε}^{θ̂+ε} f(θ|x) dθ.

In the limit ε → 0, the above becomes

    lim_{ε→0} θ̂(x) = arg max_θ f(θ|x),

which is the MAP estimator.
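The limit can be seen numerically with a sketch like the one below. The skewed posterior is an assumption, chosen so that mean ≠ mode: maximizing the posterior mass in a window of half-width ε and shrinking ε moves the maximizer to the posterior mode.

```python
import numpy as np

# Assumed skewed posterior on a grid (illustration only): a Gamma(3, 1) shape
theta = np.linspace(0.001, 15, 4001)
post = theta**2 * np.exp(-theta)        # unnormalized Gamma(3, 1); mode = 2, mean = 3
post /= np.trapz(post, theta)

def window_estimate(eps):
    """Maximize P(|theta - c| <= eps | x) over candidate centers c on the grid."""
    mass = []
    for c in theta:
        sel = np.abs(theta - c) <= eps
        mass.append(np.trapz(post[sel], theta[sel]))
    return theta[np.argmax(mass)]

for eps in (2.0, 0.5, 0.05):
    print(eps, window_estimate(eps))    # approaches the mode (2.0) as eps -> 0
```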
Lecture 14: Wednesday

Example 1: A radioactive source emits n radioactive particles and an imperfect Geiger counter records k ≤ n of them. Our problem is to estimate n from the measurement k. We assume that n is drawn from a Poisson distribution with known parameter λ:

    P[n] = e^(−λ) λ^n / n!,   n ≥ 0.

The Poisson distribution characterizes the rate of emission of a process in a given interval of time (or space). It is likely that we see a large n when the expected number of occurrences λ is high, and a small n when λ is small. One can show that E[n] = λ and E[(n − E[n])^2] = λ.

The number of recorded counts follows a binomial distribution:

    P[k|n] = C(n, k) p^k (1−p)^(n−k),   0 ≤ k ≤ n,
    E[k|n] = np,   var[k|n] = np(1−p),

where C(n, k) is the binomial coefficient. The binomial distribution is the sum of i.i.d. Bernoulli trials: suppose a random variable is 1 with probability p and 0 with probability 1−p; then the binomial distribution characterizes the total number of 1's we may observe over n trials.

In order to proceed with the Bayesian analysis, we need to compute the posterior distribution of n given k:

    P[n|k] = P[n, k] / P[k],

which requires the joint and the marginal distributions. We have

    P[n, k] = P[k|n] P[n] = C(n, k) p^k (1−p)^(n−k) e^(−λ) λ^n / n!,   0 ≤ k ≤ n,  n ≥ 0.

The marginal of k is

    P[k] = Σ_{n≥k} C(n, k) p^k (1−p)^(n−k) e^(−λ) λ^n / n!
         = e^(−λ) (λp)^k / k! · Σ_{n≥k} (λ(1−p))^(n−k) / (n−k)!
         = e^(−λ) (λp)^k / k! · e^(λ(1−p))
         = e^(−λp) (λp)^k / k!,

which is Poisson with rate λp.
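This thinning result is easy to check by simulation; the following sketch uses arbitrary assumed values λ = 6 and p = 0.4.

```python
import numpy as np

# Monte Carlo check (illustration only) that k is Poisson with rate lam * p
rng = np.random.default_rng(1)
lam, p, M = 6.0, 0.4, 200_000

n = rng.poisson(lam, size=M)        # Mother Nature: number of emitted particles
k = rng.binomial(n, p)              # imperfect Geiger counter records k <= n of them

print(k.mean(), k.var())            # both ~ lam * p = 2.4, consistent with Poisson(lam * p)
```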
Now the posterior is

    P[n|k] = P[n, k] / P[k]
           = [ n!/(k!(n−k)!) p^k (1−p)^(n−k) e^(−λ) λ^n / n! ] / [ e^(−λp) (λp)^k / k! ]
           = (λ(1−p))^(n−k) e^(−λ(1−p)) / (n−k)!,   n ≥ k,

which looks like a Poisson distribution except that n starts from k instead of 0. This has been called the Poisson distribution with displacement k. The conditional mean is

    E[n|k] = Σ_{n≥k} n (λ(1−p))^(n−k) e^(−λ(1−p)) / (n−k)!
           = Σ_{n≥k} (n−k+k) (λ(1−p))^(n−k) e^(−λ(1−p)) / (n−k)!
           = Σ_{n≥k} (n−k) (λ(1−p))^(n−k) e^(−λ(1−p)) / (n−k)! + k Σ_{n≥k} (λ(1−p))^(n−k) e^(−λ(1−p)) / (n−k)!
           = λ(1−p) + k,

and the conditional variance is

    E[(n − E[n|k])^2 | k] = λ(1−p)   (Exercise).

When the loss function is quadratic, the optimal Bayes estimator is the conditional mean, and thus

    n̂_B = E[n|k] = λ(1−p) + k.

The Bayes estimate is k when p = 1, independent of the expected number of occurrences λ; since our measurement model is binomial, we can show that P(k = n | n) = 1 when p = 1. Similarly, when p = 0, i.e., we see no observations almost surely, the Bayes estimate is λ, the expected number of occurrences. For 0 < p < 1, the Bayes estimate optimally combines the two extremes. We can also think of λ(1−p) as the expected number of missed counts; in this sense the Bayes estimate applies a correction to include the missed counts.

One can easily show that E[n̂_B] = λ = E[n], i.e., the estimate is unbiased. However, it is not conditionally unbiased:

    E[n̂_B | n] = E[k|n] + λ(1−p) = np + λ(1−p) ≠ n.

The mean squared error in the estimator is

    E[(n − n̂_B)^2] = E_k( E[(n − n̂_B)^2 | k] ) = E_k( E[(n − E(n|k))^2 | k] ) = E_k( λ(1−p) ) = λ(1−p).
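A short simulation (a sketch with the same assumed λ = 6 and p = 0.4 as above) confirms both the unbiasedness and the mean squared error λ(1−p).

```python
import numpy as np

# Simulation check (illustration only) of the Bayes estimate n_B = lam*(1-p) + k
rng = np.random.default_rng(2)
lam, p, M = 6.0, 0.4, 500_000

n = rng.poisson(lam, size=M)
k = rng.binomial(n, p)
n_B = lam * (1 - p) + k             # Bayes estimate under quadratic loss

print(n_B.mean(), n.mean())         # both ~ lam = 6: the estimate is unbiased on average
print(np.mean((n - n_B) ** 2))      # ~ lam*(1-p) = 3.6: the mean squared error
```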
III. MULTIVARIATE NORMAL

Let x and y be jointly distributed according to the normal distribution:

    [x; y] ~ N( [0; 0], [R_xx, R_xy; R_yx, R_yy] ).

Recall that the marginals are also normal, i.e.,

    x ~ N(0, R_xx),   y ~ N(0, R_yy),

where R_xx = E[xxᵀ] and so on. It can be shown that

    y|x ~ N( R_yx R_xx⁻¹ x,  R_yy − R_yx R_xx⁻¹ R_xy ),
    x|y ~ N( R_xy R_yy⁻¹ y,  R_xx − R_xy R_yy⁻¹ R_yx ).

Hence the optimal Bayes estimate under quadratic loss is the mean of the posterior, i.e.,

    x̂_B = R_xy R_yy⁻¹ y.

We can think of this as Mother Nature generating x from p(x) = N(0, R_xx) and Father Nature generating a measurement from f(y|x), which is also normal. What function relating y to x will result in the above f(y|x)? Recalling that the sum of two normal random variables is also normal, note that

    y = Hx + r,   with H = R_yx R_xx⁻¹ and r ~ N(0, Q) statistically independent of x,

results in the above f(y|x). In other words, we can generate the jointly normal x and y by two statistically independent normal random vectors, x ~ N(0, R_xx) and r ~ N(0, Q), and by relating y to x as above. In generating this signal-plus-measurement model (x being a signal and y = Hx + r being the measurement), we define one new matrix, Q; R_yy is then directly given by R_xx and Q. Clearly R_xy = R_yxᵀ. Show that

    R_yy = R_yx R_xx⁻¹ R_xy + Q.

In short, one can generate a jointly normal random process from two independent normal processes and a linear map.
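In numpy, the conditional-mean formulas take a few lines. The sketch below uses assumed covariances (the numbers are arbitrary) and builds the joint pair through the y = Hx + r construction just described.

```python
import numpy as np

# Illustration (not from the notes): conditional mean/covariance of x given y
# Build a jointly normal pair via x ~ N(0, Rxx), r ~ N(0, Q), y = H x + r
Rxx = np.array([[2.0, 0.5], [0.5, 1.0]])
H = np.array([[1.0, 0.0], [1.0, 1.0]])
Q = 0.3 * np.eye(2)

Ryx = H @ Rxx                       # E[y x^T] = H Rxx
Rxy = Ryx.T                         # E[x y^T]
Ryy = H @ Rxx @ H.T + Q             # matches Ryy = Ryx Rxx^{-1} Rxy + Q

y = np.array([1.0, -0.5])           # an assumed observed measurement
x_B = Rxy @ np.linalg.solve(Ryy, y)                 # E(x | y) = Rxy Ryy^{-1} y
P = Rxx - Rxy @ np.linalg.solve(Ryy, Rxy.T)         # cov(x | y)
print(x_B, P, sep="\n")
```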
IV. LINEAR STATISTICAL MODEL

Consider the following signal-plus-noise model:

    y = Hx + n,

where x ~ N(0, R_xx) and n ~ N(0, R_nn) are statistically independent. The correlation between x and y is

    R_yx = E[yxᵀ] = E[(Hx + n)xᵀ] = H R_xx,

and the covariance of y is

    E[yyᵀ] = H R_xx Hᵀ + R_nn.

Thus x and y are jointly normal:

    [x; y] ~ N( [0; 0], [R_xx, R_xx Hᵀ; H R_xx, H R_xx Hᵀ + R_nn] ).

Clearly, the Bayes estimate under quadratic loss is the conditional mean of x given y:

    x̂_B = R_xx Hᵀ (H R_xx Hᵀ + R_nn)⁻¹ y = G y,

with conditional covariance

    P = R_xx − R_xx Hᵀ (H R_xx Hᵀ + R_nn)⁻¹ H R_xx = R_xx − G H R_xx.

From the matrix inversion lemma, note that

    P = (R_xx⁻¹ + Hᵀ R_nn⁻¹ H)⁻¹,   i.e.,   P⁻¹ = R_xx⁻¹ + Hᵀ R_nn⁻¹ H.

Then

    G H R_xx = R_xx − P = P (P⁻¹ R_xx − I) = P (I + Hᵀ R_nn⁻¹ H R_xx − I) = P Hᵀ R_nn⁻¹ H R_xx,

which gives

    G = P Hᵀ R_nn⁻¹.

Hence the estimator can be rewritten as

    x̂_B = P Hᵀ R_nn⁻¹ y,   with P = (R_xx⁻¹ + Hᵀ R_nn⁻¹ H)⁻¹.
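The equivalence of the two gain expressions is easy to verify numerically; below is a sketch with randomly generated R_xx, H, and R_nn (the dimensions and seed are arbitrary).

```python
import numpy as np

# Numerical check (illustration only) that the two estimator forms coincide:
#   G = Rxx H^T (H Rxx H^T + Rnn)^{-1}  and  G = P H^T Rnn^{-1},
# with P = (Rxx^{-1} + H^T Rnn^{-1} H)^{-1} from the matrix inversion lemma.
rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
Rxx = A @ A.T + 3 * np.eye(3)       # a random symmetric positive-definite prior covariance
H = rng.standard_normal((2, 3))
Rnn = 0.5 * np.eye(2)

G1 = Rxx @ H.T @ np.linalg.inv(H @ Rxx @ H.T + Rnn)
P = np.linalg.inv(np.linalg.inv(Rxx) + H.T @ np.linalg.inv(Rnn) @ H)
G2 = P @ H.T @ np.linalg.inv(Rnn)

print(np.allclose(G1, G2))                          # True: the two gains agree
print(np.allclose(P, Rxx - G1 @ H @ Rxx))           # True: the two covariance forms agree
```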
V. SEQUENTIAL BAYES

The results of the previous section may be used to derive recursive estimates of the random vector x when the measurement vector y_t = [y_0, y_1, ..., y_t]ᵀ increases in dimension with time. The basic idea is to write

    y_t = H_t x + n_t,   i.e.,   [y_{t−1}; y_t] = [H_{t−1}; c_tᵀ] x + [n_{t−1}; n_t],

i.e., the k-th measurement can be written as

    y_k = c_kᵀ x + n_k,

where x ~ N(0, R_xx) and n_t ~ N(0, R_t) are statistically independent, with R_t diagonal with elements r_kk on the diagonal; R_0 = r_00. This means that

    R_t = [R_{t−1}, 0; 0, r_tt],   R_t⁻¹ = [R_{t−1}⁻¹, 0; 0, r_tt⁻¹].

The joint distribution of x and y_t is

    [x; y_t] ~ N( [0; 0], [R_xx, R_xx H_tᵀ; H_t R_xx, H_t R_xx H_tᵀ + R_t] ).

The posterior is

    x | y_t ~ N(x̂_t, P_t),   x̂_t = P_t H_tᵀ R_t⁻¹ y_t,   P_t⁻¹ = R_xx⁻¹ + H_tᵀ R_t⁻¹ H_t.

The dimensions of H_t, R_t, and y_t increase in time, whereas those of x̂_t and P_t are fixed. How can we make the estimate equations recursive?
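One standard answer, sketched here as an assumption about what follows (the notes stop at the question): keep the fixed-size information quantities P_t⁻¹ and z_t = H_tᵀ R_t⁻¹ y_t. Because R_t is diagonal, H_tᵀ R_t⁻¹ H_t = Σ_k c_k c_kᵀ / r_kk and H_tᵀ R_t⁻¹ y_t = Σ_k c_k y_k / r_kk, so each new measurement adds one rank-one term and one vector term:

    P_t⁻¹ = P_{t−1}⁻¹ + c_t c_tᵀ / r_tt,   z_t = z_{t−1} + c_t y_t / r_tt,   x̂_t = P_t z_t.

```python
import numpy as np

# A sketch (one standard answer, assumed rather than taken from these notes) of the
# information-form recursion: accumulate Pinv and z = H_t^T R_t^{-1} y_t term by term.
rng = np.random.default_rng(5)
d, T = 3, 50
Rxx = np.eye(d)
x = rng.standard_normal(d)                  # the fixed random vector to be estimated

Pinv = np.linalg.inv(Rxx)                   # prior information matrix, P_0^{-1} before data
z = np.zeros(d)                             # prior information vector

for t in range(T):
    c = rng.standard_normal(d)              # c_t: regressor for the t-th measurement
    r = 0.5                                 # r_tt: measurement-noise variance
    y = c @ x + np.sqrt(r) * rng.standard_normal()
    Pinv += np.outer(c, c) / r              # P_t^{-1} = P_{t-1}^{-1} + c_t c_t^T / r_tt
    z += c * y / r                          # z_t = z_{t-1} + c_t y_t / r_tt
    x_hat = np.linalg.solve(Pinv, z)        # x_hat_t = P_t z_t

print(x, x_hat, sep="\n")                   # the estimate approaches x as t grows
```

Each update costs a fixed amount of work regardless of t, which is exactly what the question asks for; the Kalman filter organizes the same recursion in covariance form.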