Methods of Estimation


1 Methods of Estimation
MIT 18.655, Dr. Kempthorne, Spring 2016

2 Outline
1 Methods of Estimation I

3 Minimum-Contrast Estimates
Data $X \in \mathcal{X}$, $X \sim P \in \mathcal{P} = \{P_\theta, \theta \in \Theta\}$.
Problem: finding a function $\hat\theta(x)$ which is close to $\theta$.
Consider $\rho : \mathcal{X} \times \Theta \to \mathbb{R}$ and define
$$D(\theta_0, \theta) = E_{\theta_0}\,\rho(X, \theta)$$
to measure the discrepancy between $\theta$ and the true value $\theta_0$. As a discrepancy measure, $D$ makes sense if the value of $\theta$ minimizing the function is $\theta = \theta_0$. If $P_{\theta_0}$ were true, and we knew $D(\theta_0, \theta)$, we could obtain $\theta_0$ as the minimizer.
Instead of observing $D(\theta_0, \theta)$, we observe $\rho(X, \theta)$:
$\rho(\cdot, \cdot)$ is a contrast function, and its minimizer $\hat\theta(X)$ is a minimum-contrast estimate.

4 The definition extends to Euclidean $\Theta \subset \mathbb{R}^d$, with $\theta_0$ an interior point of $\Theta$.
Smooth mapping: $\theta \mapsto D(\theta_0, \theta)$. Then $\theta = \theta_0$ solves
$$\nabla_\theta D(\theta_0, \theta) = 0, \quad \text{where } \nabla_\theta = \left(\tfrac{\partial}{\partial\theta_1}, \ldots, \tfrac{\partial}{\partial\theta_d}\right)^T.$$
Substitute $\rho(\mathbf{x}, \theta)$ for $D(\theta_0, \theta)$ and solve $\nabla_\theta\,\rho(\mathbf{x}, \theta) = 0$ at $\theta = \hat\theta$.

5 Estimating Equations: $\Psi : \mathcal{X} \times \mathbb{R}^d \to \mathbb{R}^d$, where $\Psi = (\psi_1, \ldots, \psi_d)^T$.
For every $\theta_0 \in \Theta$, the expectation of $\Psi$ under $P_{\theta_0}$,
$$V(\theta_0, \theta) = E_{\theta_0}[\Psi(X, \theta)],$$
has $V(\theta_0, \theta) = 0$ with unique solution $\theta = \theta_0$; the corresponding estimate $\hat\theta(X)$ solves the estimating equation $\Psi(X, \theta) = 0$.
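As a concrete numerical illustration of the estimating-equation idea (my sketch, not from the slides): take $\Psi(X, \theta) = \frac{1}{n}\sum_i \psi(X_i - \theta)$ with a Huber-type $\psi$, so that $E_{\theta_0}[\Psi(X, \theta)] = 0$ at $\theta = \theta_0$ for symmetric noise, and solve $\Psi(x, \theta) = 0$ in $\theta$ numerically. The data, the cutoff $c$, and the root-finder are all illustrative choices.

```python
import numpy as np
from scipy.optimize import brentq

def psi(u, c=1.345):
    """Huber-type psi: identity near 0, clipped at +/- c (c is an arbitrary cutoff)."""
    return np.clip(u, -c, c)

def estimating_equation(theta, x):
    """Sample version of E_theta0[Psi(X, theta)]: (1/n) sum_i psi(x_i - theta)."""
    return np.mean(psi(x - theta))

rng = np.random.default_rng(0)
x = 2.0 + rng.standard_t(df=3, size=500)   # hypothetical heavy-tailed data, true theta_0 = 2

# The equation is monotone decreasing in theta, so the root is bracketed
# by the sample extremes.
theta_hat = brentq(estimating_equation, x.min(), x.max(), args=(x,))
print(theta_hat)   # close to 2.0
```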

6 Example: Least Squares. $\mu(z) = g(\beta, z)$, $\beta \in \mathbb{R}^d$.
$\mathbf{x} = \{(z_i, Y_i) : 1 \le i \le n\}$, where $Y_1, \ldots, Y_n$ are independent.
Define $\rho(\mathbf{x}, \beta) = \|Y - \mu\|^2 = \sum_{i=1}^n [Y_i - g(\beta, z_i)]^2$.
Consider $Y_i = \mu(z_i) + E_i$, where $\mu(z_i) = g(\beta, z_i)$ and the $E_i$ are iid $N(0, \sigma_0^2)$. Then $\beta$ parametrizes the model and we can write:
$$D(\beta_0, \beta) = E_{\beta_0}\,\rho(\mathbf{X}, \beta) = n\sigma_0^2 + \sum_{i=1}^n [g(\beta_0, z_i) - g(\beta, z_i)]^2.$$
This is minimized by $\beta = \beta_0$, and uniquely so iff $\beta$ is identifiable.

7 The least-squares estimate $\hat\beta$ minimizes $\rho(\mathbf{x}, \beta)$.
Conditions that guarantee the existence of $\hat\beta$:
Continuity of $g(\cdot, z_i)$.
The minimum of $\rho(\mathbf{x}, \cdot)$ being attained on a compact set of $\beta$, e.g., when $\lim_{\|\beta\| \to \infty} |g(\beta, z_i)| = \infty$.
If $g(\beta, z_i)$ is differentiable in $\beta$, then $\hat\beta$ satisfies the Normal Equations, obtained by taking partial derivatives of $\rho(\mathbf{x}, \beta) = \|Y - \mu\|^2 = \sum_{i=1}^n [Y_i - g(\beta, z_i)]^2$ and solving:
$$\frac{\partial\rho(\mathbf{x}, \beta)}{\partial\beta_j} = 0, \quad j = 1, \ldots, d.$$

8 $\rho(\mathbf{x}, \beta) = \|Y - \mu\|^2 = \sum_{i=1}^n [Y_i - g(\beta, z_i)]^2$. Solve:
$$\frac{\partial\rho(\mathbf{x}, \beta)}{\partial\beta_j} = 0$$
$$\sum_{i=1}^n 2\,[Y_i - g(\beta, z_i)]\,(-1)\,\frac{\partial g(\beta, z_i)}{\partial\beta_j} = 0$$
$$\sum_{i=1}^n Y_i\,\frac{\partial g(\beta, z_i)}{\partial\beta_j} - \sum_{i=1}^n g(\beta, z_i)\,\frac{\partial g(\beta, z_i)}{\partial\beta_j} = 0$$

9 Linear case: $g(\beta, z_i) = \sum_{j=1}^d z_{ij}\beta_j = z_i^T\beta$. Setting $\partial\rho(\mathbf{x}, \beta)/\partial\beta_j = 0$, the normal equations
$$\sum_{i=1}^n Y_i\,\frac{\partial g(\beta, z_i)}{\partial\beta_j} - \sum_{i=1}^n g(\beta, z_i)\,\frac{\partial g(\beta, z_i)}{\partial\beta_j} = 0$$
become
$$\sum_{i=1}^n z_{ij}Y_i - \sum_{i=1}^n z_{ij}\,(z_i^T\beta) = 0$$
$$\sum_{i=1}^n z_{ij}Y_i - \sum_{k=1}^d \left(\sum_{i=1}^n z_{ij}z_{ik}\right)\beta_k = 0, \quad j = 1, \ldots, d,$$
i.e.,
$$\mathbf{Z}_D^T\mathbf{Y} - \mathbf{Z}_D^T\mathbf{Z}_D\,\beta = 0,$$
where $\mathbf{Z}_D$ is the $(n \times d)$ design matrix with $(i, j)$ element $z_{i,j}$.
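A minimal numerical sketch of the linear case (mine, not the lecture's): simulate $(\mathbf{Z}_D, \mathbf{Y})$ and solve the normal equations $\mathbf{Z}_D^T\mathbf{Z}_D\,\beta = \mathbf{Z}_D^T\mathbf{Y}$, either directly or via a QR-based least-squares routine. The dimensions and the true $\beta$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 3
Z = rng.normal(size=(n, d))            # design matrix Z_D with (i, j) entry z_ij
beta_true = np.array([1.0, -2.0, 0.5])
Y = Z @ beta_true + rng.normal(scale=0.3, size=n)

# Solve Z^T Z beta = Z^T Y directly ...
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)

# ... or, more stably, via a least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(Z, Y, rcond=None)
print(beta_hat, beta_lstsq)            # both close to beta_true
```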

10 Note: Least Squares exemplifies both the minimum-contrast and the estimating-equation methodology. Distributional assumptions are not necessary to motivate the estimate as a mathematical approximation.

11 Method of Moments
$X_1, \ldots, X_n$ iid $X \sim P_\theta$, $\theta \in \Theta \subset \mathbb{R}^d$.
$\mu_1(\theta), \mu_2(\theta), \ldots, \mu_d(\theta)$: $\mu_j(\theta) = \mu_j = E[X^j \mid \theta]$, the $j$th moment of $X$.
Sample moments: $\hat\mu_j = \frac{1}{n}\sum_{i=1}^n X_i^j$, $j = 1, \ldots, d$.
Method of Moments: solve for $\theta$ in the system of equations
$$\mu_1(\theta) = \hat\mu_1, \quad \mu_2(\theta) = \hat\mu_2, \quad \ldots, \quad \mu_d(\theta) = \hat\mu_d.$$
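A hedged worked example (not from the slides): method of moments for a $\mathrm{Gamma}(\text{shape}=a,\ \text{scale}=s)$ sample, matching the first two moments $\mu_1 = as$ and $\mu_2 = as^2 + (as)^2$ and solving the two equations for $(a, s)$.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.gamma(shape=3.0, scale=2.0, size=2000)   # hypothetical sample

m1 = np.mean(x)        # hat mu_1
m2 = np.mean(x**2)     # hat mu_2

# mu_2 - mu_1^2 = a*s^2 and mu_1 = a*s, so:
s_hat = (m2 - m1**2) / m1          # scale = Var(X) / E(X)
a_hat = m1 / s_hat                 # shape = E(X) / scale
print(a_hat, s_hat)                # close to (3.0, 2.0)
```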

12 Notes:
$\theta$ must be identifiable.
Existence of $\mu_j$: $\lim_{n\to\infty} \hat\mu_j = \mu_j$ with $|\mu_j| < \infty$.
If $q(\theta) = h(\mu_1, \ldots, \mu_d)$, then the Method-of-Moments estimate of $q(\theta)$ is $\hat q(\theta) = h(\hat\mu_1, \ldots, \hat\mu_d)$.
The MOM estimate of $\theta$ may not be unique! (See Problem .)

13 Plug-In and Extension Principles
Frequency Plug-In. Multinomial sample: $X_1, \ldots, X_n$ with $K$ values $v_1, \ldots, v_K$ and
$$P(X_i = v_j) = p_j, \quad j = 1, \ldots, K.$$
Plug-in estimates: $\hat p_j = N_j/n$, where $N_j = \#\{i : X_i = v_j\}$.
Apply to any function $q(p_1, \ldots, p_K)$: $\hat q = q(\hat p_1, \ldots, \hat p_K)$.
This is equivalent to substituting the empirical distribution function
$$\hat P(t) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{x_i \le t\}$$
for the true distribution function $P_\theta(t) = P(X \le t \mid \theta)$ underlying an iid sample.
$\hat P$ is an estimate of $P$, and $\nu(\hat P)$ is an estimate of $\nu(P)$.
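A tiny sketch of frequency plug-in (my example): simulate a multinomial sample, form $\hat p_j = N_j/n$, and plug into a continuous function $q(p_1, \ldots, p_K)$ — here the Shannon entropy, an arbitrary illustrative choice of $q$.

```python
import numpy as np

rng = np.random.default_rng(10)
p = np.array([0.2, 0.5, 0.3])                 # true cell probabilities
x = rng.choice(len(p), size=1000, p=p)        # X_i taking values v_1..v_K coded 0,1,2

counts = np.bincount(x, minlength=len(p))     # N_j
p_hat = counts / counts.sum()                 # frequency plug-in estimates

def q(prob):
    """An arbitrary continuous function of (p_1,...,p_K): Shannon entropy."""
    return -np.sum(prob * np.log(prob))

print(q(p_hat), q(p))                         # plug-in estimate vs. truth
```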

14 Example: $\alpha$th population quantile, $0 < \alpha < 1$:
$$\nu_\alpha(P) = \tfrac{1}{2}\left[F^{-1}(\alpha) + F_U^{-1}(\alpha)\right],$$
where $F^{-1}(\alpha) = \inf\{x : F(x) \ge \alpha\}$ and $F_U^{-1}(\alpha) = \sup\{x : F(x) \le \alpha\}$.
The plug-in estimate is $\hat\nu_\alpha(P) = \nu_\alpha(\hat P) = \tfrac{1}{2}\left[\hat F^{-1}(\alpha) + \hat F_U^{-1}(\alpha)\right]$.
Example: Method-of-Moments estimate of the $j$th moment: $\nu(P) = \mu_j = E(X^j)$ and $\hat\nu(P) = \nu(\hat P) = \hat\mu_j = \frac{1}{n}\sum_{i=1}^n x_i^j$.
Extension Principle
Objective: estimate $q(\theta)$, a function of $\theta$. Assume $q(\theta) = h(p_1(\theta), \ldots, p_K(\theta))$, where $h(\cdot)$ is continuous. The extension principle estimates $q(\theta)$ with $\hat q(\theta) = h(\hat p_1, \ldots, \hat p_K)$.
$h(\cdot)$ may not be unique: which $h(\cdot)$ is optimal?
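The quantile definition on this slide translates directly into code on the order statistics: $\hat F^{-1}(\alpha)$ is the $\lceil n\alpha\rceil$-th order statistic, and $\hat F_U^{-1}(\alpha)$ jumps one order statistic higher exactly when $n\alpha$ is an integer (so the sample median for even $n$ averages the two middle values). A minimal sketch under these assumptions:

```python
import numpy as np

def plugin_quantile(x, alpha):
    """Plug-in quantile nu_alpha(P-hat) = (F-hat^{-1}(alpha) + F-hat_U^{-1}(alpha)) / 2
    for the empirical CDF of the sample x, with 0 < alpha < 1."""
    xs = np.sort(x)
    n = len(xs)
    k = alpha * n
    lower = xs[int(np.ceil(k)) - 1]   # F-hat^{-1}(alpha): order statistic ceil(n*alpha)
    if k == int(k):                   # F-hat hits alpha exactly: sup is one order statistic up
        upper = xs[int(k)]
    else:
        upper = lower
    return 0.5 * (lower + upper)

rng = np.random.default_rng(3)
x = rng.normal(size=1000)
print(plugin_quantile(x, 0.5))        # close to the true median 0
```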

15 Notes on Method-of-Moments / Frequency Plug-In Estimates
Easy to compute.
Valuable as initial estimates in iterative algorithms.
Consistent estimates (close to the true parameter in large samples).
The best frequency plug-in estimates are maximum-likelihood estimates.
In some cases, MOM estimators are foolish (see Example 2.1.7).

16 Outline
1 Methods of Estimation I

17 Least Squares. General Model: Only $Y$ Random
$\mathbf{X} = \{(z_i, Y_i) : 1 \le i \le n\}$, where $Y_1, \ldots, Y_n$ are independent and $z_1, \ldots, z_n \in \mathbb{R}^d$ are fixed, non-random.
For cases $i = 1, \ldots, n$: $Y_i = \mu(z_i) + E_i$, where $\mu(z) = g(\beta, z)$, $\beta \in \mathbb{R}^d$, and the $E_i$ are independent with $E[E_i] = 0$.
The least-squares contrast function is
$$\rho(\mathbf{x}, \beta) = \|Y - \mu\|^2 = \sum_{i=1}^n [Y_i - g(\beta, z_i)]^2.$$
$\beta$ parametrizes the model and we can write the discrepancy function $D(\beta_0, \beta) = E_{\beta_0}\,\rho(\mathbf{X}, \beta)$.

18 Least Squares: Only $Y$ Random
Contrast function: $\rho(\mathbf{x}, \beta) = \|Y - \mu\|^2 = \sum_{i=1}^n [Y_i - g(\beta, z_i)]^2$.
Discrepancy function:
$$D(\beta_0, \beta) = E_{\beta_0}\,\rho(\mathbf{X}, \beta) = \sum_{i=1}^n \mathrm{Var}(E_i) + \sum_{i=1}^n [g(\beta_0, z_i) - g(\beta, z_i)]^2.$$
The model is semiparametric, with unknown parameter $\beta$ and unknown (joint) distribution $P_e$ of $\mathbf{E} = (E_1, \ldots, E_n)$.
Gauss-Markov Assumptions. Assume that the distribution of $\mathbf{E}$ satisfies:
$E(E_i) = 0$, $\mathrm{Var}(E_i) = \sigma^2$, and $\mathrm{Cov}(E_i, E_j) = 0$ for $i \ne j$.

19 General Model: $(Y, Z)$ Both Random
$(Y_1, Z_1), \ldots, (Y_n, Z_n)$ are i.i.d. as $X = (Y, Z) \sim P$.
Define $\mu(z) = E[Y \mid Z = z] = g(\beta, z)$, where $g(\cdot, \cdot)$ is a known function and $\beta \in \mathbb{R}^d$ is an unknown parameter.
Given $Z_i = z_i$, define $E_i = Y_i - \mu(z_i)$ for $i = 1, \ldots, n$. Conditioning on the $z_i$ we can write:
$$Y_i = g(\beta, z_i) + E_i, \quad i = 1, 2, \ldots, n,$$
where $\mathbf{E} = (E_1, \ldots, E_n)$ has (joint) distribution $P_e$.
The least-squares estimate $\hat\beta$ is the plug-in estimate $\beta(\hat P)$, where $\hat P$ is the empirical distribution of the sample $\{(Z_i, Y_i), i = 1, \ldots, n\}$.
The function $g(\beta, z)$ can be linear or nonlinear in $\beta$ and $z$. Closed-form solutions for $\hat\beta$ exist when $g$ is linear in $\beta$.
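When $g$ is nonlinear in $\beta$ there is no closed form, and the contrast $\rho(\mathbf{x}, \beta)$ must be minimized numerically. A hedged sketch (the model $g(\beta, z) = \beta_1 e^{\beta_2 z}$, the data, and the starting point are my illustrative choices) using SciPy's least-squares solver:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(4)
z = np.linspace(0.0, 2.0, 50)
beta_true = np.array([1.5, -0.8])
y = beta_true[0] * np.exp(beta_true[1] * z) + rng.normal(scale=0.05, size=z.size)

def residuals(beta):
    """Residuals Y_i - g(beta, z_i); least_squares minimizes their sum of squares."""
    return y - beta[0] * np.exp(beta[1] * z)

fit = least_squares(residuals, x0=np.array([1.0, 0.0]))
print(fit.x)    # close to beta_true
```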

20 Outline
1 Methods of Estimation I

21 Gauss-Markov Theorem: Assumptions
Data $y = (y_1, y_2, \ldots, y_n)^T$ and the $(n \times p)$ matrix
$$X = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & & & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{pmatrix}$$
follow a linear model satisfying the Gauss-Markov Assumptions if $y$ is an observation of a random vector $Y = (Y_1, Y_2, \ldots, Y_n)^T$ and
$E(Y \mid X, \beta) = X\beta$, where $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is the $p$-vector of regression parameters;
$\mathrm{Cov}(Y \mid X, \beta) = \sigma^2 I_n$, for some $\sigma^2 > 0$.
I.e., the random variables generating the observations are uncorrelated and have constant variance $\sigma^2$ (conditional on $X$ and $\beta$).

22 For known constants $c_1, c_2, \ldots, c_p, c_{p+1}$, consider the problem of estimating
$$\theta = c_1\beta_1 + c_2\beta_2 + \cdots + c_p\beta_p + c_{p+1}.$$
Under the Gauss-Markov assumptions, the estimator
$$\hat\theta = c_1\hat\beta_1 + c_2\hat\beta_2 + \cdots + c_p\hat\beta_p + c_{p+1},$$
where $\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_p$ are the least-squares estimates, is
1) an Unbiased Estimator of $\theta$;
2) a Linear Estimator of $\theta$, that is, $\hat\theta = \sum_{i=1}^n b_i y_i$ for some constants $b_i$ that are known (given $X$).
Theorem: Under the Gauss-Markov Assumptions, the estimator $\hat\theta$ has the smallest ("Best") variance among all Linear Unbiased Estimators of $\theta$; i.e., $\hat\theta$ is BLUE.

23 Gauss-Markov Theorem: Proof
Proof: Without loss of generality, assume $c_{p+1} = 0$ and define $c = (c_1, c_2, \ldots, c_p)^T$.
The least-squares estimate of $\theta = c^T\beta$ is
$$\hat\theta = c^T\hat\beta = c^T(X^TX)^{-1}X^Ty \equiv d^Ty,$$
a linear estimate in $y$ given by the coefficients $d = (d_1, d_2, \ldots, d_n)^T$.
Consider an alternative linear estimate of $\theta$: $\tilde\theta = b^Ty$, with fixed coefficients given by $b = (b_1, \ldots, b_n)^T$.
Define $f = b - d$ and note that $\tilde\theta = b^Ty = (d + f)^Ty = \hat\theta + f^Ty$.
If $\tilde\theta$ is unbiased then, because $\hat\theta$ is unbiased,
$$0 = E(f^Ty) = f^TE(y) = f^T(X\beta) \text{ for all } \beta \in \mathbb{R}^p$$
$\Rightarrow$ $f$ is orthogonal to the column space of $X$
$\Rightarrow$ $f$ is orthogonal to $d = X(X^TX)^{-1}c$.

24 If $\tilde\theta$ is unbiased, then the orthogonality of $f$ to $d$ implies
$$\begin{aligned}
\mathrm{Var}(\tilde\theta) &= \mathrm{Var}(b^Ty) = \mathrm{Var}(d^Ty + f^Ty) \\
&= \mathrm{Var}(d^Ty) + \mathrm{Var}(f^Ty) + 2\,\mathrm{Cov}(d^Ty, f^Ty) \\
&= \mathrm{Var}(\hat\theta) + \mathrm{Var}(f^Ty) + 2\,d^T\mathrm{Cov}(y)\,f \\
&= \mathrm{Var}(\hat\theta) + \mathrm{Var}(f^Ty) + 2\,d^T(\sigma^2 I_n)\,f \\
&= \mathrm{Var}(\hat\theta) + \mathrm{Var}(f^Ty) + 2\sigma^2\,d^Tf \\
&= \mathrm{Var}(\hat\theta) + \mathrm{Var}(f^Ty) + 0 \\
&\ge \mathrm{Var}(\hat\theta).
\end{aligned}$$
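A numerical illustration of the proof (my code, not the slides'): build $d = X(X^TX)^{-1}c$, perturb it by an $f$ orthogonal to the column space of $X$, check that both coefficient vectors give unbiased estimates ($X^Td = X^Tb = c$), and compare the variances $\sigma^2 d^Td \le \sigma^2 b^Tb$. The dimensions, $c$, and $\sigma^2$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 20, 3
X = rng.normal(size=(n, p))
c = np.array([1.0, 2.0, -1.0])
sigma2 = 4.0

d = X @ np.linalg.solve(X.T @ X, c)        # least-squares coefficients d^T y

# Build f orthogonal to col(X): project a random vector off the column space.
v = rng.normal(size=n)
f = v - X @ np.linalg.solve(X.T @ X, X.T @ v)
b = d + f                                  # a competing linear unbiased estimator

# Unbiasedness: E(d^T y) = E(b^T y) = c^T beta iff X^T d = X^T b = c.
print(np.allclose(X.T @ d, c), np.allclose(X.T @ b, c))

# Variances under Cov(y) = sigma^2 I_n: Var(theta-hat) <= Var(theta-tilde).
print(sigma2 * d @ d, sigma2 * b @ b)
```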

25 Outline
1 Methods of Estimation I

26 Generalized Least Squares Estimates
Consider generalizing the Gauss-Markov assumptions for the linear regression model to $Y = X\beta + \mathbf{E}$, where the random $n$-vector $\mathbf{E}$ satisfies $E[\mathbf{E}] = 0_n$ and $E[\mathbf{E}\mathbf{E}^T] = \sigma^2\Sigma$:
$\sigma^2$ is an unknown scale parameter;
$\Sigma$ is a known $(n \times n)$ positive definite matrix specifying the relative variances and correlations of the component observations.
Transform the data $(Y, X)$ to $Y^* = \Sigma^{-1/2}Y$ and $X^* = \Sigma^{-1/2}X$, and the model becomes
$$Y^* = X^*\beta + \mathbf{E}^*, \quad \text{where } E[\mathbf{E}^*] = 0_n \text{ and } E[\mathbf{E}^*(\mathbf{E}^*)^T] = \sigma^2 I_n.$$
By the Gauss-Markov Theorem, the BLUE ("GLS" estimate) of $\beta$ is
$$\hat\beta = [(X^*)^TX^*]^{-1}(X^*)^TY^* = [X^T\Sigma^{-1}X]^{-1}X^T\Sigma^{-1}Y.$$
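A sketch of GLS in practice (my example): whiten the data and run ordinary least squares, then check against the direct formula. Here the whitening uses the Cholesky factor $\Sigma = LL^T$ rather than the symmetric root $\Sigma^{1/2}$ on the slide; either square root works, since $\mathrm{Cov}(L^{-1}\mathbf{E}) = \sigma^2 I_n$. The AR(1)-style $\Sigma$ is an arbitrary known covariance.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 50, 2
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0])

# Known positive definite Sigma with AR(1)-style correlations.
rho = 0.6
Sigma = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
L = np.linalg.cholesky(Sigma)
Y = X @ beta_true + L @ rng.normal(size=n)   # errors with covariance Sigma

# Whiten (Y* = L^{-1} Y, X* = L^{-1} X) and run ordinary least squares.
Ys = np.linalg.solve(L, Y)
Xs = np.linalg.solve(L, X)
beta_whiten, *_ = np.linalg.lstsq(Xs, Ys, rcond=None)

# Direct GLS formula.
Si = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ Y)
print(beta_whiten, beta_gls)                 # agree; both close to beta_true
```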

27 Outline
1 Methods of Estimation I

28 Maximum Likelihood Estimation
$X \sim P_\theta$, $\theta \in \Theta$, with density or pmf function $p(x \mid \theta)$. Given an observation $X = x$, define the likelihood function
$$L_x(\theta) = p(x \mid \theta), \quad \text{a mapping } \Theta \to \mathbb{R}.$$
$\hat\theta_{ML} = \hat\theta_{ML}(x)$, the Maximum-Likelihood Estimate of $\theta$, is the value making $L_x(\cdot)$ a maximum: $\hat\theta$ is the MLE if
$$L_x(\hat\theta) = \max_{\theta \in \Theta} L_x(\theta).$$
The MLE $\hat\theta_{ML}(x)$ identifies the distribution making $x$ most likely.
The MLE coincides with the mode of the Posterior Distribution if the Prior Distribution on $\Theta$ is uniform:
$$\pi(\theta \mid x) \propto p(x \mid \theta)\,\pi(\theta) \propto p(x \mid \theta).$$

29 Examples
Example 2.2.4: Normal Distribution with Known Variance.
Example 2.2.5: Size of a Population.
$X_1, \ldots, X_n$ are iid $U\{1, 2, \ldots, \theta\}$, with $\theta \in \{1, 2, \ldots\}$. For $x = (x_1, \ldots, x_n)$,
$$L_x(\theta) = \prod_{i=1}^n \theta^{-1}\,\mathbf{1}(1 \le x_i \le \theta) = \theta^{-n}\,\mathbf{1}(\max(x_1, \ldots, x_n) \le \theta)
= \begin{cases} 0, & \theta = 0, 1, \ldots, \max(x_i) - 1 \\ \theta^{-n}, & \theta \ge \max(x_i), \end{cases}$$
so the MLE is $\hat\theta = \max(x_i)$.
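A quick numerical check of Example 2.2.5 (my code): evaluate $L_x(\theta) = \theta^{-n}\,\mathbf{1}(\max x_i \le \theta)$ over a grid of $\theta$ and confirm that it is maximized exactly at $\max(x_i)$. The true $\theta$ and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
theta0 = 37
x = rng.integers(1, theta0 + 1, size=25)     # iid uniform on {1,...,37}

def likelihood(theta, x):
    """L_x(theta) = theta^{-n} * 1(max x_i <= theta)."""
    return theta ** (-float(len(x))) if x.max() <= theta else 0.0

thetas = np.arange(1, 61)
L = np.array([likelihood(t, x) for t in thetas])
print(thetas[np.argmax(L)], x.max())         # identical: MLE = max(x_i)
```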

30 Maximum Likelihood as a Minimum-Contrast Method
Define $\ell_x(\theta) = -\log L_x(\theta) = -\log p(x \mid \theta)$. Because $-\log(\cdot)$ is monotone decreasing, $\hat\theta_{ML}(x)$ minimizes $\ell_x(\theta)$.
For an iid sample $X = (X_1, \ldots, X_n)$ with densities $p(x_i \mid \theta)$,
$$\ell_X(\theta) = -\log p(x_1, \ldots, x_n \mid \theta) = -\log \prod_{i=1}^n p(x_i \mid \theta) = -\sum_{i=1}^n \log p(x_i \mid \theta).$$
As a minimum-contrast function, $\rho(X, \theta) = \ell_X(\theta)$ yields the MLE $\hat\theta_{ML}(x)$.
The discrepancy function corresponding to the contrast function $\rho(X, \theta)$ is
$$D(\theta_0, \theta) = E[\rho(X, \theta) \mid \theta_0] = E[-\log p(X \mid \theta) \mid \theta_0].$$

31 Suppose that $\theta = \theta_0$ uniquely minimizes $D(\theta_0, \cdot)$. Then
$$D(\theta_0, \theta) - D(\theta_0, \theta_0) = E[-\log p(X \mid \theta) \mid \theta_0] - \left(E[-\log p(X \mid \theta_0) \mid \theta_0]\right) = E\left[\log\frac{p(X \mid \theta_0)}{p(X \mid \theta)} \,\Big|\, \theta_0\right] > 0, \quad \text{unless } \theta = \theta_0.$$
This difference is the Kullback-Leibler Information Divergence between the distributions $P_{\theta_0}$ and $P_\theta$:
$$K(P_{\theta_0}, P_\theta) = E\left[\log\frac{p(X \mid \theta_0)}{p(X \mid \theta)} \,\Big|\, \theta_0\right].$$
Lemma (Shannon, 1948). The mutual entropy $K(P_{\theta_0}, P_\theta)$ is always well defined, and $K(P_{\theta_0}, P_\theta) \ge 0$. Equality holds if and only if $\{x : p(x \mid \theta) = p(x \mid \theta_0)\}$ has probability 1 under both $P_{\theta_0}$ and $P_\theta$.
Proof: Apply Jensen's Inequality (B.9.3).
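A Monte Carlo illustration of Shannon's lemma (mine, not the slides'): estimate $K(P_{\theta_0}, P_\theta)$ for the unit-variance normal location family, where the exact value is $(\theta - \theta_0)^2/2$; the estimate is $\approx 0$ at $\theta = \theta_0$ and positive elsewhere.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
theta0 = 0.0
x = rng.normal(loc=theta0, size=200_000)     # sample from P_theta0

def kl_mc(theta):
    """Monte Carlo estimate of E_theta0[ log p(X|theta0) - log p(X|theta) ]."""
    return np.mean(norm.logpdf(x, loc=theta0) - norm.logpdf(x, loc=theta))

for theta in [0.0, 0.5, 1.0]:
    print(theta, kl_mc(theta), (theta - theta0) ** 2 / 2)   # MC estimate vs. exact
```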

32 Likelihood Equations
Suppose: $X \sim P_\theta$, with $\theta \in \Theta$, an open parameter space; the likelihood function $\ell_X(\theta)$ is differentiable in $\theta$; and $\hat\theta_{ML}(x)$ exists.
Then $\hat\theta_{ML}(x)$ must satisfy the Likelihood Equation(s)
$$\nabla_\theta\,\ell_X(\theta) = 0.$$
Important cases: for independent $X_i$ with densities/pmfs $p_i(x_i \mid \theta)$,
$$\nabla_\theta\,\ell_X(\theta) = -\sum_{i=1}^n \nabla_\theta \log p_i(x_i \mid \theta) = 0.$$
NOTE: $p_i(\cdot \mid \theta)$ may vary with $i$.
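A sketch of solving a likelihood equation numerically (my example, not the lecture's): for iid $\mathrm{Gamma}(a,\ \text{scale}=1)$ data, the equation in the shape $a$ is $\sum_i \log x_i - n\,\psi(a) = 0$, with $\psi$ the digamma function; it has no closed-form solution, so we bracket and root-find.

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(9)
a_true = 2.5
x = rng.gamma(shape=a_true, scale=1.0, size=5000)

def score(a):
    """d/da of the Gamma(a, 1) log-likelihood: sum(log x_i) - n * digamma(a)."""
    return np.sum(np.log(x)) - len(x) * digamma(a)

# score is decreasing in a and changes sign on (1e-6, 100), so the root is bracketed.
a_hat = brentq(score, 1e-6, 100.0)
print(a_hat)      # close to a_true
```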

33 Examples
Hardy-Weinberg Proportions (Example 2.2.6)
Queues: Poisson Process Models (Exponential Arrival Times and Poisson Counts) (Example 2.2.7)
Multinomial Trials (Example 2.2.8)
Normal Regression Models (Example 2.2.9)

34 MIT OpenCourseWare
Mathematical Statistics
Spring 2016
For information about citing these materials or our Terms of Use, visit:
