A tailor-made nonparametric density estimate
A tailor-made nonparametric density estimate
Daniel Carando (1), Ricardo Fraiman (2) and Pablo Groisman (1)
(1) Universidad de Buenos Aires, (2) Universidad de San Andrés
School and Workshop on Probability Theory and Applications, ICTP/CNPq, January 2007, Campos do Jordão, Brasil
The density estimation problem

X is a random variable on R^d with density f. The density f is unknown. We have an i.i.d. sample X_1, ..., X_n drawn from f.

We look for an estimate of f based on the sample, that is, a mapping f_n : R^d × (R^d)^n → R.

The density f is assumed to belong to a certain class F.
Maximum likelihood estimates

For every density function g, E(log g(X)) ≤ E(log f(X)).

The Maximum Likelihood Estimate (MLE) is defined as the maximizer over the class F of the empirical mean (the log-likelihood function)

L_n(g) := (1/n) Σ_{i=1}^n log g(X_i) = log ( Π_{i=1}^n g(X_i) )^{1/n}.

This problem is not always well posed (depending on the class F).
The parametric case

The class F can be parameterized (by a finite number of parameters).

Example: X ~ N(µ, σ²),

F = { f_{(µ,σ²)} : µ ∈ R, σ² > 0 },   f_{µ,σ²}(x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}.

The maximizer f_n is the density of a Gaussian random variable with parameters

µ_n = (1/n) Σ_{i=1}^n X_i,   σ²_n = (1/n) Σ_{i=1}^n (X_i − µ_n)².
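As a quick sketch of the closed-form Gaussian MLE above (the four-point sample is made up for illustration):

```python
# Gaussian MLE sketch on a small illustrative sample (values are made up).
data = [1.0, 2.0, 3.0, 4.0]
n = len(data)

mu_n = sum(data) / n                                # MLE of the mean
sigma2_n = sum((x - mu_n) ** 2 for x in data) / n   # MLE of the variance (1/n, not 1/(n-1))
```

Note the 1/n factor in the variance: the MLE is biased in finite samples, unlike the usual 1/(n−1) sample variance.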
The parametric case

Under some (fairly weak) conditions, in the parametric case the MLE is known to be:
1. strongly consistent, i.e. f_n → f a.s.;
2. an asymptotically minimum-variance unbiased estimator;
3. asymptotically Gaussian.
The MLE makes use of the knowledge we have about f, since it depends strongly on the class F in which we look for the maximizer.
The nonparametric case

The class F is infinite-dimensional.

Example: F = { g ∈ L¹(R^d) : g ≥ 0, ‖g‖_{L¹} = 1 }, the class of all densities.

Here the MLE method fails, since L_n(g) is unbounded: approximations of the identity belong to F.
Alternatives: kernel density estimates (Parzen and Rosenblatt, 1956)

f_n(x) = (1/(n h^d)) Σ_{i=1}^n K( (x − X_i)/h ),   K ≥ 0, ∫ K = 1, h > 0.

Very popular. Very flexible. The kernel K and the bandwidth h must be chosen.
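A minimal sketch of the kernel estimate in d = 1, taking K to be the Gaussian kernel (the function name is mine, not from the talk):

```python
import math

def kernel_estimate(x, sample, h):
    """Kernel density estimate f_n(x) with a Gaussian kernel and bandwidth h (d = 1)."""
    gauss = lambda u: math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)
    # f_n(x) = (1 / (n h)) * sum_i K((x - X_i) / h)
    return sum(gauss((x - xi) / h) for xi in sample) / (len(sample) * h)
```

With a single sample point at 0 and h = 1, the estimate at 0 is just the kernel's peak value 1/√(2π).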
Alternatives: the kernel density estimate for different choices of K and h (figure).
Alternatives: the kernel density estimate is a universal estimate: it works for all densities f. It does not make use of any further knowledge about f.
Alternatives: maximum penalized likelihood estimates, Good and Gaskins (1971)

Idea: penalize the lack of smoothness. Instead of looking for a maximizer of L_n(g), we look for a maximizer of

(1/n) Σ_{i=1}^n log g(X_i) − h ‖g′‖².
Alternatives: tailor-designed Maximum Likelihood Estimates

If we have some knowledge about f, then F is not the class of all densities and, maybe, we can apply MLE techniques.
Tailor-designed Maximum Likelihood Estimates: Grenander's estimate

Grenander (1956) considered F to be the class of decreasing densities on R⁺.

In this case it turns out that the MLE is well defined and is the derivative of the least concave majorant of the empirical distribution function.

It is consistent and minimax optimal in this class.

Robertson (1967), Wegman (1969, 1970), Sager (1982) and Polonik (1998) generalized Grenander's estimate to other kinds of shape restrictions.
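A minimal sketch of Grenander's estimator: build the least concave majorant of the empirical distribution function with an upper-hull scan and read off its slopes (the function name is mine, not from the talk):

```python
def grenander(sample):
    """Grenander estimator for a decreasing density on R+: the slopes of the
    least concave majorant of the empirical distribution function.
    Returns a list of ((left, right), density_value) pieces."""
    n = len(sample)
    # ECDF knots: (0, 0), (X_(1), 1/n), ..., (X_(n), 1)
    pts = [(0.0, 0.0)] + [(x, (i + 1) / n) for i, x in enumerate(sorted(sample))]
    hull = [pts[0]]
    for x3, y3 in pts[1:]:
        # pop while keeping the last hull point would make slopes non-decreasing
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (y2 - y1) * (x3 - x2) <= (y3 - y2) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append((x3, y3))
    # the estimator is piecewise constant: the slope of each majorant segment
    return [((hull[i][0], hull[i + 1][0]),
             (hull[i + 1][1] - hull[i][1]) / (hull[i + 1][0] - hull[i][0]))
            for i in range(len(hull) - 1)]
```

For an evenly spread sample the majorant is a single segment, so the estimate collapses to a constant; in general the slopes are decreasing, as the shape restriction requires.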
MLE for Lipschitz densities

We consider F to be the class of densities g with compact support S(g) that satisfy

|g(x) − g(y)| ≤ κ ‖x − y‖,   x, y ∈ S(g).

That is, F is the class of Lipschitz densities with prescribed Lipschitz constant κ. We allow g to be discontinuous at the boundary of its support.

The support of the density f can be unknown (in this case we require S(f) to be convex).
Theorem

(i) There exists a unique maximizer f_n of L_n(g) in F. Moreover, f_n is supported on C_n, the convex hull of {X_1, ..., X_n}, and its value there is given by the maximum of n cone functions, i.e.

f_n(x) = max_{1≤i≤n} ( f_n(X_i) − κ ‖x − X_i‖ )₊.   (1)

(ii) f_n is consistent in the following sense: for every compact set K ⊂ S(f),

lim ‖f_n − f‖_{L^∞(K)} = 0 a.s.

(iii) Hence lim ‖f_n − f‖_{L¹(R^d)} = 0 a.s.
The MLE in dimension d = 1 (figure).
The MLE in dimension d = 2 (figure).
Proof

(i) Existence: picture. Uniqueness: we are looking for the maximum of a concave function on a convex set.

(ii) This is a consequence of Huber's Theorem (1967). Idea: to use Huber's Theorem we need a sequence f̂_n of (almost) maximizers of L_n belonging to a (fixed) compact class. We construct them as follows:

f̂_n := A_n max_{1≤i≤n} ( f_n(X_i) − κ ‖x − X_i‖ )₊,   for all x ∈ S(f),

where the constant A_n is chosen to guarantee ∫ f̂_n = 1. Then f̂_n ∈ Lip(κ, S(f)), which is compact, and

‖f_n − f̂_n‖_{L^∞(K)} ≤ |A_n − 1| ‖f_n‖_{L^∞(K)} → 0,   (2)

since A_n → 1 and ( ‖f_n‖_{L^∞(K)} )_n is bounded a.s.
(iii) Since C_n ⊂ S(f), |S(f)| < ∞ and f_n(x) ≤ κ diam(S(f)) + 1/|C_n|, we can find a compact K ⊂ S(f) such that

∫_{R^d} |f_n(x) − f(x)| dx ≤ ∫_K |f_n(x) − f(x)| dx + ∫_{S(f)∖K} |f_n(x) − f(x)| dx ≤ ε.
Computing the estimator

We have proved that the estimator lives in a certain finite-dimensional space and that it is determined by its value at the sample points. For y ∈ R^n we define

g_y(x) = max_{1≤i≤n} ( y_i − κ ‖x − X_i‖ )₊,   x ∈ C_n.

Our problem reads: find

argmax_{y ∈ P} Π_{i=1}^n y_i,

P = { y ∈ R^n : y_i > 0, |y_i − y_j| ≤ κ ‖X_i − X_j‖ (i ≠ j), ∫ g_y = 1 }.

P is convex and Σ_i log y_i is concave. To have an efficient method to solve this problem we need to decide (efficiently) whether a point y ∈ P. Easy in d = 1; not so easy if d > 1.
Computing the estimator: dimension d = 1

Let X_(1) ≤ ... ≤ X_(n) be the order statistics. The Lipschitz condition reads

−κ (X_(i+1) − X_(i)) ≤ y_{i+1} − y_i ≤ κ (X_(i+1) − X_(i)),   i = 1, ..., n−1,

and

∫ g_y(x) dx = (1/4) Σ_{i=1}^{n−1} [ (y_{i+1} − y_i)²/κ + 2 (y_{i+1} + y_i)(X_(i+1) − X_(i)) − κ (X_(i+1) − X_(i))² ].
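A sketch of the d = 1 membership test: check the Lipschitz constraints on consecutive order statistics and evaluate the integral of g_y piece by piece, with the Lipschitz constant κ carried through (helper names are mine, not from the talk):

```python
def lipschitz_ok(xs, ys, kappa):
    """Check |y_{i+1} - y_i| <= kappa (x_{(i+1)} - x_{(i)}) for sorted xs."""
    return all(abs(b - a) <= kappa * (v - u) + 1e-12
               for (u, a), (v, b) in zip(zip(xs, ys), zip(xs[1:], ys[1:])))

def integral_gy(xs, ys, kappa):
    """Integral of g_y, the maximum of the n cone functions, for sorted xs.
    Each gap [x_i, x_{i+1}] contributes the area under the max of the
    two cones anchored at its endpoints."""
    total = 0.0
    for (u, a), (v, b) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        d = v - u
        total += ((b - a) ** 2 / kappa + 2.0 * (a + b) * d - kappa * d ** 2) / 4.0
    return total
```

For two points at 0 and 1 with y = (1, 1) and κ = 1, the max of the two cones is a "W"-shaped tent with area 3/4, which the formula reproduces.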
Computing the estimator. Dimension d=1 - Sample size: n= (figure)
Computing the estimator. Dimension d=1 - Sample size: n= (figure)
Computing the estimator: dimension d > 1

We cannot order the sample points, and we have no explicit formula for the integral ∫ g_y(x) dx.
Some problems... Too many peaks. A nonlinear optimization problem has to be solved.
An alternative ML-type estimator: dimension one — PLMLE

V = V(X_1, ..., X_n) = { g ∈ Lip(κ, [X_(1), X_(n)]) : g|_{[X_(i), X_(i+1)]} is linear, ∫ g = 1 }.

Definition. The PLMLE is the maximizer f_n of L_n over V(X_1, ..., X_n).

Existence and uniqueness of this estimator are guaranteed since V is a finite-dimensional, compact and convex subset of F.

It has lower likelihood than f_n but is asymptotically the same.
Computation of the PLMLE

maximize Π_{i=1}^n y_i subject to −a ≤ Ay ≤ a, By = 1,

where A is the (n−1) × n difference matrix with rows (0, ..., −1, 1, ..., 0), so that (Ay)_i = y_{i+1} − y_i,

a = κ (x_2 − x_1, ..., x_{i+1} − x_i, ..., x_n − x_{n−1})ᵗ,

B = (1/2)(x_2 − x_1, x_3 − x_1, ..., x_{i+1} − x_{i−1}, ..., x_n − x_{n−2}, x_n − x_{n−1}).

The constraint −a ≤ Ay ≤ a guarantees the Lipschitz condition and By = 1 represents the restriction ∫ f_n = 1.
PLMLE demonstration. PLMLE vs. Kernels. Sample size: n= (figure)
PLMLE demonstration. PLMLE vs. Kernels. Sample size: n= (figure)
Delaunay triangulations (figures)
Voronoi tessellations and Delaunay triangulations (figures)
Some facts on Delaunay triangulations

- For sample points of absolutely continuous probabilities, it is well defined with probability one.
- It maximizes the minimum angle of the triangles among all possible triangulations of the points.
- It is useful for the numerical treatment of Partial Differential Equations by the Finite Element Method.
- It is useful to compute the Euclidean Minimum Spanning Tree of a set of points (which is a subgraph of the Delaunay triangulation).
- It is widely used in Computational Geometry.
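The property underlying these facts is the empty-circumcircle criterion: a triangulation is Delaunay precisely when no sample point lies strictly inside the circumcircle of any triangle. A sketch of the standard in-circle determinant test in the plane (the function name is mine):

```python
def in_circumcircle(a, b, c, d):
    """True iff point d lies strictly inside the circumcircle of triangle
    (a, b, c), which must be given in counter-clockwise order (2D points)."""
    rows = [(p[0] - d[0], p[1] - d[1],
             (p[0] - d[0]) ** 2 + (p[1] - d[1]) ** 2) for p in (a, b, c)]
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    # 3x3 determinant; positive sign means d is inside the circumcircle
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    return det > 0
```

For the right triangle (0,0), (1,0), (0,1), the circumcircle is centered at (0.5, 0.5): its own center passes the test, while a far-away point fails it.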
T = {τ_1, ..., τ_N}, the Delaunay tessellation. C_n = ∪_{i=1}^N τ_i.

For any i ≠ j, τ_i ∩ τ_j is either a point, a (d−1)-dimensional face, or the empty set. We consider now the class of piecewise linear functions on T:

V = V(X_1, ..., X_n) = { g ∈ Lip(κ, C_n) : g|_{τ_i} is linear, ∫ g = 1 }.

Definition. The PLMLE f_n is the argument that maximizes L_n over V(X_1, ..., X_n).
Theorem. For every compact set K ⊂ S(f) we have ‖f_n − f‖_{L^∞(K)} → 0 a.s.
Computing the estimator

V is a compact subset of the (finite-dimensional) vector space

Ṽ = { g : C_n → R : g|_{τ_i} is linear }.

We need a (good) basis for Ṽ. We borrow from FEM:

g ∈ Ṽ ⟹ g(x) = Σ_i g(X_i) ϕ_i(x),   ϕ_i(X_j) = δ_ij.
Computing the estimator

∫_{R^d} g(x) dx = ∫_{R^d} ( Σ_{i=1}^n g(X_i) ϕ_i(x) ) dx = By,

B = ( ∫ϕ_1, ..., ∫ϕ_n ),   y = (g(X_1), ..., g(X_n)).

We compute B just once!
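For linear hat functions on a triangulation, ∫_τ ϕ_i = area(τ)/3 for each vertex i of τ (and 0 for the others), so B can be assembled in one sweep over the triangles. A sketch for d = 2 (names are mine, not from the talk):

```python
def assemble_B(points, triangles):
    """B_i = integral of the hat function phi_i over C_n
    = sum over triangles incident to vertex i of area/3 (linear elements)."""
    B = [0.0] * len(points)
    for (i, j, k) in triangles:
        (x1, y1), (x2, y2), (x3, y3) = points[i], points[j], points[k]
        # triangle area via the cross product of two edge vectors
        area = abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0
        for v in (i, j, k):
            B[v] += area / 3.0
    return B
```

Since Σ_i ϕ_i ≡ 1 on C_n, the entries of B must sum to the total area of C_n, which gives a quick sanity check on the assembly.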
Computing the estimator

We also have

∇g|_{τ_k} = Σ_i y_i ∇ϕ_i|_{τ_k} = A_k y,   A_k = ( (∇ϕ_1|_{τ_k})ᵗ, ..., (∇ϕ_n|_{τ_k})ᵗ ).
Computing the estimator

The optimization problem reads:

maximize Π_{i=1}^n y_i subject to ‖A_k y‖ ≤ κ, 1 ≤ k ≤ N, By = 1.

If the norm is ‖·‖_∞, all the restrictions are linear.

The size of A_k grows linearly with d.
The PLMLE for a cone density. Sample size: n=250 (figure)
The PLMLE for a Uniform density. Sample size: n=200 (figure)
The PLMLE for a bivariate sum of uniform variables. Sample size: n= (figure)
COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:
More information1 General problem. 2 Terminalogy. Estimation. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ).
Estimation February 3, 206 Debdeep Pati General problem Model: {P θ : θ Θ}. Observe X P θ, θ Θ unknown. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ). Examples: θ = (µ,
More informationLeast squares under convex constraint
Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption
More informationParametric Techniques
Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure
More informationMaximum likelihood estimation
Maximum likelihood estimation Guillaume Obozinski Ecole des Ponts - ParisTech Master MVA Maximum likelihood estimation 1/26 Outline 1 Statistical concepts 2 A short review of convex analysis and optimization
More informationLocal Whittle Likelihood Estimators and Tests for non-gaussian Linear Processes
Local Whittle Likelihood Estimators and Tests for non-gaussian Linear Processes By Tomohito NAITO, Kohei ASAI and Masanobu TANIGUCHI Department of Mathematical Sciences, School of Science and Engineering,
More informationStatistics & Data Sciences: First Year Prelim Exam May 2018
Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book
More informationChapter 2. Binary and M-ary Hypothesis Testing 2.1 Introduction (Levy 2.1)
Chapter 2. Binary and M-ary Hypothesis Testing 2.1 Introduction (Levy 2.1) Detection problems can usually be casted as binary or M-ary hypothesis testing problems. Applications: This chapter: Simple hypothesis
More informationNonparametric estimation under shape constraints, part 2
Nonparametric estimation under shape constraints, part 2 Piet Groeneboom, Delft University August 7, 2013 What to expect? Theory and open problems for interval censoring, case 2. Same for the bivariate
More informationEconomics 204 Fall 2011 Problem Set 1 Suggested Solutions
Economics 204 Fall 2011 Problem Set 1 Suggested Solutions 1. Suppose k is a positive integer. Use induction to prove the following two statements. (a) For all n N 0, the inequality (k 2 + n)! k 2n holds.
More informationSection 8: Asymptotic Properties of the MLE
2 Section 8: Asymptotic Properties of the MLE In this part of the course, we will consider the asymptotic properties of the maximum likelihood estimator. In particular, we will study issues of consistency,
More informationTesting Statistical Hypotheses
E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I Small-Sample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions
More informationImplications of the Constant Rank Constraint Qualification
Mathematical Programming manuscript No. (will be inserted by the editor) Implications of the Constant Rank Constraint Qualification Shu Lu Received: date / Accepted: date Abstract This paper investigates
More informationSparse Nonparametric Density Estimation in High Dimensions Using the Rodeo
Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo Han Liu John Lafferty Larry Wasserman Statistics Department Computer Science Department Machine Learning Department Carnegie Mellon
More informationMathematics 530. Practice Problems. n + 1 }
Department of Mathematical Sciences University of Delaware Prof. T. Angell October 19, 2015 Mathematics 530 Practice Problems 1. Recall that an indifference relation on a partially ordered set is defined
More informationLecture 2. We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales.
Lecture 2 1 Martingales We now introduce some fundamental tools in martingale theory, which are useful in controlling the fluctuation of martingales. 1.1 Doob s inequality We have the following maximal
More informationChapter 7. Confidence Sets Lecture 30: Pivotal quantities and confidence sets
Chapter 7. Confidence Sets Lecture 30: Pivotal quantities and confidence sets Confidence sets X: a sample from a population P P. θ = θ(p): a functional from P to Θ R k for a fixed integer k. C(X): a confidence
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationOptimal Design in Regression. and Spline Smoothing
Optimal Design in Regression and Spline Smoothing by Jaerin Cho A thesis submitted to the Department of Mathematics and Statistics in conformity with the requirements for the degree of Doctor of Philosophy
More informationComparison of relative density of two random geometric digraph families in testing spatial clustering. Elvan Ceyhan
Comparison of relative density of two random geometric digraph families in testing spatial clustering Elvan Ceyhan TEST An Official Journal of the Spanish Society of Statistics and Operations Research
More information3.10 Lagrangian relaxation
3.10 Lagrangian relaxation Consider a generic ILP problem min {c t x : Ax b, Dx d, x Z n } with integer coefficients. Suppose Dx d are the complicating constraints. Often the linear relaxation and the
More informationKernel-based density. Nuno Vasconcelos ECE Department, UCSD
Kernel-based density estimation Nuno Vasconcelos ECE Department, UCSD Announcement last week of classes we will have Cheetah Day (exact day TBA) what: 4 teams of 6 people each team will write a report
More informationLiquidation in Limit Order Books. LOBs with Controlled Intensity
Limit Order Book Model Power-Law Intensity Law Exponential Decay Order Books Extensions Liquidation in Limit Order Books with Controlled Intensity Erhan and Mike Ludkovski University of Michigan and UCSB
More information41903: Introduction to Nonparametrics
41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific
More informationMLE and GMM. Li Zhao, SJTU. Spring, Li Zhao MLE and GMM 1 / 22
MLE and GMM Li Zhao, SJTU Spring, 2017 Li Zhao MLE and GMM 1 / 22 Outline 1 MLE 2 GMM 3 Binary Choice Models Li Zhao MLE and GMM 2 / 22 Maximum Likelihood Estimation - Introduction For a linear model y
More informationLikelihood, MLE & EM for Gaussian Mixture Clustering. Nick Duffield Texas A&M University
Likelihood, MLE & EM for Gaussian Mixture Clustering Nick Duffield Texas A&M University Probability vs. Likelihood Probability: predict unknown outcomes based on known parameters: P(x q) Likelihood: estimate
More information1 Glivenko-Cantelli type theorems
STA79 Lecture Spring Semester Glivenko-Cantelli type theorems Given i.i.d. observations X,..., X n with unknown distribution function F (t, consider the empirical (sample CDF ˆF n (t = I [Xi t]. n Then
More informationApplied/Numerical Analysis Qualifying Exam
Applied/Numerical Analysis Qualifying Exam August 9, 212 Cover Sheet Applied Analysis Part Policy on misprints: The qualifying exam committee tries to proofread exams as carefully as possible. Nevertheless,
More informationReproducing Kernel Hilbert Spaces
Reproducing Kernel Hilbert Spaces Lorenzo Rosasco 9.520 Class 03 February 12, 2007 About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert
More informationVector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.
Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Vector space Consists of: A set V A scalar
More informationRegression, Ridge Regression, Lasso
Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.
More informationTime Series and Forecasting Lecture 4 NonLinear Time Series
Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations
More information