Density Estimation
Frank Porter, SLUO Lectures on Statistics, August 2006
Density Estimation

Density estimation deals with the problem of estimating probability density functions (PDFs) based on data sampled from the PDF. We may:
- use an assumed form of the distribution, parameterized in some way (parametric statistics); or
- avoid making assumptions about the form of the PDF (nonparametric statistics).

We are concerned more here with the non-parametric case (see Roger Barlow's lectures for parametric statistics).
Some References (I)

- Richard A. Tapia & James R. Thompson, Nonparametric Density Estimation, Johns Hopkins University Press, Baltimore (1978).
- David W. Scott, Multivariate Density Estimation, John Wiley & Sons, Inc., New York (1992).
- Adrian W. Bowman and Adelchi Azzalini, Applied Smoothing Techniques for Data Analysis, Clarendon Press, Oxford (1997).
- B. W. Silverman, Density Estimation for Statistics and Data Analysis, Monographs on Statistics and Applied Probability, Chapman and Hall (1986).
- K. S. Cranmer, Kernel Estimation in High Energy Physics, Comput. Phys. Commun. 136, 198 (2001) [arXiv:hep-ex/0011057].
Some References (II)

- M. Pivk & F. R. Le Diberder, sPlot: a statistical tool to unfold data distributions, Nucl. Instrum. Meth. A 555, 356 (2005).
- R. Cahn, How sPlots are Best (2005).
- BaBar Statistics Working Group, Recommendations for Display of Projections in Multi-Dimensional Analyses (MDgraphRec.pdf).

Additional specific references will be noted in the course of the lectures.
Preliminaries

We'll couch the discussion in terms of observations (a dataset) from some experiment. Our dataset consists of the values x_i, i = 1, 2, ..., n, obtained by repeated sampling from a (presumed unknown) probability distribution: IID, for Independent, Identically Distributed. We'll note generalizations here and there. Order is not important; if we are discussing a time series, we could introduce ordered pairs {(x_i, t_i), i = 1, ..., n} and call it two-dimensional [but beware the correlations then; probably not IID!]. In general, our quantities can be multi-dimensional; no special notation will be used to distinguish one- from multi-variate cases. We'll discuss where issues enter with dimensionality.
Notation

At our convenience we may use E(x), ⟨x⟩, and x̄ all to mean "expectation":
$$E(x) \equiv \langle x \rangle \equiv \bar{x} \equiv \int x\,p(x)\,dx,$$
where p(x) is the probability density function (PDF) for x (or, more generally, p(x)dx ≡ μ(dx) is the probability measure). Estimators are denoted with a "hat". In these lectures we'll be concerned with estimators for the density function itself, hence $\hat{p}(x)$ is a random variable giving our estimate for p(x). We will not be especially rigorous; for example, we won't make a notational distinction between the random variable and an instance.
Motivation

Why do we want to estimate densities? Well, that is the whole point... Harder question: why non-parametric estimates?
- Comparison with models (which may be parametric)
- May be easier/better than parametric modeling for efficiency corrections and background subtraction
- Visualization
- Unfolding
- Comparing samples
R, A Toolkit, er, Language, You Might be Interested In...

The S language was developed with statistical analysis of data in mind.

> x <- rnorm(100,10,1)
> hist(x,xlim=range(5,15))

[Figure: the resulting histogram of x, with frequency on the vertical axis.]

The free, open-source version is R, from the R Project; downloads are available for Linux/Mac OS X/Windows. The commercial version is S-Plus.
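As a quick illustration (my own sketch, not from the original slides), R's built-in density() function already produces a kernel density estimate of the kind discussed below; here it is overlaid on a normalized histogram of the same sample:

# Sketch: compare a normalized histogram with R's kernel density estimate.
set.seed(1)
x <- rnorm(100, 10, 1)                   # 100 points from N(mean=10, sd=1)
hist(x, xlim = c(5, 15), freq = FALSE)   # freq=FALSE: area normalized to 1
lines(density(x), col = "red")           # Gaussian kernel, default bandwidth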
Empirical Probability Density Function

Place a delta function at each data point. The estimator (EPDF, for "Empirical Probability Density Function") is
$$\hat{p}(x) = \frac{1}{n} \sum_{i=1}^{n} \delta(x - x_i).$$
Note that x could be multi-dimensional here. This is the sampling density for the bootstrap (more later; also see Ilya Narsky's lectures).
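A minimal sketch (my own illustration, with an assumed dataset): drawing from the EPDF just means drawing the observed points with replacement, which is exactly the bootstrap resampling step:

# Sampling from the EPDF = sampling the data with replacement.
x <- rnorm(100, 10, 1)                                  # the observed dataset
boot_sample <- sample(x, size = length(x), replace = TRUE)
# e.g., a bootstrap estimate of the uncertainty on the sample mean:
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))
sd(boot_means)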
The Histogram

Perhaps our most ubiquitous density estimator is the histogram:
$$h(x) = \sum_{i=1}^{n} B(x - \tilde{x}_i; w),$$
where $\tilde{x}_i$ is the center of the bin in which observation x_i lies, w is the bin width, and
$$B(x; w) = \begin{cases} 1 & x \in (-w/2,\, w/2) \\ 0 & \text{otherwise} \end{cases}$$
(called the indicator function in probability). [Figure: the box function B(x - x̃_i; w), of unit height and width w, centered on x̃_i.] This is written for uniform bin widths, but may be generalized to differing widths with appropriate relative normalization factors. The estimator for the probability density function (PDF) is:
$$\hat{p}(x) = \frac{1}{nw}\, h(x).$$
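A hedged sketch of the same estimator in R (the sample and bin width are my own choices, not from the slides):

# Histogram density estimator computed by hand: p-hat = h(x)/(n*w).
x <- rnorm(100, 10, 1)
w <- 0.5
breaks <- seq(floor(min(x)), ceiling(max(x)), by = w)
counts <- table(cut(x, breaks))                 # bin populations
p_hat  <- as.numeric(counts) / (length(x) * w)  # density estimate per bin
sum(p_hat * w)                                  # check: integrates to 1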
Histogram Example

[Figure: two panels of the same sample plotted against m(pπ) − m(p) − m(π). Left: EPDF; Right: histogram with w = 10 MeV, in events/10 MeV.] The actual sampling is 100 points from a Δ(1232) Breit-Wigner (Cauchy) on a second-order polynomial background. The background probability is 50%.
Criticisms of Histogram as Density Estimator

- Discontinuous even if the PDF is continuous.
- Dependence on bin size and bin origin.
- Information from the location of a datum within a bin is ignored.
Kernel Estimation

Take the histogram, but replace the bin function B with something else:
$$\hat{p}(x) = \frac{1}{n} \sum_{i=1}^{n} k(x - x_i; w),$$
where k(x; w) is the kernel function, normalized to unity: ∫ k(x; w) dx = 1. Usually we are interested in kernels of the form
$$k(x - x_i; w) = \frac{1}{w} K\!\left(\frac{x - x_i}{w}\right);$$
indeed this may be used as the definition of "kernel". The kernel estimator for the PDF is then:
$$\hat{p}(x) = \frac{1}{nw} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{w}\right).$$
The role of the parameter w as a smoothing parameter is now clearer.
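A minimal sketch of this estimator written out explicitly in R (the Gaussian choice of K and the bandwidth value are my assumptions):

# Fixed-kernel estimator: p-hat(x0) = (1/(n*w)) * sum_i K((x0 - x_i)/w).
kde <- function(xgrid, x, w) {
  sapply(xgrid, function(x0) mean(dnorm((x0 - x) / w)) / w)
}
x     <- rnorm(100, 10, 1)
xgrid <- seq(5, 15, length.out = 200)
plot(xgrid, kde(xgrid, x, w = 0.5), type = "l")  # compare: density(x, bw = 0.5)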
Multi-Variate Kernel Estimation

The explicit multi-variate case, in d = 2 dimensions:
$$\hat{p}(x, y) = \frac{1}{n\, w_x w_y} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{w_x}\right) K\!\left(\frac{y - y_i}{w_y}\right).$$
This is a product kernel form, with the same kernel in each dimension, except for possibly different smoothing parameters. It does not have correlations. The kernels we have introduced are classified more explicitly as "fixed kernels": the smoothing parameter is independent of x.
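For concreteness, a sketch of the product-kernel form in R (again with an assumed Gaussian K and made-up bandwidths):

# Product-kernel estimate in d = 2, evaluated at a single point (x0, y0).
kde2 <- function(x0, y0, x, y, wx, wy) {
  mean(dnorm((x0 - x) / wx) * dnorm((y0 - y) / wy)) / (wx * wy)
}
x <- rnorm(200); y <- rnorm(200)
kde2(0, 0, x, y, wx = 0.4, wy = 0.4)   # estimate of p(0, 0)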
Ideogram

A simple variant on the kernel idea is to permit the kernel to depend on additional knowledge in the data. Physicists call this an "ideogram". Most common is the Gaussian ideogram, in which each data point is entered as a Gaussian of unit area and standard deviation appropriate to that datum. This addresses a way in which the IID assumption might be broken. [Aside: be careful to get your likelihood function right if you are incorporating variable resolution information in your fits; see, e.g., Punzi.]
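A sketch of a Gaussian ideogram in R (the measurements and per-point uncertainties below are invented purely for illustration):

# Gaussian ideogram: each x_i enters as a unit-area Gaussian with its own sigma_i.
ideogram <- function(xgrid, x, sigma) {
  sapply(xgrid, function(x0) sum(dnorm(x0, mean = x, sd = sigma)) / length(x))
}
x     <- c(493.64, 493.68, 493.71)    # hypothetical measurements
sigma <- c(0.02, 0.01, 0.03)          # their quoted uncertainties
xgrid <- seq(493.5, 493.9, length.out = 400)
plot(xgrid, ideogram(xgrid, x, sigma), type = "l")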
Sample Ideograms (I)

[Figure: the ideogram of charged-kaon mass measurements, m_{K±} (MeV), from RPP 2006. The weighted average, ±0.011 with the error scaled by 2.5, combines measurements labeled DENISOV, GALL 88 (K Pb, K W), LUM, BARKOV, CHENG 75 (K Pb), and BACKENSTO (confidence level 0.001). The figure caption notes that the values of the weighted average, error, and scale factor are based upon the data in the ideogram only; they are not necessarily the same as the "best" values, obtained from a least-squares constrained fit utilizing measurements of other (related) quantities as additional information.]
Sample Ideograms (II)

Note the detailed comparison that is possible. [Figure 1: a histogram of magnetic field values (black), compared with a smoothed frequency distribution constructed using a Gaussian ideogram technique (red).] (From J. S. Halekas et al., Magnetic Properties of Lunar Geologic Terranes: New Statistical Results, Lunar and Planetary Science XXXIII (2002).)
Parametric vs Non-parametric Density Estimation (I)

The distinction is fuzzy. A histogram is non-parametric, in the sense that no assumption about the form of the sampling distribution is made. There is often an implicit assumption that the distribution is smooth on a scale smaller than the bin size; for example, we know something about the resolution of our apparatus. But the estimator of the parent distribution made with a histogram is parametric: the parameters are the populations (or frequencies) in each bin, and the estimators for those parameters are the observed histogram populations. Even more parameters than a typical parametric fit!
Parametric vs Non-parametric Density Estimation (II)

The essence of the difference may be captured in the notions of "local" and "non-local": if a datum at x_i influences the density estimator at some other point x, this is non-local. A non-parametric estimator is one in which the influence of a point at x_i on the estimate at any x with d(x_i, x) > ε vanishes, asymptotically. Notice that for a kernel estimator,
$$\hat{p}(x) = \frac{1}{nw} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{w}\right),$$
the bigger the smoothing parameter w, the more non-local the estimator. As we'll discuss, the optimal choice of smoothing parameter depends on n.
Optimization

We would like to make an optimal density estimate from our data. What does that mean? We need a criterion for "optimal". The choice of criterion is subjective; it depends on what you want to achieve. We may compare the estimator for a quantity (here, the value of the density at x) with the true value f(x):
$$\Delta(x) = \hat{f}(x) - f(x).$$
[Figure: sketch of the estimate $\hat{f}(x)$, the true f(x), and the difference Δ(x).]
Mean Squared Error (I)

A common choice in parametric estimation is to minimize the sum of the squares. We may take this idea over here and form the Mean Squared Error (MSE):
$$\mathrm{MSE}[\hat{f}(x)] \equiv E\!\left\{\left[\hat{f}(x) - f(x)\right]^2\right\} = \mathrm{Var}[\hat{f}(x)] + \mathrm{Bias}^2[\hat{f}(x)],$$
where
$$\mathrm{Var}[\hat{f}(x)] \equiv E\!\left\{\left(\hat{f}(x) - E[\hat{f}(x)]\right)^2\right\}, \qquad \mathrm{Bias}[\hat{f}(x)] \equiv E[\hat{f}(x)] - f(x).$$
Mean Squared Error (II)

Since this isn't quite our familiar parameter estimation, let's take a little time to make sure it is understood. Suppose $\hat{p}(x)$ is an estimator for the PDF p(x), based on data {x_i; i = 1, ..., n}, IID from p(x). Then
$$E[\hat{p}(x)] = \int \hat{p}(x; \{x_i\})\, \mathrm{Prob}(\{x_i\})\, d^n(\{x_i\}) = \int \hat{p}(x; \{x_i\}) \prod_{i=1}^{n} \left[p(x_i)\, dx_i\right].$$
Exercise: Proof of the formula for the MSE

$$\begin{aligned}
\mathrm{MSE}[\hat{f}(x)] &= E\!\left\{\left[\hat{f}(x) - f(x)\right]^2\right\} \\
&= \int \left[\hat{f}(x; \{x_i\}) - f(x)\right]^2 \prod_{i=1}^{n} \left[p(x_i)\, dx_i\right] \\
&= \int \left[\hat{f}(x; \{x_i\}) - E(\hat{f}) + E(\hat{f}) - f(x)\right]^2 \prod_{i=1}^{n} \left[p(x_i)\, dx_i\right] \\
&= \int \left\{\left[\hat{f}(x; \{x_i\}) - E(\hat{f})\right]^2 + \left[E(\hat{f}) - f(x)\right]^2 + 2\left[\hat{f}(x; \{x_i\}) - E(\hat{f})\right]\left[E(\hat{f}) - f(x)\right]\right\} \prod_{i=1}^{n} \left[p(x_i)\, dx_i\right] \\
&= \mathrm{Var}[\hat{f}(x)] + \mathrm{Bias}^2[\hat{f}(x)] + 0.
\end{aligned}$$
[In typical treatments of parametric statistics, we assume unbiased estimators, hence the Bias term is zero. That isn't a good assumption here.]
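A quick Monte Carlo check of the decomposition in R (the true PDF, evaluation point, sample size, and bandwidth are all my own choices for illustration):

# Verify MSE = Var + Bias^2 for a Gaussian-kernel estimate at one point x0.
set.seed(2)
x0 <- 0; n <- 100; w <- 0.5
est <- replicate(2000, {
  x <- rnorm(n)                          # true PDF: standard normal
  mean(dnorm((x0 - x) / w)) / w          # p-hat(x0) for one pseudo-experiment
})
true  <- dnorm(x0)
mse   <- mean((est - true)^2)
vpop  <- mean((est - mean(est))^2)       # (population) variance of the estimator
bias  <- mean(est) - true
c(mse = mse, var_plus_bias2 = vpop + bias^2)   # the two numbers agree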
The Problem With Smoothing (I)

Theorem [Rosenblatt (1956)]: A uniform minimum variance unbiased estimator for p(x) does not exist.
- Unbiased: E[p̂(x)] = p(x) for all x.
- Uniform minimum variance: Var_p[p̂(x)] ≤ Var_p[q̂(x)] for all x and for all p(x), where q̂(x) is any other estimator of p(x).
The Problem With Smoothing (II)

For example, suppose we have a kernel estimator:
$$\hat{p}(x) = \frac{1}{n} \sum_{i=1}^{n} k(x - x_i; w).$$
Its expectation is:
$$E[\hat{p}(x)] = \frac{1}{n} \sum_{i=1}^{n} \int k(x - x_i; w)\, p(x_i)\, dx_i = \int k(x - y; w)\, p(y)\, dy.$$
Unless k(x - y) = δ(x - y), p̂(x) will be biased for some p(x). But δ(x - y) has infinite variance.
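Numerically, this expectation is just the convolution of the kernel with the true PDF, so a peak gets pulled down by smoothing. A small R sketch (the Gaussian kernel and standard-normal p(x) are my assumed example):

# E[p-hat(x0)] = integral of k(x0 - y; w) p(y) dy, evaluated at the peak x0 = 0.
w  <- 0.5
x0 <- 0
conv <- integrate(function(y) dnorm((x0 - y) / w) / w * dnorm(y),
                  lower = -Inf, upper = Inf)$value
c(true_peak = dnorm(0), expected_estimate = conv)  # conv < dnorm(0): biased low at the peak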
The Problem with Smoothing (III)

So the nice properties we strive for in parameter estimation (and sometimes achieve) are beyond reach. Intuition: smoothing lowers peaks and fills in valleys. [Figure: red curve: the PDF; histogram: a sampling from the PDF; black curve: a Gaussian kernel estimator for the PDF.]
Comment on Number of Bins in Histogram

Note: Sturges' rule, based on optimizing the MSE, was used in deciding how many bins, k, to make in the histogram:
$$k = 1 + \log_2 n.$$
The argument behind this rule has been criticized (Hyndman, 1995). Indeed we see in our example that we would by hand have selected more bins; our histogram is over-smoothed. There are other rules for optimizing the number of bins. For example, Scott's rule for the bin width is:
$$w = 3.5\, s\, n^{-1/3},$$
where s is the sample standard deviation. [More later]
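Both rules are available in base R, so one way to check them (a sketch, with an assumed sample):

# Bin-count rules (helper functions in base R's grDevices package):
x <- rnorm(100, 10, 1)
nclass.Sturges(x)                        # k = 1 + log2(n), rounded up
nclass.scott(x)                          # bins implied by w = 3.5 * s * n^(-1/3)
# Scott's bin width written out directly:
w <- 3.5 * sd(x) * length(x)^(-1/3)
ceiling(diff(range(x)) / w)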
Dependence on Smoothing Parameter

Plot showing the effect of the choice of smoothing parameter. [Figure: red: sampling PDF; black: default smoothing (w); blue: w/2 smoothing; turquoise: w/4 smoothing; green: 2w smoothing.]
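A comparable plot can be produced with R's density() by scaling its default bandwidth (a sketch; the sample and colors are my own choices):

# Effect of the smoothing parameter on a Gaussian kernel estimate.
x  <- rnorm(100, 10, 1)
d0 <- density(x)                                 # default bandwidth w
plot(d0, col = "black")
lines(density(x, bw = d0$bw / 2), col = "blue")
lines(density(x, bw = d0$bw / 4), col = "turquoise")
lines(density(x, bw = d0$bw * 2), col = "green")
curve(dnorm(x, 10, 1), add = TRUE, col = "red")  # the sampling PDF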
The Curse of Dimensionality

Roger Barlow gave a nice example of the impact of the "curse of dimensionality" in parametric statistics. It is a significant affliction in density estimation as well.
- Difficult to display and visualize as the number of dimensions increases.
- All the volume (of a bounded region) goes to the boundary (exponentially!) as the dimension increases; i.e., the data become sparse (see the sketch after this list). [Figure: the central half of each coordinate contains a fraction 1/2, 1/4, 1/8, ... of the volume as d increases.]
- Tendency for exponentially growing computational requirements with dimension. Even worse than parametric statistics.
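The sketch below (my own, with an arbitrary 5% shell) shows how quickly the volume of a unit hypercube concentrates near its boundary:

# Fraction of a unit hypercube's volume within 5% of the boundary, vs. dimension d.
d <- c(1, 2, 5, 10, 20, 50)
frac_near_boundary <- 1 - 0.9^d     # the interior is a cube of side 0.9
round(frac_near_boundary, 3)        # roughly 0.10 0.19 0.41 0.65 0.88 0.99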
Summary

We have introduced:
- Basic notions in (non-parametric) density estimation
- Some simple variations on the theme
- A foundation towards optimization
- An idea of where and how things will fail

Next: further sophistication on these ideas, and introduction of other variations in approach and application.