12 - Nonparametric Density Estimation


1 Nonparametric Density Estimation
ST 697, Fall 2017, University of Alabama

2 Density Review

3 Continuous Random Variables
[Figure: the CDF F(x) and density f(x) of a continuous random variable, plotted against x]

4 Discrete Random Variables
[Figure: the CDF F(x) and probability mass function p(x) of a discrete random variable, plotted against x]

5 Empirical CDF and PDF
[Figure: the ECDF F_n(x) and empirical PDF f_n(x) of n = 50 draws from X ~ N(0, 1), overlaid on the Gaussian CDF (µ = 0, σ = 1)]

6 Review of Parametric Density Estimation
1. Choose a parametric distribution family
2. Estimate its parameters:
   - Method of Moments
   - Maximum Likelihood
   - Bayesian (also choose a prior)
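To make step 2 concrete, here is a minimal R sketch of parametric estimation for a Gaussian family; the simulated data and grid are illustrative choices, and the MLE of sigma uses the 1/n (not 1/(n-1)) divisor.

# Simulated data (illustrative)
set.seed(1)
x = rnorm(100, mean = 2, sd = 1.5)

# Maximum likelihood estimates for a N(mu, sigma^2) family
mu.hat    = mean(x)                      # MLE of mu
sigma.hat = sqrt(mean((x - mu.hat)^2))   # MLE of sigma (1/n divisor)

# Estimated parametric density, evaluated on a grid
xg = seq(min(x), max(x), length = 200)
f.hat = dnorm(xg, mean = mu.hat, sd = sigma.hat)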

7 Density Histograms

8 Density Histograms
Histograms estimate the density as a piecewise constant function:
\hat{f}(x) = \sum_{j=1}^J b_j(x) \hat{\theta}_j
where b_j(x) = \mathbf{1}(x \in \text{bin}_j)/h_j and \text{bin}_j = [t_j, t_{j+1})
- t_1 < t_2 < ... < t_{J+1} are the break points for the bins
- h_j = |[t_j, t_{j+1})| = t_{j+1} - t_j is the bin width of bin j
- bin widths do not have to be equal
- \text{bin}_j \cap \text{bin}_k = \emptyset for j \neq k
Some slight adjustments need to be made for the multivariate case.

9 Estimating Density Histograms
Observe data D = {X_1, X_2, ..., X_n}, and denote n_j as the number of observations in bin j.
- \hat{\theta}_j = \hat{p}_j = n_j / n is the usual (MLE) estimate
- Shrinkage estimator:
\hat{\theta}_j = \left(\frac{n}{n+A}\right)\hat{p}_j + \left(\frac{A}{n+A}\right)u_j = \pi \hat{p}_j + (1 - \pi) u_j
where
- A is the number of pseudo observations (A u_j in bin j)
- u_j = h_j / \sum_k h_k (uniform prior)
- 0 \le \pi \le 1 (alternative representation)
What are the constraints on {\theta_j} that ensure \hat{f} is a proper density? (A sketch of both estimators follows.)
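A minimal R sketch of both estimators on equal-width bins; the break points, A, and the simulated data are illustrative choices.

set.seed(1)
x = rnorm(100)
t = seq(-4, 4, by = 0.5)           # break points t_1 < ... < t_{J+1}
J = length(t) - 1
h = diff(t)                        # bin widths h_j
n = length(x)

nj = table(cut(x, breaks = t))     # counts n_j per bin
p.hat = as.numeric(nj) / n         # MLE of theta_j

A = 5                              # number of pseudo observations (illustrative)
u = h / sum(h)                     # uniform prior
theta.shrink = (n / (n + A)) * p.hat + (A / (n + A)) * u

# Piecewise-constant density heights: theta_j / h_j
f.mle    = p.hat / h
f.shrink = theta.shrink / h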

10 Regular Histogram
[Figure: a regular (equal bin width) histogram density estimate, Density vs. x]

11 Histogram with Shrinkage
[Figure: histogram density estimates comparing the MLE, the prior, and shrinkage estimates with A = 1 and a larger value of A]

12 German Credit Data
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data"
data = read.delim(url, sep = " ", header = FALSE)
germancredit = data[, -21]        # the 20 predictor columns
Y = data[, 21]                    # class label (1 = good, 2 = bad)
G = ifelse(Y == 1, "good", "bad")
good = (G == "good")

13 Histogram Density Ratio: German Credit Data
[Figure: shrinkage (A = 1) histogram density estimates of x2 for each class, and the resulting log density ratio log(f1/f0) against x2]

14 Binning
The bins of a histogram can be of fixed width (h_j = h) or variable width.
- Smaller bins have less bias, but more variance
- Shrinkage lowers variance, but increases bias
- Percentile binning sets the bin widths according to the number of observations
Creating J bins from break points {t_j : j = 1, 2, ..., J + 1} (a sketch of all three follows):
- {t_1, h, J}: equal bins of width h
- {t_1, t_{J+1}, J}: equal bins of width h = (t_{J+1} - t_1)/J
- {t_j : t_j = F_n^{-1}(j/J)}: percentile binning
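A minimal R sketch of the three binning schemes; the data, J, t_1, and h are illustrative choices.

set.seed(1)
x = rexp(200)
J = 10

# (a) equal bins of width h starting at t_1
t1 = min(x); h = 0.5
breaks.a = t1 + h * (0:J)

# (b) J equal bins spanning [t_1, t_{J+1}]
breaks.b = seq(min(x), max(x), length = J + 1)

# (c) percentile binning: break points from the empirical quantile function
breaks.c = quantile(x, probs = (0:J) / J)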

15 Binning Example
[Figure: histogram density estimates with regular binning and with percentile binning, each overlaid on the true density]

16 Binning Example: Basis and Parameters
[Figure: the basis functions b(x) and estimated parameters theta for fixed binning and for percentile binning]

17 Optimal Bin Width
The optimal bin widths for density estimation may not be the same as the optimal bin widths for estimating the log density ratio (as we need for naive Bayes or anomaly detection).
- If we are interested in the log density ratio, should we use the same binning?
- What if the classes are unbalanced?

18 Histogram Bin Width: German Credit Data (J = 10 Bins)
[Figure: density estimates of x and log(f1/f0) under fixed bin widths vs. percentile bin widths]

19 Histogram Bin Width: German Credit Data (J = 20 Bins)
[Figure: density estimates of x and log(f1/f0) under fixed bin widths vs. percentile bin widths]

20 Histogram Bin Width: German Credit Data (J = 5 Bins)
[Figure: density estimates of x and log(f1/f0) under fixed bin widths vs. percentile bin widths]

21 Review of Histograms
We have considered regular and percentile histograms for nonparametric density estimation.
Histograms can be thought of as a local method of density estimation:
- Points are local to each other if they fall in the same bin
- "Local" is determined by the bin breaks
But this has some issues:
- Some observations are "closer" to observations in a neighboring bin than to observations in their own bin
- The estimate is not smooth (but the true density can often be assumed smooth)
- Bin shifts can have a big influence on the resulting density
Do parametric approaches estimate a density locally?

22 Histogram Neighborhood
Consider estimating the density at a location x_0. For a regular histogram (with bin width h), the MLE density is
\hat{f}(x_0) = \frac{n_j}{n h} for x_0 \in \text{bin}_j,
which is a function of the number of observations in bin j. But how do you feel about this if x_0 is close to the bin boundary?
[Figure: histogram of x, Density vs. x]

23 Buffalo Snowfall: Bin Width
[Figure: histogram density estimates of the Buffalo snowfall data at several bin widths]
Scott (1992) Multivariate Density Estimation, Chapter 4

24 Buffalo Snowfall: Sensitivity to Bin Shifts
[Figure: histogram density estimates of the Buffalo snowfall data under shifted bin breaks]
Scott (1992) Multivariate Density Estimation, Chapter 4

25 Kernel Density Estimation

26 Local Density Estimation - Moving Window
Consider again estimating the density at a location x_0.
Regular histogram (with midpoints m_j and bin width h):
\hat{f}(x_0) = \frac{1}{n} \sum_{i=1}^n \frac{1}{h} \mathbf{1}\!\left( \frac{|x_i - m_j|}{h} \le \frac{1}{2} \right) for x_0 \in B_j
Consider a moving window approach:
\hat{f}(x_0) = \frac{1}{n} \sum_{i=1}^n \frac{1}{h} \mathbf{1}\!\left( \frac{|x_i - x_0|}{h} \le \frac{1}{2} \right)
This gives a more pleasing definition of "local" by centering a bin at x_0. Equivalently, this estimates the derivative of the ECDF:
\hat{f}(x_0) = \frac{F_n(x_0 + h/2) - F_n(x_0 - h/2)}{h}
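A minimal R sketch of the moving-window estimator via the ECDF difference; the data, bandwidth, and grid are illustrative choices.

set.seed(1)
x = rnorm(200)
h = 0.5
Fn = ecdf(x)                       # empirical CDF F_n

# Moving-window density estimate at a grid of locations x_0
x0 = seq(-3, 3, length = 100)
f.hat = (Fn(x0 + h/2) - Fn(x0 - h/2)) / h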

27 Uniform Kernel
[Figure: the uniform-kernel (moving window) density estimate overlaid on a histogram and the true density]

28 Kernel Density Estimation
The moving window approach looks better than a histogram with the same bin width, but it is still not smooth. Instead of giving every observation in the window the same weight, we can assign a weight according to its distance from x_0.
[Figure: uniform and Gaussian kernel weights as a function of distance from x_0]

29 Gaussian Kernel
[Figure: the Gaussian-kernel density estimate overlaid on a histogram, the uniform-kernel estimate, and the true density]

30 Kernel Density Estimation
More generally, the weights K_h(u) = h^{-1} K(u/h) are called kernel functions. Thus, a kernel density estimator is of the form
\hat{f}(x_0) = \frac{1}{n} \sum_{i=1}^n K_h(x_i - x_0)
where the smoothing parameter h is called the bandwidth and controls how fast the weights decay as a function of the distance from x_0.
In R, the density() function uses a bandwidth (the bw argument) that is the standard deviation of the kernel. (A usage sketch follows.)
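A minimal sketch of density() in R; since bw is on the standard-deviation scale, the same bw gives comparably smooth estimates across kernels. The data and bandwidth are illustrative.

set.seed(1)
x = rnorm(200)

# Gaussian kernel; bw = standard deviation of the kernel
d1 = density(x, kernel = "gaussian", bw = 0.3)

# Epanechnikov kernel with the same (sd-scale) bandwidth
d2 = density(x, kernel = "epanechnikov", bw = 0.3)

plot(d1)
lines(d2, col = "red")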

31 Kernel Properties
A kernel is usually considered to be a symmetric probability density function:
- K_h(u) \ge 0 (non-negative)
- \int K_h(u) du = 1 (integrates to one)
- K_h(u) = K_h(-u) (symmetric about 0)
Notice that if the kernel has compact support, so does the resulting density estimate.
The Gaussian kernel is the most popular, but has infinite support:
- This is good when the true density has infinite support
- However, it requires more computation
- But its properties are easier to calculate (e.g., for bandwidth selection)

32 Popular Kernels
[Figure: kernel weight vs. distance from x_0 for the gaussian, epanechnikov, rectangular, triangular, biweight, and optcosine kernels at a common bw]

33 Bandwidth
The bandwidth parameter h controls the amount of smoothing.
- What happens when h → ∞?
- What happens when h → 0?
The choice of bandwidth is much more important than the choice of kernel. There is no standard representation of the bandwidth parameter h. In R, density() uses two arguments to set the bandwidth: bw is the standard deviation of the kernel, and width is the length of the support of the kernel. The two are very different, so check carefully what an author is using in their definition of bandwidth.

34 Bandwidth Effects
[Figure: density estimates with the gaussian and biweight kernels at bw = 0.1, bw = 0.3, and a larger bw]

35 Another Perspective
We have described KDE as taking a local density around location x_0. An alternative perspective is to view KDE as an n-component mixture model with mixture weights of 1/n:
\hat{f}(x) = \frac{1}{n} \sum_{i=1}^n K_h(x_i - x) = \sum_{i=1}^n \frac{1}{n} f_i(x), where f_i(x) = K_h(x_i - x)
Or, in a basis function representation:
\hat{f}(x) = \sum_{i=1}^n \theta_i b_i(x), where \theta_i = \frac{1}{n} and b_i(x) = K_h(x_i - x)

36 Component Kernels
\hat{f}(x) = \frac{1}{n} \sum_{i=1}^n K_h(x_i - x)
[Figure: the individual component kernels and their sum, the KDE, plotted against x]

37 Kernel Based Density Ratio
Using kernel density estimation (KDE), the density ratio (useful for naive Bayes, etc.) becomes
\frac{f_1(x)}{f_0(x)} \approx \frac{\hat{f}_1(x)}{\hat{f}_0(x)} = \frac{\frac{1}{n_1} \sum_{i=1}^{n_1} K_{h_1}(x_i - x)}{\frac{1}{n_0} \sum_{j=1}^{n_0} K_{h_0}(x_j - x)}
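A minimal R sketch of the KDE log density ratio, evaluating both class densities on a common grid; the data, bandwidths, and grid are illustrative choices.

set.seed(1)
x1 = rnorm(100, mean = 1)    # class 1 observations
x0 = rnorm(300, mean = 0)    # class 0 observations

grid = seq(-4, 5, length = 200)
f1 = density(x1, bw = 0.4, from = min(grid), to = max(grid), n = length(grid))
f0 = density(x0, bw = 0.4, from = min(grid), to = max(grid), n = length(grid))

log.ratio = log(f1$y) - log(f0$y)   # log(f1/f0) on the common grid
plot(grid, log.ratio, type = "l")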

38 KDE: German Credit Data
[Figure: class-conditional KDEs of x and log(f1/f0) under three bandwidth settings: bw1 = bw0 = 3; bw1 = bw0 = 10; and bw1 = 1 with a small bw0]

39 Edge Effects
Sometimes there are known boundaries in the data (e.g., the amount of rainfall cannot be negative). Here are some options:
1. Do nothing. As long as not many events are near the boundary and the bandwidth is small, this may not be too problematic; however, it will lead to increased bias around the boundaries.
2. Transform the data (e.g., x → log(x)), estimate the density in the transformed space, then transform back (sketched below).
3. Use an edge correction technique.
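A minimal R sketch of option 2 for data supported on (0, ∞): estimate the density of y = log(x) by KDE, then map back with the change-of-variables Jacobian, f_X(x) = f_Y(log x)/x. The data and bandwidth are illustrative choices.

set.seed(1)
x = rexp(300)                 # positive data with a boundary at 0
y = log(x)                    # transformed data, supported on the whole line

dy = density(y, bw = 0.3)     # KDE in the transformed space

# Back-transform: f_X(x) = f_Y(log x) / x
xg = exp(dy$x)
fx = dy$y / xg

plot(xg, fx, type = "l", xlim = c(0, 4))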

40 Edge Correction (1)
The simplest approach requires a modification of the kernels near the boundary. Let S = [a, b]. Recall that \int_a^b K_h(x_i - x)\,dx should be 1 for every i, but for x_i near a boundary, \int_a^b K_h(x_i - x)\,dx < 1.
Denote w_h(x_i) = \int_a^b K_h(x_i - x)\,dx. The resulting edge-corrected KDE becomes
\hat{f}(x) = \frac{1}{n} \sum_{i=1}^n w_h(x_i)^{-1} K_h(x_i - x)

41 Edge Correction (2)
Another approach corrects the kernel for each particular x. Denote w_h(x) = \int_a^b K_h(u - x)\,du. The resulting edge-corrected KDE becomes
\hat{f}(x) = \frac{1}{n} w_h(x)^{-1} \sum_{i=1}^n K_h(x_i - x)
This approach is not guaranteed to integrate to 1, but for some problems this is not a major concern.
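A minimal R sketch of correction (1) with a Gaussian kernel on S = [0, ∞), where the normalizer w_h(x_i) = \int_0^\infty K_h(x_i - x)\,dx has the closed form pnorm(x_i/h). The data and bandwidth are illustrative choices.

set.seed(1)
x = rexp(300)        # data supported on [0, Inf)
h = 0.3

# w_h(x_i): mass of the Gaussian kernel at x_i that falls inside [0, Inf)
w = pnorm(x / h)

# Edge-corrected KDE evaluated on a grid
xg = seq(0, 4, length = 200)
f.hat = sapply(xg, function(x0) mean(dnorm(x0 - x, sd = h) / w))
plot(xg, f.hat, type = "l")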

42 Adaptive Kernels
Up to this point, we have considered fixed bandwidths. But what if we let the bandwidth vary? There are two main approaches:
Balloon estimator (bandwidth varies with the evaluation point x):
\hat{f}(x) = \frac{1}{n} \sum_{i=1}^n K_{h(x)}(x_i - x) = \frac{1}{n\,h(x)} \sum_{i=1}^n K\!\left( \frac{x_i - x}{h(x)} \right)
Sample point estimator (bandwidth varies with each observation x_i):
\hat{f}(x) = \frac{1}{n} \sum_{i=1}^n K_{h(x_i)}(x_i - x) = \frac{1}{n} \sum_{i=1}^n h(x_i)^{-1} K\!\left( \frac{x_i - x}{h(x_i)} \right)

43 k-nearest Neighbor Density Approach ST 697 Fall /49 Like what we discussed for percentile binning, we can estimate the density from the size of the window containing the k nearest observations. k ˆf k (x) = nv k (x) where V k (x) is the volume of a neighborhood that contains the k-nearest neighbors. This is an adaptive version of the moving window (uniform kernel) approach Probably won t integrate to 1 It is also possible to use the k-nn distance as a way to select an adaptive bandwidth h(x).

44 Multivariate Density Estimation
1. Multivariate kernels (e.g., K(u) = N(0, \Sigma)):
\hat{f}(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}\, n} \sum_{i=1}^n \exp\left( -\frac{1}{2} (x - x_i)^T \Sigma^{-1} (x - x_i) \right)
Let \Sigma = h^2 A where |A| = 1, so |\Sigma|^{1/2} = h^d:
\hat{f}(x) = \frac{1}{(2\pi)^{d/2} h^d\, n} \sum_{i=1}^n \exp\left( -\frac{1}{2 h^2} (x - x_i)^T A^{-1} (x - x_i) \right)
2. Product kernels (A = I_d):
\hat{f}(x) = \frac{1}{n} \sum_{i=1}^n \prod_{j=1}^d K_{h_j}(x_j - x_{ij})
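A minimal R sketch of a two-dimensional Gaussian product-kernel estimate with per-dimension bandwidths; the data, bandwidths, and evaluation point are illustrative choices.

set.seed(1)
n = 200
X = cbind(rnorm(n), rnorm(n, sd = 2))   # n x 2 data matrix
h = c(0.3, 0.6)                          # per-dimension bandwidths h_j

# Product-kernel KDE evaluated at a single point x0
x0 = c(0, 0)
f.hat = mean(dnorm(X[, 1] - x0[1], sd = h[1]) *
             dnorm(X[, 2] - x0[2], sd = h[2]))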

45 Mixture Models
Mixture models offer a flexible compromise between kernel density and parametric methods. A mixture model is a mixture of densities
f(x) = \sum_{j=1}^p \pi_j\, g_j(x \mid \xi_j)
where
- 0 \le \pi_j \le 1 and \sum_{j=1}^p \pi_j = 1 are the mixing proportions
- g_j(x \mid \xi_j) are the component densities
This idea is behind model-based clustering (ST 640), radial basis functions (ESL 6.7), etc. Usually the component parameters \xi_j and weights \pi_j have to be estimated (the EM algorithm shows up here; a sketch follows).
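A minimal EM sketch in R for a two-component Gaussian mixture; the data, component count, starting values, and iteration count are all illustrative choices.

set.seed(1)
x = c(rnorm(150, 0, 1), rnorm(50, 4, 0.7))   # data from a 2-component mixture

# Starting values
pi1 = 0.5; mu = c(-1, 1); s = c(1, 1)

for (iter in 1:100) {
  # E-step: posterior responsibility of component 1 for each x_i
  d1 = pi1 * dnorm(x, mu[1], s[1])
  d2 = (1 - pi1) * dnorm(x, mu[2], s[2])
  r = d1 / (d1 + d2)

  # M-step: weighted MLE updates
  pi1   = mean(r)
  mu[1] = sum(r * x) / sum(r)
  mu[2] = sum((1 - r) * x) / sum(1 - r)
  s[1]  = sqrt(sum(r * (x - mu[1])^2) / sum(r))
  s[2]  = sqrt(sum((1 - r) * (x - mu[2])^2) / sum(1 - r))
}

# Fitted mixture density on a grid
xg = seq(min(x), max(x), length = 200)
f.hat = pi1 * dnorm(xg, mu[1], s[1]) + (1 - pi1) * dnorm(xg, mu[2], s[2])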

46 Kernels, Mixtures, and Splines
All of these methods can be written as
f(x) = \sum_{j=1}^J b_j(x)\,\theta_j
- For KDE: J = n, b_j(x) = K_h(x - x_j), \theta_j = 1/n (bandwidth h estimated)
- For mixture models: b_j(x) = g_j(x \mid \xi_j), \theta_j = \pi_j (J, \xi_j, \pi_j estimated)
- For B-splines: b_j(x) is a B-spline (\theta_j, and maybe J and the knots, estimated)
Note: For density estimation, \log f(x) = \sum_{j=1}^J b_j(x)\,\theta_j may be easier to estimate.

47 Kernel Regression

48 Other Issues
- One major problem with KDE and kernel regression (and k-NN) is that all of the training data must be stored in memory. For large data, this is unreasonable.
- Multi-dimensional kernels are not very good for high dimensions (unless simplified by using product kernels).
- But temporal kernels are good for adaptive procedures (e.g., only remember the most recent observations). Think about what an EWMA is doing.

49 Smoothing for Categorical Data
