Kernel Density Estimation
1 Kernel Density Estimation and Application in Discriminant Analysis Thomas Ledl Universität Wien
2 Contents: Theory; Aspects of Application
3 [figure: sample observations]
4 Observations: which distribution do they come from?
5 [figure: candidate density estimates]
6 Kernel density estimator model: kernel K(.) and bandwidth h to choose
7 Kernel and bandwidth choices: gaussian vs. triangular kernel; small h vs. large h
8 Question 1: Which choice of K(.) and h is best for a descriptive purpose?
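As a sketch of the model behind this question, a minimal kernel density estimator with a choice of gaussian or triangular kernel and bandwidth h might look like this (illustrative Python, not code from the talk):

```python
import numpy as np

def kde(points, data, h, kernel="gaussian"):
    """fhat(x) = (1/(n*h)) * sum_i K((x - x_i)/h) for kernel K and bandwidth h."""
    u = (np.asarray(points)[:, None] - np.asarray(data)[None, :]) / h
    if kernel == "gaussian":
        k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    elif kernel == "triangular":
        k = np.clip(1.0 - np.abs(u), 0.0, None)
    else:
        raise ValueError(kernel)
    return k.mean(axis=1) / h

# sample usage: estimate a standard normal density on a grid
rng = np.random.default_rng(0)
sample = rng.normal(size=2000)
grid = np.linspace(-5, 5, 1001)
dens = kde(grid, sample, h=0.3)
```

Varying `h` on this sketch reproduces the small-h (wiggly) vs. large-h (oversmoothed) behaviour the slide illustrates.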
9 [figure: levelplots of estimated densities and the resulting classification regions]
10 Levelplot, LDA (based on the assumption of a multivariate normal distribution) [figure: classification regions]
11 [figure: LDA classification regions, continued]
12 Levelplot, KDE classifier [figure: classification regions]
13 Question 2: How does classification based on KDE perform in more than 2 dimensions?
15 Essential issues: optimization criteria; improvements of the standard model; resulting optimal choices of the model parameters K(.) and h
17 Optimization criteria: L_p-distances
18 [figure: two densities f(.) and g(.)]
20 L_1-distance = IAE (integrated absolute error); L_2-distance = ISE (integrated squared error)
22 Other ideas: minimization of the maximum vertical distance; consideration of horizontal distances for a more intuitive fit (Marron and Tsybakov, 1995); comparing the number and position of modes
23 Overview of some minimization criteria:
- L_1-distance = IAE: difficult mathematical tractability
- L_infinity-distance = maximum deviation: does not consider the overall fit
- Modern criteria that include some measure of horizontal distances: difficult mathematical tractability
- L_2-distance = ISE, MISE, AMISE, ...: most commonly used
24 ISE, MISE, AMISE, ...: ISE is a random variable; MISE = E(ISE), the expectation of ISE; AMISE = Taylor approximation of MISE, easier to calculate. [figure: MISE = IV + ISB and AMISE = AIV + AISB as functions of log10(h)]
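In the standard notation these criteria read as follows (a reconstruction of the formulas shown on the slide, using the usual roughness and kernel-moment functionals):

```latex
\begin{align*}
\mathrm{ISE}(h)  &= \int \bigl(\hat f_h(x) - f(x)\bigr)^2 \, dx,\\
\mathrm{MISE}(h) &= \mathbb{E}\bigl[\mathrm{ISE}(h)\bigr] = \mathrm{IV} + \mathrm{ISB},\\
\mathrm{AMISE}(h) &= \underbrace{\frac{R(K)}{nh}}_{\mathrm{AIV}}
                   + \underbrace{\tfrac14\,\mu_2(K)^2\, h^4\, R(f'')}_{\mathrm{AISB}},
\end{align*}
\text{where } R(g) = \int g(x)^2\, dx \text{ and } \mu_2(K) = \int u^2 K(u)\, du.
```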
25 Essential issues: optimization criteria; improvements of the standard model; resulting optimal choices of the model parameters K(.) and h
26 The AMISE-optimal bandwidth
27 The AMISE-optimal bandwidth depends on the kernel function K(.); the kernel-dependent part is minimized by the Epanechnikov kernel
28 The AMISE-optimal bandwidth also depends on the unknown density f(.): how to proceed?
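Minimizing AMISE(h) in h gives the familiar expression (a reconstruction; it exhibits both dependencies named above, on K through R(K) and mu_2(K), and on the unknown f through R(f'')):

```latex
h_{\mathrm{AMISE}} \;=\; \left( \frac{R(K)}{\mu_2(K)^2 \, R(f'') \, n} \right)^{1/5}
```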
29 Data-driven bandwidth selection methods. Leave-one-out selectors: maximum likelihood cross-validation; least-squares cross-validation (Bowman, 1984). Criteria based on substituting R(f'') in the AMISE formula: normal rule ("rule of thumb"; Silverman, 1986); plug-in methods (Sheather and Jones, 1991; Park and Marron, 1990); smoothed bootstrap.
31 Least-squares cross-validation (LSCV): the undisputed selector in the 1980s; gives an unbiased estimator for the ISE; suffers from more than one local minimizer, with no agreement about which one to use; bad convergence rate for the resulting bandwidth h_opt.
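A sketch of the LSCV criterion for a Gaussian kernel (illustrative Python, not the talk's implementation): it evaluates LSCV(h) = int fhat_h^2 - (2/n) * sum_i fhat_{h,-i}(x_i) and takes the minimizer over a grid of bandwidths.

```python
import numpy as np

def normal_pdf(u, s):
    """Density of N(0, s^2) at u."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2 * np.pi))

def lscv(h, x):
    """LSCV(h) = int fhat_h^2 - (2/n) * sum_i fhat_{h,-i}(x_i), Gaussian kernel."""
    n = len(x)
    d = x[:, None] - x[None, :]                      # pairwise differences
    # int fhat^2 has a closed form: average of N(0, 2h^2) densities at the differences
    int_f2 = normal_pdf(d, h * np.sqrt(2)).sum() / n ** 2
    # leave-one-out density estimate at each observation (drop the j = i term)
    k = normal_pdf(d, h)
    loo = (k.sum(axis=1) - normal_pdf(0.0, h)) / (n - 1)
    return int_f2 - 2.0 * loo.mean()

rng = np.random.default_rng(0)
x = rng.normal(size=200)
grid = np.linspace(0.05, 1.5, 60)
scores = np.array([lscv(h, x) for h in grid])
h_lscv = grid[int(np.argmin(scores))]                # the LSCV bandwidth
```

Plotting `scores` against `grid` for less well-behaved samples shows the multiple local minima that the slide criticizes.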
33 Normal rule ("rule of thumb"): assumes f(x) to be N(mu, sigma^2); the easiest selector; often oversmooths the function. For a Gaussian kernel the resulting bandwidth is h = 1.06 * sigma_hat * n^(-1/5).
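As a sketch (assuming the common Gaussian-kernel version of Silverman's rule with the robust scale estimate min(sample sd, IQR/1.34)):

```python
import numpy as np

def normal_rule_bandwidth(x):
    """Rule-of-thumb bandwidth h = 1.06 * sigma_hat * n^(-1/5) for a Gaussian
    kernel, with the robust scale sigma_hat = min(sample sd, IQR / 1.34)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    q75, q25 = np.percentile(x, [75, 25])
    sigma = min(x.std(ddof=1), (q75 - q25) / 1.34)
    return 1.06 * sigma * n ** (-1 / 5)
```

For clearly non-normal data this h tends to come out too large, which is the oversmoothing noted on the slide.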
35 Plug-in methods (Sheather and Jones, 1991; Park and Marron, 1990): do not substitute R(f'') in the AMISE formula directly, but estimate it via R(f^(4)), and R(f^(4)) via R(f^(6)), etc.; another parameter i to choose (the number of stages to go back), one stage is mostly sufficient; better rates of convergence; do not finally circumvent the problem of the unknown density, either.
36 The multivariate case: the scalar bandwidth h becomes H, the bandwidth matrix.
37 Issues of generalization in d dimensions: up to d(d+1)/2 bandwidth parameters (a full bandwidth matrix H) instead of one; unstable estimates; bandwidth selectors are essentially straightforward to generalize; for plug-in methods it is too difficult to give succinct expressions for d > 2 dimensions.
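A sketch of the multivariate estimator with a full bandwidth matrix H (illustrative Python; here H plays the role of a squared-bandwidth, covariance-type matrix, so H = h^2 * I recovers a product of univariate Gaussian kernels):

```python
import numpy as np

def mv_kde(points, data, H):
    """fhat(x) = (1/n) * sum_i K_H(x - x_i) with the Gaussian kernel
    K_H(u) = (2*pi)^(-d/2) * |H|^(-1/2) * exp(-u' H^{-1} u / 2)."""
    data = np.atleast_2d(data)
    n, d = data.shape
    Hinv = np.linalg.inv(H)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(H))
    out = []
    for x in np.atleast_2d(points):
        u = x - data                                  # (n, d) differences
        q = np.einsum('ij,jk,ik->i', u, Hinv, u)      # quadratic forms u' H^{-1} u
        out.append(norm * np.exp(-0.5 * q).mean())
    return np.array(out)
```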
38 Aspects of Application
39 Essential issues: curse of dimensionality; connection between goodness-of-fit and optimal classification; two methods for discriminatory purposes
41 The curse of dimensionality: the data disappears into the distribution tails in high dimensions. [figure: probability mass NOT in the tail of a multivariate normal density, as a function of the number of dimensions d] A good fit in the tails is desired!
42 The curse of dimensionality: much data is necessary to maintain a constant estimation error in high dimensions. [table: required sample size vs. dimensionality]
43 Essential issues: curse of dimensionality; connection between goodness-of-fit and optimal classification; two methods for discriminatory purposes
44 Essential issues. AMISE-optimal parameter choice: L_2-optimal; worse fit in the tails; calculation-intensive for large n. Optimal classification (in high dimensions): L_1-optimal (misclassification rate); estimation of the tails important; many observations required for a reasonable fit.
45 Essential issues: curse of dimensionality; connection between goodness-of-fit and optimal classification; two methods for discriminatory purposes
46 Method 1: Reduce the data onto a subspace which allows a somewhat accurate estimation but does not destroy too much information (a trade-off); use the multivariate kernel density concept to estimate the class densities.
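Method 1's reduction step could be sketched as a principal component projection (illustrative Python, not the talk's code):

```python
import numpy as np

def pca_project(X, k):
    """Project centered data onto its first k principal components."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    # eigen-decomposition of the covariance matrix, sorted by decreasing variance
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]
    return Xc @ vecs[:, order[:k]]
```

The class densities would then be estimated by a multivariate KDE on the k-dimensional scores rather than on the raw data.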
47 Method 2: Use the univariate concept to normalize the data nonparametrically; then use the classical methods like LDA and QDA for classification. Drawback: calculation-intensive.
48 Method 2: [figure: a) estimated density f(x), its CDF F(x) and the target normal CDF G(x); b) the resulting normalizing transformation t(x)]
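One way to realize such a marginal normalization (an assumption about how t(x) is built, not taken verbatim from the talk) is t(x) = Phi^{-1}(Fhat(x)), with Fhat a kernel estimate of the CDF:

```python
import numpy as np
from math import erf, sqrt
from statistics import NormalDist

def kernel_cdf(points, data, h):
    """Fhat(x) = (1/n) * sum_i Phi((x - x_i)/h): Gaussian-kernel CDF estimate."""
    u = (np.asarray(points)[:, None] - np.asarray(data)[None, :]) / h
    return 0.5 * (1.0 + np.vectorize(erf)(u / sqrt(2))).mean(axis=1)

def normalize(points, data, h):
    """t(x) = Phi^{-1}(Fhat(x)): maps the data to approximately N(0, 1) margins."""
    F = np.clip(kernel_cdf(points, data, h), 1e-6, 1 - 1e-6)  # guard the inverse
    inv = NormalDist().inv_cdf
    return np.array([inv(p) for p in F])
```

Applied column by column, this turns a skewed marginal into a roughly Gaussian one before LDA/QDA is run.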
50 Criticism of former simulation studies: carried out decades ago; outdated parameter selectors; restriction to uncorrelated normals; fruitless estimation because of high dimensions; no dimension reduction.
51 The present simulation study: 21 datasets x 14 estimators x 2 error criteria = 588 classification scores. Many results!
53 Each dataset has 2 classes for distinction, with a fixed number of observations per class; 200 test observations, 100 produced by each class; therefore the test data is of dimension 200x10.
54 Univariate prototype distributions: normal; normal-noise small; normal-noise medium; normal-noise large; exponential; bimodal (close); bimodal (far)
55 Datasets (Nr. / abbrev. / contents):
1 / NN1 / 10 normal distributions with "small noise"
2 / NN2 / 10 normal distributions with "medium noise"
3 / NN3 / 10 normal distributions with "large noise"
4 / SkN1 / skewed (exp-)distributions and 7 normals
5 / SkN2 / skewed (exp-)distributions and normals
6 / SkN3 / 7 skewed (exp-)distributions and normals
7 / Bi1 / normals, skewed and bimodal(close) distributions
8 / Bi2 / normals, skewed and bimodal(close) distributions
9 / Bi3 / 8 skewed and bimodal(far) distributions
10 / Bi4 / 8 skewed and bimodal(far) distributions
10 datasets with equal covariance matrices + 10 datasets with unequal covariance matrices + 1 insurance dataset = 21 datasets total
57 The 14 estimators. Method 1 (multivariate density estimator): principal component reduction onto subspaces of four different dimensions (4) x multivariate normal rule and multivariate LSCV criterion, resp. (2) = 8 estimators. Method 2 (marginal normalizations): univariate normal rule and Sheather-Jones plug-in (2) x subsequent LDA and QDA (2) = 4 estimators. Classical methods: LDA and QDA = 2 estimators.
59 Misclassification criteria: the classical misclassification rate ("error rate"); the Brier score.
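The Brier score for class-probability forecasts can be sketched as the mean squared distance between the predicted class probabilities and the one-hot indicator of the true class (illustrative Python):

```python
import numpy as np

def brier_score(prob, y):
    """Mean over observations of sum_k (p_k - 1{y = k})^2, where prob has one
    row of class probabilities per observation and y holds the true labels."""
    prob = np.asarray(prob, dtype=float)
    onehot = np.eye(prob.shape[1])[np.asarray(y)]
    return np.mean(np.sum((prob - onehot) ** 2, axis=1))
```

Unlike the plain error rate, this score also rewards well-calibrated probabilities, not just correct hard assignments.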
61 Results: the choice of the misclassification criterion is not essential. [figure: scatterplot of error rate vs. Brier score]
62 Results: the choice of the multivariate bandwidth parameter (method 1) is not essential in most cases; superiority of LSCV in the case of bimodals having unequal covariance matrices. [figure: error rates, LSCV vs. "normal rule"]
63 Results: the choice of the univariate bandwidth parameter (method 2) is not essential. [figure: error rates, Sheather-Jones selector vs. "normal rule"]
64 Results: the best trade-off is a projection onto a few dimensions. [figure: error rate for different subspace dimensions, for NN-, SkN- and Bi-distributions]
65 Results, equal covariance matrices: method 2 sometimes improves slightly on LDA; the multivariate KDE classifier (method 1) performs inferior. [figure: error rates per dataset for classical LDA, method 1 (normal rule, LSCV) and method 2]
66 Results, unequal covariance matrices: method 2 often improves on QDA quite essentially; method 1 performs quite poorly, but not for skewed distributions. [figure: error rates per dataset for classical QDA, method 1 (normal rule, LSCV) and method 2]
67 Results: is the additional calculation time justified? Required calculation time increases from LDA/QDA, over the multivariate "normal rule", to the preliminary univariate normalizations, LSCV and the Sheather-Jones plug-in.
69 Conclusions (1/3), classification performance: restriction to only a few dimensions; improvements over the classical discrimination methods by marginal normalizations (especially for unequal covariance matrices); poor performance of the multivariate kernel density classifier; LDA is undisputed in the case of equal covariance matrices and equal prior probabilities; the additional computation time seems not to be justified.
74 Conclusions (2/3), KDE for data description: great variety in error criteria, parameter selection procedures and additional model improvements; no consensus about a feasible error criterion; nobody knows what is finally optimized (upper bounds in L_1-theory; in L_2-theory ISE vs. MISE vs. AMISE; several minima in LSCV, ...); different parameter selectors are of varying quality with respect to different underlying densities.
78 Conclusions (3/3), theory vs. application: comprehensive theoretical results about optimal kernels or optimal bandwidths are not relevant for classification; for discriminatory purposes the issue of estimating log-densities is much more important; some univariate model improvements are not generalizable; the widely ignored curse of dimensionality forces the user into a trade-off between necessary dimension reduction and information loss. Dilemma: much data is required for accurate estimates, but much data leads to an explosion of the computation time.
83 The End
Published as: Thomas Ledl, "Kernel Density Estimation: Theory and Application in Discriminant Analysis", Austrian Journal of Statistics, Volume 33 (2004), Number 3, 267-279. Department of Statistics and Decision Support Systems.
More informationNon-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines
Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall
More informationPrincipal component analysis
Principal component analysis Motivation i for PCA came from major-axis regression. Strong assumption: single homogeneous sample. Free of assumptions when used for exploration. Classical tests of significance
More informationarxiv: v2 [stat.me] 13 Sep 2007
Electronic Journal of Statistics Vol. 0 (0000) ISSN: 1935-7524 DOI: 10.1214/154957804100000000 Bandwidth Selection for Weighted Kernel Density Estimation arxiv:0709.1616v2 [stat.me] 13 Sep 2007 Bin Wang
More informationLecture Notes 15 Prediction Chapters 13, 22, 20.4.
Lecture Notes 15 Prediction Chapters 13, 22, 20.4. 1 Introduction Prediction is covered in detail in 36-707, 36-701, 36-715, 10/36-702. Here, we will just give an introduction. We observe training data
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationLecture 9: Classification, LDA
Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis October 13, 2017 1 / 21 Review: Main strategy in Chapter 4 Find an estimate ˆP (Y X). Then, given an input x 0, we
More informationIntroduction to Gaussian Process
Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Non-parametric Gaussian Process (GP) GP Regression
More informationLecture 9: Classification, LDA
Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis October 13, 2017 1 / 21 Review: Main strategy in Chapter 4 Find an estimate ˆP (Y X). Then, given an input x 0, we
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationLinear Decision Boundaries
Linear Decision Boundaries A basic approach to classification is to find a decision boundary in the space of the predictor variables. The decision boundary is often a curve formed by a regression model:
More informationAcceleration of some empirical means. Application to semiparametric regression
Acceleration of some empirical means. Application to semiparametric regression François Portier Université catholique de Louvain - ISBA November, 8 2013 In collaboration with Bernard Delyon Regression
More informationarxiv: v1 [stat.me] 25 Mar 2019
-Divergence loss for the kernel density estimation with bias reduced Hamza Dhaker a,, El Hadji Deme b, and Youssou Ciss b a Département de mathématiques et statistique,université de Moncton, NB, Canada
More informationPivot Selection Techniques
Pivot Selection Techniques Proximity Searching in Metric Spaces by Benjamin Bustos, Gonzalo Navarro and Edgar Chávez Catarina Moreira Outline Introduction Pivots and Metric Spaces Pivots in Nearest Neighbor
More informationCMSC858P Supervised Learning Methods
CMSC858P Supervised Learning Methods Hector Corrada Bravo March, 2010 Introduction Today we discuss the classification setting in detail. Our setting is that we observe for each subject i a set of p predictors
More informationModelling Non-linear and Non-stationary Time Series
Modelling Non-linear and Non-stationary Time Series Chapter 2: Non-parametric methods Henrik Madsen Advanced Time Series Analysis September 206 Henrik Madsen (02427 Adv. TS Analysis) Lecture Notes September
More informationBayesian Adaptive Bandwidth Kernel Density Estimation of Irregular Multivariate Distributions
Bayesian Adaptive Bandwidth Kernel Density Estimation of Irregular Multivariate Distributions Shuowen Hu, D. S. Poskitt, Xibin Zhang Department of Econometrics and Business Statistics, Monash University,
More informationNonparametric Econometrics
Applied Microeconometrics with Stata Nonparametric Econometrics Spring Term 2011 1 / 37 Contents Introduction The histogram estimator The kernel density estimator Nonparametric regression estimators Semi-
More informationFrontier estimation based on extreme risk measures
Frontier estimation based on extreme risk measures by Jonathan EL METHNI in collaboration with Ste phane GIRARD & Laurent GARDES CMStatistics 2016 University of Seville December 2016 1 Risk measures 2
More informationData-Based Choice of Histogram Bin Width. M. P. Wand. Australian Graduate School of Management. University of New South Wales.
Data-Based Choice of Histogram Bin Width M. P. Wand Australian Graduate School of Management University of New South Wales 13th May, 199 Abstract The most important parameter of a histogram is the bin
More informationPCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani
PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given
More informationBayesian estimation of bandwidths for a nonparametric regression model with a flexible error density
ISSN 1440-771X Australia Department of Econometrics and Business Statistics http://www.buseco.monash.edu.au/depts/ebs/pubs/wpapers/ Bayesian estimation of bandwidths for a nonparametric regression model
More informationLecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides
Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Intelligent Data Analysis and Probabilistic Inference Lecture
More information10-701/ Machine Learning - Midterm Exam, Fall 2010
10-701/15-781 Machine Learning - Midterm Exam, Fall 2010 Aarti Singh Carnegie Mellon University 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 15 numbered pages in this exam
More informationNonparametric Estimation of Luminosity Functions
x x Nonparametric Estimation of Luminosity Functions Chad Schafer Department of Statistics, Carnegie Mellon University cschafer@stat.cmu.edu 1 Luminosity Functions The luminosity function gives the number
More informationRegularized Discriminant Analysis and Reduced-Rank LDA
Regularized Discriminant Analysis and Reduced-Rank LDA Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Regularized Discriminant Analysis A compromise between LDA and
More informationAnalysis of the Performance of AdaBoost.M2 for the Simulated Digit-Recognition-Example
Analysis of the Performance of AdaBoost.M2 for the Simulated Digit-Recognition-Example Günther Eibl and Karl Peter Pfeiffer Institute of Biostatistics, Innsbruck, Austria guenther.eibl@uibk.ac.at Abstract.
More informationChemometrics: Classification of spectra
Chemometrics: Classification of spectra Vladimir Bochko Jarmo Alander University of Vaasa November 1, 2010 Vladimir Bochko Chemometrics: Classification 1/36 Contents Terminology Introduction Big picture
More informationKernel density estimation
Kernel density estimation Patrick Breheny October 18 Patrick Breheny STA 621: Nonparametric Statistics 1/34 Introduction Kernel Density Estimation We ve looked at one method for estimating density: histograms
More informationAn Introduction to Multivariate Statistical Analysis
An Introduction to Multivariate Statistical Analysis Third Edition T. W. ANDERSON Stanford University Department of Statistics Stanford, CA WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents
More informationEstimation for nonparametric mixture models
Estimation for nonparametric mixture models David Hunter Penn State University Research supported by NSF Grant SES 0518772 Joint work with Didier Chauveau (University of Orléans, France), Tatiana Benaglia
More informationClassification 2: Linear discriminant analysis (continued); logistic regression
Classification 2: Linear discriminant analysis (continued); logistic regression Ryan Tibshirani Data Mining: 36-462/36-662 April 4 2013 Optional reading: ISL 4.4, ESL 4.3; ISL 4.3, ESL 4.4 1 Reminder:
More informationDIFFERENTIATION. MICROECONOMICS Principles and Analysis Frank Cowell. July 2017 Frank Cowell: Differentiation
DIFFERENTIATION MICROECONOMICS Principles and Analysis Frank Cowell 1 Overview... Differentiation Basics Basic definitions Chain rule Elasticities l Hôpital s rule 2 Definition (1) Take the univariate
More informationHypothesis Testing with the Bootstrap. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods
Hypothesis Testing with the Bootstrap Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods Bootstrap Hypothesis Testing A bootstrap hypothesis test starts with a test statistic
More informationChapter 9. Non-Parametric Density Function Estimation
9-1 Density Estimation Version 1.1 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least
More informationPCA, Kernel PCA, ICA
PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per
More informationMotivating the Covariance Matrix
Motivating the Covariance Matrix Raúl Rojas Computer Science Department Freie Universität Berlin January 2009 Abstract This note reviews some interesting properties of the covariance matrix and its role
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationWhy is the field of statistics still an active one?
Why is the field of statistics still an active one? It s obvious that one needs statistics: to describe experimental data in a compact way, to compare datasets, to ask whether data are consistent with
More informationBasic Statistical Tools
Structural Health Monitoring Using Statistical Pattern Recognition Basic Statistical Tools Presented by Charles R. Farrar, Ph.D., P.E. Los Alamos Dynamics Structural Dynamics and Mechanical Vibration Consultants
More information