From Histograms to Multivariate Polynomial Histograms and Shape Estimation. Assoc Prof Inge Koch
1 From Histograms to Multivariate Polynomial Histograms and Shape Estimation Assoc Prof Inge Koch Statistics, School of Mathematical Sciences University of Adelaide Inge Koch (UNSW, Adelaide) Poly Histograms 19 March 2009 1 / 27
2 Motivation: determine the shape of data
We have 12 measurements on each of 27,994 blood cells. How many clusters are there? How big are they, and where are they?
Data: Centre for Immunology, St Vincent Hospital, Sydney. Immunologists want to differentiate healthy individuals from those with HIV+.
3 Look at the (Log-Data)
[Figure: scatterplot panels of subsets of the blood cells in the variables CD3, CD4 and CD8]
4 Histograms of the (Log-Data)
[Figure: two histograms of CD3 with different numbers of bins]
5 Histograms of the (Log-Data)
6 How Many Clusters are in the Data?
One-dimensional data: 1 or 2 modes. Two-dimensional data: 1 to 3 or 4 modes. How many clusters are in the 12-dimensional data?
If the measurements were independent, the number of modes would be the product of the numbers of modes of the individual variables, but this is not the case in our data.
Can you think of a 3D example with $k$ modes such that the 2D projections have $k - 1$ modes?
7 Polynomial Histogram Estimators
Main idea: histograms have flat tops, so instead of only estimating the number of points in each bin, estimate the shape separately in each bin.
8 What are Polynomial Histogram Estimators?
Number of observations $n$, dimension $d$, binwidth $h$; $B_\ell$ is a bin of volume $|B_\ell| = h^d$ containing $n_\ell$ observations.
The model for each bin $B_\ell$:
1. histogram estimator (Hist): $\hat f_0(x) = a_0$
2. first-order polynomial histogram estimator (Fophe): $\hat f_1(x) = a_0 + a^T x$
3. second-order polynomial histogram estimator (Sophe): $\hat f_2(x) = a_0 + a^T x + x^T A x$
9 Relationships for Coefficients
In each bin $B_\ell$ the estimate $\hat f_k$ satisfies:
1. proportion of data: $\int_{B_\ell} \hat f_k(x)\,dx = \frac{n_\ell}{n}$
2. local mean: $\int_{B_\ell} x\,\hat f_k(x)\,dx = \frac{n_\ell}{n}\,\bar x_\ell$
3. local second moment: $\int_{B_\ell} x x^T \hat f_k(x)\,dx = \frac{n_\ell}{n}\,M_\ell$
10 The New Estimators
In each bin $B_\ell$ with bin centre $t_\ell$:
Fophe:
$$\hat f_1(x) = \frac{1}{h^{d+2}}\,\frac{n_\ell}{n}\left[h^2 + 12\,(\bar x_\ell - t_\ell)^T (x - t_\ell)\right]$$
Sophe:
$$\hat f_2(x) = \frac{1}{h^{d+4}}\,\frac{n_\ell}{n}\left\{\frac{4 + 5d}{4}\,h^4 - 15 h^2\,\mathrm{tr}(S_\ell) + 12 h^2 (x - t_\ell)^T(\bar x_\ell - t_\ell) + (x - t_\ell)^T\left[72 S_\ell + 108\,\mathrm{diag}(S_\ell) - 15 h^2 I\right](x - t_\ell)\right\}$$
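The two per-bin estimators can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `fophe_sophe_in_bin` and its arguments are hypothetical, and $S_\ell$ is taken to be the second-moment matrix of the bin's observations about the bin centre $t_\ell$, which the slides do not spell out. With that reading, a one-dimensional toy bin reproduces the three moment relationships (proportion, local mean, local second moment).

```python
import numpy as np

def fophe_sophe_in_bin(points, centre, h, n_total):
    """Return callables (f1, f2) for one bin, following the slide formulas.

    points : array of shape (n_l, d), the observations falling in the bin
    centre : array of shape (d,), the bin centre t_l
    Assumption: S is the second-moment matrix about the bin centre.
    """
    pts = np.asarray(points, dtype=float)
    centre = np.asarray(centre, dtype=float)
    n_l, d = pts.shape
    dev = pts - centre                       # X_i - t_l
    mean_dev = dev.mean(axis=0)              # xbar_l - t_l
    S = dev.T @ dev / n_l                    # assumed definition of S_l
    p = n_l / n_total                        # proportion of data in the bin

    def f1(x):
        u = np.asarray(x, dtype=float) - centre
        return p / h ** (d + 2) * (h**2 + 12.0 * mean_dev @ u)

    quad = 72.0 * S + 108.0 * np.diag(np.diag(S)) - 15.0 * h**2 * np.eye(d)
    const = (4 + 5 * d) / 4.0 * h**4 - 15.0 * h**2 * np.trace(S)

    def f2(x):
        u = np.asarray(x, dtype=float) - centre
        return p / h ** (d + 4) * (const + 12.0 * h**2 * u @ mean_dev + u @ quad @ u)

    return f1, f2

# Toy check in d = 1: a single bin [0, 1] holding all observations.
rng = np.random.default_rng(0)
pts = rng.beta(2.0, 5.0, size=(500, 1))
f1, f2 = fophe_sophe_in_bin(pts, centre=[0.5], h=1.0, n_total=len(pts))

# Gauss-Legendre quadrature on [0, 1]; exact for these low-degree polynomials.
nodes, weights = np.polynomial.legendre.leggauss(5)
xs, ws = 0.5 + 0.5 * nodes, 0.5 * weights
mass = sum(w * f2([x]) for x, w in zip(xs, ws))
mean = sum(w * x * f2([x]) for x, w in zip(xs, ws))
second = sum(w * (x - 0.5) ** 2 * f2([x]) for x, w in zip(xs, ws))
print(mass, mean, second)
```

Note that a polynomial fitted this way is not guaranteed to be non-negative everywhere in the bin; the sketch does not address that.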
11 Roederer Data: 1, observations, CD4 & CD8
12 The performance of estimators
We assess the performance of estimators with the MSE. Let $\hat\theta$ be an estimator for a true quantity $\theta$. Then
$$\mathrm{MSE}(\hat\theta) = [\mathrm{bias}(\hat\theta)]^2 + \mathrm{var}(\hat\theta)$$
$$\mathrm{bias}(\hat\theta) = E\hat\theta - \theta$$
$$\mathrm{var}(\hat\theta) = E\left\{[\hat\theta - E\hat\theta]^2\right\} = E[\hat\theta^2] - [E\hat\theta]^2$$
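The decomposition can be checked by simulation. The sketch below is an illustrative setup that is not taken from the slides: the estimator is an ordinary histogram estimate of the standard normal density at $x = 0$, and the values of `n`, `h` and `reps` are arbitrary. Over the simulated replicates the identity holds exactly when the variance is computed with divisor `reps`.

```python
import numpy as np

rng = np.random.default_rng(1)
n, h, reps = 200, 0.5, 2000
theta = 1.0 / np.sqrt(2.0 * np.pi)           # true N(0, 1) density at x = 0

# One histogram estimate at 0 per replicate: count in [-h/2, h/2] over n*h.
samples = rng.standard_normal((reps, n))
counts = np.sum(np.abs(samples) <= h / 2, axis=1)
theta_hat = counts / (n * h)

mse = np.mean((theta_hat - theta) ** 2)
bias = np.mean(theta_hat) - theta
var = np.var(theta_hat)                      # divisor reps (ddof=0)
print(mse, bias**2 + var)                    # the two numbers agree
```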
13 Sophe's Performance
For a fixed point $x \in B_\ell$ we want the bias of $\hat f = \hat f_2$ at $x$. Consider
$$E[\hat f(x)] = E\left(\frac{1}{h^{d+4}}\,\frac{n_\ell}{n}\left\{\frac{4 + 5d}{4}\,h^4 - 15 h^2\,\mathrm{tr}(S_\ell) + 12 h^2 (x - t_\ell)^T(\bar x_\ell - t_\ell) + (x - t_\ell)^T\left[72 S_\ell + 108\,\mathrm{diag}(S_\ell) - 15 h^2 I\right](x - t_\ell)\right\}\right)$$
14 Some Expectation Calculations I
We show that
$$E\left[\frac{n_\ell}{n}(\bar x_\ell - t_\ell)\right] = \int_{B_\ell} (y - t_\ell)\,f(y)\,dy$$
and so
$$E\left[\frac{12 h^2}{h^{d+4}}\,(x - t_\ell)^T\,\frac{n_\ell}{n}(\bar x_\ell - t_\ell)\right] = \frac{12}{h^{d+2}}\,(x - t_\ell)^T \int_{B_\ell} (y - t_\ell)\,f(y)\,dy,$$
then use a Taylor expansion of $f$ about the bin centre $t_\ell$:
$$f(y) = f(t_\ell) + (y - t_\ell)\,Df(t_\ell) + \tfrac{1}{2}(y - t_\ell)^2 D^2 f(t_\ell) + \tfrac{1}{6}(y - t_\ell)^3 D^3 f(t_\ell) + o\left(\|y - t_\ell\|^3\right)$$
15 Some Expectation Calculations II
The first non-zero integral gives
$$E\left[\frac{12}{h^{d+2}}\,(x - t_\ell)^T\,\frac{n_\ell}{n}(\bar x_\ell - t_\ell)\right] \approx (x - t_\ell)^T Df(t_\ell)$$
We prove similar results for all terms contributing to $E[\hat f(x)]$ ... and finally get
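A quick numerical check of this leading-order result in one dimension: for $d = 1$ the claim reduces to $\frac{12}{h^3}\int_{B_\ell}(y - t_\ell) f(y)\,dy \approx f'(t_\ell)$. The sketch below assumes a standard normal density and an arbitrary bin centre $t = 0.7$, neither of which comes from the slides, and uses the hypothetical helper `scaled_first_moment`.

```python
import numpy as np

def scaled_first_moment(f, t, h, order=20):
    """(12 / h^3) * integral over [t - h/2, t + h/2] of (y - t) f(y) dy."""
    nodes, weights = np.polynomial.legendre.leggauss(order)
    y = t + 0.5 * h * nodes                  # map [-1, 1] onto the bin
    w = 0.5 * h * weights
    return 12.0 / h**3 * np.sum(w * (y - t) * f(y))

def phi(y):
    """Standard normal density."""
    return np.exp(-0.5 * y**2) / np.sqrt(2.0 * np.pi)

t = 0.7
exact = -t * phi(t)                          # derivative of phi at t
errors = {h: abs(scaled_first_moment(phi, t, h) - exact) for h in (0.2, 0.1)}
print(errors)                                # error shrinks as h decreases
```

The remaining error is of order $h^2$, which is the term that appears in the bias expression.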
16 The Bias
$$E[\hat f(x)] = f(t_\ell) + (x - t_\ell)^T Df(t_\ell) + \tfrac{1}{2}(x - t_\ell)^2 D^2 f(t_\ell) + \frac{h^2}{12}\,(x - t_\ell)^T\left(\tfrac{1}{2}\sum_i f_{uii} - \tfrac{1}{5} f_{uuu}\right) + o(h^3)$$
Taylor expansion of $f$ about the bin centre $t_\ell$:
$$f(x) = f(t_\ell) + (x - t_\ell)\,Df(t_\ell) + \tfrac{1}{2}(x - t_\ell)^2 D^2 f(t_\ell) + \tfrac{1}{6}(x - t_\ell)^3 D^3 f(t_\ell) + o\left(\|x - t_\ell\|^3\right)$$
so $\mathrm{bias}[\hat f(x)]$ depends on a difference of 3rd-order derivatives.
17 Moving on... and making some big leaps
We have the following steps in the performance calculations:
1. pointwise bias and variance, giving the MSE of $\hat f(x)$
2. integrated squared bias and integrated variance of $\hat f$ over all $x$
3. finally some asymptotics as $n \to \infty$
We want to know how Fophe and Sophe depend on the sample size $n$, the binwidth $h$, and the dimension $d$.
18 How Good are Fophe and Sophe?
Estimator   Bias^2        Variance                          Rate of convergence
hist        $C_H h^2$     $\frac{1}{n h^d}$                 $n^{-2/(d+2)}$
kernel      $C_K h^4$     $\frac{R(K)}{n h^d}$              $n^{-4/(d+4)}$
fophe       $C_F h^4$     $\frac{d+1}{n h^d}$               $n^{-4/(d+4)}$
sophe       $C_S h^6$     $\frac{(d+1)(d+2)}{2 n h^d}$      $n^{-6/(d+6)}$
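The rates follow from balancing the two columns: if the squared bias behaves like $h^{2k}$ and the variance like $1/(n h^d)$, the minimising binwidth is of order $n^{-1/(d+2k)}$ and the error is of order $n^{-2k/(d+2k)}$. A small sketch (the helper name `rate_exponent` is hypothetical) tabulates the exponents for a few dimensions, including the 12 dimensions of the blood cell data:

```python
def rate_exponent(k, d):
    """Exponent r in the rate n**(-r) when bias^2 ~ h**(2k) and variance ~ 1/(n h**d)."""
    return 2 * k / (d + 2 * k)

# k = 1: hist, k = 2: kernel and Fophe, k = 3: Sophe
orders = {"hist": 1, "kernel/fophe": 2, "sophe": 3}
for d in (1, 2, 12):
    row = ", ".join(f"{name}: n^-{rate_exponent(k, d):.3f}" for name, k in orders.items())
    print(f"d = {d:2d}  {row}")
```

The gap between the estimators is largest in relative terms when $d$ is large, where all rates are slow.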
19 Performance for 2, 1 and 1 Observations
[Figure: panels comparing the hist, Fophe, Sophe and kernel estimates]
20 27,994 obs: Kernel est. takes 92 Sophe
21 Advantages of Fophe and Sophe
Computational advantages:
1. a smaller number of bins is required
2. the number of bins only needs to be approximately correct
Sophe is better than Fophe in visual and computational aspects, so use Sophe for data.
22 Finding Modes with the Sophe
1. Fix binwidth $h$, number of bins $\nu_{bin}$, and thresholds $\theta$ and $\kappa$.
2. Find bins with high density.
   2.1 Find $n_\ell$ in each bin, and discard bins that contain fewer than $\theta$ observations. Let $\mathcal{B} = \{B_\ell : n_\ell > \theta\}$.
   2.2 Sort the bins in $\mathcal{B}$ by number of observations, starting with the largest.
3. Determine modes from $\mathcal{B}$ using (3.1) or (3.2) below.
   3.1 For $i, j = 1, \ldots, \kappa$ calculate the pairwise distances $\delta_{(i,j)}$ between the bin centres. For each $i$ consider the set of nearest neighbours $nn_{(i)} = \{(\delta_{(i,j)}, n_{(j)}) : \delta_{(i,j)} \le h\}$. $B_{(i)}$ contains a mode if $n_{(i)}$ is the maximum over $nn_{(i)}$.
   3.2 If the matrix $A_{(j)}$ is negative definite, then $B_{(j)}$ contains a mode.
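The neighbour-comparison variant of the mode search can be sketched as follows. This is an illustration, not the authors' implementation: the function name `find_mode_bins`, the choice of the data minimum as grid origin, and the reading of "nearest neighbours" as retained bins whose centres lie within distance $h$ are all assumptions. Ties between adjacent bins are reported as separate modes.

```python
import numpy as np
from collections import Counter

def find_mode_bins(data, h, theta, kappa):
    """Return centres of bins whose count is maximal among neighbouring bins."""
    data = np.asarray(data, dtype=float)
    origin = data.min(axis=0)                          # assumed grid origin
    idx = np.floor((data - origin) / h).astype(int)    # bin index per point
    counts = Counter(map(tuple, idx))

    # Keep bins with more than theta points, largest first, at most kappa bins.
    dense = [(b, c) for b, c in counts.items() if c > theta]
    dense.sort(key=lambda bc: bc[1], reverse=True)
    dense = dense[:kappa]

    centres = np.array([origin + (np.array(b) + 0.5) * h for b, _ in dense])
    sizes = np.array([c for _, c in dense])

    modes = []
    for i in range(len(dense)):
        dist = np.linalg.norm(centres - centres[i], axis=1)
        neighbours = dist <= h * (1 + 1e-9)            # includes bin i itself
        if sizes[i] >= sizes[neighbours].max():
            modes.append(centres[i])
    return np.array(modes)

# Deterministic 1-D toy data with bin counts 1, 5, 2, 0, 3, 7, 3.
bin_counts = [1, 5, 2, 0, 3, 7, 3]
data = np.concatenate([np.full(c, k + 0.25) for k, c in enumerate(bin_counts)])
modes = find_mode_bins(data.reshape(-1, 1), h=1.0, theta=1, kappa=10)
print(modes.ravel())                                   # two bins are flagged
```

The alternative criterion, checking whether the fitted quadratic matrix in a bin is negative definite, could be tested with `np.linalg.eigvalsh` on that matrix; it is not shown here.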
23 Look at the (Log-Data)
[Figure: scatterplot panels of subsets of the blood cells in the variables CD3, CD4 and CD8]
24 Modes for 12-Dimensional Data
Use 5 bins in each variable; compare the number of modes and the percentage of non-empty bins.
[Table: columns # variables, # modes, # of bins, % non-empty; rows for subsets of the CD variables and for all variables]
25 The End
J Jing, I Koch and K Naito (2009). Polynomial Histograms for Multivariate Density and Mode Estimation. Preprint.
Thank you