Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo


1 Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo. Han Liu 1,2, John Lafferty 2,3, Larry Wasserman 1,2. 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University. July 1st, 2006.

2 Motivation. Research background: Rodeo is a general strategy for nonparametric inference. It has been successfully applied by Lafferty and Wasserman to solve sparse nonparametric regression problems in high dimensions. Our goal: adapt the rodeo framework to nonparametric density estimation, so that we obtain a unified framework for both density estimation and regression that is computationally efficient and theoretically sound.

3 Outline.
1. Background: nonparametric density estimation in high dimensions; sparsity assumptions for density estimation.
2. The main idea: the local rodeo algorithm for the kernel density estimator.
3. The asymptotic running time and minimax risk.
4. The global density rodeo and the reverse density rodeo; using other distributions as irrelevant dimensions.
5. Empirical results on both synthetic and real-world datasets.

4 Problem statement. To estimate the joint density of a continuous d-dimensional random vector X = (X_1, X_2, ..., X_d) ∼ F, d ≥ 3, where F is the unknown distribution with density function f(x). This problem is fundamentally hard, since high dimensionality causes both computational and theoretical difficulties.

5 Previous work.
From a frequentist perspective: kernel density estimation and the local likelihood method; the projection pursuit method; log-spline models and the penalized likelihood method.
From a Bayesian perspective: mixtures of normals with Dirichlet process priors.
Difficulties of current approaches: some methods only work well for low-dimensional problems; some heuristics lack theoretical guarantees; more importantly, they suffer from the curse of dimensionality.

6 The curse of dimensionality. Characterizing the curse: in a Sobolev space of order k, minimax theory shows that the best convergence rate for the mean squared error is R_opt = O(n^{-2k/(2k+d)}), which is prohibitively slow when the dimension d is large. Combating the curse with sparsity assumptions: if the high-dimensional data has a low-dimensional structure or satisfies a sparsity condition, we expect that methods can be developed to combat the curse of dimensionality. This motivates the development of the rodeo framework.
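To make the rate concrete, here is a small illustrative computation (not from the slides; the sample size, smoothness order, and dimensions are arbitrary choices) comparing the full d-dimensional rate with the rate that would hold if only a few dimensions mattered:

```python
# Illustrative only: compare the d-dimensional minimax MSE rate with the
# rate obtained when only r relevant dimensions are present.
def rate(n, k, dim):
    """Minimax MSE rate n^{-2k/(2k+dim)} for a Sobolev space of order k."""
    return n ** (-2.0 * k / (2.0 * k + dim))

n, k = 100_000, 2
print(rate(n, k, dim=20))  # d = 20: n^{-4/24} is about 0.15, very slow
print(rate(n, k, dim=2))   # r = 2:  n^{-4/6}  is about 0.0005, much faster
```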

7 Rodeo for nonparametric regression (I). Rodeo (regularization of derivative expectation operator) is a general strategy for nonparametric inference and has been used for nonparametric regression. For a regression problem Y_i = m(X_i) + ε_i, i = 1, ..., n, where X_i = (X_i1, ..., X_id) ∈ R^d is a d-dimensional covariate, if m lies in a d-dimensional Sobolev space of order 2, the best convergence rate for the risk is R = O(n^{-4/(4+d)}), which shows the curse of dimensionality in a regression setting.

8 Rodeo for nonparametric regression (II). Assume the true function depends only on r covariates (r ≪ d): m(x) = m(x_1, ..., x_r). For any ε > 0, the rodeo can simultaneously perform bandwidth selection and (implicitly) variable selection to achieve the better minimax convergence rate R_rodeo = O(n^{-4/(4+r)+ε}), as if the r relevant variables had been explicitly isolated in advance. Rodeo beats the curse of dimensionality in this sense. We expect to apply the same idea to density estimation problems.

9 Sparse density estimation. For many applications, the true density function can be characterized by some low-dimensional structure. Sparsity assumption for density estimation: let h_jj(x) denote the second partial derivative of h with respect to the j-th variable, and assume there exists some r ≤ d such that f(x) = g(x_1, ..., x_r) h(x), where h_jj(x) = 0 for j = 1, ..., d, and x_R = {x_1, ..., x_r} are the relevant dimensions. This condition requires h(·) to belong to a family of very smooth functions (e.g. the uniform distribution). h(·) can be generalized to any parametric distribution!

10 Generalized sparse density estimation. We can generalize h(·) to other distributions (e.g. Gaussian). General sparsity assumption: let h(·) be any distribution (e.g. Gaussian) that we are not interested in, and assume f(x) = g(x_1, ..., x_r) h(x) with r ≤ d. Thus the density f(·) factors into two parts, the relevant components g(·) and the irrelevant components h(·), with x_R = {x_1, ..., x_r} the relevant dimensions. Under this framework, we can hope to achieve the better minimax rate R_rodeo = O(n^{-4/(4+r)}).

11 Related work. Recent work that addresses this problem: minimum volume sets (Scott & Nowak, JMLR 2006); non-Gaussian component analysis (Blanchard et al., JMLR 2006); the log-ANOVA model (Lin & Joen, Statistica Sinica 2006). Advantages of our approach: rodeo can utilize well-established nonparametric estimators; it gives a unified framework for different kinds of problems; it is easy to implement and amenable to theoretical analysis.

12 Density rodeo: the main idea. The key intuition: if a dimension is irrelevant, then changing the smoothing parameter of that dimension should result in only a small change in the overall estimator. Basically, rodeo is just a regularization strategy: use a kernel density estimator and start with large bandwidths; compute the gradient of the estimator with respect to the bandwidths; sequentially decrease the bandwidths in a greedy way, freezing the decay process by a thresholding strategy to achieve a sparse solution.

13 Density rodeo: the main idea. Fix a point x and let f̂_H(x) denote an estimator of f(x) based on the smoothing parameter matrix H = diag(h_1, ..., h_d). Let M(h) = E(f̂_h(x)) denote the mean of f̂_h(x), so that f(x) = M(0) = E(f̂_0(x)). Assume P = {h(t) : 0 ≤ t ≤ 1} is a smooth path through the set of smoothing parameters with h(0) = 0 and h(1) = 1. Then

f(x) = M(1) − (M(1) − M(0)) = M(1) − ∫_0^1 (dM(h(s))/ds) ds = M(1) − ∫_0^1 ⟨D(s), ḣ(s)⟩ ds,

where D(h) = ∇M(h) = (∂M/∂h_1, ..., ∂M/∂h_d)^T is the gradient of M(h) and ḣ(s) = dh(s)/ds is the derivative of h(s) along the path.

14 Density rodeo: the main idea. A biased, low-variance estimator of M(1) is f̂_1(x). An unbiased estimator of D(h) is Z(h) = (∂f̂_H(x)/∂h_1, ..., ∂f̂_H(x)/∂h_d)^T. This naive estimator has poor risk due to the large variance of Z(h) for small bandwidths. However, the sparsity assumption on f suggests that there should be paths for which D(h) is also sparse. Along such a path, Z(h) can be replaced with an estimator D̂(h) that makes use of the sparsity assumption. The estimate of f(x) then becomes

f̃_H(x) = f̂_1(x) − ∫_0^1 ⟨D̂(s), ḣ(s)⟩ ds.

15 Kernel density estimator. Let f̂_H(x) denote the kernel density estimator of f(x) with bandwidth matrix H = diag(h_1, ..., h_d), where K is a standard symmetric kernel, i.e. ∫K(u)du = 1 and ∫uK(u)du = 0_d, and K_H(·) = det(H)^{-1} K(H^{-1} ·) denotes the kernel with bandwidth matrix H. Assuming K is a product kernel,

f̂_H(x) = (1/n) Σ_{i=1}^n K_H(x − X_i) = (1/(n det(H))) Σ_{i=1}^n K(H^{-1}(x − X_i)) = (1/n) Σ_{i=1}^n Π_{j=1}^d (1/h_j) K((x_j − X_ij)/h_j).
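A minimal sketch of this product-kernel estimator in Python (not from the slides; the Gaussian kernel and the array shapes are assumptions made for illustration):

```python
import numpy as np

def gauss(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kde_product(x, X, h):
    """Product-kernel density estimate at a single point x.

    x : (d,) evaluation point
    X : (n, d) sample
    h : (d,) vector of per-dimension bandwidths
    """
    u = (x[None, :] - X) / h[None, :]          # (n, d) scaled differences
    per_dim = gauss(u) / h[None, :]            # (1/h_j) K((x_j - X_ij)/h_j)
    return np.mean(np.prod(per_dim, axis=1))   # average of the product kernels
```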

16 The rodeo statistics for the kernel density estimator. The density estimation rodeo is based on the statistics

Z_j = ∂f̂_H(x)/∂h_j = (1/n) Σ_{i=1}^n ∂/∂h_j [ Π_{k=1}^d (1/h_k) K((x_k − X_ik)/h_k) ] = (1/n) Σ_{i=1}^n Z_ji.

For the conditional variance term,

s_j² = Var(Z_j | X_1, ..., X_n) = Var( (1/n) Σ_{i=1}^n Z_ji | X_1, ..., X_n ) = (1/n) Var(Z_j1 | X_1, ..., X_n).

Here, we use the sample variance of the Z_ji to estimate Var(Z_j1).
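For a Gaussian product kernel, differentiating the j-th factor gives Z_ji = ((u_ij² − 1)/h_j) Π_k (1/h_k) K(u_ik), with u_ik = (x_k − X_ik)/h_k. The following sketch uses that closed form; the Gaussian-kernel derivative is my own calculus, not copied from the slides, and the helper name is hypothetical:

```python
import numpy as np

def rodeo_stats(x, X, h):
    """Derivative statistics Z_j = d f_H(x) / d h_j and their estimated
    standard errors s_j, assuming a Gaussian product kernel.
    Returns (Z, s), each of shape (d,)."""
    n, d = X.shape
    u = (x[None, :] - X) / h[None, :]                     # (n, d)
    per_dim = np.exp(-0.5 * u ** 2) / (np.sqrt(2 * np.pi) * h[None, :])
    prod_all = np.prod(per_dim, axis=1, keepdims=True)    # (n, 1) product kernel
    Zji = (u ** 2 - 1.0) / h[None, :] * prod_all          # (n, d) per-sample derivatives
    Z = Zji.mean(axis=0)                                  # Z_j = (1/n) sum_i Z_ji
    s = np.sqrt(Zji.var(axis=0, ddof=1) / n)              # sample-variance estimate of s_j
    return Z, s
```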

17 Density rodeo algorithms. Density Rodeo: hard thresholding version.
1. Select a parameter 0 < β < 1 and an initial bandwidth h_0 = c_0 / log log n for some constant c_0. Let c_n be a sequence with c_n = O(1).
2. Initialize the bandwidths and activate all dimensions: (a) h_j = h_0, j = 1, 2, ..., d; (b) A = {1, 2, ..., d}.
3. While A is nonempty, do for each j ∈ A:
(a) Compute the derivative and its variance: Z_j and s_j.
(b) Compute the threshold λ_j = s_j √(2 log(n c_n)).
(c) If |Z_j| > λ_j, set h_j ← β h_j; otherwise remove j from A.
4. Output the bandwidths h* and the estimator f̃(x) = f̂_{H*}(x).
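Putting the pieces together, here is a compact sketch of the hard-thresholding loop, again under the Gaussian product-kernel assumption and reusing the hypothetical rodeo_stats helper above; the constants β, c_0, and c_n are arbitrary illustrative choices:

```python
import numpy as np

def density_rodeo(x, X, beta=0.9, c0=1.0, cn=1.0):
    """Greedy local bandwidth selection at a single evaluation point x."""
    n, d = X.shape
    h = np.full(d, c0 / np.log(np.log(n)))   # initial bandwidths h_0
    active = set(range(d))
    while active:
        Z, s = rodeo_stats(x, X, h)          # derivative and its std. error
        lam = s * np.sqrt(2.0 * np.log(n * cn))
        for j in list(active):
            if abs(Z[j]) > lam[j]:
                h[j] *= beta                 # shrink bandwidth of a relevant dimension
            else:
                active.discard(j)            # freeze bandwidth of an irrelevant dimension
    return h
```

At the output bandwidths, the local density estimate at x would be kde_product(x, X, h) from the earlier sketch.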

18 The purpose of the analysis. The analysis characterizes the asymptotic behavior of the selected bandwidths, the asymptotic running time (efficiency), and the convergence rate of the risk (accuracy). To make the theoretical results more realistic, a key aspect of our analysis is that we allow the dimension d to increase with the sample size n. For this, we need assumptions on the unknown density function that take the increasing dimension into account.

19 Assumptions.
(A1) Kernel assumption: ∫ u u^T K(u) du = v_2 I_d with v_2 < ∞, and ∫ K²(u) du = R(K) < ∞.
(A2) Dimension assumption: d log d = O(log n).
(A3) Initial bandwidth assumption: h_j^(0) = c_0 / log log n for j = 1, ..., d. Combined with A2, this implies lim_{n→∞} n Π_{j=1}^d h_j^(0) = ∞.
(A4) Sparsity assumption: f(x) = g(x_1, ..., x_r) h(x), where h_jj(x) = 0 and r = O(1).
(A5) Hessian assumption: ∫ tr(H_R^T(u) H_R(u)) du < ∞, and f_jj(x) ≠ 0 for j = 1, 2, ..., r, where H_R(u) denotes the Hessian for the relevant dimensions.

20 Derivatives of both relevant and irrelevant dimensions. Key Lemma: Under assumptions A1–A5, suppose that x is interior to the support of f, and that K is a product kernel with bandwidth matrix H^(s) = diag(h_1^(s), ..., h_d^(s)). Then

µ_j^(s) = ∂/∂h_j^(s) E[ f̂_{H^(s)}(x) − f(x) ] = o_P(h_j^(s)) for all j ∈ R^c,

while for j ∈ R we have

µ_j^(s) = ∂/∂h_j^(s) E[ f̂_{H^(s)}(x) − f(x) ] = h_j^(s) v_2 f_jj(x) + o_P(h_j^(s)).

Thus, for any integer s > 0 with h_s = h_0 β^s, each j > r satisfies µ_j^(s) = o_P(h_j^(s)) = o_P(h_j^(0)).

21 Main theorem. Suppose that r = O(1) and (A1)–(A5) hold. In addition, suppose that A_min = min_{j ≤ r} |f_jj(x)| = Ω(1) and A_max = max_{j ≤ r} |f_jj(x)| = Õ(1). Then, for every ε > 0, the number of iterations T_n until the rodeo stops satisfies

P( (1/(4+r)) log_{1/β}(n^{1−ε} a_n) ≤ T_n ≤ (1/(4+r)) log_{1/β}(n^{1+ε} b_n) ) → 1,

where a_n = Ω̃(1) and b_n = Õ(1). Moreover, the algorithm outputs bandwidths h* that satisfy

P( h_j* = h_j^(0) for all j > r ) → 1,

and

P( h_j^(0) (n b_n)^{−1/(4+r)−ε} ≤ h_j* ≤ h_j^(0) (n a_n)^{−1/(4+r)+ε} for all j ≤ r ) → 1,

where h_j^(0) is defined as in A3.

22 Convergence rate of the risk. Theorem 2: Under the same conditions as the main theorem, the risk R_{h*} of the density rodeo estimator satisfies R_{h*} = Õ_P(n^{−4/(4+r)+ε}) for every ε > 0. We write Y_n = Õ_P(a_n) to mean that Y_n = O_P(b_n a_n) where b_n is logarithmic in n. As noted earlier, we write a_n = Ω(b_n) if lim inf_n |a_n/b_n| > 0; similarly, a_n = Ω̃(b_n) if a_n = Ω(b_n c_n) where c_n is logarithmic in n.

23 Possible extensions. The density rodeo algorithm can be extended in many ways: the soft-thresholding version; the global version; the reverse version; the bootstrapping version; the rodeo algorithm for the local linear density estimator; the greedier version; using other distributions as the irrelevant dimensions.

24 Global density rodeo. The idea is to average the test statistics over multiple evaluation points x_1, ..., x_m sampled from the empirical distribution. To avoid cancellation, the statistic is squared. Let Z_j(x_i) denote the derivative at the i-th evaluation point with respect to the bandwidth h_j. We define the test statistic

T_j = (1/m) Σ_{k=1}^m Z_j²(x_k), j = 1, ..., d,

with

s_j² = Var(T_j) = (1/m) [ 2 tr(C_j²) + 4 µ̂_j^T C_j µ̂_j ],

where µ̂_j = (1/m) Σ_{i=1}^m Z_j(x_i). The threshold is λ_j = s_j² + 2 s_j √(log(n c_n)).
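A rough sketch of the global statistic, assuming the local Z_j(x_k) are computed with the hypothetical rodeo_stats helper above at evaluation points drawn from the data; the variance of T_j is approximated here by a simple empirical plug-in, which is a simplification of the formula on the slide:

```python
import numpy as np

def global_statistic(points, X, h, j):
    """Global rodeo statistic T_j averaged over evaluation points.

    points : (m, d) evaluation points sampled from the data
    Returns (T_j, an approximate standard error of T_j).
    """
    Zj = np.array([rodeo_stats(x, X, h)[0][j] for x in points])  # Z_j(x_k)
    Tj = np.mean(Zj ** 2)
    se = np.std(Zj ** 2, ddof=1) / np.sqrt(len(Zj))  # crude plug-in variance estimate
    return Tj, se
```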

25 The bootstrapping version. Instead of deriving the variance expression explicitly, the bootstrap can be used to evaluate the variance of Z_j. Bootstrap method for computing s_j²:
1. For b = 1, ..., B: draw a sample X*_1, ..., X*_n of size n with replacement, and compute the estimate Z*_jb from X*_1, ..., X*_n.
2. Compute the bootstrapped variance s_j² = (1/B) Σ_{b=1}^B (Z*_jb − Z̄*_j)², where Z̄*_j = (1/B) Σ_{b=1}^B Z*_jb.
3. Output the resulting s_j².
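A minimal sketch of this bootstrap variance estimate (illustrative; it reuses the hypothetical rodeo_stats helper from above):

```python
import numpy as np

def bootstrap_variance(x, X, h, j, B=200, rng=None):
    """Bootstrap estimate of Var(Z_j) at the evaluation point x."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    Zs = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)          # resample with replacement
        Zs[b] = rodeo_stats(x, X[idx], h)[0][j]   # Z_j on the bootstrap sample
    return np.var(Zs)                             # s_j^2
```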

26 The reverse and greedier versions. Reverse density rodeo: instead of using a sequence of decreasing bandwidths, we can begin from a very small bandwidth and use a sequence of increasing bandwidths to estimate the optimal value. Greedier density rodeo: instead of decaying all the active bandwidths, only the bandwidth associated with the largest Z_j / λ_j ratio is decayed. Hybrid density rodeo: the different variations can be combined arbitrarily.

27 Using other distributions as irrelevant dimensions. We can use a general parametric distribution h(x) for the irrelevant dimensions. The key trick is a new semiparametric density estimator,

f̂_H(x) = ĥ(x) Σ_{i=1}^n K_H(X_i − x) / ( n ∫ K_H(u − x) ĥ(u) du ),

where ĥ(x) is a parametric density estimate at the point x. The motivation for this estimator comes from the local likelihood method (Loader, 1996). In the one-dimensional case, starting from a large bandwidth, if the true density is h(x), the algorithm will tend to freeze the bandwidth decay process immediately.
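A one-dimensional sketch of this semiparametric estimator, with the normalizing integral approximated numerically on a grid; the Gaussian choice of ĥ, the grid, and the function name are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

def semiparametric_kde(x, X, h, h_hat=norm(0.0, 1.0), grid=None):
    """Semiparametric estimate f_H(x) = h_hat(x) * sum_i K_h(X_i - x) /
    (n * integral of K_h(u - x) h_hat(u) du), in one dimension."""
    n = len(X)
    if grid is None:
        grid = np.linspace(X.min() - 5 * h, X.max() + 5 * h, 2000)
    kern = lambda u: norm.pdf(u / h) / h                   # K_h(.)
    numer = h_hat.pdf(x) * np.sum(kern(X - x))
    denom = n * np.trapz(kern(grid - x) * h_hat.pdf(grid), grid)
    return numer / denom
```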

28 Experimental design. The density rodeo algorithms were applied to both synthetic and real data, including one-dimensional, two-dimensional, high-dimensional, and very high-dimensional examples. To measure the distance between the estimated density and the true density, the Hellinger distance is used:

D(f̂, f) = ∫ ( √f̂(x) − √f(x) )² dx = 2 − 2 ∫ √( f̂(x) / f(x) ) f(x) dx.

With m evaluation points X_1, ..., X_m drawn from f, the Hellinger distance can be computed numerically by Monte Carlo integration:

D(f̂, f) ≈ 2 − (2/m) Σ_{i=1}^m √( f̂_H(X_i) / f(X_i) ).
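A small sketch of that Monte Carlo approximation (illustrative; f_hat and f_true are assumed to be callables returning density values):

```python
import numpy as np

def hellinger_mc(f_hat, f_true, sample):
    """Monte Carlo estimate of the squared Hellinger distance
    2 - 2 E_f[ sqrt(f_hat(X) / f(X)) ], with sample drawn from f."""
    ratio = np.array([f_hat(x) / f_true(x) for x in sample])
    return 2.0 - 2.0 * np.mean(np.sqrt(ratio))
```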

29 One-dimensional example: strongly skewed density. We simulated 200 samples from the strongly skewed distribution; the boxplot of the Hellinger distance is produced from 100 simulations. This density is chosen to resemble the lognormal distribution and is the mixture

X ∼ Σ_{i=0}^{7} (1/8) N( 3((2/3)^i − 1), (2/3)^{2i} ).

[Figure: the true density.]
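A sketch of drawing from this eight-component mixture, assuming the reconstruction above (means 3((2/3)^i − 1) and variances (2/3)^{2i} with equal weights):

```python
import numpy as np

def sample_strongly_skewed(n, rng=None):
    """Draw n samples from the 8-component normal mixture above."""
    rng = np.random.default_rng(rng)
    comp = rng.integers(0, 8, size=n)              # equal weights 1/8
    means = 3.0 * ((2.0 / 3.0) ** comp - 1.0)
    sds = (2.0 / 3.0) ** comp                      # sd = (2/3)^i, so variance (2/3)^{2i}
    return rng.normal(means, sds)
```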

30 Strongly skewed density: result. [Figure: the true density overlaid with the estimates from unbiased cross-validation, the local kernel density rodeo, and the global kernel density rodeo, together with the estimated bandwidths for each method.]

31 Two-dimensional example: combined Beta and Uniform. A combined Beta distribution with a uniform distribution as the irrelevant dimension; we simulate a 2-dimensional dataset with n = 500 points. The two dimensions are generated independently as

X_1 ∼ (2/3) Beta(1, 2) + (1/3) Beta(10, 10), X_2 ∼ Uniform(0, 1).

[Figure: the true density of the relevant dimension.]

32 Combined Beta and Uniform: result. [Figure: perspective plots over the relevant and irrelevant dimensions; the first plot is the rodeo result, the second is the fit from the built-in function kde2d (MASS package in R).]

33 Combined Beta and Uniform: marginal distributions. Numerically integrated marginal distributions based on the perspective plots of the two estimators (not normalized). [Figure: the true relevant and irrelevant marginal densities, the marginals recovered by the rodeo, and the marginals recovered by kde2d.]

34 Two-dimensional example: geyser data. A version of the eruptions data from the Old Faithful geyser in Yellowstone National Park, Wyoming (Azzalini and Bowman, 1990), consisting of continuous measurements from August 1 to August 15. There are n = 299 samples and two variables: duration, the eruption time in minutes, and waiting, the waiting time to the next eruption.

35 Geyser data: result. [Figure: perspective plots of the estimated density; the first plot is the rodeo result, the second is the fit from the built-in function kde2d (MASS package in R).]

36 Geyser data: contour plot. [Figure: the first contour plot is fitted by the built-in function kde2d (MASS package in R), the second by the rodeo algorithm.]

37 High-dimensional example: 30-dimensional synthetic data. We generate a 30-dimensional synthetic dataset with r = 5 relevant dimensions (n = 100, 30 trials). The relevant dimensions are generated as X_i ∼ N(0.5, (0.02i)²) for i = 1, ..., 5, while the irrelevant dimensions are generated as X_i ∼ Uniform(0, 1) for i = 6, ..., 30. The test point is x = (1/2, ..., 1/2).
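A sketch of generating this synthetic dataset (illustrative; function names and the random seed handling are my own):

```python
import numpy as np

def make_synthetic(n=100, d=30, r=5, rng=None):
    """Relevant dims i=1..r: N(0.5, (0.02 i)^2); irrelevant dims: Uniform(0, 1)."""
    rng = np.random.default_rng(rng)
    X = rng.uniform(0.0, 1.0, size=(n, d))
    for i in range(1, r + 1):                    # dimensions 1..5 (1-indexed)
        X[:, i - 1] = rng.normal(0.5, 0.02 * i, size=n)
    return X

X = make_synthetic()
x_test = np.full(30, 0.5)                        # test point (1/2, ..., 1/2)
```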

38 30-dimensional example: result. The rodeo path for the 30-dimensional synthetic dataset and the boxplot of the selected bandwidths over the 30 trials. [Figure: bandwidths versus rodeo step, and boxplots of the selected bandwidths for dimensions X1 through X30.]

39 Very high-dimensional example: image processing. The algorithm was run on 2200 grayscale images of 1s and 2s, each with 256 = 16 × 16 pixels and some unknown background noise; thus this is a 256-dimensional density estimation problem. A test point and the output bandwidth plot are shown here.

40 Image processing example: evolution plot. The output bandwidth plots sampled at rodeo steps 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100, which visualize the evolution of the bandwidths. This can be viewed as a dynamic feature selection process: the earlier a dimension's bandwidth decays, the more informative that dimension is.

41 An example using Gaussian irrelevant dimensions. We apply the semiparametric rodeo algorithm to 15-dimensional and 20-dimensional synthetic datasets with r = 5 relevant dimensions (n = 1000). When using Gaussian distributions as the irrelevant dimensions, the relevant dimensions are generated as X_i ∼ Uniform(0, 1) for i = 1, ..., 5, while the irrelevant dimensions are generated as X_i ∼ N(0.5, (0.05i)²) for i = 6, ..., d. The test point is x = (1/2, ..., 1/2).

42 Using Gaussian as irrelevant: result. [Figure: the rodeo path (bandwidth versus rodeo step) for the 15-dimensional synthetic data (left) and for the 20-dimensional data (right).]

43 Using Gaussian as irrelevant: one-dimensional case (I). 1000 one-dimensional data points with X_i ∼ Uniform(0, 1). [Figure: the true density and the rodeo fit with its estimated bandwidth, compared with the fit from unbiased cross-validation (N = 1000).]

44 Using Gaussian as irrelevant: one-dimensional case (II). 1000 one-dimensional data points with X_i ∼ N(0, 1). [Figure: the Gaussian density fitted by the rodeo, the estimated bandwidths, and the correction factor at the test point.]

45 Summary. This work adapts the general rodeo framework to density estimation problems. The sparsity assumption is crucial to guarantee the success of the density rodeo algorithm. The density rodeo algorithm is effective on high-dimensional problems, both theoretically and empirically. The rodeo framework can utilize currently available density estimators, and the implementation is simple. Future work: develop rodeo algorithms for other kinds of problems, e.g. classification.
