Compressive Inference
Compressive Inference
Weihong Guo and Dan Yang
Case Western Reserve University and SAMSI
SAMSI transition workshop
Project of the Compressive Inference subgroup of the Imaging WG
Active members: Garvesh Raskutti, Jiayang Sun and Grace Yi Wang
May 22, 2013
Outline
1. Compressive Sensing
2. Compressive Inference
3. Method
4. Simulation
5. Conclusion
Compressive Sensing
Why: data take too long and cost too much to collect, too much space to store, and too much time to analyze or retrieve information from; in medical applications, extra scanning also increases the risk of developing cancer.
Figure: a volume of human brain scans, 175 slices. (Courtesy: oasis-brains.org.)
Traditional vs. Compressive Sensing (CS)
Figure: traditional and compressive acquisition compared in the Fourier domain and the image domain.
Formulation
Continuous setup: f : X → R is the intensity (or difference) function of an image, where X ⊂ R^d is the ROI. Example: [0, 1] or [0, 1]².
Discrete setup: f = (f(x_1), f(x_2), ..., f(x_p)). Example: differences of intensities at the grid (1/p, 2/p, ..., 1) for grayscale images.
Compressive sensing: observe y = Af + ε, where A ∈ R^{n×p} is a sampling matrix with n ≪ p satisfying the RIP, and ε ~ N(0, σ²I_n). Goal: recover f or make inference about f.
Comparison with a statistical latent variable model: y = f + ε, z = Ay + γ.
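As a concrete instance of the observation model y = Af + ε, here is a minimal Python sketch. The grid size, number of measurements, noise level, and the particular signal are illustrative assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

p, n = 200, 40                      # grid size and number of measurements, n << p (illustrative)
x = np.arange(1, p + 1) / p         # grid (1/p, 2/p, ..., 1)
f = np.exp(-(x - 0.5) ** 2 / 0.01)  # a smooth illustrative signal on [0, 1]

# Gaussian sampling matrix with A_ij ~ N(0, 1/n); i.i.d. Gaussian designs
# satisfy the RIP with high probability when n is large enough.
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))

sigma = 1.0
y = A @ f + sigma * rng.normal(size=n)  # compressive observations y = A f + eps
```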
Example
Low-dimensional information retrieval. (Courtesy: sadies-brain-tumor.org.)
Compressive Inference
Recall: given data y = Af + ε, make inference about f from y. Examples:
1. Test H_0: f(x) = 0 for all x ∈ X
   - Applicable to comparison of images
   - Aside: find the support of f, i.e., {x : f(x) ≠ 0}
   - Multiple hypothesis testing
2. Test H_0: f = 0 vs. H_1: f = s, where s is known or unknown
   - Davenport et al. (2006, 2010), Duarte et al. (2006)
   - Single hypothesis testing
   - Likelihood or Hotelling T²
3. Test H_0i: f_i = 0 vs. H_1i: f_i ≠ 0
   - Bühlmann (2012)
   - Multiple hypothesis testing
   - Sparsity (no function perspective)
   - Combination of LASSO and ridge
   - Discrete, conservative
Key: compressive inference exploits the compressibility of a smooth image (continuous f), so that complex information can be obtained from a small amount of information y.
Method
- Smoothness assumption on f
- Estimation of f by kernel ridge regression
- Tube method for compressed sensing
Smoothness Assumption
Need to impose assumptions on the class of image-difference functions f, e.g., polynomial, Lipschitz, smoothing spline. These are special cases of reproducing kernel Hilbert spaces (RKHS).
Important properties:
- Mercer's theorem: for K : X × X → R,
  K(x, x') = Σ_{k=1}^∞ λ_k φ_k(x) φ_k(x')
- Statistical complexity: decay of the eigenvalues λ_k
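To see the eigenvalue decay concretely, the sketch below eigendecomposes the Gram matrix of the Lipschitz kernel from the next slide on a uniform grid. Dividing the Gram matrix by p makes its eigenvalues approximate the Mercer eigenvalues, which for this kernel are λ_k = (2/((2k−1)π))², i.e., decaying like k⁻². The grid size is an arbitrary choice.

```python
import numpy as np

p = 200
x = np.arange(1, p + 1) / p
K = np.minimum.outer(x, x)               # Lipschitz kernel K(x, x') = min{x, x'}

# Eigenvalues of K/p approximate the Mercer eigenvalues of the kernel;
# eigvalsh returns them in ascending order, so reverse for decay.
emp = np.linalg.eigvalsh(K / p)[::-1]
exact = np.array([(2.0 / ((2 * k - 1) * np.pi)) ** 2 for k in range(1, 6)])

print(emp[:5])   # empirical eigenvalues, roughly k^{-2} decay
print(exact)     # known Mercer eigenvalues of min{x, x'} on [0, 1]
```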
Example
Lipschitz kernel: K(x, x') = min{x, x'}
Function class: {f : f ∈ L², f′ ∈ L²}
Corresponds to the Sobolev class with smoothness α = 1.
Other examples.
From Assumption to Penalty
Expansion of f in the RKHS:
K(x, x') = Σ_{k=1}^∞ λ_k φ_k(x) φ_k(x'),  f(x) = Σ_{k=1}^∞ a_k φ_k(x)
Hilbert ball of radius ρ:
B_H(ρ) = {f : ‖f‖²_H = Σ_k a_k²/λ_k ≤ ρ²}
Kernel ridge regression:
min ‖y − Af‖²_2 + λ‖f‖²_H  ⇔  min ‖y − AΦa‖²_2 + λ aᵀΛ⁻¹a,
where (Φ)_{ik} = φ_k(x_i), Λ = diag(λ_1, λ_2, ...), a = (a_1, a_2, ...)ᵀ.
Estimator
Minimizer:
f̂_λ(x) = κ(x)ᵀ Aᵀ (A K Aᵀ + λI)⁻¹ y =: ⟨ℓ(x), y⟩,
where (K)_{ij} = K(x_i, x_j) and κ(x) = (K(x, x_1), ..., K(x, x_p))ᵀ.
The estimator is linear in y.
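A minimal sketch of this estimator, using the Lipschitz kernel from earlier; the function name, the fixed λ passed by the caller, and the kernel default are my choices for illustration.

```python
import numpy as np

def krr_compressed(y, A, x_grid, lam,
                   kernel=lambda u, v: np.minimum.outer(u, v)):
    """Compressed kernel ridge regression:
    f_hat(x) = kappa(x)^T A^T (A K A^T + lam I)^{-1} y."""
    K = kernel(x_grid, x_grid)                     # p x p Gram matrix on the grid
    G = A @ K @ A.T + lam * np.eye(A.shape[0])     # n x n system matrix
    alpha = np.linalg.solve(G, y)                  # (A K A^T + lam I)^{-1} y
    def f_hat(x_new):
        kappa = kernel(np.atleast_1d(x_new), x_grid)  # kernel evals vs. grid points
        return kappa @ A.T @ alpha
    return f_hat
```

With the y, A, x from the earlier sketch, f_hat = krr_compressed(y, A, x, lam=1.0) returns a function that can be evaluated anywhere in [0, 1], not just on the grid; linearity in y is visible in the final matrix-vector product.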
SCR (Simultaneous Confidence Region) via the Tube Method
Confidence bands: f̂(x) ± c σ̂ ‖ℓ(x)‖.
Simultaneous coverage: α = P(|f̂(x) − f(x)| > c σ̂ ‖ℓ(x)‖ for some x ∈ X).
Tube method (Sun and Loader, 1994) for d = 1:
α ≈ (κ_0/π)(1 + c²/ν)^{−ν/2} + P(|t_ν| > c),
where κ_0 can be derived from ℓ(x).
Decision: reject H_0 if there exists x such that |f̂(x)| / (σ̂ ‖ℓ(x)‖) > c.
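A sketch of how the critical value c might be computed numerically for d = 1: κ_0 is the length of the unit-normalized curve x ↦ ℓ(x)/‖ℓ(x)‖, approximated by summing norms of successive differences on a fine grid, and c solves the tube equation by root finding. The interface and the bracketing interval are assumptions of this sketch.

```python
import numpy as np
from scipy import optimize, stats

def tube_critical_value(L, alpha, nu):
    """Solve (kappa0/pi)(1 + c^2/nu)^(-nu/2) + P(|t_nu| > c) = alpha for c.

    L: (m, n) array whose row j is l(x_j) on a fine grid over X (d = 1).
    """
    T = L / np.linalg.norm(L, axis=1, keepdims=True)           # curve on the unit sphere
    kappa0 = np.linalg.norm(np.diff(T, axis=0), axis=1).sum()  # ~ integral of ||T'(x)|| dx
    def excess(c):
        tube = (kappa0 / np.pi) * (1.0 + c**2 / nu) ** (-nu / 2.0)
        return tube + 2.0 * stats.t.sf(c, nu) - alpha          # P(|t_nu| > c) = 2 * sf(c)
    return optimize.brentq(excess, 0.1, 100.0)                 # assumed bracketing interval
```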
Simulation Setup
1. Fix p; vary n.
2. Consider functions f = 0, f_1, f_2 evaluated at x_i = i/p.
3. Generate A ∈ R^{n×p} with A_ij ~ iid N(0, 1/n).
4. Generate ε ~ N(0, I_n).
5. Set y = Af + ε.
Two tests are compared (a code sketch of the size simulation follows below):
- Tube method: H_0: f(x) = 0 for all x ∈ X
- Bonferroni: H_0i: f(x_i) = 0 vs. H_1i: f(x_i) ≠ 0
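A sketch of one Monte Carlo loop for the size of the tube test under H_0: f = 0, inlining the estimator algebra from the KRR sketch and calling tube_critical_value from the tube sketch above. The penalty λ, the residual-degrees-of-freedom choice ν = n − tr(H), and the plug-in σ̂ are illustrative assumptions, since the talk does not spell these out.

```python
import numpy as np

def tube_test_size(p=100, ratio=0.3, lam=1.0, alpha=0.05, reps=500, seed=0):
    """Empirical size of the tube test of H0: f(x) = 0 for all x (d = 1).
    Assumes tube_critical_value() from the earlier sketch is in scope."""
    rng = np.random.default_rng(seed)
    n = int(ratio * p)
    x = np.arange(1, p + 1) / p
    K = np.minimum.outer(x, x)                    # Lipschitz kernel Gram matrix
    rejections = 0
    for _ in range(reps):
        A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
        y = rng.normal(size=n)                    # y = A*0 + eps under H0, sigma = 1
        Ginv = np.linalg.inv(A @ K @ A.T + lam * np.eye(n))
        L = K @ A.T @ Ginv                        # row j is l(x_j); f_hat = L y
        f_hat = L @ y
        H = A @ L                                 # hat matrix on the measurements
        nu = n - np.trace(H)                      # residual df (one common choice)
        sigma_hat = np.sqrt(np.sum((y - H @ y) ** 2) / nu)
        c = tube_critical_value(L, alpha, nu)
        if np.max(np.abs(f_hat) / (sigma_hat * np.linalg.norm(L, axis=1))) > c:
            rejections += 1
    return rejections / reps
```

Replacing the zero signal with f_1 or f_2 in the data-generating line gives the corresponding power curves.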
Table: empirical test size by sampling ratio n/p (T: tube method; B: Bonferroni).

Nominal α = 1%
  n/p:  10%    20%    30%    40%    50%    100%
  T:    0.90%  1.20%  1.10%  0.70%  1.10%  0.70%
  B:    0.20%  0.20%  0.00%  0.20%  0.10%  0.00%

Nominal α = 5%
  n/p:  10%    20%    30%    40%    50%    100%
  T:    4.80%  4.50%  5.30%  4.40%  3.10%  3.70%
  B:    0.30%  0.50%  0.30%  0.60%  0.80%  0.00%

Nominal α = 10%
  n/p:  10%    20%    30%    40%    50%    100%
  T:    9.10%  9.40%  8.90%  9.60%  7.70%  7.70%
  B:    0.40%  1.00%  1.00%  1.10%  1.10%  0.00%
Figure: the test signals f_1(x) = 2δ|x − 0.5| and f_2(x) = δ exp(−10⁴x) for x ∈ [0, 1].
Figure: test power for f_1(x) = 2δ|x − 0.5|, x ∈ [0, 1].
Figure: test power for f_2(x) = δ exp(−10⁴x), x ∈ [0, 1].
Future Work
- Multidimensional images (d ≥ 2); real images and video
- Automating the choice of λ
- Supervised selection of the kernel K
- More asymptotics
- Conditional tube formula (A random)
- Software
- Real applications: medical imaging, hidden messages, security monitoring, etc.
Thank you!