Sliced Inverse Regression
1 Sliced Inverse Regression. Ge Zhao, Department of Statistics, The Pennsylvania State University.
2 Outline: Background of Sliced Inverse Regression (SIR); Dimension Reduction; Definition of SIR; Inverse Regression Curve; Algorithm of SIR; Discussion of SIR; Consistency and Sparsity of SIR.
3 Background Regression analysis is a popular way of studying the relationship between a response variable y ∈ R and its explanatory variables x ∈ R^p. In some cases finding a correct parametric model is not easy, which motivates a nonparametric approach. As the dimension increases, however, more and more data are required in the sample. We want an ideal model that captures most or all of the interesting features with the fewest dimensions: y = f(β_1'x, β_2'x, ..., β_K'x, ε), K ≪ p.
4 Dimension Reduction y = f(β_1'x, β_2'x, ..., β_K'x, ε), K ≪ p. The function f is not identifiable; it is an arbitrary function on R^{K+1}. The β's can be changed, since B'x = (β_1'x, ..., β_K'x) is just a projection onto a K-dimensional space. When K is much smaller than p, we may claim to have reduced the dimension, provided most of the information about y is retained and β is estimated efficiently. Estimating the projection directions β gives us the new, lower-dimensional space. We call each β_k an effective dimension reduction (e.d.r.) direction.
5 Dimension Reduction (Continued) Ideal statement: y = f(β_1'x, β_2'x, ..., β_K'x, ε), K ≪ p. Alternative statement: the conditional distribution of y given x depends on x only through the K-dimensional variable (β_1'x, ..., β_K'x); equivalently, y ⊥ x | B'x with B = (β_1, ..., β_K).
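A minimal simulation sketch of this model in Python/NumPy; the particular link function f, the two directions, and the noise level below are illustrative assumptions, not anything specified on the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 500, 10, 2

# Two assumed e.d.r. directions beta_1, beta_2 (columns of B)
B = np.zeros((p, K))
B[0, 0], B[1, 0] = 1.0, 1.0        # beta_1 = (1, 1, 0, ..., 0)'
B[2, 1] = 1.0                      # beta_2 = (0, 0, 1, 0, ..., 0)'

x = rng.standard_normal((n, p))    # predictors
eps = 0.1 * rng.standard_normal(n) # noise
proj = x @ B                       # the K-dimensional variable B'x
y = proj[:, 0] / (0.5 + (proj[:, 1] + 1.5) ** 2) + eps   # an assumed link f
# y depends on the 10-dimensional x only through the 2-dimensional B'x.
```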
6 Intuition of SIR Difficulties arise: when the dimension p is larger than n, regressing y against x does not make sense, and it is hard to view the data (coordinates) with traditional methods because of the high dimension. The idea: flip y and x!
7 Definition of SIR Consider the inverse model x = g_p(y), which treats each coordinate of x as a one-dimensional regression on y. Then E(x | y) is a curve in p-dimensional space. If possible, this curve will hover around a K-dimensional affine subspace. We show later the relationship between this K-dimensional affine subspace and the effective dimension reduction space (spanned by the e.d.r. directions).
8 Definition of SIR (Continued) An affine invariant criterion, the squared trace correlation: R²(b) = max_{β ∈ B} (b'Σ_xx β)² / {(b'Σ_xx b)(β'Σ_xx β)}. If x is standardized as z = Σ_xx^{-1/2}{x − E(x)}, the inverse regression curve falls into a subspace which coincides with the e.d.r. space.
9 Algorithm of SIR We have a data set (y_i, x_i), i = 1, 2, ..., n.
1. Standardize x: x̃_i = Σ̂_xx^{-1/2}(x_i − x̄);
2. Divide the range of y into H slices I_1, ..., I_H, with slice h containing a proportion p̂_h of the observations;
3. Compute the sample mean of the standardized observations within each slice, denoted m̂_h;
4. Conduct a weighted principal component analysis of the m̂_h, i.e., eigen-decompose the weighted covariance matrix V = Σ_{h=1}^H p̂_h m̂_h m̂_h';
5. Output β̂_k = Σ̂_xx^{-1/2} η̂_k, k = 1, ..., K, where the η̂_k are the eigenvectors of V associated with its K largest eigenvalues.
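A minimal NumPy sketch of these five steps, assuming equal-count slices and forming Σ̂_xx^{-1/2} from an eigen-decomposition; an illustration of the algorithm above, not a reference implementation.

```python
import numpy as np

def sir(x, y, n_slices=10, n_directions=2):
    """Sliced Inverse Regression following the five steps on the slide."""
    n, p = x.shape

    # Step 1: standardize, x_tilde_i = Sigma_xx^{-1/2} (x_i - x_bar)
    x_bar = x.mean(axis=0)
    sigma = np.cov(x, rowvar=False)
    evals, evecs = np.linalg.eigh(sigma)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    z = (x - x_bar) @ inv_sqrt

    # Step 2: slice the range of y into H slices with roughly equal counts
    slices = np.array_split(np.argsort(y), n_slices)

    # Steps 3-4: slice means and weighted covariance V = sum_h p_h m_h m_h'
    V = np.zeros((p, p))
    for idx in slices:
        p_h = len(idx) / n
        m_h = z[idx].mean(axis=0)
        V += p_h * np.outer(m_h, m_h)

    # Step 5: top-K eigenvectors of V, transformed back to the original scale
    w, eta = np.linalg.eigh(V)
    top = np.argsort(w)[::-1][:n_directions]
    beta = inv_sqrt @ eta[:, top]      # beta_k = Sigma_xx^{-1/2} eta_k
    return beta, np.sort(w)[::-1]

# With the simulated data from the earlier sketch:
# beta_hat, eigvals = sir(x, y, n_slices=10, n_directions=2)
```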
10 Remarks The sample mean within each slice is used just for simplicity; other methods can estimate the inverse regression curve, such as kernel-based nonparametric regression, nearest neighbors, or smoothing splines. Here we are only interested in the orientation of the curve. The weighted version of PCA takes care of unequal slice sizes. In general we need lim_{n→∞} p/n = 0 to guarantee consistency; if this is violated, we need more conditions. The first K components locate the most important subspace; we will discuss how to decide K later. The last step transforms back to the β̂_k.
11 Further discussion There is no need to standardize the x_i, but we still need to work with the sliced means as follows: Σ̂_1 = Σ_{h=1}^H p̂_h (x̄_h − x̄)(x̄_h − x̄)', where x̄_h is the sliced sample mean. We then perform the eigen-decomposition of Σ̂_1 with respect to Σ̂_xx (a generalized eigenvalue problem) instead of the ordinary PCA of the standardized slice means. One can use other methods, such as a robust version, to standardize x; the purpose is to downweight or cut out influential design points.
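A small sketch of this unstandardized variant using SciPy's generalized symmetric eigensolver; the equal-count slicing is again an assumption for illustration.

```python
import numpy as np
from scipy.linalg import eigh

def sir_unstandardized(x, y, n_slices=10, n_directions=2):
    """SIR via the generalized eigenproblem Sigma_1 b = lambda Sigma_xx b."""
    n, p = x.shape
    x_bar = x.mean(axis=0)
    sigma_xx = np.cov(x, rowvar=False)

    # Sigma_1 = sum_h p_h (xbar_h - xbar)(xbar_h - xbar)'
    sigma_1 = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        p_h = len(idx) / n
        d = x[idx].mean(axis=0) - x_bar
        sigma_1 += p_h * np.outer(d, d)

    # Generalized eigen-decomposition of Sigma_1 with respect to Sigma_xx
    evals, evecs = eigh(sigma_1, sigma_xx)
    top = np.argsort(evals)[::-1][:n_directions]
    return evecs[:, top], evals[top]
```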
12 Further discussion (Continued) Slices can be of equal length, but we prefer to let the width vary from slice to slice so that the slices have similar sample sizes. We hope the range of each slice converges to 0 so that only local points contribute to the estimation; even with a large number of slices we still have good consistency. Usually the slices are taken as I_h = (F_y^{-1}{(h − 1)/H}, F_y^{-1}{h/H}), h = 1, ..., H, where F_y is the distribution function of y. The choice of H may affect the asymptotic variance of β̂, but it is not as important as the bandwidth in a nonparametric model.
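In practice the slice boundaries can be formed from empirical quantiles of y; a short sketch, with the empirical quantiles standing in for F_y (an assumption):

```python
import numpy as np

# Slice boundaries F_y^{-1}(h/H) estimated by empirical quantiles of y
H = 10
edges = np.quantile(y, np.linspace(0, 1, H + 1))
slice_id = np.clip(np.searchsorted(edges, y, side="left") - 1, 0, H - 1)
# slice_id[i] is the slice index of observation i; slices have roughly equal counts.
```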
13 Further discussion (Continued) The expectation of the squared trace correlation between the estimated β̂_k'x and the true β_k'x is E{R²(B̂)} = 1 − {(p − K)/n} {(1/K) Σ_{k=1}^K λ_k}^{-1} + o(1/n). To be really successful in picking up all K dimensions for reduction, the inverse regression curve cannot be too straight; in other words, the first K eigenvalues of V must be significantly different from zero compared with the sampling error.
14 Further discussion (Continued) Theorem: If x is normally distributed, then n(p − K) λ̄_{p−K} asymptotically follows a χ² distribution with (p − K)(H − K − 1) degrees of freedom, where λ̄_{p−K} denotes the average of the smallest p − K eigenvalues of V. We can use this result to assess the number of components.
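A sketch of the resulting sequential test for the number of components, using SciPy's chi-square tail probability; the stopping rule at level alpha is an assumption about how the theorem would be applied.

```python
import numpy as np
from scipy.stats import chi2

def estimate_K(eigvals, n, n_slices, alpha=0.05):
    """Test K = 0, 1, ... until the smallest p - K eigenvalues of V look like noise."""
    eigvals = np.sort(np.asarray(eigvals))[::-1]
    p = len(eigvals)
    for k in range(min(p, n_slices - 1)):
        lam_bar = eigvals[k:].mean()            # average of the smallest p - k eigenvalues
        stat = n * (p - k) * lam_bar            # n (p - K) lambda_bar
        df = (p - k) * (n_slices - k - 1)       # (p - K)(H - K - 1)
        if chi2.sf(stat, df) > alpha:           # cannot reject: k components suffice
            return k
    return min(p, n_slices - 1)

# Example with the eigenvalues returned by sir():
# K_hat = estimate_K(eigvals, n=x.shape[0], n_slices=10)
```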
15 Theoretical results: Conditions 1. The conditional distribution of y given x depends on x only through the K-dimensional variable (β_1'x, ..., β_K'x), i.e., y ⊥ x | B'x; 2. For any b ∈ R^p, the conditional expectation E(b'x | β_1'x, ..., β_K'x) is a linear combination of β_1'x, ..., β_K'x.
16 Theoretical results: Inverse regression curve Theorem: The centered inverse regression curve E(x | y) − E(x) is contained in the linear subspace spanned by Σ_xx β_k, k = 1, ..., K, where Σ_xx denotes the covariance matrix of x. Here we have E{E(x | y)} = E(x) by the law of total expectation. The following corollary is straightforward. Corollary: Assume x has been standardized to z; then the standardized inverse regression curve E(z | y) is contained in the linear space generated by the standardized e.d.r. directions η_1, ..., η_K.
17 Theoretical results: Remarks By the law of total variance, E{cov(z | y)} = cov(z) − cov{E(z | y)} = I − cov{E(z | y)}. Hence the largest eigenvalue of E{cov(z | y)} is 1 minus the smallest eigenvalue of cov{E(z | y)}, and they share the same eigenvector. The consistency is at the root-n rate: let p_h = Pr{y ∈ I_h} and m_h = E(z | y ∈ I_h); then m̂_h → m_h at rate n^{-1/2}, and V̂ = Σ_{h=1}^H p̂_h m̂_h m̂_h' converges to Σ_{h=1}^H p_h m_h m_h' at the same rate.
18 Conditions for more detailed discussion 1. Linearity condition: for any b ∈ R^p, E(b'x | β_1'x, ..., β_K'x) is a linear combination of β_1'x, ..., β_K'x. 2. Coverage condition: the dimension of the space spanned by the central curve is the same as the dimension of the central space. 3. Boundedness condition: there exist positive constants C_1 and C_2 such that C_1 ≤ λ_min(Σ_xx) ≤ λ_max(Σ_xx) ≤ C_2, where λ_min and λ_max are the minimum and maximum eigenvalues of Σ_xx, respectively.
19 Conditions (Continued) 4. The central curve E(x | y) has finite fourth moment and is κ-sliced stable with respect to y. Sliced stability is an intrinsic property of E(x | y): if we expect the slice estimate (1/H) Σ_h m̂_h m̂_h' of var{m(y)} to be consistent, we must require the average loss of variance within each slice to decrease as H increases.
20 Consistency Theorem: Assume the conditions all hold. For sufficiently large H and n, ‖Λ̂_p − Λ_p‖_2 = O_p(1/H^{κ−1} + H²p/n + H²√(p/n)), where Λ_p = var{E(x | y)} and Λ̂_p is its sliced estimate. A direct corollary is that if p/n → 0, we may choose H = log(n/p) so that the right-hand side converges to 0; hence Λ̂_p is a consistent estimate of Λ_p = var{E(x | y)}.
21 Consistency (Continued) Theorem: Assume the conditions all hold, x is sub-Gaussian, and lim_{n→∞} p/n = 0. Then ‖Σ̂_xx^{-1} Λ̂_p − Σ_xx^{-1} Λ_p‖_2 → 0 with probability converging to 1 as n → ∞, where Σ̂_xx = (1/n) Σ_{i=1}^n x_i x_i'.
22 Consistency (Continued) Theorem: Assume the conditions all hold and x ~ N(0, I_p). For the single-index model y = f(β'x, ε) we have the following. When lim_{n→∞} p/n ∈ (0, ∞), ‖Λ̂_p − Λ_p‖_2 is (as a function of p/n) dominated by (p/n)/(1 + p/n) as H, n → ∞. Let β̂ be the top (PCA) eigenvector of Λ̂_p. If p/n does not converge to 0, then there exists a positive constant c(p/n) > 0 such that lim inf_{n→∞} E∠(β̂, β) > c(p/n) with probability converging to 1; that is, SIR is no longer consistent in this regime.
23 Conditions for ultrahigh dimension SIR We are now discussing the case where p ≫ n. 5. s = |S| ≪ p, where S = {i : β_j(i) ≠ 0 for some j, 1 ≤ j ≤ K} and |S| is the number of elements in S. 6. Σ_xx ∈ U(ε_0, α, C) and max_{1≤i≤p} r_i is bounded, where r_i is the number of non-zero elements in the i-th row of Σ_xx, and U(ε_0, α, C) = {Σ_xx : max_j Σ_{i: |i−j|>l} |σ_{i,j}| ≤ C l^{−α} for all l > 0, and 0 < ε_0 ≤ λ_min(Σ_xx) ≤ λ_max(Σ_xx) ≤ 1/ε_0}.
24 Conditions for ultrahigh dimension SIR (Continued) We are still discussing the case where p ≫ n. 7. There exist positive constants C and ω such that var[E{x(k) | y}] > C/s^ω whenever E{x(k) | y} is not constant. 8. There exists a constant K such that every coordinate x(k) is sub-Gaussian and upper exponentially bounded by K. Now we have the following theorems.
25 Ultrahigh dimension consistency Theorem: Assume the conditions hold and let t = a/s^ω, where a is a sufficiently small positive constant such that t < var{m(y, k)}/2 for any k ∈ T. Then: 1. no coordinate outside T is selected (T̂ ⊆ T) with probability at least 1 − C_1 exp{−C_2 n/(H² s^ω) + C_3 log(H) + log(p − s)}; 2. every coordinate in T is selected (T ⊆ T̂) with probability at least 1 − C_4 exp{−C_5 n/(H² s^ω) + C_6 log(H) + log(s)}; for some positive constants C_1, ..., C_6.
26 Ultrahigh dimension consistency (Continued) Theorem: Under the same assumptions and the same choice of t as in the previous theorem, let T̂ = Î(t) and H = log{n/(s^ω log p)}. Then ‖e(Λ̂_p^{T̂,T̂}) − Λ_p‖_2 → 0 as n → ∞ with probability converging to 1, where e(·) embeds the T̂ × T̂ submatrix back into a p × p matrix. As a direct corollary, ‖Σ̂_xx^{-1} e(Λ̂_p^{T̂,T̂}) − Σ_xx^{-1} Λ_p‖_2 → 0 as n → ∞ with probability converging to 1.
27 Ultrahigh dimension algorithm
1. Calculate var_{H,c}{x(k)} for k = 1, ..., p according to var_{H,c}{x(k)} = {1/(H − 1)} Σ_{h=1}^H {x̄_h(k) − x̄(k)}²;
2. Let T̂ = {k : var_{H,c}{x(k)} > t} for an appropriate threshold t;
3. Let Λ̂_p^{T̂,T̂} be the SIR estimate of the conditional covariance matrix for the data (y, x_T̂), computed as Λ̂ = {1/(H − 1)} Σ_{h=1}^H (x̄_h − x̄)(x̄_h − x̄)';
28 Ultrahigh dimension algorithm (Continued)
4. Calculate η̂_i = e(η̂_i^{T̂}), where η̂_i^{T̂}, 1 ≤ i ≤ K, are the top eigenvectors of Λ̂^{T̂,T̂};
5. Calculate β̂_i = Σ̂_xx^{-1} η̂_i, where Σ̂_xx is a consistent estimate of Σ_xx;
6. The central space is estimated by the span of the β̂_i's.
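A minimal sketch of this screening-then-SIR procedure, reusing the sir() function from the earlier sketch. The threshold t, the equal-count slicing, and the use of the selected sub-block of Σ̂_xx in place of a full-dimensional estimate in step 5 are simplifying assumptions.

```python
import numpy as np

def dt_sir(x, y, t, n_slices=10, n_directions=1):
    """Screen coordinates by their sliced-mean variance, then run SIR on the survivors."""
    n, p = x.shape
    x_bar = x.mean(axis=0)
    slices = np.array_split(np.argsort(y), n_slices)
    slice_means = np.vstack([x[idx].mean(axis=0) for idx in slices])    # H x p

    # Step 1: var_{H,c}{x(k)} = {1/(H-1)} sum_h {xbar_h(k) - xbar(k)}^2
    var_hc = ((slice_means - x_bar) ** 2).sum(axis=0) / (n_slices - 1)

    # Step 2: keep coordinates whose sliced-mean variance exceeds the threshold t
    T_hat = np.where(var_hc > t)[0]

    # Steps 3-5: SIR on the selected coordinates; sir() already maps back
    # through the sub-block of Sigma_xx (a simplification of step 5 above)
    beta_T, _ = sir(x[:, T_hat], y, n_slices=n_slices, n_directions=n_directions)

    # Steps 4 and 6: embed e(.) back into R^p; the central space is the column span
    beta = np.zeros((p, n_directions))
    beta[T_hat, :] = beta_T
    return beta, T_hat
```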
29 Summary Introduce Sliced Inverse Regression (SIR); provide the algorithm of SIR; further discuss the consistency of SIR; extend the original SIR to ultrahigh-dimension SIR.
30 Thank you!