Multivariate Parametric Methods. Steven J Zeil, Old Dominion University. Fall 2010.


Outline

1. Multivariate Data
2. Multivariate Normal Distribution
3. Multivariate Classification: Discriminants, Tuning Complexity, Discrete Features
4. Multivariate Regression

Multivariate Data: Basic Multivariate Statistics

We have $d$ inputs (a.k.a. features, attributes) and $N$ instances (a.k.a. observations, examples), collected in the data matrix

$$X = \begin{bmatrix} x_1^1 & x_2^1 & \cdots & x_d^1 \\ x_1^2 & x_2^2 & \cdots & x_d^2 \\ \vdots & & & \vdots \\ x_1^N & x_2^N & \cdots & x_d^N \end{bmatrix}$$

(Later we will consider what happens if some gaps are allowed in the observations.)

Mean: $E[\vec{x}] = \vec{\mu} = [\mu_1, \mu_2, \ldots, \mu_d]^T$

Covariance: $\sigma_{ij} \equiv \mathrm{Cov}(x_i, x_j) = E[(x_i - \mu_i)(x_j - \mu_j)] = E[x_i x_j] - \mu_i \mu_j$

Correlation: $\mathrm{Corr}(x_i, x_j) \equiv \rho_{ij} = \dfrac{\sigma_{ij}}{\sigma_i \sigma_j}$

Covariance matrix:

$$\Sigma \equiv \mathrm{Cov}(\vec{x}) = E[(\vec{x} - \vec{\mu})(\vec{x} - \vec{\mu})^T] = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & & & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{bmatrix}$$
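As a concrete illustration (added here, not part of the original slides), a minimal NumPy sketch computing the sample versions of these statistics from an $N \times d$ data matrix; the toy data and variable names are made up, and the $1/N$ covariance convention matches the estimation formulas on the next slide:

```python
import numpy as np

# Toy data matrix X: N = 5 instances (rows), d = 3 features (columns).
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 1.5, 1.0],
              [3.0, 3.5, 1.5],
              [4.0, 2.5, 2.0],
              [5.0, 4.0, 2.5]])

mu = X.mean(axis=0)               # sample mean vector m
Xc = X - mu                       # centered data
S = (Xc.T @ Xc) / X.shape[0]      # sample covariance matrix (1/N convention)
sd = np.sqrt(np.diag(S))          # per-feature standard deviations
R = S / np.outer(sd, sd)          # correlation matrix: r_ij = s_ij / (s_i * s_j)

print(mu, S, R, sep="\n")
```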

Multivariate Parameter Estimation

Sample mean $\vec{m}$: $\quad m_i = \dfrac{1}{N}\sum_{t=1}^N x_i^t, \quad i = 1, \ldots, d$

Covariance matrix $S$: $\quad s_{ij} = \dfrac{1}{N}\sum_{t=1}^N (x_i^t - m_i)(x_j^t - m_j)$

Correlation matrix $R$: $\quad r_{ij} = \dfrac{s_{ij}}{s_i s_j}$

Imputation

What if certain instances have missing attributes?

Throw out the entire instance? That is a problem if the sample is small.

Imputation: fill in the missing value.
Mean imputation: use the expected value.
Imputation by regression: predict the missing value based on the other attributes.

Multivariate Normal Distribution

$$p(\vec{x}) = N_d(\vec{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2} (\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu})\right]$$

Slicing

Any slice (projection) along a single direction $\vec{w}$ is normal: $\vec{w}^T \vec{x} \sim N(\vec{w}^T \vec{\mu}, \vec{w}^T \Sigma \vec{w})$.

Any projection onto a linearly transformed set of axes of dimension $d' \le d$ is multivariate normal.
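A small sketch (my own illustration, not from the slides) of mean imputation and of evaluating the $N_d(\vec{\mu}, \Sigma)$ density; `scipy.stats.multivariate_normal` is used as a reference implementation of the density formula above:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Mean imputation: replace NaNs in each column with that column's observed mean.
X = np.array([[1.0, 2.0],
              [np.nan, 1.5],
              [3.0, np.nan],
              [4.0, 2.5]])
col_means = np.nanmean(X, axis=0)                 # means computed over non-missing entries
X_imputed = np.where(np.isnan(X), col_means, X)   # fill gaps with the expected value

# Evaluate the multivariate normal density at a point.
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([1.0, -1.0])
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))
```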

Effects of Covariance: Normalized Distance

$z = \dfrac{x - \mu}{\sigma}$ can be seen as a distance from $\mu$ to $x$ in normalized, $\sigma$-sized units. Generalizing to $d$ dimensions gives the Mahalanobis distance

$$(\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu})$$

If $x_i$ has larger variance than $x_j$, then $x_i$ gets lower weight in this distance. If $x_i$ and $x_j$ are highly correlated, they get less weight than two less-correlated variables would.

A small $|\Sigma|$ indicates that the samples are close to $\vec{\mu}$ and/or the variables are highly correlated. If $|\Sigma|$ is zero, then some of the variables are constant or there is a linear dependency among the variables. Either way, reduce the dimensionality by removing the unneeded variables.

Special Cases of Mahalanobis Distance

$$d(\vec{x}) = (\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu})$$

If the $x_i$ are independent, the off-diagonal elements of $\Sigma$ are zero and

$$d(\vec{x}) = \sum_{i=1}^d \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2$$

If the variances are also equal ($\sigma_i = \sigma$ for all $i$), this reduces to the squared Euclidean distance scaled by $1/\sigma^2$.
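A short sketch (added for illustration, with made-up numbers) of the Mahalanobis distance and its special cases:

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    return float(diff @ np.linalg.solve(Sigma, diff))  # solve() avoids an explicit inverse

mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])           # independent features, unequal variances
x = np.array([2.0, 2.0])

print(mahalanobis_sq(x, mu, Sigma))      # (2/2)^2 + (2/1)^2 = 5.0
print(mahalanobis_sq(x, mu, np.eye(2)))  # identity covariance: squared Euclidean = 8.0
```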

Multivariate Classification

If $p(\vec{x} \mid C_i) \sim N(\vec{\mu}_i, \Sigma_i)$, then

$$p(\vec{x} \mid C_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left[-\frac{1}{2} (\vec{x} - \vec{\mu}_i)^T \Sigma_i^{-1} (\vec{x} - \vec{\mu}_i)\right]$$

The discriminants are

$$g_i(\vec{x}) = \log p(\vec{x} \mid C_i) + \log P(C_i) = -\frac{d}{2}\log 2\pi - \frac{1}{2}\log |\Sigma_i| - \frac{1}{2}(\vec{x} - \vec{\mu}_i)^T \Sigma_i^{-1} (\vec{x} - \vec{\mu}_i) + \log P(C_i)$$

Estimate this from the sample as

$$g_i(\vec{x}) = -\frac{d}{2}\log 2\pi - \frac{1}{2}\log |S_i| - \frac{1}{2}(\vec{x} - \vec{m}_i)^T S_i^{-1} (\vec{x} - \vec{m}_i) + \log \hat{P}(C_i)$$

Quadratic Discriminant

Dropping the constant $-\frac{d}{2}\log 2\pi$ (identical for all classes) and expanding the quadratic form:

$$g_i(\vec{x}) = -\frac{1}{2}\log |S_i| - \frac{1}{2}(\vec{x} - \vec{m}_i)^T S_i^{-1} (\vec{x} - \vec{m}_i) + \log \hat{P}(C_i)$$
$$= -\frac{1}{2}\log |S_i| - \frac{1}{2}\left(\vec{x}^T S_i^{-1} \vec{x} - 2\,\vec{x}^T S_i^{-1} \vec{m}_i + \vec{m}_i^T S_i^{-1} \vec{m}_i\right) + \log \hat{P}(C_i)$$
$$= \vec{x}^T W_i \vec{x} + \vec{w}_i^T \vec{x} + w_{i0}$$

where $W_i = -\frac{1}{2} S_i^{-1}$, $\vec{w}_i = S_i^{-1} \vec{m}_i$, and $w_{i0} = -\frac{1}{2} \vec{m}_i^T S_i^{-1} \vec{m}_i - \frac{1}{2}\log |S_i| + \log \hat{P}(C_i)$. This is a quadratic in $\vec{x}$.

Simplification: Shared Covariance

Share a common sample covariance $S$:

$$S = \sum_i \hat{P}(C_i)\, S_i$$

The discriminant simplifies to

$$g_i(\vec{x}) = -\frac{1}{2}(\vec{x} - \vec{m}_i)^T S^{-1} (\vec{x} - \vec{m}_i) + \log \hat{P}(C_i)$$

Although this function is quadratic in $\vec{x}$, it yields a linear discriminant, because the $\vec{x}^T S^{-1} \vec{x}$ quadratic term is identical across all $i$.

(The original slides show figures of the class likelihoods and the posterior for $C_i$, omitted here.)
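A compact sketch (my illustration; the class data and helper names are made up) of the quadratic discriminant computed from per-class sample statistics:

```python
import numpy as np

def fit_class(Xc):
    """Per-class sample mean and covariance (1/N convention)."""
    m = Xc.mean(axis=0)
    D = Xc - m
    return m, (D.T @ D) / Xc.shape[0]

def quadratic_discriminant(x, m, S, log_prior):
    """g_i(x) = -0.5 log|S_i| - 0.5 (x - m_i)^T S_i^{-1} (x - m_i) + log P(C_i)."""
    diff = x - m
    return (-0.5 * np.linalg.slogdet(S)[1]
            - 0.5 * diff @ np.linalg.solve(S, diff)
            + log_prior)

rng = np.random.default_rng(0)
X1 = rng.normal([0, 0], 1.0, size=(50, 2))   # class 1 sample
X2 = rng.normal([3, 3], 1.5, size=(40, 2))   # class 2 sample
(m1, S1), (m2, S2) = fit_class(X1), fit_class(X2)
p1, p2 = np.log(50 / 90), np.log(40 / 90)    # priors from class frequencies

x = np.array([2.0, 2.0])
label = 1 if quadratic_discriminant(x, m1, S1, p1) > quadratic_discriminant(x, m2, S2, p2) else 2
print("predicted class:", label)
```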

Linear Discriminant

With the shared covariance $S$, expanding the quadratic form and dropping the $\vec{x}^T S^{-1} \vec{x}$ term common to all classes gives a linear discriminant:

$$g_i(\vec{x}) = \vec{m}_i^T S^{-1} \vec{x} - \frac{1}{2} \vec{m}_i^T S^{-1} \vec{m}_i + \log \hat{P}(C_i) = \vec{w}_i^T \vec{x} + w_{i0}$$

Further Simplification: Independence

If we share a common sample covariance $S$ and the variables are independent, then the off-diagonal elements of $S$ are zero. The discriminant simplifies to

$$g_i(\vec{x}) = -\frac{1}{2} \sum_{j=1}^d \left(\frac{x_j - m_{ij}}{s_j}\right)^2 + \log \hat{P}(C_i)$$

This is the Naive Bayes classifier: each variable is an independent Gaussian, and distance is measured in standard-deviation units.

(The original slides show a figure of the contours for diagonal $S$, omitted here.)

Further Simplification: Equal Variances

If the variances are also equal, the discriminant simplifies to

$$g_i(\vec{x}) = -\frac{1}{2} \sum_{j=1}^d \left(\frac{x_j - m_{ij}}{s}\right)^2 + \log \hat{P}(C_i)$$

This is the nearest-mean classifier.
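As an illustration (not from the slides), a nearest-mean classifier is just an argmax over these simplified discriminants; the helper below assumes equal priors, so the $\log \hat{P}(C_i)$ term drops out:

```python
import numpy as np

def nearest_mean_predict(x, class_means):
    """Assign x to the class whose mean is closest in Euclidean distance
    (equivalent to maximizing g_i when variances and priors are equal)."""
    dists = [np.sum((x - m) ** 2) for m in class_means]
    return int(np.argmin(dists))

means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
print(nearest_mean_predict(np.array([1.0, 0.5]), means))  # -> 0
```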

Model Selection

Assumption              Covariance matrix          # Parameters
Equal variances         S_i = S = s^2 I            1
Independent             S_i = S, s_ij = 0 (i != j) d
Shared covariance       S_i = S                    d(d+1)/2
Different covariances   S_i                        K d(d+1)/2

Binary Features

$x_j \in \{0, 1\}$, with $p_{ij} = p(x_j = 1 \mid C_i)$.

If the $x_j$ are independent (Naive Bayes),

$$p(\vec{x} \mid C_i) = \prod_{j=1}^d p_{ij}^{x_j} (1 - p_{ij})^{1 - x_j}$$

The discriminant is linear:

$$g_i(\vec{x}) = \sum_j \left[x_j \log \hat{p}_{ij} + (1 - x_j) \log (1 - \hat{p}_{ij})\right] + \log \hat{P}(C_i)$$

Discrete Features

$x_j \in \{v_1, v_2, \ldots, v_{n_j}\}$, with $p_{ijk} = p(z_{jk} = 1 \mid C_i) = p(x_j = v_k \mid C_i)$, where $z_{jk}$ indicates that $x_j = v_k$.

If the $x_j$ are independent,

$$p(\vec{x} \mid C_i) = \prod_{j=1}^d \prod_{k=1}^{n_j} p_{ijk}^{z_{jk}}$$

$$g_i(\vec{x}) = \sum_j \sum_k z_{jk} \log \hat{p}_{ijk} + \log \hat{P}(C_i)$$

Multivariate Regression

Multivariate linear model:

$$r^t = g(\vec{x}^t \mid w_0, w_1, \ldots, w_d) + \varepsilon = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t + \varepsilon$$

Error:

$$E(\vec{w} \mid X) = \frac{1}{2} \sum_t \left[r^t - (w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t)\right]^2$$

Collect the data into the design matrix $D$ and target vector $\vec{r}$:

$$D = \begin{bmatrix} 1 & x_1^1 & x_2^1 & \cdots & x_d^1 \\ 1 & x_1^2 & x_2^2 & \cdots & x_d^2 \\ \vdots & & & & \vdots \\ 1 & x_1^N & x_2^N & \cdots & x_d^N \end{bmatrix}, \qquad \vec{r} = \begin{bmatrix} r^1 \\ r^2 \\ \vdots \\ r^N \end{bmatrix}$$

Minimizing the error yields the normal equations:

$$(D^T D)\, \vec{w} = D^T \vec{r} \quad\Rightarrow\quad \vec{w} = (D^T D)^{-1} D^T \vec{r}$$
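A small sketch (my own, with made-up data) of the Bernoulli Naive Bayes discriminant above; the Laplace smoothing is an added assumption, used only to keep the logarithms finite:

```python
import numpy as np

def fit_bernoulli_nb(X, y, n_classes):
    """Estimate p_ij = p(x_j = 1 | C_i) with Laplace smoothing, plus log priors."""
    P = np.zeros((n_classes, X.shape[1]))
    log_prior = np.zeros(n_classes)
    for i in range(n_classes):
        Xi = X[y == i]
        P[i] = (Xi.sum(axis=0) + 1) / (len(Xi) + 2)   # smoothed estimate of p_ij
        log_prior[i] = np.log(len(Xi) / len(X))
    return P, log_prior

def discriminants(x, P, log_prior):
    """g_i(x) = sum_j [x_j log p_ij + (1 - x_j) log(1 - p_ij)] + log P(C_i)."""
    return (x * np.log(P) + (1 - x) * np.log(1 - P)).sum(axis=1) + log_prior

X = np.array([[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 0, 0]])
y = np.array([0, 0, 0, 1, 1])
P, log_prior = fit_bernoulli_nb(X, y, n_classes=2)
print(np.argmax(discriminants(np.array([1, 0, 1]), P, log_prior)))  # -> 0
```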

Multivariate Regression (continued)

$$(D^T D)\, \vec{w} = D^T \vec{r} \quad\Rightarrow\quad \vec{w} = (D^T D)^{-1} D^T \vec{r}$$

The solution has the same form as for univariate polynomial regression, but uses the $d$ distinct variables instead of different powers of a single variable.
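To close, an illustration (added here, not in the slides) of solving the normal equations with NumPy; `np.linalg.lstsq` is used rather than forming $(D^T D)^{-1}$ explicitly, which is numerically safer:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 100, 3
X = rng.normal(size=(N, d))
true_w = np.array([0.5, 1.0, -2.0, 3.0])                        # [w_0, w_1, w_2, w_3]
r = true_w[0] + X @ true_w[1:] + rng.normal(scale=0.1, size=N)  # targets with noise

D = np.column_stack([np.ones(N), X])       # design matrix with a leading column of 1s
w, *_ = np.linalg.lstsq(D, r, rcond=None)  # solves (D^T D) w = D^T r in the least-squares sense
print(w)                                   # close to true_w
```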