General Linear Model: Introduction, Classes of Linear Models, and Estimation

Stat 740: General Linear Model Introduction, Classes of Linear Models and Estimation

An aim of scientific enquiry: to describe or to discover relationships among events (variables), either under controlled (laboratory) conditions or in the real world. The underlying purpose may be to

- develop understanding of the underlying phenomena,
- predict future events (outcomes),
- test some specified hypotheses,
- control the outcome of future events.

A MODEL (a mathematical equation) involving deterministic (controlled) variables, stochastic variables, as well as unknown parameters, is useful in performing this task.

Note: Assumptions about the probability distribution of the stochastic variables may be made. These are considered part of the model. The unknown parameters in the model can be estimated (learned) from the available training data.

Proposition: Suppose that $y$ denotes a quantity in the real world (a response) about which we want to learn. We assume that there exists a finite, possibly large, collection of variables $\{x_1, x_2, \ldots, x_r\}$, called explanatory variables, and a function $g$, such that $y = g(x_1, x_2, \ldots, x_r)$. Thus, $y$ and $\{x_1, x_2, \ldots, x_r\}$ are assumed to be functionally related. We are not implying that these variables are known and observable, or that the function $g$ is known. When this relationship is not exact, the model may still be a good and useful approximation to the exact relationship. This is called a statistical model.

Signal plus noise model: A popular class of linear models. The explanatory variables are known and observable, but $g$ may be unknown, and one is willing to assume that
$$g(x_1, x_2, \ldots, x_k) = \mu(x_1, x_2, \ldots, x_k) + \varepsilon,$$
such that the signal $\mu(x_1, x_2, \ldots, x_k)$ is known up to a set of unknown parameters and the additive error $\varepsilon$ acts as a random (uncontrollable or unexplainable) noise.

Sometimes not all of the $x_i$'s that determine the response $y$ may be known. However, one can assume that
$$g(x_1, x_2, \ldots, x_k) = \mu(x_1, x_2, \ldots, x_p) + \eta(x_{p+1}, x_{p+2}, \ldots, x_k),$$
where, conditional on the values of the key explanatory variables $x_1, x_2, \ldots, x_p$, the quantities $x_{p+1}, x_{p+2}, \ldots, x_k$ change so that $\eta(x_{p+1}, x_{p+2}, \ldots, x_k)$ behaves like a random error $\varepsilon$. This error is called the equation error or the specification error.

In certain other situations, the underlying response, $y^*$, itself may not be observable exactly. Instead, we measure $y = y^* + \varepsilon$, where $\varepsilon$ denotes the measurement error.

For simplicity, one can write the model for the measurement (the observable quantity) $Y$ as
$$Y = \mu(x_1, x_2, \ldots, x_r) + \varepsilon. \qquad (P)$$
Thus, the random noise $\varepsilon$ could be the specification error, the measurement error, or a mixture of both. If the errors are not additive, these models are sometimes called generalized linear models (GLIM), e.g., logistic regression models.
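To make the signal plus noise decomposition concrete, here is a minimal simulation sketch; the particular signal $\mu$ and the noise level are hypothetical choices for illustration only:

```python
import numpy as np

# A sketch of the signal plus noise decomposition Y = mu(x) + eps.
# The signal mu and the noise standard deviation are hypothetical.
rng = np.random.default_rng(0)

def mu(x, beta0=1.0, beta1=2.0):
    # Signal known up to the unknown parameters (beta0, beta1).
    return beta0 + beta1 * x

n = 100
x = rng.uniform(0.0, 1.0, size=n)   # observable explanatory variable
eps = rng.normal(0.0, 0.5, size=n)  # additive random noise
y = mu(x) + eps                     # observed (measured) response
```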

General Linear Model (GLM): The Population Model (P), where:

- $Y$ and $\varepsilon$ are random variables, and $x_1, x_2, \ldots, x_r$ are deterministic variables;
- the mean response function $E\{Y \mid \mathbf{x}\} = \mu(\mathbf{x})$ is linear in the unknown parameters $\{\beta_1, \beta_2, \ldots, \beta_p\}$, i.e., for all $(x_1, x_2, \ldots, x_r) \in \mathcal{X}$,
$$\mu(\mathbf{x}) = \sum_{j=1}^{p} \beta_j f_j(x_1, x_2, \ldots, x_r).$$

Here, the features $f_j$ are assumed to be completely known functions of $x_1, x_2, \ldots, x_r$. In engineering, one talks about a feature extraction process, which searches for appropriate features to describe the response. In statistical science, this is called model selection.

The variable $Y$ is called the response variable, or an endogenous variable. The variables $x_1, x_2, \ldots, x_r$ [or the features $f_j$] are called the independent variables, the explanatory variables, the predictor variables, or the exogenous variables.

Broad Classification of General Linear Models:

Linear Regression Models: $(Y, X_1, X_2, \ldots, X_r)$ is a set of jointly distributed random variables such that
$$E[Y \mid X_1, X_2, \ldots, X_r] = \mu(\mathbf{x}) = \sum_{j=1}^{p} \beta_j f_j(x_1, x_2, \ldots, x_r),$$
and expression (P) above holds, e.g., simple linear regression or multiple linear regression. For analysis purposes, we treat the regression models as a particular case of the GLM. Here, we are segmenting (stratifying) the whole population based on the values of the variables $(X_1, X_2, \ldots, X_r)$ and studying the conditional expectation of the response variable as a function of these variables. [Why do we choose the conditional mean, not some quantile?]
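Returning to the features $f_j$ above: a classic example is polynomial regression, where the mean response is nonlinear in $x$ but linear in the parameters. A minimal sketch, with an illustrative feature matrix:

```python
import numpy as np

# A sketch of known features f_j of a raw variable x: polynomial
# features f_1(x) = 1, f_2(x) = x, f_3(x) = x**2. The model is
# nonlinear in x but linear in the unknown parameters beta_j.
x = np.linspace(0.0, 1.0, 8)
F = np.column_stack([np.ones_like(x), x, x**2])  # n-by-p feature matrix
print(F.shape)  # (8, 3): n observations by p features
```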

Experimental Design Models: Each explanatory variable in the GLM is quantitative or represents categorical levels of certain factors or traits under study. The categorical levels are represented as {0, 1} or {-1, 1}, indicating presence or absence of traits (a sketch of this indicator coding appears at the end of this subsection), and the GLM is called an experimental design or ANOVA model. Basically, we are interested in understanding the major causes of variability in the response variable. Classically, the analysis of such experiments could be simplified quite a bit due to the underlying structure in the set of explanatory variables. These days, research effort is devoted to finding optimal designs that allow optimal estimation of a specified set of parametric functions, based on some optimality criterion. These models are called fixed effects models. Examples include one-way, two-way, and cross-classified multifactor experiments, and nested designs. When the experiment includes some design variables as well as some continuous explanatory variables, these models are called Analysis of Covariance (ANCOVA) models.

Variance Component Models (Random Effects Models): In many experiments, the levels of a factor are assumed to be randomly drawn from a population of levels. The effects of this factor are (unobserved) random variables, following a distribution with mean zero and unknown variance. These are called random effects models.

Mixed Effects Models: Mixed effects models have some factors with fixed effects and some with random effects.

Remark: Models for functionally related variables in which all the variables are subject to measurement error are called error-in-variables models. These should not be treated as a particular case of the GLM.

Learning from Data: We wish to learn (make inferences: estimate, test hypotheses) about the unknown parameters $\beta_1, \beta_2, \ldots, \beta_p$ based on a training set (sample). Start with the sample model for the observations in the training set. It is assumed that the population model is valid, which enables us to relate the response $y$ to $x_1, x_2, \ldots, x_r$ for unobserved or out-of-sample units in the population.
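As a concrete illustration of the {0, 1} coding mentioned above for experimental design models, here is a minimal sketch; the factor levels are hypothetical:

```python
import numpy as np

# A sketch of {0, 1} indicator coding for a one-way fixed-effects
# ANOVA model. The observed factor levels below are hypothetical.
levels = np.array(["A", "B", "B", "C", "A", "C"])
uniq = np.unique(levels)

# Row i gets a 1 in the column of the level observed on unit i, else 0.
X = (levels[:, None] == uniq[None, :]).astype(float)
print(uniq)  # ['A' 'B' 'C']
print(X)     # 6-by-3 design matrix of indicators
```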

Training Sample Model: Given $n$ observations $[(Y_i, \mathbf{x}_i),\ \mathbf{x}_i = (x_{i1}, \ldots, x_{ir})]$, $i = 1, \ldots, n$, the sample model can be expressed as
$$Y_i = \mu(x_{i1}, x_{i2}, \ldots, x_{ir}) + \varepsilon_i, \quad i = 1, 2, \ldots, n, \qquad (1)$$
where $\varepsilon_i$, $i = 1, \ldots, n$, denote the noise (random errors), each with mean zero and equal variance $\sigma^2$. Clearly,
$$E[Y_i \mid \mathbf{x}_i] = \mu(x_{i1}, x_{i2}, \ldots, x_{ir}), \quad i = 1, \ldots, n. \qquad (2)$$

From now on, we denote the features $f_j$, $j = 1, \ldots, p$, themselves as coded predictor variables $x_1, x_2, \ldots, x_p$. In the simplest setting, the random errors are also assumed to be uncorrelated. Thus the sample GLM can be expressed as
$$Y_i = \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \quad E[\varepsilon_i] = 0, \quad \operatorname{Var}[\varepsilon_i] = \sigma^2, \quad \operatorname{Cov}(\varepsilon_i, \varepsilon_k) = 0, \ i \neq k, \qquad (3)$$
$$E[Y_i] = \sum_{j=1}^{p} \beta_j x_{ij}. \qquad (4)$$

Examples: the simple linear regression model, the multiple linear regression model, the polynomial regression model, the one-way fixed-effects ANOVA model, the one-way random-effects ANOVA model.

In order to use this model for prediction of a future response given a set of predictor values, the unknown parameter vector needs to be estimated (learned) from the training sample. For any reasonable estimate $\hat{\beta}$ of the vector $\beta$, the estimated errors (residuals) in the response, $e_i = Y_i - \sum_{j=1}^{p} \hat{\beta}_j x_{ij}$, should be as small as possible. The choice of $\hat{\beta}$ is based on solving an optimization problem: minimize a loss function $l(\beta)$, an implicit function of the estimated errors $\{e_1, e_2, \ldots, e_n\}$, that tends to keep the errors small. For example,
$$l_1(\beta) = \sum_{i=1}^{n} |e_i|, \quad l_2(\beta) = \sum_{i=1}^{n} e_i^2, \quad l_w(\beta) = \sum_{i=1}^{n} w_i e_i^2,$$
where

- $l_1$: the absolute error ($L_1$ loss) criterion,
- $l_2$: the ordinary least squared error ($L_2$ loss) criterion, and
- $l_w$: the weighted least squares error criterion.

Ordinary Least Squares (OLS) Criterion: Find the estimated coefficient vector $\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p} l_2(\beta)$ that minimizes the sum of squared errors, i.e.,
$$\min_{\beta \in \mathbb{R}^p} l_2(\beta) = \min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \Big( Y_i - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2.$$

Historically, the LS criterion has been popular, since one can find its solution analytically as well as geometrically; therefore, its statistical properties can be studied easily. The absolute error loss criterion required solving a linear programming problem, so it was difficult to derive its statistical properties analytically. Nowadays, regularized versions (minimization subject to some upper bound on the size of the vector $\beta$) of both criteria are popular in data mining applications. For example:

- Ridge regression: minimize the squared error loss subject to an upper bound on the $L_2$-norm of the coefficient vector.
- LASSO: minimize the squared error loss subject to an upper bound on the $L_1$-norm of the coefficient vector.

Note that without concise notation, it is tedious to express these quantities. Vector/matrix notation for the response, the predictor variables, the error terms, and the unknown coefficients:
$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad \mathbf{X}_{n \times p} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_p], \ \text{where } \mathbf{x}_j = \begin{pmatrix} x_{1j} \\ x_{2j} \\ \vdots \\ x_{nj} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix}, \ \text{and} \ \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$
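The following minimal sketch (hypothetical data; names such as beta_ols are illustrative) computes the three criteria for a candidate $\beta$ and the closed-form OLS and ridge solutions. Ridge is written in its penalized (Lagrangian) form, which corresponds to the norm-constrained problem stated above for a suitable bound:

```python
import numpy as np

# A sketch of the three loss criteria and of the OLS and ridge
# solutions. X, y, and the weights w are hypothetical inputs.
rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0.0, 0.3, size=n)
w = rng.uniform(0.5, 1.5, size=n)

def losses(beta):
    e = y - X @ beta                # residuals for candidate beta
    return (np.sum(np.abs(e)),      # l1: absolute error criterion
            np.sum(e**2),           # l2: ordinary least squares
            np.sum(w * e**2))       # lw: weighted least squares

# OLS: minimize the residual sum of squares.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge, penalized form: squared error loss plus lam * ||beta||_2^2.
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# LASSO (L1 penalty) has no closed form; iterative solvers
# (e.g., coordinate descent) are used in practice.
print(losses(beta_ols), losses(beta_ridge))
```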

Given the response vector $\mathbf{Y}$ and the design matrix $\mathbf{X}$, the sample GLM can be written in matrix notation as
$$\mathbf{Y} = \mathbf{X}\beta + \varepsilon, \quad E[\varepsilon] = \mathbf{0}, \quad \operatorname{Cov}[\varepsilon] = ((\operatorname{cov}(\varepsilon_i, \varepsilon_j))) = \sigma^2 \mathbf{I}. \qquad (5)$$
For this model,
$$E[\mathbf{Y}] = \mathbf{X}\beta, \quad \operatorname{Cov}(\mathbf{Y}) = \sigma^2 \mathbf{I}. \qquad (6)$$
Thus the OLS criterion is equivalent to minimizing the residual sum of squares, i.e.,
$$\min_{\beta \in \mathbb{R}^p} \mathbf{e}'\mathbf{e} = \min_{\beta \in \mathbb{R}^p} (\mathbf{Y} - \mathbf{X}\beta)'(\mathbf{Y} - \mathbf{X}\beta).$$
Expanding,
$$S(\beta) = (\mathbf{Y} - \mathbf{X}\beta)'(\mathbf{Y} - \mathbf{X}\beta) = \mathbf{Y}'\mathbf{Y} - \mathbf{Y}'\mathbf{X}\beta - \beta'\mathbf{X}'\mathbf{Y} + \beta'\mathbf{X}'\mathbf{X}\beta.$$
In order to be able to write these models in a compact notation, we need some background in linear algebra. In the next few lectures, we will review some of these tools.
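Anticipating a standard step not yet shown in these notes: setting the gradient $\partial S / \partial \beta = -2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\beta$ to zero yields the normal equations $\mathbf{X}'\mathbf{X}\hat{\beta} = \mathbf{X}'\mathbf{Y}$. A minimal numerical check, with hypothetical data and a full-rank design matrix:

```python
import numpy as np

# A sketch checking that the minimizer of S(beta) solves the normal
# equations X'X beta = X'Y (hypothetical data, full-rank X assumed).
rng = np.random.default_rng(2)
n, p = 40, 3
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(0.0, 0.2, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # normal-equations solution

def S(b):
    r = y - X @ b
    return r @ r                              # residual sum of squares

# Perturbing any coordinate of beta_hat increases S, as expected at
# the minimum of a strictly convex quadratic.
for j in range(p):
    b = beta_hat.copy()
    b[j] += 1e-3
    assert S(b) > S(beta_hat)
print("S(beta_hat) =", S(beta_hat))
```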