Regression Retrieval Overview. Larry McMillin


Regression Retrieval Overview
Larry McMillin
Climate Research and Applications Division
National Environmental Satellite, Data, and Information Service
Washington, D.C.
Larry.McMillin@noaa.gov

Pick one - This is:
All you ever wanted to know about regression
All you never wanted to know about regression

Overview
What is regression?
How correlated predictors affect the solution
Synthetic regression or real regression?
Regression with constraints - theory and applications
Classification
Normalized regression
AMSU sample
Recommendations

Regression - What are we trying to do?
Obtain the estimate with the lowest RMS error, or
Obtain the true relationship?

Which line do we want?

Considerations
Single predictor - easy.
Multiple uncorrelated predictors - easy.
Multiple predictors with correlations:
Assume two predictors are highly correlated and each has its own noise.
Their difference is small but noisier than either predictor alone - and that is the problem.
A theoretical approach is hard for these cases.
If the predictors have an independent component, then you want their difference.
If they are perfectly correlated, then you want their average, to reduce the noise.
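A tiny numerical illustration of this (the data and numbers below are invented for the sketch, not from the talk): two nearly identical predictors each carry their own noise, so the ordinary least squares fit can split the signal between them with poorly determined, opposite-signed coefficients, while their average behaves well.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)
x1 = signal + 0.01 * rng.normal(size=n)      # two highly correlated predictors,
x2 = signal + 0.01 * rng.normal(size=n)      # each with its own small noise
y = signal + 0.1 * rng.normal(size=n)        # predictand

X = np.vstack([x1, x2])                      # predictors as rows
C = y @ X.T @ np.linalg.inv(X @ X.T)         # ordinary least squares coefficients
# The two coefficients sum to roughly 1 (the common direction is well determined),
# but individually they are poorly determined and often large with opposite signs.
print(C)

x_avg = 0.5 * (x1 + x2)                      # averaging the correlated predictors
print((y @ x_avg) / (x_avg @ x_avg))         # a stable coefficient near 1
```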

Considerations continued
Observational approach using stepwise regression:
Stability depends on the ratio of predictors to predictands.
Stepwise steps:
1. Find the predictor with the highest correlation with the predictand.
2. Generate the regression coefficient.
3. Make the predictands orthogonal to the selected predictor.
4. Make all remaining predictors orthogonal to the selected predictor.
Problem:
When two predictors are highly correlated and one is removed, computing the correlation of the other one involves a division by essentially zero.
That other predictor is selected next.
The two predictors end up with large coefficients of opposite sign.
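A minimal numpy sketch of the four stepwise steps above (function and variable names are mine; this is an illustration, not McMillin's code). Note how step 1 divides by the residual norm of each remaining predictor, which is where the near-zero division occurs for highly correlated predictors.

```python
import numpy as np

def stepwise_regression(X, y, n_steps):
    """Greedy stepwise selection. X: (n_predictors, n_samples), y: (n_samples,)."""
    Xr, yr = X.astype(float).copy(), y.astype(float).copy()
    order, coefs = [], []
    for _ in range(n_steps):
        # 1. Find the predictor with the highest correlation with the (residual) predictand.
        xc = Xr - Xr.mean(axis=1, keepdims=True)
        yc = yr - yr.mean()
        denom = np.linalg.norm(xc, axis=1) * np.linalg.norm(yc)
        corr = np.where(denom > 1e-12, (xc @ yc) / np.maximum(denom, 1e-12), 0.0)
        k = int(np.argmax(np.abs(corr)))
        xk = Xr[k]
        # 2. Generate the regression coefficient for the selected predictor.
        c = float(xk @ yr) / float(xk @ xk)
        order.append(k)
        coefs.append(c)
        # 3. Make the predictand orthogonal to the selected predictor.
        yr = yr - c * xk
        # 4. Make all remaining predictors orthogonal to the selected predictor.
        Xr = Xr - np.outer((Xr @ xk) / float(xk @ xk), xk)
    return order, coefs
```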

Considerations continued
Consider two predictands with the same predictors.
Stable case (temperature, for example): the correlation with the predictand is high.
Unstable case (water vapor, for example): the correlation with the predictand is low.
Essentially, when a selected predictor is removed from the predictand and from the remaining predictors, the solution stays stable if the residual variance of the predictand decays at least as fast as the residual variance of the predictors.

Considerations continued
Desire: damp the small eigenvectors, but don't damp the regression coefficients.
C = Y X^T (X X^T)^-1, but when removing a variable we want to use (X X^T + γI)^-1.
Solution: decrease the contributions from the smaller eigenvectors.
This damps the slope of the regression coefficients and forces the solution towards the mean value.
Alternative: increase the constraint with each step of the stepwise regression - but no theory exists for this.
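A small numerical illustration of the damping (all data here is invented): in the eigenbasis of X X^T, the ridge solution scales each component of the unconstrained solution by λ/(λ + γ), so directions with tiny eigenvalues, such as the one created by a nearly redundant predictor, are pulled towards zero.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200))                  # 5 predictors x 200 samples
X[4] = X[3] + 0.001 * rng.normal(size=200)     # one nearly redundant predictor
Y = X[3:4] + 0.1 * rng.normal(size=(1, 200))   # predictand tied to the redundant pair

gamma = 2.0
lam, V = np.linalg.eigh(X @ X.T)
# Relative to the unconstrained solution, the component along each eigenvector is
# multiplied by lam/(lam + gamma): near 1 for large eigenvalues, near 0 for the
# tiny eigenvalue created by the redundant predictor.
print(lam / (lam + gamma))

C_normal = Y @ X.T @ np.linalg.inv(X @ X.T)
C_ridge  = Y @ X.T @ np.linalg.inv(X @ X.T + gamma * np.eye(5))
print(C_normal)   # coefficients on the redundant pair are poorly determined
print(C_ridge)    # the gamma*I term pulls those coefficients back towards zero
```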

Regression Retrievals
T = T_guess + C (R - R_guess)
R is measured.
If R_guess is measured: apples subtracted from apples (measured minus measured).
If R_guess is calculated: apples subtracted from oranges (measured minus calculated).
This leads to a need for bias adjustment (tuning).
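A minimal sketch of applying the retrieval equation; the array sizes and the zero coefficient matrix are placeholders, not values from the talk.

```python
import numpy as np

n_levels, n_channels = 40, 15                    # illustrative sizes only
C = np.zeros((n_levels, n_channels))             # regression coefficients (already trained)
T_guess = np.linspace(220.0, 290.0, n_levels)    # first-guess temperature profile (K)
R_guess = np.full(n_channels, 250.0)             # radiances consistent with the guess
R = R_guess + 0.2                                # measured radiances

T_retrieved = T_guess + C @ (R - R_guess)        # T = T_guess + C (R - R_guess)
```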

Synthetic or Real
Synthetic regression - uses calculated radiances to generate regression coefficients.
Errors can be controlled, but they need to be realistic.
The sample needs to be representative.
Systematic errors result if measurements and calculations are not perfectly matched.
Real regression - uses matches with radiosondes.
Compares measured to measured - no bias adjustment needed.
Sample size issues - an adequate sample size can be hard to achieve.
Sample consistency across scan spots - different samples for each angle.
Additional errors - match errors, truth errors.

Regression with constraints
Why add constraints? The problem is often singular, or nearly so.
Possible regressions:
Normal regression
Ridge regression
Shrinkage
Rotated regression
Orthogonal regression
Eigenvector regression
Stepwise regression
Stagewise regression
Search all combinations for a subset

Definitions
Y = predictands
X = predictors
C = coefficients
C_normal = normal coefficients
C_0 = initial coefficients
C_ridge = ridge coefficients
C_shrinkage = shrinkage coefficients
C_rotated = rotated coefficients
C_orthogonal = orthogonal coefficients
C_eigenvector = eigenvector coefficients

Definitions continued
γ = a constant
ε = errors in Y
δ = errors in X
X_t = true value, when known

Equations
Y = C X
C_normal = Y X^T (X X^T)^-1
C_ridge = Y X^T (X X^T + γI)^-1
C_shrinkage = (Y X^T + γ C_0)(X X^T + γI)^-1
C_rotated = (Y Y^T C_0 + Y X^T + γ C_0)(C_0^T Y X^T + X X^T + γI)^-1
C_orthogonal = rotated regression applied repeatedly until the solution converges
Note: many of these differ only in the directions used to calculate the components.
The first three minimize differences along the Y direction.
Rotated minimizes differences perpendicular to the previous solution.
Orthogonal minimizes differences perpendicular to the final solution.
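A numpy sketch of these formulas, assuming Y and X hold one sample per column and C_0 is an initial guess (the toy data and names are mine). Orthogonal regression is obtained by iterating the rotated step until it converges, as the slide states.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 100))                          # predictors: 3 x 100 samples
Y = 1.2 * X[:1] + 0.1 * rng.normal(size=(1, 100))      # one predictand
C0 = np.ones((1, 3)) / 3.0                             # initial (guess) coefficients
gamma = 2.0
I = np.eye(X.shape[0])

C_normal    = Y @ X.T @ np.linalg.inv(X @ X.T)
C_ridge     = Y @ X.T @ np.linalg.inv(X @ X.T + gamma * I)
C_shrinkage = (Y @ X.T + gamma * C0) @ np.linalg.inv(X @ X.T + gamma * I)

def rotated(Y, X, C0, gamma):
    # C_rotated = (Y Y^T C_0 + Y X^T + gamma C_0)(C_0^T Y X^T + X X^T + gamma I)^-1
    num = Y @ Y.T @ C0 + Y @ X.T + gamma * C0
    den = C0.T @ Y @ X.T + X @ X.T + gamma * np.eye(X.shape[0])
    return num @ np.linalg.inv(den)

C_rotated = rotated(Y, X, C0, gamma)

# Orthogonal regression: repeat the rotated step until the solution converges.
C_orthogonal = C0.copy()
for _ in range(100):
    C_new = rotated(Y, X, C_orthogonal, gamma)
    if np.max(np.abs(C_new - C_orthogonal)) < 1e-10:
        break
    C_orthogonal = C_new
```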

Regression examples (figure slides)

Constraint summary
True relationship: Y = 1.2 X
Guess: Y = 1.0 X, ss = 17.79
Ordinary least squares: Y = 0.71 X, ss = 13.64
Ridge (gamma = 2): Y = 0.64 X, ss = 13.73
Shrinkage (gamma = 2): Y = 0.74 X, ss = 13.65
Rotated (gamma = 4, equivalent to gamma = 2): Y = 1.15 X, ss = 16.94 (ss = 7.35 in rotated space)
Orthogonal (gamma = 4): Y = 1.22 X, ss = 18.14 (ss = 7.29 in orthogonal space)

Regression Examples (figure slides)

Popular myth - or the devil is in the details
Two regressions can be replaced by a single one:
Y = C X, X = D Z, Y = E Z. Then Y = C D Z and E = C D.
True for normal regression, but false for any constrained regression.
In particular, if X is a predicted value of Y from Z using an initial set of coefficients, and C is obtained using a constrained regression, then the constraint acts in a direction determined by D. If this is iterated, it becomes rotated regression.
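A quick numerical check of this point (toy data and names are mine, not from the talk): X is constructed as a prediction D Z with a square, invertible D, so for normal regression the two-step coefficients collapse exactly to the one-step ones, while for ridge regression they do not, because the γI constraint acts in the X = D Z space.

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.normal(size=(3, 500))                          # original predictors
D = rng.normal(size=(3, 3))                            # initial coefficients (invertible)
X = D @ Z                                              # X is a prediction from Z
Y = rng.normal(size=(2, 3)) @ Z + 0.1 * rng.normal(size=(2, 500))  # predictands

def normal(Y, X):
    return Y @ X.T @ np.linalg.inv(X @ X.T)

def ridge(Y, X, gamma):
    return Y @ X.T @ np.linalg.inv(X @ X.T + gamma * np.eye(X.shape[0]))

# Normal regression: E = C D holds.
C, E = normal(Y, X), normal(Y, Z)
print(np.allclose(C @ D, E))                           # True

# Ridge regression: the composition no longer matches the single constrained fit.
Cr, Er = ridge(Y, X, 2.0), ridge(Y, Z, 2.0)
print(np.allclose(Cr @ D, Er))                         # False in general
```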

Regression with Classification
Pro: starts with a good guess; can get a series of mean values.
Con: decreases the signal-to-noise ratio; with noise, the adjacent groups have jumps at the boundaries.

Normalized regression
Subtract the mean from both X and Y.
Divide by the standard deviation.
Theoretically this makes no difference, but numerical precision is not theory.
Good for variables with a large dynamic range.
Recent experience with eigenvectors suggests dividing radiances by the noise.
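A minimal sketch of the normalization, assuming X and Y hold one sample per column; the fitted coefficients are converted back to the original units at the end (names are illustrative).

```python
import numpy as np

def normalized_regression(Y, X):
    """Fit Y = C X after standardizing both sides, then undo the scaling."""
    Xm, Ym = X.mean(axis=1, keepdims=True), Y.mean(axis=1, keepdims=True)
    Xs, Ys = X.std(axis=1, keepdims=True), Y.std(axis=1, keepdims=True)
    Xn, Yn = (X - Xm) / Xs, (Y - Ym) / Ys
    Cn = Yn @ Xn.T @ np.linalg.inv(Xn @ Xn.T)   # normal regression on scaled data
    C = Ys * Cn / Xs.T                          # undo the standard-deviation scaling
    intercept = Ym - C @ Xm                     # re-introduce the means
    return C, intercept
```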

Example - Tuning AMSU on AQUA
Predictors are the channel values.
Predictands are the observed minus calculated differences.

Measured minus calculated

The predictors

Ordinary Least Squares

Ridge Regression

Shrinkage

Rotated Regression

Orthogonal Regression

Results Summarized
(Maximum means maximum absolute value.)
Ordinary least squares - max coefficient = -2.5778
Ridge regression - max coefficient = 1.3248
Shrinkage - max coefficient = 1.3248 (shrinkage with a guess coefficient of 0 is the same as ridge regression)
Rotated regression - max coefficient = 1.1503 (rotated relative to the ordinary least squares solution)
Orthogonal regression - max coefficient = 7.5785

Recommendations
Think.
Know what you are doing.