over the parameters θ. In both cases, consequently, we select the minimizing

Similar documents
Number of cases (objects) Number of variables Number of dimensions. n-vector with categorical observations

ALGORITHM CONSTRUCTION BY DECOMPOSITION 1. INTRODUCTION. The following theorem is so simple it s almost embarassing. Nevertheless

Number of analysis cases (objects) n n w Weighted number of analysis cases: matrix, with wi

CENTERING IN MULTILEVEL MODELS. Consider the situation in which we have m groups of individuals, where

Number of analysis cases (objects) n n w Weighted number of analysis cases: matrix, with wi

FACTOR ANALYSIS AS MATRIX DECOMPOSITION 1. INTRODUCTION

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction

A Goodness-of-Fit Measure for the Mokken Double Monotonicity Model that Takes into Account the Size of Deviations

MAJORIZATION OF DIFFERENT ORDER

Evaluating Goodness of Fit in

Least squares multidimensional scaling with transformed distances

HOMOGENEITY ANALYSIS USING EUCLIDEAN MINIMUM SPANNING TREES 1. INTRODUCTION

MAXIMUM LIKELIHOOD IN GENERALIZED FIXED SCORE FACTOR ANALYSIS 1. INTRODUCTION

REGRESSION, DISCRIMINANT ANALYSIS, AND CANONICAL CORRELATION ANALYSIS WITH HOMALS 1. MORALS

Chapter 2 Nonlinear Principal Component Analysis

A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when the Covariance Matrices are Unknown but Common

QUADRATIC MAJORIZATION 1. INTRODUCTION

Isotonic regression and isotonic projection

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

Rank-Based Methods. Lukas Meier

A GENERALIZATION OF TAKANE'S ALGORITHM FOR DEDICOM. Yosmo TAKANE

Decomposing Isotonic Regression for Efficiently Solving Large Problems

The iterative convex minorant algorithm for nonparametric estimation

Bivariate Relationships Between Variables

What is a Hypothesis?

Poincaré Plots for Residual Analysis

AMS 7 Correlation and Regression Lecture 8

LEAST SQUARES ORTHOGONAL POLYNOMIAL COMPONENT ANALYSIS IN R. 1. Introduction

Unit 6 - Introduction to linear regression

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p.

Unit 14: Nonparametric Statistical Methods

Textbook Examples of. SPSS Procedure

NONLINEAR PRINCIPAL COMPONENT ANALYSIS

An Information Criteria for Order-restricted Inference

1. Let A be a 2 2 nonzero real matrix. Which of the following is true?

Online isotonic regression

Multidimensional Scaling in R: SMACOF

Divide-Conquer-Glue Algorithms

Nonlinear Support Vector Machines through Iterative Majorization and I-Splines

appstats27.notebook April 06, 2017

Yorktown Heights, New York * San Jose, California * Zurich, Switzerland

Unit 6 - Simple linear regression

Non-parametric (Distribution-free) approaches p188 CN

Physics 509: Non-Parametric Statistics and Correlation Testing

Contents. Acknowledgments. xix

Kruskal-Wallis and Friedman type tests for. nested effects in hierarchical designs 1

On the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit

Parametric Identification of Multiplicative Exponential Heteroskedasticity

UCLA Department of Statistics Papers

Nonparametric Statistics

AN ITERATIVE MAXIMUM EIGENVALUE METHOD FOR SQUARED DISTANCE SCALING 1. INTRODUCTION

Types of Statistical Tests DR. MIKE MARRAPODI

IDENTIFYING MULTIPLE OUTLIERS IN LINEAR REGRESSION : ROBUST FIT AND CLUSTERING APPROACH

Supplementary Note on Bayesian analysis

subroutine is demonstrated and its accuracy examined, and the nal Section is for concluding discussions The rst of two distribution functions given in

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018

CDA Chapter 3 part II

On squares of squares

Nonparametric Statistics. Leah Wright, Tyler Ross, Taylor Brown

Trends in the Relative Distribution of Wages by Gender and Cohorts in Brazil ( )

Conditioning Nonparametric null hypotheses Permutation testing. Permutation tests. Patrick Breheny. October 5. STA 621: Nonparametric Statistics

Resampling Methods. Lukas Meier

Prentice Hall Mathematics, Course

FOR THE purposes of this paper, a (nonparametric) regression

CHAPTER 3. THE IMPERFECT CUMULATIVE SCALE

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

3 Joint Distributions 71

Testing Equality of Two Intercepts for the Parallel Regression Model with Non-sample Prior Information

Thus ρ is the direct sum of three blocks, where block on T 0 ρ is the identity, on block T 1 it is identically one, and on block T + it is non-zero

Classification. Jordan Boyd-Graber University of Maryland WEIGHTED MAJORITY. Slides adapted from Mohri. Jordan Boyd-Graber UMD Classification 1 / 13

Similarity and Dissimilarity

Lectures on Simple Linear Regression Stat 431, Summer 2012

Approximation of Average Run Length of Moving Sum Algorithms Using Multivariate Probabilities

Introduction to Statistical Data Analysis III

My data doesn t look like that..

Learning from permutations. Jean-Philippe Vert

correlated to the Ohio Academic Content Standards with Indicators Mathematics Grade 8

INFORMATION THEORY AND STATISTICS

Multilevel Analysis, with Extensions

COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS. Abstract

Questions and Answers on Unit Roots, Cointegration, VARs and VECMs

Simultaneous Diagonalization of Positive Semi-definite Matrices

Flexible Estimation of Treatment Effect Parameters

Lecture 9. Econ August 20

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of

Bucket-Sort. Have seen lower bound of Ω(nlog n) for comparisonbased. Some cheating algorithms achieve O(n), given certain assumptions re input

Empirical Power of Four Statistical Tests in One Way Layout

Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½

ISOTONIC MAXIMUM LIKELIHOOD ESTIMATION FOR THE CHANGE POINT OF A HAZARD RATE

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil

Binary choice 3.3 Maximum likelihood estimation

Permutation tests. Patrick Breheny. September 25. Conditioning Nonparametric null hypotheses Permutation testing

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Dr. Maddah ENMG 617 EM Statistics 10/12/12. Nonparametric Statistics (Chapter 16, Hines)

Non-Parametric Statistics: When Normal Isn t Good Enough"

Statistics Toolbox 6. Apply statistical algorithms and probability models

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1

Conditions for Regression Inference:

R u t c o r Research R e p o r t. The Branch and Bound Method

A Resampling Method on Pivotal Estimating Functions

Transcription:

MONOTONE REGRESSION JAN DE LEEUW Abstract. This is an entry for The Encyclopedia of Statistics in Behavioral Science, to be published by Wiley in 200. In linear regression we fit a linear function y = α + βx to a scatterplot of n points (x i, y i ). We find the parameters α and β by minimizing σ (α, β) = n w i (y i α βx i ) 2, i=1 where the w i are known positive weights. In the more general nonlinear regression problem we fit a nonlinear function φ θ (x) by minimizing σ (θ) = n w i (y i φ θ (x i )) 2 i=1 over the parameters θ. In both cases, consequently, we select the minimizing function from a family of functions indexed by a small number of parameters. Date: March 27, 2004. Key words and phrases. monotone regression, optimal scaling, multidimensional scaling. 1

2 JAN DE LEEUW In some statistical techniques low-dimensional parametric models are too restrictive. In nonmetric multidimensional scaling [], for example, we can only use the rank order of the x i and not their actual numerical values. Parametric methods become useless, but we still can fit the best fitting monotone (increasing) function non-parametrically. Suppose there are no ties in x, and the x i are ordered such that x 1 < < x n. In monotone regression we minimize σ (z) = n w i (y i z i ) 2, i=1 over z, under the linear inequality restrictions that z 1 z n. If the solution to this problem is ẑ, then the best fitting increasing function is the set of pairs (x i, ẑ i ). In monotone regression the number of parameters is equal to the number of observations. The only reason we do not get a perfect solution all the time is because of the order restrictions on z. Actual computation of the best fitting monotone function is based on the theorem that if y i > y i+1 then ẑ i = ẑ i+1. In words: if two consecutive values of y are in the wrong order, then the two corresponding consecutive values of the solution ẑ will be equal. This basic theorem leads to a simple algorithm, because knowing that two values of ẑ must be equal reduces the number of parameters by one. We thus have a monotone regression problem with n 1 parameters. Either the elements are now in the correct order, or there is a violation, in which case we can reduce the problem to one with

MONOTONE REGRESSION n 2 parameters. And so on. This process always comes to an end, in the worst possible case when we only have a single parameter left, which is obviously monotone. We can formalize this in more detail as the up-and-down blocks algorithm of [4]. It is illustrated in the table below, in which the first row is y. The first violation we find is > 0, or is not up-satisfied. We merge the two elements to a block, which contains their weighted average 2 (in our example all weights are one). But now 2 > 2, and thus the new value 2 is not down-satisfied. We merge all three values to a block of three and find, which is both up-satisfied and down-satisfied. We then continue with the next violation. Clearly the algorithm produces a decreasing number of blocks. The value of the block is computed using weighted averaging, where the weight of a block is the sum of the weights of the elements in the block. In our example we wind up with only two blocks, and thus the best fitting monotone function ẑ is a step function with a single step from to 4.

4 JAN DE LEEUW y 2 0 6 6 0 ẑ 2 2 2 6 6 0 6 6 0 6 4 4 4 Table 1. Computing Monotone Regression This is also illustrate in the figure below. The line through the points x and y is in blue. This is obviously the best possible fitting function. The best fitting monotone function, which we just computed, is in red. y 0 1 2 4 6 1 2 4 6 x Figure 1. Plotting Monotone Regression

MONOTONE REGRESSION If x has ties, then this simple algorithm does not apply. There are two straightforward adaptions [2]. In the primary approach to ties we start our monotone regression with blocks of y values corresponding to the ties in x. Thus we require tied x values to correspond with tied z values. In the secondary approach we pose no constraints on tied values, and it can be shown that in that case we merely have to order the y values such that they are increasing in blocks of tied x values. And then we perform an ordinary monotone regression. We finally mention that monotone regression can be generalized in two important directions. First, basically the same algorithm can be used to minimize any separable function of the form n i=1 f (y i z i )), with f any convex function with its minimum at zero. For instance, f can be the absolute value function, in which case we merge blocks by computing medians instead of means. And second, we can generalize the algorithm from weak orders to partial orders in which some elements cannot be compared. For details see [1]. References [1] R.E. Barlow, R.J. Bartholomew, J.M. Bremner, and H.D. Brunk. Statistical Inference under Order Restrictions. Wiley, New York, 1972.

6 JAN DE LEEUW [2] J. De Leeuw. Correctness of Kruskal s Algorithms for Monotone Regression with Ties. Psychometrika, 42:141 144, 1977. [] J. B. Kruskal. Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric Hypothesis. Psychometrika, 29:1 27, 1964. [4] J.B. Kruskal. Nonmetric Multidimensional Scaling: a Numerical Method. Psychometrika, 29:11 129, 1964. Department of Statistics, University of California, Los Angeles, CA 9009-14 E-mail address, Jan de Leeuw: deleeuw@stat.ucla.edu URL, Jan de Leeuw: http://gifi.stat.ucla.edu