Penalty Methods for Bivariate Smoothing and Chicago Land Values

Roger Koenker, University of Illinois, Urbana-Champaign
Ivan Mizera, University of Alberta, Edmonton

Northwestern University, October 2001

Or... Pragmatic Goniolatry

"Goniolatry, or the worship of angles, ..."
-- Thomas Pynchon, Mason and Dixon (1997)

Univariate L2 Smoothing Splines

The Problem:

    min_{g ∈ G} Σ_{i=1}^n (y_i − g(x_i))² + λ ∫_a^b (g″(x))² dx

Gaussian fidelity to the data: Σ_{i=1}^n (y_i − g(x_i))²
Roughness penalty on ĝ: λ ∫_a^b (g″(x))² dx
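A minimal numerical sketch of this idea, assuming an equally spaced grid and a discrete second-difference penalty in place of the integral (the function and names below are illustrative, not from the talk): the problem reduces to a ridge-type linear system.

```python
import numpy as np

def l2_smoothing(y, lam):
    """Discrete analogue of the L2 smoothing spline: minimize
    sum (y_i - g_i)^2 + lam * sum (second differences of g)^2
    by solving (I + lam * D'D) g = y on an equally spaced grid."""
    n = len(y)
    D = np.zeros((n - 2, n))            # second-difference operator
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

# usage: smooth noisy observations of sin on [0, 2*pi]
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + 0.1 * rng.normal(size=100)
g_hat = l2_smoothing(y, lam=10.0)
```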

Quantile Smoothing Splines

The Problem:

    min_{g ∈ G} Σ_{i=1}^n ρ_τ(y_i − g(x_i)) + λ J(g)

Quantile fidelity to the data: ρ_τ(u) = u(τ − I(u < 0))
Total variation roughness penalty on ĝ: J(g) = V(g′) = ∫ |g″(x)| dx

Ref: Koenker, Ng, Portnoy (Biometrika, 1994)
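The check-function fidelity and total variation penalty make this a linear program. A hedged sketch, assuming equally spaced design points so that V(g′) is proportional to the sum of absolute second differences; the positive/negative-part splitting is the standard LP device, and this is illustrative rather than the Koenker-Ng-Portnoy implementation.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_smoothing(y, tau, lam):
    """LP sketch: minimize sum rho_tau(y_i - g_i) + lam * sum |D g|_k
    with D the second-difference operator (equally spaced grid)."""
    n = len(y)
    m = n - 2
    D = np.zeros((m, n))
    for i in range(m):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    # variables: [g (n), u+ (n), u- (n), v+ (m), v- (m)]
    c = np.concatenate([np.zeros(n), tau * np.ones(n), (1 - tau) * np.ones(n),
                        lam * np.ones(m), lam * np.ones(m)])
    # constraints: g + u+ - u- = y   and   D g - v+ + v- = 0
    A_eq = np.block([
        [np.eye(n), np.eye(n), -np.eye(n), np.zeros((n, m)), np.zeros((n, m))],
        [D, np.zeros((m, n)), np.zeros((m, n)), -np.eye(m), np.eye(m)],
    ])
    b_eq = np.concatenate([y, np.zeros(m)])
    bounds = [(None, None)] * n + [(0, None)] * (2 * n + 2 * m)
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    return res.x[:n]

rng = np.random.default_rng(1)
y = np.sin(np.linspace(0, 3, 60)) + 0.1 * rng.normal(size=60)
g_hat = quantile_smoothing(y, tau=0.5, lam=1.0)
```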

Thin Plate Smoothing Splines

The Problem:

    min_g Σ_{i=1}^n (z_i − g(x_i, y_i))² + λ J(g)

Roughness penalty:

    J(g, Ω) = ∫∫_Ω (g_xx² + 2 g_xy² + g_yy²) dx dy

Equivariant to translations and rotations. Easy to compute provided Ω = R², but this creates boundary problems.

References: Wahba (1990), Green and Silverman (1998).

Question: How can total variation penalties be extended to g : R² → R?
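For illustration, scipy's RBFInterpolator with the thin_plate_spline kernel computes an unrestricted-R² thin plate smoother, with its `smoothing` parameter playing the role of λ (a sketch under that assumed correspondence; this is precisely the easy-but-boundary-afflicted computation noted above).

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Thin plate smoothing over all of R^2 via radial basis functions.
rng = np.random.default_rng(2)
XY = rng.uniform(size=(200, 2))
z = np.sin(4 * XY[:, 0]) * XY[:, 1] + 0.1 * rng.normal(size=200)

g = RBFInterpolator(XY, z, kernel="thin_plate_spline", smoothing=1.0)
print(g(np.array([[0.5, 0.5]])))   # fitted surface at (0.5, 0.5)
```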

Thin Plate Example

Figure 1: Integrand of the thin plate penalty for the He, Ng, and Portnoy tent-function interpolant of the points {(0,0,0), (0,1,0), (1,0,0), (1,1,1)}. The boundary effects are created by extending the optimization over all of R². For the restricted domain Ω = [0,1]², the optimal solution g(x, y) = xy has a considerably smaller penalty: 2, versus 2.77 for the unrestricted-domain solution.

Three Variations on Total Variation for f : [a, b] → R

1. Jordan (1881):

    V(f) = sup_π Σ_{k=0}^{n−1} |f(x_{k+1}) − f(x_k)|

where π ranges over partitions a = x_0 < x_1 < ... < x_n = b.

2. Banach (1925):

    V(f) = ∫ N(y) dy

where N(y) = card{x : f(x) = y} is the Banach indicatrix.

3. Vitali (1905), for absolutely continuous f:

    V(f) = ∫ |f′(x)| dx
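These definitions agree for nice f. A small numerical sanity check, approximating Jordan's sup by a fine partition and Vitali's integral by a midpoint rule, for f = sin on [0, 2π], where V(f) = 4:

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 10001)
f = np.sin(x)

tv_jordan = np.sum(np.abs(np.diff(f)))              # partition sum on a fine grid
xm = 0.5 * (x[1:] + x[:-1])
tv_vitali = np.sum(np.abs(np.cos(xm)) * np.diff(x)) # midpoint rule for int |f'|

print(tv_jordan, tv_vitali)   # both close to 4
```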

Total Variation for f : R^k → R^m

A convoluted history... de Giorgi (1954).

For smooth f : R → R:

    V(f, Ω) = ∫_Ω |f′(x)| dx

For smooth f : R^k → R^m:

    V(f, Ω, ‖·‖) = ∫_Ω ‖∇f(x)‖ dx

Extension to nondifferentiable f via the theory of distributions, mollifying f by a smooth kernel φ_ε:

    V(f, Ω, ‖·‖) = lim_{ε→0} ∫_Ω ‖∇(f ∗ φ_ε)(x)‖ dx
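A grid-based approximation of V(f, Ω, ‖·‖) for a smooth scalar f, using finite-difference gradients and the Euclidean norm (illustrative only; the distributional definition is what handles nonsmooth f):

```python
import numpy as np

# V(f, Omega) = integral over Omega of ||grad f|| for
# f(x, y) = sin(x) * cos(y) on Omega = [0, pi]^2.
n = 400
x = np.linspace(0, np.pi, n)
y = np.linspace(0, np.pi, n)
X, Y = np.meshgrid(x, y, indexing="ij")
F = np.sin(X) * np.cos(Y)

fx, fy = np.gradient(F, x, y)                 # finite-difference partials
area = (np.pi / (n - 1)) ** 2                 # area element of the grid
tv = np.sum(np.sqrt(fx**2 + fy**2)) * area
print(tv)
```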

Roughness Penalties for g : R^k → R

For smooth g : R → R:

    J(g, Ω) = V(g′, Ω) = ∫_Ω |g″(x)| dx

For smooth g : R^k → R:

    J(g, Ω, ‖·‖) = V(∇g, Ω, ‖·‖) = ∫_Ω ‖∇²g‖ dx

Again, extension to nondifferentiable g via the theory of distributions. The choice of the norm ‖·‖ is subject to dispute.

Invariance Considerations

Invariance helps to narrow the choice of norm. For orthogonal U and a symmetric matrix H, we would like:

    ‖UᵀHU‖ = ‖H‖

Examples (with H = ∇²g):

    ‖∇²g‖ = (g_xx² + 2 g_xy² + g_yy²)^{1/2}
    ‖∇²g‖ = |trace ∇²g|
    ‖∇²g‖ = max eigenvalue(H)
    ‖∇²g‖ = |g_xx| + 2|g_xy| + |g_yy|
    ‖∇²g‖ = |g_xx| + |g_yy|

Only the first three are invariant under rotations. Solution of the associated variational problems is difficult!
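A quick numerical check of which candidate norms satisfy the invariance requirement (a sketch; the norms below mirror the examples above):

```python
import numpy as np

# Check invariance of candidate norms under H -> U' H U for a rotation U.
rng = np.random.default_rng(3)
H = rng.standard_normal((2, 2)); H = (H + H.T) / 2   # random symmetric H
t = 0.7
U = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
HU = U.T @ H @ U

norms = {
    "frobenius": lambda A: np.sqrt(A[0, 0]**2 + 2 * A[0, 1]**2 + A[1, 1]**2),
    "trace":     lambda A: abs(A[0, 0] + A[1, 1]),
    "spectral":  lambda A: np.max(np.abs(np.linalg.eigvalsh(A))),
    "entry_l1":  lambda A: abs(A[0, 0]) + 2 * abs(A[0, 1]) + abs(A[1, 1]),
    "diag_l1":   lambda A: abs(A[0, 0]) + abs(A[1, 1]),
}
for name, nrm in norms.items():
    # the first three agree; the last two generally do not
    print(name, np.isclose(nrm(H), nrm(HU)))
```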

Triograms

Following Hansen, Kooperberg and Sardy (JASA, 1998): Let U be a compact region of the plane, and let Δ denote a collection of sets {δ_i : i = 1, ..., n} with disjoint interiors such that U = ∪_{δ ∈ Δ} δ. If the δ are planar triangles, Δ is a triangulation of U.

Definition: A continuous, piecewise-linear function on a triangulation Δ is called a triogram.

For triograms, roughness is less ambiguous.
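For illustration, a triogram can be built with off-the-shelf tools: scipy's Delaunay triangulation plus linear interpolation of arbitrary vertex values (a sketch, not the paper's fitting code).

```python
import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator

# A triogram: continuous and piecewise linear on a triangulation,
# determined entirely by its values beta at the vertices.
rng = np.random.default_rng(4)
V = rng.uniform(size=(30, 2))            # vertices
tri = Delaunay(V)                        # triangulation Delta
beta = np.sin(4 * V[:, 0]) * V[:, 1]     # values at the vertices

g = LinearNDInterpolator(tri, beta)      # the triogram g
print(g(*V.mean(axis=0)))                # evaluate at a point inside the hull
```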

A Roughness Penalty for Triograms

For triograms, the ambiguity of the norm choice in total variation roughness penalties is resolved.

Theorem. Suppose that g : R² → R is a piecewise-linear function on the triangulation Δ. For any coordinate-independent penalty J there is a constant c, depending only on the choice of the norm, such that

    J(g) = c J_Δ(g) = c Σ_e ‖∇g_e⁺ − ∇g_e⁻‖ ‖e‖    (1)

where e runs over the interior edges of the triangulation, ‖e‖ is the length of the edge e, and ‖∇g_e⁺ − ∇g_e⁻‖ is the length of the difference between the gradients of g on the two triangles adjacent to e.
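A sketch of evaluating the penalty in (1) on a scipy Delaunay triangulation, given assumed vertex values beta: the gradient is constant on each triangle, and each interior edge contributes the length of the gradient jump times the edge length.

```python
import numpy as np
from scipy.spatial import Delaunay

def tri_gradient(P, z):
    """Gradient of the linear interpolant on a triangle with vertex
    coordinates P (3x2) and vertex values z (3,)."""
    A = np.vstack([P[1] - P[0], P[2] - P[0]])
    return np.linalg.solve(A, [z[1] - z[0], z[2] - z[0]])

def triogram_penalty(V, tri, beta):
    """J_Delta(g) = sum over interior edges e of ||grad g+ - grad g-|| * |e|."""
    grads = np.array([tri_gradient(V[s], beta[s]) for s in tri.simplices])
    J = 0.0
    for i, nbrs in enumerate(tri.neighbors):
        for j, k in enumerate(nbrs):       # neighbor k is opposite vertex j
            if k > i:                      # count each interior edge once
                edge = np.delete(tri.simplices[i], j)   # the two shared vertices
                elen = np.linalg.norm(V[edge[0]] - V[edge[1]])
                J += np.linalg.norm(grads[i] - grads[k]) * elen
    return J

V = np.random.default_rng(5).uniform(size=(20, 2))
tri = Delaunay(V)
beta = V[:, 0]**2 + V[:, 1]**2
print(triogram_penalty(V, tri, beta))
```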

Computation of Median Triograms

The Problem:

    min_{g ∈ G} Σ |z_i − g(x_i, y_i)| + λ J_Δ(g)

can be reformulated as an augmented ℓ₁ (median) regression problem,

    min_{β ∈ R^p} Σ_{i=1}^n |z_i − a_iᵀβ| + λ Σ_{k=1}^M |h_kᵀβ|

where β denotes the vector of values taken by the function g at the vertices of the triangulation, the a_i are the barycentric coordinates of the points (x_i, y_i) in terms of these vertices, and the h_k represent the penalty contributions in terms of these vertices.

Extensions to quantile and mean triograms are straightforward.
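A hedged sketch of the augmentation, using statsmodels' QuantReg as a stand-in ℓ₁ solver (the paper's approach is a sparse interior-point LP; the matrices A, H and the helper name here are illustrative). Since ρ_{1/2}(u) = |u|/2, the stacked median fit minimizes half the stated objective and therefore has the same minimizer.

```python
import numpy as np
import statsmodels.api as sm

def augmented_median_fit(A, z, H, lam):
    """Stack the fidelity design A (n x p) on the penalty rows H (M x p)
    scaled by lam, with zero pseudo-responses, and run median regression:
    minimizes (1/2) [ sum |z_i - a_i'beta| + lam * sum |h_k'beta| ]."""
    X = np.vstack([A, lam * H])
    y = np.concatenate([z, np.zeros(H.shape[0])])
    return sm.QuantReg(y, X).fit(q=0.5).params

# toy usage with a hypothetical design and difference-type penalty rows
rng = np.random.default_rng(6)
A = rng.uniform(size=(50, 5))
z = A @ np.ones(5) + rng.normal(size=50)
H = np.diff(np.eye(5), 2, axis=0)
beta_hat = augmented_median_fit(A, z, H, lam=1.0)
```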

Barycentric Coordinates

Triograms G on Δ constitute a linear space with elements

    g(u) = Σ_{i=1}^3 α_i B_i(u),   u ∈ δ,

where

    B_1(u) = Area(u, v_2, v_3) / Area(v_1, v_2, v_3),  etc.
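A direct implementation of these area ratios (a sketch using signed areas):

```python
import numpy as np

def barycentric(u, v1, v2, v3):
    """Barycentric coordinates of u in triangle (v1, v2, v3) via
    B1 = Area(u, v2, v3) / Area(v1, v2, v3), and cyclically."""
    def area(a, b, c):   # signed triangle area
        return 0.5 * ((b[0] - a[0]) * (c[1] - a[1])
                      - (c[0] - a[0]) * (b[1] - a[1]))
    A = area(v1, v2, v3)
    return np.array([area(u, v2, v3), area(v1, u, v3), area(v1, v2, u)]) / A

print(barycentric((0.25, 0.25), (0, 0), (1, 0), (0, 1)))  # [0.5, 0.25, 0.25]
```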

Delaunay Triangulation

Properties of Delaunay triangles:

The circumscribing circle of each Delaunay triangle excludes all other vertices.
The Delaunay triangulation maximizes the minimum angle over triangulations of the vertices.
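An empirical check of the empty-circumcircle property using scipy (a sketch):

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(7)
P = rng.uniform(size=(50, 2))
tri = Delaunay(P)

def circumcircle(a, b, c):
    # circumcenter from the perpendicular-bisector equations
    A = 2 * np.array([b - a, c - a])
    rhs = np.array([b @ b - a @ a, c @ c - a @ a])
    center = np.linalg.solve(A, rhs)
    return center, np.linalg.norm(a - center)

ok = True
for s in tri.simplices:
    center, r = circumcircle(*P[s])
    d = np.linalg.norm(P - center, axis=1)
    d[s] = np.inf                      # ignore the triangle's own vertices
    ok &= np.all(d >= r - 1e-9)        # no other vertex inside the circle
print(ok)   # True
```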

Robert Delaunay

B. N. Delone (1890-1973)

Four Median Triogram Fits

Consider estimating the noisy cone:

    z_i = max{0, 1/3 − (1/2)√(x_i² + y_i²)} + u_i,

with the (x_i, y_i) generated as independent uniforms on [−1, 1]², and the u_i iid Gaussian with standard deviation σ = 0.02. With sample size n = 400, the triogram problems are roughly 1600 by 400, but very sparse.
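Generating this test data (a sketch, assuming the √(x² + y²) cone as reconstructed above):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 400
x = rng.uniform(-1, 1, n)
y = rng.uniform(-1, 1, n)
# noisy cone: truncated at zero, peak 1/3 at the origin, sd 0.02 noise
z = np.maximum(0.0, 1/3 - 0.5 * np.sqrt(x**2 + y**2)) + rng.normal(0, 0.02, n)
```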

Figure 2: Four median triogram fits for the inverted cone example. The values of the smoothing parameter λ and the number of interpolated points in the fidelity component of the objective function, p_λ, are indicated above each of the four plots.

Four Mean Triogram Fits

Figure 3: Four mean triogram fits for the noisy cone example. The values of the smoothing parameter λ and the trace of the linear operator defining the estimator, p_λ, are indicated above each of the four plots.

Figure 4: Perspective plot of the median model for Chicago land values, based on 1194 land sales in the Chicago metropolitan area in 1995-97; prices in dollars per square foot.

Figure 5: Contour plot of the first quartile model for Chicago land values.

Figure 6: Contour plot of the median model for Chicago land values.

Figure 7: Contour plot of the third quartile model for Chicago land values.

Automatic λ Selection

Schwarz Criterion:

    SIC(λ) = log( n⁻¹ Σ ρ_τ(z_i − ĝ_λ(x_i, y_i)) ) + (2n)⁻¹ p_λ log n,

where the dimension of the fitted function, p_λ, is defined as the number of points interpolated by the fitted function ĝ_λ.

Other approaches: Stein's unbiased risk estimator, Donoho and Johnstone (1995), and e.g. Antoniadis and Fan (2001).
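A sketch of the criterion as a function one could minimize over a grid of λ values (the fitted values and p_λ are assumed to come from a fitting routine not shown here):

```python
import numpy as np

def rho(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def sic(z, fitted, p_lambda, tau=0.5):
    """Schwarz criterion: log mean fidelity + (2n)^{-1} p_lambda log n,
    with p_lambda the number of interpolated points of the fit."""
    n = len(z)
    return np.log(np.mean(rho(z - fitted, tau))) + p_lambda * np.log(n) / (2 * n)
```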

Extensions

Triograms can be constrained to be convex (or concave) by imposing m additional linear inequality constraints, one for each interior edge of the triangulation; a sketch of the resulting edgewise check appears below. This might be interesting for estimating bivariate densities, since we could impose, or test (?), log-concavity; computation is then somewhat harder, since the fidelity term is more complicated. Partial linear model applications are quite straightforward. Extensions to penalties involving V(g) may also prove interesting.
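A sketch of the convexity check implied by these edgewise inequalities: a piecewise-linear function is convex iff, across every interior edge, the plane of one adjacent triangle does not exceed the function value at the opposite vertex of its neighbor. (On a Delaunay triangulation, vertex values of the paraboloid x² + y² interpolate to a convex triogram, since Delaunay is the projection of the lower hull of the lifted points.)

```python
import numpy as np
from scipy.spatial import Delaunay

def tri_grad(P, z):
    """Gradient of the linear interpolant on a triangle (P: 3x2, z: 3,)."""
    A = np.vstack([P[1] - P[0], P[2] - P[0]])
    return np.linalg.solve(A, [z[1] - z[0], z[2] - z[0]])

def is_convex_triogram(V, tri, beta, tol=1e-9):
    """One linear inequality per interior edge: the plane of triangle i,
    extended to the opposite vertex w of neighbor k, must lie below beta_w."""
    for i, nbrs in enumerate(tri.neighbors):
        si = set(tri.simplices[i])
        g = tri_grad(V[tri.simplices[i]], beta[tri.simplices[i]])
        p0, z0 = V[tri.simplices[i][0]], beta[tri.simplices[i][0]]
        for k in nbrs:
            if k == -1:
                continue
            w = next(v for v in tri.simplices[k] if v not in si)
            if z0 + g @ (V[w] - p0) > beta[w] + tol:
                return False
    return True

V = np.random.default_rng(9).uniform(-1, 1, size=(25, 2))
tri = Delaunay(V)
print(is_convex_triogram(V, tri, V[:, 0]**2 + V[:, 1]**2))   # True
```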

Monte-Carlo Performance

Design: He and Shi (1996)

    z_i = g_0(x_i, y_i) + u_i,   i = 1, ..., 100,

    g_0(x, y) = 40 exp(8((x − .5)² + (y − .5)²)) / [ exp(8((x − .2)² + (y − .7)²)) + exp(8((x − .7)² + (y − .2)²)) ]

with (x, y) iid uniform on [0, 1]² and u_i distributed as normal, normal scale mixture, or slash.

Comparison of both L₁ and L₂ triogram and tensor-product spline fits.
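A sketch of this design (the scale-mixture weights below are an assumption, not from the slide; the slash distribution is a standard normal divided by an independent uniform):

```python
import numpy as np

def g0(x, y):
    """He-Shi (1996) test surface, as reconstructed above."""
    num = 40 * np.exp(8 * ((x - .5)**2 + (y - .5)**2))
    den = (np.exp(8 * ((x - .2)**2 + (y - .7)**2))
           + np.exp(8 * ((x - .7)**2 + (y - .2)**2)))
    return num / den

rng = np.random.default_rng(10)
n = 100
x, y = rng.uniform(size=n), rng.uniform(size=n)

u_normal = rng.normal(size=n)
# scale mixture: N(0,1) w.p. 0.9, N(0, 3^2) w.p. 0.1 (assumed weights)
scale = np.where(rng.uniform(size=n) < 0.9, 1.0, 3.0)
u_mixture = scale * rng.normal(size=n)
u_slash = rng.normal(size=n) / rng.uniform(size=n)

z = g0(x, y) + u_normal   # swap in u_mixture or u_slash for the other cases
```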

Monte-Carlo MISE (1000 Replications)

Distribution      L1 tensor   L1 triogram   L2 tensor   L2 triogram
Normal              0.609       0.442         0.544       0.3102
                   (0.095)     (0.161)       (0.072)     (0.093)
Normal Mixture      0.691       0.515         0.747       0.602
                   (0.233)     (0.245)       (0.327)     (0.187)
Slash               0.689       4.79          31.1        171.1
                   (6.52)     (125.22)      (18135)      (4723)

Table 1: Comparative mean integrated squared error (standard deviations in parentheses).


Monte-Carlo MISE (998 Replications)

Distribution      L1 tensor   L1 triogram   L2 tensor   L2 triogram
Normal              0.609       0.442         0.544       0.3102
                   (0.095)     (0.161)       (0.072)     (0.093)
Normal Mixture      0.691       0.515         0.747       0.602
                   (0.233)     (0.245)       (0.327)     (0.187)
Slash               0.689       0.486         31.1        171.1
                   (6.52)      (3.25)       (18135)      (4723)

Table 2: Comparative mean integrated squared error (standard deviations in parentheses).

Dogma of Goniolatry

Triograms are nice elementary surfaces.
Roughness penalties are preferable to knot selection.
Total variation provides a natural roughness penalty.
The Schwarz penalty selects λ based on model dimension.
Sparsity of the linear algebra facilitates computation.
Quantile fidelity yields a family of fitted surfaces.