Modeling Real Estate Data using Quantile Regression

Similar documents
Beyond Mean Regression

Modelling geoadditive survival data

Hierarchical Modeling for Univariate Spatial Data

Bayesian Linear Regression

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Topic 12 Overview of Estimation

Conjugate Analysis for the Linear Model

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

variability of the model, represented by σ 2 and not accounted for by Xβ

Density Estimation. Seungjin Choi

Least Squares Regression

Analysing geoadditive regression data: a mixed model approach

Linear Models in Machine Learning

Covariance and Correlation

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li.

Sparse Linear Models (10/7/13)

Short Questions (Do two out of three) 15 points each

A general mixed model approach for spatio-temporal regression data

Hierarchical Modelling for Univariate Spatial Data

g-priors for Linear Regression

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Least Squares Regression

Working Papers in Economics and Statistics

Kazuhiko Kakamu Department of Economics Finance, Institute for Advanced Studies. Abstract

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Bayesian linear regression

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson

A short introduction to INLA and R-INLA

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Areal data models. Spatial smoothers. Brook s Lemma and Gibbs distribution. CAR models Gaussian case Non-Gaussian case

ST 740: Linear Models and Multivariate Normal Inference

The linear model is the most fundamental of all serious statistical models encompassing:

Regression diagnostics

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University.

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data

Bayesian Grouped Horseshoe Regression with Application to Additive Models

Bayesian Grouped Horseshoe Regression with Application to Additive Models

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Regression, Ridge Regression, Lasso

Statistics & Data Sciences: First Year Prelim Exam May 2018

A Resampling Method on Pivotal Estimating Functions

Gibbs Sampling in Endogenous Variables Models

Models for spatial data (cont d) Types of spatial data. Types of spatial data (cont d) Hierarchical models for spatial data

Wrapped Gaussian processes: a short review and some new results

Lecture : Probabilistic Machine Learning

Bayesian Inference in a Normal Population

A Short Introduction to the Lasso Methodology

Chapter 4 - Fundamentals of spatial processes Lecture notes

Chapter 3: Maximum Likelihood Theory

Multivariate Bayesian Linear Regression MLAI Lecture 11

Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart

Gaussian Process Regression Model in Spatial Logistic Regression

Minimax Estimation of Kernel Mean Embeddings

Can we do statistical inference in a non-asymptotic way? 1

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

Unsupervised Regressive Learning in High-dimensional Space

AFT Models and Empirical Likelihood

Linear models and their mathematical foundations: Simple linear regression

Weighted Least Squares

Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)

Small Area Estimation for Skewed Georeferenced Data

Additive Isotonic Regression

Point-Referenced Data Models

Part 6: Multivariate Normal and Linear Models

Uncertainty Quantification for Inverse Problems. November 7, 2011

Statistical Machine Learning Hilary Term 2018

Matrix Approach to Simple Linear Regression: An Overview

Linear Models A linear model is defined by the expression

Eco517 Fall 2014 C. Sims MIDTERM EXAM

Recovering Copulae from Conditional Quantiles

LASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape

An Extended BIC for Model Selection

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Overfitting, Bias / Variance Analysis

Practical Bayesian Optimization of Machine Learning. Learning Algorithms

Efficient Estimation for the Partially Linear Models with Random Effects

Bayesian Linear Models

Bayesian Analysis of Risk for Data Mining Based on Empirical Likelihood

Non-linear Supervised High Frequency Trading Strategies with Applications in US Equity Markets

A Bayesian Perspective on Residential Demand Response Using Smart Meter Data

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION

Bayesian Inference in a Normal Population

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

Bayesian data analysis in practice: Three simple examples

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

STA 4273H: Statistical Machine Learning

Outline Lecture 2 2(32)

Linear Regression With Special Variables

Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014

Calibrating Environmental Engineering Models and Uncertainty Analysis

Bayesian Inference for DSGE Models. Lawrence J. Christiano

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Linear Mixed Models. One-way layout REML. Likelihood. Another perspective. Relationship to classical ideas. Drawbacks.

An Introduction to Bayesian Linear Regression

Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"

Generalized Linear Models Introduction

Bayesian Linear Models

Multiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar

Problem Selected Scores

Transcription:

Modeling Real Estate Data using Semiparametric Quantile Regression Department of Statistics University of Innsbruck September 9th, 2011

Overview 1 Application: 2 3 4

Hedonic regression data for house prices in Austria Variable of primary interest House price per square meter Covariates Structural characteristics, like the floor space area, the plot area, the age, the equipment etc. Locational characteristics at different levels, like the buying power index (municipal), the share of academics (municipal), the real estate price index (district), etc.

Hedonic regression data for house prices in Austria Hierarchical semiparametric model p qm = f 1,1 (municipal) + f 1,2 (area) +... + f 1,l (age) + Xβ + ɛ 1 f 1,1 (municipal) = f 2,1 (district) + f 2,2 (buying power) +... + + f 2,m (academics) + ɛ 2 f 2,1 (district) = f 3,1 (state) + f 3,2 (real estate index)+ + g(dist) + ɛ 3 f 3,1 (state) = const + ɛ 4 The term Xβ contains the linear effects. The functions f i are possibly nonlinear functions of the covariates. The function g describes a spatial district effect.

Hedonic regression data for house prices in Austria Goal Determining the conditional quantiles of the distribution of the house prices Approaches Mean regression based on a normal distribution assumption Quantile regression

Overview 1 Application: 2 3 4

Linear quantile regression Linear model Given: Observations y i, x i1,..., x ip for i = 1,..., n from the model y = Xβ + ɛ. Assumptions for a particular quantile ϕ: ɛ i iid F Q ϕ (ɛ i ) := F 1 (ϕ) = 0 Then: Q ϕ (y) = Xβ

Linear quantile regression Loss function ρ ϕ (u) = Empirical loss of an estimation ŷ : { uϕ if u 0 u(ϕ 1) if u < 0 n ρ ϕ (y i ŷ) i=1

Linear quantile regression Regression quantile ˆβ = arg min β R p { n } ρ ϕ (y i x i β) i=1 Estimation of the conditional quantile: Q ϕ (y) = X ˆβ

Linear quantile regression Example y = 1 + 3 x + ɛ, ɛ N (0, 0.5) 5

Nonlinear and spatial quantile regression Nonlinear or spatial model: y = f (z) + ɛ = Z γ + ɛ Smooth effects: Penalize differences between the coefficients of adjacent B-splines or the coefficients of neighbouring regions, respectively. Penalized optimization problem { n } ( ˆγ = arg min ρ ϕ yi z i γ) + λγ K γ γ R d i=1

Semiparametric quantile regression Semiparametric model: y = η + ɛ = Xβ + f 1 (z 1 ) +... + f q (z q ) + ɛ Penalized optimization problem n ρ ϕ (y i η i ) + min β,γ k i=1 q λ j γ j K jγ j j=1

Overview 1 Application: 2 3 4

Bayesian Inference Asymmetric Laplace distribution Density function: p(y µ, σ 2, ϕ) = ( ϕ(1 ϕ) σ 2 exp 1 ) σ 2 ρ ϕ(y µ)

Bayesian Inference Assumption: y i ALD(η i, σ 2, ϕ) Joint likelihood: ( p(y η, σ 2, ϕ) 1 (σ 2 ) n exp 1 σ 2 ) n ρ ϕ (y i η i ) i=1 Maximizing this likelihood is equivalent to minimizing the former loss function in the linear case.

Bayesian Inference Priors for nonlinear or spatial effects: ( ( ) p γ j τj 2 1 exp 1 ( ) rk(k j ) τj 2 τ 2 j 2 2τ 2 j γ j K jγ j ) variance parameter, governs the smoothness of the respective function.

Bayesian Inference Representation of an asymmetric Laplace distribution: Y = D η + 1 2ϕ ϕ(1 ϕ) V + W 2 σ 2 ϕ(1 ϕ) V V, W independent random variables with exponential and normal distributions respectively: ( p v σ 2) ( ) = σ 2 exp σ 2 v and W N (0, 1) Important feature for MCMC-inference.

Overview 1 Application: 2 3 4

Floor space area:

Floor space area:

Floor space area:

Real estate price index:

Unexplained spatial effects:

Model selection: Quantile Method G1... G5 10% Quantile reg. 61.32... 61.74 61.11 Mean reg. 67.63... 67.68 66.62 30% Quantile reg. 124.10... 125.61 125.42 Mean reg. 125.15... 128.41 127.71 50% Quantile reg. 143.15... 149.81 147.99 Mean reg. 143.74... 149.50 148.32 70% Quantile reg. 129.89... 136.01 134.63 Mean reg. 132.45... 135.68 136.46 90% Quantile reg. 75.75... 72.67 76.18 Mean reg. 74.88... 73.75 77.07

Conclusion Efficient based on the ALD Individual marginal effects for each quantile Superior to mean regression

Thank you!