Exploiting Geographic Dependencies for Real Estate Appraisal

Similar documents
Listwise Approach to Learning to Rank Theory and Algorithm

Where to Find My Next Passenger?

Exploring the Patterns of Human Mobility Using Heterogeneous Traffic Trajectory Data

Point-of-Interest Recommendations: Learning Potential Check-ins from Friends

Using 3D Geographic Information System to Improve Sales Comparison Approach for Real Estate Valuation

Decoupled Collaborative Ranking

Regularity and Conformity: Location Prediction Using Heterogeneous Mobility Data

Learning to Recommend Point-of-Interest with the Weighted Bayesian Personalized Ranking Method in LBSNs

Social and Technological Network Analysis: Spatial Networks, Mobility and Applications

Friendship and Mobility: User Movement In Location-Based Social Networks. Eunjoon Cho* Seth A. Myers* Jure Leskovec

Exploring Human Mobility with Multi-Source Data at Extremely Large Metropolitan Scales. ACM MobiCom 2014, Maui, HI

A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation

A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank

Area Classification of Surrounding Parking Facility Based on Land Use Functionality

K-Nearest Neighbor Temporal Aggregate Queries

Social Studies Grade 2 - Building a Society

arxiv: v1 [cs.si] 7 Jun 2013

Incorporating Spatio-Temporal Smoothness for Air Quality Inference

Understanding Wealth in New York City From the Activity of Local Businesses

ENV208/ENV508 Applied GIS. Week 1: What is GIS?

* Abstract. Keywords: Smart Card Data, Public Transportation, Land Use, Non-negative Matrix Factorization.

City monitoring with travel demand momentum vector fields: theoretical and empirical findings

Incorporating GIS into Hedonic Pricing Models

FirstStreet.org. Rising Seas Swallow $403 Million in New England Home Values For Immediate Release: January 22, 2019

A Bayesian Perspective on Residential Demand Response Using Smart Meter Data

SPATIAL ANALYSIS OF POPULATION DATA BASED ON GEOGRAPHIC INFORMATION SYSTEM

Introduction to Machine Learning. Introduction to ML - TAU 2016/7 1

Urban Computing Using Big Data to Solve Urban Challenges

SQL-Rank: A Listwise Approach to Collaborative Ranking

City of Hermosa Beach Beach Access and Parking Study. Submitted by. 600 Wilshire Blvd., Suite 1050 Los Angeles, CA

Location Regularization-Based POI Recommendation in Location-Based Social Networks

Urban Computing Using Big Data to Solve Urban Challenges

The Research of Urban Rail Transit Sectional Passenger Flow Prediction Method

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

CSCI-567: Machine Learning (Spring 2019)

Probabilistic Graphical Models

Multi-Layer Boosting for Pattern Recognition

Introduction to Natural Computation. Lecture 9. Multilayer Perceptrons and Backpropagation. Peter Lewis

CPSC 340: Machine Learning and Data Mining. More PCA Fall 2017

Application of GIS in Public Transportation Case-study: Almada, Portugal

Activity Identification from GPS Trajectories Using Spatial Temporal POIs Attractiveness

Algorithm-Independent Learning Issues

The 3V Approach. Transforming the Urban Space through Transit Oriented Development. Gerald Ollivier Transport Cluster Leader World Bank Hub Singapore

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17

Aggregated Temporal Tensor Factorization Model for Point-of-interest Recommendation

Calculating Land Values by Using Advanced Statistical Approaches in Pendik

Spatial Bayesian Nonparametrics for Natural Image Segmentation

A Novel Click Model and Its Applications to Online Advertising

Demand and Trip Prediction in Bike Share Systems

Representing Spatiality in a Conceptual Multidimensional Model

arxiv: v2 [cs.ir] 13 Jun 2018

A Spatial-Temporal Probabilistic Matrix Factorization Model for Point-of-Interest Recommendation

Clustering, K-Means, EM Tutorial

Urban GIS for Health Metrics

IV Course Spring 14. Graduate Course. May 4th, Big Spatiotemporal Data Analytics & Visualization

An Introduction to Scientific Research Methods in Geography Chapter 3 Data Collection in Geography

APPLYING BIG DATA TOOLS TO ACQUIRE AND PROCESS DATA ON CITIES

Inferring activity choice from context measurements using Bayesian inference and random utility models

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,

Modeling Temporal-Spatial Correlations for Crime Prediction

Coupling Implicit and Explicit Knowledge for Customer Volume Prediction

Non-Parametric Bayes

Binary Classification, Multi-label Classification and Ranking: A Decision-theoretic Approach

Chapter 2: Linear Functions

Application of eigenvector-based spatial filtering approach to. a multinomial logit model for land use data

Spatial-Temporal Analytics with Students Data to recommend optimum regions to stay

Measuring Poverty. Introduction

Lecture 14. Clustering, K-means, and EM

Lesson Using Residuals to Determine If a Line Is a Good Fit

Nonparametric Bayesian Methods (Gaussian Processes)

Natural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley

W I N T E R I S C O M I N G P R E S E N T E D B Y H C S A

NOWADAYS, Collaborative Filtering (CF) [14] plays an

Human Mobility Pattern Prediction Algorithm using Mobile Device Location and Time Data

Collaborative topic models: motivations cont

Math for Machine Learning Open Doors to Data Science and Artificial Intelligence. Richard Han

Integrating Geographic Information System and. Building Information Model for Real Estate Valuation. Haicong Yu Ying Liu

Compact guides GISCO. Geographic information system of the Commission

Introduction and Project Overview

Extracting mobility behavior from cell phone data DATA SIM Summer School 2013

Datahoods or Data-Driven Neighborhoods. Using GIS to better understand neighborhoods

INTEGRATION OF GIS AND MULTICRITORIAL HIERARCHICAL ANALYSIS FOR AID IN URBAN PLANNING: CASE STUDY OF KHEMISSET PROVINCE, MOROCCO

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Multiple Aspect Ranking Using the Good Grief Algorithm. Benjamin Snyder and Regina Barzilay MIT

CRP 608 Winter 10 Class presentation February 04, Senior Research Associate Kirwan Institute for the Study of Race and Ethnicity

MOR CO Analysis of future residential and mobility costs for private households in Munich Region

COMS 4721: Machine Learning for Data Science Lecture 1, 1/17/2017

BROOKINGS May

Generative Models for Discrete Data

Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

AUTO SALES FORECASTING FOR PRODUCTION PLANNING AT FORD

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Diagnosing New York City s Noises with Ubiquitous Data

Greedy function optimization in learning to rank

Machine Learning, Midterm Exam

Lee County, Alabama 2015 Forecast Report Population, Housing and Commercial Demand

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Lecture 3: Multiclass Classification

Recurrent Latent Variable Networks for Session-Based Recommendation

Announcement LED Lighting in Public Transportation Vehicles Global Market Forecast & Analysis

Transcription:

Exploiting Geographic Dependencies for Real Estate Appraisal Yanjie Fu Joint work with Hui Xiong, Yu Zheng, Yong Ge, Zhihua Zhou, Zijun Yao Rutgers, the State University of New Jersey Microsoft Research Asia, UNC Charlotte, Nanjing University KDD2014@New York City, NY

Agenda 2 Background and Motivation Problem Statement Methodology Evaluation Conclusions

Housing Matters 3 Win your lover Zillow Settle down your family Realtors Make extra money as investment Yahoo Homes housing consultant services Call for an intelligent system to rank estates based on estate investment value

Quantifying Estate Investment Value 4 We don t predict future price We predict growth potential of resale value (long term thing) Return Rate: Prepare the benchmark investment values of estates for training data Value-adding capability (in Rising Market) Value-protecting capability (in Falling Market)

What Special In Estate Location! Location! Location! Urban geography (poi, road networks) features the geographical utility of houses Human Mobility reflects neighborhood popularity of houses Prosperity of Business Area

Geographic Dependencies (1) 6 Your personal profiles tell your research interests Point of Interest Transportation Accessibility Taxi Mobility Individual dependency the investment value of an estate is determined by the geographic characteristics of its own neighborhood

Geographic Dependencies (2) 7 Your friends tell your research interests A B A B Attribute A B Distance to Level2 road network 156 meters 143 meters Distance to subway station 1385 meters 1585 meters #Restaurants 3 4 #Transportation facilities 8 8 Peer dependency Inside a business area, value can be reflected by its nearby estates.

Geographic Dependencies (3) 8 Your associated research groups tell your research interests Depressed business area Rising Market Period (02/2012-05/2012) Prosperous business area Depressed business area Average regional prices Prosperous business area Average regional prices Zone dependency The estate value can also be influenced by the values of its associated latent business area.

Problem Definition 9 Given Urban geography (poi, road networks) Human mobility (taxi GPS traces) Estates with locations and historical prices Objective Rank estates based on their investment values Core tasks Predict estate investment value using urban geography and human mobility Jointly model three geographic dependencies as objective function to learn estate ranking predictor

Methodology Overview 10 Estate Investment Value Location Location Geographic Utility Urban Geography Neighborhood Popularity Influence of Business Area s Prosperity Human Mobility Business Area Prediction Model Road Networks Bus Stops Objective Function Subway Stations Places of interests Individual Individual Dependen Dependen cy cy Cell -Tower Traces Bus Traces Taxicab GPS Traces Check-ins Peer Peer Dependen Dependen cy cy Zone Zone Dependen Dependen cy cy

11

Modeling Estate Investment Value (1) 12 Geographic Utility Feature extraction by spatial indexing linearly regress geographic utility from geographic features of the neighborhood of each estate

Modeling Estate Investment Value (2) 13 Influence of business area (A generative view) There are K business areas in a city Each business area is a cluster of estates; estates tend to co-locate along multiple business areas The more prosperous, the more easier we identify an estate from business area Prosperities of K business areas The prosperities of K business areas (K spatial hidden states ) show influence on estates The inverse influence of geo-distance between estate and business area center Gaussian Mixture Model + Learning To Rank

Modeling Estate Investment Value (3) 14 Neighborhood Popularity (A propagation view) Propagate visit probability to POIs per drop-off point Aggregate visit probability per POI Aggregate visit probability per POI category Compute popularity score Spatial propagation and aggregation from taxi to house Taxi Passenger (drop-off point) Point of Interests Categories of POI House

Generative Processes

Modeling Three Dependencies

Parameter Estimations

Experimental Data 18 Beijing real-world Data Beijing estate data 2851 estates with transaction records from 04/2011 to 09/2012 Falling market(04/2011 to 02/2012) and Rising market (02/2012 to 09/2012) Beijing transportation facility data including bus stop, subway, road networks Beijing POI data Beijing taxi GPS traces

Evaluation Methods and Metrics 19 Baseline algorithms MART: it is a boosted tree model, specifically, a linear combination of the out puts of a set of regression trees RankBoost: it is a boosted pairwise ranking method, which trains multiple weak rankers and combines their outputs as final ranking Coordinate Ascent: it uses domination loss and applies coordinate descent for optimization ListNet: it is a listwise ranking model with permutation top-k ranking likelihood as the objective function Evaluation metrics Normalized Discounted Cumulative Gain (NDCG) Precision Recall Kendall s Tau Coefficient

Overall Performances 20 We investigate the ranking performances by comparing to baseline algorithms in terms of Tau, NDCG, Pecision and Recall

Overall Performances 21 We investigate the ranking performances by comparing to baseline algorithms in terms of Tau, NDCG, Pecision and Recall

Study on Geographic Dependencies 22 We study the impact of three geographic dependencies by designing different variants of objective function (posterior likelihoods) Individual dependency can achieve good overall ranking performance; Peer and zone dependency can achieve good top-k ranking performance; We recommend combination usage of three dependencies

Hierarchy of Needs for Human Life (1) 23

Hierarchy of Needs for Human Life (2) 24 The triangle need structure of human life in Beijing

Conclusions 25 Housing analysis is funny Use urban geography and human mobility to model estate investment value Capture geographic individual, peer, and zone dependencies to better learn estate ranking predictor Business applications Decision making support for homebuyers Improve price structure for housing agent/broker/consultant Optimize site selection for housing developer

26 Thank You! Email: yanjie.fu@rutgers.edu Homepage: http://pegasus.rutgers.edu/~yf99/