Bayesian Network Regularized Regression for Crime Modeling

Size: px

Start display at page:

Download "Bayesian Network Regularized Regression for Crime Modeling"

Angelina Walton
5 years ago
Views:

1 Bayesian Network Regularized Regression for Crime Modeling Luis Carvalho Joint work with Liz Upton Dept. of Mathematics and Statistics Boston University BU-Keio Workshop, August 2016

2 Introduction A motivating example: residential burglary in Boston

3 Introduction A motivating example: residential burglary in Boston Potential goals: Understanding crime rates: covariates? predictions?

4 Introduction A motivating example: residential burglary in Boston Potential goals: Understanding crime rates: covariates? predictions? Identifying hot zones for intervention

5 Residential Burglary Data description: 7K crimes occurring from July 2012 to October 2015 in Boston, provided bydata.cityofboston.gov Reported occurrences are pooled in time and by intersection Covariates: averaged tax income, district type, distance to nearest police station Network with 13K nodes, provided by boston.opendata.arcgis.com

6 Residential Burglary Data description: 7K crimes occurring from July 2012 to October 2015 in Boston, provided bydata.cityofboston.gov Reported occurrences are pooled in time and by intersection Covariates: averaged tax income, district type, distance to nearest police station Network with 13K nodes, provided by boston.opendata.arcgis.com First take: for v in a network (undirected simple graph) G, [ ( )] iid Y v Po exp x v β

7 Residential Burglary Data description: 7K crimes occurring from July 2012 to October 2015 in Boston, provided bydata.cityofboston.gov Reported occurrences are pooled in time and by intersection Covariates: averaged tax income, district type, distance to nearest police station Network with 13K nodes, provided by boston.opendata.arcgis.com First take: for v in a network (undirected simple graph) G, [ ( )] iid Y v Po exp x v β but: Crime rates are not spatially homogeneous

8 Residential Burglary Data description: 7K crimes occurring from July 2012 to October 2015 in Boston, provided bydata.cityofboston.gov Reported occurrences are pooled in time and by intersection Covariates: averaged tax income, district type, distance to nearest police station Network with 13K nodes, provided by boston.opendata.arcgis.com First take: for v in a network (undirected simple graph) G, [ ( )] iid Y v Po exp x v β but: Crime rates are not spatially homogeneous Crime rates can vary sharply

9 Network Regularized Regression Addressing the first issue, [ ind Y v Po where β is now network indexed exp ( )] x v β(v)

10 Network Regularized Regression Addressing the first issue, [ ind Y v Po where β is now network indexed exp ( )] x v β(v) To avoid overfitting, we impose smoothness on β, e.g., under a single intercept model, β := argmin β = argmin β Y β 2 2 +λ Mβ 2 2 D(Y;β)+λβ M Mβ where M is a differential operator and λ is a roughness penalty

11 Network Regularized Regression Addressing the first issue, [ ind Y v Po where β is now network indexed exp ( )] x v β(v) To avoid overfitting, we impose smoothness on β, e.g., under a single intercept model, β := argmin β = argmin β Y β 2 2 +λ Mβ 2 2 D(Y;β)+λβ M Mβ where M is a differential operator and λ is a roughness penalty Similar works: network kernel-based regression (Smola and Kondor, 2003; Kolaczyk, 2009), and, more generally, functional data analysis (Ramsay and Silverman, 1996)

12 Network Regularized Regression For network indexed coefficients, with M the oriented weighted incidence matrix: β M Mβ := β L w β = w uv (β(u) β(v)) 2 i.e., L w is weighted Laplacian (u,v) E(G)

13 Network Regularized Regression For network indexed coefficients, with M the oriented weighted incidence matrix: β M Mβ := β L w β = w uv (β(u) β(v)) 2 i.e., L w is weighted Laplacian (u,v) E(G) With L w := ΦΞΦ, Ξ := Diag i=1,..., V(G) (ξ i ), we adopt a basis expansion for β, β = Φ 1:k θ, k V(G), so the penalty becomes: β L w β = θ Φ 1:k ΦΞΦ Φ 1:k θ = θ Diag i=1,...,k (ξ i )θ

14 Network Regularized Regression For network indexed coefficients, with M the oriented weighted incidence matrix: β M Mβ := β L w β = w uv (β(u) β(v)) 2 i.e., L w is weighted Laplacian (u,v) E(G) With L w := ΦΞΦ, Ξ := Diag i=1,..., V(G) (ξ i ), we adopt a basis expansion for β, β = Φ 1:k θ, k V(G), so the penalty becomes: β L w β = θ Φ 1:k ΦΞΦ Φ 1:k θ = θ Diag i=1,...,k (ξ i )θ Under a Bayesian formulation, β is the posterior mode when Y v θ ind Po [exp ( φ kv θ )] { θ N (0, Diag i=1,...,k (λξi ) 1})

15 Network Regularized Regression Toy example: Y = (10,2,3,4), vertex 1 connected to triangle with vertices 2, 3, and 4, w(u, v) exp{ d(u, v)/2}i[d(u, v) > 0], and D = , L = CV fitted log 10 (λ) log 10 (λ)

16 Bayesian Network Regularized Regression Addressing the issue of abrupt rate changes, [ ( )] ind Y v ζ,β,z v Po exp Z v ζ +(1 Z v )x v β(v) [ Z v γ ind Bern logit 1( )] u v γ(v) where: ζ is the background crime rate Z v codes for v being in a hot zone, also varying smoothly Both β and γ are network indexed and assume a basis expansion as before

17 Bayesian Network Regularized Regression Addressing the issue of abrupt rate changes, [ ( )] ind Y v ζ,β,z v Po exp Z v ζ +(1 Z v )x v β(v) [ Z v γ ind Bern logit 1( )] u v γ(v) where: ζ is the background crime rate Z v codes for v being in a hot zone, also varying smoothly Both β and γ are network indexed and assume a basis expansion as before Using basis coefficients, x v β(v) D X (v) θ and u v γ(v) D U (v) ω, [ ( )] ind Y v ζ,θ,z v Po exp Z v ζ +(1 Z v )D X (v) θ [ Z v ω ind Bern logit 1( )] D U (v) ω

18 Bayesian Network Regularized Regression Quick methodological recap: Network regularized regression as a building block, [ Y v θ ind F g 1( D X (v) θ )], θ N ( 0,λ 1 θ Ω(X,L w(g)) ) Change regions using latent network-indexed indicators Z and conditional responses [ Z v ω ind Bern logit 1( D U (v) ω )], ω N ( 0,λ 1 ω Ω(U,L w (G)) )

19 Bayesian Network Regularized Regression Quick methodological recap: Network regularized regression as a building block, [ Y v θ ind F g 1( D X (v) θ )], θ N ( 0,λ 1 θ Ω(X,L w(g)) ) Change regions using latent network-indexed indicators Z and conditional responses [ Z v ω ind Bern logit 1( D U (v) ω )], ω N ( 0,λ 1 ω Ω(U,L w (G)) ) There are now two main practical problems: How to define the hyper-parameters controlling the smoothness of β and γ?

20 Bayesian Network Regularized Regression Quick methodological recap: Network regularized regression as a building block, [ Y v θ ind F g 1( D X (v) θ )], θ N ( 0,λ 1 θ Ω(X,L w(g)) ) Change regions using latent network-indexed indicators Z and conditional responses [ Z v ω ind Bern logit 1( D U (v) ω )], ω N ( 0,λ 1 ω Ω(U,L w (G)) ) There are now two main practical problems: How to define the hyper-parameters controlling the smoothness of β and γ? How to fit this model efficiently for large scale datasets?

21 Prior Elicitation: Guidelines Three main sets of hyper-parameters: θ N ( 0,λ 1 Ω(X,L w (G)) ), where Ω(X,L w (G)) := D X L wd X and D X depends on Φ 1:k

22 Prior Elicitation: Guidelines Three main sets of hyper-parameters: θ N ( 0,λ 1 Ω(X,L w (G)) ), where Ω(X,L w (G)) := D X L wd X and D X depends on Φ 1:k To define L w we need a measure of similarity as weights in G: in our application, we use w(u,v) exp{ d(u,v)/ψ} and set the network range ψ such that median similarity is 0.8

23 Prior Elicitation: Guidelines Three main sets of hyper-parameters: θ N ( 0,λ 1 Ω(X,L w (G)) ), where Ω(X,L w (G)) := D X L wd X and D X depends on Φ 1:k To define L w we need a measure of similarity as weights in G: in our application, we use w(u,v) exp{ d(u,v)/ψ} and set the network range ψ such that median similarity is 0.8 Penalty λ and basis rank k can be defined jointly using leave-one-out cross-validation via PRESS working residuals ( LOOP ) Adding Eigenvectors One by One Optimal Lambda Includes all covariates LOOP LOOP

24 Model Fitting Crime Counts: Predicted and Actual Pearson Residuals Predicted Residuals Latent Variable = 0 Latent Variable = Actual Log of Predicted Counts

25 Model Fitting

26 Model Fitting

27 Model Fitting

28 Conclusions and future work Summary

29 Conclusions and future work Summary Network regularization is useful in a number of applications and can be more suitable than other types of regularization

30 Conclusions and future work Summary Network regularization is useful in a number of applications and can be more suitable than other types of regularization Methodology can be used as a building block for more elaborated models

31 Conclusions and future work Summary Network regularization is useful in a number of applications and can be more suitable than other types of regularization Methodology can be used as a building block for more elaborated models Main concern: representative models and computational efficiency

32 Conclusions and future work Summary Network regularization is useful in a number of applications and can be more suitable than other types of regularization Methodology can be used as a building block for more elaborated models Main concern: representative models and computational efficiency New challenges

33 Conclusions and future work Summary Network regularization is useful in a number of applications and can be more suitable than other types of regularization Methodology can be used as a building block for more elaborated models Main concern: representative models and computational efficiency New challenges Refinements and extensions: dynamic model, basis selection, covariance structure

34 Conclusions and future work Summary Network regularization is useful in a number of applications and can be more suitable than other types of regularization Methodology can be used as a building block for more elaborated models Main concern: representative models and computational efficiency New challenges Refinements and extensions: dynamic model, basis selection, covariance structure Other applications in Biology, Epidemiology, and Engineering

35 Conclusions and future work Summary Network regularization is useful in a number of applications and can be more suitable than other types of regularization Methodology can be used as a building block for more elaborated models Main concern: representative models and computational efficiency New challenges Refinements and extensions: dynamic model, basis selection, covariance structure Other applications in Biology, Epidemiology, and Engineering Bayesian network regression (for topology inference)

36 Conclusions and future work Summary Network regularization is useful in a number of applications and can be more suitable than other types of regularization Methodology can be used as a building block for more elaborated models Main concern: representative models and computational efficiency New challenges Refinements and extensions: dynamic model, basis selection, covariance structure Other applications in Biology, Epidemiology, and Engineering Bayesian network regression (for topology inference) Thank you!

Bayesian Network Regularized Regression for Modeling Urban Crime Occurrences

Bayesian Network Regularized Regression for Modeling Urban Crime Occurrences arxiv:1708.05047v1 [stat.ap] 16 Aug 2017 Elizabeth Upton and Luis Carvalho Department of Mathematics and Statistics, Boston