Predicting and Preventing Emerging Outbreaks of Crime

Daniel B. Neill
Event and Pattern Detection Laboratory
H.J. Heinz III College, Carnegie Mellon University
neill@cs.cmu.edu

Joint work with Seth Flaxman, Amrut Nagasunder, Wil Gorr (CMU); Brett Goldstein (City of Chicago).

This work was partially supported by NSF grants IIS-0916345, IIS-0911032, and IIS-0953330.
Background: Crime Prediction in Chicago

Since 2009, we have been working with the Chicago Police Department (CPD) to predict and prevent emerging clusters of violent crime.

Our new crime prediction methods have been incorporated into our CrimeScan software, run twice a day by CPD and used operationally for deployment of patrols.

From the Chicago Sun-Times, February 22, 2011: "It was a bit like Minority Report, the 2002 movie that featured genetically altered humans with special powers to predict crime. The CPD's new crime-forecasting unit was analyzing 911 calls and produced an intelligence report predicting a shooting would happen soon on a particular block on the South Side. Three minutes later, it did."
CrimeScan

The key insight of our method is to use detection for prediction: we can detect emerging clusters of various leading indicators (minor crimes, 911 calls, etc.) and use these to predict that a cluster of violent crime is likely to occur nearby.

Some advantages of the CrimeScan approach:
- Advance prediction (up to 1 week) with high accuracy.
- High spatial and temporal resolution (block x day).
- Predicting emerging hot spots of violence, as opposed to just identifying bad neighborhoods.

Questions to address:
- How to detect leading indicator clusters?
- How to use these for prediction?
- Which leading indicators to use?
CrimeScan: Cluster Detection

We aggregate daily counts for each leading indicator at the block level, and search for clusters of nearby blocks with recent counts that are significantly higher than expected.

Imagine moving a circular window around the city, allowing the center, radius, and temporal duration to vary. Is there any spatial window and duration T such that counts have been significantly higher than expected for the last T days?

[Figure: time series of past counts, with actual and expected counts for the last 3 days.]
We find the highest-scoring space-time regions, where the score of a region is computed by the likelihood ratio statistic:

$$F(S) = \frac{\Pr(\text{Data} \mid H_1(S))}{\Pr(\text{Data} \mid H_0)}$$

- Alternative hypothesis $H_1(S)$: cluster in region $S$.
- Null hypothesis $H_0$: no clusters.

These are the most likely clusters; we compute the p-value of each cluster by randomization, and report clusters with p-values $< \alpha$.
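The randomization step can be sketched as follows. This is a minimal illustration under stated assumptions, not the CrimeScan implementation: it assumes Poisson counts with known baselines, uses the expectation-based Poisson log-likelihood ratio as the region score, and takes a precomputed list of candidate regions. The names `log_score`, `max_region_score`, and `randomization_p_value` are hypothetical.

```python
import math
import random


def log_score(C, B):
    """Log of (C/B)^C * exp(B - C); zero when counts do not exceed baseline."""
    return C * math.log(C / B) + B - C if C > B else 0.0


def poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def max_region_score(counts, baselines, regions):
    """Highest log score over all candidate regions (lists of cell indices)."""
    return max(
        log_score(sum(counts[i] for i in R), sum(baselines[i] for i in R))
        for R in regions
    )


def randomization_p_value(counts, baselines, regions, n_replicas=199, seed=0):
    """Compare the observed maximum score with maxima of null replicas."""
    rng = random.Random(seed)
    observed = max_region_score(counts, baselines, regions)
    beat = 0
    for _ in range(n_replicas):
        # Replica data generated under H0: counts equal to baselines on average.
        replica = [poisson(b, rng) for b in baselines]
        if max_region_score(replica, baselines, regions) >= observed:
            beat += 1
    return (beat + 1) / (n_replicas + 1)
```

The p-value is the fraction of replicas (plus the observed data itself) whose best region scores at least as high as the observed best region, so a small value means the detected cluster is unlikely under the null hypothesis.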
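Expectation-Based Scan Statistic

Counts are Poisson distributed: $c_i \sim \text{Poisson}(q_i b_i)$, where $q_i$ is the relative risk and $b_i$ is the expected count under $H_0$, estimated by time series analysis of historical data.

- Under the null hypothesis $H_0$, we expect counts to be equal to baselines: $q_i = 1$ everywhere.
- Under the alternative hypothesis $H_1(S)$, we expect increased risk in space-time region $S$: $q_i = q_{in}$ in $S$, for $q_{in} > 1$, and $q_i = 1$ outside.

[Figure: example region with $q_{in} = 1.3$.]

This gives a simple and efficiently computable likelihood ratio statistic:

$$F(S) = \left(\frac{C}{B}\right)^{C} e^{B - C}, \quad \text{where } C = \sum_{i \in S} c_i \text{ and } B = \sum_{i \in S} b_i.$$

Many other statistics can be used (see Kulldorff, 1997; Neill, 2006).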
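A minimal sketch of this statistic follows. The function names `ebp_score` and `region_score` are hypothetical, and returning 1 when C <= B is an assumption that follows from the constraint $q_{in} > 1$ (the best-fitting alternative then coincides with the null).

```python
import math


def ebp_score(C, B):
    """Expectation-based Poisson likelihood ratio F(S) = (C/B)^C * exp(B - C).

    C is the total observed count in the region and B the total expected count.
    When C <= B no increased risk is supported, so the ratio is taken as 1.
    """
    if B <= 0:
        raise ValueError("baseline B must be positive")
    if C <= B:
        return 1.0
    # Computed in log space to limit overflow for large counts.
    return math.exp(C * math.log(C / B) + B - C)


def region_score(counts, baselines, region):
    """Score a region given per-cell counts, baselines, and its cell indices."""
    C = sum(counts[i] for i in region)
    B = sum(baselines[i] for i in region)
    return ebp_score(C, B)
```

For example, `region_score([3, 9, 8], [3.0, 4.0, 4.0], [1, 2])` scores the two-cell region whose observed total (17) exceeds its expected total (8).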
CrimeScan: Prediction

The currently deployed version of CrimeScan uses a simple rule for prediction of violent crime clusters: areas which are closer to a significant cluster of any of the monitored leading indicators (LI) are assumed more likely to have a spike in violent crime (VC) within the next 1 week.

Total proximity to leading indicator clusters is computed using kernel density estimation:

$$\text{score} = \sum_i \exp\left(-\frac{d_i^2}{2\sigma^2}\right)$$

where $d_i$ is the distance to the $i$th leading indicator cluster and $\sigma$ is the kernel bandwidth.

We are also investigating the use of logistic regression for prediction (results not shown).
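The proximity rule can be illustrated with a short sketch. It assumes a Gaussian kernel, planar (x, y) coordinates, and Euclidean distance; the function name `proximity_score` and the `bandwidth` parameter are hypothetical and not taken from the CrimeScan software.

```python
import math


def proximity_score(location, cluster_centers, bandwidth=1.0):
    """Sum of Gaussian kernel weights from a location to each detected cluster.

    location: (x, y) of the block being scored.
    cluster_centers: list of (x, y) centers of significant leading indicator clusters.
    bandwidth: kernel width (sigma), in the same units as the coordinates.
    """
    x, y = location
    total = 0.0
    for cx, cy in cluster_centers:
        squared_distance = (x - cx) ** 2 + (y - cy) ** 2
        total += math.exp(-squared_distance / (2 * bandwidth ** 2))
    return total
```

Blocks with a higher score are closer to more leading indicator clusters, and would be ranked as more likely to see a spike in violent crime.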
CrimeScan: Preliminary Results

Key result: at block level, CrimeScan predicts 60% of the clustered* violent crimes (VC) which will occur in the next week, at a 15% false positive rate.

* At least 3 VC in that beat, and 1.5 standard deviations more than expected.

Prediction accuracy is significantly higher than competing methods.
Which Predictors to Use?

Challenge #1: hundreds of possible predictors, including minor crimes, 911 emergency calls, 311 calls for service, etc.

Challenge #2: different data sources, or combinations of sources, may be predictive in different areas of the city.

We wish to learn which combinations of sources are predictive, and where, using cross-correlation analysis of historical data.

Typical formulation: given an independent variable time series $X$ and a dependent variable time series $Y$, maximize correlation between $X$ and lagged $Y$, over a range of lags $L = L_{min} \ldots L_{max}$.

For which subset of leading indicators, and which subset of locations, is cross-correlation maximized?
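Maximizing cross-correlation

Given monitored locations $s_i$ ($i = 1..N$), we observe the multiple independent variable time series $x_{i,m}$ ($m = 1..M$) and the dependent variable time series $y_i$ at each location.

Our goal is to maximize the correlation $r(X, Y)$ over all subsets of leading indicators, all proximity-constrained subsets of locations, and all lags $L = L_{min}..L_{max}$:

$$\max_{S \subseteq \{s_1..s_N\},\; D \subseteq \{d_1..d_M\},\; L \in \{L_{min}..L_{max}\}} r(X, Y)$$

where $X = \sum_{d_m \in D} \sum_{s_i \in S} x_{i,m}$ is the aggregated independent variable time series, and $Y = \sum_{s_i \in S} y_i^{L}$ is the aggregated, lagged dependent variable time series ($y_i^{L}$ denotes $y_i$ lagged by $L$).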
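As an illustration of the quantity being maximized, the sketch below aggregates time series and computes the Pearson correlation between X at time t and Y at time t + lag. The direction of the lag (Y later than X) is an assumption made here because the independent variables are leading indicators; the helper names `pearson`, `aggregate`, `lagged_correlation`, and `best_lag` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def aggregate(series_list):
    """Sum several equal-length time series element-wise."""
    return [sum(values) for values in zip(*series_list)]


def lagged_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag] for a non-negative integer lag."""
    if lag > 0:
        return pearson(x[:-lag], y[lag:])
    return pearson(x, y)


def best_lag(x, y, lags):
    """Return the lag in `lags` with the highest lagged correlation."""
    return max(lags, key=lambda lag: lagged_correlation(x, y, lag))
```

Here `aggregate` would be applied to the chosen streams and locations to form X and Y before `lagged_correlation` is evaluated for each candidate lag.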
Maximizing cross-correlation

How to efficiently maximize correlation $r(D, S, L)$ over $2^N \times 2^M$ subsets of locations and predictors?

Iterative framework (outer loop):
1) Randomly initialize subset of streams $D$.
2) Optimize over locations: $S = \arg\max_{S} r(D, S, L)$.
3) Optimize over streams: $D = \arg\max_{D} r(D, S, L)$.
4) Repeat steps 2-3 until convergence.
5) Repeat steps 1-4 for $R$ random restarts.
6) Repeat steps 1-5 for each lag $L$.
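The outer loop can be sketched as coordinate ascent. This is only an illustration of the control flow: the two inner optimizations are done here by exhaustive search over all non-empty subsets (feasible only for tiny N and M), standing in for the efficient subset-scanning steps, and the proximity constraint on locations is ignored. The data layout (`x[i][m]` and `y[i]` as lists over time) and all function names are assumptions.

```python
import math
import random
from itertools import combinations


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def nonempty_subsets(n):
    return [c for k in range(1, n + 1) for c in combinations(range(n), k)]


def cross_correlation(x, y, D, S, L):
    """r(D, S, L): correlate aggregated x at time t with aggregated y at t + L."""
    T = len(y[0])
    X = [sum(x[i][m][t] for i in S for m in D) for t in range(T - L)]
    Y = [sum(y[i][t + L] for i in S) for t in range(T - L)]
    return pearson(X, Y)


def maximize_cross_correlation(x, y, lags, restarts=5, seed=0):
    rng = random.Random(seed)
    loc_subsets = nonempty_subsets(len(x))
    stream_subsets = nonempty_subsets(len(x[0]))
    best = (-2.0, None, None, None)
    for L in lags:                                   # step 6
        for _ in range(restarts):                    # step 5
            D = rng.choice(stream_subsets)           # step 1
            S, score = None, -2.0
            while True:                              # step 4
                improved = False
                for cand in loc_subsets:             # step 2 (exhaustive stand-in)
                    v = cross_correlation(x, y, D, cand, L)
                    if v > score + 1e-12:
                        S, score, improved = cand, v, True
                for cand in stream_subsets:          # step 3 (exhaustive stand-in)
                    v = cross_correlation(x, y, cand, S, L)
                    if v > score + 1e-12:
                        D, score, improved = cand, v, True
                if not improved:
                    break
            if score > best[0]:
                best = (score, D, S, L)
    return best  # (correlation, streams D, locations S, lag L)
```

Because each step only accepts strict improvements, the loop terminates at a point where neither the location subset nor the stream subset can be improved on its own; random restarts reduce the chance of stopping at a poor local optimum.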
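Optimizing over subsets of streams

Given fixed $S$ and $L$, we want to find a set $D$ to maximize $r(D, S, L)$. We write $X = \sum_{d_m \in D} X_m$; $X_m = \sum_{s_i \in S} x_{i,m}$; and $Y = \sum_{s_i \in S} y_i$ (with $y_i$ lagged by the fixed $L$), treating each time series as a mean-centered vector so that correlation is a normalized dot product. Then we maximize

$$r(D \mid S, L) = r(X, Y) = \frac{X \cdot Y}{\|X\| \, \|Y\|} = \frac{\sum_{d_m \in D} (X_m \cdot Y)}{\left\| \sum_{d_m \in D} X_m \right\| \, \|Y\|}.$$

Now we would like to write this expression as a convex function of two additive sufficient statistics, $r(D \mid S, L) = F(C, B)$, where $C = \sum_{d_m \in D} C_m$ and $B = \sum_{d_m \in D} B_m$.

If we can do this, we can show that the optimal $D$ consists of the $k$ streams with highest ratio $C_m / B_m$, for some $k \in \{1..M\}$. This linear-time subset scanning (LTSS) property allows us to find the exact maximum over the $2^M$ subsets in $O(M \log M)$.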
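The LTSS idea can be sketched as sort-then-scan. The example assumes per-stream statistics $C_m$ and strictly positive $B_m$ are already given, and uses the illustrative score $F(C, B) = C / \sqrt{B}$ (the form the correlation takes when $\|Y\|$ is fixed); both function names are hypothetical. A brute-force search is included only to check the result on small inputs.

```python
import math
from itertools import combinations


def score(C, B):
    """Illustrative score of a subset with additive statistics C and B > 0."""
    return C / math.sqrt(B)


def ltss_best_subset(C_vals, B_vals):
    """Sort streams by C_m / B_m and evaluate only the M top-k prefixes."""
    order = sorted(range(len(C_vals)),
                   key=lambda m: C_vals[m] / B_vals[m], reverse=True)
    best_score, best_k = -math.inf, 0
    C = B = 0.0
    for k, m in enumerate(order, start=1):
        C += C_vals[m]
        B += B_vals[m]
        s = score(C, B)
        if s > best_score:
            best_score, best_k = s, k
    return sorted(order[:best_k]), best_score


def brute_force_best_subset(C_vals, B_vals):
    """Exhaustive search over all 2^M - 1 non-empty subsets (for checking only)."""
    best_subset, best_score = None, -math.inf
    M = len(C_vals)
    for k in range(1, M + 1):
        for subset in combinations(range(M), k):
            s = score(sum(C_vals[m] for m in subset),
                      sum(B_vals[m] for m in subset))
            if s > best_score:
                best_subset, best_score = list(subset), s
    return best_subset, best_score
```

The sort dominates the cost, which is where the $O(M \log M)$ running time comes from; the scan over prefixes is linear.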
We can write $r(D \mid S, L) = \dfrac{C}{\|Y\| \sqrt{B}}$, where:

- $C$ is an additive sufficient statistic: $C = \sum_{d_m \in D} C_m = \sum_{d_m \in D} (X_m \cdot Y)$.
- $B$ is not an additive sufficient statistic: $B = \sum_{d_m \in D} (X_m \cdot X_m) + \sum_{d_i, d_j \in D,\, i \neq j} (X_i \cdot X_j)$.

Solution: we can approximate the all-pairs computation using the average dot product of stream $d_m$ with an arbitrary set of streams.
Iterative average dot product (IADP)

Since the optimal subset $D$ is unknown, we compute the average dot product of each stream $d_m$ with an arbitrary subset of streams $D'$ ($d_m \notin D'$):

$$Q_m = \frac{1}{|D'|} \sum_{d_i \in D'} (X_m \cdot X_i)$$

Then $B \approx \sum_{d_m \in D} B_m$, where $B_m = X_m \cdot X_m + (|D| - 1)\, Q_m$.

We have approximated $r(D \mid S, L)$ with a function which can be exactly and efficiently optimized using the LTSS property.

However, the approximation may be poor when $D'$ is far from $D$. Our solution is to iterate: at each step, we set $D'$ equal to the best subset $D$ found on the previous step, and repeat until convergence.
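One possible reading of this iteration is sketched below; it is an illustration under several assumptions, not the authors' implementation. Streams are mean-centered, $Q_m$ averages over the reference set $D'$ with $d_m$ itself excluded, each candidate subset size $k$ is tried in turn (since $B_m$ depends on $|D|$), candidates are compared by their exact correlation, and $B_m$ is clamped to stay positive. All function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def center(v):
    mean = sum(v) / len(v)
    return [u - mean for u in v]


def subset_correlation(X, Y, D):
    """Exact correlation between the summed (centered) streams in D and Y."""
    agg = [sum(X[m][t] for m in D) for t in range(len(Y))]
    nx, ny = math.sqrt(dot(agg, agg)), math.sqrt(dot(Y, Y))
    return dot(agg, Y) / (nx * ny) if nx > 0 and ny > 0 else 0.0


def iadp_streams(streams, y, max_iter=20):
    X, Y = [center(s) for s in streams], center(y)
    M = len(X)
    C = [dot(X[m], Y) for m in range(M)]
    G = [[dot(X[a], X[b]) for b in range(M)] for a in range(M)]  # pairwise dot products
    D_ref = list(range(M))          # D': start from all streams (arbitrary choice)
    best_D, best_r = None, -math.inf
    for _ in range(max_iter):
        # Average dot product of each stream with the other streams in D'.
        Q = []
        for m in range(M):
            others = [i for i in D_ref if i != m]
            Q.append(sum(G[m][i] for i in others) / len(others) if others else 0.0)
        step_D, step_r = None, -math.inf
        for k in range(1, M + 1):
            # Approximate additive statistic B_m for subsets of size k.
            B = [max(G[m][m] + (k - 1) * Q[m], 1e-12) for m in range(M)]
            order = sorted(range(M), key=lambda m: C[m] / B[m], reverse=True)
            D = sorted(order[:k])
            r = subset_correlation(X, Y, D)
            if r > step_r:
                step_D, step_r = D, r
        if step_r > best_r:
            best_D, best_r = step_D, step_r
        if step_D == D_ref:         # D' no longer changes: converged
            break
        D_ref = step_D
    return best_D, best_r
```

Capping the number of iterations and keeping the best subset seen so far guards against the case where the approximation does not settle.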
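Optimizing over subsets of locations

Given fixed $D$ and $L$, we want to find a set $S$ to maximize $r(D, S, L)$. We write $X = \sum_{s_i \in S} X_i$; $X_i = \sum_{d_m \in D} x_{i,m}$; and $Y = \sum_{s_i \in S} y_i$. Then we maximize

$$r(S \mid D, L) = r(X, Y) = \frac{\left( \sum_{s_i \in S} X_i \right) \cdot \left( \sum_{s_i \in S} y_i \right)}{\left\| \sum_{s_i \in S} X_i \right\| \, \left\| \sum_{s_i \in S} y_i \right\|}.$$

This expression is more difficult to approximate by a function that satisfies LTSS because we have summations both over $X$ and $Y$, resulting in all-pairs computations both in the numerator and in the denominator.

The iterative average dot product method can also be applied in this setting, but now we must make five approximations instead of one. Details are provided in the full paper (Flaxman and Neill, 2012, submitted).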
Results: Comparison of Methods

For IADP and several competing methods, we maximized cross-correlation over subsets of predictors (and locations) for each of the 77 Chicago neighborhoods. We then computed the average cross-correlation found by each method.

Method                                                                   Average cross-correlation
IADP, searching over subsets of census tracts within each neighborhood   .546
IADP, treating each neighborhood as a single location                    .423
Google Correlate                                                         .404
LASSO                                                                    .325

By jointly optimizing over subsets of locations and streams, we find areas with much stronger cross-correlations between independent and dependent variables.

Improved feature selection: searching over subsets of streams for each neighborhood, we find significantly higher correlations than previous methods.
Results: Exploratory Analysis

Considering all subsets of census tracts within each of the 77 neighborhoods of Chicago, 28 different potential predictors, and a 1-week lag, we found a correlation of r = .786 between violent crime and a subset of 12 leading indicators, for 10 census tracts in the West Englewood neighborhood.

Total run time for all 77 neighborhoods was 2.1 hours.
Conclusions and Ongoing Work

CrimeScan is a new and powerful methodology for crime prediction which has been very successful in practice.

We are in the process of extending CrimeScan by developing novel methods to choose an optimal set of spatially varying leading indicators for prediction.

Our results suggest that different subsets of leading indicators have high predictive accuracy in different areas, and that our new methods can efficiently optimize cross-correlation over subsets of locations and streams.

Our next step is to determine whether the optimized, spatially varying subset of leading indicators can be used to improve the overall predictive accuracy of CrimeScan.
From CrimeScan to CityScan

Working with the City of Chicago's Chief Data Officer, we are currently using our new event detection methods for analysis of many other data sources relevant to the city.

Most interestingly, we have some promising initial results for prediction of emerging patterns of 311 calls. Examples: abandoned buildings, graffiti cleanup, sanitation complaints, rodent removal, garbage carts.

Our CrimeScan software has been renamed CityScan and is being incorporated into WindyGrid, the city's new spatial database, which will enable real-time monitoring of crime, 311, and many other data sources.