
Contents

9 Additive Models, Trees, and Related Methods
  9.1 Generalized Additive Models
      9.1.1 Fitting Additive Models
      9.1.2 Example: Additive Logistic Regression
      9.1.3 Summary
  9.2 Tree-Based Methods
      9.2.1 Background
      9.2.2 Regression Trees
      9.2.3 Classification Trees
      9.2.4 Other Issues
      9.2.5 Spam Example (Continued)
  9.3 PRIM: Bump Hunting
      9.3.1 Spam Example (Continued)
  9.4 MARS: Multivariate Adaptive Regression Splines
      9.4.1 Spam Example (Continued)
      9.4.2 Example (Simulated Data)
      9.4.3 Other Issues
  9.5 Hierarchical Mixtures of Experts
  9.6 Missing Data
  9.7 Computational Considerations
  Bibliographic Notes
  Exercises
  References

The Elements of Statistical Learning: Data Mining, Inference and Prediction

Chapter 9: Additive Models, Trees, and Related Methods

Jerome Friedman, Trevor Hastie, Robert Tibshirani

August 4, 2008

© Friedman, Hastie & Tibshirani

9 Additive Models, Trees, and Related Methods

In this chapter we begin our discussion of some specific methods for supervised learning. These techniques each assume a (different) structured form for the unknown regression function, and by doing so they finesse the curse of dimensionality. Of course, they pay the possible price of misspecifying the model, and so in each case there is a tradeoff that has to be made. They take off where Chapters 3-6 left off. We describe five related techniques: generalized additive models, trees, multivariate adaptive regression splines, the patient rule induction method, and hierarchical mixtures of experts.

9.1 Generalized Additive Models

Regression models play an important role in many data analyses, providing prediction and classification rules, and data analytic tools for understanding the importance of different inputs.

Although attractively simple, the traditional linear model often fails in these situations: in real life, effects are often not linear. In earlier chapters we described techniques that used predefined basis functions to achieve nonlinearities. This section describes more automatic flexible statistical methods that may be used to identify and characterize nonlinear regression effects. These methods are called "generalized additive models."

In the regression setting, a generalized additive model has the form

\[
\mathrm{E}(Y \mid X_1, X_2, \ldots, X_p) = \alpha + f_1(X_1) + f_2(X_2) + \cdots + f_p(X_p). \qquad (9.1)
\]

As usual X_1, X_2, ..., X_p represent predictors and Y is the outcome; the f_j's are unspecified smooth ("nonparametric") functions. If we were to model each function using an expansion of basis functions (as in Chapter 5), the resulting model could then be fit by simple least squares. Our approach here is different: we fit each function using a scatterplot smoother (e.g., a cubic smoothing spline or kernel smoother), and provide an algorithm for simultaneously estimating all p functions (Section 9.1.1).

For two-class classification, recall the logistic regression model for binary data discussed in Section 4.4. We relate the mean of the binary response µ(X) = Pr(Y = 1 | X) to the predictors via a linear regression model and the logit link function:

\[
\log\frac{\mu(X)}{1-\mu(X)} = \alpha + \beta_1 X_1 + \cdots + \beta_p X_p. \qquad (9.2)
\]

The additive logistic regression model replaces each linear term by a more general functional form

\[
\log\frac{\mu(X)}{1-\mu(X)} = \alpha + f_1(X_1) + \cdots + f_p(X_p), \qquad (9.3)
\]

where again each f_j is an unspecified smooth function. While the nonparametric form for the functions f_j makes the model more flexible, the additivity is retained and allows us to interpret the model in much the same way as before. The additive logistic regression model is an example of a generalized additive model. In general, the conditional mean µ(X) of a response Y is related to an additive function of the predictors via a link function g:

\[
g[\mu(X)] = \alpha + f_1(X_1) + \cdots + f_p(X_p). \qquad (9.4)
\]

Examples of classical link functions are the following:

- g(µ) = µ is the identity link, used for linear and additive models for Gaussian response data.
- g(µ) = logit(µ) as above, or g(µ) = probit(µ), the probit link function, for modeling binomial probabilities. The probit function is the inverse Gaussian cumulative distribution function: probit(µ) = Φ^{-1}(µ).
- g(µ) = log(µ) for log-linear or log-additive models for Poisson count data.

All three of these arise from exponential family sampling models, which in addition include the gamma and negative-binomial distributions. These families generate the well-known class of generalized linear models, which are all extended in the same way to generalized additive models.

The functions f_j are estimated in a flexible manner, using an algorithm whose basic building block is a scatterplot smoother. The estimated function f̂_j can then reveal possible nonlinearities in the effect of X_j.
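To make the three classical links above concrete, here is a minimal Python sketch (not from the original text) that evaluates each link; scipy.stats.norm supplies Φ^{-1} for the probit:

    import numpy as np
    from scipy.stats import norm

    def identity_link(mu):    # Gaussian responses
        return mu

    def logit_link(mu):       # binomial probabilities
        return np.log(mu / (1.0 - mu))

    def probit_link(mu):      # binomial probabilities, via Phi^{-1}
        return norm.ppf(mu)

    def log_link(mu):         # Poisson counts
        return np.log(mu)

    mu = np.array([0.1, 0.5, 0.9])
    print(logit_link(mu))     # [-2.197  0.     2.197]
    print(probit_link(mu))    # [-1.282  0.     1.282]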

Not all of the functions f_j need to be nonlinear. We can easily mix in linear and other parametric forms with the nonlinear terms, a necessity when some of the inputs are qualitative variables (factors). The nonlinear terms are not restricted to main effects either; we can have nonlinear components in two or more variables, or separate curves in X_j for each level of the factor X_k. Thus each of the following would qualify:

- g(µ) = X^T β + α_k + f(Z): a semiparametric model, where X is a vector of predictors to be modeled linearly, α_k the effect for the kth level of a qualitative input V, and the effect of predictor Z is modeled nonparametrically.
- g(µ) = f(X) + g_k(Z): again k indexes the levels of a qualitative input V, and thus creates an interaction term g(V, Z) = g_k(Z) for the effect of V and Z.
- g(µ) = f(X) + g(Z, W), where g is a nonparametric function in two features.

Additive models can replace linear models in a wide variety of settings, for example an additive decomposition of time series,

\[
Y_t = S_t + T_t + \varepsilon_t, \qquad (9.5)
\]

where S_t is a seasonal component, T_t is a trend and ε is an error term.

9.1.1 Fitting Additive Models

In this section we describe a modular algorithm for fitting additive models and their generalizations. The building block is the scatterplot smoother for fitting nonlinear effects in a flexible way. For concreteness we use as our scatterplot smoother the cubic smoothing spline described in Chapter 5.

The additive model has the form

\[
Y = \alpha + \sum_{j=1}^{p} f_j(X_j) + \varepsilon, \qquad (9.6)
\]

where the error term ε has mean zero. Given observations x_i, y_i, a criterion like the penalized sum of squares (5.9) of Section 5.4 can be specified for this problem,

\[
\mathrm{PRSS}(\alpha, f_1, f_2, \ldots, f_p) = \sum_{i=1}^{N}\Big(y_i - \alpha - \sum_{j=1}^{p} f_j(x_{ij})\Big)^{2} + \sum_{j=1}^{p} \lambda_j \int f_j''(t_j)^2 \, dt_j, \qquad (9.7)
\]

where the λ_j ≥ 0 are tuning parameters. It can be shown that the minimizer of (9.7) is an additive cubic spline model; each of the functions f_j is a cubic spline in the component X_j, with knots at each of the unique values of x_ij, i = 1, ..., N.

Algorithm 9.1 The Backfitting Algorithm for Additive Models.

1. Initialize: α̂ = (1/N) Σ_{i=1}^{N} y_i, f̂_j ≡ 0, ∀j.
2. Cycle: j = 1, 2, ..., p, ..., 1, 2, ..., p, ...,

   \[
   \hat{f}_j \leftarrow \mathcal{S}_j\Big[\big\{y_i - \hat{\alpha} - \sum_{k \neq j} \hat{f}_k(x_{ik})\big\}_{1}^{N}\Big],
   \qquad
   \hat{f}_j \leftarrow \hat{f}_j - \frac{1}{N} \sum_{i=1}^{N} \hat{f}_j(x_{ij}),
   \]

   until the functions f̂_j change less than a prespecified threshold.

However, without further restrictions on the model, the solution is not unique. The constant α is not identifiable, since we can add or subtract any constants to each of the functions f_j, and adjust α accordingly. The standard convention is to assume that (1/N) Σ_i f_j(x_ij) = 0 for all j; the functions average zero over the data. It is easily seen that α̂ = ave(y_i) in this case. If in addition to this restriction, the matrix of input values (having ijth entry x_ij) is nonsingular, then (9.7) is a strictly convex criterion and the minimizer is unique. If the matrix is singular, then the linear part of the components f_j cannot be uniquely determined (while the nonlinear parts can!) (Buja et al. 1989).

Furthermore, a simple iterative procedure exists for finding the solution. We set α̂ = ave(y_i), and it never changes. We apply a cubic smoothing spline S_j to the targets {y_i − α̂ − Σ_{k≠j} f̂_k(x_ik)}_1^N, as a function of x_ij, to obtain a new estimate f̂_j. This is done for each predictor in turn, using the current estimates of the other functions f̂_k when computing y_i − α̂ − Σ_{k≠j} f̂_k(x_ik). The process is continued until the estimates f̂_j stabilize. This procedure, given in detail in Algorithm 9.1, is known as "backfitting" and the resulting fit is analogous to a multiple regression for linear models.

In principle, the second step in (2) of Algorithm 9.1 is not needed, since the smoothing spline fit to a mean-zero response has mean zero (Exercise 9.1). In practice, machine rounding can cause slippage, and the adjustment is advised.
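The following is a minimal Python sketch of Algorithm 9.1, with one substitution hedged up front: SciPy's UnivariateSpline (with its default smoothing) stands in for the penalized cubic smoothing spline S_j of the text, and each predictor is assumed to have distinct values:

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    def backfit(X, y, n_cycles=20):
        N, p = X.shape
        alpha = y.mean()                 # step 1: alpha-hat = ave(y_i)
        f_hat = np.zeros((N, p))         # f_j-hat evaluated at the data
        for _ in range(n_cycles):        # step 2: cycle over j = 1, ..., p
            for j in range(p):
                # partial residual: y minus alpha and all other components
                r = y - alpha - (f_hat.sum(axis=1) - f_hat[:, j])
                order = np.argsort(X[:, j])
                smooth = UnivariateSpline(X[order, j], r[order])
                f_hat[:, j] = smooth(X[:, j])
                f_hat[:, j] -= f_hat[:, j].mean()  # re-center to average zero
        return alpha, f_hat

The re-centering line is the second step in (2) of Algorithm 9.1, guarding against the rounding slippage mentioned above; a production version would iterate until the change in each f̂_j falls below a threshold rather than for a fixed number of cycles.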

This same algorithm can accommodate other fitting methods in exactly the same way, by specifying appropriate smoothing operators S_j:

- other univariate regression smoothers such as local polynomial regression and kernel methods;
- linear regression operators yielding polynomial fits, piecewise constant fits, parametric spline fits, series and Fourier fits;
- more complicated operators such as surface smoothers for second or higher-order interactions or periodic smoothers for seasonal effects.

If we consider the operation of smoother S_j only at the training points, it can be represented by an N × N operator matrix S_j (see Section 5.4.1). Then the degrees of freedom for the jth term are (approximately) computed as df_j = trace[S_j] − 1, by analogy with degrees of freedom for smoothers discussed in Chapters 5 and 6.

For a large class of linear smoothers S_j, backfitting is equivalent to a Gauss-Seidel algorithm for solving a certain linear system of equations. Details are given in Exercise 9.2.

For the logistic regression model and other generalized additive models, the appropriate criterion is a penalized log-likelihood. To maximize it, the backfitting procedure is used in conjunction with a likelihood maximizer. The usual Newton-Raphson routine for maximizing log-likelihoods in generalized linear models can be recast as an IRLS (iteratively reweighted least squares) algorithm. This involves repeatedly fitting a weighted linear regression of a working response variable on the covariates; each regression yields a new value of the parameter estimates, which in turn give new working responses and weights, and the process is iterated (see Section 4.4.1). In the generalized additive model, the weighted linear regression is simply replaced by a weighted backfitting algorithm. We describe the algorithm in more detail for logistic regression below, and more generally in Chapter 6 of Hastie and Tibshirani (1990).

9.1.2 Example: Additive Logistic Regression

Probably the most widely used model in medical research is the logistic model for binary data. In this model the outcome Y can be coded as 0 or 1, with 1 indicating an event (like death or relapse of a disease) and 0 indicating no event. We wish to model Pr(Y = 1 | X), the probability of an event given values of the prognostic factors X = (X_1, ..., X_p). The goal is usually to understand the roles of the prognostic factors, rather than to classify new individuals. Logistic models are also used in applications where one is interested in estimating the class probabilities, for use in risk screening. Apart from medical applications, credit risk screening is a popular application.

The generalized additive logistic model has the form

\[
\log\frac{\Pr(Y=1 \mid X)}{\Pr(Y=0 \mid X)} = \alpha + f_1(X_1) + \cdots + f_p(X_p). \qquad (9.8)
\]

The functions f_1, f_2, ..., f_p are estimated by a backfitting algorithm within a Newton-Raphson procedure, shown in Algorithm 9.2.

Algorithm 9.2 Local Scoring Algorithm for the Additive Logistic Regression Model.

1. Compute starting values: α̂ = log[ȳ/(1 − ȳ)], where ȳ = ave(y_i), the sample proportion of ones, and set f̂_j ≡ 0 ∀j.
2. Define η̂_i = α̂ + Σ_j f̂_j(x_ij) and p̂_i = 1/[1 + exp(−η̂_i)]. Iterate:

   (a) Construct the working target variable

       \[
       z_i = \hat{\eta}_i + \frac{y_i - \hat{p}_i}{\hat{p}_i(1 - \hat{p}_i)}.
       \]

   (b) Construct weights w_i = p̂_i(1 − p̂_i).
   (c) Fit an additive model to the targets z_i with weights w_i, using a weighted backfitting algorithm. This gives new estimates α̂, f̂_j, ∀j.

3. Continue step 2 until the change in the functions falls below a prespecified threshold.

The additive model fitting in step (2) of Algorithm 9.2 requires a weighted scatterplot smoother. Most smoothing procedures can accept observation weights (Exercise 5.12); see Chapter 3 of Hastie and Tibshirani (1990) for further details.

The additive logistic regression model can be generalized further to handle more than two classes, using the multilogit formulation as outlined in Section 4.4. While the formulation is a straightforward extension of (9.8), the algorithms for fitting such models are more complex. See Yee and Wild (1996) for details, and the VGAM software currently available from: yee.
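A minimal Python sketch of Algorithm 9.2, assuming a weighted_backfit(X, z, w) helper with the same structure as the backfitting sketch above but carrying observation weights into the smoother:

    import numpy as np

    def local_scoring(X, y, weighted_backfit, n_iter=10):
        N, p = X.shape
        ybar = y.mean()
        alpha = np.log(ybar / (1.0 - ybar))     # step 1: starting values
        f_hat = np.zeros((N, p))
        for _ in range(n_iter):                 # step 2: Newton-Raphson steps
            eta = alpha + f_hat.sum(axis=1)
            p_hat = 1.0 / (1.0 + np.exp(-eta))
            z = eta + (y - p_hat) / (p_hat * (1.0 - p_hat))  # (a) working target
            w = p_hat * (1.0 - p_hat)                        # (b) weights
            alpha, f_hat = weighted_backfit(X, z, w)         # (c) weighted fit
        return alpha, f_hat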

Example: Predicting Spam

We apply a generalized additive model to the spam data introduced in Chapter 1. The data consists of information from 4601 email messages, in a study to screen email for spam (i.e., junk email). The data is publicly available at ftp.ics.uci.edu, and was donated by George Forman from Hewlett-Packard laboratories, Palo Alto, California.

The response variable is binary, with values email or spam, and there are 57 predictors as described below:

- 48 quantitative predictors: the percentage of words in the email that match a given word. Examples include business, address, internet, free, and george. The idea was that these could be customized for individual users.
- 6 quantitative predictors: the percentage of characters in the email that match a given character. The characters are ch;, ch(, ch[, ch!, ch$, and ch#.
- The average length of uninterrupted sequences of capital letters: CAPAVE.
- The length of the longest uninterrupted sequence of capital letters: CAPMAX.
- The sum of the length of uninterrupted sequences of capital letters: CAPTOT.

We coded spam as 1 and email as zero. A test set of size 1536 was randomly chosen, leaving 3065 observations in the training set. A generalized additive model was fit, using a cubic smoothing spline with a nominal four degrees of freedom for each predictor. What this means is that for each predictor X_j, the smoothing-spline parameter λ_j was chosen so that trace[S_j(λ_j)] − 1 = 4, where S_j(λ) is the smoothing spline operator matrix constructed using the observed values x_ij, i = 1, ..., N. This is a convenient way of specifying the amount of smoothing in such a complex model.
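The calibration trace[S_j(λ_j)] − 1 = 4 can be sketched as a one-dimensional root-finding problem. In the toy Python version below, a discrete second-difference penalty (a Whittaker-type smoother standing in for the cubic smoothing spline of the text) gives S_λ = (I + λK)^{-1}, whose trace follows from the eigenvalues of K:

    import numpy as np
    from scipy.optimize import brentq

    def trace_of_lambda(lam, eigvals):
        # trace[(I + lam*K)^{-1}] = sum_i 1 / (1 + lam * d_i)
        return np.sum(1.0 / (1.0 + lam * eigvals))

    def calibrate_lambda(n, target_trace=5.0):
        D = np.diff(np.eye(n), n=2, axis=0)   # second-difference operator
        K = D.T @ D                           # penalty matrix
        d = np.linalg.eigvalsh(K)
        # the trace is monotone decreasing in lambda; solve by bracketing
        return brentq(lambda lam: trace_of_lambda(lam, d) - target_trace,
                      1e-10, 1e10)

    lam = calibrate_lambda(n=100)   # trace[S] = 5, i.e. df = 4 plus the constant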

Most of the spam predictors have a very long-tailed distribution. Before fitting the GAM model, we log-transformed each variable (actually log(x + 0.1)), but the plots in Figure 9.1 are shown as a function of the original variables.

The test error rates are shown in Table 9.1; the overall error rate is 5.5%. By comparison, a linear logistic regression has a test error rate of 7.6%.

TABLE 9.1. Test data confusion matrix for the additive logistic regression model fit to the spam training data. The overall test error rate is 5.5%.

                         Predicted Class
    True Class        email (0)    spam (1)
    email (0)           58.3%        2.5%
    spam (1)             3.0%       36.3%

Table 9.2 shows the predictors that are highly significant in the additive model. For ease of interpretation, in Table 9.2 the contribution for each variable is decomposed into a linear component and the remaining nonlinear component. The top block of predictors are positively correlated with spam, while the bottom block is negatively correlated. The linear component is a weighted least squares linear fit of the fitted curve on the predictor, while the nonlinear part is the residual. The linear component of an estimated function is summarized by the coefficient, standard error and Z-score; the latter is the coefficient divided by its standard error, and is considered significant if it exceeds the appropriate quantile of a standard normal distribution. The column labeled nonlinear P-value is a test of nonlinearity of the estimated function. Note, however, that the effect of each predictor is fully adjusted for the entire effects of the other predictors, not just for their linear parts. The predictors shown in the table were judged significant by at least one of the tests (linear or nonlinear) at the p = 0.01 level (two-sided).

TABLE 9.2. Significant predictors from the additive model fit to the spam training data. The coefficients represent the linear part of f̂_j, along with their standard errors and Z-scores; the nonlinear P-value is for a test of nonlinearity of f̂_j.

    Positive effects: our, over, remove, internet, free, business, hpl, ch!, ch$, CAPMAX, CAPTOT
    Negative effects: hp, george, 1999, re, edu

Figure 9.1 shows the estimated functions for the significant predictors appearing in Table 9.2. Many of the nonlinear effects appear to account for a strong discontinuity at zero. For example, the probability of spam drops significantly as the frequency of george increases from zero, but then does not change much after that. This suggests that one might replace each of the frequency predictors by an indicator variable for a zero count, and resort to a linear logistic model. This gave a test error rate of 7.4%; including the linear effects of the frequencies as well dropped the test error to 6.6%. It appears that the nonlinearities in the additive model have additional predictive power.

FIGURE 9.1. Spam analysis: estimated functions for significant predictors. The rug plot along the bottom of each frame indicates the observed values of the corresponding predictor. For many of the predictors the nonlinearity picks up the discontinuity at zero. Panels show f̂ for: our, over, remove, internet, free, business, hp, hpl, george, 1999, re, edu, ch!, ch$, CAPMAX, CAPTOT.

It is more serious to classify a genuine email message as spam, since then a good email would be filtered out and would not reach the user. We can alter the balance between the class error rates by changing the losses (see Section 2.4). If we assign a loss L_01 for predicting a true class 0 as class 1, and L_10 for predicting a true class 1 as class 0, then the estimated Bayes rule predicts class 1 if its probability is greater than L_01/(L_01 + L_10). For example, if we take L_01 = 10, L_10 = 1 then the (true) class 0 and class 1 error rates change to 0.8% and 8.7%.

More ambitiously, we can encourage the model to fit the data in class 0 better by using weights L_01 for the class 0 observations and L_10 for the class 1 observations. As above, we then use the estimated Bayes rule to predict. This gave error rates of 1.2% and 8.0% in (true) class 0 and class 1, respectively. We discuss the issue of unequal losses further below, in the context of tree-based models.
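A minimal Python sketch of this thresholding rule; p_hat would be the fitted probabilities from the additive logistic model:

    import numpy as np

    def bayes_rule(p_hat, L01=10.0, L10=1.0):
        # predict class 1 when Pr(Y=1|x) exceeds L01 / (L01 + L10)
        return (p_hat > L01 / (L01 + L10)).astype(int)

    def class_error_rates(y_true, y_pred):
        err0 = np.mean(y_pred[y_true == 0] == 1)   # true class 0 error rate
        err1 = np.mean(y_pred[y_true == 1] == 0)   # true class 1 error rate
        return err0, err1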

After fitting an additive model, one should check whether the inclusion of some interactions can significantly improve the fit. This can be done manually, by inserting products of some or all of the significant inputs, or automatically via the MARS procedure (Section 9.4).

This example uses the additive model in an automatic fashion. As a data analysis tool, additive models are often used in a more interactive fashion, adding and dropping terms to determine their effect. By calibrating the amount of smoothing in terms of df_j, one can move seamlessly between linear models (df_j = 1) and partially linear models, where some terms are modeled more flexibly. See Hastie and Tibshirani (1990) for more details.

9.1.3 Summary

Additive models provide a useful extension of linear models, making them more flexible while still retaining much of their interpretability. The familiar tools for modelling and inference in linear models are also available for additive models, seen for example in Table 9.2. The backfitting procedure for fitting these models is simple and modular, allowing one to choose a fitting method appropriate for each input variable. As a result they have become widely used in the statistical community.

However additive models can have limitations for large data-mining applications. The backfitting algorithm fits all predictors, which is not feasible or desirable when a large number are available. The BRUTO procedure (Hastie and Tibshirani 1990, Chapter 9) combines backfitting with selection of inputs, but is not designed for large data-mining problems. There has also been recent work using lasso-type penalties to estimate sparse additive models, for example the COSSO procedure of Lin and Zhang (2003) and the SpAM proposal of Ravikumar et al. (2007). For large problems a forward stagewise approach such as boosting (Chapter 14) is more effective, and also allows for interactions to be included in the model.

9.2 Tree-Based Methods

9.2.1 Background

Tree-based methods partition the feature space into a set of rectangles, and then fit a simple model (like a constant) in each one. They are conceptually simple yet powerful. We first describe a popular method for tree-based regression and classification called CART, and later contrast it with C4.5, a major competitor.

Let's consider a regression problem with continuous response Y and inputs X_1 and X_2, each taking values in the unit interval. The top left panel of Figure 9.2 shows a partition of the feature space by lines that are parallel to the coordinate axes. In each partition element we can model Y with a different constant. However, there is a problem: although each partitioning line has a simple description like X_1 = c, some of the resulting regions are complicated to describe.

To simplify matters, we restrict attention to recursive binary partitions like that in the top right panel of Figure 9.2. We first split the space into two regions, and model the response by the mean of Y in each region. We choose the variable and split-point to achieve the best fit. Then one or both of these regions are split into two more regions, and this process is continued, until some stopping rule is applied. For example, in the top right panel of Figure 9.2, we first split at X_1 = t_1. Then the region X_1 ≤ t_1 is split at X_2 = t_2 and the region X_1 > t_1 is split at X_1 = t_3. Finally, the region X_1 > t_3 is split at X_2 = t_4. The result of this process is a partition into the five regions R_1, R_2, ..., R_5 shown in the figure. The corresponding regression model predicts Y with a constant c_m in region R_m, that is,

\[
\hat{f}(X) = \sum_{m=1}^{5} c_m I\{(X_1, X_2) \in R_m\}. \qquad (9.9)
\]

This same model can be represented by the binary tree in the bottom left panel of Figure 9.2. The full dataset sits at the top of the tree. Observations satisfying the condition at each junction are assigned to the left branch, and the others to the right branch. The terminal nodes or leaves of the tree correspond to the regions R_1, R_2, ..., R_5. The bottom right panel of Figure 9.2 is a perspective plot of the regression surface from this model. For illustration, we chose the node means c_1 = 5, c_2 = 7, c_3 = 0, c_4 = 2, c_5 = 4 to make this plot.

A key advantage of the recursive binary tree is its interpretability. The feature space partition is fully described by a single tree. With more than two inputs, partitions like that in the top right panel of Figure 9.2 are difficult to draw, but the binary tree representation works in the same way. This representation is also popular among medical scientists, perhaps because it mimics the way that a doctor thinks. The tree stratifies the population into strata of high and low outcome, on the basis of patient characteristics.

FIGURE 9.2. Partitions and CART. Top right panel shows a partition of a two-dimensional feature space by recursive binary splitting, as used in CART, applied to some fake data. Top left panel shows a general partition that cannot be obtained from recursive binary splitting. Bottom left panel shows the tree corresponding to the partition in the top right panel, and a perspective plot of the prediction surface appears in the bottom right panel.
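To make the tree-region correspondence concrete, the bottom left tree of Figure 9.2 can be written as nested conditionals. The split points t_1, ..., t_4 below are hypothetical (the figure leaves them unspecified); the node means are the ones quoted in the text:

    t1, t2, t3, t4 = 0.4, 0.5, 0.7, 0.6   # hypothetical split points

    def f_hat(x1, x2):
        # the model (9.9): a constant c_m in each region R_m
        if x1 <= t1:
            return 5.0 if x2 <= t2 else 7.0   # R1 (c1 = 5), R2 (c2 = 7)
        if x1 <= t3:
            return 0.0                        # R3 (c3 = 0)
        return 2.0 if x2 <= t4 else 4.0       # R4 (c4 = 2), R5 (c5 = 4)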

9.2.2 Regression Trees

We now turn to the question of how to grow a regression tree. Our data consists of p inputs and a response, for each of N observations: that is, (x_i, y_i) for i = 1, 2, ..., N, with x_i = (x_i1, x_i2, ..., x_ip). The algorithm needs to automatically decide on the splitting variables and split points, and also what topology (shape) the tree should have. Suppose first that we have a partition into M regions R_1, R_2, ..., R_M, and we model the response as a constant c_m in each region:

\[
f(x) = \sum_{m=1}^{M} c_m I(x \in R_m). \qquad (9.10)
\]

If we adopt as our criterion minimization of the sum of squares Σ(y_i − f(x_i))^2, it is easy to see that the best ĉ_m is just the average of y_i in region R_m:

\[
\hat{c}_m = \mathrm{ave}(y_i \mid x_i \in R_m). \qquad (9.11)
\]

Now finding the best binary partition in terms of minimum sum of squares is generally computationally infeasible. Hence we proceed with a greedy algorithm. Starting with all of the data, consider a splitting variable j and split point s, and define the pair of half-planes

\[
R_1(j, s) = \{X \mid X_j \leq s\} \quad \text{and} \quad R_2(j, s) = \{X \mid X_j > s\}. \qquad (9.12)
\]

Then we seek the splitting variable j and split point s that solve

\[
\min_{j,\, s}\Big[\min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\Big]. \qquad (9.13)
\]

For any choice j and s, the inner minimization is solved by

\[
\hat{c}_1 = \mathrm{ave}(y_i \mid x_i \in R_1(j, s)) \quad \text{and} \quad \hat{c}_2 = \mathrm{ave}(y_i \mid x_i \in R_2(j, s)). \qquad (9.14)
\]

For each splitting variable, the determination of the split point s can be done very quickly and hence by scanning through all of the inputs, determination of the best pair (j, s) is feasible.

Having found the best split, we partition the data into the two resulting regions and repeat the splitting process on each of the two regions. Then this process is repeated on all of the resulting regions.
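A minimal (quadratic-time) Python sketch of the greedy search in (9.13)-(9.14); real implementations scan each variable in sorted order with running sums, which is what makes the split-point determination very quick:

    import numpy as np

    def best_split(X, y):
        best_j, best_s, best_rss = None, None, np.inf
        N, p = X.shape
        for j in range(p):
            for s in np.unique(X[:, j])[:-1]:   # candidate split points
                left = X[:, j] <= s             # R1(j, s); the rest is R2(j, s)
                rss = np.sum((y[left] - y[left].mean()) ** 2) + \
                      np.sum((y[~left] - y[~left].mean()) ** 2)
                if rss < best_rss:
                    best_j, best_s, best_rss = j, s, rss
        return best_j, best_s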

How large should we grow the tree? Clearly a very large tree might overfit the data, while a small tree might not capture the important structure. Tree size is a tuning parameter governing the model's complexity, and the optimal tree size should be adaptively chosen from the data. One approach would be to split tree nodes only if the decrease in sum-of-squares due to the split exceeds some threshold. This strategy is too short-sighted, however, since a seemingly worthless split might lead to a very good split below it.

The preferred strategy is to grow a large tree T_0, stopping the splitting process only when some minimum node size (say 5) is reached. Then this large tree is pruned using cost-complexity pruning, which we now describe.

We define a subtree T ⊂ T_0 to be any tree that can be obtained by pruning T_0, that is, collapsing any number of its internal (non-terminal) nodes. We index terminal nodes by m, with node m representing region R_m. Let |T| denote the number of terminal nodes in T. Letting

\[
N_m = \#\{x_i \in R_m\}, \qquad
\hat{c}_m = \frac{1}{N_m} \sum_{x_i \in R_m} y_i, \qquad
Q_m(T) = \frac{1}{N_m} \sum_{x_i \in R_m} (y_i - \hat{c}_m)^2, \qquad (9.15)
\]

we define the cost complexity criterion

\[
C_\alpha(T) = \sum_{m=1}^{|T|} N_m Q_m(T) + \alpha |T|. \qquad (9.16)
\]

The idea is to find, for each α, the subtree T_α ⊆ T_0 to minimize C_α(T). The tuning parameter α ≥ 0 governs the tradeoff between tree size and its goodness of fit to the data. Large values of α result in smaller trees T_α, and conversely for smaller values of α. As the notation suggests, with α = 0 the solution is the full tree T_0. We discuss how to adaptively choose α below.

For each α one can show that there is a unique smallest subtree T_α that minimizes C_α(T). To find T_α we use weakest link pruning: we successively collapse the internal node that produces the smallest per-node increase in Σ_m N_m Q_m(T), and continue until we produce the single-node (root) tree. This gives a (finite) sequence of subtrees, and one can show this sequence must contain T_α. See Breiman et al. (1984) or Ripley (1996) for details. Estimation of α is achieved by five- or tenfold cross-validation: we choose the value α̂ to minimize the cross-validated sum of squares. Our final tree is T_α̂.
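A hedged sketch of this procedure using scikit-learn, which implements the same weakest-link sequence through its ccp_alpha parameter; cross-validation then selects α̂ as described above, and min_samples_split=5 plays the role of the minimum node size:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.model_selection import cross_val_score

    def prune_by_cv(X, y, cv=10):
        big_tree = DecisionTreeRegressor(min_samples_split=5)
        alphas = big_tree.cost_complexity_pruning_path(X, y).ccp_alphas
        scores = [cross_val_score(
                      DecisionTreeRegressor(min_samples_split=5, ccp_alpha=a),
                      X, y, cv=cv, scoring="neg_mean_squared_error").mean()
                  for a in alphas]
        alpha_hat = alphas[int(np.argmax(scores))]  # minimizes CV sum of squares
        return DecisionTreeRegressor(min_samples_split=5,
                                     ccp_alpha=alpha_hat).fit(X, y)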

9.2.3 Classification Trees

If the target is a classification outcome taking values 1, 2, ..., K, the only changes needed in the tree algorithm pertain to the criteria for splitting nodes and pruning the tree. For regression we used the squared-error node impurity measure Q_m(T) defined in (9.15), but this is not suitable for classification. In a node m, representing a region R_m with N_m observations, let

\[
\hat{p}_{mk} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = k),
\]

the proportion of class k observations in node m. We classify the observations in node m to class k(m) = arg max_k p̂_mk, the majority class in node m. Different measures Q_m(T) of node impurity include the following:

\[
\begin{aligned}
\text{Misclassification error:} &\quad \frac{1}{N_m} \sum_{i \in R_m} I(y_i \neq k(m)) = 1 - \hat{p}_{mk(m)}. \\
\text{Gini index:} &\quad \sum_{k \neq k'} \hat{p}_{mk}\, \hat{p}_{mk'} = \sum_{k=1}^{K} \hat{p}_{mk}(1 - \hat{p}_{mk}). \\
\text{Cross-entropy or deviance:} &\quad -\sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}.
\end{aligned} \qquad (9.17)
\]

For two classes, if p is the proportion in the second class, these three measures are 1 − max(p, 1 − p), 2p(1 − p) and −p log p − (1 − p) log(1 − p), respectively. They are shown in Figure 9.3. All three are similar, but cross-entropy and the Gini index are differentiable, and hence more amenable to numerical optimization.

FIGURE 9.3. Node impurity measures for two-class classification, as a function of the proportion p in class 2. Cross-entropy has been scaled to pass through (0.5, 0.5).

Comparing (9.13) and (9.15), we see that we need to weight the node impurity measures by the number N_mL and N_mR of observations in the two child nodes created by splitting node m.
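The two-class forms of the three measures, as plotted in Figure 9.3, in a few lines of Python:

    import numpy as np

    def misclassification(p):
        return 1.0 - np.maximum(p, 1.0 - p)

    def gini(p):
        return 2.0 * p * (1.0 - p)

    def cross_entropy(p):
        p = np.clip(p, 1e-12, 1.0 - 1e-12)   # guard against log(0)
        return -p * np.log(p) - (1.0 - p) * np.log(1.0 - p)

    p = np.linspace(0.0, 1.0, 101)
    # Figure 9.3 scales cross-entropy to pass through (0.5, 0.5):
    scaled = cross_entropy(p) * (0.5 / cross_entropy(0.5))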

In addition, cross-entropy and the Gini index are more sensitive to changes in the node probabilities than the misclassification rate. For example, in a two-class problem with 400 observations in each class (denote this by (400, 400)), suppose one split created nodes (300, 100) and (100, 300), while the other created nodes (200, 400) and (200, 0). Both splits produce a misclassification rate of 0.25, but the second split produces a pure node and is probably preferable. Both the Gini index and cross-entropy are lower for the second split. For this reason, either the Gini index or cross-entropy should be used when growing the tree. To guide cost-complexity pruning, any of the three measures can be used, but typically it is the misclassification rate.

The Gini index can be interpreted in two interesting ways. Rather than classify observations to the majority class in the node, we could classify them to class k with probability p̂_mk. Then the training error rate of this rule in the node is Σ_{k≠k'} p̂_mk p̂_mk', the Gini index. Similarly, if we code each observation as 1 for class k and zero otherwise, the variance over the node of this 0-1 response is p̂_mk(1 − p̂_mk). Summing over classes k again gives the Gini index.

9.2.4 Other Issues

Categorical Predictors

When splitting a predictor having q possible unordered values, there are 2^{q−1} − 1 possible partitions of the q values into two groups, and the computations become prohibitive for large q. However, with a 0-1 outcome, this computation simplifies. We order the predictor classes according to the proportion falling in outcome class 1. Then we split this predictor as if it were an ordered predictor. One can show this gives the optimal split, in terms of cross-entropy or Gini index, among all possible 2^{q−1} − 1 splits. This result also holds for a quantitative outcome and square error loss: the categories are ordered by increasing mean of the outcome. Although intuitive, the proofs of these assertions are not trivial. The proof for binary outcomes is given in Breiman et al. (1984) and Ripley (1996); the proof for quantitative outcomes can be found in ?. For multicategory outcomes, no such simplifications are possible, although various approximations have been proposed (Loh and Vanichsetakul 1988).

The partitioning algorithm tends to favor categorical predictors with many levels q; the number of partitions grows exponentially in q, and the more choices we have, the more likely we can find a good one for the data at hand. This can lead to severe overfitting if q is large, and such variables should be avoided.
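A minimal Python sketch of the ordering trick for a 0-1 outcome: sort the q levels by their proportion in class 1, then scan the q − 1 ordered splits instead of all 2^{q−1} − 1 subsets:

    import numpy as np

    def ordered_category_splits(levels, y):
        # levels: one category label per observation; y in {0, 1}
        cats = np.unique(levels)
        prop1 = np.array([y[levels == c].mean() for c in cats])
        ordered = cats[np.argsort(prop1)]
        # candidate left-groups: the first k categories in this order
        return [set(ordered[:k]) for k in range(1, len(ordered))]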

The Loss Matrix

In classification problems, the consequences of misclassifying observations are more serious in some classes than others. For example, it is probably worse to predict that a person will not have a heart attack when he/she actually will, than vice versa. To account for this, we define a K × K loss matrix L, with L_kk' being the loss incurred for classifying a class k observation as class k'. Typically no loss is incurred for correct classifications, that is, L_kk = 0 ∀k. To incorporate the losses into the modeling process, we could modify the Gini index to Σ_{k≠k'} L_kk' p̂_mk p̂_mk'; this would be the expected loss incurred by the randomized rule. This works for the multiclass case, but in the two-class case has no effect, since the coefficient of p̂_mk p̂_mk' is L_kk' + L_k'k. For two classes a better approach is to weight the observations in class k by L_kk'. This can be used in the multiclass case only if, as a function of k, L_kk' doesn't depend on k'. Observation weighting can be used with the deviance as well. The effect of observation weighting is to alter the prior probability on the classes. In a terminal node, the empirical Bayes rule implies that we classify to class k(m) = arg min_k Σ_l L_lk p̂_ml.

Missing Predictor Values

Suppose our data has some missing predictor values in some or all of the variables. We might discard any observation with some missing values, but this could lead to serious depletion of the training set. Alternatively we might try to fill in (impute) the missing values, with say the mean of that predictor over the nonmissing observations. For tree-based models, there are two better approaches. The first is applicable to categorical predictors: we simply make a new category for "missing." From this we might discover that observations with missing values for some measurement behave differently than those with nonmissing values. The second more general approach is the construction of surrogate variables. When considering a predictor for a split, we use only the observations for which that predictor is not missing. Having chosen the best (primary) predictor and split point, we form a list of surrogate predictors and split points. The first surrogate is the predictor and corresponding split point that best mimics the split of the training data achieved by the primary split. The second surrogate is the predictor and corresponding split point that does second best, and so on. When sending observations down the tree either in the training phase or during prediction, we use the surrogate splits in order, if the primary splitting predictor is missing. Surrogate splits exploit correlations between predictors to try and alleviate the effect of missing data. The higher the correlation between the missing predictor and the other predictors, the smaller the loss of information due to the missing value. The general problem of missing data is discussed in Section 9.6.

Why Binary Splits?

Rather than splitting each node into just two groups at each stage (as above), we might consider multiway splits into more than two groups. While this can sometimes be useful, it is not a good general strategy. The problem is that multiway splits fragment the data too quickly, leaving insufficient data at the next level down. Hence we would want to use such splits only when needed. Since multiway splits can be achieved by a series of binary splits, the latter are preferred.

Other Tree-Building Procedures

The discussion above focuses on the CART (classification and regression tree) implementation of trees. The other popular methodology is ID3 and its later versions, C4.5 and C5.0 (Quinlan 1993). Early versions of the program were limited to categorical predictors, and used a top-down rule with no pruning. With more recent developments, C5.0 has become quite similar to CART. The most significant feature unique to C5.0 is a scheme for deriving rule sets. After a tree is grown, the splitting rules that define the terminal nodes can sometimes be simplified: that is, one or more conditions can be dropped without changing the subset of observations that fall in the node. We end up with a simplified set of rules defining each terminal node; these no longer follow a tree structure, but their simplicity might make them more attractive to the user.

Linear Combination Splits

Rather than restricting splits to be of the form X_j ≤ s, one can allow splits along linear combinations of the form Σ a_j X_j ≤ s. The weights a_j and split point s are optimized to minimize the relevant criterion (such as the Gini index). While this can improve the predictive power of the tree, it can hurt interpretability. Computationally, the discreteness of the split point search precludes the use of a smooth optimization for the weights. A better way to incorporate linear combination splits is in the hierarchical mixtures of experts (HME) model, the topic of Section 9.5.

Instability of Trees

One major problem with trees is their high variance. Often a small change in the data can result in a very different series of splits, making interpretation somewhat precarious. The major reason for this instability is the hierarchical nature of the process: the effect of an error in the top split is propagated down to all of the splits below it. One can alleviate this to some degree by trying to use a more stable split criterion, but the inherent instability is not removed. It is the price to be paid for estimating a simple, tree-based structure from the data. Bagging (Section 8.7) averages many trees to reduce this variance.

Lack of Smoothness

Another limitation of trees is the lack of smoothness of the prediction surface, as can be seen in the bottom right panel of Figure 9.2. In classification with 0/1 loss, this doesn't hurt much, since bias in estimation of the class probabilities has a limited effect. However, this can degrade performance in the regression setting, where we would normally expect the underlying function to be smooth. The MARS procedure, described in Section 9.4, can be viewed as a modification of CART designed to alleviate this lack of smoothness.

Difficulty in Capturing Additive Structure

Another problem with trees is their difficulty in modeling additive structure. In regression, suppose, for example, that Y = c_1 I(X_1 < t_1) + c_2 I(X_2 < t_2) + ε where ε is zero-mean noise. Then a binary tree might make its first split on X_1 near t_1. At the next level down it would have to split both nodes on X_2 at t_2 in order to capture the additive structure. This might happen with sufficient data, but the model is given no special encouragement to find such structure. If there were ten rather than two additive effects, it would take many fortuitous splits to recreate the structure, and the data analyst would be hard pressed to recognize it in the estimated tree. The blame here can again be attributed to the binary tree structure, which has both advantages and drawbacks. Again the MARS method (Section 9.4) gives up this tree structure in order to capture additive structure.

9.2.5 Spam Example (Continued)

We applied the classification tree methodology to the spam example introduced earlier. We used the deviance measure to grow the tree and misclassification rate to prune it. Figure 9.4 shows the 10-fold cross-validation error rate as a function of the size of the pruned tree, along with ±2 standard errors of the mean, from the ten replications. The test error curve is shown in orange. Note that the cross-validation error rates are indexed by a sequence of values of α and not tree size; for trees grown in different folds, a value of α might imply different sizes. The sizes shown at the base of the plot refer to |T_α|, the sizes of the pruned original tree.

The error flattens out at around 17 terminal nodes, giving the pruned tree in Figure 9.5. Of the 13 distinct features chosen by the tree, 11 overlap with the 16 significant features in the additive model (Table 9.2). The overall error rate shown in Table 9.3 is about 50% higher than for the additive model in Table 9.1.

TABLE 9.3. Spam data: confusion rates for the 17-node tree (chosen by cross-validation) on the test data. Overall error rate is 9.3%.

                   Predicted
    True         email    spam
    email        57.3%    4.0%
    spam          5.3%   33.4%

Consider the rightmost branches of the tree. We branch to the right with a spam warning if more than 5.5% of the characters are the $ sign.

FIGURE 9.4. Results for spam example. The blue curve is the 10-fold cross-validation estimate of misclassification rate as a function of tree size, with ± two standard error bars. The minimum occurs at a tree size with about 17 terminal nodes. The orange curve is the test error, which tracks the CV error quite closely. The cross-validation was indexed by values of α, shown above. The tree sizes shown below refer to |T_α|, the size of the original tree indexed by α.

However, if in addition the phrase hp occurs frequently, then this is likely to be company business and we classify as email. All of the 22 cases in the test set satisfying these criteria were correctly classified. If the second condition is not met, and in addition the average length of repeated capital letters CAPAVE is larger than 2.9, then we classify as spam. Of the 227 test cases, only seven were misclassified.

In medical classification problems, the terms sensitivity and specificity are used to characterize a rule. They are defined as follows:

- Sensitivity: probability of predicting disease given true state is disease.
- Specificity: probability of predicting non-disease given true state is non-disease.

FIGURE 9.5. The pruned tree for the spam example. The split variables are shown in blue on the branches, and the classification is shown in every node. The numbers under the terminal nodes indicate misclassification rates on the test data.

If we think of spam and email as the presence and absence of disease, respectively, then from Table 9.3 we have

    Sensitivity = 33.4/(33.4 + 5.3) = 86.3%,
    Specificity = 57.3/(57.3 + 4.0) = 93.4%.

In this analysis we have used equal losses. As before let L_kk' be the loss associated with predicting a class k object as class k'. By varying the relative sizes of the losses L_01 and L_10, we increase the sensitivity and decrease the specificity of the rule, or vice versa. In this example, we want to avoid marking good email as spam, and thus we want the specificity to be very high. We can achieve this by setting L_01 > 1, say, with L_10 = 1. The Bayes rule in each terminal node classifies to class 1 (spam) if the proportion of spam is ≥ L_01/(L_10 + L_01), and class zero otherwise.
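A minimal Python sketch of these quantities, tracing an ROC curve by sweeping L_01 with L_10 = 1 as in the text; p_hat holds the per-node (or per-observation) spam proportions:

    import numpy as np

    def sens_spec(y_true, y_pred):
        sens = np.mean(y_pred[y_true == 1] == 1)   # predict spam given spam
        spec = np.mean(y_pred[y_true == 0] == 0)   # predict email given email
        return sens, spec

    def roc_points(y_true, p_hat, losses=np.linspace(0.1, 10.0, 50)):
        points = []
        for L01 in losses:
            threshold = L01 / (L01 + 1.0)          # Bayes rule with L10 = 1
            points.append(sens_spec(y_true, (p_hat >= threshold).astype(int)))
        return points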

The receiver operating characteristic curve (ROC) is a commonly used summary for assessing the tradeoff between sensitivity and specificity. It is a plot of the sensitivity versus specificity as we vary the parameters of a classification rule. Varying the loss L_01 between 0.1 and 10, and applying Bayes rule to the 17-node tree selected in Figure 9.4, produced the ROC curve shown in Figure 9.6.

FIGURE 9.6. ROC curves for the classification rules fit to the spam data. Curves that are closer to the northeast corner represent better classifiers. In this case the GAM classifier dominates the trees. The weighted tree achieves better sensitivity for higher specificity than the unweighted tree. The numbers in the legend represent the area under the curve: Tree (0.95), GAM (0.98), Weighted Tree (0.90).

The standard error of each curve near 0.9 is approximately √(0.9(1 − 0.9)/1536) = 0.008, and hence the standard error of the difference is about 0.01. We see that in order to achieve a specificity of close to 100%, the sensitivity has to drop to about 50%. The area under the curve is a commonly used quantitative summary; extending the curve linearly in each direction so that it is defined over [0, 100], the area is approximately 0.95. For comparison, we have included the ROC curve for the GAM model fit to these data in Section 9.2; it gives a better classification rule for any loss, with an area of 0.98.

Rather than just modifying the Bayes rule in the nodes, it is better to take full account of the unequal losses in growing the tree, as was done in Section 9.2. With just two classes 0 and 1, losses may be incorporated into the tree-growing process by using weight L_{k,1−k} for an observation in class k. Here we chose L_01 = 5, L_10 = 1 and fit the same size tree as before (|T_α| = 17). This tree has higher sensitivity at high values of the specificity than the original tree, but does more poorly at the other extreme. Its top few splits are the same as the original tree, and then it departs from it. For this application the tree grown using L_01 = 5 is clearly better than the original tree.

The area under the ROC curve, used above, is sometimes called the c-statistic. Interestingly, it can be shown that the area under the ROC curve is equivalent to the Mann-Whitney U statistic (or Wilcoxon rank-sum test), for the median difference between the prediction scores in the two groups (Hanley and McNeil 1982). For evaluating the contribution of an additional predictor when added to a standard model, the c-statistic may not be an informative measure. The new predictor can be very significant in terms of the change in model deviance, but show only a small increase in the c-statistic. For example, removal of the highly significant term george from the model of Table 9.2 results in a decrease in the c-statistic of less than 0.01. Instead, it is useful to examine how the additional predictor changes the classification on an individual sample basis. A good discussion of this point appears in Cook (2007).
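A minimal Python sketch of the c-statistic computed directly as a Mann-Whitney-type proportion: the fraction of (spam, email) pairs in which the spam message receives the higher prediction score, with ties counted half:

    import numpy as np

    def c_statistic(y_true, scores):
        s1 = scores[y_true == 1]    # prediction scores for class 1 (spam)
        s0 = scores[y_true == 0]    # prediction scores for class 0 (email)
        diffs = s1[:, None] - s0[None, :]
        return (diffs > 0).mean() + 0.5 * (diffs == 0).mean()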

9.3 PRIM: Bump Hunting

Tree-based methods (for regression) partition the feature space into box-shaped regions, to try to make the response averages in each box as different as possible. The splitting rules defining the boxes are related to each other through a binary tree, facilitating their interpretation.

The patient rule induction method (PRIM) also finds boxes in the feature space, but seeks boxes in which the response average is high. Hence it looks for maxima in the target function, an exercise known as bump hunting. (If minima rather than maxima are desired, one simply works with the negative response values.) PRIM also differs from tree-based partitioning methods in that the box definitions are not described by a binary tree. This makes interpretation of the collection of rules more difficult; however, by removing the binary tree constraint, the individual rules are often simpler.

The main box construction method in PRIM works from the top down, starting with a box containing all of the data. The box is compressed along one face by a small amount, and the observations then falling outside the box are peeled off. The face chosen for compression is the one resulting in the largest box mean, after the compression is performed. Then the process is repeated, stopping when the current box contains some minimum number of data points.

This process is illustrated in Figure 9.7. There are 200 data points uniformly distributed over the unit square. The color-coded plot indicates the response Y taking the value 1 (red) when 0.5 < X_1 < 0.8 and 0.4 < X_2 < 0.6, and zero (blue) otherwise. The panels show the successive boxes found by the top-down peeling procedure, peeling off a proportion α = 0.1 of the remaining data points at each stage. Figure 9.8 shows the mean of the response values in the box, as the box is compressed.

After the top-down sequence is computed, PRIM reverses the process, expanding along any edge, if such an expansion increases the box mean. This is called pasting. Since the top-down procedure is greedy at each step, such an expansion is often possible.

The result of these steps is a sequence of boxes, with different numbers of observations in each box. Cross-validation, combined with the judgment of the data analyst, is used to choose the optimal box size. Denote by B_1 the indices of the observations in the box found in step 1. The PRIM procedure then removes the observations in B_1 from the training set, and the two-step process (top-down peeling, followed by bottom-up pasting) is repeated on the remaining dataset. This entire process is repeated several times, producing a sequence of boxes B_1, B_2, ..., B_k. Each box is defined by a set of rules involving a subset of predictors like (a_1 ≤ X_1 ≤ b_1) and (b_1 ≤ X_3 ≤ b_2). A summary of the PRIM procedure is given in Algorithm 9.3.

PRIM can handle a categorical predictor by considering all partitions of the predictor, as in CART. Missing values are also handled in a manner similar to CART. PRIM is designed for regression (quantitative response variable); a two-class outcome can be handled simply by coding it as 0 and 1.
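A minimal Python sketch of the top-down peeling pass, assuming continuous predictors without ties; each iteration peels a proportion α off the lower or upper face that leaves the largest mean response in the box:

    import numpy as np

    def peel(X, y, alpha=0.1, min_obs=10):
        inside = np.ones(len(y), dtype=bool)
        box = [(X[:, j].min(), X[:, j].max()) for j in range(X.shape[1])]
        while inside.sum() > min_obs:
            best_mean, best_keep, best_face = -np.inf, None, None
            for j in range(X.shape[1]):
                lo = np.quantile(X[inside, j], alpha)        # peel lower face
                hi = np.quantile(X[inside, j], 1.0 - alpha)  # peel upper face
                for keep, face in [(X[:, j] >= lo, (j, 0, lo)),
                                   (X[:, j] <= hi, (j, 1, hi))]:
                    m = y[inside & keep].mean()
                    if m > best_mean:
                        best_mean, best_keep, best_face = m, inside & keep, face
            if best_keep.sum() == inside.sum():   # no progress; stop
                break
            j, side, value = best_face
            box[j] = (value, box[j][1]) if side == 0 else (box[j][0], value)
            inside = best_keep
        return box, inside

The bottom-up pasting pass would then try the reverse moves, expanding any face whose expansion increases the box mean.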

FIGURE 9.7. Illustration of PRIM algorithm. There are two classes, indicated by the blue (class 0) and red (class 1) points. The procedure starts with a rectangle (broken black lines) surrounding all of the data, and then peels away points along one edge by a prespecified amount in order to maximize the mean of the points remaining in the box. Starting at the top left panel, the sequence of peelings is shown, until a pure red region is isolated in the bottom right panel. The iteration number is indicated at the top of each panel.

FIGURE 9.8. Box mean as a function of number of observations in the box.


More information

Sequential Allocation with Minimal Switching

Sequential Allocation with Minimal Switching In Cmputing Science and Statistics 28 (1996), pp. 567 572 Sequential Allcatin with Minimal Switching Quentin F. Stut 1 Janis Hardwick 1 EECS Dept., University f Michigan Statistics Dept., Purdue University

More information

Differentiation Applications 1: Related Rates

Differentiation Applications 1: Related Rates Differentiatin Applicatins 1: Related Rates 151 Differentiatin Applicatins 1: Related Rates Mdel 1: Sliding Ladder 10 ladder y 10 ladder 10 ladder A 10 ft ladder is leaning against a wall when the bttm

More information

5 th grade Common Core Standards

5 th grade Common Core Standards 5 th grade Cmmn Cre Standards In Grade 5, instructinal time shuld fcus n three critical areas: (1) develping fluency with additin and subtractin f fractins, and develping understanding f the multiplicatin

More information

Support Vector Machines and Flexible Discriminants

Support Vector Machines and Flexible Discriminants Supprt Vectr Machines and Flexible Discriminants This is page Printer: Opaque this. Intrductin In this chapter we describe generalizatins f linear decisin bundaries fr classificatin. Optimal separating

More information

SPH3U1 Lesson 06 Kinematics

SPH3U1 Lesson 06 Kinematics PROJECTILE MOTION LEARNING GOALS Students will: Describe the mtin f an bject thrwn at arbitrary angles thrugh the air. Describe the hrizntal and vertical mtins f a prjectile. Slve prjectile mtin prblems.

More information

Module 4: General Formulation of Electric Circuit Theory

Module 4: General Formulation of Electric Circuit Theory Mdule 4: General Frmulatin f Electric Circuit Thery 4. General Frmulatin f Electric Circuit Thery All electrmagnetic phenmena are described at a fundamental level by Maxwell's equatins and the assciated

More information

Lead/Lag Compensator Frequency Domain Properties and Design Methods

Lead/Lag Compensator Frequency Domain Properties and Design Methods Lectures 6 and 7 Lead/Lag Cmpensatr Frequency Dmain Prperties and Design Methds Definitin Cnsider the cmpensatr (ie cntrller Fr, it is called a lag cmpensatr s K Fr s, it is called a lead cmpensatr Ntatin

More information

Midwest Big Data Summer School: Machine Learning I: Introduction. Kris De Brabanter

Midwest Big Data Summer School: Machine Learning I: Introduction. Kris De Brabanter Midwest Big Data Summer Schl: Machine Learning I: Intrductin Kris De Brabanter kbrabant@iastate.edu Iwa State University Department f Statistics Department f Cmputer Science June 24, 2016 1/24 Outline

More information

You need to be able to define the following terms and answer basic questions about them:

You need to be able to define the following terms and answer basic questions about them: CS440/ECE448 Sectin Q Fall 2017 Midterm Review Yu need t be able t define the fllwing terms and answer basic questins abut them: Intr t AI, agents and envirnments Pssible definitins f AI, prs and cns f

More information

Linear programming III

Linear programming III Linear prgramming III Review 1/33 What have cvered in previus tw classes LP prblem setup: linear bjective functin, linear cnstraints. exist extreme pint ptimal slutin. Simplex methd: g thrugh extreme pint

More information

Statistical Learning. 2.1 What Is Statistical Learning?

Statistical Learning. 2.1 What Is Statistical Learning? 2 Statistical Learning 2.1 What Is Statistical Learning? In rder t mtivate ur study f statistical learning, we begin with a simple example. Suppse that we are statistical cnsultants hired by a client t

More information

T Algorithmic methods for data mining. Slide set 6: dimensionality reduction

T Algorithmic methods for data mining. Slide set 6: dimensionality reduction T-61.5060 Algrithmic methds fr data mining Slide set 6: dimensinality reductin reading assignment LRU bk: 11.1 11.3 PCA tutrial in mycurses (ptinal) ptinal: An Elementary Prf f a Therem f Jhnsn and Lindenstrauss,

More information

NUMBERS, MATHEMATICS AND EQUATIONS

NUMBERS, MATHEMATICS AND EQUATIONS AUSTRALIAN CURRICULUM PHYSICS GETTING STARTED WITH PHYSICS NUMBERS, MATHEMATICS AND EQUATIONS An integral part t the understanding f ur physical wrld is the use f mathematical mdels which can be used t

More information

ENSC Discrete Time Systems. Project Outline. Semester

ENSC Discrete Time Systems. Project Outline. Semester ENSC 49 - iscrete Time Systems Prject Outline Semester 006-1. Objectives The gal f the prject is t design a channel fading simulatr. Upn successful cmpletin f the prject, yu will reinfrce yur understanding

More information

Linear Classification

Linear Classification Linear Classificatin CS 54: Machine Learning Slides adapted frm Lee Cper, Jydeep Ghsh, and Sham Kakade Review: Linear Regressin CS 54 [Spring 07] - H Regressin Given an input vectr x T = (x, x,, xp), we

More information

Modelling of Clock Behaviour. Don Percival. Applied Physics Laboratory University of Washington Seattle, Washington, USA

Modelling of Clock Behaviour. Don Percival. Applied Physics Laboratory University of Washington Seattle, Washington, USA Mdelling f Clck Behaviur Dn Percival Applied Physics Labratry University f Washingtn Seattle, Washingtn, USA verheads and paper fr talk available at http://faculty.washingtn.edu/dbp/talks.html 1 Overview

More information

Module 3: Gaussian Process Parameter Estimation, Prediction Uncertainty, and Diagnostics

Module 3: Gaussian Process Parameter Estimation, Prediction Uncertainty, and Diagnostics Mdule 3: Gaussian Prcess Parameter Estimatin, Predictin Uncertainty, and Diagnstics Jerme Sacks and William J Welch Natinal Institute f Statistical Sciences and University f British Clumbia Adapted frm

More information

CAUSAL INFERENCE. Technical Track Session I. Phillippe Leite. The World Bank

CAUSAL INFERENCE. Technical Track Session I. Phillippe Leite. The World Bank CAUSAL INFERENCE Technical Track Sessin I Phillippe Leite The Wrld Bank These slides were develped by Christel Vermeersch and mdified by Phillippe Leite fr the purpse f this wrkshp Plicy questins are causal

More information

Activity Guide Loops and Random Numbers

Activity Guide Loops and Random Numbers Unit 3 Lessn 7 Name(s) Perid Date Activity Guide Lps and Randm Numbers CS Cntent Lps are a relatively straightfrward idea in prgramming - yu want a certain chunk f cde t run repeatedly - but it takes a

More information

We say that y is a linear function of x if. Chapter 13: The Correlation Coefficient and the Regression Line

We say that y is a linear function of x if. Chapter 13: The Correlation Coefficient and the Regression Line Chapter 13: The Crrelatin Cefficient and the Regressin Line We begin with a sme useful facts abut straight lines. Recall the x, y crdinate system, as pictured belw. 3 2 1 y = 2.5 y = 0.5x 3 2 1 1 2 3 1

More information

PSU GISPOPSCI June 2011 Ordinary Least Squares & Spatial Linear Regression in GeoDa

PSU GISPOPSCI June 2011 Ordinary Least Squares & Spatial Linear Regression in GeoDa There are tw parts t this lab. The first is intended t demnstrate hw t request and interpret the spatial diagnstics f a standard OLS regressin mdel using GeDa. The diagnstics prvide infrmatin abut the

More information

Support-Vector Machines

Support-Vector Machines Supprt-Vectr Machines Intrductin Supprt vectr machine is a linear machine with sme very nice prperties. Haykin chapter 6. See Alpaydin chapter 13 fr similar cntent. Nte: Part f this lecture drew material

More information

Part 3 Introduction to statistical classification techniques

Part 3 Introduction to statistical classification techniques Part 3 Intrductin t statistical classificatin techniques Machine Learning, Part 3, March 07 Fabi Rli Preamble ØIn Part we have seen that if we knw: Psterir prbabilities P(ω i / ) Or the equivalent terms

More information

STATS216v Introduction to Statistical Learning Stanford University, Summer Practice Final (Solutions) Duration: 3 hours

STATS216v Introduction to Statistical Learning Stanford University, Summer Practice Final (Solutions) Duration: 3 hours STATS216v Intrductin t Statistical Learning Stanfrd University, Summer 2016 Practice Final (Slutins) Duratin: 3 hurs Instructins: (This is a practice final and will nt be graded.) Remember the university

More information

CHAPTER 4 DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS

CHAPTER 4 DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS CHAPTER 4 DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS 1 Influential bservatins are bservatins whse presence in the data can have a distrting effect n the parameter estimates and pssibly the entire analysis,

More information

How do scientists measure trees? What is DBH?

How do scientists measure trees? What is DBH? Hw d scientists measure trees? What is DBH? Purpse Students develp an understanding f tree size and hw scientists measure trees. Students bserve and measure tree ckies and explre the relatinship between

More information

The Kullback-Leibler Kernel as a Framework for Discriminant and Localized Representations for Visual Recognition

The Kullback-Leibler Kernel as a Framework for Discriminant and Localized Representations for Visual Recognition The Kullback-Leibler Kernel as a Framewrk fr Discriminant and Lcalized Representatins fr Visual Recgnitin Nun Vascncels Purdy H Pedr Mren ECE Department University f Califrnia, San Dieg HP Labs Cambridge

More information

AP Statistics Practice Test Unit Three Exploring Relationships Between Variables. Name Period Date

AP Statistics Practice Test Unit Three Exploring Relationships Between Variables. Name Period Date AP Statistics Practice Test Unit Three Explring Relatinships Between Variables Name Perid Date True r False: 1. Crrelatin and regressin require explanatry and respnse variables. 1. 2. Every least squares

More information

Enhancing Performance of MLP/RBF Neural Classifiers via an Multivariate Data Distribution Scheme

Enhancing Performance of MLP/RBF Neural Classifiers via an Multivariate Data Distribution Scheme Enhancing Perfrmance f / Neural Classifiers via an Multivariate Data Distributin Scheme Halis Altun, Gökhan Gelen Nigde University, Electrical and Electrnics Engineering Department Nigde, Turkey haltun@nigde.edu.tr

More information

Assessment Primer: Writing Instructional Objectives

Assessment Primer: Writing Instructional Objectives Assessment Primer: Writing Instructinal Objectives (Based n Preparing Instructinal Objectives by Mager 1962 and Preparing Instructinal Objectives: A critical tl in the develpment f effective instructin

More information

Pipetting 101 Developed by BSU CityLab

Pipetting 101 Developed by BSU CityLab Discver the Micrbes Within: The Wlbachia Prject Pipetting 101 Develped by BSU CityLab Clr Cmparisns Pipetting Exercise #1 STUDENT OBJECTIVES Students will be able t: Chse the crrect size micrpipette fr

More information

The general linear model and Statistical Parametric Mapping I: Introduction to the GLM

The general linear model and Statistical Parametric Mapping I: Introduction to the GLM The general linear mdel and Statistical Parametric Mapping I: Intrductin t the GLM Alexa Mrcm and Stefan Kiebel, Rik Hensn, Andrew Hlmes & J-B J Pline Overview Intrductin Essential cncepts Mdelling Design

More information

Subject description processes

Subject description processes Subject representatin 6.1.2. Subject descriptin prcesses Overview Fur majr prcesses r areas f practice fr representing subjects are classificatin, subject catalging, indexing, and abstracting. The prcesses

More information

THE LIFE OF AN OBJECT IT SYSTEMS

THE LIFE OF AN OBJECT IT SYSTEMS THE LIFE OF AN OBJECT IT SYSTEMS Persns, bjects, r cncepts frm the real wrld, which we mdel as bjects in the IT system, have "lives". Actually, they have tw lives; the riginal in the real wrld has a life,

More information

MATCHING TECHNIQUES. Technical Track Session VI. Emanuela Galasso. The World Bank

MATCHING TECHNIQUES. Technical Track Session VI. Emanuela Galasso. The World Bank MATCHING TECHNIQUES Technical Track Sessin VI Emanuela Galass The Wrld Bank These slides were develped by Christel Vermeersch and mdified by Emanuela Galass fr the purpse f this wrkshp When can we use

More information

the results to larger systems due to prop'erties of the projection algorithm. First, the number of hidden nodes must

the results to larger systems due to prop'erties of the projection algorithm. First, the number of hidden nodes must M.E. Aggune, M.J. Dambrg, M.A. El-Sharkawi, R.J. Marks II and L.E. Atlas, "Dynamic and static security assessment f pwer systems using artificial neural netwrks", Prceedings f the NSF Wrkshp n Applicatins

More information

Least Squares Optimal Filtering with Multirate Observations

Least Squares Optimal Filtering with Multirate Observations Prc. 36th Asilmar Cnf. n Signals, Systems, and Cmputers, Pacific Grve, CA, Nvember 2002 Least Squares Optimal Filtering with Multirate Observatins Charles W. herrien and Anthny H. Hawes Department f Electrical

More information

Distributions, spatial statistics and a Bayesian perspective

Distributions, spatial statistics and a Bayesian perspective Distributins, spatial statistics and a Bayesian perspective Dug Nychka Natinal Center fr Atmspheric Research Distributins and densities Cnditinal distributins and Bayes Thm Bivariate nrmal Spatial statistics

More information

Smoothing, penalized least squares and splines

Smoothing, penalized least squares and splines Smthing, penalized least squares and splines Duglas Nychka, www.image.ucar.edu/~nychka Lcally weighted averages Penalized least squares smthers Prperties f smthers Splines and Reprducing Kernels The interplatin

More information

Math Foundations 20 Work Plan

Math Foundations 20 Work Plan Math Fundatins 20 Wrk Plan Units / Tpics 20.8 Demnstrate understanding f systems f linear inequalities in tw variables. Time Frame December 1-3 weeks 6-10 Majr Learning Indicatrs Identify situatins relevant

More information

Lecture 10, Principal Component Analysis

Lecture 10, Principal Component Analysis Principal Cmpnent Analysis Lecture 10, Principal Cmpnent Analysis Ha Helen Zhang Fall 2017 Ha Helen Zhang Lecture 10, Principal Cmpnent Analysis 1 / 16 Principal Cmpnent Analysis Lecture 10, Principal

More information

Engineering Decision Methods

Engineering Decision Methods GSOE9210 vicj@cse.unsw.edu.au www.cse.unsw.edu.au/~gs9210 Maximin and minimax regret 1 2 Indifference; equal preference 3 Graphing decisin prblems 4 Dminance The Maximin principle Maximin and minimax Regret

More information

Kinetic Model Completeness

Kinetic Model Completeness 5.68J/10.652J Spring 2003 Lecture Ntes Tuesday April 15, 2003 Kinetic Mdel Cmpleteness We say a chemical kinetic mdel is cmplete fr a particular reactin cnditin when it cntains all the species and reactins

More information

Hypothesis Tests for One Population Mean

Hypothesis Tests for One Population Mean Hypthesis Tests fr One Ppulatin Mean Chapter 9 Ala Abdelbaki Objective Objective: T estimate the value f ne ppulatin mean Inferential statistics using statistics in rder t estimate parameters We will be

More information

MATCHING TECHNIQUES Technical Track Session VI Céline Ferré The World Bank

MATCHING TECHNIQUES Technical Track Session VI Céline Ferré The World Bank MATCHING TECHNIQUES Technical Track Sessin VI Céline Ferré The Wrld Bank When can we use matching? What if the assignment t the treatment is nt dne randmly r based n an eligibility index, but n the basis

More information

Stats Classification Ji Zhu, Michigan Statistics 1. Classification. Ji Zhu 445C West Hall

Stats Classification Ji Zhu, Michigan Statistics 1. Classification. Ji Zhu 445C West Hall Stats 415 - Classificatin Ji Zhu, Michigan Statistics 1 Classificatin Ji Zhu 445C West Hall 734-936-2577 jizhu@umich.edu Stats 415 - Classificatin Ji Zhu, Michigan Statistics 2 Examples f Classificatin

More information

Department of Electrical Engineering, University of Waterloo. Introduction

Department of Electrical Engineering, University of Waterloo. Introduction Sectin 4: Sequential Circuits Majr Tpics Types f sequential circuits Flip-flps Analysis f clcked sequential circuits Mre and Mealy machines Design f clcked sequential circuits State transitin design methd

More information

This section is primarily focused on tools to aid us in finding roots/zeros/ -intercepts of polynomials. Essentially, our focus turns to solving.

This section is primarily focused on tools to aid us in finding roots/zeros/ -intercepts of polynomials. Essentially, our focus turns to solving. Sectin 3.2: Many f yu WILL need t watch the crrespnding vides fr this sectin n MyOpenMath! This sectin is primarily fcused n tls t aid us in finding rts/zers/ -intercepts f plynmials. Essentially, ur fcus

More information

Thermodynamics Partial Outline of Topics

Thermodynamics Partial Outline of Topics Thermdynamics Partial Outline f Tpics I. The secnd law f thermdynamics addresses the issue f spntaneity and invlves a functin called entrpy (S): If a prcess is spntaneus, then Suniverse > 0 (2 nd Law!)

More information

NUROP CONGRESS PAPER CHINESE PINYIN TO CHINESE CHARACTER CONVERSION

NUROP CONGRESS PAPER CHINESE PINYIN TO CHINESE CHARACTER CONVERSION NUROP Chinese Pinyin T Chinese Character Cnversin NUROP CONGRESS PAPER CHINESE PINYIN TO CHINESE CHARACTER CONVERSION CHIA LI SHI 1 AND LUA KIM TENG 2 Schl f Cmputing, Natinal University f Singapre 3 Science

More information

The standards are taught in the following sequence.

The standards are taught in the following sequence. B L U E V A L L E Y D I S T R I C T C U R R I C U L U M MATHEMATICS Third Grade In grade 3, instructinal time shuld fcus n fur critical areas: (1) develping understanding f multiplicatin and divisin and

More information

Comparing Several Means: ANOVA. Group Means and Grand Mean

Comparing Several Means: ANOVA. Group Means and Grand Mean STAT 511 ANOVA and Regressin 1 Cmparing Several Means: ANOVA Slide 1 Blue Lake snap beans were grwn in 12 pen-tp chambers which are subject t 4 treatments 3 each with O 3 and SO 2 present/absent. The ttal

More information

Perfrmance f Sensitizing Rules n Shewhart Cntrl Charts with Autcrrelated Data Key Wrds: Autregressive, Mving Average, Runs Tests, Shewhart Cntrl Chart

Perfrmance f Sensitizing Rules n Shewhart Cntrl Charts with Autcrrelated Data Key Wrds: Autregressive, Mving Average, Runs Tests, Shewhart Cntrl Chart Perfrmance f Sensitizing Rules n Shewhart Cntrl Charts with Autcrrelated Data Sandy D. Balkin Dennis K. J. Lin y Pennsylvania State University, University Park, PA 16802 Sandy Balkin is a graduate student

More information

Localized Model Selection for Regression

Localized Model Selection for Regression Lcalized Mdel Selectin fr Regressin Yuhng Yang Schl f Statistics University f Minnesta Church Street S.E. Minneaplis, MN 5555 May 7, 007 Abstract Research n mdel/prcedure selectin has fcused n selecting

More information

Determining the Accuracy of Modal Parameter Estimation Methods

Determining the Accuracy of Modal Parameter Estimation Methods Determining the Accuracy f Mdal Parameter Estimatin Methds by Michael Lee Ph.D., P.E. & Mar Richardsn Ph.D. Structural Measurement Systems Milpitas, CA Abstract The mst cmmn type f mdal testing system

More information

Internal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9.

Internal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9. Sectin 7 Mdel Assessment This sectin is based n Stck and Watsn s Chapter 9. Internal vs. external validity Internal validity refers t whether the analysis is valid fr the ppulatin and sample being studied.

More information

1 The limitations of Hartree Fock approximation

1 The limitations of Hartree Fock approximation Chapter: Pst-Hartree Fck Methds - I The limitatins f Hartree Fck apprximatin The n electrn single determinant Hartree Fck wave functin is the variatinal best amng all pssible n electrn single determinants

More information

Preparation work for A2 Mathematics [2017]

Preparation work for A2 Mathematics [2017] Preparatin wrk fr A2 Mathematics [2017] The wrk studied in Y12 after the return frm study leave is frm the Cre 3 mdule f the A2 Mathematics curse. This wrk will nly be reviewed during Year 13, it will

More information

Eric Klein and Ning Sa

Eric Klein and Ning Sa Week 12. Statistical Appraches t Netwrks: p1 and p* Wasserman and Faust Chapter 15: Statistical Analysis f Single Relatinal Netwrks There are fur tasks in psitinal analysis: 1) Define Equivalence 2) Measure

More information

and the Doppler frequency rate f R , can be related to the coefficients of this polynomial. The relationships are:

and the Doppler frequency rate f R , can be related to the coefficients of this polynomial. The relationships are: Algrithm fr Estimating R and R - (David Sandwell, SIO, August 4, 2006) Azimith cmpressin invlves the alignment f successive eches t be fcused n a pint target Let s be the slw time alng the satellite track

More information

Building to Transformations on Coordinate Axis Grade 5: Geometry Graph points on the coordinate plane to solve real-world and mathematical problems.

Building to Transformations on Coordinate Axis Grade 5: Geometry Graph points on the coordinate plane to solve real-world and mathematical problems. Building t Transfrmatins n Crdinate Axis Grade 5: Gemetry Graph pints n the crdinate plane t slve real-wrld and mathematical prblems. 5.G.1. Use a pair f perpendicular number lines, called axes, t define

More information

Discussion on Regularized Regression for Categorical Data (Tutz and Gertheiss)

Discussion on Regularized Regression for Categorical Data (Tutz and Gertheiss) Discussin n Regularized Regressin fr Categrical Data (Tutz and Gertheiss) Peter Bühlmann, Ruben Dezeure Seminar fr Statistics, Department f Mathematics, ETH Zürich, Switzerland Address fr crrespndence:

More information

A Matrix Representation of Panel Data

A Matrix Representation of Panel Data web Extensin 6 Appendix 6.A A Matrix Representatin f Panel Data Panel data mdels cme in tw brad varieties, distinct intercept DGPs and errr cmpnent DGPs. his appendix presents matrix algebra representatins

More information

Department of Economics, University of California, Davis Ecn 200C Micro Theory Professor Giacomo Bonanno. Insurance Markets

Department of Economics, University of California, Davis Ecn 200C Micro Theory Professor Giacomo Bonanno. Insurance Markets Department f Ecnmics, University f alifrnia, Davis Ecn 200 Micr Thery Prfessr Giacm Bnann Insurance Markets nsider an individual wh has an initial wealth f. ith sme prbability p he faces a lss f x (0

More information

Admin. MDP Search Trees. Optimal Quantities. Reinforcement Learning

Admin. MDP Search Trees. Optimal Quantities. Reinforcement Learning Admin Reinfrcement Learning Cntent adapted frm Berkeley CS188 MDP Search Trees Each MDP state prjects an expectimax-like search tree Optimal Quantities The value (utility) f a state s: V*(s) = expected

More information

Preparation work for A2 Mathematics [2018]

Preparation work for A2 Mathematics [2018] Preparatin wrk fr A Mathematics [018] The wrk studied in Y1 will frm the fundatins n which will build upn in Year 13. It will nly be reviewed during Year 13, it will nt be retaught. This is t allw time

More information