Computational Statistics


Spring 2008

Peter Bühlmann and Martin Mächler

Seminar für Statistik, ETH Zürich

February 2008 (February 23, 2011)


Contents

1 Multiple Linear Regression
   1.1 Introduction
   1.2 The Linear Model
       1.2.1 Stochastic Models
       1.2.2 Examples
       1.2.3 Goals of a linear regression analysis
   1.3 Least Squares Method
       1.3.1 The normal equations
       1.3.2 Assumptions for the Linear Model
       1.3.3 Geometrical Interpretation
       1.3.4 Don't do many regressions on single variables!
       1.3.5 Computer-Output from R: Part I
   1.4 Properties of Least Squares Estimates
       1.4.1 Moments of least squares estimates
       1.4.2 Distribution of least squares estimates assuming Gaussian errors
   1.5 Tests and Confidence Regions
       1.5.1 Computer-Output from R: Part II
   1.6 Analysis of residuals and checking of model assumptions
       1.6.1 The Tukey-Anscombe Plot
       1.6.2 The Normal Plot
       1.6.3 Plot for detecting serial correlation
       1.6.4 Generalized least squares and weighted regression
   1.7 Model Selection
       1.7.1 Mallows Cp statistic

2 Nonparametric Density Estimation
   2.1 Introduction
   2.2 Estimation of a density
       2.2.1 Histogram
       2.2.2 Kernel estimator
   2.3 The role of the bandwidth
       2.3.1 Variable bandwidths: k nearest neighbors
       2.3.2 The bias-variance trade-off
       2.3.3 Asymptotic bias and variance
       2.3.4 Estimating the bandwidth
   2.4 Higher dimensions
       2.4.1 The curse of dimensionality

3 Nonparametric Regression
   3.1 Introduction
   3.2 The kernel regression estimator
       3.2.1 The role of the bandwidth
       3.2.2 Inference for the underlying regression curve
   3.3 Local polynomial nonparametric regression estimator
   3.4 Smoothing splines and penalized regression
       3.4.1 Penalized sum of squares
       3.4.2 The smoothing spline solution
       3.4.3 Shrinking towards zero
       3.4.4 Relation to equivalent kernels

4 Cross-Validation
   4.1 Introduction
   4.2 Training and Test Set
   4.3 Constructing training-, test-data and cross-validation
       4.3.1 Leave-one-out cross-validation
       4.3.2 K-fold Cross-Validation
       4.3.3 Random divisions into test- and training-data
   4.4 Properties of different CV-schemes
       4.4.1 Leave-one-out CV
       4.4.2 Leave-d-out CV
       4.4.3 K-fold CV; stochastic approximations
   4.5 Computational shortcut for some linear fitting operators

5 Bootstrap
   5.1 Introduction
   5.2 Efron's nonparametric bootstrap
       5.2.1 The bootstrap algorithm
       5.2.2 The bootstrap distribution
       5.2.3 Bootstrap confidence interval: a first approach
       5.2.4 Bootstrap estimate of the generalization error
       5.2.5 Out-of-bootstrap sample for estimation of the generalization error
   5.3 Double bootstrap
   5.4 Model-based bootstrap
       5.4.1 Parametric bootstrap
       5.4.2 Model structures beyond i.i.d. and the parametric bootstrap
       5.4.3 The model-based bootstrap for regression

6 Classification
   6.1 Introduction
   6.2 The Bayes classifier
   6.3 The view of discriminant analysis
       6.3.1 Linear discriminant analysis
       6.3.2 Quadratic discriminant analysis
   6.4 The view of logistic regression
       6.4.1 Binary classification
       6.4.2 Multiclass case, J > 2

7 Flexible regression and classification methods
   7.1 Introduction
   7.2 Additive models
       7.2.1 Backfitting for additive regression models
       7.2.2 Additive model fitting in R
   7.3 MARS
       7.3.1 Hierarchical interactions and constraints
       7.3.2 MARS in R
   7.4 Neural Networks
       7.4.1 Fitting neural networks in R
   7.5 Projection pursuit regression
       7.5.1 Projection pursuit regression in R
   7.6 Classification and Regression Trees (CART)
       7.6.1 Tree-structured estimation and tree representation
       7.6.2 Tree-structured search algorithm and tree interpretation
       7.6.3 Pros and cons of trees
       7.6.4 CART in R

8 Variable Selection, Regularization, Ridging and the Lasso
   8.1 Introduction
   8.2 Ridge Regression
   8.3 The Lasso
   8.4 Lasso extensions

9 Bagging and Boosting
   9.1 Introduction
   9.2 Bagging
       9.2.1 The bagging algorithm
       9.2.2 Bagging for trees
       9.2.3 Subagging
   9.3 Boosting
       9.3.1 L2Boosting


Chapter 1

Multiple Linear Regression

1.1 Introduction

Linear regression is a widely used statistical model in a broad variety of applications. It is one of the easiest examples to demonstrate important aspects of statistical modelling.

1.2 The Linear Model

Multiple Regression Model: Given is one response variable which, up to some random errors, is a linear function of several predictors (or covariables). The linear function involves unknown parameters. The goal is to estimate these parameters, to study their relevance and to estimate the error variance.

Model formula:

$$Y_i = \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i \quad (i = 1, \dots, n) \qquad (1.1)$$

Usually we assume that ε_1, ..., ε_n are i.i.d. (independent, identically distributed) with E[ε_i] = 0, Var(ε_i) = σ².

Notations:
Y = {Y_i; i = 1, ..., n} is the vector of the response variables
x^(j) = {x_ij; i = 1, ..., n} is the vector of the jth predictor (covariable) (j = 1, ..., p)
x_i = {x_ij; j = 1, ..., p} is the vector of predictors for the ith observation (i = 1, ..., n)
β = {β_j; j = 1, ..., p} is the vector of the unknown parameters
ε = {ε_i; i = 1, ..., n} is the vector of the unknown random errors
n is the sample size, p is the number of predictors

The parameters β_j and σ² are unknown and the errors ε_i are unobservable. On the other hand, the response variables Y_i and the predictors x_ij have been observed.

Model in vector notation:

$$Y_i = x_i^\top \beta + \varepsilon_i \quad (i = 1, \dots, n)$$

Model in matrix form:

$$\underset{n\times 1}{Y} = \underset{n\times p}{X}\,\underset{p\times 1}{\beta} + \underset{n\times 1}{\varepsilon} \qquad (1.2)$$

where X is an (n × p)-matrix with rows x_i^⊤ and columns x^(j).

The first predictor variable is often a constant, i.e., x_i1 ≡ 1 for all i. We then get an intercept in the model:

$$Y_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i.$$

We typically assume that the sample size n is larger than the number of predictors p, n > p, and moreover that the matrix X has full rank p, i.e., the p column vectors x^(1), ..., x^(p) are linearly independent.

1.2.1 Stochastic Models

The linear model in (1.1) involves some stochastic (random) components: the error terms ε_i are random variables and hence the response variables Y_i as well. The predictor variables x_ij are here assumed to be non-random. In some applications, however, it is more appropriate to treat the predictor variables as random.

The stochastic nature of the error terms ε_i can be assigned to various sources: for example, measurement errors, or the inability to capture all underlying non-systematic effects, which are then summarized by a random variable with expectation zero. The stochastic modelling approach allows us to quantify uncertainty, to assign significance to various components, e.g. significance of predictor variables in model (1.1), and to find a good compromise between the size of a model and the ability to describe the data (see section 1.7).

The observed response in the data is always assumed to be realizations of the random variables Y_1, ..., Y_n; the x_ij's are non-random and equal to the observed predictors in the data.

1.2.2 Examples

Two-sample model:

$$p = 2,\quad X = \begin{pmatrix} 1 & 0\\ \vdots & \vdots\\ 1 & 0\\ 0 & 1\\ \vdots & \vdots\\ 0 & 1 \end{pmatrix},\quad \beta = \begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix}.$$

Main questions: Is μ_1 = μ_2? What is the quantitative difference between μ_1 and μ_2? From introductory courses we know that one could use the two-sample t-test or the two-sample Wilcoxon test.

Regression through the origin: Y_i = βx_i + ε_i (i = 1, ..., n).

$$p = 1,\quad X = \begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix},\quad \beta = (\beta).$$

Simple linear regression: Y_i = β_1 + β_2 x_i + ε_i (i = 1, ..., n).

$$p = 2,\quad X = \begin{pmatrix} 1 & x_1\\ 1 & x_2\\ \vdots & \vdots\\ 1 & x_n \end{pmatrix},\quad \beta = \begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix}.$$

Quadratic regression: Y_i = β_1 + β_2 x_i + β_3 x_i² + ε_i (i = 1, ..., n).

$$p = 3,\quad X = \begin{pmatrix} 1 & x_1 & x_1^2\\ 1 & x_2 & x_2^2\\ \vdots & \vdots & \vdots\\ 1 & x_n & x_n^2 \end{pmatrix},\quad \beta = \begin{pmatrix}\beta_1\\ \beta_2\\ \beta_3\end{pmatrix}.$$

Note that the fitted function is quadratic in the x_i's but linear in the coefficients β_j and therefore a special case of the linear model (1.1).

Regression with transformed predictor variables: Y_i = β_1 + β_2 log(x_i2) + β_3 sin(πx_i3) + ε_i (i = 1, ..., n).

$$p = 3,\quad X = \begin{pmatrix} 1 & \log(x_{12}) & \sin(\pi x_{13})\\ 1 & \log(x_{22}) & \sin(\pi x_{23})\\ \vdots & \vdots & \vdots\\ 1 & \log(x_{n2}) & \sin(\pi x_{n3}) \end{pmatrix},\quad \beta = \begin{pmatrix}\beta_1\\ \beta_2\\ \beta_3\end{pmatrix}.$$

Again, the model is linear in the coefficients β_j but nonlinear in the x_ij's.

In summary: The model in (1.1) is called linear because it is linear in the coefficients β_j. The predictor (and also the response) variables can be transformed versions of the original predictor and/or response variables.

1.2.3 Goals of a linear regression analysis

- A good "fit". Fitting or estimating a (hyper-)plane over the predictor variables to explain the response variables such that the errors are small. The standard tool for this is the method of least squares (see section 1.3).
- Good parameter estimates. This is useful to describe the change of the response when varying some predictor variable(s).
- Good prediction. This is useful to predict a new response as a function of new predictor variables.
- Uncertainties and significance for the three goals above. Confidence intervals and statistical tests are useful tools for this goal.
- Development of a good model. In an interactive process, using methods for the goals mentioned above, we may change parts of an initial model to come up with a better model.

The first and third goal can become conflicting, see section 1.7.

1.3 Least Squares Method

We assume the linear model Y = Xβ + ε. We are looking for a good estimate of β. The least squares estimator β̂ is defined as

$$\hat\beta = \arg\min_{\beta} \|Y - X\beta\|^2, \qquad (1.3)$$

where ‖·‖ denotes the Euclidean norm in R^n.

1.3.1 The normal equations

The minimizer in (1.3) can be computed explicitly (assuming that X has rank p). Computing partial derivatives of ‖Y − Xβ‖² with respect to β (a p-dimensional gradient vector), evaluating them at β̂, and setting them to zero yields

$$(-2)\,X^\top (Y - X\hat\beta) = 0 \quad ((p\times 1)\ \text{null-vector}).$$

Thus, we get the normal equations

$$X^\top X\hat\beta = X^\top Y. \qquad (1.4)$$

These are p linear equations for the p unknowns (the components of β̂).

Assuming that the matrix X has full rank p, the p × p matrix X^⊤X is invertible, the least squares estimator is unique and can be represented as

$$\hat\beta = (X^\top X)^{-1}X^\top Y.$$

This formula is useful for theoretical purposes. For numerical computation it is much more stable to use the QR decomposition instead of inverting the matrix X^⊤X. (Footnote: Let X = QR with orthogonal (n × p) matrix Q and upper (Right) triangular (p × p) matrix R. Because of X^⊤X = R^⊤Q^⊤QR = R^⊤R, computing β̂ only needs the subsequent solution of two triangular systems: first solve R^⊤c = X^⊤Y for c, and then solve Rβ̂ = c. Further, when Cov(β̂) requires (X^⊤X)^{-1}, the latter is R^{-1}(R^{-1})^⊤.)

From the residuals r_i = Y_i − x_i^⊤β̂, the usual estimate for σ² is

$$\hat\sigma^2 = \frac{1}{n-p}\sum_{i=1}^n r_i^2.$$

Note that the r_i's are estimates of the ε_i's; hence the estimator is plausible, up to the somewhat unusual factor 1/(n − p). It will be shown in section 1.4.1 that, due to this factor, E[σ̂²] = σ².
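A minimal sketch in R on our own simulated data (all names and values here are illustrative, not from the notes): least squares via lm(), via the normal equations (1.4), and via the QR decomposition of the footnote.

    set.seed(1)
    n <- 100; p <- 3
    X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # first column: intercept
    Y <- drop(X %*% c(1, 2, -1) + rnorm(n, sd = 0.5))

    fit0 <- lm(Y ~ X - 1)          # lm() uses the QR decomposition internally

    beta.ne <- solve(crossprod(X), crossprod(X, Y))      # normal equations (1.4)

    R <- qr.R(qr(X))               # X = QR; solve the two triangular systems
    beta.qr <- backsolve(R, forwardsolve(t(R), crossprod(X, Y)))

    r <- Y - drop(X %*% beta.qr)   # residuals
    sum(r^2) / (n - p)             # sigma^2 estimate with the 1/(n - p) factor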

[Figure 1.1: The "pill kink". Annual number of births in Switzerland (in 1000) versus year.]

1.3.2 Assumptions for the Linear Model

We emphasize here that we do not make any assumptions on the predictor variables, except that the matrix X has full rank p < n. In particular, the predictor variables can be continuous or discrete (e.g. binary).

We need some assumptions so that fitting a linear model by least squares is reasonable and so that tests and confidence intervals (see 1.5) are approximately valid.

1. The linear regression equation is correct. This means: E[ε_i] = 0 for all i.
2. All x_i's are exact. This means that we can observe them perfectly.
3. The variance of the errors is constant ("homoscedasticity"). This means: Var(ε_i) = σ² for all i.
4. The errors are uncorrelated. This means: Cov(ε_i, ε_j) = 0 for all i ≠ j.
5. The errors {ε_i; i = 1, ..., n} are jointly normally distributed. This implies that also {Y_i; i = 1, ..., n} are jointly normally distributed.

In case of violations of item 3, we can use weighted least squares instead of least squares. Similarly, if item 4 is violated, we can use generalized least squares. If the normality assumption in 5 does not hold, we can use robust methods instead of least squares. If assumption 2 fails to be true, we need corrections known from "errors in variables" methods. If the crucial assumption in 1 fails, we need other models than the linear model.

The following example shows violations of assumptions 1 and 4. The response variable is the annual number of births in Switzerland since 1930, and the predictor variable is the time (year). We see in Figure 1.1 that the data can be approximately described by a linear relation until the "pill kink" in 1964. We also see that the errors seem to be correlated: they are all positive or negative during periods of years. Finally, the linear model is not representative after the pill kink in 1964. In general, it is dangerous to use a fitted model for extrapolation where no predictor variables have been observed (for example: if we had fitted the linear model in 1964 for prediction of the number of births in the future until 2005).

[Figure 1.2: The residual vector r is orthogonal to the plane X spanned by the columns of X.]

1.3.3 Geometrical Interpretation

The response variable Y is a vector in R^n. Also, Xβ describes a p-dimensional subspace X in R^n (through the origin) when varying β ∈ R^p (assuming that X has full rank p). The least squares estimator β̂ is then such that Xβ̂ is closest to Y with respect to the Euclidean distance. But this means geometrically that Xβ̂ is the orthogonal projection of Y onto X.

We denote the (vector of) fitted values by

$$\hat Y = X\hat\beta.$$

They can be viewed as an estimate of Xβ. The (vector of) residuals is defined by

$$r = Y - \hat Y.$$

Geometrically, it is evident that the residuals are orthogonal to X, because Ŷ is the orthogonal projection of Y onto X. This means that

$$r^\top x^{(j)} = 0 \quad \text{for all } j = 1, \dots, p,$$

where x^(j) is the jth column of X.

We can formally see why the map Y ↦ Ŷ is an orthogonal projection. Since Ŷ = Xβ̂ = X(X^⊤X)^{-1}X^⊤Y, the map can be represented by the matrix

$$P = X(X^\top X)^{-1}X^\top. \qquad (1.5)$$
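The projection can be checked numerically; a minimal sketch, reusing X, Y and fit0 from the simulated example above:

    P <- X %*% solve(crossprod(X)) %*% t(X)   # the matrix (1.5)
    Yhat <- drop(P %*% Y)                     # orthogonal projection of Y
    all.equal(Yhat, unname(fitted(fit0)))     # same as lm()'s fitted values
    crossprod(X, Y - Yhat)                    # numerically zero: r orthogonal to X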

It is evident that P is symmetric (P^⊤ = P) and that P is idempotent (P² = P). Furthermore,

$$\sum_i P_{ii} = \operatorname{tr}(P) = \operatorname{tr}\big(X(X^\top X)^{-1}X^\top\big) = \operatorname{tr}\big((X^\top X)^{-1}X^\top X\big) = \operatorname{tr}(I_{p\times p}) = p.$$

But these 3 properties characterize P as an orthogonal projection from R^n onto a p-dimensional subspace, here X. The residuals r can be represented as

$$r = (I - P)Y,$$

where I − P is now also an orthogonal projection, onto the orthogonal complement of X, X^⊥ = R^n \ X, which is (n − p)-dimensional. In particular, the residuals are elements of X^⊥.

1.3.4 Don't do many regressions on single variables!

In general, it is not appropriate to replace multiple regression by many single regressions (on single predictor variables). The following (synthetic) example should help to demonstrate this point. Consider two predictor variables x^(1), x^(2) and a response variable Y. Multiple regression yields the least squares solution which describes the data points exactly,

$$Y_i = \hat Y_i = 2x_{i1} - x_{i2}\quad \text{for all } i\ \ (\hat\sigma^2 = 0). \qquad (1.6)$$

The coefficients 2 and −1, respectively, describe how y changes when varying either x^(1) or x^(2) while keeping the other predictor variable constant. In particular, we see that Y decreases when x^(2) increases.

On the other hand, if we do a simple regression of Y onto x^(2) (while ignoring the values of x^(1), and thus not keeping them constant), we obtain the least squares estimate

$$\hat Y_i = \tfrac{1}{9}x_{i2}\quad\text{for all } i\ \ (\hat\sigma^2 = 1.72).$$

This least squares regression line describes how Y changes when varying x^(2) while ignoring x^(1). In particular, Ŷ increases when x^(2) increases, in contrast to multiple regression!

The reason for this phenomenon is that x^(1) and x^(2) are strongly correlated: if x^(2) increases, then also x^(1) increases. Note that in the multiple regression solution, x^(1) has a larger coefficient in absolute value than x^(2), and hence an increase in x^(1) has a stronger influence on changing y than x^(2). The correlation among the predictors in general also makes the interpretation of the regression coefficients more subtle: in the current setting, the coefficient β_1 quantifies the influence of x^(1) on Y after having subtracted the effect of x^(2) on Y, see also section 1.5.

Summarizing: Simple least squares regressions on single predictor variables yield the multiple regression least squares solution only if the predictor variables are orthogonal. In general, multiple regression is the appropriate tool to include effects of more than one predictor variable simultaneously.
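A minimal sketch of this phenomenon with our own synthetic numbers (the data table of the original example is not reproduced here):

    set.seed(2)
    x1 <- rnorm(50)
    x2 <- x1 + rnorm(50, sd = 0.3)       # x2 strongly correlated with x1
    y  <- 2 * x1 - x2 + rnorm(50, sd = 0.1)
    fit <- lm(y ~ x1 + x2)
    coef(fit)                            # approx. 2 and -1: negative sign for x2
    coef(lm(y ~ x2))                     # positive slope: the sign flips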

The equivalence in case of orthogonal predictors is easy to see algebraically. Orthogonality of predictors means X^⊤X = diag(Σ_{i=1}^n x_i1², ..., Σ_{i=1}^n x_ip²), and hence the least squares estimator is

$$\hat\beta_j = \sum_{i=1}^n x_{ij}Y_i \Big/ \sum_{i=1}^n x_{ij}^2 \quad (j = 1, \dots, p),$$

i.e., β̂_j depends only on the response variable Y_i and the jth predictor variable x_ij.

1.3.5 Computer-Output from R: Part I

We show here parts of the computer output (from R) when fitting a linear model to data about the quality of asphalt.

y  = LOGRUT: log("rate of rutting") = log(change of rut depth in inches per million wheel passes) ["rut": a wheel track, a worn-in groove]
x1 = LOGVISC: log(viscosity of asphalt)
x2 = ASPH: percentage of asphalt in surface course
x3 = BASE: percentage of asphalt in base course
x4 = RUN: 0/1 indicator for two sets of runs
x5 = FINES: 10 × percentage of fines in surface course
x6 = VOIDS: percentage of voids in surface course

The following table shows the least squares estimates β̂_j (j = 1, ..., 7), some empirical quantiles of the residuals r_i (i = 1, ..., n), the estimated standard deviation of the errors (the term "residual standard error" is a misnomer with a long tradition, since "standard error" usually means sqrt(Var(θ̂)) for an estimated parameter θ), and the so-called degrees of freedom n − p.

    Call:
    lm(formula = LOGRUT ~ ., data = asphalt1)

    Residuals:
         Min       1Q   Median       3Q      Max
         ...      ...      ...      ...      ...

    Coefficients:
                Estimate
    (Intercept)      ...
    LOGVISC          ...
    ASPH             ...
    BASE             ...
    RUN              ...
    FINES            ...
    VOIDS            ...

    Residual standard error: ... on 24 degrees of freedom

1.4 Properties of Least Squares Estimates

As an introductory remark, we point out that the least squares estimates are random variables: for new data from the same data-generating mechanism, the data would look different every time, and hence also the least squares estimates. Figure 1.3 displays three least squares regression lines which are based on three different realizations from the same data-generating model (i.e., three simulations from a model). We see that the estimates are varying: they are random themselves!

[Figure 1.3: Three least squares estimated regression lines for three different data realizations from the same model (true line versus least squares lines).]

1.4.1 Moments of least squares estimates

We assume here the usual linear model

$$Y = X\beta + \varepsilon,\quad E[\varepsilon] = 0,\quad \operatorname{Cov}(\varepsilon) = E[\varepsilon\varepsilon^\top] = \sigma^2 I_{n\times n}. \qquad (1.7)$$

This means that assumptions 1-4 from section 1.3.2 are satisfied. It can then be shown that:

(i) E[β̂] = β: that is, β̂ is unbiased
(ii) E[Ŷ] = E[Y] = Xβ, which follows from (i). Moreover, E[r] = 0.
(iii) Cov(β̂) = σ²(X^⊤X)^{-1}
(iv) Cov(Ŷ) = σ²P, Cov(r) = σ²(I − P)

The residuals (which are estimates of the unknown errors ε_i) also have expectation zero, but they are not uncorrelated:

$$\operatorname{Var}(r_i) = \sigma^2(1 - P_{ii}).$$

From this, we obtain

$$E\Big[\sum_{i=1}^n r_i^2\Big] = \sum_{i=1}^n E[r_i^2] = \sum_{i=1}^n \operatorname{Var}(r_i) = \sigma^2\sum_{i=1}^n (1 - P_{ii}) = \sigma^2\big(n - \operatorname{tr}(P)\big) = \sigma^2(n-p).$$

Therefore, E[σ̂²] = E[Σ_{i=1}^n r_i²/(n − p)] = σ², i.e., σ̂² is unbiased.

1.4.2 Distribution of least squares estimates assuming Gaussian errors

We assume the linear model as in (1.7), but require in addition that ε_1, ..., ε_n i.i.d. ~ N(0, σ²). It can then be shown that:

(i) β̂ ~ N_p(β, σ²(X^⊤X)^{-1})
(ii) Ŷ ~ N_n(Xβ, σ²P), r ~ N_n(0, σ²(I − P))
(iii) σ̂² ~ (σ²/(n − p)) χ²_{n−p}.

The normality assumption for the errors ε_i is often not (approximately) fulfilled in practice. We can then rely on the central limit theorem, which implies that for large sample size n, the properties (i)-(iii) above are still approximately true. This is the usual justification in practice for using these properties to construct confidence intervals and tests for the linear model parameters. However, it is often much better to use robust methods in case of non-Gaussian errors, which we are not discussing here.

1.5 Tests and Confidence Regions

We assume the linear model as in (1.7) with ε_1, ..., ε_n i.i.d. ~ N(0, σ²) (or with ε_i's i.i.d. and large sample size n). As we have seen above, the parameter estimates β̂ are normally distributed. If we are interested in whether the jth predictor variable is relevant, we can test the null-hypothesis H_{0,j}: β_j = 0 against the alternative H_{A,j}: β_j ≠ 0. We can then easily derive from the normal distribution of β̂_j that

$$\frac{\hat\beta_j}{\sqrt{\sigma^2 (X^\top X)^{-1}_{jj}}} \sim N(0,1)\quad\text{under the null-hypothesis } H_{0,j}.$$

Since σ² is unknown, this quantity is not useful, but if we substitute it with the estimate σ̂² we obtain the so-called t-test statistic

$$T_j = \frac{\hat\beta_j}{\sqrt{\hat\sigma^2 (X^\top X)^{-1}_{jj}}} \sim t_{n-p}\quad\text{under the null-hypothesis } H_{0,j}, \qquad (1.8)$$

which has a slightly different distribution than the standard Normal N(0,1). The corresponding test is then called the t-test. In practice, we can thus quantify the relevance of individual predictor variables by looking at the size of the test-statistics T_j (j = 1, ..., p) or at the corresponding P-values, which may be more informative.

The problem with looking at individual tests H_{0,j} is (besides the multiple testing problem in general) that it can happen that all individual tests do not reject the null-hypotheses (say at the 5% significance level), although it is true that some predictor variables have a significant effect. This "paradox" can occur because of correlation among the predictor variables.

An individual t-test for H_{0,j} should be interpreted as quantifying the effect of the jth predictor variable after having subtracted the linear effect of all other predictor variables on Y.
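A minimal sketch, continuing the small simulated example from section 1.3.4: the t-statistic (1.8) computed by hand agrees with the summary() output of lm().

    Xm <- model.matrix(fit)
    df <- nrow(Xm) - ncol(Xm)
    sigma2.hat <- sum(residuals(fit)^2) / df
    se <- sqrt(sigma2.hat * diag(solve(crossprod(Xm))))
    Tj <- coef(fit) / se
    cbind(Tj, summary(fit)$coefficients[, "t value"])   # identical columns
    2 * pt(-abs(Tj), df = df)                           # two-sided P-values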

To test whether there exists any effect from the predictor variables, we can look at the simultaneous null-hypothesis H_0: β_2 = ... = β_p = 0 versus the alternative H_A: β_j ≠ 0 for at least one j ∈ {2, ..., p}; we assume here that the first predictor variable is the constant x_{i,1} ≡ 1 (there are p − 1 (non-trivial) predictor variables). Such a test can be developed with an analysis of variance (anova) decomposition, which takes a simple form for this special case:

$$\|Y - \bar Y\|^2 = \|\hat Y - \bar Y\|^2 + \|Y - \hat Y\|^2.$$

This decomposes the total squared error ‖Y − Ȳ‖² around the mean Ȳ = n^{-1} Σ_{i=1}^n Y_i · 1 into the squared error due to the regression, ‖Ŷ − Ȳ‖² (the amount that the fitted values vary around the global arithmetic mean), and the squared residual error ‖r‖² = ‖Y − Ŷ‖². (The equality can be seen most easily from a geometrical point of view: the residuals r are orthogonal to X and hence to Ŷ − Ȳ.) Such a decomposition is usually summarized by an ANOVA table (ANalysis Of VAriance):

                              sum of squares   degrees of freedom   mean square        E[mean square]
    regression                ‖Ŷ − Ȳ‖²         p − 1                ‖Ŷ − Ȳ‖²/(p − 1)   σ² + ‖E[Y] − E[Ȳ]‖²/(p − 1)
    error                     ‖Y − Ŷ‖²         n − p                ‖Y − Ŷ‖²/(n − p)   σ²
    total around global mean  ‖Y − Ȳ‖²         n − 1

In case of the global null-hypothesis, there is no effect of any predictor variable and hence E[Y] ≡ const. = E[Ȳ]: therefore, the expected mean square equals σ² under H_0. The idea is now to divide the mean square by the estimate σ̂² to obtain a scale-free quantity: this leads to the so-called F-statistic

$$F = \frac{\|\hat Y - \bar Y\|^2/(p-1)}{\|Y - \hat Y\|^2/(n-p)} \sim F_{p-1,\,n-p}\quad\text{under the global null-hypothesis } H_0.$$

This test is called the F-test (it is one among several other F-tests in regression).

Besides performing a global F-test to quantify the statistical significance of the predictor variables, we often want to describe the goodness of fit of the linear model for explaining the data. A meaningful quantity is the coefficient of determination, abbreviated R²,

$$R^2 = \frac{\|\hat Y - \bar Y\|^2}{\|Y - \bar Y\|^2}.$$

This is the proportion of the total variation of Y around Ȳ which is explained by the regression (see the ANOVA decomposition and table above).

Similarly to the t-tests in (1.8), one can derive confidence intervals for the unknown parameters β_j:

$$\hat\beta_j \pm \sqrt{\hat\sigma^2 (X^\top X)^{-1}_{jj}}\; t_{n-p;\,1-\alpha/2}$$

is a two-sided confidence interval which covers the true β_j with probability 1 − α; here, t_{n−p; 1−α/2} denotes the 1 − α/2 quantile of a t_{n−p} distribution.

1.5.1 Computer-Output from R: Part II

We consider again the dataset from section 1.3.5. We now give the complete list of summary statistics from a linear model fit to the data.
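A minimal sketch (same simulated fit as above): the global F-statistic and R² computed by hand, matching the summary(fit) output.

    Yhat <- fitted(fit); ybar <- mean(y)
    n <- length(y); p <- length(coef(fit))
    Fstat <- (sum((Yhat - ybar)^2) / (p - 1)) / (sum((y - Yhat)^2) / (n - p))
    R2 <- sum((Yhat - ybar)^2) / sum((y - ybar)^2)
    c(F = Fstat, R2 = R2, P = 1 - pf(Fstat, p - 1, n - p))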

    Call:
    lm(formula = LOGRUT ~ ., data = asphalt1)

    Residuals:
         Min       1Q   Median       3Q      Max
         ...      ...      ...      ...      ...

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)      ...        ...     ...      ...
    LOGVISC          ...        ...     ...  ...e-07
    ASPH             ...        ...     ...      ...
    BASE             ...        ...     ...      ...
    RUN              ...        ...     ...      ...
    FINES            ...        ...     ...      ...
    VOIDS            ...        ...     ...      ...
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: ... on 24 degrees of freedom
    Multiple R-Squared: ..., Adjusted R-squared: ...
    F-statistic: ... on 6 and 24 DF, p-value: < 2.2e-16

The table displays the standard errors of the estimates sqrt(Var(β̂_j)) = sqrt(σ̂²(X^⊤X)^{-1}_{jj}), the t-test statistics for the null-hypotheses H_{0,j}: β_j = 0 and their corresponding two-sided P-values, with some abbreviation about the strength of significance. Moreover, the R² and adjusted R² are given, and finally also the F-test statistic for the null-hypothesis H_0: β_2 = ... = β_p = 0 (with the degrees of freedom) and its corresponding P-value.

1.6 Analysis of residuals and checking of model assumptions

The residuals r_i = Y_i − Ŷ_i can serve as an approximation of the unobservable error terms ε_i and for checking whether the linear model is appropriate.

1.6.1 The Tukey-Anscombe Plot

The Tukey-Anscombe plot is a graphical tool: we plot the residuals r_i (on the y-axis) versus the fitted values Ŷ_i (on the x-axis). A reason to plot against the fitted values Ŷ_i is that the sample correlation between r_i and Ŷ_i is always zero.

In the ideal case, the points in the Tukey-Anscombe plot "fluctuate randomly" around the horizontal line through zero: see also Figure 1.4. An often encountered deviation is non-constant variability of the residuals, i.e., an indication that the variance of ε_i increases with the response variable Y_i: this is shown in Figure 1.5 a)-c). If the Tukey-Anscombe plot shows a trend, there is some evidence that the linear model assumption is not correct (the expectation of the error is not zero, which indicates a systematic error): Figure 1.5 d) is a typical example.

In case the Tukey-Anscombe plot exhibits a systematic relation of the variability on the fitted values Ŷ_i, we should either transform the response variable or perform a weighted regression (see Section 1.6.4). If the standard deviation grows linearly with the fitted values (as in Figure 1.5 a)), the log-transform Y ↦ log(Y) stabilizes the variance; if the standard deviation grows as the square root of the values Ŷ_i (as in Figure 1.5 b)), the square root transformation Y ↦ √Y stabilizes the variance.

[Figure 1.4: Ideal Tukey-Anscombe plot: no violations of model assumptions.]

[Figure 1.5: a) linear increase of standard deviation, b) nonlinear increase of standard deviation, c) 2 groups with different variances, d) missing quadratic term in the model.]

1.6.2 The Normal Plot

Assumptions for the distribution of random variables can be graphically checked with the QQ (quantile-quantile) plot. In the special case of checking for the normal distribution, the QQ plot is also referred to as a normal plot. In the linear model application, we plot the empirical quantiles of the residuals (on the y-axis) versus the theoretical quantiles of a N(0,1) distribution (on the x-axis). If the residuals were normally distributed with expectation μ and variance σ², the normal plot would exhibit an approximate straight line with intercept μ and slope σ. Figures 1.6 and 1.7 show some normal plots with exactly normally and non-normally distributed observations.

1.6.3 Plot for detecting serial correlation

For checking independence of the errors we plot the residuals r_i versus the observation number i (or, if available, the time t_i of recording the ith observation). If the residuals vary randomly around the zero line, there are no indications of serial correlation among the errors ε_i. On the other hand, if neighboring (with respect to the x-axis) residuals look similar, the independence assumption for the errors seems violated.
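A minimal sketch (same fitted lm() object as before): the three residual plots of this section.

    par(mfrow = c(1, 3))
    plot(fitted(fit), residuals(fit), xlab = "fitted values",
         ylab = "residuals", main = "Tukey-Anscombe plot")
    abline(h = 0, lty = 2)
    qqnorm(residuals(fit)); qqline(residuals(fit))        # normal plot
    plot(residuals(fit), type = "b", xlab = "observation number",
         ylab = "residuals", main = "serial correlation")
    abline(h = 0, lty = 2)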

[Figure 1.6: QQ-plots for i.i.d. normally distributed random variables. Two plots for each of the sample sizes a) n = 20, b) n = 100, c) n = 1000.]

[Figure 1.7: QQ-plots for a) long-tailed distribution, b) skewed distribution, c) dataset with outlier.]

1.6.4 Generalized least squares and weighted regression

In a more general situation, the errors are correlated with known covariance matrix,

$$Y = X\beta + \varepsilon,\quad \varepsilon \sim N_n(0, \Sigma).$$

When Σ is known (and also in the case where Σ = σ²G with unknown σ²), this case can be transformed to the i.i.d. one, using a "square root" C such that Σ = CC^⊤ (defined, e.g., via the Cholesky factorization Σ = LL^⊤, where L is a uniquely determined lower triangular matrix): if Ỹ := C^{-1}Y and X̃ := C^{-1}X, we have

$$\tilde Y = \tilde X\beta + \tilde\varepsilon,\quad\text{where } \tilde\varepsilon \sim N(0, I).$$

This leads to the generalized least squares solution β̂ = (X^⊤Σ^{-1}X)^{-1}X^⊤Σ^{-1}Y with Cov(β̂) = (X^⊤Σ^{-1}X)^{-1}.

A special case is where Σ is diagonal, Σ = σ² diag(z_1, z_2, ..., z_n) (with trivial inverse): this is the weighted least squares problem

$$\min_\beta \sum_{i=1}^n w_i (Y_i - x_i^\top\beta)^2, \quad\text{with weights } w_i \propto 1/z_i.$$
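A minimal sketch of weighted least squares in R via the weights argument of lm(), reusing x1, x2 from the earlier simulated example; the variance factors z_i here are hypothetical:

    z <- runif(50, 0.5, 2)                    # hypothetical variance factors z_i
    yw <- 2 * x1 - x2 + rnorm(50, sd = sqrt(z))
    fit.w <- lm(yw ~ x1 + x2, weights = 1 / z)
    summary(fit.w)                            # weights proportional to 1/z_i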

1.7 Model Selection

We assume the linear model

$$Y_i = \sum_{j=1}^p \beta_j x_{ij} + \varepsilon_i\quad (i = 1, \dots, n),$$

where ε_1, ..., ε_n are i.i.d. with E[ε_i] = 0, Var(ε_i) = σ².

Problem: Which of the predictor variables should be used in the linear model?

It may be that not all of the p predictor variables are relevant. In addition, every coefficient has to be estimated and thus is afflicted with variability: the individual variabilities of the coefficients sum up, and the variability of the estimated hyper-plane increases the more predictors are entered into the model, whether they are relevant or not. The aim is often to look for the optimal or "best" model, not the true one.

What we just explained in words can be formalized a bit more. Suppose we are looking to optimize the prediction

$$\sum_{r=1}^q \hat\beta_{j_r} x_{i j_r},$$

which includes q predictor variables with indices j_1, ..., j_q ∈ {1, ..., p}. The average mean squared error of this prediction is

$$n^{-1}\sum_{i=1}^n E\Big[\Big(m(x_i) - \sum_{r=1}^q \hat\beta_{j_r}x_{ij_r}\Big)^2\Big]
= n^{-1}\sum_{i=1}^n \Big(E\Big[\sum_{r=1}^q \hat\beta_{j_r}x_{ij_r}\Big] - m(x_i)\Big)^2
+ \underbrace{n^{-1}\sum_{i=1}^n \operatorname{Var}\Big(\sum_{r=1}^q \hat\beta_{j_r}x_{ij_r}\Big)}_{=\,\frac{q}{n}\sigma^2}, \qquad (1.9)$$

where m(·) denotes the regression function in the full model with all the predictor variables. It is plausible that the systematic error (squared bias) n^{-1} Σ_{i=1}^n (E[Σ_{r=1}^q β̂_{j_r} x_{ij_r}] − m(x_i))² decreases as the number of predictor variables q increases (i.e., with respect to bias, we have nothing to lose by using as many predictors as we can), but the variance term increases linearly in the number of predictors q (the variance term equals q/n · σ², which is not too difficult to derive). This is the so-called bias-variance trade-off, which is present in very many other situations and applications in statistics. Finding the best model thus means optimizing the bias-variance trade-off: this is sometimes also referred to as "regularization" (for avoiding an overly complex model).

1.7.1 Mallows Cp statistic

The mean squared error in (1.9) is unknown: we do not know the magnitude of the bias term, but fortunately, we can estimate the mean squared error.

Denote by SSE(M) the residual sum of squares in a model M: it is overly optimistic and not a good measure for estimating the mean squared error in (1.9). For example, SSE(M) becomes smaller the bigger the model M, and the biggest model under consideration has the lowest SSE (which generally contradicts the equation in (1.9)).

For any (sub-)model M which involves some (or all) of the predictor variables, the mean squared error in (1.9) can be estimated by

$$n^{-1}\,\mathrm{SSE}(M) - \hat\sigma^2 + 2\hat\sigma^2 |M|/n,$$

where σ̂² is the error variance estimate in the full model and SSE(M) is the residual sum of squares in the submodel M. (A justification can be found in the literature.) Thus, in order to estimate the best model, we could search for the sub-model M minimizing the above quantity.

Since σ̂² and n are constants with respect to submodels M, we can also consider the well-known Cp statistic

$$C_p(M) = \frac{\mathrm{SSE}(M)}{\hat\sigma^2} - n + 2|M|$$

and search for the sub-model M minimizing the Cp statistic. Other popular criteria for estimating the predictive potential of an estimated model are Akaike's information criterion (AIC) and the Bayesian information criterion (BIC).

Searching for the best model with respect to Cp

If the full model has p predictor variables, there are 2^p − 1 sub-models (every predictor can be in or out, but we exclude the sub-model M which corresponds to the empty set). Therefore, an exhaustive search for the sub-model M minimizing the Cp statistic is only feasible if p is less than, say, 16 (2^16 − 1 = 65535, which is already fairly large). If p is large, we can proceed with stepwise algorithms, as sketched below.

Forward selection.
1. Start with the smallest model M_0 (location model) as the current model.
2. Include the predictor variable in the current model which reduces the residual sum of squares most.
3. Continue with step 2 until all predictor variables have been chosen or until a large number of predictor variables has been selected. This produces a sequence of sub-models M_0 ⊆ M_1 ⊆ M_2 ⊆ ...
4. Choose the model in the sequence M_0 ⊆ M_1 ⊆ M_2 ⊆ ... which has the smallest Cp statistic.

Backward selection.
1. Start with the full model M_0 as the current model.
2. Exclude the predictor variable from the current model which increases the residual sum of squares the least.
3. Continue with step 2 until all predictor variables have been deleted (or a large number of predictor variables). This produces a sequence of sub-models M_0 ⊇ M_1 ⊇ M_2 ⊇ ...
4. Choose the model in the sequence M_0 ⊇ M_1 ⊇ M_2 ⊇ ... which has the smallest Cp statistic.

Backward selection is typically a bit better than forward selection, but it is computationally more expensive. Also, in the case where p ≫ n, we do not want to fit the full model, and forward selection is an appropriate way to proceed.
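A minimal sketch of stepwise search in R with step(), which uses AIC; for a Gaussian linear model, AIC ranks submodels similarly to Mallows' Cp. This assumes the asphalt1 data frame of section 1.3.5 is available.

    full  <- lm(LOGRUT ~ ., data = asphalt1)
    empty <- lm(LOGRUT ~ 1, data = asphalt1)     # the location model
    step(empty, scope = formula(full), direction = "forward")
    step(full, direction = "backward")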

Chapter 2

Nonparametric Density Estimation

2.1 Introduction

For a moment, we will go back to simple data structures: we have observations which are realizations of univariate random variables,

$$X_1, \dots, X_n\ \text{i.i.d.} \sim F,$$

where F denotes an unknown cumulative distribution function. The goal is to estimate the distribution F. In particular, we are interested in estimating the density f = F', assuming that it exists.

Instead of assuming a parametric model for the distribution (e.g. a Normal distribution with unknown expectation and variance), we rather want to be "as general as possible": that is, we only assume that the density exists and is suitably smooth (e.g. differentiable). It is then possible to estimate the unknown density function f(·). Mathematically, a function is an infinite-dimensional object. Density estimation will become a basic principle of how to do estimation for infinite-dimensional objects. We will make use of such a principle in many other settings, such as nonparametric regression with one predictor variable (Chapter 3) and flexible regression and classification methods with many predictor variables (Chapter 7).

2.2 Estimation of a density

We consider the data which records the duration of eruptions of "Old Faithful", a famous geyser in Yellowstone National Park (Wyoming, USA). You can watch it via web-cam.

2.2.1 Histogram

The histogram is the oldest and most popular density estimator. We need to specify an "origin" x_0 and the class width h for the specification of the intervals

$$I_j = (x_0 + j\,h,\ x_0 + (j+1)h] \quad (j = \dots, -1, 0, 1, \dots),$$

for which the histogram counts the number of observations falling into each I_j: we then plot the histogram such that the area of each bar is proportional to the number of observations falling into the corresponding class (interval I_j).

[Figure 2.1: Histograms (different class widths) for the durations ("length [min]") of eruptions of the Old Faithful geyser in Yellowstone Park (n = 272, data(faithful)).]

The choice of the origin x_0 is highly arbitrary, whereas the role of the class width is immediately clear for the user. The form of the histogram depends very much on these two tuning parameters.

2.2.2 Kernel estimator

The naive estimator

Similar to the histogram, we can compute the relative frequency of observations falling into a small region. The density function f(·) at a point x can be represented as

$$f(x) = \lim_{h\to 0}\frac{1}{2h}\,P[x - h < X \le x + h]. \qquad (2.1)$$

The naive estimator is then constructed without taking the limit in (2.1) and by replacing probabilities with relative frequencies:

$$\hat f(x) = \frac{1}{2hn}\,\#\{i;\ X_i \in (x-h,\, x+h]\}. \qquad (2.2)$$

This naive estimator is only piecewise constant, since every X_i is either in or out of the interval (x − h, x + h]. As for histograms, we also need to specify the so-called bandwidth h; but in contrast to the histogram, we do not need to specify an origin x_0.

An alternative representation of the naive estimator (2.2) is as follows. Define the weight function

$$w(x) = \begin{cases} 1/2 & \text{if } |x| \le 1,\\ 0 & \text{otherwise.} \end{cases}$$

Then,

$$\hat f(x) = \frac{1}{nh}\sum_{i=1}^n w\Big(\frac{x - X_i}{h}\Big).$$
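A minimal sketch: the naive estimator (2.2) coded directly and evaluated on the Old Faithful eruption durations (grid and bandwidth are our own choices).

    naive.dens <- function(x, X, h)     # evaluate (2.2) at each point of x
      sapply(x, function(u) sum(X > u - h & X <= u + h) / (2 * h * length(X)))
    data(faithful)
    xs <- seq(1, 6, length.out = 200)
    plot(xs, naive.dens(xs, faithful$eruptions, h = 0.2), type = "s",
         xlab = "length [min]", ylab = "density estimate")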

If we choose, instead of the rectangle weight function w(·), a general, typically smoother kernel function K(·), we have the definition of the kernel density estimator

$$\hat f(x) = \frac{1}{nh}\sum_{i=1}^n K\Big(\frac{x - X_i}{h}\Big),\qquad K(x)\ge 0,\quad \int K(x)\,dx = 1,\quad K(x) = K(-x). \qquad (2.3)$$

The estimator depends on the bandwidth h > 0, which acts as a tuning parameter. For a large bandwidth h, the estimate f̂(x) tends to be very slowly varying as a function of x, while small bandwidths will produce a more wiggly function estimate. The positivity of the kernel function K(·) guarantees a positive density estimate f̂(·), and the normalization ∫K(x)dx = 1 implies that ∫f̂(x)dx = 1, which is necessary for f̂(·) to be a density. Typically, the kernel function K(·) is chosen as a probability density which is symmetric around 0.

The smoothness of f̂(·) is inherited from the smoothness of the kernel: if the rth derivative K^(r)(x) exists for all x, then f̂^(r)(x) exists as well for all x (easy to verify using the chain rule of differentiation).

Popular kernels are the Gaussian kernel

$$K(x) = \varphi(x) = (2\pi)^{-1/2}e^{-x^2/2}$$

(the density of the N(0,1) distribution) or a kernel with finite support such as K(x) = (π/4) cos((π/2)x) 1(|x| ≤ 1). The Epanechnikov kernel, which is optimal with respect to mean squared error, is

$$K(x) = \frac{3}{4}\big(1 - |x|^2\big)\,1(|x| \le 1).$$

But far more important than the kernel is the bandwidth h, see Figure 2.2: its role and how to choose it are discussed below.

[Figure 2.2: Kernel density estimates of the Old Faithful eruption lengths; Gaussian kernel and several bandwidths h.]

2.3 The role of the bandwidth

The bandwidth h is often also called the "smoothing parameter": a moment of thought will reveal that for h → 0, we will have δ-spikes at every observation X_i, whereas f̂(·) = f̂_h(·) becomes smoother as h increases.
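A minimal sketch of the bandwidth's effect with density() (Gaussian kernel) on the Old Faithful eruption lengths; the bandwidth values are our own choices.

    data(faithful)
    x <- faithful$eruptions
    plot(density(x, bw = 0.04), main = "h = 0.04, 0.1, 0.3")  # wiggly estimate
    lines(density(x, bw = 0.1), col = 2)
    lines(density(x, bw = 0.3), col = 3)                      # larger h: smoother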

2.3.1 Variable bandwidths: k nearest neighbors

Instead of using a global bandwidth, we can use locally changing bandwidths h(x). The general idea is to use a large bandwidth for regions where the data is sparse. The k-nearest neighbor idea is to choose

$$h(x) = \text{Euclidean distance of } x \text{ to the } k\text{th nearest observation},$$

where k regulates the magnitude of the bandwidth. Note that generally, f̂_{h(x)}(·) will not be a density anymore, since the integral is not necessarily equal to one.

2.3.2 The bias-variance trade-off

We can formalize the behavior of f̂(·) when varying the bandwidth h in terms of bias and variance of the estimator. It is important to understand heuristically that the (absolute value of the) bias of f̂ increases and the variance of f̂ decreases as h increases. Therefore, if we want to minimize the mean squared error MSE(x) at a point x,

$$\mathrm{MSE}(x) = E\Big[\big(\hat f(x) - f(x)\big)^2\Big] = \big(E[\hat f(x)] - f(x)\big)^2 + \operatorname{Var}(\hat f(x)),$$

we are confronted with a bias-variance trade-off. As a consequence, this allows us, at least conceptually, to optimize the bandwidth parameter (namely, to minimize the mean squared error) in a well-defined, coherent way.

Instead of optimizing the mean squared error at a point x, one may want to optimize the integrated mean squared error (IMSE)

$$\mathrm{IMSE} = \int \mathrm{MSE}(x)\,dx,$$

which yields an integrated decomposition of squared bias and variance (integration is over the support of X). Since the integrand is non-negative, the order of integration (over the support of X and over the probability space of X) can be reversed, denoted as MISE (mean integrated squared error) and written as

$$\mathrm{MISE} = E\Big[\int\big(\hat f(x) - f(x)\big)^2\,dx\Big] = E[\mathrm{ISE}], \qquad (2.4)$$

where ISE = ∫(f̂(x) − f(x))² dx.

2.3.3 Asymptotic bias and variance

It is straightforward (using definitions) to give an expression for the exact bias and variance:

$$E[\hat f(x)] = \frac{1}{h}\int K\Big(\frac{x-y}{h}\Big)f(y)\,dy,$$

$$\operatorname{Var}(\hat f(x)) = \frac{1}{nh^2}\operatorname{Var}\Big(K\Big(\frac{x-X_i}{h}\Big)\Big)
= \frac{1}{nh^2}E\Big[K\Big(\frac{x-X_i}{h}\Big)^2\Big] - \frac{1}{nh^2}E\Big[K\Big(\frac{x-X_i}{h}\Big)\Big]^2$$
$$= n^{-1}h^{-2}\int K\Big(\frac{x-y}{h}\Big)^2 f(y)\,dy - n^{-1}\Big(h^{-1}\int K\Big(\frac{x-y}{h}\Big)f(y)\,dy\Big)^2. \qquad (2.5)$$

For the bias we therefore get (by a change of variable and K(−z) = K(z))

$$\mathrm{Bias}(x) = \frac{1}{h}\int K\Big(\frac{x-y}{h}\Big)f(y)\,dy - f(x)
\underset{z=(y-x)/h}{=} \int K(z)f(x+hz)\,dz - f(x) = \int K(z)\big(f(x+hz) - f(x)\big)\,dz. \qquad (2.6)$$

To approximate this expression in general, we invoke an asymptotic argument. We assume that h → 0 as the sample size n → ∞, that is: h = h_n → 0 with n h_n → ∞. This will imply that the bias goes to zero, since h_n → 0; the second condition requires that h_n goes to zero more slowly than 1/n, which turns out to imply that also the variance of the estimator will go to zero as n → ∞. To see this, we use a Taylor expansion of f, assuming that f is sufficiently smooth:

$$f(x+hz) = f(x) + hzf'(x) + \tfrac{1}{2}h^2z^2f''(x) + \dots$$

Plugging this into (2.6) yields

$$\mathrm{Bias}(x) = hf'(x)\underbrace{\int zK(z)\,dz}_{=\,0} + \tfrac{1}{2}h^2f''(x)\int z^2K(z)\,dz + \text{higher order terms in } h.$$

For the variance, we get from (2.5)

$$\operatorname{Var}(\hat f(x)) = n^{-1}h^{-2}\int K\Big(\frac{x-y}{h}\Big)^2 f(y)\,dy - \underbrace{n^{-1}\big(f(x) + \mathrm{Bias}(x)\big)^2}_{=\,O(n^{-1})}
= n^{-1}h^{-1}\int f(x-hz)K(z)^2\,dz + O(n^{-1}) = n^{-1}h^{-1}f(x)\int K(z)^2\,dz + o(n^{-1}h^{-1}),$$

assuming that f is smooth and hence f(x − hz) → f(x) as h_n → 0. In summary: for h = h_n → 0 and h_n n → ∞ as n → ∞,

$$\mathrm{Bias}(x) = h^2 f''(x)\int z^2K(z)\,dz/2 + o(h^2) \quad (n\to\infty),$$
$$\operatorname{Var}(\hat f(x)) = (nh)^{-1}f(x)\int K(z)^2\,dz + o((nh)^{-1}) \quad (n\to\infty).$$

The optimal bandwidth h = h_n which minimizes the leading term in the asymptotic MSE(x) can be calculated straightforwardly by solving ∂/∂h MSE(x) = 0:

$$h_{\mathrm{opt}}(x) = n^{-1/5}\Big(\frac{f(x)\int K^2(z)\,dz}{(f''(x))^2\,(\int z^2K(z)\,dz)^2}\Big)^{1/5}. \qquad (2.7)$$

Since it is not straightforward to estimate and use a local bandwidth h(x), one rather considers minimizing the MISE, i.e., ∫MSE(x)dx, which is asymptotically

$$\text{asympt. MISE} = \int \mathrm{Bias}(x)^2 + \operatorname{Var}(\hat f(x))\,dx = \frac{1}{4}h^4 R(f'')\,\sigma_K^4 + R(K)/(nh), \qquad (2.8)$$

where R(g) = ∫g²(x)dx and σ²_K = ∫x²K(x)dx, and the globally asymptotically optimal bandwidth becomes

$$h_{\mathrm{opt}} = n^{-1/5}\Big(\frac{R(K)/\sigma_K^4}{R(f'')}\Big)^{1/5}. \qquad (2.9)$$

By replacing h with h_opt, e.g., in (2.8), we see that both the variance and bias terms are of order O(n^{-4/5}), the optimal rate for the MISE and MSE(x). From section 2.4.1, this rate is also optimal for a much larger class of density estimators.

2.3.4 Estimating the bandwidth

As seen from (2.9), the asymptotically best bandwidth depends on R(f'') = ∫f''(x)²dx, which is unknown (whereas R(K) and σ²_K are known). It is possible to estimate f'' again by a kernel estimator with an "initial" bandwidth h_init (sometimes called a pilot bandwidth), yielding f̂''_{h_init}. Plugging this estimate into (2.9) yields an estimated bandwidth ĥ for the density estimator f̂(·) (the original problem): of course, ĥ depends on the initial bandwidth h_init, but choosing h_init in an ad-hoc way is less critical for the density estimator than choosing the bandwidth h itself. Furthermore, methods have been devised to determine h_init and h simultaneously (e.g., Sheather-Jones; in R using density(*, bw="SJ")).

Estimating local bandwidths

Note that the local bandwidth selection h_opt(x) in (2.7) is more problematic, mainly because f̂_{ĥ_opt(x)}(x) will not integrate to one without further normalization. On the other hand, it can be important to use locally varying bandwidths instead of a single global one in a kernel estimator, at the expense of being more difficult. The plug-in procedure outlined above can be applied locally, i.e., conceptually for each x, and hence describes how to estimate local bandwidths from data and how to implement a kernel estimator with locally varying bandwidths. In the related area of nonparametric regression, in section 3.2.1, we will show an example of locally changing bandwidths which are estimated based on an iterative version of the (local) plug-in idea above.

Other density estimators

There are quite a few other approaches to density estimation than the kernel estimators above (whereas in practice, the fixed-bandwidth kernel estimators are used predominantly because of their simplicity). One important approach in particular aims to estimate the log density log f(x) (setting f̂ = exp(log f)-hat), which has no positivity constraints and whose normal limit is a simple quadratic. One good implementation is in Kooperberg's R package logspline, where spline knots are placed by a stepwise algorithm minimizing an approximate BIC (or AIC). This can be seen as another version of locally varying bandwidths.
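A minimal sketch: the Sheather-Jones plug-in bandwidth in R, determined from the data itself.

    data(faithful)
    x <- faithful$eruptions
    bw.SJ(x)                       # Sheather-Jones plug-in bandwidth
    plot(density(x, bw = "SJ"))    # kernel estimate using it directly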

2.4 Higher dimensions

Quite many applications involve multivariate data. For simplicity, consider data which are i.i.d. realizations of d-dimensional random variables

$$X_1, \dots, X_n\ \text{i.i.d.} \sim f(x_1, \dots, x_d)\,dx_1\cdots dx_d,$$

where f(·) denotes the multivariate density. The multivariate kernel density estimator is, in its simplest form, defined as

$$\hat f(x) = \frac{1}{nh^d}\sum_{i=1}^n K\Big(\frac{x - X_i}{h}\Big),$$

where the kernel K(·) is now a function, defined for d-dimensional x, satisfying

$$K(u) \ge 0,\quad \int_{\mathbb{R}^d} K(u)\,du = 1,\quad \int_{\mathbb{R}^d} u\,K(u)\,du = 0,\quad \int_{\mathbb{R}^d} uu^\top K(u)\,du = I_d.$$

Usually, the kernel K(·) is chosen as a product of a kernel K_univ for univariate density estimation:

$$K(u) = \prod_{j=1}^d K_{\mathrm{univ}}(u_j).$$

If one additionally desires the multivariate kernel K(u) to be radially symmetric, it can be shown that K must be the multivariate normal (Gaussian) density, K(u) = c_d exp(−u^⊤u/2).

2.4.1 The curse of dimensionality

In practice, multivariate kernel density estimation is often restricted to dimension d = 2. The reason is that a higher-dimensional space (with d of medium size or large) will be only very sparsely populated by data points. In other words, there will be only very few neighboring data points to any value x in a higher-dimensional space, unless the sample size is extremely large. This phenomenon is also called the curse of dimensionality.

An implication of the curse of dimensionality is the following lower bound for the best mean squared error of nonparametric density estimators (assuming that the underlying density is twice differentiable): it has been shown that the best possible MSE rate is O(n^{-4/(4+d)}). The following table evaluates n^{-4/(4+d)} for various n and d:

                  d = 1    d = 2    d = 3    d = 5    d = 10
    n = 100      0.0251   0.0464   0.0720   0.1292   0.2683
    n = 1000     0.0040   0.0100   0.0193   0.0464   0.1389
    n = 100000   0.0001   0.0005   0.0014   0.0060   0.0373

Thus, for d = 10, the rate with n = 100,000 is still 1.5 times worse than for d = 1 and n = 100.
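The table entries follow directly from the rate formula; a minimal sketch reproducing them:

    n <- c(100, 1000, 100000)
    d <- c(1, 2, 3, 5, 10)
    rates <- outer(n, d, function(n, d) n^(-4 / (4 + d)))
    dimnames(rates) <- list(paste("n =", n), paste("d =", d))
    round(rates, 4)
    rates[3, 5] / rates[1, 1]    # approx. 1.5, as stated in the text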


Chapter 3

Nonparametric Regression

3.1 Introduction

We consider here nonparametric regression with one predictor variable. Practically relevant generalizations to more than one or two predictor variables are not so easy due to the curse of dimensionality mentioned in section 2.4.1, and they often require different approaches, as will be discussed later in Chapter 7.

Figure 3.1 shows (several identical) scatter plots of (x_i, Y_i) (i = 1, ..., n). We can model such data as

$$Y_i = m(x_i) + \varepsilon_i, \qquad (3.1)$$

where ε_1, ..., ε_n are i.i.d. with E[ε_i] = 0 and m: R → R is an "arbitrary" function. The function m(·) is called the nonparametric regression function, and it satisfies m(x) = E[Y|x]. The restriction we make for m(·) is that it fulfills some kind of smoothness conditions. The regression function in Figure 3.1 does not appear to be linear in x, and linear regression is not a good model. The flexibility to allow for an "arbitrary" regression function is very desirable; but of course, such flexibility has its price, namely an inferior estimation accuracy than for linear regression.

3.2 The kernel regression estimator

We can view the regression function in (3.1) as

$$m(x) = E[Y \mid X = x]$$

(assuming that X is random and X_i = x_i are realized values of the random variables). We can express this conditional expectation as

$$\int_{\mathbb{R}} y\,f_{Y|X}(y \mid x)\,dy = \frac{\int_{\mathbb{R}} y\,f_{X,Y}(x,y)\,dy}{f_X(x)},$$

where f_{Y|X}, f_{X,Y}, f_X denote the conditional, joint and marginal densities. We can now plug in the univariate and bivariate kernel density estimates (all with the same univariate kernel K)

$$\hat f_X(x) = \frac{\sum_{i=1}^n K\big(\frac{x-x_i}{h}\big)}{nh},\qquad
\hat f_{X,Y}(x,y) = \frac{\sum_{i=1}^n K\big(\frac{x-x_i}{h}\big)\,K\big(\frac{y-Y_i}{h}\big)}{nh^2}.$$
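Carrying out this plug-in (the kernel in y integrates to one and has mean Y_i, so integrating out y) yields the Nadaraya-Watson kernel regression estimator m̂(x) = Σ_{i=1}^n K((x − x_i)/h) Y_i / Σ_{i=1}^n K((x − x_i)/h). A minimal sketch on simulated data of our own choosing; ksmooth() in base R implements this estimator (its bandwidth argument is on a different scale than h).

    set.seed(3)
    x <- sort(runif(100, 0, 3))
    y <- sin(2 * x) + rnorm(100, sd = 0.3)   # a nonlinear m(x), our choice
    plot(x, y)
    lines(ksmooth(x, y, kernel = "normal", bandwidth = 0.5), col = 2)

    m.hat <- function(u, h = 0.2)            # the estimator coded directly
      sapply(u, function(u0) { w <- dnorm((u0 - x) / h); sum(w * y) / sum(w) })
    lines(x, m.hat(x), col = 4, lty = 2)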


More information

IN a recent article, Geary [1972] discussed the merit of taking first differences

IN a recent article, Geary [1972] discussed the merit of taking first differences The Efficiency f Taking First Differences in Regressin Analysis: A Nte J. A. TILLMAN IN a recent article, Geary [1972] discussed the merit f taking first differences t deal with the prblems that trends

More information

MATHEMATICS SYLLABUS SECONDARY 5th YEAR

MATHEMATICS SYLLABUS SECONDARY 5th YEAR Eurpean Schls Office f the Secretary-General Pedaggical Develpment Unit Ref. : 011-01-D-8-en- Orig. : EN MATHEMATICS SYLLABUS SECONDARY 5th YEAR 6 perid/week curse APPROVED BY THE JOINT TEACHING COMMITTEE

More information

AP Statistics Practice Test Unit Three Exploring Relationships Between Variables. Name Period Date

AP Statistics Practice Test Unit Three Exploring Relationships Between Variables. Name Period Date AP Statistics Practice Test Unit Three Explring Relatinships Between Variables Name Perid Date True r False: 1. Crrelatin and regressin require explanatry and respnse variables. 1. 2. Every least squares

More information

LHS Mathematics Department Honors Pre-Calculus Final Exam 2002 Answers

LHS Mathematics Department Honors Pre-Calculus Final Exam 2002 Answers LHS Mathematics Department Hnrs Pre-alculus Final Eam nswers Part Shrt Prblems The table at the right gives the ppulatin f Massachusetts ver the past several decades Using an epnential mdel, predict the

More information

CS 477/677 Analysis of Algorithms Fall 2007 Dr. George Bebis Course Project Due Date: 11/29/2007

CS 477/677 Analysis of Algorithms Fall 2007 Dr. George Bebis Course Project Due Date: 11/29/2007 CS 477/677 Analysis f Algrithms Fall 2007 Dr. Gerge Bebis Curse Prject Due Date: 11/29/2007 Part1: Cmparisn f Srting Algrithms (70% f the prject grade) The bjective f the first part f the assignment is

More information

Computational modeling techniques

Computational modeling techniques Cmputatinal mdeling techniques Lecture 4: Mdel checing fr ODE mdels In Petre Department f IT, Åb Aademi http://www.users.ab.fi/ipetre/cmpmd/ Cntent Stichimetric matrix Calculating the mass cnservatin relatins

More information

Computational modeling techniques

Computational modeling techniques Cmputatinal mdeling techniques Lecture 2: Mdeling change. In Petre Department f IT, Åb Akademi http://users.ab.fi/ipetre/cmpmd/ Cntent f the lecture Basic paradigm f mdeling change Examples Linear dynamical

More information

Discussion on Regularized Regression for Categorical Data (Tutz and Gertheiss)

Discussion on Regularized Regression for Categorical Data (Tutz and Gertheiss) Discussin n Regularized Regressin fr Categrical Data (Tutz and Gertheiss) Peter Bühlmann, Ruben Dezeure Seminar fr Statistics, Department f Mathematics, ETH Zürich, Switzerland Address fr crrespndence:

More information

Least Squares Optimal Filtering with Multirate Observations

Least Squares Optimal Filtering with Multirate Observations Prc. 36th Asilmar Cnf. n Signals, Systems, and Cmputers, Pacific Grve, CA, Nvember 2002 Least Squares Optimal Filtering with Multirate Observatins Charles W. herrien and Anthny H. Hawes Department f Electrical

More information

In SMV I. IAML: Support Vector Machines II. This Time. The SVM optimization problem. We saw:

In SMV I. IAML: Support Vector Machines II. This Time. The SVM optimization problem. We saw: In SMV I IAML: Supprt Vectr Machines II Nigel Gddard Schl f Infrmatics Semester 1 We sa: Ma margin trick Gemetry f the margin and h t cmpute it Finding the ma margin hyperplane using a cnstrained ptimizatin

More information

1 The limitations of Hartree Fock approximation

1 The limitations of Hartree Fock approximation Chapter: Pst-Hartree Fck Methds - I The limitatins f Hartree Fck apprximatin The n electrn single determinant Hartree Fck wave functin is the variatinal best amng all pssible n electrn single determinants

More information

MATCHING TECHNIQUES. Technical Track Session VI. Emanuela Galasso. The World Bank

MATCHING TECHNIQUES. Technical Track Session VI. Emanuela Galasso. The World Bank MATCHING TECHNIQUES Technical Track Sessin VI Emanuela Galass The Wrld Bank These slides were develped by Christel Vermeersch and mdified by Emanuela Galass fr the purpse f this wrkshp When can we use

More information

Math Foundations 20 Work Plan

Math Foundations 20 Work Plan Math Fundatins 20 Wrk Plan Units / Tpics 20.8 Demnstrate understanding f systems f linear inequalities in tw variables. Time Frame December 1-3 weeks 6-10 Majr Learning Indicatrs Identify situatins relevant

More information

Lead/Lag Compensator Frequency Domain Properties and Design Methods

Lead/Lag Compensator Frequency Domain Properties and Design Methods Lectures 6 and 7 Lead/Lag Cmpensatr Frequency Dmain Prperties and Design Methds Definitin Cnsider the cmpensatr (ie cntrller Fr, it is called a lag cmpensatr s K Fr s, it is called a lead cmpensatr Ntatin

More information

Part 3 Introduction to statistical classification techniques

Part 3 Introduction to statistical classification techniques Part 3 Intrductin t statistical classificatin techniques Machine Learning, Part 3, March 07 Fabi Rli Preamble ØIn Part we have seen that if we knw: Psterir prbabilities P(ω i / ) Or the equivalent terms

More information

x 1 Outline IAML: Logistic Regression Decision Boundaries Example Data

x 1 Outline IAML: Logistic Regression Decision Boundaries Example Data Outline IAML: Lgistic Regressin Charles Suttn and Victr Lavrenk Schl f Infrmatics Semester Lgistic functin Lgistic regressin Learning lgistic regressin Optimizatin The pwer f nn-linear basis functins Least-squares

More information

Linear programming III

Linear programming III Linear prgramming III Review 1/33 What have cvered in previus tw classes LP prblem setup: linear bjective functin, linear cnstraints. exist extreme pint ptimal slutin. Simplex methd: g thrugh extreme pint

More information

Lecture 3: Principal Components Analysis (PCA)

Lecture 3: Principal Components Analysis (PCA) Lecture 3: Principal Cmpnents Analysis (PCA) Reading: Sectins 6.3.1, 10.1, 10.2, 10.4 STATS 202: Data mining and analysis Jnathan Taylr, 9/28 Slide credits: Sergi Bacallad 1 / 24 The bias variance decmpsitin

More information

Support Vector Machines and Flexible Discriminants

Support Vector Machines and Flexible Discriminants 12 Supprt Vectr Machines and Flexible Discriminants This is page 417 Printer: Opaque this 12.1 Intrductin In this chapter we describe generalizatins f linear decisin bundaries fr classificatin. Optimal

More information

CHAPTER 4 DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS

CHAPTER 4 DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS CHAPTER 4 DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS 1 Influential bservatins are bservatins whse presence in the data can have a distrting effect n the parameter estimates and pssibly the entire analysis,

More information

Determining the Accuracy of Modal Parameter Estimation Methods

Determining the Accuracy of Modal Parameter Estimation Methods Determining the Accuracy f Mdal Parameter Estimatin Methds by Michael Lee Ph.D., P.E. & Mar Richardsn Ph.D. Structural Measurement Systems Milpitas, CA Abstract The mst cmmn type f mdal testing system

More information

STATS216v Introduction to Statistical Learning Stanford University, Summer Practice Final (Solutions) Duration: 3 hours

STATS216v Introduction to Statistical Learning Stanford University, Summer Practice Final (Solutions) Duration: 3 hours STATS216v Intrductin t Statistical Learning Stanfrd University, Summer 2016 Practice Final (Slutins) Duratin: 3 hurs Instructins: (This is a practice final and will nt be graded.) Remember the university

More information

February 28, 2013 COMMENTS ON DIFFUSION, DIFFUSIVITY AND DERIVATION OF HYPERBOLIC EQUATIONS DESCRIBING THE DIFFUSION PHENOMENA

February 28, 2013 COMMENTS ON DIFFUSION, DIFFUSIVITY AND DERIVATION OF HYPERBOLIC EQUATIONS DESCRIBING THE DIFFUSION PHENOMENA February 28, 2013 COMMENTS ON DIFFUSION, DIFFUSIVITY AND DERIVATION OF HYPERBOLIC EQUATIONS DESCRIBING THE DIFFUSION PHENOMENA Mental Experiment regarding 1D randm walk Cnsider a cntainer f gas in thermal

More information

Preparation work for A2 Mathematics [2017]

Preparation work for A2 Mathematics [2017] Preparatin wrk fr A2 Mathematics [2017] The wrk studied in Y12 after the return frm study leave is frm the Cre 3 mdule f the A2 Mathematics curse. This wrk will nly be reviewed during Year 13, it will

More information

This section is primarily focused on tools to aid us in finding roots/zeros/ -intercepts of polynomials. Essentially, our focus turns to solving.

This section is primarily focused on tools to aid us in finding roots/zeros/ -intercepts of polynomials. Essentially, our focus turns to solving. Sectin 3.2: Many f yu WILL need t watch the crrespnding vides fr this sectin n MyOpenMath! This sectin is primarily fcused n tls t aid us in finding rts/zers/ -intercepts f plynmials. Essentially, ur fcus

More information

Lecture 10, Principal Component Analysis

Lecture 10, Principal Component Analysis Principal Cmpnent Analysis Lecture 10, Principal Cmpnent Analysis Ha Helen Zhang Fall 2017 Ha Helen Zhang Lecture 10, Principal Cmpnent Analysis 1 / 16 Principal Cmpnent Analysis Lecture 10, Principal

More information

Hypothesis Tests for One Population Mean

Hypothesis Tests for One Population Mean Hypthesis Tests fr One Ppulatin Mean Chapter 9 Ala Abdelbaki Objective Objective: T estimate the value f ne ppulatin mean Inferential statistics using statistics in rder t estimate parameters We will be

More information

NUMBERS, MATHEMATICS AND EQUATIONS

NUMBERS, MATHEMATICS AND EQUATIONS AUSTRALIAN CURRICULUM PHYSICS GETTING STARTED WITH PHYSICS NUMBERS, MATHEMATICS AND EQUATIONS An integral part t the understanding f ur physical wrld is the use f mathematical mdels which can be used t

More information

ENSC Discrete Time Systems. Project Outline. Semester

ENSC Discrete Time Systems. Project Outline. Semester ENSC 49 - iscrete Time Systems Prject Outline Semester 006-1. Objectives The gal f the prject is t design a channel fading simulatr. Upn successful cmpletin f the prject, yu will reinfrce yur understanding

More information

Checking the resolved resonance region in EXFOR database

Checking the resolved resonance region in EXFOR database Checking the reslved resnance regin in EXFOR database Gttfried Bertn Sciété de Calcul Mathématique (SCM) Oscar Cabells OECD/NEA Data Bank JEFF Meetings - Sessin JEFF Experiments Nvember 0-4, 017 Bulgne-Billancurt,

More information

MATCHING TECHNIQUES Technical Track Session VI Céline Ferré The World Bank

MATCHING TECHNIQUES Technical Track Session VI Céline Ferré The World Bank MATCHING TECHNIQUES Technical Track Sessin VI Céline Ferré The Wrld Bank When can we use matching? What if the assignment t the treatment is nt dne randmly r based n an eligibility index, but n the basis

More information

Modelling of Clock Behaviour. Don Percival. Applied Physics Laboratory University of Washington Seattle, Washington, USA

Modelling of Clock Behaviour. Don Percival. Applied Physics Laboratory University of Washington Seattle, Washington, USA Mdelling f Clck Behaviur Dn Percival Applied Physics Labratry University f Washingtn Seattle, Washingtn, USA verheads and paper fr talk available at http://faculty.washingtn.edu/dbp/talks.html 1 Overview

More information

Linear Methods for Regression

Linear Methods for Regression 3 Linear Methds fr Regressin This is page 43 Printer: Opaque this 3.1 Intrductin A linear regressin mdel assumes that the regressin functin E(Y X) is linear in the inputs X 1,...,X p. Linear mdels were

More information

Fall 2013 Physics 172 Recitation 3 Momentum and Springs

Fall 2013 Physics 172 Recitation 3 Momentum and Springs Fall 03 Physics 7 Recitatin 3 Mmentum and Springs Purpse: The purpse f this recitatin is t give yu experience wrking with mmentum and the mmentum update frmula. Readings: Chapter.3-.5 Learning Objectives:.3.

More information

MODULE FOUR. This module addresses functions. SC Academic Elementary Algebra Standards:

MODULE FOUR. This module addresses functions. SC Academic Elementary Algebra Standards: MODULE FOUR This mdule addresses functins SC Academic Standards: EA-3.1 Classify a relatinship as being either a functin r nt a functin when given data as a table, set f rdered pairs, r graph. EA-3.2 Use

More information

the results to larger systems due to prop'erties of the projection algorithm. First, the number of hidden nodes must

the results to larger systems due to prop'erties of the projection algorithm. First, the number of hidden nodes must M.E. Aggune, M.J. Dambrg, M.A. El-Sharkawi, R.J. Marks II and L.E. Atlas, "Dynamic and static security assessment f pwer systems using artificial neural netwrks", Prceedings f the NSF Wrkshp n Applicatins

More information

ENGI 4430 Parametric Vector Functions Page 2-01

ENGI 4430 Parametric Vector Functions Page 2-01 ENGI 4430 Parametric Vectr Functins Page -01. Parametric Vectr Functins (cntinued) Any nn-zer vectr r can be decmpsed int its magnitude r and its directin: r rrˆ, where r r 0 Tangent Vectr: dx dy dz dr

More information

Differentiation Applications 1: Related Rates

Differentiation Applications 1: Related Rates Differentiatin Applicatins 1: Related Rates 151 Differentiatin Applicatins 1: Related Rates Mdel 1: Sliding Ladder 10 ladder y 10 ladder 10 ladder A 10 ft ladder is leaning against a wall when the bttm

More information

Computational modeling techniques

Computational modeling techniques Cmputatinal mdeling techniques Lecture 11: Mdeling with systems f ODEs In Petre Department f IT, Ab Akademi http://www.users.ab.fi/ipetre/cmpmd/ Mdeling with differential equatins Mdeling strategy Fcus

More information

Methods for Determination of Mean Speckle Size in Simulated Speckle Pattern

Methods for Determination of Mean Speckle Size in Simulated Speckle Pattern 0.478/msr-04-004 MEASUREMENT SCENCE REVEW, Vlume 4, N. 3, 04 Methds fr Determinatin f Mean Speckle Size in Simulated Speckle Pattern. Hamarvá, P. Šmíd, P. Hrváth, M. Hrabvský nstitute f Physics f the Academy

More information

Building to Transformations on Coordinate Axis Grade 5: Geometry Graph points on the coordinate plane to solve real-world and mathematical problems.

Building to Transformations on Coordinate Axis Grade 5: Geometry Graph points on the coordinate plane to solve real-world and mathematical problems. Building t Transfrmatins n Crdinate Axis Grade 5: Gemetry Graph pints n the crdinate plane t slve real-wrld and mathematical prblems. 5.G.1. Use a pair f perpendicular number lines, called axes, t define

More information

Admissibility Conditions and Asymptotic Behavior of Strongly Regular Graphs

Admissibility Conditions and Asymptotic Behavior of Strongly Regular Graphs Admissibility Cnditins and Asympttic Behavir f Strngly Regular Graphs VASCO MOÇO MANO Department f Mathematics University f Prt Oprt PORTUGAL vascmcman@gmailcm LUÍS ANTÓNIO DE ALMEIDA VIEIRA Department

More information

CAUSAL INFERENCE. Technical Track Session I. Phillippe Leite. The World Bank

CAUSAL INFERENCE. Technical Track Session I. Phillippe Leite. The World Bank CAUSAL INFERENCE Technical Track Sessin I Phillippe Leite The Wrld Bank These slides were develped by Christel Vermeersch and mdified by Phillippe Leite fr the purpse f this wrkshp Plicy questins are causal

More information

The standards are taught in the following sequence.

The standards are taught in the following sequence. B L U E V A L L E Y D I S T R I C T C U R R I C U L U M MATHEMATICS Third Grade In grade 3, instructinal time shuld fcus n fur critical areas: (1) develping understanding f multiplicatin and divisin and

More information

You need to be able to define the following terms and answer basic questions about them:

You need to be able to define the following terms and answer basic questions about them: CS440/ECE448 Sectin Q Fall 2017 Midterm Review Yu need t be able t define the fllwing terms and answer basic questins abut them: Intr t AI, agents and envirnments Pssible definitins f AI, prs and cns f

More information

SAMPLING DYNAMICAL SYSTEMS

SAMPLING DYNAMICAL SYSTEMS SAMPLING DYNAMICAL SYSTEMS Melvin J. Hinich Applied Research Labratries The University f Texas at Austin Austin, TX 78713-8029, USA (512) 835-3278 (Vice) 835-3259 (Fax) hinich@mail.la.utexas.edu ABSTRACT

More information

Statistical Learning. 2.1 What Is Statistical Learning?

Statistical Learning. 2.1 What Is Statistical Learning? 2 Statistical Learning 2.1 What Is Statistical Learning? In rder t mtivate ur study f statistical learning, we begin with a simple example. Suppse that we are statistical cnsultants hired by a client t

More information

Stats Classification Ji Zhu, Michigan Statistics 1. Classification. Ji Zhu 445C West Hall

Stats Classification Ji Zhu, Michigan Statistics 1. Classification. Ji Zhu 445C West Hall Stats 415 - Classificatin Ji Zhu, Michigan Statistics 1 Classificatin Ji Zhu 445C West Hall 734-936-2577 jizhu@umich.edu Stats 415 - Classificatin Ji Zhu, Michigan Statistics 2 Examples f Classificatin

More information

Lyapunov Stability Stability of Equilibrium Points

Lyapunov Stability Stability of Equilibrium Points Lyapunv Stability Stability f Equilibrium Pints 1. Stability f Equilibrium Pints - Definitins In this sectin we cnsider n-th rder nnlinear time varying cntinuus time (C) systems f the frm x = f ( t, x),

More information

Preparation work for A2 Mathematics [2018]

Preparation work for A2 Mathematics [2018] Preparatin wrk fr A Mathematics [018] The wrk studied in Y1 will frm the fundatins n which will build upn in Year 13. It will nly be reviewed during Year 13, it will nt be retaught. This is t allw time

More information

Function notation & composite functions Factoring Dividing polynomials Remainder theorem & factor property

Function notation & composite functions Factoring Dividing polynomials Remainder theorem & factor property Functin ntatin & cmpsite functins Factring Dividing plynmials Remainder therem & factr prperty Can d s by gruping r by: Always lk fr a cmmn factr first 2 numbers that ADD t give yu middle term and MULTIPLY

More information

We can see from the graph above that the intersection is, i.e., [ ).

We can see from the graph above that the intersection is, i.e., [ ). MTH 111 Cllege Algebra Lecture Ntes July 2, 2014 Functin Arithmetic: With nt t much difficulty, we ntice that inputs f functins are numbers, and utputs f functins are numbers. S whatever we can d with

More information

Revision: August 19, E Main Suite D Pullman, WA (509) Voice and Fax

Revision: August 19, E Main Suite D Pullman, WA (509) Voice and Fax .7.4: Direct frequency dmain circuit analysis Revisin: August 9, 00 5 E Main Suite D Pullman, WA 9963 (509) 334 6306 ice and Fax Overview n chapter.7., we determined the steadystate respnse f electrical

More information

A New Evaluation Measure. J. Joiner and L. Werner. The problems of evaluation and the needed criteria of evaluation

A New Evaluation Measure. J. Joiner and L. Werner. The problems of evaluation and the needed criteria of evaluation III-l III. A New Evaluatin Measure J. Jiner and L. Werner Abstract The prblems f evaluatin and the needed criteria f evaluatin measures in the SMART system f infrmatin retrieval are reviewed and discussed.

More information

CHAPTER 3 INEQUALITIES. Copyright -The Institute of Chartered Accountants of India

CHAPTER 3 INEQUALITIES. Copyright -The Institute of Chartered Accountants of India CHAPTER 3 INEQUALITIES Cpyright -The Institute f Chartered Accuntants f India INEQUALITIES LEARNING OBJECTIVES One f the widely used decisin making prblems, nwadays, is t decide n the ptimal mix f scarce

More information

Homology groups of disks with holes

Homology groups of disks with holes Hmlgy grups f disks with hles THEOREM. Let p 1,, p k } be a sequence f distinct pints in the interir unit disk D n where n 2, and suppse that fr all j the sets E j Int D n are clsed, pairwise disjint subdisks.

More information

1996 Engineering Systems Design and Analysis Conference, Montpellier, France, July 1-4, 1996, Vol. 7, pp

1996 Engineering Systems Design and Analysis Conference, Montpellier, France, July 1-4, 1996, Vol. 7, pp THE POWER AND LIMIT OF NEURAL NETWORKS T. Y. Lin Department f Mathematics and Cmputer Science San Jse State University San Jse, Califrnia 959-003 tylin@cs.ssu.edu and Bereley Initiative in Sft Cmputing*

More information

Enhancing Performance of MLP/RBF Neural Classifiers via an Multivariate Data Distribution Scheme

Enhancing Performance of MLP/RBF Neural Classifiers via an Multivariate Data Distribution Scheme Enhancing Perfrmance f / Neural Classifiers via an Multivariate Data Distributin Scheme Halis Altun, Gökhan Gelen Nigde University, Electrical and Electrnics Engineering Department Nigde, Turkey haltun@nigde.edu.tr

More information

Mathematics Methods Units 1 and 2

Mathematics Methods Units 1 and 2 Mathematics Methds Units 1 and 2 Mathematics Methds is an ATAR curse which fcuses n the use f calculus and statistical analysis. The study f calculus prvides a basis fr understanding rates f change in

More information

More Tutorial at

More Tutorial at Answer each questin in the space prvided; use back f page if extra space is needed. Answer questins s the grader can READILY understand yur wrk; nly wrk n the exam sheet will be cnsidered. Write answers,

More information