Statistical Learning. 2.1 What Is Statistical Learning?


2 Statistical Learning

2.1 What Is Statistical Learning?

In order to motivate our study of statistical learning, we begin with a simple example. Suppose that we are statistical consultants hired by a client to provide advice on how to improve sales of a particular product. The Advertising data set consists of the sales of that product in 200 different markets, along with advertising budgets for the product in each of those markets for three different media: TV, radio, and newspaper. The data are displayed in Figure 2.1. It is not possible for our client to directly increase sales of the product. On the other hand, they can control the advertising expenditure in each of the three media. Therefore, if we determine that there is an association between advertising and sales, then we can instruct our client to adjust advertising budgets, thereby indirectly increasing sales. In other words, our goal is to develop an accurate model that can be used to predict sales on the basis of the three media budgets.

In this setting, the advertising budgets are input variables while sales is an output variable. The input variables are typically denoted using the symbol X, with a subscript to distinguish them. So X₁ might be the TV budget, X₂ the radio budget, and X₃ the newspaper budget. The inputs go by different names, such as predictors, independent variables, features, or sometimes just variables. The output variable, in this case sales, is often called the response or dependent variable, and is typically denoted using the symbol Y. Throughout this book, we will use all of these terms interchangeably.
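To make the setting concrete, the sketch below loads the Advertising data into R. The file name, its availability as a CSV, and the column names (TV, radio, newspaper, sales) are assumptions based on the description above, not something this excerpt specifies.

    # A minimal sketch: read the Advertising data and inspect the inputs
    # (TV, radio, newspaper budgets) and the output (sales).
    # Assumes a local file "Advertising.csv" with those column names.
    Advertising <- read.csv("Advertising.csv")
    dim(Advertising)    # expect 200 markets (rows)
    head(Advertising)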

FIGURE 2.1. The Advertising data set. The plot displays sales, in thousands of units, as a function of TV, radio, and newspaper budgets, in thousands of dollars, for 200 different markets. In each plot we show the simple least squares fit of sales to that variable, as described in Chapter 3. In other words, each blue line represents a simple model that can be used to predict sales using TV, radio, and newspaper, respectively.

More generally, suppose that we observe a quantitative response Y and p different predictors, X₁, X₂, ..., Xₚ. We assume that there is some relationship between Y and X = (X₁, X₂, ..., Xₚ), which can be written in the very general form

Y = f(X) + ε.   (2.1)

Here f is some fixed but unknown function of X₁, ..., Xₚ, and ε is a random error term, which is independent of X and has mean zero. In this formulation, f represents the systematic information that X provides about Y.

As another example, consider the left-hand panel of Figure 2.2, a plot of income versus years of education for 30 individuals in the Income data set. The plot suggests that one might be able to predict income using years of education. However, the function f that connects the input variable to the output variable is in general unknown. In this situation one must estimate f based on the observed points. Since Income is a simulated data set, f is known and is shown by the blue curve in the right-hand panel of Figure 2.2. The vertical lines represent the error terms ε. We note that some of the 30 observations lie above the blue curve and some lie below it; overall, the errors have approximately mean zero.

In general, the function f may involve more than one input variable. In Figure 2.3 we plot income as a function of years of education and seniority. Here f is a two-dimensional surface that must be estimated based on the observed data.
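Because Income is simulated, its construction follows Equation (2.1) directly: pick an f, add mean-zero noise. A minimal sketch of that recipe (the particular f, sample size, and noise level below are arbitrary choices, not the ones used for the book's figures):

    set.seed(1)
    # Simulate n = 30 observations from Y = f(X) + epsilon.
    n   <- 30
    x   <- runif(n, min = 10, max = 22)      # e.g., years of education
    f   <- function(x) 20 + 3 * sin(x / 3)   # an assumed "true" f
    eps <- rnorm(n, mean = 0, sd = 2)        # error: mean zero, independent of X
    y   <- f(x) + eps
    plot(x, y)                               # the observed points
    curve(f, add = TRUE, col = "blue")       # the true relationship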

FIGURE 2.2. The Income data set. Left: The red dots are the observed values of income (in tens of thousands of dollars) and years of education for 30 individuals. Right: The blue curve represents the true underlying relationship between income and years of education, which is generally unknown (but is known in this case because the data were simulated). The black lines represent the error associated with each observation. Note that some errors are positive (if an observation lies above the blue curve) and some are negative (if an observation lies below the curve). Overall, these errors have approximately mean zero.

In essence, statistical learning refers to a set of approaches for estimating f. In this chapter we outline some of the key theoretical concepts that arise in estimating f, as well as tools for evaluating the estimates obtained.

2.1.1 Why Estimate f?

There are two main reasons that we may wish to estimate f: prediction and inference. We discuss each in turn.

Prediction

In many situations, a set of inputs X are readily available, but the output Y cannot be easily obtained. In this setting, since the error term averages to zero, we can predict Y using

Ŷ = f̂(X),   (2.2)

where f̂ represents our estimate for f, and Ŷ represents the resulting prediction for Y. In this setting, f̂ is often treated as a black box, in the sense that one is not typically concerned with the exact form of f̂, provided that it yields accurate predictions for Y.
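Any fitted model can play the role of the black box f̂ in (2.2). Continuing the simulation above, a sketch (the cubic polynomial is an arbitrary choice of estimator):

    # Fit a model fhat to the training data, then form Yhat = fhat(X)
    # at new input values, treating the fitted object as a black box.
    fit   <- lm(y ~ poly(x, 3))              # one possible fhat
    x_new <- data.frame(x = c(12, 16, 20))   # new inputs X
    y_hat <- predict(fit, newdata = x_new)   # predictions Yhat
    y_hat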

FIGURE 2.3. The plot displays income as a function of years of education and seniority in the Income data set. The blue surface represents the true underlying relationship between income and years of education and seniority, which is known since the data are simulated. The red dots indicate the observed values of these quantities for 30 individuals.

As an example, suppose that X₁, ..., Xₚ are characteristics of a patient's blood sample that can be easily measured in a lab, and Y is a variable encoding the patient's risk for a severe adverse reaction to a particular drug. It is natural to seek to predict Y using X, since we can then avoid giving the drug in question to patients who are at high risk of an adverse reaction, that is, patients for whom the estimate of Y is high.

The accuracy of Ŷ as a prediction for Y depends on two quantities, which we will call the reducible error and the irreducible error. In general, f̂ will not be a perfect estimate for f, and this inaccuracy will introduce some error. This error is reducible because we can potentially improve the accuracy of f̂ by using the most appropriate statistical learning technique to estimate f. However, even if it were possible to form a perfect estimate for f, so that our estimated response took the form Ŷ = f(X), our prediction would still have some error in it! This is because Y is also a function of ε, which, by definition, cannot be predicted using X. Therefore, variability associated with ε also affects the accuracy of our predictions. This is known as the irreducible error, because no matter how well we estimate f, we cannot reduce the error introduced by ε.

Why is the irreducible error larger than zero? The quantity ε may contain unmeasured variables that are useful in predicting Y: since we don't measure them, f cannot use them for its prediction. The quantity ε may also contain unmeasurable variation. For example, the risk of an adverse reaction might vary for a given patient on a given day, depending on manufacturing variation in the drug itself or the patient's general feeling of well-being on that day.

Consider a given estimate f̂ and a set of predictors X, which yields the prediction Ŷ = f̂(X). Assume for a moment that both f̂ and X are fixed. Then, it is easy to show that

E(Y − Ŷ)² = E[f(X) + ε − f̂(X)]² = [f(X) − f̂(X)]² + Var(ε),   (2.3)

where the first term is the reducible error and Var(ε) is the irreducible error. Here E(Y − Ŷ)² represents the average, or expected value, of the squared difference between the predicted and actual value of Y, and Var(ε) represents the variance associated with the error term ε.

The focus of this book is on techniques for estimating f with the aim of minimizing the reducible error. It is important to keep in mind that the irreducible error will always provide an upper bound on the accuracy of our prediction for Y. This bound is almost always unknown in practice.

Inference

We are often interested in understanding the way that Y is affected as X₁, ..., Xₚ change. In this situation we wish to estimate f, but our goal is not necessarily to make predictions for Y. We instead want to understand the relationship between X and Y, or more specifically, to understand how Y changes as a function of X₁, ..., Xₚ. Now f̂ cannot be treated as a black box, because we need to know its exact form. In this setting, one may be interested in answering the following questions:

Which predictors are associated with the response? It is often the case that only a small fraction of the available predictors are substantially associated with Y. Identifying the few important predictors among a large set of possible variables can be extremely useful, depending on the application.

What is the relationship between the response and each predictor? Some predictors may have a positive relationship with Y, in the sense that increasing the predictor is associated with increasing values of Y. Other predictors may have the opposite relationship. Depending on the complexity of f, the relationship between the response and a given predictor may also depend on the values of the other predictors.

Can the relationship between Y and each predictor be adequately summarized using a linear equation, or is the relationship more complicated? Historically, most methods for estimating f have taken a linear form. In some situations, such an assumption is reasonable or even desirable. But often the true relationship is more complicated, in which case a linear model may not provide an accurate representation of the relationship between the input and output variables.
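Returning to the prediction setting for a moment, the decomposition in (2.3) is easy to verify numerically. The sketch below (a simulation under an assumed f, an assumed imperfect f̂, and Gaussian ε) compares the average squared prediction error at a fixed x₀ to the sum of the reducible and irreducible parts:

    set.seed(2)
    # Numerical check of (2.3) at a fixed x0, holding fhat and X fixed.
    f     <- function(x) 20 + 3 * sin(x / 3)   # assumed true f
    fhat  <- function(x) 22 + 0.1 * x          # an assumed, imperfect estimate
    x0    <- 15
    sigma <- 2
    y0 <- f(x0) + rnorm(1e5, mean = 0, sd = sigma)  # many draws of Y at x0
    mean((y0 - fhat(x0))^2)            # left-hand side: E(Y - Yhat)^2
    (f(x0) - fhat(x0))^2 + sigma^2     # reducible + irreducible; nearly equal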

In this book, we will see a number of examples that fall into the prediction setting, the inference setting, or a combination of the two.

For instance, consider a company that is interested in conducting a direct-marketing campaign. The goal is to identify individuals who will respond positively to a mailing, based on observations of demographic variables measured on each individual. In this case, the demographic variables serve as predictors, and response to the marketing campaign (either positive or negative) serves as the outcome. The company is not interested in obtaining a deep understanding of the relationships between each individual predictor and the response; instead, the company simply wants an accurate model to predict the response using the predictors. This is an example of modeling for prediction.

In contrast, consider the Advertising data illustrated in Figure 2.1. One may be interested in answering questions such as: Which media contribute to sales? Which media generate the biggest boost in sales? or How much increase in sales is associated with a given increase in TV advertising? This situation falls into the inference paradigm. Another example involves modeling the brand of a product that a customer might purchase based on variables such as price, store location, discount levels, competition price, and so forth. In this situation one might really be most interested in how each of the individual variables affects the probability of purchase. For instance, what effect will changing the price of a product have on sales? This is an example of modeling for inference.

Finally, some modeling could be conducted both for prediction and inference. For example, in a real estate setting, one may seek to relate values of homes to inputs such as crime rate, zoning, distance from a river, air quality, schools, income level of community, size of houses, and so forth. In this case one might be interested in how the individual input variables affect the prices, that is, how much extra will a house be worth if it has a view of the river? This is an inference problem. Alternatively, one may simply be interested in predicting the value of a home given its characteristics: is this house under- or over-valued? This is a prediction problem.

Depending on whether our ultimate goal is prediction, inference, or a combination of the two, different methods for estimating f may be appropriate. For example, linear models allow for relatively simple and interpretable inference, but may not yield as accurate predictions as some other approaches. In contrast, some of the highly non-linear approaches that we discuss in the later chapters of this book can potentially provide quite accurate predictions for Y, but this comes at the expense of a less interpretable model for which inference is more challenging.

2.1.2 How Do We Estimate f?

Throughout this book, we explore many linear and non-linear approaches for estimating f. However, these methods generally share certain characteristics. We provide an overview of these shared characteristics in this section. We will always assume that we have observed a set of n different data points. For example, in Figure 2.2 we observed n = 30 data points. These observations are called the training data because we will use these observations to train, or teach, our method how to estimate f. Let xᵢⱼ represent the value of the jth predictor, or input, for observation i, where i = 1, 2, ..., n and j = 1, 2, ..., p. Correspondingly, let yᵢ represent the response variable for the ith observation. Then our training data consist of {(x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ)} where xᵢ = (xᵢ₁, xᵢ₂, ..., xᵢₚ)ᵀ.

Our goal is to apply a statistical learning method to the training data in order to estimate the unknown function f. In other words, we want to find a function f̂ such that Y ≈ f̂(X) for any observation (X, Y). Broadly speaking, most statistical learning methods for this task can be characterized as either parametric or non-parametric. We now briefly discuss these two types of approaches.

Parametric Methods

Parametric methods involve a two-step model-based approach.

1. First, we make an assumption about the functional form, or shape, of f. For example, one very simple assumption is that f is linear in X:

f(X) = β₀ + β₁X₁ + β₂X₂ + ··· + βₚXₚ.   (2.4)

This is a linear model, which will be discussed extensively in Chapter 3. Once we have assumed that f is linear, the problem of estimating f is greatly simplified. Instead of having to estimate an entirely arbitrary p-dimensional function f(X), one only needs to estimate the p + 1 coefficients β₀, β₁, ..., βₚ.

2. After a model has been selected, we need a procedure that uses the training data to fit or train the model. In the case of the linear model (2.4), we need to estimate the parameters β₀, β₁, ..., βₚ. That is, we want to find values of these parameters such that

Y ≈ β₀ + β₁X₁ + β₂X₂ + ··· + βₚXₚ.

The most common approach to fitting the model (2.4) is referred to as (ordinary) least squares, which we discuss in Chapter 3. However, least squares is one of many possible ways to fit the linear model. In Chapter 6, we discuss other approaches for estimating the parameters in (2.4).

The model-based approach just described is referred to as parametric; it reduces the problem of estimating f down to one of estimating a set of parameters.

FIGURE 2.4. A linear model fit by least squares to the Income data from Figure 2.3. The observations are shown in red, and the yellow plane indicates the least squares fit to the data.

Assuming a parametric form for f simplifies the problem of estimating f because it is generally much easier to estimate a set of parameters, such as β₀, β₁, ..., βₚ in the linear model (2.4), than it is to fit an entirely arbitrary function f. The potential disadvantage of a parametric approach is that the model we choose will usually not match the true unknown form of f. If the chosen model is too far from the true f, then our estimate will be poor. We can try to address this problem by choosing flexible models that can fit many different possible functional forms for f. But in general, fitting a more flexible model requires estimating a greater number of parameters. These more complex models can lead to a phenomenon known as overfitting the data, which essentially means they follow the errors, or noise, too closely. These issues are discussed throughout this book.

Figure 2.4 shows an example of the parametric approach applied to the Income data from Figure 2.3. We have fit a linear model of the form

income ≈ β₀ + β₁ × education + β₂ × seniority.

Since we have assumed a linear relationship between the response and the two predictors, the entire fitting problem reduces to estimating β₀, β₁, and β₂, which we do using least squares linear regression. Comparing Figure 2.3 to Figure 2.4, we can see that the linear fit given in Figure 2.4 is not quite right: the true f has some curvature that is not captured in the linear fit. However, the linear fit still appears to do a reasonable job of capturing the positive relationship between years of education and income, as well as the slightly less positive relationship between seniority and income.
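In R, the least squares fit above is a one-liner. A minimal sketch, assuming an Income data frame whose columns are named income, education, and seniority (the column names are assumptions):

    # Least squares fit of the parametric model
    #   income ~ beta0 + beta1*education + beta2*seniority
    fit <- lm(income ~ education + seniority, data = Income)
    coef(fit)   # the three estimated parameters beta0, beta1, beta2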

FIGURE 2.5. A smooth thin-plate spline fit to the Income data from Figure 2.3 is shown in yellow; the observations are displayed in red. Splines are discussed in Chapter 7.

It may be that with such a small number of observations, this is the best we can do.

Non-parametric Methods

Non-parametric methods do not make explicit assumptions about the functional form of f. Instead they seek an estimate of f that gets as close to the data points as possible without being too rough or wiggly. Such approaches can have a major advantage over parametric approaches: by avoiding the assumption of a particular functional form for f, they have the potential to accurately fit a wider range of possible shapes for f. Any parametric approach brings with it the possibility that the functional form used to estimate f is very different from the true f, in which case the resulting model will not fit the data well. In contrast, non-parametric approaches completely avoid this danger, since essentially no assumption about the form of f is made. But non-parametric approaches do suffer from a major disadvantage: since they do not reduce the problem of estimating f to a small number of parameters, a very large number of observations (far more than is typically needed for a parametric approach) is required in order to obtain an accurate estimate for f.

An example of a non-parametric approach to fitting the Income data is shown in Figure 2.5. A thin-plate spline is used to estimate f. This approach does not impose any pre-specified model on f. It instead attempts to produce an estimate for f that is as close as possible to the observed data, subject to the fit, that is, the yellow surface in Figure 2.5, being smooth.
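One way to produce a fit of this kind in R is a thin plate regression spline from the mgcv package. This is an illustrative substitute under the same column-name assumptions as before, not necessarily the implementation behind Figure 2.5:

    library(mgcv)
    # Non-parametric fit: no pre-specified form for f, only a smoothness
    # constraint on the estimated surface (bs = "tp" is a thin plate basis).
    fit <- gam(income ~ s(education, seniority, bs = "tp"), data = Income)
    vis.gam(fit, view = c("education", "seniority"))  # plot the fitted surface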

FIGURE 2.6. A rough thin-plate spline fit to the Income data from Figure 2.3. This fit makes zero errors on the training data.

In this case, the non-parametric fit has produced a remarkably accurate estimate of the true f shown in Figure 2.3. In order to fit a thin-plate spline, the data analyst must select a level of smoothness. Figure 2.6 shows the same thin-plate spline fit using a lower level of smoothness, allowing for a rougher fit. The resulting estimate fits the observed data perfectly! However, the spline fit shown in Figure 2.6 is far more variable than the true function f, from Figure 2.3. This is an example of overfitting the data, which we discussed previously. It is an undesirable situation because the fit obtained will not yield accurate estimates of the response on new observations that were not part of the original training data set. We discuss methods for choosing the correct amount of smoothness in Chapter 5. Splines are discussed in Chapter 7.

As we have seen, there are advantages and disadvantages to parametric and non-parametric methods for statistical learning. We explore both types of methods throughout this book.

2.1.3 The Trade-Off Between Prediction Accuracy and Model Interpretability

Of the many methods that we examine in this book, some are less flexible, or more restrictive, in the sense that they can produce just a relatively small range of shapes to estimate f. For example, linear regression is a relatively inflexible approach, because it can only generate linear functions such as the lines shown in Figure 2.1 or the plane shown in Figure 2.3.

FIGURE 2.7. A representation of the trade-off between flexibility and interpretability, using different statistical learning methods. In general, as the flexibility of a method increases, its interpretability decreases. (The methods pictured range from subset selection and the lasso at the most interpretable, least flexible end, through least squares, generalized additive models, and trees, to bagging, boosting, and support vector machines at the most flexible, least interpretable end.)

Other methods, such as the thin plate splines shown in Figures 2.5 and 2.6, are considerably more flexible because they can generate a much wider range of possible shapes to estimate f.

One might reasonably ask the following question: why would we ever choose to use a more restrictive method instead of a very flexible approach? There are several reasons that we might prefer a more restrictive model. If we are mainly interested in inference, then restrictive models are much more interpretable. For instance, when inference is the goal, the linear model may be a good choice since it will be quite easy to understand the relationship between Y and X₁, X₂, ..., Xₚ. In contrast, very flexible approaches, such as the splines discussed in Chapter 7 and displayed in Figures 2.5 and 2.6, and the boosting methods discussed in Chapter 8, can lead to such complicated estimates of f that it is difficult to understand how any individual predictor is associated with the response.

Figure 2.7 provides an illustration of the trade-off between flexibility and interpretability for some of the methods that we cover in this book. Least squares linear regression, discussed in Chapter 3, is relatively inflexible but is quite interpretable. The lasso, discussed in Chapter 6, relies upon the linear model (2.4) but uses an alternative fitting procedure for estimating the coefficients β₀, β₁, ..., βₚ. The new procedure is more restrictive in estimating the coefficients, and sets a number of them to exactly zero. Hence in this sense the lasso is a less flexible approach than linear regression. It is also more interpretable than linear regression, because in the final model the response variable will only be related to a small subset of the predictors, namely, those with nonzero coefficient estimates.
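For readers who want to see what such a fit looks like in code, the lasso is available in R through, for example, the glmnet package. This is a sketch only; the lasso itself is covered in Chapter 6, and the x matrix and y vector below are invented toy data:

    library(glmnet)
    # Lasso: least squares plus a penalty that forces some coefficients
    # to be exactly zero, yielding a sparser, more interpretable model.
    x <- matrix(rnorm(100 * 10), ncol = 10)  # toy predictor matrix
    y <- x[, 1] - 2 * x[, 2] + rnorm(100)    # response uses only 2 predictors
    fit <- glmnet(x, y, alpha = 1)           # alpha = 1 selects the lasso
    coef(fit, s = 0.1)  # coefficients at one penalty value; many are exactly 0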

Generalized additive models (GAMs), discussed in Chapter 7, instead extend the linear model (2.4) to allow for certain non-linear relationships. Consequently, GAMs are more flexible than linear regression. They are also somewhat less interpretable than linear regression, because the relationship between each predictor and the response is now modeled using a curve. Finally, fully non-linear methods such as bagging, boosting, and support vector machines with non-linear kernels, discussed in Chapters 8 and 9, are highly flexible approaches that are harder to interpret.

We have established that when inference is the goal, there are clear advantages to using simple and relatively inflexible statistical learning methods. In some settings, however, we are only interested in prediction, and the interpretability of the predictive model is simply not of interest. For instance, if we seek to develop an algorithm to predict the price of a stock, our sole requirement for the algorithm is that it predict accurately; interpretability is not a concern. In this setting, we might expect that it will be best to use the most flexible model available. Surprisingly, this is not always the case! We will often obtain more accurate predictions using a less flexible method. This phenomenon, which may seem counterintuitive at first glance, has to do with the potential for overfitting in highly flexible methods. We saw an example of overfitting in Figure 2.6. We will discuss this very important concept further in Section 2.2 and throughout this book.

2.1.4 Supervised Versus Unsupervised Learning

Most statistical learning problems fall into one of two categories: supervised or unsupervised. The examples that we have discussed so far in this chapter all fall into the supervised learning domain. For each observation of the predictor measurement(s) xᵢ, i = 1, ..., n there is an associated response measurement yᵢ. We wish to fit a model that relates the response to the predictors, with the aim of accurately predicting the response for future observations (prediction) or better understanding the relationship between the response and the predictors (inference). Many classical statistical learning methods such as linear regression and logistic regression (Chapter 4), as well as more modern approaches such as GAMs, boosting, and support vector machines, operate in the supervised learning domain. The vast majority of this book is devoted to this setting.

In contrast, unsupervised learning describes the somewhat more challenging situation in which for every observation i = 1, ..., n, we observe a vector of measurements xᵢ but no associated response yᵢ. It is not possible to fit a linear regression model, since there is no response variable to predict. In this setting, we are in some sense working blind; the situation is referred to as unsupervised because we lack a response variable that can supervise our analysis. What sort of statistical analysis is possible?

FIGURE 2.8. A clustering data set involving three groups. Each group is shown using a different colored symbol. Left: The three groups are well-separated. In this setting, a clustering approach should successfully identify the three groups. Right: There is some overlap among the groups. Now the clustering task is more challenging.

We can seek to understand the relationships between the variables or between the observations. One statistical learning tool that we may use in this setting is cluster analysis, or clustering. The goal of cluster analysis is to ascertain, on the basis of x₁, ..., xₙ, whether the observations fall into relatively distinct groups. For example, in a market segmentation study we might observe multiple characteristics (variables) for potential customers, such as zip code, family income, and shopping habits. We might believe that the customers fall into different groups, such as big spenders versus low spenders. If the information about each customer's spending patterns were available, then a supervised analysis would be possible. However, this information is not available, that is, we do not know whether each potential customer is a big spender or not. In this setting, we can try to cluster the customers on the basis of the variables measured, in order to identify distinct groups of potential customers. Identifying such groups can be of interest because it might be that the groups differ with respect to some property of interest, such as spending habits.

Figure 2.8 provides a simple illustration of the clustering problem. We have plotted 150 observations with measurements on two variables, X₁ and X₂. Each observation corresponds to one of three distinct groups. For illustrative purposes, we have plotted the members of each group using different colors and symbols. However, in practice the group memberships are unknown, and the goal is to determine the group to which each observation belongs. In the left-hand panel of Figure 2.8, this is a relatively easy task because the groups are well-separated. In contrast, the right-hand panel illustrates a more challenging problem in which there is some overlap between the groups.
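A setting like Figure 2.8 is easy to recreate: simulate three groups, then cluster without using the group labels. The sketch below uses K-means clustering, one of the unsupervised methods discussed in Chapter 10 (the group centers and sizes are arbitrary choices):

    set.seed(3)
    # 150 observations on two variables, in three groups of 50.
    X <- rbind(cbind(rnorm(50, mean = 0), rnorm(50, mean = 0)),
               cbind(rnorm(50, mean = 3), rnorm(50, mean = 3)),
               cbind(rnorm(50, mean = 0), rnorm(50, mean = 3)))
    # Cluster without the labels; in practice the memberships are unknown.
    km <- kmeans(X, centers = 3, nstart = 20)
    plot(X, col = km$cluster)   # the groups recovered by clustering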

A clustering method could not be expected to assign all of the overlapping points to their correct group (blue, green, or orange). In the examples shown in Figure 2.8, there are only two variables, and so one can simply visually inspect the scatterplots of the observations in order to identify clusters. However, in practice, we often encounter data sets that contain many more than two variables. In this case, we cannot easily plot the observations. For instance, if there are p variables in our data set, then p(p − 1)/2 distinct scatterplots can be made, and visual inspection is simply not a viable way to identify clusters. For this reason, automated clustering methods are important. We discuss clustering and other unsupervised learning approaches in Chapter 10.

Many problems fall naturally into the supervised or unsupervised learning paradigms. However, sometimes the question of whether an analysis should be considered supervised or unsupervised is less clear-cut. For instance, suppose that we have a set of n observations. For m of the observations, where m < n, we have both predictor measurements and a response measurement. For the remaining n − m observations, we have predictor measurements but no response measurement. Such a scenario can arise if the predictors can be measured relatively cheaply but the corresponding responses are much more expensive to collect. We refer to this setting as a semi-supervised learning problem. In this setting, we wish to use a statistical learning method that can incorporate the m observations for which response measurements are available as well as the n − m observations for which they are not. Although this is an interesting topic, it is beyond the scope of this book.

2.1.5 Regression Versus Classification Problems

Variables can be characterized as either quantitative or qualitative (also known as categorical). Quantitative variables take on numerical values. Examples include a person's age, height, or income, the value of a house, and the price of a stock. In contrast, qualitative variables take on values in one of K different classes, or categories. Examples of qualitative variables include a person's gender (male or female), the brand of product purchased (brand A, B, or C), whether a person defaults on a debt (yes or no), or a cancer diagnosis (Acute Myelogenous Leukemia, Acute Lymphoblastic Leukemia, or No Leukemia). We tend to refer to problems with a quantitative response as regression problems, while those involving a qualitative response are often referred to as classification problems. However, the distinction is not always that crisp. Least squares linear regression (Chapter 3) is used with a quantitative response, whereas logistic regression (Chapter 4) is typically used with a qualitative (two-class, or binary) response. As such it is often used as a classification method. But since it estimates class probabilities, it can be thought of as a regression method as well.

Some statistical methods, such as K-nearest neighbors (Chapters 2 and 4) and boosting (Chapter 8), can be used in the case of either quantitative or qualitative responses.

We tend to select statistical learning methods on the basis of whether the response is quantitative or qualitative; i.e. we might use linear regression when quantitative and logistic regression when qualitative. However, whether the predictors are qualitative or quantitative is generally considered less important. Most of the statistical learning methods discussed in this book can be applied regardless of the predictor variable type, provided that any qualitative predictors are properly coded before the analysis is performed. This is discussed in Chapter 3.

2.2 Assessing Model Accuracy

One of the key aims of this book is to introduce the reader to a wide range of statistical learning methods that extend far beyond the standard linear regression approach. Why is it necessary to introduce so many different statistical learning approaches, rather than just a single best method? There is no free lunch in statistics: no one method dominates all others over all possible data sets. On a particular data set, one specific method may work best, but some other method may work better on a similar but different data set. Hence it is an important task to decide for any given set of data which method produces the best results. Selecting the best approach can be one of the most challenging parts of performing statistical learning in practice.

In this section, we discuss some of the most important concepts that arise in selecting a statistical learning procedure for a specific data set. As the book progresses, we will explain how the concepts presented here can be applied in practice.

2.2.1 Measuring the Quality of Fit

In order to evaluate the performance of a statistical learning method on a given data set, we need some way to measure how well its predictions actually match the observed data. That is, we need to quantify the extent to which the predicted response value for a given observation is close to the true response value for that observation. In the regression setting, the most commonly-used measure is the mean squared error (MSE), given by

MSE = (1/n) Σᵢ₌₁ⁿ (yᵢ − f̂(xᵢ))²,   (2.5)

where f̂(xᵢ) is the prediction that f̂ gives for the ith observation.
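Equation (2.5) translates directly into a couple of lines of R; a self-contained sketch on simulated data (the data-generating choices are arbitrary):

    set.seed(4)
    # Simulate training data, fit a model, and compute the training MSE (2.5).
    x <- runif(100)
    y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)
    fit <- lm(y ~ poly(x, 4))
    mean((y - fitted(fit))^2)   # (1/n) * sum of (y_i - fhat(x_i))^2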

The MSE will be small if the predicted responses are very close to the true responses, and will be large if for some of the observations, the predicted and true responses differ substantially.

The MSE in (2.5) is computed using the training data that was used to fit the model, and so should more accurately be referred to as the training MSE. But in general, we do not really care how well the method works on the training data. Rather, we are interested in the accuracy of the predictions that we obtain when we apply our method to previously unseen test data. Why is this what we care about? Suppose that we are interested in developing an algorithm to predict a stock's price based on previous stock returns. We can train the method using stock returns from the past 6 months. But we don't really care how well our method predicts last week's stock price. We instead care about how well it will predict tomorrow's price or next month's price. On a similar note, suppose that we have clinical measurements (e.g. weight, blood pressure, height, age, family history of disease) for a number of patients, as well as information about whether each patient has diabetes. We can use these patients to train a statistical learning method to predict risk of diabetes based on clinical measurements. In practice, we want this method to accurately predict diabetes risk for future patients based on their clinical measurements. We are not very interested in whether or not the method accurately predicts diabetes risk for patients used to train the model, since we already know which of those patients have diabetes.

To state it more mathematically, suppose that we fit our statistical learning method on our training observations {(x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ)}, and we obtain the estimate f̂. We can then compute f̂(x₁), f̂(x₂), ..., f̂(xₙ). If these are approximately equal to y₁, y₂, ..., yₙ, then the training MSE given by (2.5) is small. However, we are really not interested in whether f̂(xᵢ) ≈ yᵢ; instead, we want to know whether f̂(x₀) is approximately equal to y₀, where (x₀, y₀) is a previously unseen test observation not used to train the statistical learning method. We want to choose the method that gives the lowest test MSE, as opposed to the lowest training MSE. In other words, if we had a large number of test observations, we could compute

Ave(f̂(x₀) − y₀)²,   (2.6)

the average squared prediction error for these test observations (x₀, y₀). We'd like to select the model for which the average of this quantity, the test MSE, is as small as possible.

How can we go about trying to select a method that minimizes the test MSE? In some settings, we may have a test data set available, that is, we may have access to a set of observations that were not used to train the statistical learning method. We can then simply evaluate (2.6) on the test observations, and select the learning method for which the test MSE is smallest.

FIGURE 2.9. Left: Data simulated from f, shown in black. Three estimates of f are shown: the linear regression line (orange curve), and two smoothing spline fits (blue and green curves). Right: Training MSE (grey curve), test MSE (red curve), and minimum possible test MSE over all methods (dashed line). Squares represent the training and test MSEs for the three fits shown in the left-hand panel.

But what if no test observations are available? In that case, one might imagine simply selecting a statistical learning method that minimizes the training MSE (2.5). This seems like it might be a sensible approach, since the training MSE and the test MSE appear to be closely related. Unfortunately, there is a fundamental problem with this strategy: there is no guarantee that the method with the lowest training MSE will also have the lowest test MSE. Roughly speaking, the problem is that many statistical methods specifically estimate coefficients so as to minimize the training set MSE. For these methods, the training set MSE can be quite small, but the test MSE is often much larger.

Figure 2.9 illustrates this phenomenon on a simple example. In the left-hand panel of Figure 2.9, we have generated observations from (2.1) with the true f given by the black curve. The orange, blue and green curves illustrate three possible estimates for f obtained using methods with increasing levels of flexibility. The orange line is the linear regression fit, which is relatively inflexible. The blue and green curves were produced using smoothing splines, discussed in Chapter 7, with different levels of smoothness. It is clear that as the level of flexibility increases, the curves fit the observed data more closely. The green curve is the most flexible and matches the data very well; however, we observe that it fits the true f (shown in black) poorly because it is too wiggly. By adjusting the level of flexibility of the smoothing spline fit, we can produce many different fits to this data.

We now move on to the right-hand panel of Figure 2.9. The grey curve displays the average training MSE as a function of flexibility, or more formally the degrees of freedom, for a number of smoothing splines. The degrees of freedom is a quantity that summarizes the flexibility of a curve; it is discussed more fully in Chapter 7. The orange, blue and green squares indicate the MSEs associated with the corresponding curves in the left-hand panel. A more restricted and hence smoother curve has fewer degrees of freedom than a wiggly curve; note that in Figure 2.9, linear regression is at the most restrictive end, with two degrees of freedom. The training MSE declines monotonically as flexibility increases. In this example the true f is non-linear, and so the orange linear fit is not flexible enough to estimate f well. The green curve has the lowest training MSE of all three methods, since it corresponds to the most flexible of the three curves fit in the left-hand panel.

In this example, we know the true function f, and so we can also compute the test MSE over a very large test set, as a function of flexibility. (Of course, in general f is unknown, so this will not be possible.) The test MSE is displayed using the red curve in the right-hand panel of Figure 2.9. As with the training MSE, the test MSE initially declines as the level of flexibility increases. However, at some point the test MSE levels off and then starts to increase again. Consequently, the orange and green curves both have high test MSE. The blue curve minimizes the test MSE, which should not be surprising given that visually it appears to estimate f the best in the left-hand panel of Figure 2.9. The horizontal dashed line indicates Var(ε), the irreducible error in (2.3), which corresponds to the lowest achievable test MSE among all possible methods. Hence, the smoothing spline represented by the blue curve is close to optimal.

In the right-hand panel of Figure 2.9, as the flexibility of the statistical learning method increases, we observe a monotone decrease in the training MSE and a U-shape in the test MSE. This is a fundamental property of statistical learning that holds regardless of the particular data set at hand and regardless of the statistical method being used. As model flexibility increases, training MSE will decrease, but the test MSE may not. When a given method yields a small training MSE but a large test MSE, we are said to be overfitting the data. This happens because our statistical learning procedure is working too hard to find patterns in the training data, and may be picking up some patterns that are just caused by random chance rather than by true properties of the unknown function f. When we overfit the training data, the test MSE will be very large because the supposed patterns that the method found in the training data simply don't exist in the test data. Note that regardless of whether or not overfitting has occurred, we almost always expect the training MSE to be smaller than the test MSE because most statistical learning methods either directly or indirectly seek to minimize the training MSE. Overfitting refers specifically to the case in which a less flexible model would have yielded a smaller test MSE.
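The monotone decrease in training MSE and the U-shape in test MSE are easy to reproduce on simulated data, where the true f is known. A sketch fitting smoothing splines of increasing degrees of freedom and evaluating both (2.5) on the training set and (2.6) on a large test set (all simulation choices are arbitrary):

    set.seed(5)
    f <- function(x) sin(2 * pi * x)                  # assumed true f
    x  <- runif(200);   y  <- f(x)  + rnorm(200,   sd = 0.3)  # training data
    x0 <- runif(10000); y0 <- f(x0) + rnorm(10000, sd = 0.3)  # test data
    for (df in c(2, 5, 10, 25)) {                     # increasing flexibility
      fit <- smooth.spline(x, y, df = df)
      cat("df =", df,
          " train MSE:", mean((y  - predict(fit, x)$y)^2),
          " test MSE:",  mean((y0 - predict(fit, x0)$y)^2), "\n")
    }
    # Training MSE falls steadily with df; test MSE traces out a U-shape.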

FIGURE 2.10. Details are as in Figure 2.9, using a different true f that is much closer to linear. In this setting, linear regression provides a very good fit to the data.

Figure 2.10 provides another example in which the true f is approximately linear. Again we observe that the training MSE decreases monotonically as the model flexibility increases, and that there is a U-shape in the test MSE. However, because the truth is close to linear, the test MSE only decreases slightly before increasing again, so that the orange least squares fit is substantially better than the highly flexible green curve. Finally, Figure 2.11 displays an example in which f is highly non-linear. The training and test MSE curves still exhibit the same general patterns, but now there is a rapid decrease in both curves before the test MSE starts to increase slowly.

In practice, one can usually compute the training MSE with relative ease, but estimating test MSE is considerably more difficult because usually no test data are available. As the previous three examples illustrate, the flexibility level corresponding to the model with the minimal test MSE can vary considerably among data sets. Throughout this book, we discuss a variety of approaches that can be used in practice to estimate this minimum point. One important method is cross-validation (Chapter 5), which is a method for estimating test MSE using the training data.

2.2.2 The Bias-Variance Trade-Off

The U-shape observed in the test MSE curves (Figures 2.9–2.11) turns out to be the result of two competing properties of statistical learning methods. Though the mathematical proof is beyond the scope of this book, it is possible to show that the expected test MSE, for a given value x₀, can always be decomposed into the sum of three fundamental quantities: the variance of f̂(x₀), the squared bias of f̂(x₀), and the variance of the error terms ε.

FIGURE 2.11. Details are as in Figure 2.9, using a different f that is far from linear. In this setting, linear regression provides a very poor fit to the data.

That is,

E(y₀ − f̂(x₀))² = Var(f̂(x₀)) + [Bias(f̂(x₀))]² + Var(ε).   (2.7)

Here the notation E(y₀ − f̂(x₀))² defines the expected test MSE, and refers to the average test MSE that we would obtain if we repeatedly estimated f using a large number of training sets, and tested each at x₀. The overall expected test MSE can be computed by averaging E(y₀ − f̂(x₀))² over all possible values of x₀ in the test set.

Equation 2.7 tells us that in order to minimize the expected test error, we need to select a statistical learning method that simultaneously achieves low variance and low bias. Note that variance is inherently a nonnegative quantity, and squared bias is also nonnegative. Hence, we see that the expected test MSE can never lie below Var(ε), the irreducible error from (2.3).

What do we mean by the variance and bias of a statistical learning method? Variance refers to the amount by which f̂ would change if we estimated it using a different training data set. Since the training data are used to fit the statistical learning method, different training data sets will result in a different f̂. But ideally the estimate for f should not vary too much between training sets. However, if a method has high variance then small changes in the training data can result in large changes in f̂. In general, more flexible statistical methods have higher variance. Consider the green and orange curves in Figure 2.9.

The flexible green curve is following the observations very closely. It has high variance because changing any one of these data points may cause the estimate f̂ to change considerably. In contrast, the orange least squares line is relatively inflexible and has low variance, because moving any single observation will likely cause only a small shift in the position of the line.

On the other hand, bias refers to the error that is introduced by approximating a real-life problem, which may be extremely complicated, by a much simpler model. For example, linear regression assumes that there is a linear relationship between Y and X₁, X₂, ..., Xₚ. It is unlikely that any real-life problem truly has such a simple linear relationship, and so performing linear regression will undoubtedly result in some bias in the estimate of f. In Figure 2.11, the true f is substantially non-linear, so no matter how many training observations we are given, it will not be possible to produce an accurate estimate using linear regression. In other words, linear regression results in high bias in this example. However, in Figure 2.10 the true f is very close to linear, and so given enough data, it should be possible for linear regression to produce an accurate estimate. Generally, more flexible methods result in less bias.

As a general rule, as we use more flexible methods, the variance will increase and the bias will decrease. The relative rate of change of these two quantities determines whether the test MSE increases or decreases. As we increase the flexibility of a class of methods, the bias tends to initially decrease faster than the variance increases. Consequently, the expected test MSE declines. However, at some point increasing flexibility has little impact on the bias but starts to significantly increase the variance. When this happens the test MSE increases. Note that we observed this pattern of decreasing test MSE followed by increasing test MSE in the right-hand panels of Figures 2.9–2.11.

The three plots in Figure 2.12 illustrate Equation 2.7 for the examples in Figures 2.9–2.11. In each case the blue solid curve represents the squared bias, for different levels of flexibility, while the orange curve corresponds to the variance. The horizontal dashed line represents Var(ε), the irreducible error. Finally, the red curve, corresponding to the test set MSE, is the sum of these three quantities. In all three cases, the variance increases and the bias decreases as the method's flexibility increases. However, the flexibility level corresponding to the optimal test MSE differs considerably among the three data sets, because the squared bias and variance change at different rates in each of the data sets. In the left-hand panel of Figure 2.12, the bias initially decreases rapidly, resulting in an initial sharp decrease in the expected test MSE. On the other hand, in the center panel of Figure 2.12 the true f is close to linear, so there is only a small decrease in bias as flexibility increases, and the test MSE only declines slightly before increasing rapidly as the variance increases. Finally, in the right-hand panel of Figure 2.12, as flexibility increases, there is a dramatic decline in bias because the true f is very non-linear.

FIGURE 2.12. Squared bias (blue curve), variance (orange curve), Var(ε) (dashed line), and test MSE (red curve) for the three data sets in Figures 2.9–2.11. The vertical dotted line indicates the flexibility level corresponding to the smallest test MSE.

There is also very little increase in variance as flexibility increases. Consequently, the test MSE declines substantially before experiencing a small increase as model flexibility increases.

The relationship between bias, variance, and test set MSE given in Equation 2.7 and displayed in Figure 2.12 is referred to as the bias-variance trade-off. Good test set performance of a statistical learning method requires low variance as well as low squared bias. This is referred to as a trade-off because it is easy to obtain a method with extremely low bias but high variance (for instance, by drawing a curve that passes through every single training observation) or a method with very low variance but high bias (by fitting a horizontal line to the data). The challenge lies in finding a method for which both the variance and the squared bias are low. This trade-off is one of the most important recurring themes in this book.

In a real-life situation in which f is unobserved, it is generally not possible to explicitly compute the test MSE, bias, or variance for a statistical learning method. Nevertheless, one should always keep the bias-variance trade-off in mind. In this book we explore methods that are extremely flexible and hence can essentially eliminate bias. However, this does not guarantee that they will outperform a much simpler method such as linear regression. To take an extreme example, suppose that the true f is linear. In this situation linear regression will have no bias, making it very hard for a more flexible method to compete. In contrast, if the true f is highly non-linear and we have an ample number of training observations, then we may do better using a highly flexible approach, as in Figure 2.11. In Chapter 5 we discuss cross-validation, which is a way to estimate the test MSE using the training data.
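In a simulation, where f is known, the three terms in (2.7) can be estimated directly: draw many training sets, fit on each, and record f̂(x₀). A sketch (again with arbitrary simulation choices):

    set.seed(6)
    f <- function(x) sin(2 * pi * x)   # assumed true f
    sigma <- 0.3; x0 <- 0.5            # evaluate the decomposition at x0
    fhat_x0 <- replicate(1000, {
      x <- runif(100)                  # a fresh training set each time
      y <- f(x) + rnorm(100, sd = sigma)
      predict(smooth.spline(x, y, df = 5), x0)$y
    })
    var(fhat_x0)                       # Var(fhat(x0))
    (mean(fhat_x0) - f(x0))^2          # squared bias of fhat(x0)
    var(fhat_x0) + (mean(fhat_x0) - f(x0))^2 + sigma^2  # approximates (2.7)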

2.2.3 The Classification Setting

Thus far, our discussion of model accuracy has been focused on the regression setting. But many of the concepts that we have encountered, such as the bias-variance trade-off, transfer over to the classification setting with only some modifications due to the fact that yᵢ is no longer numerical. Suppose that we seek to estimate f on the basis of training observations {(x₁, y₁), ..., (xₙ, yₙ)}, where now y₁, ..., yₙ are qualitative. The most common approach for quantifying the accuracy of our estimate f̂ is the training error rate, the proportion of mistakes that are made if we apply our estimate f̂ to the training observations:

(1/n) Σᵢ₌₁ⁿ I(yᵢ ≠ ŷᵢ).   (2.8)

Here ŷᵢ is the predicted class label for the ith observation using f̂. And I(yᵢ ≠ ŷᵢ) is an indicator variable that equals 1 if yᵢ ≠ ŷᵢ and zero if yᵢ = ŷᵢ. If I(yᵢ ≠ ŷᵢ) = 0 then the ith observation was classified correctly by our classification method; otherwise it was misclassified. Hence Equation 2.8 computes the fraction of incorrect classifications.

Equation 2.8 is referred to as the training error rate because it is computed based on the data that was used to train our classifier. As in the regression setting, we are most interested in the error rates that result from applying our classifier to test observations that were not used in training. The test error rate associated with a set of test observations of the form (x₀, y₀) is given by

Ave(I(y₀ ≠ ŷ₀)),   (2.9)

where ŷ₀ is the predicted class label that results from applying the classifier to the test observation with predictor x₀. A good classifier is one for which the test error (2.9) is smallest.

The Bayes Classifier

It is possible to show (though the proof is outside of the scope of this book) that the test error rate given in (2.9) is minimized, on average, by a very simple classifier that assigns each observation to the most likely class, given its predictor values. In other words, we should simply assign a test observation with predictor vector x₀ to the class j for which

Pr(Y = j | X = x₀)   (2.10)

is largest. Note that (2.10) is a conditional probability: it is the probability that Y = j, given the observed predictor vector x₀. This very simple classifier is called the Bayes classifier. In a two-class problem where there are only two possible response values, say class 1 or class 2, the Bayes classifier
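The error rates in (2.8) and (2.9) above amount to counting label disagreements; a minimal sketch with toy class labels (all values invented for illustration):

    # Training error rate (2.8): fraction of observations whose predicted
    # class label differs from the true label.
    y    <- c("class1", "class2", "class1", "class1", "class2")  # true labels
    yhat <- c("class1", "class2", "class2", "class1", "class2")  # predictions
    mean(y != yhat)   # indicator average: here 1 of 5 misclassified, 0.2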

Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff

Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeff Reading: Chapter 2 STATS 202: Data mining and analysis September 27, 2017 1 / 20 Supervised vs. unsupervised learning In unsupervised

More information

What is Statistical Learning?

What is Statistical Learning? What is Statistical Learning? Sales 5 10 15 20 25 Sales 5 10 15 20 25 Sales 5 10 15 20 25 0 50 100 200 300 TV 0 10 20 30 40 50 Radi 0 20 40 60 80 100 Newspaper Shwn are Sales vs TV, Radi and Newspaper,

More information

Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff

Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeff Reading: Chapter 2 STATS 202: Data mining and analysis September 27, 2017 1 / 20 Supervised vs. unsupervised learning In unsupervised

More information

Simple Linear Regression (single variable)

Simple Linear Regression (single variable) Simple Linear Regressin (single variable) Intrductin t Machine Learning Marek Petrik January 31, 2017 Sme f the figures in this presentatin are taken frm An Intrductin t Statistical Learning, with applicatins

More information

Resampling Methods. Chapter 5. Chapter 5 1 / 52

Resampling Methods. Chapter 5. Chapter 5 1 / 52 Resampling Methds Chapter 5 Chapter 5 1 / 52 1 51 Validatin set apprach 2 52 Crss validatin 3 53 Btstrap Chapter 5 2 / 52 Abut Resampling An imprtant statistical tl Pretending the data as ppulatin and

More information

Bootstrap Method > # Purpose: understand how bootstrap method works > obs=c(11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38) > n=length(obs) >

Bootstrap Method > # Purpose: understand how bootstrap method works > obs=c(11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38) > n=length(obs) > Btstrap Methd > # Purpse: understand hw btstrap methd wrks > bs=c(11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38) > n=length(bs) > mean(bs) [1] 21.64625 > # estimate f lambda > lambda = 1/mean(bs);

More information

Pattern Recognition 2014 Support Vector Machines

Pattern Recognition 2014 Support Vector Machines Pattern Recgnitin 2014 Supprt Vectr Machines Ad Feelders Universiteit Utrecht Ad Feelders ( Universiteit Utrecht ) Pattern Recgnitin 1 / 55 Overview 1 Separable Case 2 Kernel Functins 3 Allwing Errrs (Sft

More information

Resampling Methods. Cross-validation, Bootstrapping. Marek Petrik 2/21/2017

Resampling Methods. Cross-validation, Bootstrapping. Marek Petrik 2/21/2017 Resampling Methds Crss-validatin, Btstrapping Marek Petrik 2/21/2017 Sme f the figures in this presentatin are taken frm An Intrductin t Statistical Learning, with applicatins in R (Springer, 2013) with

More information

, which yields. where z1. and z2

, which yields. where z1. and z2 The Gaussian r Nrmal PDF, Page 1 The Gaussian r Nrmal Prbability Density Functin Authr: Jhn M Cimbala, Penn State University Latest revisin: 11 September 13 The Gaussian r Nrmal Prbability Density Functin

More information

4th Indian Institute of Astrophysics - PennState Astrostatistics School July, 2013 Vainu Bappu Observatory, Kavalur. Correlation and Regression

4th Indian Institute of Astrophysics - PennState Astrostatistics School July, 2013 Vainu Bappu Observatory, Kavalur. Correlation and Regression 4th Indian Institute f Astrphysics - PennState Astrstatistics Schl July, 2013 Vainu Bappu Observatry, Kavalur Crrelatin and Regressin Rahul Ry Indian Statistical Institute, Delhi. Crrelatin Cnsider a tw

More information

SUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical model for microarray data analysis

SUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical model for microarray data analysis SUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical mdel fr micrarray data analysis David Rssell Department f Bistatistics M.D. Andersn Cancer Center, Hustn, TX 77030, USA rsselldavid@gmail.cm

More information

Differentiation Applications 1: Related Rates

Differentiation Applications 1: Related Rates Differentiatin Applicatins 1: Related Rates 151 Differentiatin Applicatins 1: Related Rates Mdel 1: Sliding Ladder 10 ladder y 10 ladder 10 ladder A 10 ft ladder is leaning against a wall when the bttm

More information

CAUSAL INFERENCE. Technical Track Session I. Phillippe Leite. The World Bank

CAUSAL INFERENCE. Technical Track Session I. Phillippe Leite. The World Bank CAUSAL INFERENCE Technical Track Sessin I Phillippe Leite The Wrld Bank These slides were develped by Christel Vermeersch and mdified by Phillippe Leite fr the purpse f this wrkshp Plicy questins are causal

More information

CHAPTER 3 INEQUALITIES. Copyright -The Institute of Chartered Accountants of India

CHAPTER 3 INEQUALITIES. Copyright -The Institute of Chartered Accountants of India CHAPTER 3 INEQUALITIES Cpyright -The Institute f Chartered Accuntants f India INEQUALITIES LEARNING OBJECTIVES One f the widely used decisin making prblems, nwadays, is t decide n the ptimal mix f scarce

More information

Internal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9.

Internal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9. Sectin 7 Mdel Assessment This sectin is based n Stck and Watsn s Chapter 9. Internal vs. external validity Internal validity refers t whether the analysis is valid fr the ppulatin and sample being studied.

More information

Tree Structured Classifier

Tree Structured Classifier Tree Structured Classifier Reference: Classificatin and Regressin Trees by L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stne, Chapman & Hall, 98. A Medical Eample (CART): Predict high risk patients

More information

How do scientists measure trees? What is DBH?

How do scientists measure trees? What is DBH? Hw d scientists measure trees? What is DBH? Purpse Students develp an understanding f tree size and hw scientists measure trees. Students bserve and measure tree ckies and explre the relatinship between

More information

Kinetic Model Completeness

Kinetic Model Completeness 5.68J/10.652J Spring 2003 Lecture Ntes Tuesday April 15, 2003 Kinetic Mdel Cmpleteness We say a chemical kinetic mdel is cmplete fr a particular reactin cnditin when it cntains all the species and reactins

More information

CHAPTER 24: INFERENCE IN REGRESSION. Chapter 24: Make inferences about the population from which the sample data came.

CHAPTER 24: INFERENCE IN REGRESSION. Chapter 24: Make inferences about the population from which the sample data came. MATH 1342 Ch. 24 April 25 and 27, 2013 Page 1 f 5 CHAPTER 24: INFERENCE IN REGRESSION Chapters 4 and 5: Relatinships between tw quantitative variables. Be able t Make a graph (scatterplt) Summarize the

More information

AP Statistics Notes Unit Two: The Normal Distributions

AP Statistics Notes Unit Two: The Normal Distributions AP Statistics Ntes Unit Tw: The Nrmal Distributins Syllabus Objectives: 1.5 The student will summarize distributins f data measuring the psitin using quartiles, percentiles, and standardized scres (z-scres).

More information

The blessing of dimensionality for kernel methods

The blessing of dimensionality for kernel methods fr kernel methds Building classifiers in high dimensinal space Pierre Dupnt Pierre.Dupnt@ucluvain.be Classifiers define decisin surfaces in sme feature space where the data is either initially represented

More information

Part 3 Introduction to statistical classification techniques

Part 3 Introduction to statistical classification techniques Part 3 Intrductin t statistical classificatin techniques Machine Learning, Part 3, March 07 Fabi Rli Preamble ØIn Part we have seen that if we knw: Psterir prbabilities P(ω i / ) Or the equivalent terms

More information

IAML: Support Vector Machines

IAML: Support Vector Machines 1 / 22 IAML: Supprt Vectr Machines Charles Suttn and Victr Lavrenk Schl f Infrmatics Semester 1 2 / 22 Outline Separating hyperplane with maimum margin Nn-separable training data Epanding the input int

More information

Department of Economics, University of California, Davis Ecn 200C Micro Theory Professor Giacomo Bonanno. Insurance Markets

Department of Economics, University of California, Davis Ecn 200C Micro Theory Professor Giacomo Bonanno. Insurance Markets Department f Ecnmics, University f alifrnia, Davis Ecn 200 Micr Thery Prfessr Giacm Bnann Insurance Markets nsider an individual wh has an initial wealth f. ith sme prbability p he faces a lss f x (0

More information

MODULE FOUR. This module addresses functions. SC Academic Elementary Algebra Standards:

MODULE FOUR. This module addresses functions. SC Academic Elementary Algebra Standards: MODULE FOUR This mdule addresses functins SC Academic Standards: EA-3.1 Classify a relatinship as being either a functin r nt a functin when given data as a table, set f rdered pairs, r graph. EA-3.2 Use

More information

Comparing Several Means: ANOVA. Group Means and Grand Mean

Comparing Several Means: ANOVA. Group Means and Grand Mean STAT 511 ANOVA and Regressin 1 Cmparing Several Means: ANOVA Slide 1 Blue Lake snap beans were grwn in 12 pen-tp chambers which are subject t 4 treatments 3 each with O 3 and SO 2 present/absent. The ttal

More information

Chapter 3: Cluster Analysis

Chapter 3: Cluster Analysis Chapter 3: Cluster Analysis } 3.1 Basic Cncepts f Clustering 3.1.1 Cluster Analysis 3.1. Clustering Categries } 3. Partitining Methds 3..1 The principle 3.. K-Means Methd 3..3 K-Medids Methd 3..4 CLARA

More information

Weathering. Title: Chemical and Mechanical Weathering. Grade Level: Subject/Content: Earth and Space Science

Weathering. Title: Chemical and Mechanical Weathering. Grade Level: Subject/Content: Earth and Space Science Weathering Title: Chemical and Mechanical Weathering Grade Level: 9-12 Subject/Cntent: Earth and Space Science Summary f Lessn: Students will test hw chemical and mechanical weathering can affect a rck

More information

Math Foundations 20 Work Plan

Math Foundations 20 Work Plan Math Fundatins 20 Wrk Plan Units / Tpics 20.8 Demnstrate understanding f systems f linear inequalities in tw variables. Time Frame December 1-3 weeks 6-10 Majr Learning Indicatrs Identify situatins relevant

More information

COMP 551 Applied Machine Learning Lecture 11: Support Vector Machines

COMP 551 Applied Machine Learning Lecture 11: Support Vector Machines COMP 551 Applied Machine Learning Lecture 11: Supprt Vectr Machines Instructr: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted fr this curse

More information

Lead/Lag Compensator Frequency Domain Properties and Design Methods

Lead/Lag Compensator Frequency Domain Properties and Design Methods Lectures 6 and 7 Lead/Lag Cmpensatr Frequency Dmain Prperties and Design Methds Definitin Cnsider the cmpensatr (ie cntrller Fr, it is called a lag cmpensatr s K Fr s, it is called a lead cmpensatr Ntatin

More information

AP Statistics Practice Test Unit Three Exploring Relationships Between Variables. Name Period Date

AP Statistics Practice Test Unit Three Exploring Relationships Between Variables. Name Period Date AP Statistics Practice Test Unit Three Explring Relatinships Between Variables Name Perid Date True r False: 1. Crrelatin and regressin require explanatry and respnse variables. 1. 2. Every least squares

More information

3.4 Shrinkage Methods Prostate Cancer Data Example (Continued) Ridge Regression

3.4 Shrinkage Methods Prostate Cancer Data Example (Continued) Ridge Regression 3.3.4 Prstate Cancer Data Example (Cntinued) 3.4 Shrinkage Methds 61 Table 3.3 shws the cefficients frm a number f different selectin and shrinkage methds. They are best-subset selectin using an all-subsets

More information

Five Whys How To Do It Better

Five Whys How To Do It Better Five Whys Definitin. As explained in the previus article, we define rt cause as simply the uncvering f hw the current prblem came int being. Fr a simple causal chain, it is the entire chain. Fr a cmplex

More information

Activity Guide Loops and Random Numbers

Activity Guide Loops and Random Numbers Unit 3 Lessn 7 Name(s) Perid Date Activity Guide Lps and Randm Numbers CS Cntent Lps are a relatively straightfrward idea in prgramming - yu want a certain chunk f cde t run repeatedly - but it takes a

More information

k-nearest Neighbor How to choose k Average of k points more reliable when: Large k: noise in attributes +o o noise in class labels

k-nearest Neighbor How to choose k Average of k points more reliable when: Large k: noise in attributes +o o noise in class labels Mtivating Example Memry-Based Learning Instance-Based Learning K-earest eighbr Inductive Assumptin Similar inputs map t similar utputs If nt true => learning is impssible If true => learning reduces t

More information

5 th grade Common Core Standards

5 th grade Common Core Standards 5 th grade Cmmn Cre Standards In Grade 5, instructinal time shuld fcus n three critical areas: (1) develping fluency with additin and subtractin f fractins, and develping understanding f the multiplicatin

More information

COMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification

COMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification COMP 551 Applied Machine Learning Lecture 5: Generative mdels fr linear classificatin Instructr: Herke van Hf (herke.vanhf@mail.mcgill.ca) Slides mstly by: Jelle Pineau Class web page: www.cs.mcgill.ca/~hvanh2/cmp551

More information

CHM112 Lab Graphing with Excel Grading Rubric

CHM112 Lab Graphing with Excel Grading Rubric Name CHM112 Lab Graphing with Excel Grading Rubric Criteria Pints pssible Pints earned Graphs crrectly pltted and adhere t all guidelines (including descriptive title, prperly frmatted axes, trendline

More information

CS 477/677 Analysis of Algorithms Fall 2007 Dr. George Bebis Course Project Due Date: 11/29/2007

CS 477/677 Analysis of Algorithms Fall 2007 Dr. George Bebis Course Project Due Date: 11/29/2007 CS 477/677 Analysis f Algrithms Fall 2007 Dr. Gerge Bebis Curse Prject Due Date: 11/29/2007 Part1: Cmparisn f Srting Algrithms (70% f the prject grade) The bjective f the first part f the assignment is

More information

We say that y is a linear function of x if. Chapter 13: The Correlation Coefficient and the Regression Line

We say that y is a linear function of x if. Chapter 13: The Correlation Coefficient and the Regression Line Chapter 13: The Crrelatin Cefficient and the Regressin Line We begin with a sme useful facts abut straight lines. Recall the x, y crdinate system, as pictured belw. 3 2 1 y = 2.5 y = 0.5x 3 2 1 1 2 3 1

More information

B. Definition of an exponential

B. Definition of an exponential Expnents and Lgarithms Chapter IV - Expnents and Lgarithms A. Intrductin Starting with additin and defining the ntatins fr subtractin, multiplicatin and divisin, we discvered negative numbers and fractins.

More information

A Matrix Representation of Panel Data

A Matrix Representation of Panel Data web Extensin 6 Appendix 6.A A Matrix Representatin f Panel Data Panel data mdels cme in tw brad varieties, distinct intercept DGPs and errr cmpnent DGPs. his appendix presents matrix algebra representatins

More information

Experiment #3. Graphing with Excel

Experiment #3. Graphing with Excel Experiment #3. Graphing with Excel Study the "Graphing with Excel" instructins that have been prvided. Additinal help with learning t use Excel can be fund n several web sites, including http://www.ncsu.edu/labwrite/res/gt/gt-

More information

Lab 1 The Scientific Method

Lab 1 The Scientific Method INTRODUCTION The fllwing labratry exercise is designed t give yu, the student, an pprtunity t explre unknwn systems, r universes, and hypthesize pssible rules which may gvern the behavir within them. Scientific

More information

Assessment Primer: Writing Instructional Objectives

Assessment Primer: Writing Instructional Objectives Assessment Primer: Writing Instructinal Objectives (Based n Preparing Instructinal Objectives by Mager 1962 and Preparing Instructinal Objectives: A critical tl in the develpment f effective instructin

More information

Revision: August 19, E Main Suite D Pullman, WA (509) Voice and Fax

Revision: August 19, E Main Suite D Pullman, WA (509) Voice and Fax .7.4: Direct frequency dmain circuit analysis Revisin: August 9, 00 5 E Main Suite D Pullman, WA 9963 (509) 334 6306 ice and Fax Overview n chapter.7., we determined the steadystate respnse f electrical

More information

COMP 551 Applied Machine Learning Lecture 9: Support Vector Machines (cont d)

COMP 551 Applied Machine Learning Lecture 9: Support Vector Machines (cont d) COMP 551 Applied Machine Learning Lecture 9: Supprt Vectr Machines (cnt d) Instructr: Herke van Hf (herke.vanhf@mail.mcgill.ca) Slides mstly by: Class web page: www.cs.mcgill.ca/~hvanh2/cmp551 Unless therwise

More information

Medium Scale Integrated (MSI) devices [Sections 2.9 and 2.10]

Medium Scale Integrated (MSI) devices [Sections 2.9 and 2.10] EECS 270, Winter 2017, Lecture 3 Page 1 f 6 Medium Scale Integrated (MSI) devices [Sectins 2.9 and 2.10] As we ve seen, it s smetimes nt reasnable t d all the design wrk at the gate-level smetimes we just

More information

Distributions, spatial statistics and a Bayesian perspective

Distributions, spatial statistics and a Bayesian perspective Distributins, spatial statistics and a Bayesian perspective Dug Nychka Natinal Center fr Atmspheric Research Distributins and densities Cnditinal distributins and Bayes Thm Bivariate nrmal Spatial statistics

More information

Springer Texts in Statistics

Springer Texts in Statistics Springer Texts in Statistics Series Editrs: G. Casella S. Fienberg I. Olkin Fr further vlumes: http://www.springer.cm/series/417 Gareth James Daniela Witten Trevr Hastie Rbert Tibshirani An Intrductin

More information

2004 AP CHEMISTRY FREE-RESPONSE QUESTIONS

2004 AP CHEMISTRY FREE-RESPONSE QUESTIONS 2004 AP CHEMISTRY FREE-RESPONSE QUESTIONS 6. An electrchemical cell is cnstructed with an pen switch, as shwn in the diagram abve. A strip f Sn and a strip f an unknwn metal, X, are used as electrdes.

More information

In SMV I. IAML: Support Vector Machines II. This Time. The SVM optimization problem. We saw:

In SMV I. IAML: Support Vector Machines II. This Time. The SVM optimization problem. We saw: In SMV I IAML: Supprt Vectr Machines II Nigel Gddard Schl f Infrmatics Semester 1 We sa: Ma margin trick Gemetry f the margin and h t cmpute it Finding the ma margin hyperplane using a cnstrained ptimizatin

More information

Computational modeling techniques

Computational modeling techniques Cmputatinal mdeling techniques Lecture 4: Mdel checing fr ODE mdels In Petre Department f IT, Åb Aademi http://www.users.ab.fi/ipetre/cmpmd/ Cntent Stichimetric matrix Calculating the mass cnservatin relatins

More information

Department of Electrical Engineering, University of Waterloo. Introduction

Department of Electrical Engineering, University of Waterloo. Introduction Sectin 4: Sequential Circuits Majr Tpics Types f sequential circuits Flip-flps Analysis f clcked sequential circuits Mre and Mealy machines Design f clcked sequential circuits State transitin design methd

More information

Technical Bulletin. Generation Interconnection Procedures. Revisions to Cluster 4, Phase 1 Study Methodology

Technical Bulletin. Generation Interconnection Procedures. Revisions to Cluster 4, Phase 1 Study Methodology Technical Bulletin Generatin Intercnnectin Prcedures Revisins t Cluster 4, Phase 1 Study Methdlgy Release Date: Octber 20, 2011 (Finalizatin f the Draft Technical Bulletin released n September 19, 2011)

More information

A New Evaluation Measure. J. Joiner and L. Werner. The problems of evaluation and the needed criteria of evaluation

A New Evaluation Measure. J. Joiner and L. Werner. The problems of evaluation and the needed criteria of evaluation III-l III. A New Evaluatin Measure J. Jiner and L. Werner Abstract The prblems f evaluatin and the needed criteria f evaluatin measures in the SMART system f infrmatin retrieval are reviewed and discussed.

More information

READING STATECHART DIAGRAMS

READING STATECHART DIAGRAMS READING STATECHART DIAGRAMS Figure 4.48 A Statechart diagram with events The diagram in Figure 4.48 shws all states that the bject plane can be in during the curse f its life. Furthermre, it shws the pssible

More information

Pipetting 101 Developed by BSU CityLab

Pipetting 101 Developed by BSU CityLab Discver the Micrbes Within: The Wlbachia Prject Pipetting 101 Develped by BSU CityLab Clr Cmparisns Pipetting Exercise #1 STUDENT OBJECTIVES Students will be able t: Chse the crrect size micrpipette fr

More information

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 13 Offprint

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 13 Offprint Biplts in Practice MICHAEL GREENACRE Prfessr f Statistics at the Pmpeu Fabra University Chapter 13 Offprint CASE STUDY BIOMEDICINE Cmparing Cancer Types Accrding t Gene Epressin Arrays First published:

More information

Modelling of Clock Behaviour. Don Percival. Applied Physics Laboratory University of Washington Seattle, Washington, USA

Modelling of Clock Behaviour. Don Percival. Applied Physics Laboratory University of Washington Seattle, Washington, USA Mdelling f Clck Behaviur Dn Percival Applied Physics Labratry University f Washingtn Seattle, Washingtn, USA verheads and paper fr talk available at http://faculty.washingtn.edu/dbp/talks.html 1 Overview

More information

COMP 551 Applied Machine Learning Lecture 4: Linear classification

COMP 551 Applied Machine Learning Lecture 4: Linear classification COMP 551 Applied Machine Learning Lecture 4: Linear classificatin Instructr: Jelle Pineau (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted

More information

Admissibility Conditions and Asymptotic Behavior of Strongly Regular Graphs

Admissibility Conditions and Asymptotic Behavior of Strongly Regular Graphs Admissibility Cnditins and Asympttic Behavir f Strngly Regular Graphs VASCO MOÇO MANO Department f Mathematics University f Prt Oprt PORTUGAL vascmcman@gmailcm LUÍS ANTÓNIO DE ALMEIDA VIEIRA Department

More information

Smoothing, penalized least squares and splines

Smoothing, penalized least squares and splines Smthing, penalized least squares and splines Duglas Nychka, www.image.ucar.edu/~nychka Lcally weighted averages Penalized least squares smthers Prperties f smthers Splines and Reprducing Kernels The interplatin

More information

BOUNDED UNCERTAINTY AND CLIMATE CHANGE ECONOMICS. Christopher Costello, Andrew Solow, Michael Neubert, and Stephen Polasky

BOUNDED UNCERTAINTY AND CLIMATE CHANGE ECONOMICS. Christopher Costello, Andrew Solow, Michael Neubert, and Stephen Polasky BOUNDED UNCERTAINTY AND CLIMATE CHANGE ECONOMICS Christpher Cstell, Andrew Slw, Michael Neubert, and Stephen Plasky Intrductin The central questin in the ecnmic analysis f climate change plicy cncerns

More information

Perfrmance f Sensitizing Rules n Shewhart Cntrl Charts with Autcrrelated Data Key Wrds: Autregressive, Mving Average, Runs Tests, Shewhart Cntrl Chart

Perfrmance f Sensitizing Rules n Shewhart Cntrl Charts with Autcrrelated Data Key Wrds: Autregressive, Mving Average, Runs Tests, Shewhart Cntrl Chart Perfrmance f Sensitizing Rules n Shewhart Cntrl Charts with Autcrrelated Data Sandy D. Balkin Dennis K. J. Lin y Pennsylvania State University, University Park, PA 16802 Sandy Balkin is a graduate student

More information

Admin. MDP Search Trees. Optimal Quantities. Reinforcement Learning

Admin. MDP Search Trees. Optimal Quantities. Reinforcement Learning Admin Reinfrcement Learning Cntent adapted frm Berkeley CS188 MDP Search Trees Each MDP state prjects an expectimax-like search tree Optimal Quantities The value (utility) f a state s: V*(s) = expected

More information

Getting Involved O. Responsibilities of a Member. People Are Depending On You. Participation Is Important. Think It Through

Getting Involved O. Responsibilities of a Member. People Are Depending On You. Participation Is Important. Think It Through f Getting Invlved O Literature Circles can be fun. It is exciting t be part f a grup that shares smething. S get invlved, read, think, and talk abut bks! Respnsibilities f a Member Remember a Literature

More information

BASD HIGH SCHOOL FORMAL LAB REPORT

BASD HIGH SCHOOL FORMAL LAB REPORT BASD HIGH SCHOOL FORMAL LAB REPORT *WARNING: After an explanatin f what t include in each sectin, there is an example f hw the sectin might lk using a sample experiment Keep in mind, the sample lab used

More information

Subject description processes

Subject description processes Subject representatin 6.1.2. Subject descriptin prcesses Overview Fur majr prcesses r areas f practice fr representing subjects are classificatin, subject catalging, indexing, and abstracting. The prcesses

More information

Tutorial 4: Parameter optimization

Tutorial 4: Parameter optimization SRM Curse 2013 Tutrial 4 Parameters Tutrial 4: Parameter ptimizatin The aim f this tutrial is t prvide yu with a feeling f hw a few f the parameters that can be set n a QQQ instrument affect SRM results.

More information

Hypothesis Tests for One Population Mean

Hypothesis Tests for One Population Mean Hypthesis Tests fr One Ppulatin Mean Chapter 9 Ala Abdelbaki Objective Objective: T estimate the value f ne ppulatin mean Inferential statistics using statistics in rder t estimate parameters We will be

More information

Physics 2010 Motion with Constant Acceleration Experiment 1

Physics 2010 Motion with Constant Acceleration Experiment 1 . Physics 00 Mtin with Cnstant Acceleratin Experiment In this lab, we will study the mtin f a glider as it accelerates dwnhill n a tilted air track. The glider is supprted ver the air track by a cushin

More information

WRITING THE REPORT. Organizing the report. Title Page. Table of Contents

WRITING THE REPORT. Organizing the report. Title Page. Table of Contents WRITING THE REPORT Organizing the reprt Mst reprts shuld be rganized in the fllwing manner. Smetime there is a valid reasn t include extra chapters in within the bdy f the reprt. 1. Title page 2. Executive

More information

x 1 Outline IAML: Logistic Regression Decision Boundaries Example Data

x 1 Outline IAML: Logistic Regression Decision Boundaries Example Data Outline IAML: Lgistic Regressin Charles Suttn and Victr Lavrenk Schl f Infrmatics Semester Lgistic functin Lgistic regressin Learning lgistic regressin Optimizatin The pwer f nn-linear basis functins Least-squares

More information

We can see from the graph above that the intersection is, i.e., [ ).

We can see from the graph above that the intersection is, i.e., [ ). MTH 111 Cllege Algebra Lecture Ntes July 2, 2014 Functin Arithmetic: With nt t much difficulty, we ntice that inputs f functins are numbers, and utputs f functins are numbers. S whatever we can d with

More information

Eric Klein and Ning Sa

Eric Klein and Ning Sa Week 12. Statistical Appraches t Netwrks: p1 and p* Wasserman and Faust Chapter 15: Statistical Analysis f Single Relatinal Netwrks There are fur tasks in psitinal analysis: 1) Define Equivalence 2) Measure

More information

CHAPTER 4 DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS

CHAPTER 4 DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS CHAPTER 4 DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS 1 Influential bservatins are bservatins whse presence in the data can have a distrting effect n the parameter estimates and pssibly the entire analysis,

More information

Chapter Summary. Mathematical Induction Strong Induction Recursive Definitions Structural Induction Recursive Algorithms

Chapter Summary. Mathematical Induction Strong Induction Recursive Definitions Structural Induction Recursive Algorithms Chapter 5 1 Chapter Summary Mathematical Inductin Strng Inductin Recursive Definitins Structural Inductin Recursive Algrithms Sectin 5.1 3 Sectin Summary Mathematical Inductin Examples f Prf by Mathematical

More information

ChE 471: LECTURE 4 Fall 2003

ChE 471: LECTURE 4 Fall 2003 ChE 47: LECTURE 4 Fall 003 IDEL RECTORS One f the key gals f chemical reactin engineering is t quantify the relatinship between prductin rate, reactr size, reactin kinetics and selected perating cnditins.

More information

Fall 2013 Physics 172 Recitation 3 Momentum and Springs

Fall 2013 Physics 172 Recitation 3 Momentum and Springs Fall 03 Physics 7 Recitatin 3 Mmentum and Springs Purpse: The purpse f this recitatin is t give yu experience wrking with mmentum and the mmentum update frmula. Readings: Chapter.3-.5 Learning Objectives:.3.

More information

Physical Layer: Outline

Physical Layer: Outline 18-: Intrductin t Telecmmunicatin Netwrks Lectures : Physical Layer Peter Steenkiste Spring 01 www.cs.cmu.edu/~prs/nets-ece Physical Layer: Outline Digital Representatin f Infrmatin Characterizatin f Cmmunicatin

More information

NUMBERS, MATHEMATICS AND EQUATIONS

NUMBERS, MATHEMATICS AND EQUATIONS AUSTRALIAN CURRICULUM PHYSICS GETTING STARTED WITH PHYSICS NUMBERS, MATHEMATICS AND EQUATIONS An integral part t the understanding f ur physical wrld is the use f mathematical mdels which can be used t

More information

LHS Mathematics Department Honors Pre-Calculus Final Exam 2002 Answers

LHS Mathematics Department Honors Pre-Calculus Final Exam 2002 Answers LHS Mathematics Department Hnrs Pre-alculus Final Eam nswers Part Shrt Prblems The table at the right gives the ppulatin f Massachusetts ver the past several decades Using an epnential mdel, predict the

More information

Least Squares Optimal Filtering with Multirate Observations

Least Squares Optimal Filtering with Multirate Observations Prc. 36th Asilmar Cnf. n Signals, Systems, and Cmputers, Pacific Grve, CA, Nvember 2002 Least Squares Optimal Filtering with Multirate Observatins Charles W. herrien and Anthny H. Hawes Department f Electrical

More information

8 th Grade Math: Pre-Algebra

8 th Grade Math: Pre-Algebra Hardin Cunty Middle Schl (2013-2014) 1 8 th Grade Math: Pre-Algebra Curse Descriptin The purpse f this curse is t enhance student understanding, participatin, and real-life applicatin f middle-schl mathematics

More information

Interference is when two (or more) sets of waves meet and combine to produce a new pattern.

Interference is when two (or more) sets of waves meet and combine to produce a new pattern. Interference Interference is when tw (r mre) sets f waves meet and cmbine t prduce a new pattern. This pattern can vary depending n the riginal wave directin, wavelength, amplitude, etc. The tw mst extreme

More information

STATS216v Introduction to Statistical Learning Stanford University, Summer Practice Final (Solutions) Duration: 3 hours

STATS216v Introduction to Statistical Learning Stanford University, Summer Practice Final (Solutions) Duration: 3 hours STATS216v Intrductin t Statistical Learning Stanfrd University, Summer 2016 Practice Final (Slutins) Duratin: 3 hurs Instructins: (This is a practice final and will nt be graded.) Remember the university

More information

Chemistry 20 Lesson 11 Electronegativity, Polarity and Shapes

Chemistry 20 Lesson 11 Electronegativity, Polarity and Shapes Chemistry 20 Lessn 11 Electrnegativity, Plarity and Shapes In ur previus wrk we learned why atms frm cvalent bnds and hw t draw the resulting rganizatin f atms. In this lessn we will learn (a) hw the cmbinatin

More information

Determining the Accuracy of Modal Parameter Estimation Methods

Determining the Accuracy of Modal Parameter Estimation Methods Determining the Accuracy f Mdal Parameter Estimatin Methds by Michael Lee Ph.D., P.E. & Mar Richardsn Ph.D. Structural Measurement Systems Milpitas, CA Abstract The mst cmmn type f mdal testing system

More information

Localized Model Selection for Regression

Localized Model Selection for Regression Lcalized Mdel Selectin fr Regressin Yuhng Yang Schl f Statistics University f Minnesta Church Street S.E. Minneaplis, MN 5555 May 7, 007 Abstract Research n mdel/prcedure selectin has fcused n selecting

More information

Sequential Allocation with Minimal Switching

Sequential Allocation with Minimal Switching In Cmputing Science and Statistics 28 (1996), pp. 567 572 Sequential Allcatin with Minimal Switching Quentin F. Stut 1 Janis Hardwick 1 EECS Dept., University f Michigan Statistics Dept., Purdue University

More information

MATCHING TECHNIQUES. Technical Track Session VI. Emanuela Galasso. The World Bank

MATCHING TECHNIQUES. Technical Track Session VI. Emanuela Galasso. The World Bank MATCHING TECHNIQUES Technical Track Sessin VI Emanuela Galass The Wrld Bank These slides were develped by Christel Vermeersch and mdified by Emanuela Galass fr the purpse f this wrkshp When can we use

More information

Cells though to send feedback signals from the medulla back to the lamina o L: Lamina Monopolar cells

Cells though to send feedback signals from the medulla back to the lamina o L: Lamina Monopolar cells Classificatin Rules (and Exceptins) Name: Cell type fllwed by either a clumn ID (determined by the visual lcatin f the cell) r a numeric identifier t separate ut different examples f a given cell type

More information

1 The limitations of Hartree Fock approximation

1 The limitations of Hartree Fock approximation Chapter: Pst-Hartree Fck Methds - I The limitatins f Hartree Fck apprximatin The n electrn single determinant Hartree Fck wave functin is the variatinal best amng all pssible n electrn single determinants

More information

INSTRUMENTAL VARIABLES

INSTRUMENTAL VARIABLES INSTRUMENTAL VARIABLES Technical Track Sessin IV Sergi Urzua University f Maryland Instrumental Variables and IE Tw main uses f IV in impact evaluatin: 1. Crrect fr difference between assignment f treatment

More information

I. Analytical Potential and Field of a Uniform Rod. V E d. The definition of electric potential difference is

I. Analytical Potential and Field of a Uniform Rod. V E d. The definition of electric potential difference is Length L>>a,b,c Phys 232 Lab 4 Ch 17 Electric Ptential Difference Materials: whitebards & pens, cmputers with VPythn, pwer supply & cables, multimeter, crkbard, thumbtacks, individual prbes and jined prbes,

More information

Lesson Plan. Recode: They will do a graphic organizer to sequence the steps of scientific method.

Lesson Plan. Recode: They will do a graphic organizer to sequence the steps of scientific method. Lessn Plan Reach: Ask the students if they ever ppped a bag f micrwave ppcrn and nticed hw many kernels were unppped at the bttm f the bag which made yu wnder if ther brands pp better than the ne yu are

More information

Computational modeling techniques

Computational modeling techniques Cmputatinal mdeling techniques Lecture 2: Mdeling change. In Petre Department f IT, Åb Akademi http://users.ab.fi/ipetre/cmpmd/ Cntent f the lecture Basic paradigm f mdeling change Examples Linear dynamical

More information

Phys. 344 Ch 7 Lecture 8 Fri., April. 10 th,

Phys. 344 Ch 7 Lecture 8 Fri., April. 10 th, Phys. 344 Ch 7 Lecture 8 Fri., April. 0 th, 009 Fri. 4/0 8. Ising Mdel f Ferrmagnets HW30 66, 74 Mn. 4/3 Review Sat. 4/8 3pm Exam 3 HW Mnday: Review fr est 3. See n-line practice test lecture-prep is t

More information