Springer Texts in Statistics


Springer Texts in Statistics
Series Editors: G. Casella, S. Fienberg, I. Olkin
For further volumes:


Gareth James
Daniela Witten
Trevor Hastie
Robert Tibshirani

An Introduction to Statistical Learning
with Applications in R

Gareth James, Department of Information and Operations Management, University of Southern California, Los Angeles, CA, USA
Trevor Hastie, Department of Statistics, Stanford University, Stanford, CA, USA
Daniela Witten, Department of Biostatistics, University of Washington, Seattle, WA, USA
Robert Tibshirani, Department of Statistics, Stanford University, Stanford, CA, USA

ISSN   ISBN   ISBN (eBook)   DOI
Springer New York Heidelberg Dordrecht London
Library of Congress Control Number:
© Springer Science+Business Media New York 2013 (Corrected at 4th printing 2014)

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

To our parents:
Alison and Michael James
Chiara Nappi and Edward Witten
Valerie and Patrick Hastie
Vera and Sami Tibshirani

and to our families:
Michael, Daniel, and Catherine
Ari
Samantha, Timothy, and Lynda
Charlie, Ryan, Julie, and Cheryl


Preface

Statistical learning refers to a set of tools for modeling and understanding complex datasets. It is a recently developed area in statistics and blends with parallel developments in computer science and, in particular, machine learning. The field encompasses many methods such as the lasso and sparse regression, classification and regression trees, and boosting and support vector machines.

With the explosion of Big Data problems, statistical learning has become a very hot field in many scientific areas as well as marketing, finance, and other business disciplines. People with statistical learning skills are in high demand.

One of the first books in this area, The Elements of Statistical Learning (ESL) (Hastie, Tibshirani, and Friedman), was published in 2001, with a second edition in 2009. ESL has become a popular text not only in statistics but also in related fields. One of the reasons for ESL's popularity is its relatively accessible style. But ESL is intended for individuals with advanced training in the mathematical sciences. An Introduction to Statistical Learning (ISL) arose from the perceived need for a broader and less technical treatment of these topics. In this new book, we cover many of the same topics as ESL, but we concentrate more on the applications of the methods and less on the mathematical details. We have created labs illustrating how to implement each of the statistical learning methods using the popular statistical software package R. These labs provide the reader with valuable hands-on experience.

This book is appropriate for advanced undergraduates or master's students in statistics or related quantitative fields or for individuals in other disciplines who wish to use statistical learning tools to analyze their data. It can be used as a textbook for a course spanning one or two semesters.

We would like to thank several readers for valuable comments on preliminary drafts of this book: Pallavi Basu, Alexandra Chouldechova, Patrick Danaher, Will Fithian, Luella Fu, Sam Gross, Max Grazier G'Sell, Courtney Paulson, Xinghao Qiao, Elisa Sheng, Noah Simon, Kean Ming Tan, and Xin Lu Tan.

"It's tough to make predictions, especially about the future."
-Yogi Berra

Los Angeles, USA   Gareth James
Seattle, USA       Daniela Witten
Palo Alto, USA     Trevor Hastie
Palo Alto, USA     Robert Tibshirani

Contents

Preface

1 Introduction

2 Statistical Learning
   What Is Statistical Learning?
      Why Estimate f?
      How Do We Estimate f?
      The Trade-Off Between Prediction Accuracy and Model Interpretability
      Supervised Versus Unsupervised Learning
      Regression Versus Classification Problems
   Assessing Model Accuracy
      Measuring the Quality of Fit
      The Bias-Variance Trade-Off
      The Classification Setting
   Lab: Introduction to R
      Basic Commands
      Graphics
      Indexing Data
      Loading Data
      Additional Graphical and Numerical Summaries
   Exercises

3 Linear Regression
   Simple Linear Regression
      Estimating the Coefficients
      Assessing the Accuracy of the Coefficient Estimates
      Assessing the Accuracy of the Model
   Multiple Linear Regression
      Estimating the Regression Coefficients
      Some Important Questions
   Other Considerations in the Regression Model
      Qualitative Predictors
      Extensions of the Linear Model
      Potential Problems
   The Marketing Plan
   Comparison of Linear Regression with K-Nearest Neighbors
   Lab: Linear Regression
      Libraries
      Simple Linear Regression
      Multiple Linear Regression
      Interaction Terms
      Non-linear Transformations of the Predictors
      Qualitative Predictors
      Writing Functions
   Exercises

4 Classification
   An Overview of Classification
   Why Not Linear Regression?
   Logistic Regression
      The Logistic Model
      Estimating the Regression Coefficients
      Making Predictions
      Multiple Logistic Regression
      Logistic Regression for >2 Response Classes
   Linear Discriminant Analysis
      Using Bayes' Theorem for Classification
      Linear Discriminant Analysis for p = 1
      Linear Discriminant Analysis for p > 1
      Quadratic Discriminant Analysis
   A Comparison of Classification Methods
   Lab: Logistic Regression, LDA, QDA, and KNN
      The Stock Market Data
      Logistic Regression
      Linear Discriminant Analysis
      Quadratic Discriminant Analysis
      K-Nearest Neighbors
      An Application to Caravan Insurance Data
   Exercises

5 Resampling Methods
   Cross-Validation
      The Validation Set Approach
      Leave-One-Out Cross-Validation
      k-Fold Cross-Validation
      Bias-Variance Trade-Off for k-Fold Cross-Validation
      Cross-Validation on Classification Problems
   The Bootstrap
   Lab: Cross-Validation and the Bootstrap
      The Validation Set Approach
      Leave-One-Out Cross-Validation
      k-Fold Cross-Validation
      The Bootstrap
   Exercises

6 Linear Model Selection and Regularization
   Subset Selection
      Best Subset Selection
      Stepwise Selection
      Choosing the Optimal Model
   Shrinkage Methods
      Ridge Regression
      The Lasso
      Selecting the Tuning Parameter
   Dimension Reduction Methods
      Principal Components Regression
      Partial Least Squares
   Considerations in High Dimensions
      High-Dimensional Data
      What Goes Wrong in High Dimensions?
      Regression in High Dimensions
      Interpreting Results in High Dimensions
   Lab 1: Subset Selection Methods
      Best Subset Selection
      Forward and Backward Stepwise Selection
      Choosing Among Models Using the Validation Set Approach and Cross-Validation
   Lab 2: Ridge Regression and the Lasso
      Ridge Regression
      The Lasso
   Lab 3: PCR and PLS Regression
      Principal Components Regression
      Partial Least Squares
   Exercises

7 Moving Beyond Linearity
   Polynomial Regression
   Step Functions
   Basis Functions
   Regression Splines
      Piecewise Polynomials
      Constraints and Splines
      The Spline Basis Representation
      Choosing the Number and Locations of the Knots
      Comparison to Polynomial Regression
   Smoothing Splines
      An Overview of Smoothing Splines
      Choosing the Smoothing Parameter λ
   Local Regression
   Generalized Additive Models
      GAMs for Regression Problems
      GAMs for Classification Problems
   Lab: Non-linear Modeling
      Polynomial Regression and Step Functions
      Splines
      GAMs
   Exercises

8 Tree-Based Methods
   The Basics of Decision Trees
      Regression Trees
      Classification Trees
      Trees Versus Linear Models
      Advantages and Disadvantages of Trees
   Bagging, Random Forests, Boosting
      Bagging
      Random Forests
      Boosting
   Lab: Decision Trees
      Fitting Classification Trees
      Fitting Regression Trees
      Bagging and Random Forests
      Boosting
   Exercises

9 Support Vector Machines
   Maximal Margin Classifier
      What Is a Hyperplane?
      Classification Using a Separating Hyperplane
      The Maximal Margin Classifier
      Construction of the Maximal Margin Classifier
      The Non-separable Case
   Support Vector Classifiers
      Overview of the Support Vector Classifier
      Details of the Support Vector Classifier
   Support Vector Machines
      Classification with Non-linear Decision Boundaries
      The Support Vector Machine
      An Application to the Heart Disease Data
   SVMs with More than Two Classes
      One-Versus-One Classification
      One-Versus-All Classification
   Relationship to Logistic Regression
   Lab: Support Vector Machines
      Support Vector Classifier
      Support Vector Machine
      ROC Curves
      SVM with Multiple Classes
      Application to Gene Expression Data
   Exercises

10 Unsupervised Learning
   The Challenge of Unsupervised Learning
   Principal Components Analysis
      What Are Principal Components?
      Another Interpretation of Principal Components
      More on PCA
      Other Uses for Principal Components
   Clustering Methods
      K-Means Clustering
      Hierarchical Clustering
      Practical Issues in Clustering
   Lab 1: Principal Components Analysis
   Lab 2: Clustering
      K-Means Clustering
      Hierarchical Clustering
   Lab 3: NCI60 Data Example
      PCA on the NCI60 Data
      Clustering the Observations of the NCI60 Data
   Exercises

Index

1 Introduction

An Overview of Statistical Learning

Statistical learning refers to a vast set of tools for understanding data. These tools can be classified as supervised or unsupervised. Broadly speaking, supervised statistical learning involves building a statistical model for predicting, or estimating, an output based on one or more inputs. Problems of this nature occur in fields as diverse as business, medicine, astrophysics, and public policy. With unsupervised statistical learning, there are inputs but no supervising output; nevertheless we can learn relationships and structure from such data. To provide an illustration of some applications of statistical learning, we briefly discuss three real-world data sets that are considered in this book.

Wage Data

In this application (which we refer to as the Wage data set throughout this book), we examine a number of factors that relate to wages for a group of males from the Atlantic region of the United States. In particular, we wish to understand the association between an employee's age and education, as well as the calendar year, on his wage. Consider, for example, the left-hand panel of Figure 1.1, which displays wage versus age for each of the individuals in the data set. There is evidence that wage increases with age but then decreases again after approximately age 60. The blue line, which provides an estimate of the average wage for a given age, makes this trend clearer.
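As a concrete illustration of this first plot, the following is a minimal R sketch that loads the Wage data from the ISLR package (assumed to be installed) and overlays a smoother on the scatterplot of wage against age. The use of smooth.spline() is our own choice; the book does not specify which smoother produced the blue line in Figure 1.1.

library(ISLR)                               # provides the Wage data set
dim(Wage)                                   # number of observations and variables
plot(Wage$age, Wage$wage, col = "darkgrey",
     xlab = "Age", ylab = "Wage")
fit <- smooth.spline(Wage$age, Wage$wage)   # one reasonable estimate of the average wage at each age
lines(fit, col = "blue", lwd = 2)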

FIGURE 1.1. Wage data, which contains income survey information for males from the central Atlantic region of the United States. Left: wage as a function of age. On average, wage increases with age until about 60 years of age, at which point it begins to decline. Center: wage as a function of year. There is a slow but steady increase of approximately $10,000 in the average wage between 2003 and 2009. Right: Boxplots displaying wage as a function of education, with 1 indicating the lowest level (no high school diploma) and 5 the highest level (an advanced graduate degree). On average, wage increases with the level of education.

Given an employee's age, we can use this curve to predict his wage. However, it is also clear from Figure 1.1 that there is a significant amount of variability associated with this average value, and so age alone is unlikely to provide an accurate prediction of a particular man's wage.

We also have information regarding each employee's education level and the year in which the wage was earned. The center and right-hand panels of Figure 1.1, which display wage as a function of both year and education, indicate that both of these factors are associated with wage. Wages increase by approximately $10,000, in a roughly linear (or straight-line) fashion, between 2003 and 2009, though this rise is very slight relative to the variability in the data. Wages are also typically greater for individuals with higher education levels: men with the lowest education level (1) tend to have substantially lower wages than those with the highest education level (5). Clearly, the most accurate prediction of a given man's wage will be obtained by combining his age, his education, and the year. In Chapter 3, we discuss linear regression, which can be used to predict wage from this data set. Ideally, we should predict wage in a way that accounts for the non-linear relationship between wage and age. In Chapter 7, we discuss a class of approaches for addressing this problem.

Stock Market Data

The Wage data involves predicting a continuous or quantitative output value. This is often referred to as a regression problem. However, in certain cases we may instead wish to predict a non-numerical value, that is, a categorical or qualitative output.

FIGURE 1.2. Left: Boxplots of the previous day's percentage change in the S&P index for the days for which the market increased or decreased, obtained from the Smarket data. Center and Right: Same as left panel, but the percentage changes for 2 and 3 days previous are shown.

For example, in Chapter 4 we examine a stock market data set that contains the daily movements in the Standard & Poor's 500 (S&P) stock index over a 5-year period between 2001 and 2005. We refer to this as the Smarket data. The goal is to predict whether the index will increase or decrease on a given day using the past 5 days' percentage changes in the index. Here the statistical learning problem does not involve predicting a numerical value. Instead it involves predicting whether a given day's stock market performance will fall into the Up bucket or the Down bucket. This is known as a classification problem. A model that could accurately predict the direction in which the market will move would be very useful!

The left-hand panel of Figure 1.2 displays two boxplots of the previous day's percentage changes in the stock index: one for the 648 days for which the market increased on the subsequent day, and one for the 602 days for which the market decreased. The two plots look almost identical, suggesting that there is no simple strategy for using yesterday's movement in the S&P to predict today's returns. The remaining panels, which display boxplots for the percentage changes 2 and 3 days previous to today, similarly indicate little association between past and present returns. Of course, this lack of pattern is to be expected: in the presence of strong correlations between successive days' returns, one could adopt a simple trading strategy to generate profits from the market. Nevertheless, in Chapter 4, we explore these data using several different statistical learning methods. Interestingly, there are hints of some weak trends in the data that suggest that, at least for this 5-year period, it is possible to correctly predict the direction of movement in the market approximately 60% of the time (Figure 1.3).
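The kind of analysis summarized in Figure 1.3, and developed in the Chapter 4 lab, can be sketched in a few lines of R. The sketch below assumes the ISLR and MASS packages are available; the choice of Lag1 and Lag2 as predictors is ours for illustration, since the exact predictors behind the figure are not specified here.

library(ISLR)    # Smarket: daily S&P 500 movements, 2001-2005
library(MASS)    # qda()
train <- Smarket$Year < 2005                        # fit on the earlier years
qda.fit <- qda(Direction ~ Lag1 + Lag2,             # illustrative choice of predictors
               data = Smarket, subset = train)
qda.pred <- predict(qda.fit, Smarket[!train, ])     # predict the held-out 2005 days
mean(qda.pred$class == Smarket$Direction[!train])   # fraction of days classified correctly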

FIGURE 1.3. We fit a quadratic discriminant analysis model to the subset of the Smarket data corresponding to the 2001-2004 time period, and predicted the probability of a stock market decrease using the 2005 data. On average, the predicted probability of decrease is higher for the days in which the market does decrease. Based on these results, we are able to correctly predict the direction of movement in the market 60% of the time.

Gene Expression Data

The previous two applications illustrate data sets with both input and output variables. However, another important class of problems involves situations in which we only observe input variables, with no corresponding output. For example, in a marketing setting, we might have demographic information for a number of current or potential customers. We may wish to understand which types of customers are similar to each other by grouping individuals according to their observed characteristics. This is known as a clustering problem. Unlike in the previous examples, here we are not trying to predict an output variable.

We devote Chapter 10 to a discussion of statistical learning methods for problems in which no natural output variable is available. We consider the NCI60 data set, which consists of 6,830 gene expression measurements for each of 64 cancer cell lines. Instead of predicting a particular output variable, we are interested in determining whether there are groups, or clusters, among the cell lines based on their gene expression measurements. This is a difficult question to address, in part because there are thousands of gene expression measurements per cell line, making it hard to visualize the data.

The left-hand panel of Figure 1.4 addresses this problem by representing each of the 64 cell lines using just two numbers, Z1 and Z2. These are the first two principal components of the data, which summarize the 6,830 expression measurements for each cell line down to two numbers or dimensions. While it is likely that this dimension reduction has resulted in some loss of information, it is now possible to visually examine the data for evidence of clustering.
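A minimal sketch of this dimension reduction, assuming the ISLR package (where NCI60 is stored as a list containing a 64-by-6,830 expression matrix and a vector of cancer-type labels), is:

library(ISLR)
nci.data <- NCI60$data                      # 64 cell lines x 6,830 expression measurements
nci.labs <- NCI60$labs                      # cancer type of each cell line (not used below)
pr.out <- prcomp(nci.data, scale = TRUE)    # principal components of the scaled data
plot(pr.out$x[, 1], pr.out$x[, 2],
     xlab = "Z1", ylab = "Z2", pch = 19)    # each point is one cell line

Coloring the points by nci.labs, for example with col = as.numeric(as.factor(nci.labs)), would give a plot in the spirit of the right-hand panel of Figure 1.4.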

FIGURE 1.4. Left: Representation of the NCI60 gene expression data set in a two-dimensional space, Z1 and Z2. Each point corresponds to one of the 64 cell lines. There appear to be four groups of cell lines, which we have represented using different colors. Right: Same as left panel except that we have represented each of the 14 different types of cancer using a different colored symbol. Cell lines corresponding to the same cancer type tend to be nearby in the two-dimensional space.

Deciding on the number of clusters is often a difficult problem. But the left-hand panel of Figure 1.4 suggests at least four groups of cell lines, which we have represented using separate colors. We can now examine the cell lines within each cluster for similarities in their types of cancer, in order to better understand the relationship between gene expression levels and cancer. In this particular data set, it turns out that the cell lines correspond to 14 different types of cancer. (However, this information was not used to create the left-hand panel of Figure 1.4.) The right-hand panel of Figure 1.4 is identical to the left-hand panel, except that the 14 cancer types are shown using distinct colored symbols. There is clear evidence that cell lines with the same cancer type tend to be located near each other in this two-dimensional representation. In addition, even though the cancer information was not used to produce the left-hand panel, the clustering obtained does bear some resemblance to some of the actual cancer types observed in the right-hand panel. This provides some independent verification of the accuracy of our clustering analysis.

A Brief History of Statistical Learning

Though the term statistical learning is fairly new, many of the concepts that underlie the field were developed long ago. At the beginning of the nineteenth century, Legendre and Gauss published papers on the method of least squares, which implemented the earliest form of what is now known as linear regression.

The approach was first successfully applied to problems in astronomy. Linear regression is used for predicting quantitative values, such as an individual's salary. In order to predict qualitative values, such as whether a patient survives or dies, or whether the stock market increases or decreases, Fisher proposed linear discriminant analysis in 1936. In the 1940s, various authors put forth an alternative approach, logistic regression. In the early 1970s, Nelder and Wedderburn coined the term generalized linear models for an entire class of statistical learning methods that include both linear and logistic regression as special cases.

By the end of the 1970s, many more techniques for learning from data were available. However, they were almost exclusively linear methods, because fitting non-linear relationships was computationally infeasible at the time. By the 1980s, computing technology had finally improved sufficiently that non-linear methods were no longer computationally prohibitive. In the mid-1980s, Breiman, Friedman, Olshen and Stone introduced classification and regression trees, and were among the first to demonstrate the power of a detailed practical implementation of a method, including cross-validation for model selection. Hastie and Tibshirani coined the term generalized additive models in 1986 for a class of non-linear extensions to generalized linear models, and also provided a practical software implementation.

Since that time, inspired by the advent of machine learning and other disciplines, statistical learning has emerged as a new subfield in statistics, focused on supervised and unsupervised modeling and prediction. In recent years, progress in statistical learning has been marked by the increasing availability of powerful and relatively user-friendly software, such as the popular and freely available R system. This has the potential to continue the transformation of the field from a set of techniques used and developed by statisticians and computer scientists to an essential toolkit for a much broader community.

This Book

The Elements of Statistical Learning (ESL) by Hastie, Tibshirani, and Friedman was first published in 2001. Since that time, it has become an important reference on the fundamentals of statistical machine learning. Its success derives from its comprehensive and detailed treatment of many important topics in statistical learning, as well as the fact that (relative to many upper-level statistics textbooks) it is accessible to a wide audience. However, the greatest factor behind the success of ESL has been its topical nature. At the time of its publication, interest in the field of statistical learning was starting to explode.

ESL provided one of the first accessible and comprehensive introductions to the topic. Since ESL was first published, the field of statistical learning has continued to flourish. The field's expansion has taken two forms. The most obvious growth has involved the development of new and improved statistical learning approaches aimed at answering a range of scientific questions across a number of fields. However, the field of statistical learning has also expanded its audience. In the 1990s, increases in computational power generated a surge of interest in the field from non-statisticians who were eager to use cutting-edge statistical tools to analyze their data. Unfortunately, the highly technical nature of these approaches meant that the user community remained primarily restricted to experts in statistics, computer science, and related fields with the training (and time) to understand and implement them.

In recent years, new and improved software packages have significantly eased the implementation burden for many statistical learning methods. At the same time, there has been growing recognition across a number of fields, from business to health care to genetics to the social sciences and beyond, that statistical learning is a powerful tool with important practical applications. As a result, the field has moved from one of primarily academic interest to a mainstream discipline, with an enormous potential audience. This trend will surely continue with the increasing availability of enormous quantities of data and the software to analyze it.

The purpose of An Introduction to Statistical Learning (ISL) is to facilitate the transition of statistical learning from an academic to a mainstream field. ISL is not intended to replace ESL, which is a far more comprehensive text both in terms of the number of approaches considered and the depth to which they are explored. We consider ESL to be an important companion for professionals (with graduate degrees in statistics, machine learning, or related fields) who need to understand the technical details behind statistical learning approaches. However, the community of users of statistical learning techniques has expanded to include individuals with a wider range of interests and backgrounds. Therefore, we believe that there is now a place for a less technical and more accessible version of ESL.

In teaching these topics over the years, we have discovered that they are of interest to master's and PhD students in fields as disparate as business administration, biology, and computer science, as well as to quantitatively oriented upper-division undergraduates. It is important for this diverse group to be able to understand the models, intuitions, and strengths and weaknesses of the various approaches. But for this audience, many of the technical details behind statistical learning methods, such as optimization algorithms and theoretical properties, are not of primary interest. We believe that these students do not need a deep understanding of these aspects in order to become informed users of the various methodologies, and in order to contribute to their chosen fields through the use of statistical learning tools.

ISL is based on the following four premises.

1. Many statistical learning methods are relevant and useful in a wide range of academic and non-academic disciplines, beyond just the statistical sciences. We believe that many contemporary statistical learning procedures should, and will, become as widely available and used as is currently the case for classical methods such as linear regression. As a result, rather than attempting to consider every possible approach (an impossible task), we have concentrated on presenting the methods that we believe are most widely applicable.

2. Statistical learning should not be viewed as a series of black boxes. No single approach will perform well in all possible applications. Without understanding all of the cogs inside the box, or the interaction between those cogs, it is impossible to select the best box. Hence, we have attempted to carefully describe the model, intuition, assumptions, and trade-offs behind each of the methods that we consider.

3. While it is important to know what job is performed by each cog, it is not necessary to have the skills to construct the machine inside the box! Thus, we have minimized discussion of technical details related to fitting procedures and theoretical properties. We assume that the reader is comfortable with basic mathematical concepts, but we do not assume a graduate degree in the mathematical sciences. For instance, we have almost completely avoided the use of matrix algebra, and it is possible to understand the entire book without a detailed knowledge of matrices and vectors.

4. We presume that the reader is interested in applying statistical learning methods to real-world problems. In order to facilitate this, as well as to motivate the techniques discussed, we have devoted a section within each chapter to R computer labs. In each lab, we walk the reader through a realistic application of the methods considered in that chapter. When we have taught this material in our courses, we have allocated roughly one-third of classroom time to working through the labs, and we have found them to be extremely useful. Many of the less computationally-oriented students who were initially intimidated by R's command level interface got the hang of things over the course of the quarter or semester. We have used R because it is freely available and is powerful enough to implement all of the methods discussed in the book. It also has optional packages that can be downloaded to implement literally thousands of additional methods. Most importantly, R is the language of choice for academic statisticians, and new approaches often become available in R years before they are implemented in commercial packages.

However, the labs in ISL are self-contained, and can be skipped if the reader wishes to use a different software package or does not wish to apply the methods discussed to real-world problems.

Who Should Read This Book?

This book is intended for anyone who is interested in using modern statistical methods for modeling and prediction from data. This group includes scientists, engineers, data analysts, or quants, but also less technical individuals with degrees in non-quantitative fields such as the social sciences or business. We expect that the reader will have had at least one elementary course in statistics. Background in linear regression is also useful, though not required, since we review the key concepts behind linear regression in Chapter 3. The mathematical level of this book is modest, and a detailed knowledge of matrix operations is not required. This book provides an introduction to the statistical programming language R. Previous exposure to a programming language, such as MATLAB or Python, is useful but not required.

We have successfully taught material at this level to master's and PhD students in business, computer science, biology, earth sciences, psychology, and many other areas of the physical and social sciences. This book could also be appropriate for advanced undergraduates who have already taken a course on linear regression. In the context of a more mathematically rigorous course in which ESL serves as the primary textbook, ISL could be used as a supplementary text for teaching computational aspects of the various approaches.

Notation and Simple Matrix Algebra

Choosing notation for a textbook is always a difficult task. For the most part we adopt the same notational conventions as ESL. We will use n to represent the number of distinct data points, or observations, in our sample. We will let p denote the number of variables that are available for use in making predictions. For example, the Wage data set consists of 12 variables for 3,000 people, so we have n = 3,000 observations and p = 12 variables (such as year, age, wage, and more). Note that throughout this book, we indicate variable names using colored font: Variable Name. In some examples, p might be quite large, such as on the order of thousands or even millions; this situation arises quite often, for example, in the analysis of modern biological data or web-based advertising data.

In general, we will let x_ij represent the value of the jth variable for the ith observation, where i = 1, 2, ..., n and j = 1, 2, ..., p. Throughout this book, i will be used to index the samples or observations (from 1 to n) and j will be used to index the variables (from 1 to p). We let X denote an n × p matrix whose (i, j)th element is x_ij. That is,

\mathbf{X} = \begin{pmatrix}
  x_{11} & x_{12} & \cdots & x_{1p} \\
  x_{21} & x_{22} & \cdots & x_{2p} \\
  \vdots & \vdots & \ddots & \vdots \\
  x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix}.

For readers who are unfamiliar with matrices, it is useful to visualize X as a spreadsheet of numbers with n rows and p columns. At times we will be interested in the rows of X, which we write as x_1, x_2, ..., x_n. Here x_i is a vector of length p, containing the p variable measurements for the ith observation. That is,

x_i = \begin{pmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ip} \end{pmatrix}.    (1.1)

(Vectors are by default represented as columns.) For example, for the Wage data, x_i is a vector of length 12, consisting of year, age, wage, and other values for the ith individual. At other times we will instead be interested in the columns of X, which we write as \mathbf{x}_1, \mathbf{x}_2, ..., \mathbf{x}_p. Each is a vector of length n. That is,

\mathbf{x}_j = \begin{pmatrix} x_{1j} \\ x_{2j} \\ \vdots \\ x_{nj} \end{pmatrix}.

For example, for the Wage data, \mathbf{x}_1 contains the n = 3,000 values for year. Using this notation, the matrix X can be written as

\mathbf{X} = \begin{pmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_p \end{pmatrix},

or

\mathbf{X} = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{pmatrix}.
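This notation maps directly onto how matrices are indexed in R. The following sketch uses a small matrix with made-up entries, purely for illustration; it is not one of the book's data sets.

X <- matrix(c(1, 2,
              3, 4,
              5, 6), nrow = 3, ncol = 2, byrow = TRUE)   # n = 3 observations, p = 2 variables
X[2, ]     # the 2nd row: observation x_2, a vector of length p
X[, 1]     # the 1st column: variable x_1, a vector of length n
X[2, 1]    # the single entry x_21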

The ^T notation denotes the transpose of a matrix or vector. So, for example,

\mathbf{X}^T = \begin{pmatrix}
  x_{11} & x_{21} & \cdots & x_{n1} \\
  x_{12} & x_{22} & \cdots & x_{n2} \\
  \vdots & \vdots & \ddots & \vdots \\
  x_{1p} & x_{2p} & \cdots & x_{np}
\end{pmatrix},

while

x_i^T = \begin{pmatrix} x_{i1} & x_{i2} & \cdots & x_{ip} \end{pmatrix}.

We use y_i to denote the ith observation of the variable on which we wish to make predictions, such as wage. Hence, we write the set of all n observations in vector form as

\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.

Then our observed data consists of {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where each x_i is a vector of length p. (If p = 1, then x_i is simply a scalar.)

In this text, a vector of length n will always be denoted in lower case bold; e.g.

\mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}.

However, vectors that are not of length n (such as feature vectors of length p, as in (1.1)) will be denoted in lower case normal font, e.g. a. Scalars will also be denoted in lower case normal font, e.g. a. In the rare cases in which these two uses for lower case normal font lead to ambiguity, we will clarify which use is intended. Matrices will be denoted using bold capitals, such as A. Random variables will be denoted using capital normal font, e.g. A, regardless of their dimensions.

Occasionally we will want to indicate the dimension of a particular object. To indicate that an object is a scalar, we will use the notation a ∈ R. To indicate that it is a vector of length k, we will use a ∈ R^k (or a ∈ R^n if it is of length n). We will indicate that an object is an r × s matrix using A ∈ R^{r×s}.

We have avoided using matrix algebra whenever possible. However, in a few instances it becomes too cumbersome to avoid it entirely. In these rare instances it is important to understand the concept of multiplying two matrices. Suppose that A ∈ R^{r×d} and B ∈ R^{d×s}. Then the product of A and B is denoted AB.

The (i, j)th element of AB is computed by multiplying each element of the ith row of A by the corresponding element of the jth column of B. That is, (AB)_{ij} = \sum_{k=1}^{d} a_{ik} b_{kj}. As an example, consider

\mathbf{A} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \quad \text{and} \quad \mathbf{B} = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}.

Then

\mathbf{AB} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}
            = \begin{pmatrix} 1\times 5 + 2\times 7 & 1\times 6 + 2\times 8 \\ 3\times 5 + 4\times 7 & 3\times 6 + 4\times 8 \end{pmatrix}
            = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}.

Note that this operation produces an r × s matrix. It is only possible to compute AB if the number of columns of A is the same as the number of rows of B.
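In R, the same product can be checked with the %*% operator; this short sketch simply connects the notation above to code.

A <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
B <- matrix(c(5, 6, 7, 8), nrow = 2, byrow = TRUE)
A %*% B      # matrix product: a 2 x 2 matrix with rows (19, 22) and (43, 50)
t(A)         # the transpose of A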

Organization of This Book

Chapter 2 introduces the basic terminology and concepts behind statistical learning. This chapter also presents the K-nearest neighbor classifier, a very simple method that works surprisingly well on many problems. Chapters 3 and 4 cover classical linear methods for regression and classification. In particular, Chapter 3 reviews linear regression, the fundamental starting point for all regression methods. In Chapter 4 we discuss two of the most important classical classification methods, logistic regression and linear discriminant analysis. A central problem in all statistical learning situations involves choosing the best method for a given application. Hence, in Chapter 5 we introduce cross-validation and the bootstrap, which can be used to estimate the accuracy of a number of different methods in order to choose the best one.

Much of the recent research in statistical learning has concentrated on non-linear methods. However, linear methods often have advantages over their non-linear competitors in terms of interpretability and sometimes also accuracy. Hence, in Chapter 6 we consider a host of linear methods, both classical and more modern, which offer potential improvements over standard linear regression. These include stepwise selection, ridge regression, principal components regression, partial least squares, and the lasso.

The remaining chapters move into the world of non-linear statistical learning. We first introduce in Chapter 7 a number of non-linear methods that work well for problems with a single input variable. We then show how these methods can be used to fit non-linear additive models for which there is more than one input. In Chapter 8, we investigate tree-based methods, including bagging, boosting, and random forests. Support vector machines, a set of approaches for performing both linear and non-linear classification, are discussed in Chapter 9. Finally, in Chapter 10, we consider a setting in which we have input variables but no output variable. In particular, we present principal components analysis, K-means clustering, and hierarchical clustering.

At the end of each chapter, we present one or more R lab sections in which we systematically work through applications of the various methods discussed in that chapter. These labs demonstrate the strengths and weaknesses of the various approaches, and also provide a useful reference for the syntax required to implement the various methods. The reader may choose to work through the labs at his or her own pace, or the labs may be the focus of group sessions as part of a classroom environment. Within each R lab, we present the results that we obtained when we performed the lab at the time of writing this book. However, new versions of R are continuously released, and over time, the packages called in the labs will be updated. Therefore, in the future, it is possible that the results shown in the lab sections may no longer correspond precisely to the results obtained by the reader who performs the labs. As necessary, we will post updates to the labs on the book website.

We use a symbol to denote sections or exercises that contain more challenging concepts. These can be easily skipped by readers who do not wish to delve as deeply into the material, or who lack the mathematical background.

Data Sets Used in Labs and Exercises

In this textbook, we illustrate statistical learning methods using applications from marketing, finance, biology, and other areas. The ISLR package available on the book website contains a number of data sets that are required in order to perform the labs and exercises associated with this book. One other data set is contained in the MASS library, and yet another is part of the base R distribution. Table 1.1 contains a summary of the data sets required to perform the labs and exercises. A couple of these data sets are also available as text files on the book website, for use in Chapter 2.

Book Website

The website for this book is located at www.statlearning.com. It contains a number of resources, including the R package associated with this book, and some additional data sets.
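As a practical note, the data sets listed in Table 1.1 below can be obtained with a few commands; this is a minimal sketch assuming an internet connection for the one-time install.

install.packages("ISLR")     # one-time install from CRAN
library(ISLR)                # most of the data sets in Table 1.1
library(MASS)                # Boston
data(USArrests)              # part of the base R distribution
dim(Wage)                    # quick check that the data are available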

Name        Description
Auto        Gas mileage, horsepower, and other information for cars.
Boston      Housing values and other information about Boston suburbs.
Caravan     Information about individuals offered caravan insurance.
Carseats    Information about car seat sales in 400 stores.
College     Demographic characteristics, tuition, and more for USA colleges.
Default     Customer default records for a credit card company.
Hitters     Records and salaries for baseball players.
Khan        Gene expression measurements for four cancer types.
NCI60       Gene expression measurements for 64 cancer cell lines.
OJ          Sales information for Citrus Hill and Minute Maid orange juice.
Portfolio   Past values of financial assets, for use in portfolio allocation.
Smarket     Daily percentage returns for S&P 500 over a 5-year period.
USArrests   Crime statistics per 100,000 residents in 50 states of USA.
Wage        Income survey data for males in central Atlantic region of USA.
Weekly      1,089 weekly stock market returns for 21 years.

TABLE 1.1. A list of data sets needed to perform the labs and exercises in this textbook. All data sets are available in the ISLR library, with the exception of Boston (part of MASS) and USArrests (part of the base R distribution).

Acknowledgements

A few of the plots in this book were taken from ESL: Figures 6.7, 8.3, and 10.12. All other plots are new to this book.

2 Statistical Learning

2.1 What Is Statistical Learning?

In order to motivate our study of statistical learning, we begin with a simple example. Suppose that we are statistical consultants hired by a client to provide advice on how to improve sales of a particular product. The Advertising data set consists of the sales of that product in 200 different markets, along with advertising budgets for the product in each of those markets for three different media: TV, radio, and newspaper. The data are displayed in Figure 2.1. It is not possible for our client to directly increase sales of the product. On the other hand, they can control the advertising expenditure in each of the three media. Therefore, if we determine that there is an association between advertising and sales, then we can instruct our client to adjust advertising budgets, thereby indirectly increasing sales. In other words, our goal is to develop an accurate model that can be used to predict sales on the basis of the three media budgets.

In this setting, the advertising budgets are input variables while sales is an output variable. The input variables are typically denoted using the symbol X, with a subscript to distinguish them. So X_1 might be the TV budget, X_2 the radio budget, and X_3 the newspaper budget. The inputs go by different names, such as predictors, independent variables, features, or sometimes just variables. The output variable, in this case sales, is often called the response or dependent variable, and is typically denoted using the symbol Y. Throughout this book, we will use all of these terms interchangeably.
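A minimal sketch of a first look at these data follows, assuming Advertising.csv has been downloaded from the book website and that its columns are named TV, radio, newspaper, and sales (the exact column names are an assumption here).

Advertising <- read.csv("Advertising.csv")
fit.tv <- lm(sales ~ TV, data = Advertising)   # simple least squares fit of sales on TV
coef(fit.tv)
plot(Advertising$TV, Advertising$sales, col = "red",
     xlab = "TV", ylab = "Sales")
abline(fit.tv, col = "blue", lwd = 2)          # a line like the one in the left panel of Figure 2.1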

FIGURE 2.1. The Advertising data set. The plot displays sales, in thousands of units, as a function of TV, radio, and newspaper budgets, in thousands of dollars, for 200 different markets. In each plot we show the simple least squares fit of sales to that variable, as described in Chapter 3. In other words, each blue line represents a simple model that can be used to predict sales using TV, radio, and newspaper, respectively.

More generally, suppose that we observe a quantitative response Y and p different predictors, X_1, X_2, ..., X_p. We assume that there is some relationship between Y and X = (X_1, X_2, ..., X_p), which can be written in the very general form

Y = f(X) + ε.    (2.1)

Here f is some fixed but unknown function of X_1, ..., X_p, and ε is a random error term, which is independent of X and has mean zero. In this formulation, f represents the systematic information that X provides about Y.

As another example, consider the left-hand panel of Figure 2.2, a plot of income versus years of education for 30 individuals in the Income data set. The plot suggests that one might be able to predict income using years of education. However, the function f that connects the input variable to the output variable is in general unknown. In this situation one must estimate f based on the observed points. Since Income is a simulated data set, f is known and is shown by the blue curve in the right-hand panel of Figure 2.2. The vertical lines represent the error terms ε. We note that some of the 30 observations lie above the blue curve and some lie below it; overall, the errors have approximately mean zero.

In general, the function f may involve more than one input variable. In Figure 2.3 we plot income as a function of years of education and seniority. Here f is a two-dimensional surface that must be estimated based on the observed data.
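Because the Income data are simulated, the true f is known to the authors; we can mimic that setup with a small simulation of our own. Everything below (the particular f, the error standard deviation, the range of x) is an arbitrary illustration, not the book's Income data.

set.seed(1)
f <- function(x) 20 + 60 / (1 + exp(-(x - 16)))   # an arbitrary smooth "true" f
x <- sort(runif(30, 10, 22))                      # 30 values playing the role of years of education
eps <- rnorm(30, mean = 0, sd = 8)                # error term with mean zero
y <- f(x) + eps                                   # Y = f(X) + epsilon
plot(x, y, col = "red", xlab = "Years of Education", ylab = "Income")
curve(f, add = TRUE, col = "blue", lwd = 2)       # the true f, known only because we simulated it
segments(x, f(x), x, y)                           # vertical lines showing the errors epsilon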

FIGURE 2.2. The Income data set. Left: The red dots are the observed values of income (in tens of thousands of dollars) and years of education for 30 individuals. Right: The blue curve represents the true underlying relationship between income and years of education, which is generally unknown (but is known in this case because the data were simulated). The black lines represent the error associated with each observation. Note that some errors are positive (if an observation lies above the blue curve) and some are negative (if an observation lies below the curve). Overall, these errors have approximately mean zero.

In essence, statistical learning refers to a set of approaches for estimating f. In this chapter we outline some of the key theoretical concepts that arise in estimating f, as well as tools for evaluating the estimates obtained.

Why Estimate f?

There are two main reasons that we may wish to estimate f: prediction and inference. We discuss each in turn.

Prediction

In many situations, a set of inputs X are readily available, but the output Y cannot be easily obtained. In this setting, since the error term averages to zero, we can predict Y using

Ŷ = f̂(X),    (2.2)

where f̂ represents our estimate for f, and Ŷ represents the resulting prediction for Y. In this setting, f̂ is often treated as a black box, in the sense that one is not typically concerned with the exact form of f̂, provided that it yields accurate predictions for Y.

FIGURE 2.3. The plot displays income as a function of years of education and seniority in the Income data set. The blue surface represents the true underlying relationship between income and years of education and seniority, which is known since the data are simulated. The red dots indicate the observed values of these quantities for 30 individuals.

As an example, suppose that X_1, ..., X_p are characteristics of a patient's blood sample that can be easily measured in a lab, and Y is a variable encoding the patient's risk for a severe adverse reaction to a particular drug. It is natural to seek to predict Y using X, since we can then avoid giving the drug in question to patients who are at high risk of an adverse reaction, that is, patients for whom the estimate of Y is high.

The accuracy of Ŷ as a prediction for Y depends on two quantities, which we will call the reducible error and the irreducible error. In general, f̂ will not be a perfect estimate for f, and this inaccuracy will introduce some error. This error is reducible because we can potentially improve the accuracy of f̂ by using the most appropriate statistical learning technique to estimate f. However, even if it were possible to form a perfect estimate for f, so that our estimated response took the form Ŷ = f(X), our prediction would still have some error in it! This is because Y is also a function of ε, which, by definition, cannot be predicted using X. Therefore, variability associated with ε also affects the accuracy of our predictions. This is known as the irreducible error, because no matter how well we estimate f, we cannot reduce the error introduced by ε.

Why is the irreducible error larger than zero? The quantity ε may contain unmeasured variables that are useful in predicting Y: since we don't measure them, f cannot use them for its prediction. The quantity ε may also contain unmeasurable variation. For example, the risk of an adverse reaction might vary for a given patient on a given day, depending on


More information

COMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification

COMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification COMP 551 Applied Machine Learning Lecture 5: Generative mdels fr linear classificatin Instructr: Herke van Hf (herke.vanhf@mail.mcgill.ca) Slides mstly by: Jelle Pineau Class web page: www.cs.mcgill.ca/~hvanh2/cmp551

More information

Computational modeling techniques

Computational modeling techniques Cmputatinal mdeling techniques Lecture 4: Mdel checing fr ODE mdels In Petre Department f IT, Åb Aademi http://www.users.ab.fi/ipetre/cmpmd/ Cntent Stichimetric matrix Calculating the mass cnservatin relatins

More information

Lesson Plan. Recode: They will do a graphic organizer to sequence the steps of scientific method.

Lesson Plan. Recode: They will do a graphic organizer to sequence the steps of scientific method. Lessn Plan Reach: Ask the students if they ever ppped a bag f micrwave ppcrn and nticed hw many kernels were unppped at the bttm f the bag which made yu wnder if ther brands pp better than the ne yu are

More information

Modelling of Clock Behaviour. Don Percival. Applied Physics Laboratory University of Washington Seattle, Washington, USA

Modelling of Clock Behaviour. Don Percival. Applied Physics Laboratory University of Washington Seattle, Washington, USA Mdelling f Clck Behaviur Dn Percival Applied Physics Labratry University f Washingtn Seattle, Washingtn, USA verheads and paper fr talk available at http://faculty.washingtn.edu/dbp/talks.html 1 Overview

More information

STATS216v Introduction to Statistical Learning Stanford University, Summer Practice Final (Solutions) Duration: 3 hours

STATS216v Introduction to Statistical Learning Stanford University, Summer Practice Final (Solutions) Duration: 3 hours STATS216v Intrductin t Statistical Learning Stanfrd University, Summer 2016 Practice Final (Slutins) Duratin: 3 hurs Instructins: (This is a practice final and will nt be graded.) Remember the university

More information

A Matrix Representation of Panel Data

A Matrix Representation of Panel Data web Extensin 6 Appendix 6.A A Matrix Representatin f Panel Data Panel data mdels cme in tw brad varieties, distinct intercept DGPs and errr cmpnent DGPs. his appendix presents matrix algebra representatins

More information

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 13 Offprint

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 13 Offprint Biplts in Practice MICHAEL GREENACRE Prfessr f Statistics at the Pmpeu Fabra University Chapter 13 Offprint CASE STUDY BIOMEDICINE Cmparing Cancer Types Accrding t Gene Epressin Arrays First published:

More information

Sections 15.1 to 15.12, 16.1 and 16.2 of the textbook (Robbins-Miller) cover the materials required for this topic.

Sections 15.1 to 15.12, 16.1 and 16.2 of the textbook (Robbins-Miller) cover the materials required for this topic. Tpic : AC Fundamentals, Sinusidal Wavefrm, and Phasrs Sectins 5. t 5., 6. and 6. f the textbk (Rbbins-Miller) cver the materials required fr this tpic.. Wavefrms in electrical systems are current r vltage

More information

AP Statistics Practice Test Unit Three Exploring Relationships Between Variables. Name Period Date

AP Statistics Practice Test Unit Three Exploring Relationships Between Variables. Name Period Date AP Statistics Practice Test Unit Three Explring Relatinships Between Variables Name Perid Date True r False: 1. Crrelatin and regressin require explanatry and respnse variables. 1. 2. Every least squares

More information

English 10 Pacing Guide : Quarter 2

English 10 Pacing Guide : Quarter 2 Implementatin Ntes Embedded Standards: Standards nted as embedded n this page are t be cntinuusly spiraled thrughut the quarter. This des nt mean that nging explicit instructin n these standards is t take

More information

Activity Guide Loops and Random Numbers

Activity Guide Loops and Random Numbers Unit 3 Lessn 7 Name(s) Perid Date Activity Guide Lps and Randm Numbers CS Cntent Lps are a relatively straightfrward idea in prgramming - yu want a certain chunk f cde t run repeatedly - but it takes a

More information

Five Whys How To Do It Better

Five Whys How To Do It Better Five Whys Definitin. As explained in the previus article, we define rt cause as simply the uncvering f hw the current prblem came int being. Fr a simple causal chain, it is the entire chain. Fr a cmplex

More information

COMP 551 Applied Machine Learning Lecture 4: Linear classification

COMP 551 Applied Machine Learning Lecture 4: Linear classification COMP 551 Applied Machine Learning Lecture 4: Linear classificatin Instructr: Jelle Pineau (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted

More information

I. Analytical Potential and Field of a Uniform Rod. V E d. The definition of electric potential difference is

I. Analytical Potential and Field of a Uniform Rod. V E d. The definition of electric potential difference is Length L>>a,b,c Phys 232 Lab 4 Ch 17 Electric Ptential Difference Materials: whitebards & pens, cmputers with VPythn, pwer supply & cables, multimeter, crkbard, thumbtacks, individual prbes and jined prbes,

More information

SPH3U1 Lesson 06 Kinematics

SPH3U1 Lesson 06 Kinematics PROJECTILE MOTION LEARNING GOALS Students will: Describe the mtin f an bject thrwn at arbitrary angles thrugh the air. Describe the hrizntal and vertical mtins f a prjectile. Slve prjectile mtin prblems.

More information

Mathematics and Computer Sciences Department. o Work Experience, General. o Open Entry/Exit. Distance (Hybrid Online) for online supported courses

Mathematics and Computer Sciences Department. o Work Experience, General. o Open Entry/Exit. Distance (Hybrid Online) for online supported courses SECTION A - Curse Infrmatin 1. Curse ID: 2. Curse Title: 3. Divisin: 4. Department: 5. Subject: 6. Shrt Curse Title: 7. Effective Term:: MATH 70S Integrated Intermediate Algebra Natural Sciences Divisin

More information

A Quick Overview of the. Framework for K 12 Science Education

A Quick Overview of the. Framework for K 12 Science Education A Quick Overview f the NGSS EQuIP MODULE 1 Framewrk fr K 12 Science Educatin Mdule 1: A Quick Overview f the Framewrk fr K 12 Science Educatin This mdule prvides a brief backgrund n the Framewrk fr K-12

More information

Emphases in Common Core Standards for Mathematical Content Kindergarten High School

Emphases in Common Core Standards for Mathematical Content Kindergarten High School Emphases in Cmmn Cre Standards fr Mathematical Cntent Kindergarten High Schl Cntent Emphases by Cluster March 12, 2012 Describes cntent emphases in the standards at the cluster level fr each grade. These

More information

Perfrmance f Sensitizing Rules n Shewhart Cntrl Charts with Autcrrelated Data Key Wrds: Autregressive, Mving Average, Runs Tests, Shewhart Cntrl Chart

Perfrmance f Sensitizing Rules n Shewhart Cntrl Charts with Autcrrelated Data Key Wrds: Autregressive, Mving Average, Runs Tests, Shewhart Cntrl Chart Perfrmance f Sensitizing Rules n Shewhart Cntrl Charts with Autcrrelated Data Sandy D. Balkin Dennis K. J. Lin y Pennsylvania State University, University Park, PA 16802 Sandy Balkin is a graduate student

More information

NUROP CONGRESS PAPER CHINESE PINYIN TO CHINESE CHARACTER CONVERSION

NUROP CONGRESS PAPER CHINESE PINYIN TO CHINESE CHARACTER CONVERSION NUROP Chinese Pinyin T Chinese Character Cnversin NUROP CONGRESS PAPER CHINESE PINYIN TO CHINESE CHARACTER CONVERSION CHIA LI SHI 1 AND LUA KIM TENG 2 Schl f Cmputing, Natinal University f Singapre 3 Science

More information

COMP 551 Applied Machine Learning Lecture 9: Support Vector Machines (cont d)

COMP 551 Applied Machine Learning Lecture 9: Support Vector Machines (cont d) COMP 551 Applied Machine Learning Lecture 9: Supprt Vectr Machines (cnt d) Instructr: Herke van Hf (herke.vanhf@mail.mcgill.ca) Slides mstly by: Class web page: www.cs.mcgill.ca/~hvanh2/cmp551 Unless therwise

More information

Biocomputers. [edit]scientific Background

Biocomputers. [edit]scientific Background Bicmputers Frm Wikipedia, the free encyclpedia Bicmputers use systems f bilgically derived mlecules, such as DNA and prteins, t perfrm cmputatinal calculatins invlving string, retrieving, and prcessing

More information

City of Angels School Independent Study Los Angeles Unified School District

City of Angels School Independent Study Los Angeles Unified School District City f Angels Schl Independent Study Ls Angeles Unified Schl District INSTRUCTIONAL GUIDE Algebra 1B Curse ID #310302 (CCSS Versin- 06/15) This curse is the secnd semester f Algebra 1, fulfills ne half

More information

INSTRUMENTAL VARIABLES

INSTRUMENTAL VARIABLES INSTRUMENTAL VARIABLES Technical Track Sessin IV Sergi Urzua University f Maryland Instrumental Variables and IE Tw main uses f IV in impact evaluatin: 1. Crrect fr difference between assignment f treatment

More information

CS 477/677 Analysis of Algorithms Fall 2007 Dr. George Bebis Course Project Due Date: 11/29/2007

CS 477/677 Analysis of Algorithms Fall 2007 Dr. George Bebis Course Project Due Date: 11/29/2007 CS 477/677 Analysis f Algrithms Fall 2007 Dr. Gerge Bebis Curse Prject Due Date: 11/29/2007 Part1: Cmparisn f Srting Algrithms (70% f the prject grade) The bjective f the first part f the assignment is

More information

k-nearest Neighbor How to choose k Average of k points more reliable when: Large k: noise in attributes +o o noise in class labels

k-nearest Neighbor How to choose k Average of k points more reliable when: Large k: noise in attributes +o o noise in class labels Mtivating Example Memry-Based Learning Instance-Based Learning K-earest eighbr Inductive Assumptin Similar inputs map t similar utputs If nt true => learning is impssible If true => learning reduces t

More information

3.4 Shrinkage Methods Prostate Cancer Data Example (Continued) Ridge Regression

3.4 Shrinkage Methods Prostate Cancer Data Example (Continued) Ridge Regression 3.3.4 Prstate Cancer Data Example (Cntinued) 3.4 Shrinkage Methds 61 Table 3.3 shws the cefficients frm a number f different selectin and shrinkage methds. They are best-subset selectin using an all-subsets

More information

2004 AP CHEMISTRY FREE-RESPONSE QUESTIONS

2004 AP CHEMISTRY FREE-RESPONSE QUESTIONS 2004 AP CHEMISTRY FREE-RESPONSE QUESTIONS 6. An electrchemical cell is cnstructed with an pen switch, as shwn in the diagram abve. A strip f Sn and a strip f an unknwn metal, X, are used as electrdes.

More information

[COLLEGE ALGEBRA EXAM I REVIEW TOPICS] ( u s e t h i s t o m a k e s u r e y o u a r e r e a d y )

[COLLEGE ALGEBRA EXAM I REVIEW TOPICS] ( u s e t h i s t o m a k e s u r e y o u a r e r e a d y ) (Abut the final) [COLLEGE ALGEBRA EXAM I REVIEW TOPICS] ( u s e t h i s t m a k e s u r e y u a r e r e a d y ) The department writes the final exam s I dn't really knw what's n it and I can't very well

More information

Assessment Primer: Writing Instructional Objectives

Assessment Primer: Writing Instructional Objectives Assessment Primer: Writing Instructinal Objectives (Based n Preparing Instructinal Objectives by Mager 1962 and Preparing Instructinal Objectives: A critical tl in the develpment f effective instructin

More information

BASD HIGH SCHOOL FORMAL LAB REPORT

BASD HIGH SCHOOL FORMAL LAB REPORT BASD HIGH SCHOOL FORMAL LAB REPORT *WARNING: After an explanatin f what t include in each sectin, there is an example f hw the sectin might lk using a sample experiment Keep in mind, the sample lab used

More information

Building to Transformations on Coordinate Axis Grade 5: Geometry Graph points on the coordinate plane to solve real-world and mathematical problems.

Building to Transformations on Coordinate Axis Grade 5: Geometry Graph points on the coordinate plane to solve real-world and mathematical problems. Building t Transfrmatins n Crdinate Axis Grade 5: Gemetry Graph pints n the crdinate plane t slve real-wrld and mathematical prblems. 5.G.1. Use a pair f perpendicular number lines, called axes, t define

More information

Lead/Lag Compensator Frequency Domain Properties and Design Methods

Lead/Lag Compensator Frequency Domain Properties and Design Methods Lectures 6 and 7 Lead/Lag Cmpensatr Frequency Dmain Prperties and Design Methds Definitin Cnsider the cmpensatr (ie cntrller Fr, it is called a lag cmpensatr s K Fr s, it is called a lead cmpensatr Ntatin

More information

Land Information New Zealand Topographic Strategy DRAFT (for discussion)

Land Information New Zealand Topographic Strategy DRAFT (for discussion) Land Infrmatin New Zealand Tpgraphic Strategy DRAFT (fr discussin) Natinal Tpgraphic Office Intrductin The Land Infrmatin New Zealand Tpgraphic Strategy will prvide directin fr the cllectin and maintenance

More information

Part 3 Introduction to statistical classification techniques

Part 3 Introduction to statistical classification techniques Part 3 Intrductin t statistical classificatin techniques Machine Learning, Part 3, March 07 Fabi Rli Preamble ØIn Part we have seen that if we knw: Psterir prbabilities P(ω i / ) Or the equivalent terms

More information

A Correlation of. to the. South Carolina Academic Standards for Mathematics Precalculus

A Correlation of. to the. South Carolina Academic Standards for Mathematics Precalculus A Crrelatin f Suth Carlina Academic Standards fr Mathematics Precalculus INTRODUCTION This dcument demnstrates hw Precalculus (Blitzer), 4 th Editin 010, meets the indicatrs f the. Crrelatin page references

More information

Distributions, spatial statistics and a Bayesian perspective

Distributions, spatial statistics and a Bayesian perspective Distributins, spatial statistics and a Bayesian perspective Dug Nychka Natinal Center fr Atmspheric Research Distributins and densities Cnditinal distributins and Bayes Thm Bivariate nrmal Spatial statistics

More information

Lab #3: Pendulum Period and Proportionalities

Lab #3: Pendulum Period and Proportionalities Physics 144 Chwdary Hw Things Wrk Spring 2006 Name: Partners Name(s): Intrductin Lab #3: Pendulum Perid and Prprtinalities Smetimes, it is useful t knw the dependence f ne quantity n anther, like hw the

More information

Writing Guidelines. (Updated: November 25, 2009) Forwards

Writing Guidelines. (Updated: November 25, 2009) Forwards Writing Guidelines (Updated: Nvember 25, 2009) Frwards I have fund in my review f the manuscripts frm ur students and research assciates, as well as thse submitted t varius jurnals by thers that the majr

More information

Pipetting 101 Developed by BSU CityLab

Pipetting 101 Developed by BSU CityLab Discver the Micrbes Within: The Wlbachia Prject Pipetting 101 Develped by BSU CityLab Clr Cmparisns Pipetting Exercise #1 STUDENT OBJECTIVES Students will be able t: Chse the crrect size micrpipette fr

More information

We say that y is a linear function of x if. Chapter 13: The Correlation Coefficient and the Regression Line

We say that y is a linear function of x if. Chapter 13: The Correlation Coefficient and the Regression Line Chapter 13: The Crrelatin Cefficient and the Regressin Line We begin with a sme useful facts abut straight lines. Recall the x, y crdinate system, as pictured belw. 3 2 1 y = 2.5 y = 0.5x 3 2 1 1 2 3 1

More information

Department: MATHEMATICS

Department: MATHEMATICS Cde: MATH 022 Title: ALGEBRA SKILLS Institute: STEM Department: MATHEMATICS Curse Descriptin: This curse prvides students wh have cmpleted MATH 021 with the necessary skills and cncepts t cntinue the study

More information

We can see from the graph above that the intersection is, i.e., [ ).

We can see from the graph above that the intersection is, i.e., [ ). MTH 111 Cllege Algebra Lecture Ntes July 2, 2014 Functin Arithmetic: With nt t much difficulty, we ntice that inputs f functins are numbers, and utputs f functins are numbers. S whatever we can d with

More information

Fall 2013 Physics 172 Recitation 3 Momentum and Springs

Fall 2013 Physics 172 Recitation 3 Momentum and Springs Fall 03 Physics 7 Recitatin 3 Mmentum and Springs Purpse: The purpse f this recitatin is t give yu experience wrking with mmentum and the mmentum update frmula. Readings: Chapter.3-.5 Learning Objectives:.3.

More information

Department of Electrical Engineering, University of Waterloo. Introduction

Department of Electrical Engineering, University of Waterloo. Introduction Sectin 4: Sequential Circuits Majr Tpics Types f sequential circuits Flip-flps Analysis f clcked sequential circuits Mre and Mealy machines Design f clcked sequential circuits State transitin design methd

More information

CHAPTER 4 DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS

CHAPTER 4 DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS CHAPTER 4 DIAGNOSTICS FOR INFLUENTIAL OBSERVATIONS 1 Influential bservatins are bservatins whse presence in the data can have a distrting effect n the parameter estimates and pssibly the entire analysis,

More information

Basics. Primary School learning about place value is often forgotten and can be reinforced at home.

Basics. Primary School learning about place value is often forgotten and can be reinforced at home. Basics When pupils cme t secndary schl they start a lt f different subjects and have a lt f new interests but it is still imprtant that they practise their basic number wrk which may nt be reinfrced as

More information

Module 4: General Formulation of Electric Circuit Theory

Module 4: General Formulation of Electric Circuit Theory Mdule 4: General Frmulatin f Electric Circuit Thery 4. General Frmulatin f Electric Circuit Thery All electrmagnetic phenmena are described at a fundamental level by Maxwell's equatins and the assciated

More information

CHM112 Lab Graphing with Excel Grading Rubric

CHM112 Lab Graphing with Excel Grading Rubric Name CHM112 Lab Graphing with Excel Grading Rubric Criteria Pints pssible Pints earned Graphs crrectly pltted and adhere t all guidelines (including descriptive title, prperly frmatted axes, trendline

More information

MATHEMATICS SYLLABUS SECONDARY 5th YEAR

MATHEMATICS SYLLABUS SECONDARY 5th YEAR Eurpean Schls Office f the Secretary-General Pedaggical Develpment Unit Ref. : 011-01-D-8-en- Orig. : EN MATHEMATICS SYLLABUS SECONDARY 5th YEAR 6 perid/week curse APPROVED BY THE JOINT TEACHING COMMITTEE

More information

Revision: August 19, E Main Suite D Pullman, WA (509) Voice and Fax

Revision: August 19, E Main Suite D Pullman, WA (509) Voice and Fax .7.4: Direct frequency dmain circuit analysis Revisin: August 9, 00 5 E Main Suite D Pullman, WA 9963 (509) 334 6306 ice and Fax Overview n chapter.7., we determined the steadystate respnse f electrical

More information

Experiment #3. Graphing with Excel

Experiment #3. Graphing with Excel Experiment #3. Graphing with Excel Study the "Graphing with Excel" instructins that have been prvided. Additinal help with learning t use Excel can be fund n several web sites, including http://www.ncsu.edu/labwrite/res/gt/gt-

More information

7 TH GRADE MATH STANDARDS

7 TH GRADE MATH STANDARDS ALGEBRA STANDARDS Gal 1: Students will use the language f algebra t explre, describe, represent, and analyze number expressins and relatins 7 TH GRADE MATH STANDARDS 7.M.1.1: (Cmprehensin) Select, use,

More information

Internal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9.

Internal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9. Sectin 7 Mdel Assessment This sectin is based n Stck and Watsn s Chapter 9. Internal vs. external validity Internal validity refers t whether the analysis is valid fr the ppulatin and sample being studied.

More information

West Deptford Middle School 8th Grade Curriculum Unit 4 Investigate Bivariate Data

West Deptford Middle School 8th Grade Curriculum Unit 4 Investigate Bivariate Data West Deptfrd Middle Schl 8th Grade Curriculum Unit 4 Investigate Bivariate Data Office f Curriculum and Instructin West Deptfrd Middle Schl 675 Grve Rd, Paulsbr, NJ 08066 wdeptfrd.k12.nj.us (856) 848-1200

More information

February 28, 2013 COMMENTS ON DIFFUSION, DIFFUSIVITY AND DERIVATION OF HYPERBOLIC EQUATIONS DESCRIBING THE DIFFUSION PHENOMENA

February 28, 2013 COMMENTS ON DIFFUSION, DIFFUSIVITY AND DERIVATION OF HYPERBOLIC EQUATIONS DESCRIBING THE DIFFUSION PHENOMENA February 28, 2013 COMMENTS ON DIFFUSION, DIFFUSIVITY AND DERIVATION OF HYPERBOLIC EQUATIONS DESCRIBING THE DIFFUSION PHENOMENA Mental Experiment regarding 1D randm walk Cnsider a cntainer f gas in thermal

More information

Turing Machines. Human-aware Robotics. 2017/10/17 & 19 Chapter 3.2 & 3.3 in Sipser Ø Announcement:

Turing Machines. Human-aware Robotics. 2017/10/17 & 19 Chapter 3.2 & 3.3 in Sipser Ø Announcement: Turing Machines Human-aware Rbtics 2017/10/17 & 19 Chapter 3.2 & 3.3 in Sipser Ø Annuncement: q q q q Slides fr this lecture are here: http://www.public.asu.edu/~yzhan442/teaching/cse355/lectures/tm-ii.pdf

More information

Methods for Determination of Mean Speckle Size in Simulated Speckle Pattern

Methods for Determination of Mean Speckle Size in Simulated Speckle Pattern 0.478/msr-04-004 MEASUREMENT SCENCE REVEW, Vlume 4, N. 3, 04 Methds fr Determinatin f Mean Speckle Size in Simulated Speckle Pattern. Hamarvá, P. Šmíd, P. Hrváth, M. Hrabvský nstitute f Physics f the Academy

More information

Sample questions to support inquiry with students:

Sample questions to support inquiry with students: Area f Learning: Mathematics Calculus 12 Big Ideas Elabratins The cncept f a limit is fundatinal t calculus. cncept f a limit: Differentiatin and integratin are defined using limits. Sample questins t

More information