Chapter 3: Other Issues in Multiple regression (Part 1)
- Suzan Black
- 5 years ago
1 Model (variable) selection
The difficulty with model selection: for $P$ predictors there are $2^P$ different candidate models. When we have many predictors (with many possible interactions), it can be difficult to find a good model. Model selection tries to simplify this task.

Suppose we have $P$ predictors $X_1, \ldots, X_P$, but the true model depends only on a subset of $X_1, \ldots, X_P$. In other words, in the model
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_P X_P + \varepsilon$$
some of the coefficients are zero. We need to find the predictors with nonzero coefficients. We call the set of predictors with nonzero coefficients the best subset, and the predictors in the best subset the important variables.

Criteria: statistical tests; indices of model fit; predictability. (Note the distinction between predictive and explanatory research.)

Example 1.1 (Surgical Unit example) $X_1$: blood clotting score; $X_2$: prognostic index; $X_3$: enzyme function test score; $X_4$: liver function test score; $X_5$: age in years; $X_6$: indicator of gender (0 = male, 1 = female); $X_7, X_8$: indicators for alcohol use; $Y$: survival time. If we consider only the first 4 predictors, we have the following calculation for all $2^4 = 16$ possible models:

[Table: variables selected (None; X1; X2; X3; X4; X1, X2; X1, X3; X1, X4; X2, X3; X2, X4; X3, X4; X1, X2, X3; X1, X2, X4; X1, X3, X4; X2, X3, X4; X1, X2, X3, X4) against $p$, $SSE_p$, $R^2_p$, $R^2_{a,p}$, $C_p$, $AIC_p$, $SBC_p$ (BIC) and $PRESS_p$ (CV).]
where $p$ is the number of coefficients included in the model.

2 $R^2$ and $R^2_a$ Criterion
1. $R^2$: can be used to compare models with the same number of parameters/coefficients. ($R^2_p = 1 - SSE_p/SSTO$, where $SSTO$ is the total sum of squares, never decreases when a predictor is added, so it cannot fairly compare models of different sizes.)
2. $R^2_a$: can be used to compare models with different numbers of parameters/coefficients. We choose the model with the largest $R^2_{a,p} = 1 - \frac{n-1}{n-p}\cdot\frac{SSE_p}{SSTO}$.

3 Mallows $C_p$ Criterion
Suppose we select a subset of the $P$ predictors and fit a model with the selected predictors; let $p$ be the number of its coefficients, including the intercept (if there is one), and denote its SSE by $SSE_p$. The criterion is
$$C_p = \frac{SSE_p}{MSE(X_1, \ldots, X_P)} - (n - 2p).$$
Criterion: we seek to identify subsets of $X$ for which (1) the $C_p$ value is small and (2) the $C_p$ value is near $p$.
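A minimal Python/NumPy sketch of how a table like the one above can be computed (the notes themselves use R). It enumerates all $2^P$ subsets, fits each by least squares, and reports $R^2_p = 1 - SSE_p/SSTO$, $R^2_{a,p} = 1 - \frac{n-1}{n-p}\frac{SSE_p}{SSTO}$ and $C_p = SSE_p/MSE(X_1, \ldots, X_P) - (n - 2p)$. The data are simulated, and the function names `sse` and `all_subsets_table` are hypothetical, not taken from the original course code.

```python
import itertools
import numpy as np

def sse(X, y):
    """SSE of the least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def all_subsets_table(Xfull, y):
    """One row per subset of the P predictors: 2**P rows, starting with 'None'."""
    n, P = Xfull.shape
    ones = np.ones((n, 1))
    ssto = float(((y - y.mean()) ** 2).sum())
    # MSE(X_1, ..., X_P): the full model has P + 1 coefficients
    mse_full = sse(np.hstack([ones, Xfull]), y) / (n - (P + 1))
    rows = []
    for size in range(P + 1):
        for subset in itertools.combinations(range(P), size):
            X = np.hstack([ones, Xfull[:, list(subset)]])
            p = X.shape[1]  # number of coefficients, including the intercept
            s = sse(X, y)
            rows.append({
                "vars": subset,
                "p": p,
                "SSE": s,
                "R2": 1 - s / ssto,
                "R2a": 1 - (n - 1) / (n - p) * s / ssto,
                "Cp": s / mse_full - (n - 2 * p),
            })
    return rows

# Simulated data (hypothetical): only X1 and X3 have nonzero coefficients.
rng = np.random.default_rng(0)
n = 50
Xfull = rng.normal(size=(n, 4))
y = 1 + 2 * Xfull[:, 0] - 1.5 * Xfull[:, 2] + rng.normal(size=n)

table = all_subsets_table(Xfull, y)
for row in table:
    names = ", ".join(f"X{j + 1}" for j in row["vars"]) or "None"
    print(f'{names:15s} p={row["p"]} R2a={row["R2a"]:.3f} Cp={row["Cp"]:.2f}')
```

For the full model the sketch gives $C_p = p$ exactly, since $SSE_P / MSE(X_1, \ldots, X_P) = n - p$ when $p = P + 1$; this is a quick sanity check on any implementation.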
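If a selected model includes all the important variables (but possibly also some unimportant ones), the model is still correct. Then we have
$$E\{SSE_p\} = (n - p)\sigma^2.$$
On the other hand, $E\{MSE(X_1, \ldots, X_P)\} = \sigma^2$. Roughly speaking, we therefore have
$$C_p \approx \frac{(n - p)\sigma^2}{\sigma^2} - (n - 2p) = p.$$
Question: are the estimators still unbiased?

If a selected model does not include all the important variables, the model is wrong. Then $SSE_p \gg SSE_P$ and
$$C_p \gg (n - p) - (n - 2p) = p.$$
Question: are the estimators still unbiased?

4 Akaike's information criterion (AIC)
We cannot use SSE alone for the selection: as $p$ increases, $SSE_p$ decreases. AIC tries to balance the number of parameters against $SSE_p$:
$$AIC_p = \log\left(\frac{SSE_p}{n}\right) + \frac{2p}{n} \quad\text{or}\quad AIC_p = n\log\left(\frac{SSE_p}{n}\right) + 2p.$$
The two forms differ only by the factor $n$, so they rank models identically; we prefer the model with the smallest $AIC_p$.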
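A small numerical sketch (Python, with made-up SSE values for illustration) of the AIC trade-off: a model with two extra coefficients and a slightly smaller SSE can still have the larger, i.e. worse, AIC. It also checks that the two forms, $n\log(SSE_p/n) + 2p$ and $\log(SSE_p/n) + 2p/n$, differ only by the factor $n$. The function names are hypothetical.

```python
import math

def aic(sse_p, n, p):
    """AIC_p = n * log(SSE_p / n) + 2p."""
    return n * math.log(sse_p / n) + 2 * p

def aic_scaled(sse_p, n, p):
    """AIC_p = log(SSE_p / n) + 2p / n, i.e. the first form divided by n."""
    return math.log(sse_p / n) + 2 * p / n

# Hypothetical fits on n = 50 observations: adding two predictors (p: 3 -> 5)
# lowers SSE from 120 to 115, but not by enough to pay the extra penalty 2 * 2.
n = 50
small = aic(120.0, n, 3)
large = aic(115.0, n, 5)
print(round(small, 2), round(large, 2))  # 49.77 51.65
```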
5 Schwarz Bayesian criterion (BIC or SBC)
Theoretically, AIC has been found not to select the right number of variables: it tends to keep too many. Schwarz proposed the BIC
$$BIC_p = \log\left(\frac{SSE_p}{n}\right) + \frac{(\log n)\,p}{n} \quad\text{or}\quad BIC_p = n\log\left(\frac{SSE_p}{n}\right) + (\log n)\,p.$$
BIC gives a bigger penalty to the number of parameters: each additional coefficient costs $\log n$ instead of 2, and $\log n > 2$ once $n \ge 8$.

6 Prediction sum of squares (PRESS) or cross-validation criterion (CV)
A better model should give better predictions. Most of the time, we don't have new data on which to check our predictions. A simple way is to partition the data into two parts: a training sample (set) and a prediction set (or validation set). Use the training set to estimate the model and the prediction set to check its predictability. A simple case is that each time, the prediction set consists of one sample, in turn. There are many partitions; using all of them is the idea of cross-validation (CV). The idea was proposed by M. Stone (1974). If we use 1 observation for validation and the other $n-1$ for model estimation, it is leave-one-observation-out cross-validation. If we use $m$ observations for validation and the other $n-m$ for model estimation, it is leave-$m$-observations-out cross-validation.

We need to select variables from $X_1, \ldots, X_P$ to be included in the model. There are many candidate models. For example,
model 1: $Y = a_0 + a_1 X_1 + \varepsilon$
model 2: $Y = b_0 + b_1 X_1 + b_2 X_4 + \varepsilon$
model 3: $Y = c_0 + c_1 X_2 + \varepsilon$
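A minimal sketch of the simple training/validation split described above, applied to three candidate models of the form listed (Python/NumPy; the notes use R). The data are simulated, `holdout_error` is a hypothetical helper, and the 150/50 split is an arbitrary choice. The last line counts the possible leave-$m$-out partitions, $\binom{n}{m}$, which is why a single split uses only a small part of the information cross-validation can extract.

```python
import math
import numpy as np

def holdout_error(X, y, n_train, seed=0):
    """Estimate on a random training set of n_train rows, then return the mean
    squared prediction error on the remaining rows (the validation set)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    train, val = idx[:n_train], idx[n_train:]
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    return float(np.mean((y[val] - X[val] @ beta) ** 2))

# Simulated data (hypothetical) for the three candidate models.
rng = np.random.default_rng(4)
n = 200
X1, X2, X4 = rng.normal(size=(3, n))
y = 1 + 2 * X1 + 0.5 * X4 + rng.normal(size=n)
ones = np.ones(n)
models = {
    "model 1": np.column_stack([ones, X1]),
    "model 2": np.column_stack([ones, X1, X4]),
    "model 3": np.column_stack([ones, X2]),
}
errors = {name: holdout_error(X, y, n_train=150) for name, X in models.items()}
print(errors)

# Number of ways to choose the validation set in leave-m-out CV: C(n, m).
print(math.comb(n, 1), math.comb(n, 5))
```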
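Suppose we have $n$ samples. For each $i = 1, \ldots, n$, we use the data $(Y_1, X_1), \ldots, (Y_{i-1}, X_{i-1}), (Y_{i+1}, X_{i+1}), \ldots, (Y_n, X_n)$, where $X_i = (X_{i1}, \ldots, X_{iP})$, to estimate the models; that is, observation $i$ is left out. The estimated models are, say (the superscript $(-i)$ marks estimates computed without observation $i$),
model 1: $\hat Y = \hat a_0^{(-i)} + \hat a_1^{(-i)} X_1$
model 2: $\hat Y = \hat b_0^{(-i)} + \hat b_1^{(-i)} X_1 + \hat b_2^{(-i)} X_4$
model 3: $\hat Y = \hat c_0^{(-i)} + \hat c_1^{(-i)} X_2$
The prediction errors for $(Y_i, X_i)$ are respectively
model 1: $err_1(i) = \{Y_i - \hat a_0^{(-i)} - \hat a_1^{(-i)} X_{i1}\}^2$
model 2: $err_2(i) = \{Y_i - \hat b_0^{(-i)} - \hat b_1^{(-i)} X_{i1} - \hat b_2^{(-i)} X_{i4}\}^2$
model 3: $err_3(i) = \{Y_i - \hat c_0^{(-i)} - \hat c_1^{(-i)} X_{i2}\}^2$
The overall prediction errors (also called cross-validation values) are respectively
model 1: $CV_1 = \frac{1}{n}\sum_{i=1}^{n} err_1(i)$
model 2: $CV_2 = \frac{1}{n}\sum_{i=1}^{n} err_2(i)$
model 3: $CV_3 = \frac{1}{n}\sum_{i=1}^{n} err_3(i)$
The model with the smallest CV value is the model we prefer.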
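A minimal Python/NumPy sketch of the leave-one-out procedure above, refitting each model $n$ times exactly as described (the notes use R; the data and helper names here are hypothetical). As a cross-check it also uses the standard least-squares identity $Y_i - \hat Y_{i(-i)} = e_i/(1 - h_{ii})$, where $e_i$ is the ordinary residual and $h_{ii}$ the leverage, which gives the same CV value from a single fit. $PRESS_p$ is the sum $\sum_i err(i)$, i.e. $n$ times the CV value.

```python
import numpy as np

def loocv(X, y):
    """CV = (1/n) * sum_i err(i), refitting without observation i each time.
    X must already contain the intercept column."""
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errs[i] = (y[i] - X[i] @ beta) ** 2
    return float(errs.mean())

def loocv_shortcut(X, y):
    """Same value from a single fit, via the deleted residual e_i / (1 - h_ii)."""
    H = X @ np.linalg.pinv(X)  # hat matrix X (X'X)^{-1} X'
    e = y - H @ y
    h = np.diag(H)
    return float(np.mean((e / (1 - h)) ** 2))

# Simulated data (hypothetical) and the three candidate models from the notes.
rng = np.random.default_rng(1)
n = 40
X1, X2, X4 = rng.normal(size=(3, n))
y = 1 + 2 * X1 + 0.5 * X4 + rng.normal(size=n)
ones = np.ones(n)
models = {
    "model 1": np.column_stack([ones, X1]),
    "model 2": np.column_stack([ones, X1, X4]),
    "model 3": np.column_stack([ones, X2]),
}
cv = {name: loocv(X, y) for name, X in models.items()}
print(cv, "preferred:", min(cv, key=cv.get))
```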
Example 6.1 For the same data as above, our candidate models are
model 0: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon$
model 1: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$
model 2: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_5 X_5 + \varepsilon$
model 3: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon$
model 4: $Y = \beta_0 + \beta_1 X_1 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon$
model 5: $Y = \beta_0 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon$
The CV value is computed for each of the six models; model 1 has the smallest, so model 1 is selected (and variable $X_5$ is deleted).

K-fold cross-validation
In K-fold cross-validation, the original sample is partitioned into K subsamples. Of the K subsamples, a single subsample is retained as the validation data for testing the model, and the remaining K − 1 subsamples are used as training data. The cross-validation process is then repeated K times (the folds), with each of the K subsamples used exactly once as the validation data. The K results from the folds can then be averaged (or otherwise combined) to produce a single estimate. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used.

7 Searching for the best subset
Forward selection: start with no variables in the model, try out the variables one by one, and include each one if it is statistically significant or increases the predictability.
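A minimal Python/NumPy sketch of K-fold cross-validation applied to candidate models shaped like Example 6.1 (model 0 with all five predictors, model $k$ dropping one of them). The data are simulated with the coefficient of $X_5$ set to zero, so they are not the example's data, and `kfold_cv` is a hypothetical helper. Here the squared errors are pooled over all $n$ observations, one reasonable way of "combining" the folds; with $K = n$ every fold holds a single observation and the result is leave-one-out CV.

```python
import numpy as np

def kfold_cv(X, y, K=10, seed=0):
    """K-fold CV: each fold is the validation set exactly once; the squared
    prediction errors are pooled over all n observations."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)
    sq_err = np.empty(n)
    for val in folds:
        train = np.setdiff1d(np.arange(n), val)
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        sq_err[val] = (y[val] - X[val] @ beta) ** 2
    return float(sq_err.mean())

# Hypothetical data shaped like Example 6.1: five predictors, X5 has coefficient 0.
rng = np.random.default_rng(2)
n = 60
Z = rng.normal(size=(n, 5))
y = 1 + Z @ np.array([1.0, -1.0, 0.8, 0.5, 0.0]) + rng.normal(size=n)
X0 = np.column_stack([np.ones(n), Z])  # model 0: intercept plus X1..X5

# Model k (k = 1..5) drops column 6 - k: model 1 drops X5, ..., model 5 drops X1.
cv10 = {0: kfold_cv(X0, y)}
for k in range(1, 6):
    cv10[k] = kfold_cv(np.delete(X0, 6 - k, axis=1), y)
print("10-fold CV:", cv10, "preferred model:", min(cv10, key=cv10.get))

# With K = n, every fold holds one observation: leave-one-out CV.
cv_loo = kfold_cv(X0, y, K=n)
```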
Backward elimination: start with all candidate variables, test them one by one for statistical significance, and delete any that are not significant or whose deletion increases the predictability.
Stepwise: a combination of the above, testing at each stage for variables to be included or excluded.

8 R code
step(object, direction = c("both", "backward", "forward"), steps = 1000, k = 2)
where k can be any positive value; k = 2 (the default) gives AIC, and k = log(n) gives BIC (SBC).

Example 8.1 For the first example above, using all eight predictors, the selected models are
Based on AIC: X1 + X2 + X3 + X5 + X6 + X8
Based on BIC: X1 + X2 + X3 + X8
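A simplified Python analogue of backward elimination driven by the criterion $n\log(SSE_p/n) + k\,p$, mirroring the role of `k` in `step()`: `k = 2` gives AIC and `k = log(n)` gives BIC. It is a sketch on simulated data, not a port of R's `step()`; to my understanding, this criterion is also what R's `extractAIC()` computes for `lm` fits. Every candidate removal at a given step has the same $p$, so both penalties follow the same elimination path and differ only in when they stop. Because $\log n > 2$ for $n \ge 8$, the BIC model is then contained in the AIC model, matching the pattern in Example 8.1.

```python
import math
import numpy as np

def criterion(X, y, k):
    """n * log(SSE / n) + k * p: k = 2 gives AIC, k = log(n) gives BIC."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return n * math.log(float(resid @ resid) / n) + k * p

def backward_eliminate(Xfull, y, k=2.0):
    """Start with all predictors; repeatedly drop the one whose removal lowers
    the criterion most; stop when no single removal lowers it."""
    n = Xfull.shape[0]
    ones = np.ones((n, 1))

    def design(cols):
        return np.hstack([ones, Xfull[:, cols]])

    current = list(range(Xfull.shape[1]))
    best = criterion(design(current), y, k)
    while current:
        trials = [(criterion(design([c for c in current if c != j]), y, k), j)
                  for j in current]
        score, drop = min(trials)
        if score >= best:
            break
        best, current = score, [c for c in current if c != drop]
    return current

# Hypothetical data: 6 candidate predictors, only X1, X2, X3 matter.
rng = np.random.default_rng(3)
n = 100
Xfull = rng.normal(size=(n, 6))
y = 2 + 3 * Xfull[:, 0] - 2 * Xfull[:, 1] + 1.5 * Xfull[:, 2] + rng.normal(size=n)

aic_vars = backward_eliminate(Xfull, y, k=2)
bic_vars = backward_eliminate(Xfull, y, k=math.log(n))
print("AIC keeps:", [f"X{j + 1}" for j in aic_vars])
print("BIC keeps:", [f"X{j + 1}" for j in bic_vars])
```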