Outlier Detection in Logistic Regression: A Quest for Reliable Knowledge from Predictive Modeling and Classification
- Ann Singleton
- 5 years ago
1 Outlier Detection in Logistic Regression: A Quest for Reliable Knowledge from Predictive Modeling and Classification. Abdul Nurunnabi, Geoff West, Department of Spatial Sciences, Curtin University, Perth, Australia; CRC for Spatial Information (CRCSI). abdul.nurunnabi@postgrad.curtin.edu.au, g.west@curtin.edu.au
2 Objectives: identification of multiple influential observations in logistic regression; classification of outliers on a graphical plot; investigating the importance of outlier treatment for reliable knowledge discovery.
3 Outliers: what, when and how? "An outlier is an observation that deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism" (Hawkins 1980). Causes of outliers: outliers occur very frequently in real data, and often go unnoticed because much data is processed by computers without careful inspection and screening. They may appear because of human error such as keypunch errors, mechanical faults such as transmission or recording errors, changes in system behavior, exceptional events (natural disasters such as earthquakes and floods), instrument error, or simply through natural deviations in populations. Effects of outliers: the presence of outliers in a dataset may make parameter estimates erroneous and cause outcomes to be misclassified, which creates problems when inferences are made with the wrong model and leads to unreliable conclusions and decisions.
4 Outliers and Reliability Issues. Figure: the typical steps constituting the KDD process. The reliability issue and outliers interact with five questions [Fayyad et al. 1996; Dai et al. 2012]:
i. What are the major factors that can make the discovery process unreliable?
ii. How can we make sure that the discovered knowledge is reliable?
iii. Under what conditions can a reliable discovery be assured?
iv. What techniques are there that can improve the reliability of discovered knowledge?
v. When can we trust that the discovered knowledge is reliable and reflects the real data?
5 Logistic Regression. Logistic regression is useful for situations in which we want to predict the presence or absence of a characteristic or outcome (the response) based on the values of a set of predictor variables. It can be classified into three types based on the categorical response variable: binary, ordinal and nominal. A binary response has two categories with no natural order, for example, success-failure or yes-no. An ordinal response has three or more categories with a natural ordering, e.g. none, mild, and severe. A nominal response has three or more categories with no natural ordering, for example, blue, black, red, yellow; or sunny, rainy, and cloudy. In this presentation, we cover only binary logistic regression.
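6 Logistic Regression and Outliers. The customary model for LR is
$$E(Y \mid X) = \pi(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}, \qquad 0 \le \pi(X) \le 1.$$
The log of $\pi(X)/[1 - \pi(X)]$ can be defined as a linear function, called the logit (log odds) of $X$:
$$g(X) = \ln\frac{\pi(X)}{1 - \pi(X)} = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p.$$
The LR model can be rewritten as
$$Y = \pi(X) + \varepsilon,$$
where $Y$ is a vector of binary (0, 1) responses and $\varepsilon$ is the error term:
$$\varepsilon = \begin{cases} 1 - \pi(X) & \text{with probability } \pi(X), \text{ if } y = 1,\\ -\pi(X) & \text{with probability } 1 - \pi(X), \text{ if } y = 0. \end{cases}$$
Figure: outliers, linear and logistic S-curve models.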
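The two forms of the model can be checked numerically. The following is a minimal Python sketch, assuming a single predictor and hypothetical coefficients beta_0 = -1 and beta_1 = 0.5 that do not come from the slides: it maps the linear predictor to pi(X) and recovers the log odds with the logit. The helper names (`inv_logit`, `logit`, `linear_predictor`) are illustrative only.

```python
import math


def inv_logit(eta: float) -> float:
    """pi = exp(eta) / (1 + exp(eta)), evaluated so that large |eta| cannot overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def logit(p: float) -> float:
    """g(pi) = ln(pi / (1 - pi)): the log odds, which is linear in the predictors."""
    return math.log(p / (1.0 - p))


def linear_predictor(beta, x):
    """beta_0 + beta_1*x_1 + ... + beta_p*x_p; x holds the predictors without the constant."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


beta = [-1.0, 0.5]                 # hypothetical coefficients, for illustration only
x = [3.0]                          # a hypothetical predictor value
eta = linear_predictor(beta, x)    # -1 + 0.5 * 3 = 0.5
p = inv_logit(eta)
# Error term: 1 - p with probability p (y = 1), -p with probability 1 - p (y = 0),
# so its expectation p * (1 - p) + (1 - p) * (-p) is zero.
print(round(eta, 3), round(p, 4), round(logit(p), 3))   # 0.5 0.6225 0.5
```

The split between `inv_logit` and `logit` mirrors the two equations: one gives the probability scale, the other the linear (log odds) scale on which the coefficients act.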
7 Types of Outliers. Outliers in regression can typically be categorized into three classes: outliers, high leverage points and influential observations. A deviation (change) in the X (explanatory) space gives leverage points; a deviation in the Y (response) variable but not in X gives vertical outliers; and a deviation can also occur in both the X and Y spaces. Influential observations are defined as points which, either individually or together with several other observations, have a demonstrably larger impact on the calculated values of various estimates (coefficients, standard errors, t-values, etc.) (Belsley et al.). In logistic regression, outliers and influential observations may occur as misclassification between the binary (0, 1) responses. They may occur through a meaningful deviation in the explanatory variables, and such cases may also show low leverage (Nurunnabi et al. 2010).
8 Outlier Detection: Single-case Deletion Approach. The $i$th residual in LR is $\hat{\varepsilon}_i = y_i - \hat{\pi}_i$. The leverage (hat) matrix gives the fitted values of the response variable as a projection onto the covariate space. It is defined as
$$H = V^{1/2} X (X^T V X)^{-1} X^T V^{1/2},$$
where $V$ is a diagonal matrix with diagonal elements $v_i = \hat{\pi}_i(1 - \hat{\pi}_i)$. The $i$th diagonal element of $H$ is
$$h_{ii} = \hat{\pi}_i(1 - \hat{\pi}_i)\, x_i^T (X^T V X)^{-1} x_i.$$
The standardized Pearson residual for LR is
$$r_{si} = \frac{y_i - \hat{\pi}_i}{\sqrt{v_i(1 - h_{ii})}},$$
and DFFITS in LR is defined as
$$\mathrm{DFFITS}_i = \frac{\hat{y}_i - \hat{y}_{i(-i)}}{\sqrt{v_{i(-i)}\, h_{ii}}},$$
where the subscript $(-i)$ denotes the fit with the $i$th case deleted. Cases with $h_{ii} > ck/n$ ($c = 2$ or $3$, $k = p + 1$) are generally identified as high leverage points, cases with $|r_{si}| > 3$ as outliers, and cases with $|\mathrm{DFFITS}_i| > 3\sqrt{k/n}$ as influential.
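The single-case measures can be sketched in plain Python (standard library only). This is a minimal sketch, not the authors' code: it assumes each row of X starts with the constant 1, fits the model by unguarded Newton-Raphson (so it will fail on completely separated data), and computes DFFITS by actually refitting without case i, following the deletion-based definition of DFFITS on this slide. All function names and the toy data are hypothetical.

```python
import math


def mat_inv(a):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        d = m[c][c]
        m[c] = [v / d for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def probs(X, beta):
    """Fitted pi_i for every row x_i of X (each row starts with the constant 1)."""
    return [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi)))) for xi in X]


def info_matrix(X, pis):
    """X^T V X with V = diag(v_i), v_i = pi_i (1 - pi_i)."""
    k = len(X[0])
    return [[sum(p * (1 - p) * xi[a] * xi[b] for xi, p in zip(X, pis)) for b in range(k)]
            for a in range(k)]


def fit_logistic(X, y, iters=30):
    """Maximum likelihood beta by Newton-Raphson (no safeguard against separation)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        pis = probs(X, beta)
        score = [sum((yi - p) * xi[a] for xi, yi, p in zip(X, y, pis)) for a in range(k)]
        step = mat_inv(info_matrix(X, pis))
        beta = [beta[a] + sum(step[a][c] * score[c] for c in range(k)) for a in range(k)]
    return beta


def single_case_diagnostics(X, y, c=2):
    """h_ii, r_si and DFFITS_i for each case, flagged with the slide's cut-offs."""
    n, k = len(X), len(X[0])
    pis = probs(X, fit_logistic(X, y))
    M = mat_inv(info_matrix(X, pis))
    rows = []
    for i, (xi, yi, p) in enumerate(zip(X, y, pis)):
        v = p * (1 - p)
        h = v * sum(xi[a] * M[a][b] * xi[b] for a in range(k) for b in range(k))
        r_s = (yi - p) / math.sqrt(v * (1 - h))
        # DFFITS: refit with case i deleted, then compare the two fitted values at x_i.
        p_del = probs([xi], fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:]))[0]
        dffits = (p - p_del) / math.sqrt(p_del * (1 - p_del) * h)
        rows.append({"h": h, "r_s": r_s, "dffits": dffits,
                     "high_leverage": h > c * k / n,
                     "outlier": abs(r_s) > 3,
                     "influential": abs(dffits) > 3 * math.sqrt(k / n)})
    return rows


# Hypothetical toy data: intercept column plus one predictor.
X = [[1.0, float(v)] for v in range(1, 11)]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
for i, row in enumerate(single_case_diagnostics(X, y)):
    print(i, round(row["h"], 3), round(row["r_s"], 3), round(row["dffits"], 3))
```

A useful sanity check is that the leverages sum to k = p + 1, since the trace of H equals the number of parameters.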
9 Modification for the Group Deletion Approach. Single-case deletion measures are prone to the masking and swamping phenomena and fail to detect outliers in the presence of multiple outliers and/or influential cases. Masking occurs when an outlying subset goes undetected because of the presence of another, usually adjacent, subset. Swamping occurs when good observations are incorrectly identified as outliers because of the presence of another, usually remote, subset of observations. The group deletion approach forms a clean subset of the data that is presumably free of outliers, and then tests the outlyingness of the remaining points relative to the clean subset.
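10 Outlier Detection: Group Deletion Approach. These methods find a suspect group $D$ of $d$ outlying/unusual cases with the help of graphical methods, robust techniques such as LMS, LS and/or appropriate diagnostic measures. The data in the explanatory variables $X$, the response variable $Y$ and the variance-covariance matrix $V$ can then be separated into the deletion group $D$ and the clean set $R$ as
$$X = \begin{pmatrix} X_R \\ X_D \end{pmatrix}, \qquad Y = \begin{pmatrix} Y_R \\ Y_D \end{pmatrix}, \qquad V = \begin{pmatrix} V_R & 0 \\ 0 & V_D \end{pmatrix}.$$
With $\hat{\beta}^{(-D)}$ estimated from the clean set,
$$\hat{\pi}_i^{(-D)} = \frac{\exp(x_i^T \hat{\beta}^{(-D)})}{1 + \exp(x_i^T \hat{\beta}^{(-D)})}, \qquad v_i^{(-D)} = \hat{\pi}_i^{(-D)}(1 - \hat{\pi}_i^{(-D)}), \qquad \hat{\varepsilon}_i^{(-D)} = y_i - \hat{\pi}_i^{(-D)},$$
$$h_{ii}^{(-D)} = \hat{\pi}_i^{(-D)}(1 - \hat{\pi}_i^{(-D)})\, x_i^T (X_R^T V_R X_R)^{-1} x_i.$$
Generalized Standardized Pearson Residual (GSPR):
$$r_{si}^{*} = \begin{cases} \dfrac{y_i - \hat{\pi}_i^{(-D)}}{\sqrt{v_i^{(-D)}(1 - h_{ii}^{(-D)})}}, & i \in R,\\[2ex] \dfrac{y_i - \hat{\pi}_i^{(-D)}}{\sqrt{v_i^{(-D)}(1 + h_{ii}^{(-D)})}}, & i \in D. \end{cases}$$
Generalized Weight (GW):
$$h_{ii}^{*} = \begin{cases} \dfrac{h_{ii}^{(-D)}}{1 - h_{ii}^{(-D)}}, & i \in R,\\[2ex] \dfrac{h_{ii}^{(-D)}}{1 + h_{ii}^{(-D)}}, & i \in D. \end{cases}$$
Cases with $|r_{si}^{*}| > 3$ are generally identified as outliers, and cases with $h_{ii}^{*} > \mathrm{median}(h^{*}) + 3\,\mathrm{MAD}(h^{*})$ as high leverage points.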
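The GSPR and GW formulas can be sketched in the same style. This is a minimal sketch, assuming the suspect set D has already been chosen (here by hand, as a hypothetical set of row indices) and that MAD means the normalized median absolute deviation (raw MAD divided by 0.6745); the slides do not say which scaling is used, so `mad_scale` is exposed as a parameter. Helper names and data are hypothetical, and the Newton-Raphson fit has no safeguard against separation.

```python
import math
import statistics


def mat_inv(a):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        d = m[c][c]
        m[c] = [v / d for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def probs(X, beta):
    return [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi)))) for xi in X]


def info_matrix(X, pis):
    """X^T V X with V = diag(pi_i (1 - pi_i))."""
    k = len(X[0])
    return [[sum(p * (1 - p) * xi[a] * xi[b] for xi, p in zip(X, pis)) for b in range(k)]
            for a in range(k)]


def fit_logistic(X, y, iters=30):
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        pis = probs(X, beta)
        score = [sum((yi - p) * xi[a] for xi, yi, p in zip(X, y, pis)) for a in range(k)]
        step = mat_inv(info_matrix(X, pis))
        beta = [beta[a] + sum(step[a][c] * score[c] for c in range(k)) for a in range(k)]
    return beta


def generalized_diagnostics(X, y, D):
    """GSPR r*_si and GW h*_ii for all cases, given a suspect deletion set D of row indices."""
    R = [i for i in range(len(X)) if i not in D]
    X_R, y_R = [X[i] for i in R], [y[i] for i in R]
    beta_R = fit_logistic(X_R, y_R)                      # beta^(-D): fitted on the clean set only
    M = mat_inv(info_matrix(X_R, probs(X_R, beta_R)))    # (X_R^T V_R X_R)^{-1}
    k = len(X[0])
    r_star, h_star = [], []
    for i, (xi, yi, p) in enumerate(zip(X, y, probs(X, beta_R))):
        v = p * (1 - p)                                   # v_i^(-D)
        h = v * sum(xi[a] * M[a][b] * xi[b] for a in range(k) for b in range(k))
        s = 1.0 if i in D else -1.0                       # 1 + h for i in D, 1 - h for i in R
        r_star.append((yi - p) / math.sqrt(v * (1 + s * h)))
        h_star.append(h / (1 + s * h))
    return r_star, h_star


def flag_cases(r_star, h_star, mad_scale=0.6745):
    """Outliers: |r*| > 3. High leverage: h* > median(h*) + 3 MAD(h*).
    ASSUMPTION: MAD is the normalized median absolute deviation (raw MAD / mad_scale)."""
    med = statistics.median(h_star)
    mad = statistics.median([abs(h - med) for h in h_star]) / mad_scale
    outliers = [i for i, r in enumerate(r_star) if abs(r) > 3]
    high_leverage = [i for i, h in enumerate(h_star) if h > med + 3 * mad]
    return outliers, high_leverage


X = [[1.0, float(v)] for v in range(1, 11)]   # hypothetical toy data
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
r_star, h_star = generalized_diagnostics(X, y, D={3})   # hypothetical suspect set
print(flag_cases(r_star, h_star))
```

Writing the two cases with a single sign `s` keeps the R/D asymmetry (1 - h versus 1 + h) in one place, so the GSPR and GW denominators cannot drift apart.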
11 Identification of Multiple Influential Observations. The Mahalanobis distance is
$$\mathrm{MD}_i = (Z_i - \bar{Z})^T \Sigma^{-1} (Z_i - \bar{Z}),$$
where $Z$ is an $m$-variate variable with mean $\bar{Z}$ and covariance matrix $\Sigma$. The proposed Influence Distance (ID) is
$$\mathrm{ID}_i = (G_i - \bar{G})^T \Sigma_G^{-1} (G_i - \bar{G}),$$
where $G = [r_s^{*}\ h^{*}]$ is the generalized residual-leverage matrix, and $\bar{G}$ and $\Sigma_G$ are the mean and covariance matrix based on the group excluding the observations identified as outliers by GSPR.
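A minimal sketch of the Influence Distance for the two-column matrix G = [r*_s h*], assuming the GSPR outliers have already been flagged. The mean and the sample covariance (divisor m - 1, an assumption since the slide does not state it) are computed from the remaining cases only, and the 2 x 2 inverse is written out explicitly. The input values and the function name are hypothetical.

```python
def influence_distance(r_star, h_star, gspr_outliers):
    """ID_i = (G_i - Gbar)^T Sigma_G^{-1} (G_i - Gbar) with G_i = (r*_si, h*_ii).

    Gbar and Sigma_G are estimated from the cases NOT flagged by GSPR."""
    G = list(zip(r_star, h_star))
    clean = [g for i, g in enumerate(G) if i not in gspr_outliers]
    m = len(clean)
    mean = [sum(g[j] for g in clean) / m for j in (0, 1)]
    # Sample covariance matrix of the clean cases (divisor m - 1).
    s = [[sum((g[a] - mean[a]) * (g[b] - mean[b]) for g in clean) / (m - 1) for b in (0, 1)]
         for a in (0, 1)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    ids = []
    for g in G:
        d = (g[0] - mean[0], g[1] - mean[1])
        ids.append(sum(d[a] * inv[a][b] * d[b] for a in (0, 1) for b in (0, 1)))
    return ids


r_star = [0.1, -0.5, 0.8, -1.2, 4.5, 0.3]      # hypothetical GSPR values
h_star = [0.05, 0.12, 0.08, 0.20, 0.90, 0.10]  # hypothetical GW values
ids = influence_distance(r_star, h_star, gspr_outliers={4})
print([round(d, 2) for d in ids])
```

Because the centre and spread exclude the flagged case, that case cannot inflate the covariance and mask itself, which is the point of computing ID after deletion rather than an ordinary MD on all rows.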
12 Proposed Method (Algorithm).
1. Calculate $r_{si}^{*}$ and $h_{ii}^{*}$ using the group deletion approach.
2. Construct the matrix $G = [r_s^{*}\ h^{*}]$.
3. Calculate $\bar{G}$ and $\Sigma_G$ based on the group after the deletion of outlying cases.
4. Calculate $\mathrm{ID}_i = (G_i - \bar{G})^T \Sigma_G^{-1} (G_i - \bar{G})$.
5. Identify as influential the observations for which $\mathrm{ID}_i > \chi^2_{2,0.975} \approx 7.38$.
To sketch the classification plot: (a) draw a scatter plot of $r_s^{*}$ versus $h^{*}$; (b) draw cut-off lines at $\pm 3$ and at $\mathrm{median}(h^{*}) + 3\,\mathrm{MAD}(h^{*})$ through the $r_s^{*}$ and $h^{*}$ axes, respectively; (c) draw an influence ellipse based on the ID values and the chi-square cut-off value. Figure: classification plot.
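The classification rules behind the plot can be sketched without drawing it. This is a minimal sketch under stated assumptions: it only assigns each case to the categories the plot displays (it does not draw the ellipse), it uses the closed form of the chi-square quantile with 2 degrees of freedom, -2 ln(1 - 0.975), and it treats MAD as the normalized median absolute deviation, which the slides do not confirm. The function name and input values are hypothetical.

```python
import math
import statistics

# Chi-square with 2 df has CDF 1 - exp(-x/2), so its 97.5% point is -2 ln(0.025).
CHI2_2DF_0975 = -2.0 * math.log(1.0 - 0.975)


def classify(r_star, h_star, ids, mad_scale=0.6745):
    """Label each case as outlier / high leverage / influential, or regular.

    ASSUMPTION: MAD is the normalized median absolute deviation (raw MAD / mad_scale)."""
    med = statistics.median(h_star)
    mad = statistics.median([abs(h - med) for h in h_star]) / mad_scale
    h_cut = med + 3 * mad                      # cut-off line on the h* axis
    labels = []
    for r, h, d in zip(r_star, h_star, ids):
        tags = []
        if abs(r) > 3:                         # outside the +/-3 band on the r* axis
            tags.append("outlier")
        if h > h_cut:
            tags.append("high leverage")
        if d > CHI2_2DF_0975:                  # outside the influence ellipse
            tags.append("influential")
        labels.append(tags or ["regular"])
    return labels


print(round(CHI2_2DF_0975, 3))   # 7.378
print(classify([0.2, 4.0, -0.1, 0.5], [0.10, 0.10, 0.90, 0.12], [1.0, 20.0, 9.0, 0.5]))
```

A case can carry more than one tag, which is exactly what the single classification plot is meant to show at a glance.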
13 Experiment 1. Modified Brown Data. Table: modified Brown data (variables L.N.I. and A.P.). Table: diagnostic results for the modified Brown data (columns: $r_s$ with cut-off 3.00, $h$, DFFITS, $r_s^{*}$ with cut-off 3.00, $h^{*}$, ID).
14 Experiment 1. Modified Brown Data. Figure: modified Brown data: (a) scatter plot of L.N.I. versus A.P.; (b) index plot of standardized Pearson residuals; (c) index plot of leverage values; (d) index plot of DFFITS; (e) index plot of GSPR; (f) index plot of GW; (g) index plot of ID; (h) classification plot.
15 Experiment 2. Modified Finney Data. Table: modified Finney data (variables Y, Vol., Rate). Table: diagnostic results for the modified Finney data (columns: $r_s$, $h$, DFFITS, $r_s^{*}$, $h^{*}$, ID).
16 Experiment 2. Modified Finney Data. Figure: modified Finney data: (a) character plot of rate versus volume with the response values 1, 0; (b) index plot of standardized Pearson residuals; (c) index plot of leverage values; (d) index plot of DFFITS; (e) index plot of GSPR; (f) index plot of GW; (g) index plot of log(ID); (h) classification plot.
17 Reliability Checking: Model Parameter Estimation and Tests (Modified Brown Data). Tables: LR model fit and significance test results for all observations and without outliers. Parameter estimation (predictors: Constant, A.P.; columns: Coef., S.E., Z, P, Odds Ratio, 95% Conf. Int. Lower and Upper). Test that all slopes are zero: G = 0.183, df = 1 for all observations; G = 7.31, df = 1 without outliers. Goodness-of-fit tests ($\chi^2$, df, p): Pearson, Deviance, Hosmer-Lemeshow (H-L). Model summary: log-likelihood (LL), -2LL, Cox & Snell $R^2$, Nagelkerke $R^2$. Figure: predicted probabilities versus A.P.
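The slope test and the model summary statistics follow from the log-likelihoods of the fitted and the constant-only models. A minimal sketch, assuming df = 1 (one slope, A.P.) so that the chi-square tail probability has the closed form erfc(sqrt(G/2)); the function names and the log-likelihood values used in the check are hypothetical, and only the two G values come from the slide.

```python
import math


def chi2_sf_1df(x):
    """P(chi-square with 1 df > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def lr_summary(ll_model, ll_null, n):
    """Likelihood-ratio G, Cox & Snell R^2 and Nagelkerke R^2 from two log-likelihoods."""
    g = 2.0 * (ll_model - ll_null)                 # G = (-2 LL_null) - (-2 LL_model)
    cox_snell = 1.0 - math.exp(-g / n)             # 1 - (L_null / L_model)^(2/n)
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))   # rescaled to max 1
    return g, cox_snell, nagelkerke


for g in (0.183, 7.31):   # the two G statistics reported on the slide (df = 1)
    print(g, round(chi2_sf_1df(g), 4))
```

Under this reference distribution the all-observations G gives a large tail probability, while the G obtained without outliers falls below 0.01, which is the contrast the slide uses to argue for outlier treatment.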
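18 Performance Evaluation: Classification. Classification results with and without outliers (rows: actual status Absence (0), Presence (1), Total; columns: predicted status Absence (0), Presence (1), Total, Correct classification). All observations: correct classification is 100% for Absence and 0% for Presence. Without outliers: actual Absence predicted as Presence 18.18% (row total 63.64%); actual Presence predicted as Presence 23.64% (row total 36.36%); total predicted as Presence 41.82%; overall correct classification 69.09%. Mosaic plot: (a) classification with outliers; (b) classification without outliers.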
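The percentages in such a classification table come from cross-tabulating actual against predicted status. A minimal sketch, assuming a probability cut-off of 0.5 (the slide does not state the cut-off); `classification_table` is a hypothetical helper and the toy inputs are invented.

```python
def classification_table(y, p_hat, cutoff=0.5):
    """Cross-tabulate actual vs predicted status (predict 1 when p_hat >= cutoff).

    Returns the 2x2 counts, the percent correct within each actual class,
    and the overall percent correctly classified."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}   # key: (actual, predicted)
    for yi, pi in zip(y, p_hat):
        counts[(yi, 1 if pi >= cutoff else 0)] += 1
    row_correct = {}
    for a in (0, 1):
        row_total = counts[(a, 0)] + counts[(a, 1)]
        row_correct[a] = 100.0 * counts[(a, a)] / row_total if row_total else float("nan")
    overall = 100.0 * (counts[(0, 0)] + counts[(1, 1)]) / len(y)
    return counts, row_correct, overall


counts, row_correct, overall = classification_table(
    [0, 0, 1, 1, 1], [0.2, 0.6, 0.7, 0.4, 0.9])
print(counts, row_correct, overall)
```

A model whose fitted probabilities all fall below the cut-off predicts Absence for every case, which gives 100% correct for Absence and 0% for Presence.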
19 Performance Evaluation: Predictive Ability. Table: ROC curve results for all observations and without outliers (columns: Area, S.E., asymptotic Sig. (p), asymptotic 95% Conf. Int. lower and upper bounds). Figure: ROC curve for (a) data with outliers and (b) data without outliers.
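The ROC area can be computed without tracing the curve, because it equals the probability that a randomly chosen case with y = 1 receives a higher fitted probability than a randomly chosen case with y = 0 (the Mann-Whitney form, with ties counted as one half). A minimal sketch with hypothetical inputs; it does not compute the standard error or the asymptotic confidence interval reported on the slide.

```python
def roc_auc(y, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative cases."""
    pos = [s for yi, s in zip(y, scores) if yi == 1]
    neg = [s for yi, s in zip(y, scores) if yi == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75
```

The pairwise loop is quadratic in the sample size, which is fine for data sets of the size used in these experiments.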
20 Conclusions. This paper proposes a diagnostic measure for identifying multiple influential observations in logistic regression. It introduces a classification graph that classifies outliers, high leverage points and influential observations in the same plot at one time. Diagnostic results show that the proposed measure efficiently identifies multiple influential cases, and that the graph is helpful for visualizing outlier categories. The results also show that, without careful outlier investigation, it may not be possible to obtain reliable knowledge using logistic regression for predictive modeling and classification.
21 Conclusions. Outlier investigation in logistic regression is closely related to the issues raised for reliable knowledge discovery. Outlier detection is one of the major factors that affect the reliability of the discovery process. The conditions for reliable knowledge discovery can be improved by parameter estimation and by testing the significance of the estimates. Proper outlier diagnostics and treatment (deletion or correction of the outlying observations) can improve the reliability of discovered knowledge. We can trust that the discovered knowledge is reliable and reflects the real data if the test results meet the required statistical significance level. Therefore, outlier detection and proper treatment are vital for obtaining reliable knowledge, and should be considered a data preprocessing step in knowledge discovery in databases (KDD). The proposed diagnostic method is introduced for a binomial response variable in logistic regression. Future research will investigate the diagnostic method for multinomial response variables and for large and high-dimensional data, as higher-dimensional data present extra problems that need to be addressed.
22 Questions?
Econ 388 R. Butler 204 revsons Lecture 4 Dummy Dependent Varables I. Lnear Probablty Model: the Regresson model wth a dummy varables as the dependent varable assumpton, mplcaton regular multple regresson
More informationMIMA Group. Chapter 2 Bayesian Decision Theory. School of Computer Science and Technology, Shandong University. Xin-Shun SDU
Group M D L M Chapter Bayesan Decson heory Xn-Shun Xu @ SDU School of Computer Scence and echnology, Shandong Unversty Bayesan Decson heory Bayesan decson theory s a statstcal approach to data mnng/pattern
More information/ n ) are compared. The logic is: if the two
STAT C141, Sprng 2005 Lecture 13 Two sample tests One sample tests: examples of goodness of ft tests, where we are testng whether our data supports predctons. Two sample tests: called as tests of ndependence
More informationRESIDUALS AND INFLUENCE IN NONLINEAR REGRESSION FOR REPEATED MEASUREMENT DATA
Operatons Research and Applcatons : An Internatonal Journal (ORAJ), Vol.4, No.3/4, November 17 RESIDUALS AND INFLUENCE IN NONLINEAR REGRESSION FOR REPEAED MEASUREMEN DAA Munsr Al, Yu Feng, Al choo, Zamr
More informationChapter 3. Two-Variable Regression Model: The Problem of Estimation
Chapter 3. Two-Varable Regresson Model: The Problem of Estmaton Ordnary Least Squares Method (OLS) Recall that, PRF: Y = β 1 + β X + u Thus, snce PRF s not drectly observable, t s estmated by SRF; that
More informationChapter 3 Describing Data Using Numerical Measures
Chapter 3 Student Lecture Notes 3-1 Chapter 3 Descrbng Data Usng Numercal Measures Fall 2006 Fundamentals of Busness Statstcs 1 Chapter Goals To establsh the usefulness of summary measures of data. The
More informationT E C O L O T E R E S E A R C H, I N C.
T E C O L O T E R E S E A R C H, I N C. B rdg n g En g neern g a nd Econo mcs S nce 1973 THE MINIMUM-UNBIASED-PERCENTAGE ERROR (MUPE) METHOD IN CER DEVELOPMENT Thrd Jont Annual ISPA/SCEA Internatonal Conference
More informationBasic Business Statistics, 10/e
Chapter 13 13-1 Basc Busness Statstcs 11 th Edton Chapter 13 Smple Lnear Regresson Basc Busness Statstcs, 11e 009 Prentce-Hall, Inc. Chap 13-1 Learnng Objectves In ths chapter, you learn: How to use regresson
More informationStatistics for Managers Using Microsoft Excel/SPSS Chapter 13 The Simple Linear Regression Model and Correlation
Statstcs for Managers Usng Mcrosoft Excel/SPSS Chapter 13 The Smple Lnear Regresson Model and Correlaton 1999 Prentce-Hall, Inc. Chap. 13-1 Chapter Topcs Types of Regresson Models Determnng the Smple Lnear
More informationPrimer on High-Order Moment Estimators
Prmer on Hgh-Order Moment Estmators Ton M. Whted July 2007 The Errors-n-Varables Model We wll start wth the classcal EIV for one msmeasured regressor. The general case s n Erckson and Whted Econometrc
More informationChapter 12 Analysis of Covariance
Chapter Analyss of Covarance Any scentfc experment s performed to know somethng that s unknown about a group of treatments and to test certan hypothess about the correspondng treatment effect When varablty
More informationSupport Vector Machines
/14/018 Separatng boundary, defned by w Support Vector Machnes CISC 5800 Professor Danel Leeds Separatng hyperplane splts class 0 and class 1 Plane s defned by lne w perpendcular to plan Is data pont x
More information3/3/2014. CDS M Phil Econometrics. Vijayamohanan Pillai N. CDS Mphil Econometrics Vijayamohan. 3-Mar-14. CDS M Phil Econometrics.
Dummy varable Models an Plla N Dummy X-varables Dummy Y-varables Dummy X-varables Dummy X-varables Dummy varable: varable assumng values 0 and to ndcate some attrbutes To classfy data nto mutually exclusve
More informationRobust Logistic Ridge Regression Estimator in the Presence of High Leverage Multicollinear Observations
Mathematcal and Computatonal Methods n Scence and Engneerng Robust Logstc Rdge Regresson Estmator n the Presence of Hgh Leverage Multcollnear Observatons SYAIBA BALQISH ARIFFIN 1 AND HABSHAH MIDI 1, Faculty
More informationProfessor Chris Murray. Midterm Exam
Econ 7 Econometrcs Sprng 4 Professor Chrs Murray McElhnney D cjmurray@uh.edu Mdterm Exam Wrte your answers on one sde of the blank whte paper that I have gven you.. Do not wrte your answers on ths exam.
More informationOn the Influential Points in the Functional Circular Relationship Models
On the Influental Ponts n the Functonal Crcular Relatonshp Models Department of Mathematcs, Faculty of Scence Al-Azhar Unversty-Gaza, Gaza, Palestne alzad33@yahoo.com Abstract If the nterest s to calbrate
More informationMETHOD OF NETWORK RELIABILITY ANALYSIS BASED ON ACCURACY CHARACTERISTICS
METHOD OF NETWOK ELIABILITY ANALYI BAED ON ACCUACY CHAACTEITIC ławomr Łapńsk hd tudent Faculty of Geodesy and Cartography Warsaw Unversty of Technology ABTACT Measurements of structures must be precse
More informationGeneralized Linear Methods
Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set
More information