THE LINEAR DISCRIMINATION PROBLEM
- Priscilla Doyle
- 6 years ago
What exactly is the linear discrimination story?

In the logistic regression problem we have a 0/1 dependent variable, and we set up a model that predicts this from independent variables. Specifically we use

$$\operatorname{logit} P[Y_i = 1] = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik}$$

This has assumed k independent variables. As an alternative, we might try to ask what linear combination of the X's is most useful for distinguishing $Y_i = 1$ from $Y_i = 0$.

The story line is roughly similar. In logistic regression, we are fitting the probability $P[Y_i = 1]$. We are estimating the slopes, determining which predictors are useful, and noting the sensitivity of the response to each predictor. We maintain the illusion that we might be able to influence the outcomes through manipulation of the independent variables.

In linear discrimination, we do not control the X's. The objective is just being able to predict Y with good probability.

The data come to us as samples from two populations. There are $n_1$ values from the first population, in which the distribution of X is normal with mean $\mu_1$ and with variance $\Sigma$. There are $n_2$ values from the second population, in which the distribution of X is normal with mean $\mu_2$ and with variance $\Sigma$. The variance matrix is assumed to be the same in both populations.

We're thinking of the populations as 1 and 2, rather than 0 and 1. No big deal.

The probability density for population j (j = 1, 2) is the multivariate normal density, which is

$$f_j(x) = (\text{factor with } \Sigma)\,\exp\left\{ -\tfrac{1}{2}\,(x - \mu_j)'\,\Sigma^{-1}\,(x - \mu_j) \right\}$$

We can also impose prior probabilities on the problem, $\pi_1$ and $\pi_2 = 1 - \pi_1$. These might or might not be related to the sample sizes $n_1$ and $n_2$.
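To make the density above concrete, here is a minimal sketch for the two-dimensional case. It assumes the usual normalizing constant $(2\pi)^{-K/2}|\Sigma|^{-1/2}$ for the "factor with Σ"; the function name, the means, and the variance matrix are hypothetical values chosen only for illustration.

```python
import math

def bivariate_normal_density(x, mu, sigma):
    """Density of a 2-dimensional normal with mean mu and variance matrix sigma."""
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21
    # Inverse of the 2-by-2 variance matrix.
    inv = ((s22 / det, -s12 / det), (-s21 / det, s11 / det))
    d = (x[0] - mu[0], x[1] - mu[1])
    # Quadratic form (x - mu)' Sigma^{-1} (x - mu).
    quad = (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
            + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
    # Assumed "factor with Sigma": (2 pi)^(-K/2) |Sigma|^(-1/2) with K = 2.
    factor = 1.0 / (2.0 * math.pi * math.sqrt(det))
    return factor * math.exp(-0.5 * quad)

# Hypothetical population means and common variance matrix.
mu1 = (0.0, 0.0)
mu2 = (2.0, 1.0)
sigma = ((1.0, 0.3), (0.3, 2.0))

# A point halfway between the two means has the same density under both.
x = (1.0, 0.5)
f1 = bivariate_normal_density(x, mu1, sigma)
f2 = bivariate_normal_density(x, mu2, sigma)
print(f1, f2)
```

Because the two populations share the same Σ, the densities differ only through the distance of x from each mean, which is what makes the resulting classification rule linear in x.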
Let's say you get a value x. What population is it from?

$$P[\text{population 1} \mid \text{data } x] = \frac{P[\text{population 1} \cap \text{data } x]}{P[\text{data } x]} = \frac{P[\text{popn 1}]\,P[\text{data } x \mid \text{popn 1}]}{P[\text{popn 1}]\,P[\text{data } x \mid \text{popn 1}] + P[\text{popn 2}]\,P[\text{data } x \mid \text{popn 2}]}$$

In a similar style, get $P[\text{population 2} \mid \text{data } x]$. You will then be able to get the ratio

$$\frac{P[\text{population 1} \mid \text{data } x]}{P[\text{population 2} \mid \text{data } x]} = \frac{\pi_1\,P[\text{data } x \mid \text{population 1}]}{(1 - \pi_1)\,P[\text{data } x \mid \text{population 2}]}$$

The substitution of the multivariate normal density will lead to a condition of the form

Classify as population 1 if $a'x \ge c$
Classify as population 2 if $a'x < c$

The vector a is the linear discriminator.

This can go beyond two populations. We have training sets of values from M populations. The data are all vectors of the same form; that is, every vector is K-by-1 and the meanings of all the coordinates are the same. In a medical investigation on human subjects, the first coordinate might be age, the second might be height, and so on.

From population 1, with mean vector $\mu_1$, we have $n_1$ values.
From population 2, with mean vector $\mu_2$, we have $n_2$ values.
...
From population M, with mean vector $\mu_M$, we have $n_M$ values.

It is assumed that the population variance matrices are $\Sigma$ (all the same). The populations might also be given prior probabilities $\pi_1, \pi_2, \ldots, \pi_M$. (If these are not given, some people use $\pi_j = n_j / (n_1 + n_2 + \cdots + n_M)$.)
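For the two-population case, the substitution of the normal densities can be written out as follows. This is a sketch of the standard algebra, not a formula quoted from the notes; it assumes that x is assigned to population 1 when the posterior ratio is at least 1, and it uses the symmetry of $\Sigma$.

```latex
\begin{align*}
\log\frac{\pi_1 f_1(x)}{(1-\pi_1)\, f_2(x)}
  &= \log\frac{\pi_1}{1-\pi_1}
     - \tfrac12 (x-\mu_1)'\Sigma^{-1}(x-\mu_1)
     + \tfrac12 (x-\mu_2)'\Sigma^{-1}(x-\mu_2) \\
  &= \log\frac{\pi_1}{1-\pi_1}
     + (\mu_1-\mu_2)'\Sigma^{-1}x
     - \tfrac12 (\mu_1-\mu_2)'\Sigma^{-1}(\mu_1+\mu_2).
\end{align*}
% The ratio is at least 1 exactly when a'x >= c, where
\[
a = \Sigma^{-1}(\mu_1-\mu_2), \qquad
c = \tfrac12 (\mu_1-\mu_2)'\Sigma^{-1}(\mu_1+\mu_2)
    - \log\frac{\pi_1}{1-\pi_1}.
\]
```

The quadratic term $x'\Sigma^{-1}x$ cancels because both populations share the same variance matrix, and the "factor with Σ" cancels for the same reason. The prior probabilities enter only through the cutoff c, not through the direction a.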
Then, given a new random vector X, the task is to identify which population it came from.

The solution will find M vectors $a_1, a_2, \ldots, a_M$. We will classify this as population j if $a_j'X$ is the biggest value.

If M = 2, we will classify as population 1 if and only if $a_1'X > a_2'X$. This is of course equivalent to $(a_1 - a_2)'X > 0$. The two-population discrimination problem is commonly described in terms of a single vector.

Let's illustrate the linear discrimination function with the file on the baby weights, LOWBWT.MTP. The columns of this sheet are

ID, LOW, AGE, LWT, RACE, SMOKE, PTL, HT, UI, FTV, BWT

The variable RACE is categorical with three levels, so we'll use Calc > Make Indicator Variables to break it into separate indicators.

Use Stat > Multivariate > Discriminant Analysis. The grouping variable will be LOW, which was coded as 1 = low birth weight and 0 = not low birth weight. If we take the defaults, here is what happens:

Discriminant Analysis: LOW versus AGE, LWT, ...

Linear Method for Response: LOW
Predictors: AGE, LWT, SMOKE, PTL, HT, UI, FTV, RACE1, RACE3

[Group counts and the Summary of classification table are not recoverable from the transcription.]

N = 189   N Correct = 128   Proportion Correct = 0.677

Squared Distance Between Groups
[values not recoverable from the transcription]
Linear Discriminant Function for Groups
[Coefficient table with rows Constant, AGE, LWT, SMOKE, PTL, HT, UI, FTV, RACE1, RACE3; values not recoverable from the transcription.]

We can start by noting that 130/189 = 68.78% are in the group LOW = 0. Thus a naive method, always guess LOW = 0, would be right 68.78% of the time. Observe that the self-classification gets 67.7% correct (which is terrible).

You have two discriminant functions here, and the method of classification is to choose the group that gets the higher value.

You can check off the cross-validation box. If you do, the classification for data row j is based on the discriminant function obtained from the other n − 1 points. In this case, it does slightly worse.

You can see from various displays that these two groups are very badly overlapped. Here's one:

[Figure: Boxplot of AGE by LOW]
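The cross-validation idea described above (classify row j using a rule fitted to the other n − 1 rows) can be sketched with a deliberately simple stand-in classifier. The sketch below is not Minitab's algorithm: it assumes one predictor, equal variances, and equal priors, in which case the linear rule reduces to "pick the closer group mean". The data values and function names are hypothetical.

```python
from statistics import mean

def nearest_mean_rule(x, mean0, mean1):
    """1-D equal-variance, equal-prior discriminant: pick the closer group mean."""
    return 0 if abs(x - mean0) <= abs(x - mean1) else 1

def group_means(pairs):
    """Sample means of x within group 0 and within group 1."""
    m0 = mean(x for x, y in pairs if y == 0)
    m1 = mean(x for x, y in pairs if y == 1)
    return m0, m1

def resubstitution_correct(xs, ys):
    """Self-classification: fit on all rows, then classify those same rows."""
    m0, m1 = group_means(list(zip(xs, ys)))
    return sum(nearest_mean_rule(x, m0, m1) == y for x, y in zip(xs, ys))

def leave_one_out_correct(xs, ys):
    """Cross-validation: classify row j with a rule fitted to the other rows."""
    pairs = list(zip(xs, ys))
    correct = 0
    for j, (x, y) in enumerate(pairs):
        rest = pairs[:j] + pairs[j + 1:]
        m0, m1 = group_means(rest)
        correct += nearest_mean_rule(x, m0, m1) == y
    return correct

# Hypothetical, heavily overlapping groups.
xs = [1.0, 2.0, 3.0, 4.0, 2.5, 3.4, 4.5, 6.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

print("self-classification correct:", resubstitution_correct(xs, ys))
print("cross-validated correct:    ", leave_one_out_correct(xs, ys))
```

Self-classification tends to flatter the method because each row helps to fit the rule that then classifies it; leaving the row out removes that advantage, which is why the cross-validated count is usually no better and often slightly worse.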
The Options box can help. Set this up as

[Screenshot of the Options dialog not preserved.]

This gets much better results:

Discriminant Analysis: LOW versus AGE, LWT, ...

Linear Method for Response: LOW
Predictors: AGE, LWT, SMOKE, PTL, HT, UI, FTV, Race1, Race2

[Group counts, prior probabilities, and the Summary of classification table are not recoverable from the transcription.]

N = 189   N Correct = 136

Squared Distance Between Groups
[values not recoverable from the transcription]
Linear Discriminant Function for Groups
[Coefficient table with rows Constant, AGE, LWT, SMOKE, PTL, HT, UI, FTV, Race1, Race2; values not recoverable from the transcription.]

You could apply logistic regression to the same set of data. Make the predictions based on the $\hat{p}_j$ values (which Minitab will compute for you). Use the cutoff 0.50 to make the groups. This will make 49 errors out of 189; the probability of correct prediction is 140/189 ≈ 74.07%, so it did slightly better!

Let's try this on the Easton data set (EASTON.mtp). The variables were these:

MONTH, PRICE, SIZE, BEDROOM, AGE, SUBD, AGENCY, Avon, Bellewood, Chelsea

Let's see what discriminates (aside from price) those homes sold by agents (AGENCY = 1) from those that were sold by the builder.

Use Stat > Multivariate > Discriminant Analysis. The grouping variable will be AGENCY. Use SIZE, BEDROOM, AGE. The results:

Discriminant Analysis: AGENCY versus SIZE, BEDROOM, AGE

Linear Method for Response: AGENCY
Predictors: SIZE, BEDROOM, AGE

[Group counts, the Summary of classification table, N, and Proportion Correct are not recoverable from the transcription.]

N Correct = 344
Squared Distance Between Groups
[values not recoverable from the transcription]

Linear Discriminant Function for Groups
[Coefficient table with rows Constant, SIZE, BEDROOM, AGE; values not recoverable from the transcription.]

In this problem, the homes not sold by agents were [percentage not recoverable from the transcription] of the data. Naive guessing should get this right at least that often.

If we set the prior probabilities to match this, we have

Discriminant Analysis: AGENCY versus SIZE, BEDROOM, AGE

Linear Method for Response: AGENCY
Predictors: SIZE, BEDROOM, AGE

[Group counts, prior probabilities, the Summary of classification table, N, and Proportion Correct are not recoverable from the transcription.]

N Correct = 469

Squared Distance Between Groups
[values not recoverable from the transcription]

It got this by placing all predictions in group 0.
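Why would strongly unequal priors push every prediction into group 0? A one-predictor sketch shows the mechanism. It assumes normal populations with a common variance, so the rule "classify as group 1 if a·x ≥ c" can be rewritten as a cutoff on x itself; the means, variance, priors, and data values below are hypothetical and are not taken from the Easton file.

```python
import math

def cutoff_on_x(mean0, mean1, variance, prior1):
    """Value of x at or above which a 1-D equal-variance rule picks group 1.

    Assumes mean1 > mean0, so that a > 0 and the inequality keeps its direction.
    """
    a = (mean1 - mean0) / variance
    c = (0.5 * (mean1 - mean0) * (mean1 + mean0) / variance
         - math.log(prior1 / (1.0 - prior1)))
    return c / a

# Two badly overlapping groups: means 0 and 1, common variance 4.
equal_prior_cutoff = cutoff_on_x(0.0, 1.0, 4.0, prior1=0.5)
skewed_prior_cutoff = cutoff_on_x(0.0, 1.0, 4.0, prior1=0.1)

# Hypothetical observations spanning the plausible range of both groups.
xs = [-3.0, -1.0, 0.0, 0.5, 1.0, 2.0, 4.0, 6.0]
predictions = [1 if x >= skewed_prior_cutoff else 0 for x in xs]

print("cutoff with equal priors: ", equal_prior_cutoff)
print("cutoff with prior1 = 0.1: ", skewed_prior_cutoff)
print("predictions:", predictions)
```

With equal priors the cutoff sits midway between the means. A small prior for group 1 moves the cutoff far out into the tail, and when the groups overlap heavily no observation reaches it, so every case is assigned to group 0.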
It might be interesting to try three groups. Let's see what distinguishes the three subdivisions.

Tally for Discrete Variables: SUBD
[Counts by SUBD not recoverable from the transcription.]

We had named these three subdivisions as Avon, Bellewood, and Chelsea. Observe that the proportions are [three values not recoverable from the transcription]. We can use these as prior probabilities.

Let's make the discrimination on the basis of Price, Size, Bedroom, Age. The values of Bedroom and Age are small integers, while the values of Price and Size are large numbers. Let's begin by standardizing. We can set this up with Calc > Standardize. (There should be an equivalent operation within Calc > Calculator, but there is not.)

Here's the result.

Discriminant Analysis: SUBD versus ZPrice, ZSize, ZBedroom, ZAge

Linear Method for Response: SUBD
Predictors: ZPrice, ZSize, ZBedroom, ZAge

[Counts and prior probabilities for groups 1, 2, 3, the Summary of classification table, N, and Proportion Correct are not recoverable from the transcription.]

N Correct = 408

Squared Distance Between Groups
[values not recoverable from the transcription]
Linear Discriminant Function for Groups
[Coefficient table with columns for groups 1, 2, 3 and rows Constant, ZPrice, ZSize, ZBedroom, ZAge; values not recoverable from the transcription.]

The discrimination is very strong on Price (favoring Avon) and very strong on Size (favoring Chelsea). The discrimination on the other variables is borderline. You can see this from the following summary list:

Descriptive Statistics: PRICE, SIZE, BEDROOM, AGE
[Table of N, Mean, and StDev by SUBD for each variable; values not recoverable from the transcription.]
You can see even better from this graph:

[Figure: Scatterplot of PRICE vs SIZE, grouped by SUBD]
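The standardization step used before the three-subdivision analysis can be sketched as follows. This assumes that standardizing means subtracting the column mean and dividing by the sample standard deviation; the function name and the data values are hypothetical stand-ins, not rows from the Easton file.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical columns on very different scales.
price = [150000.0, 200000.0, 250000.0, 300000.0]
bedroom = [2, 3, 3, 4]

z_price = standardize(price)
z_bedroom = standardize(bedroom)

print(z_price)
print(z_bedroom)
```

After this step every predictor has mean 0 and standard deviation 1, so the sizes of the discriminant coefficients can be compared across predictors without the measurement units getting in the way.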
Chapter 864 Confidence Intervals for the Odds Ratio in Logistic Regression with One Binary X Introduction Logistic regression expresses the relationship between a binary response variable and one or more
More informationSession 4 2:40 3:30. If neither the first nor second differences repeat, we need to try another
Linear Quadratics & Exponentials using Tables We can classify a table of values as belonging to a particular family of functions based on the math operations found on any calculator. First differences
More information(4) 1. Create dummy variables for Town. Name these dummy variables A and B. These 0,1 variables now indicate the location of the house.
Exam 3 Resource Economics 312 Introductory Econometrics Please complete all questions on this exam. The data in the spreadsheet: Exam 3- Home Prices.xls are to be used for all analyses. These data are
More informationClassification Methods II: Linear and Quadratic Discrimminant Analysis
Classification Methods II: Linear and Quadratic Discrimminant Analysis Rebecca C. Steorts, Duke University STA 325, Chapter 4 ISL Agenda Linear Discrimminant Analysis (LDA) Classification Recall that linear
More informationIntroduction. Chapter 1
Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics
More informationFeature Engineering, Model Evaluations
Feature Engineering, Model Evaluations Giri Iyengar Cornell University gi43@cornell.edu Feb 5, 2018 Giri Iyengar (Cornell Tech) Feature Engineering Feb 5, 2018 1 / 35 Overview 1 ETL 2 Feature Engineering
More informationc 4, < y 2, 1 0, otherwise,
Fundamentals of Big Data Analytics Univ.-Prof. Dr. rer. nat. Rudolf Mathar Problem. Probability theory: The outcome of an experiment is described by three events A, B and C. The probabilities Pr(A) =,
More informationStat 101 Exam 1 Important Formulas and Concepts 1
1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative
More informationChapter 3 Multiple Regression Complete Example
Department of Quantitative Methods & Information Systems ECON 504 Chapter 3 Multiple Regression Complete Example Spring 2013 Dr. Mohammad Zainal Review Goals After completing this lecture, you should be
More informationVOTE FOR YOUR FAVORITE SODA BRAND!!
VOTE FOR YOUR FAVORITE SODA BRAND!! NUMBER OF VOTES 1000 995 990 985 980 975 970 965 960 955 950 PEPSI COCA-COLA STORE BRAND FAVORITE SODA NUMBER OF VOTES 1000 900 800 700 600 500 400 300 200 100 0 PEPSI
More information