Multiple Linear Regression Using Rank-Based Test of Asymptotic Free Distribution
Kuntoro
Department of Biostatistics and Population Study, Airlangga University School of Public Health, Surabaya 60115, Indonesia

Abstract
An experimental design is the classical approach for demonstrating a causal relationship. In public health, including maternal and child health studies, it is often difficult to control experimental conditions properly, and there may also be ethical objections to conducting an experiment. A multiple regression approach, whose model involves a dependent variable and a number of independent variables, can be an alternative for examining a causal relationship in a non-experimental study. In a maternal and child health study that involves variables on ordinal scales, such as knowledge, attitude, and practice, an ordinary regression model is not the best choice for analyzing those variables. A rank-based, asymptotically distribution-free test is a better alternative. The Jaeckel-Hettmansperger-McKean statistic, HM, is used to demonstrate the effect of knowledge about safe water and attitude toward drinking unboiled water on the practice of drinking unboiled water. The data were obtained from a sample of mothers having children under five years of age in 14 districts in East Java Province, Indonesia. The results report the Hodges-Lehmann estimate of tau, the Jaeckel dispersion measure, and the HM statistic for testing the null hypothesis Beta1 = Beta2 = 0. Under the null hypothesis, the HM statistic has a sampling distribution that approximates the chi-square distribution. Since the result is less than the critical point of 5.99 (2 degrees of freedom, significance level 0.05), the null hypothesis fails to be rejected. That means no effect of knowledge and attitude on practice is demonstrated. It is concluded that the procedure is quite simple compared to the ordinary regression procedure and that no normality assumption is made. It is easy to use.
It is recommended to use the HM statistic in analyzing data obtained from public health studies as well as social studies.

Keywords: non-experimental study, ordinal scale, HM statistic

Collection of Presented Papers ICMA-MU 2007

1 Introduction

Over the years, multiple linear regression analysis has been used to analyze data collected from survey research (Fowler, Jr., 1984), in which a researcher wishes to demonstrate a causal relationship between the independent variables and the dependent variable. Unlike experimental research, survey research is weak at demonstrating causal relationships because it cannot ensure internal validity. Internal validity is believed to be a conditio sine qua non for demonstrating such a relationship. A researcher who conducts survey research cannot control the factors that affect internal validity, such as history, maturation, instrumentation, experimental mortality, testing, and regression artifacts, as well as the time ordering of events (Campbell, 1966; Nachmias, 1987).

When experimental conditions cannot be obtained, a researcher tends to use a regression model to demonstrate a causal relationship. It may be a linear or a nonlinear regression model, a simple or a multiple regression model. The researcher considers that the model can, to some extent, be used to connect the X variable as the independent variable and the Y variable as the dependent variable. A regression model as a statistical tool resembles an experimental model as a research methodology tool in that both connect the independent variable and the dependent variable (Jöreskog and Sörbom, 1988).

Today many researchers in the social sciences, economics, and the behavioral sciences use regression models to demonstrate causal relationships under non-experimental conditions. They use a quantitative approach to collect the data. Most of the data are on an ordinal scale, such as motivation, attitude, knowledge, practice, and performance. Hence, one of the classical assumptions of the ordinary regression model, the one concerning the scale of the data, is violated. Researchers who are not statisticians argue that a statistical method is just a tool to support their findings, whether or not its assumptions are violated. They consider that a statistical tool is not an objective of the research process. A statistician should explain to them that research results are most valuable when they are analyzed by means of an appropriate statistical method. Over the years statisticians have developed statistical methods that are expected to help researchers analyze their data properly.

This paper discusses the application of a regression model when the data do not have an interval or a ratio scale. The first section discusses the basic concept of nonparametric multiple linear regression. The second implements that statistical method on data collected from health research.

2 Basic Concept

The basic concepts to be discussed include the data to be used, the assumptions of the multiple regression model, the hypothesis to be formulated, the procedure for computing the statistic, and the case where ties exist.

2.1 Data

Suppose $\mathbf{x}' = (x_1, x_2, \ldots, x_p)$ is a row vector of $p$ independent variables, and $\mathbf{x}_1' = (x_{11}, x_{21}, \ldots, x_{p1}), \ldots, \mathbf{x}_n' = (x_{1n}, x_{2n}, \ldots, x_{pn})$ are $n$ fixed values of this vector. At each of the vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ the value of the single response random variable $Y$ is observed. Hence a set of observations $Y_1, Y_2, \ldots, Y_n$ is obtained, in which $Y_i$ is the value of the dependent variable when $\mathbf{x} = \mathbf{x}_i$.

2.2 Assumptions

First, the multiple regression model is represented by the following equation:

\[
Y_i = \xi + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \varepsilon_i = \xi + \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i, \tag{1}
\]

where $i = 1, 2, \ldots, n$; $\mathbf{x}_1' = (x_{11}, x_{21}, \ldots, x_{p1}), \ldots, \mathbf{x}_n' = (x_{1n}, x_{2n}, \ldots, x_{pn})$ are known constant vectors; $\xi$ is the unknown intercept parameter; and $\boldsymbol{\beta}' = [\beta_1\ \beta_2\ \ldots\ \beta_p]$ is a row vector of unknown parameters, usually referred to as the set of regression coefficients.

For simplicity, equation (1) can be written in matrix notation. Let $\mathbf{Y}' = [Y_1\ Y_2\ \ldots\ Y_n]$, $\boldsymbol{\xi}' = [\xi\ \xi\ \ldots\ \xi]$, and $\boldsymbol{\varepsilon}' = [\varepsilon_1\ \varepsilon_2\ \ldots\ \varepsilon_n]$, and set

\[
\mathbf{X} =
\begin{bmatrix}
x_{11} & x_{21} & \cdots & x_{p1} \\
x_{12} & x_{22} & \cdots & x_{p2} \\
\vdots & \vdots & & \vdots \\
x_{1,n-1} & x_{2,n-1} & \cdots & x_{p,n-1} \\
x_{1n} & x_{2n} & \cdots & x_{pn}
\end{bmatrix}. \tag{2}
\]

Equation (1) can then be expressed in matrix notation as

\[
\mathbf{Y} = \boldsymbol{\xi} + \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}. \tag{3}
\]
Second, the random errors $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are a random sample from a continuous distribution that is symmetric about its median 0. The distribution has cumulative distribution function $F(\cdot)$ and probability density function $f(\cdot)$ satisfying the mild condition

\[
\int_{-\infty}^{\infty} f^2(t)\,dt < \infty.
\]

2.3 Hypothesis

In this regression model, the emphasis is on testing the null hypothesis that a specific subset $\boldsymbol{\beta}_q$ of the regression parameters $\boldsymbol{\beta}$ is equal to zero. Without loss of generality (because the ordering of the pairs $(x_1, \beta_1), (x_2, \beta_2), \ldots, (x_p, \beta_p)$ in equation (1) is arbitrary), this subset is taken to be the first $q$ components of $\boldsymbol{\beta}$, that is, $\boldsymbol{\beta}_q' = [\beta_1\ \beta_2\ \ldots\ \beta_q]$. Hence, the hypothesis to be tested is

\[
H_0: \left[\, \boldsymbol{\beta}_q = \mathbf{0};\ \boldsymbol{\beta}_{p-q}' = (\beta_{q+1}, \beta_{q+2}, \ldots, \beta_p) \text{ and } \xi \text{ unspecified} \,\right]. \tag{4}
\]

This null hypothesis states that the independent variables $x_1, x_2, \ldots, x_q$ do not play significant roles in determining the value of the dependent variable $Y$. (In many settings, the interest is in assessing the effect of all the independent variables simultaneously, which corresponds to taking $q = p$ in the null hypothesis (4).)

2.4 Procedure

The Jaeckel-Hettmansperger-McKean test statistic HM is computed in three steps.

The first step is to obtain an unrestricted estimator for the vector of regression parameters. Let $R_i(\boldsymbol{\beta})$ be the rank of $Y_i - \mathbf{x}_i'\boldsymbol{\beta}$ among $Y_1 - \mathbf{x}_1'\boldsymbol{\beta}, Y_2 - \mathbf{x}_2'\boldsymbol{\beta}, \ldots, Y_n - \mathbf{x}_n'\boldsymbol{\beta}$, as a function of $\boldsymbol{\beta}$, for $i = 1, 2, \ldots, n$. The unrestricted estimator of $\boldsymbol{\beta}$ is a special case of a class of estimators proposed by Jaeckel (1972): the estimate $\hat{\boldsymbol{\beta}}$ minimizes the measure of dispersion

\[
D_J(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = (12)^{1/2}(n+1)^{-1} \sum_{i=1}^{n} \left[ R_i(\boldsymbol{\beta}) - \tfrac{1}{2}(n+1) \right] (Y_i - \mathbf{x}_i'\boldsymbol{\beta}). \tag{5}
\]

In general, the estimator $\hat{\boldsymbol{\beta}}$ has no closed-form expression, and iterative computer methods are generally needed to obtain a numerical solution.
This can be done with the RREG command in the MINITAB program.

The second step repeats the first, except that the Jaeckel dispersion measure $D_J(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ is minimized under the condition that the null hypothesis is true, that is, $\boldsymbol{\beta}_q = \mathbf{0}$ with $\boldsymbol{\beta}_{p-q}$ unspecified. Let $\hat{\boldsymbol{\beta}}_0$ denote the value of $\boldsymbol{\beta}$ that minimizes $D_J(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ in equation (5) under the null constraint $\boldsymbol{\beta}_q = \mathbf{0}$. Once again, $\hat{\boldsymbol{\beta}}_0$ is not available in closed form; the RREG command in MINITAB is used to obtain its value. Let $D_J(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})$ and $D_J(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}_0)$ denote, respectively, the overall minimum and the minimum under the null constraint $\boldsymbol{\beta}_q = \mathbf{0}$ of the Jaeckel dispersion measure. Then set

\[
D_J^{*} = D_J(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}_0) - D_J(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}), \tag{6}
\]

where $D_J^{*}$ is the reduction in Jaeckel dispersion obtained by fitting the full model as opposed to the reduced model corresponding to the constraint $\boldsymbol{\beta}_q = \mathbf{0}$ of the null hypothesis (4).

The third step is to compute a consistent estimator of the parameter

\[
\tau = [12]^{-1/2} \left[ \int_{-\infty}^{\infty} f^2(t)\,dt \right]^{-1}. \tag{7}
\]

Once again, the RREG command in MINITAB yields this consistent estimator, $\hat{\tau}$, of $\tau$.
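The dispersion measure in equation (5) and the reduction in equation (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MINITAB RREG algorithm: the function names (`midranks`, `residuals`, `jaeckel_dispersion`), the toy data, and the candidate coefficient values are hypothetical, and the coefficients shown are not claimed to be the minimizers $\hat{\boldsymbol{\beta}}$ or $\hat{\boldsymbol{\beta}}_0$. Tied residuals receive average ranks.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def residuals(y, rows, beta):
    """Compute Y_i - x_i' beta for each observation (no intercept term)."""
    return [yi - sum(b * x for b, x in zip(beta, row)) for yi, row in zip(y, rows)]


def jaeckel_dispersion(res):
    """Equation (5): sqrt(12)/(n+1) * sum of (rank - (n+1)/2) * residual."""
    n = len(res)
    ranks = midranks(res)
    total = sum((r - (n + 1) / 2) * e for r, e in zip(ranks, res))
    return math.sqrt(12) / (n + 1) * total


# Hypothetical toy data with a single predictor (p = 1).
y = [1.0, 2.0, 4.0, 3.0]
rows = [[1.0], [2.0], [3.0], [4.0]]

d_reduced = jaeckel_dispersion(residuals(y, rows, [0.0]))  # slope fixed at 0
d_full = jaeckel_dispersion(residuals(y, rows, [0.8]))     # a candidate slope
reduction = d_reduced - d_full                             # analogue of equation (6)
```

Because the centered ranks sum to zero, adding a constant to every residual leaves the dispersion unchanged, which is why the intercept $\xi$ does not need to be fitted when minimizing $D_J$.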
Combining the results of the three steps, the Jaeckel-Hettmansperger-McKean test statistic HM is

\[
HM = \frac{2 D_J^{*}}{\hat{\tau}}, \tag{8}
\]

where $D_J^{*}$ is the reduction in dispersion defined in equation (6). If the null hypothesis (4) is true and $n$ tends to infinity, the HM statistic has an asymptotic chi-square distribution ($\chi^2$) with $q$ degrees of freedom, corresponding to the $q$ constraints placed on $\boldsymbol{\beta}$ under the null hypothesis. To test the null hypothesis

\[
H_0: \left[\, \boldsymbol{\beta}_q = \mathbf{0};\ \boldsymbol{\beta}_{p-q}' = (\beta_{q+1}, \beta_{q+2}, \ldots, \beta_p) \text{ and } \xi \text{ unspecified} \,\right]
\]

against the alternative hypothesis

\[
H_1: \left[\, \boldsymbol{\beta}_q \neq \mathbf{0};\ \boldsymbol{\beta}_{p-q}' = (\beta_{q+1}, \beta_{q+2}, \ldots, \beta_p) \text{ and } \xi \text{ unspecified} \,\right]
\]

at significance level $\alpha$:

\[
\text{Reject } H_0 \text{ if } HM \geq \chi^2_{q,\alpha}; \qquad \text{accept } H_0 \text{ if } HM < \chi^2_{q,\alpha}, \tag{9}
\]

where $\chi^2_{q,\alpha}$ is the upper $\alpha$ percentile point of the chi-square distribution with $q$ degrees of freedom. The value of $\chi^2_{q,\alpha}$ can be obtained from the statistical tables available in statistics textbooks.

Hettmansperger and McKean (1977) and McKean and Sheather (1991) caution that in applications with small to moderate sample sizes, the chi-square distribution is often too light-tailed. They suggest replacing the chi-square percentile $\chi^2_{q,\alpha}$ by

\[
q\,F_{q,\,n-p-1;\,\alpha},
\]

where $F_{q,\,n-p-1;\,\alpha}$ is the upper $\alpha$ percentile of the F distribution with $q$ numerator degrees of freedom and $n - p - 1$ denominator degrees of freedom.

Ties: when ties exist among $Y_1 - \mathbf{x}_1'\boldsymbol{\beta}, Y_2 - \mathbf{x}_2'\boldsymbol{\beta}, \ldots, Y_n - \mathbf{x}_n'\boldsymbol{\beta}$, use average ranks to break the ties in computing the minimum of $D_J(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$. Similarly, when ties exist among $Y_1 - \mathbf{x}_1'\boldsymbol{\beta}_0, Y_2 - \mathbf{x}_2'\boldsymbol{\beta}_0, \ldots, Y_n - \mathbf{x}_n'\boldsymbol{\beta}_0$, use average ranks to break the ties in computing the minimum of $D_J(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}_0)$.
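The test statistic in equation (8) and the decision rule in equation (9) can be written as a short Python sketch. The function names and the numerical inputs below are hypothetical and are not taken from the paper's MINITAB output. The closed-form critical point is valid only for 2 degrees of freedom, where the chi-square upper tail probability is exp(-x/2); for other degrees of freedom, or for the F-based critical point, a statistical table or library would be needed.

```python
import math


def hm_statistic(d_reduced, d_full, tau_hat):
    """Equation (8): HM = 2 * (reduction in dispersion) / tau_hat."""
    if tau_hat <= 0:
        raise ValueError("tau_hat must be positive")
    return 2.0 * (d_reduced - d_full) / tau_hat


def chi2_upper_point_df2(alpha):
    """Upper-alpha point of chi-square with 2 degrees of freedom.

    For 2 degrees of freedom P(X > x) = exp(-x / 2), so x = -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)


def reject_null(hm, critical_point):
    """Equation (9): reject H0 when HM is at least the critical point."""
    return hm >= critical_point


# Hypothetical dispersion values and tau estimate, for illustration only.
hm = hm_statistic(d_reduced=5.0, d_full=4.5, tau_hat=0.5)
critical = chi2_upper_point_df2(0.05)  # close to the tabulated 5.99
decision = reject_null(hm, critical)
```

Passing the critical point in as an argument keeps the decision rule usable with either the chi-square percentile or the suggested replacement based on the F distribution.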
3 Material and Method

3.1 Material

To show the computation of the Jaeckel-Hettmansperger-McKean test statistic HM, secondary data collected by Kuntoro (2001) are used in this paper. The data were collected from 2804 elementary school students living in 14 districts in East Java Province, Indonesia. The variables selected are knowledge about safe water, attitude toward drinking unboiled water, and practice of drinking unboiled water. The level of knowledge about safe water is scored 2 for good knowledge and 1 for bad knowledge. The level of attitude toward drinking unboiled water is scored 5 for strongly disagree, 4 for disagree, 3 for doubtful, 2 for agree, and 1 for strongly agree. The level of practice of drinking unboiled water is scored 3 for never, 2 for ever, and 1 for always.

The unit of analysis is the district. For each district, the score of each variable is the score of the level with the highest percentage. For example, in the district of Ponorogo the level of knowledge with the highest percentage is bad, so the score for knowledge is 1. The level of attitude with the highest percentage is strongly disagree, so the score for attitude is 5. The level of practice with the highest percentage is never, so the score for practice is 3.
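The scoring rule described above, where each district receives the score of its most frequent level, can be sketched as follows. The dictionary and function names are hypothetical. The 79.9% figure for bad knowledge in Ponorogo comes from Table 1; the complementary 20.1% for good knowledge is an assumption made only so the example has two categories to compare.

```python
# Score assigned to each level, as stated in the text.
KNOWLEDGE_SCORES = {"Good": 2, "Bad": 1}
ATTITUDE_SCORES = {
    "Strongly Disagree": 5,
    "Disagree": 4,
    "Doubtful": 3,
    "Agree": 2,
    "Strongly Agree": 1,
}
PRACTICE_SCORES = {"Never": 3, "Ever": 2, "Always": 1}


def modal_score(percentages, scores):
    """Return the score of the level with the highest percentage.

    If two levels tie for the highest percentage, the first one listed wins.
    """
    level = max(percentages, key=percentages.get)
    return scores[level]


# Ponorogo, knowledge: 79.9% bad (Table 1); 20.1% good is an assumed complement.
ponorogo_knowledge = modal_score({"Good": 20.1, "Bad": 79.9}, KNOWLEDGE_SCORES)
```

Reducing each district to its modal level discards the rest of the distribution, so districts with very different percentages can end up with the same score.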
The following table shows, for each district, the level with the highest percentage for knowledge, attitude, and practice, together with its score.

Table 1. The Highest Percentage of Level of Knowledge, Attitude, and Practice

              Knowledge About        Attitude Upon Drinking          Practice of Drinking
              Safe Water             Unboiled Water                  Unboiled Water
District      %     Level (Score)    %     Level (Score)             %     Level (Score)
Ponorogo      79.9  Bad (1)          69.4  Strongly Disagree (5)     64.9  Never (3)
Blitar        80.0  Good (2)         50.0  Strongly Disagree (5)     64.5  Never (3)
Kediri        81.3  Bad (1)          46.1  Disagree (4)              53.5  Never (3)
Malang        65.6  Good (2)         42.5  Disagree (4)              59.0  Never (3)
Lumajang      61.7  Good (2)         48.2  Disagree (4)              54.8  Ever (2)
Jember        74.0  Good (2)         53.4  Strongly Disagree (5)     52.5  Never (3)
Bondowoso     69.1  Good (2)         49.7  Strongly Disagree (5)     51.7  Ever (2)
Probolinggo   65.3  Good (2)         73.7  Disagree (4)              50.7  Never (3)
Mojokerto     74.1  Good (2)         55.7  Disagree (4)              58.6  Never (3)
Bojonegoro    0.2   Good (2)         48.6  Disagree (4)              63.6  Never (3)
Tuban         52.0  Bad (1)          51.0  Disagree (4)              63.3  Ever (2)
Lamongan      64.0  Good (2)         49.6  Strongly Disagree (5)     58.8  Ever (2)
Sampang       50.9  Bad (1)          45.7  Disagree (4)              49.1  Ever (2)
Sumenep       68.6  Bad (1)          37.9  Agree (2)                 42.4  Ever (2)
3.2 Method

Following the Secondary Data Analysis Method (Nachmias, 1987), the scores of the three variables are analyzed with the MINITAB program in order to compute the HM statistic.

First: enter the scores of the variables knowledge (Knowl), attitude (Attit), and practice (Pract) into the MINITAB worksheet, with the columns

    Row  Knowl  Attit  Pract

Second: create the matrices M1, M2, and M3 that state null hypothesis 1, null hypothesis 2, and null hypothesis 3, respectively.

Null hypothesis 1: H_01: [β1 = β2 = 0; ξ unspecified]

    MTB > READ C4-C5
    DATA> 1 0
    DATA> 0 1
    DATA> END
    2 rows read.
    MTB > COPY C4-C5 M1
    MTB > PRINT M1

    Data Display
    Matrix M1
    1 0
    0 1

Then M1 = [ 1 0 ; 0 1 ].

Null hypothesis 2: H_02: [β1 = 0; ξ unspecified]

    MTB > READ C6-C7
    DATA> 1 0
    DATA> END
    1 rows read.
    MTB > COPY C6-C7 M2
    MTB > PRINT M2

    Data Display
    Matrix M2
    1 0

Then M2 = [ 1 0 ].

Null hypothesis 3: H_03: [β2 = 0; ξ unspecified]

    MTB > READ C8-C9
    DATA> 0 1
    DATA> END
    1 rows read.
    MTB > COPY C8-C9 M3
    MTB > PRINT M3

    Data Display
    Matrix M3
    0 1

Then M3 = [ 0 1 ].

Third: run the Rank Regression (RREG) command to obtain the values needed to compute the Jaeckel dispersion measure and the HM statistic, and to obtain the rank regression equation.

To test null hypothesis 1:

    SUBC> HYPOTHESIS M1.

To test null hypothesis 2:

    SUBC> HYPOTHESIS M2.

To test null hypothesis 3:

    SUBC> HYPOTHESIS M3.

4 Result and Discussion

4.1 To test the null hypothesis: β1 = β2 = 0

The null hypothesis states that the independent variables knowledge about safe water and attitude toward drinking unboiled water do not affect the dependent variable practice of drinking unboiled water.

    SUBC> HYPOTHESIS M1.

The MINITAB printout reports the regression equation for Pract on Attit and Knowl; the rank and least-squares coefficients for the predictors Constant, Attit, and Knowl; the Hodges-Lehmann estimate of tau; the least-squares S; the ANOVA for hypothesis matrix M1 (dispersion of the reduced model and the full model, DF, F, denominator DF, and approximate F, for both the rank and the least-squares fits); and a list of unusual observations (Observation, Attit, Pract, Pseudo, Fit, SE Fit, Residual), in which X denotes an observation whose X value gives it large influence.

The reduction in Jaeckel dispersion is then computed as

\[
D_J^{*} = D_J(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}_0) - D_J(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = 5.\ldots - \ldots = 0.\ldots
\]

With $q = 2$ degrees of freedom and $\hat{\tau} = 0.5329$,

\[
HM = \frac{2 D_J^{*}}{\hat{\tau}} = \frac{2 \times 0.\ldots}{0.5329} = 0.\ldots
\]

The result is then compared with the critical point in the chi-square table. At significance level α = 0.05 with 2 degrees of freedom, the critical point is 5.99. Since the HM statistic is less than 5.99, the null hypothesis β1 = β2 = 0 is accepted. Hence, it can be concluded that knowledge about safe water and attitude toward drinking unboiled water jointly do not affect the practice of drinking unboiled water.
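The role of the hypothesis matrices M1, M2, and M3 can be illustrated with a small Python sketch. It assumes, as the paper's statement of the three null hypotheses suggests, that each row of a matrix selects one coefficient to be set to zero and that the columns are ordered as (β1, β2) = (Knowl, Attit). The function names are hypothetical, and the sketch does not reproduce MINITAB's internal handling of the HYPOTHESIS subcommand.

```python
# Assumed column order: beta_1 = Knowl, beta_2 = Attit.
COEFFICIENTS = ["Knowl", "Attit"]

M1 = [[1, 0], [0, 1]]  # null hypothesis 1: beta_1 = beta_2 = 0
M2 = [[1, 0]]          # null hypothesis 2: beta_1 = 0
M3 = [[0, 1]]          # null hypothesis 3: beta_2 = 0


def constrained_coefficients(matrix, names=COEFFICIENTS):
    """List the coefficients that at least one row of the matrix sets to zero."""
    return [
        name
        for j, name in enumerate(names)
        if any(row[j] != 0 for row in matrix)
    ]


def degrees_of_freedom(matrix):
    """Number of constraints q, taken as the number of rows of the matrix."""
    return len(matrix)


for label, matrix in (("M1", M1), ("M2", M2), ("M3", M3)):
    print(label, constrained_coefficients(matrix), degrees_of_freedom(matrix))
```

Under this reading, M1 gives q = 2, which matches the 2 degrees of freedom used for the critical point 5.99, while M2 and M3 each give q = 1.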
4.2 To test the null hypothesis: β1 = 0

The null hypothesis states that the independent variable knowledge about safe water does not affect the practice of drinking unboiled water.

    SUBC> HYPOTHESIS M2.

The MINITAB printout reports the regression equation for Pract on Knowl and Attit; the rank and least-squares coefficients for the predictors Constant, Knowl, and Attit; the Hodges-Lehmann estimate of tau; the least-squares S; the ANOVA for hypothesis matrix M2 (dispersion of the reduced model and the full model, DF, F, denominator DF, and approximate F, for both the rank and the least-squares fits); and a list of unusual observations (Observation, Knowl, Pract, Pseudo, Fit, SE Fit, Residual), in which X denotes an observation whose X value gives it large influence.

4.3 To test the null hypothesis: β2 = 0

The null hypothesis states that the independent variable attitude toward drinking unboiled water does not affect the practice of drinking unboiled water.

    SUBC> HYPOTHESIS M3.

The MINITAB printout reports the regression equation for PRAKT (practice) on Knowl and Attit; the rank and least-squares coefficients for the predictors Constant, Knowl, and Attit; the Hodges-Lehmann estimate of tau; the least-squares S; the ANOVA for hypothesis matrix M3 (dispersion of the reduced model and the full model, DF, F, denominator DF, and approximate F, for both the rank and the least-squares fits); and a list of unusual observations (Observation, Knowl, Pract, Pseudo, Fit, SE Fit, Residual), in which X denotes an observation whose X value gives it large influence.

For null hypotheses 2 and 3, the dispersion measures of the full model and the reduced model are similar, as are the values of τ̂ and the HM statistic. Both tests give the same conclusion: the independent variable knowledge about safe water does not affect the dependent variable practice of drinking unboiled water, and the independent variable attitude toward drinking unboiled water does not affect it either.

Like the parametric multiple regression model, the rank regression model also requires that there be no collinearity among the independent variables. MINITAB drops an independent variable that is highly correlated with another independent variable, in which case there is no hypothesis to be tested. Before running the RREG command in MINITAB, collinearity among the independent variables can be detected by computing a correlation coefficient suited to ordinal scales, such as the Spearman rank correlation coefficient.

5 Conclusion and Recommendation

It is concluded that knowledge about safe water and attitude toward drinking unboiled water jointly do not affect the practice of drinking unboiled water. Neither independent variable individually affects the practice of drinking unboiled water. The procedure is quite simple compared to the ordinary regression procedure. The assumption made is that there is no collinearity among the independent variables. It is easy to use. It is recommended to use the HM statistic in analyzing ordinal-scale data obtained from public health studies as well as social studies.

References

[1] Campbell, D.T., and Stanley, J.C. (1966). Experimental and Quasi-Experimental Designs for Research. Rand McNally College Publishing Company, Chicago.
[2] Fowler, Jr., F.J. (1984). Survey Research Methods. Sage Publications, Beverly Hills.
[3] Hollander, M., and Wolfe, D.A. (1999). Nonparametric Statistical Methods. John Wiley & Sons, Inc., New York.
[4] Jöreskog, K.G., and Sörbom, D. (1988). LISREL 7: A Guide to the Program and Applications, 2nd ed. SPSS, Inc., Chicago.
[5] Kuntoro, Sulisyorini, L., Mahmudah, Soenarnatalina, Puspitasari, N., Indawati, R., Qomaruddin, M.B., and Wibowo, A. (2001). Baseline Survey About Knowledge, Practice of Hygiene and Sanitation in East Java. Cooperation Between Airlangga University and Regional Development Planning Board of East Java Province, Surabaya.
[6] Nachmias, D., and Nachmias, C. (1987). Research Methods in the Social Sciences. St. Martin's Press, New York.
Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association
More informationTest 3 Practice Test A. NOTE: Ignore Q10 (not covered)
Test 3 Practice Test A NOTE: Ignore Q10 (not covered) MA 180/418 Midterm Test 3, Version A Fall 2010 Student Name (PRINT):............................................. Student Signature:...................................................
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science
UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator
More informationInferences for Regression
Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In
More informationPart 1.) We know that the probability of any specific x only given p ij = p i p j is just multinomial(n, p) where p k1 k 2
Problem.) I will break this into two parts: () Proving w (m) = p( x (m) X i = x i, X j = x j, p ij = p i p j ). In other words, the probability of a specific table in T x given the row and column counts
More informationDisadvantages of using many pooled t procedures. The sampling distribution of the sample means. The variability between the sample means
Stat 529 (Winter 2011) Analysis of Variance (ANOVA) Reading: Sections 5.1 5.3. Introduction and notation Birthweight example Disadvantages of using many pooled t procedures The analysis of variance procedure
More informationTwo-Sample Inferential Statistics
The t Test for Two Independent Samples 1 Two-Sample Inferential Statistics In an experiment there are two or more conditions One condition is often called the control condition in which the treatment is
More informationInference. ME104: Linear Regression Analysis Kenneth Benoit. August 15, August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58
Inference ME104: Linear Regression Analysis Kenneth Benoit August 15, 2012 August 15, 2012 Lecture 3 Multiple linear regression 1 1 / 58 Stata output resvisited. reg votes1st spend_total incumb minister
More informationLecture Slides. Elementary Statistics. by Mario F. Triola. and the Triola Statistics Series
Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 13 Nonparametric Statistics 13-1 Overview 13-2 Sign Test 13-3 Wilcoxon Signed-Ranks
More informationESP 178 Applied Research Methods. 2/23: Quantitative Analysis
ESP 178 Applied Research Methods 2/23: Quantitative Analysis Data Preparation Data coding create codebook that defines each variable, its response scale, how it was coded Data entry for mail surveys and
More informationLecture Slides. Section 13-1 Overview. Elementary Statistics Tenth Edition. Chapter 13 Nonparametric Statistics. by Mario F.
Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 13 Nonparametric Statistics 13-1 Overview 13-2 Sign Test 13-3 Wilcoxon Signed-Ranks
More informationThe simple linear regression model discussed in Chapter 13 was written as
1519T_c14 03/27/2006 07:28 AM Page 614 Chapter Jose Luis Pelaez Inc/Blend Images/Getty Images, Inc./Getty Images, Inc. 14 Multiple Regression 14.1 Multiple Regression Analysis 14.2 Assumptions of the Multiple
More informationRobust Outcome Analysis for Observational Studies Designed Using Propensity Score Matching
The work of Kosten and McKean was partially supported by NIAAA Grant 1R21AA017906-01A1 Robust Outcome Analysis for Observational Studies Designed Using Propensity Score Matching Bradley E. Huitema Western
More informationSTATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002
Time allowed: 3 HOURS. STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002 This is an open book exam: all course notes and the text are allowed, and you are expected to use your own calculator.
More informationSection 4.6 Simple Linear Regression
Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval
More informationModel Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)
Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV
More informationHYPOTHESIS TESTING II TESTS ON MEANS. Sorana D. Bolboacă
HYPOTHESIS TESTING II TESTS ON MEANS Sorana D. Bolboacă OBJECTIVES Significance value vs p value Parametric vs non parametric tests Tests on means: 1 Dec 14 2 SIGNIFICANCE LEVEL VS. p VALUE Materials and
More informationMultiple Regression Examples
Multiple Regression Examples Example: Tree data. we have seen that a simple linear regression of usable volume on diameter at chest height is not suitable, but that a quadratic model y = β 0 + β 1 x +
More informationThis document contains 3 sets of practice problems.
P RACTICE PROBLEMS This document contains 3 sets of practice problems. Correlation: 3 problems Regression: 4 problems ANOVA: 8 problems You should print a copy of these practice problems and bring them
More informationCRP 272 Introduction To Regression Analysis
CRP 272 Introduction To Regression Analysis 30 Relationships Among Two Variables: Interpretations One variable is used to explain another variable X Variable Independent Variable Explaining Variable Exogenous
More informationHYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC
1 HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC 7 steps of Hypothesis Testing 1. State the hypotheses 2. Identify level of significant 3. Identify the critical values 4. Calculate test statistics 5. Compare
More informationBasic Business Statistics, 10/e
Chapter 1 1-1 Basic Business Statistics 11 th Edition Chapter 1 Chi-Square Tests and Nonparametric Tests Basic Business Statistics, 11e 009 Prentice-Hall, Inc. Chap 1-1 Learning Objectives In this chapter,
More informationMultiple Regression. More Hypothesis Testing. More Hypothesis Testing The big question: What we really want to know: What we actually know: We know:
Multiple Regression Ψ320 Ainsworth More Hypothesis Testing What we really want to know: Is the relationship in the population we have selected between X & Y strong enough that we can use the relationship
More informationCorrelation and simple linear regression S5
Basic medical statistics for clinical and eperimental research Correlation and simple linear regression S5 Katarzyna Jóźwiak k.jozwiak@nki.nl November 15, 2017 1/41 Introduction Eample: Brain size and
More informationLecture 14. Analysis of Variance * Correlation and Regression. The McGraw-Hill Companies, Inc., 2000
Lecture 14 Analysis of Variance * Correlation and Regression Outline Analysis of Variance (ANOVA) 11-1 Introduction 11-2 Scatter Plots 11-3 Correlation 11-4 Regression Outline 11-5 Coefficient of Determination
More informationLecture 14. Outline. Outline. Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA)
Outline Lecture 14 Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA) 11-1 Introduction 11- Scatter Plots 11-3 Correlation 11-4 Regression Outline 11-5 Coefficient of Determination
More informationChap The McGraw-Hill Companies, Inc. All rights reserved.
11 pter11 Chap Analysis of Variance Overview of ANOVA Multiple Comparisons Tests for Homogeneity of Variances Two-Factor ANOVA Without Replication General Linear Model Experimental Design: An Overview
More informationTHE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE
THE ROYAL STATISTICAL SOCIETY 004 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER II STATISTICAL METHODS The Society provides these solutions to assist candidates preparing for the examinations in future
More informationNonparametric Statistics
Nonparametric Statistics Nonparametric or Distribution-free statistics: used when data are ordinal (i.e., rankings) used when ratio/interval data are not normally distributed (data are converted to ranks)
More informationInferential Statistics
Inferential Statistics Eva Riccomagno, Maria Piera Rogantin DIMA Università di Genova riccomagno@dima.unige.it rogantin@dima.unige.it Part G Distribution free hypothesis tests 1. Classical and distribution-free
More informationSpearman Rho Correlation
Spearman Rho Correlation Learning Objectives After studying this Chapter, you should be able to: know when to use Spearman rho, Calculate Spearman rho coefficient, Interpret the correlation coefficient,
More informationInter-Rater Agreement
Engineering Statistics (EGC 630) Dec., 008 http://core.ecu.edu/psyc/wuenschk/spss.htm Degree of agreement/disagreement among raters Inter-Rater Agreement Psychologists commonly measure various characteristics
More informationInference with Simple Regression
1 Introduction Inference with Simple Regression Alan B. Gelder 06E:071, The University of Iowa 1 Moving to infinite means: In this course we have seen one-mean problems, twomean problems, and problems
More informationRegression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.
Regression Analysis BUS 735: Business Decision Making and Research 1 Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate
More informationRegression. Marc H. Mehlman University of New Haven
Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and
More informationDr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46
BIO5312 Biostatistics Lecture 10:Regression and Correlation Methods Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/1/2016 1/46 Outline In this lecture, we will discuss topics
More informationNonparametric statistic methods. Waraphon Phimpraphai DVM, PhD Department of Veterinary Public Health
Nonparametric statistic methods Waraphon Phimpraphai DVM, PhD Department of Veterinary Public Health Measurement What are the 4 levels of measurement discussed? 1. Nominal or Classificatory Scale Gender,
More informationExam details. Final Review Session. Things to Review
Exam details Final Review Session Short answer, similar to book problems Formulae and tables will be given You CAN use a calculator Date and Time: Dec. 7, 006, 1-1:30 pm Location: Osborne Centre, Unit
More informationNeuendorf MANOVA /MANCOVA. Model: X1 (Factor A) X2 (Factor B) X1 x X2 (Interaction) Y4. Like ANOVA/ANCOVA:
1 Neuendorf MANOVA /MANCOVA Model: X1 (Factor A) X2 (Factor B) X1 x X2 (Interaction) Y1 Y2 Y3 Y4 Like ANOVA/ANCOVA: 1. Assumes equal variance (equal covariance matrices) across cells (groups defined by
More information9 Correlation and Regression
9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the
More informationWeek 14 Comparing k(> 2) Populations
Week 14 Comparing k(> 2) Populations Week 14 Objectives Methods associated with testing for the equality of k(> 2) means or proportions are presented. Post-testing concepts and analysis are introduced.
More informationUsing SPSS for One Way Analysis of Variance
Using SPSS for One Way Analysis of Variance This tutorial will show you how to use SPSS version 12 to perform a one-way, between- subjects analysis of variance and related post-hoc tests. This tutorial
More informationNon-parametric Hypothesis Testing
Non-parametric Hypothesis Testing Procedures Hypothesis Testing General Procedure for Hypothesis Tests 1. Identify the parameter of interest.. Formulate the null hypothesis, H 0. 3. Specify an appropriate
More informationCHI SQUARE ANALYSIS 8/18/2011 HYPOTHESIS TESTS SO FAR PARAMETRIC VS. NON-PARAMETRIC
CHI SQUARE ANALYSIS I N T R O D U C T I O N T O N O N - P A R A M E T R I C A N A L Y S E S HYPOTHESIS TESTS SO FAR We ve discussed One-sample t-test Dependent Sample t-tests Independent Samples t-tests
More informationEXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY
EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE IN STATISTICS, 2011 MODULE 4 : Linear models Time allowed: One and a half hours Candidates should answer THREE questions. Each question
More informationAssociation Between Variables Measured at the Interval-Ratio Level: Bivariate Correlation and Regression
Association Between Variables Measured at the Interval-Ratio Level: Bivariate Correlation and Regression Last couple of classes: Measures of Association: Phi, Cramer s V and Lambda (nominal level of measurement)
More informationChi Square Analysis M&M Statistics. Name Period Date
Chi Square Analysis M&M Statistics Name Period Date Have you ever wondered why the package of M&Ms you just bought never seems to have enough of your favorite color? Or, why is it that you always seem
More informationWhat is a Hypothesis?
What is a Hypothesis? A hypothesis is a claim (assumption) about a population parameter: population mean Example: The mean monthly cell phone bill in this city is μ = $42 population proportion Example:
More informationNon-parametric tests, part A:
Two types of statistical test: Non-parametric tests, part A: Parametric tests: Based on assumption that the data have certain characteristics or "parameters": Results are only valid if (a) the data are
More informationINTRODUCTION TO ANALYSIS OF VARIANCE
CHAPTER 22 INTRODUCTION TO ANALYSIS OF VARIANCE Chapter 18 on inferences about population means illustrated two hypothesis testing situations: for one population mean and for the difference between two
More informationEmpirical Power of Four Statistical Tests in One Way Layout
International Mathematical Forum, Vol. 9, 2014, no. 28, 1347-1356 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/imf.2014.47128 Empirical Power of Four Statistical Tests in One Way Layout Lorenzo
More informationChi-Square. Heibatollah Baghi, and Mastee Badii
1 Chi-Square Heibatollah Baghi, and Mastee Badii Different Scales, Different Measures of Association Scale of Both Variables Nominal Scale Measures of Association Pearson Chi-Square: χ 2 Ordinal Scale
More informationContents Kruskal-Wallis Test Friedman s Two-way Analysis of Variance by Ranks... 47
Contents 1 Non-parametric Tests 3 1.1 Introduction....................................... 3 1.2 Advantages of Non-parametric Tests......................... 4 1.3 Disadvantages of Non-parametric Tests........................
More informationFinal Exam. Name: Solution:
Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.
More information