LOGISTIC REGRESSION. VINAYANAND KANDALA M.Sc. (Agricultural Statistics), Roll No I.A.S.R.I, Library Avenue, New Delhi


LOGISTIC REGRESSION

VINAYANAND KANDALA
M.Sc. (Agricultural Statistics), Roll No. 444
I.A.S.R.I., Library Avenue, New Delhi-110012
Chairperson: Dr. Ranjana Agarwal

Abstract: Logistic regression is widely used when the response variable is qualitative. In this talk, the properties of the logistic regression model and its estimation procedure are outlined. To start with, a binary response variable is considered, and the application of logistic regression models for forecasting purposes is discussed through two case studies. Various coefficients of determination under logistic regression are also dealt with. Finally, the multinomial logistic regression model is stated along with an illustration.

Key words: Logistic Regression, Qualitative Response, Dichotomous, Polytomous, $R^2$-Type Statistic.

1. Introduction

Regression analysis is a method for investigating functional relationships among variables. The relationship is expressed in the form of an equation or a model connecting the response (dependent) variable and one or more explanatory (predictor) variables. When the response variable is quantitative, the usual theory of multiple linear regression (MLR) analysis holds good. However, situations where the response variable is qualitative are also quite common and occur extensively in statistical applications. For example, to determine the risk factors for cancer in humans, data could be collected on several variables, such as age, sex, smoking, diet etc. The response variable here is dichotomous: either the person has cancer ($Y = 1$) or does not have cancer ($Y = 0$). In such cases, the usual MLR theory is not appropriate. Rather, the statistical model preferred for the analysis of such binary (dichotomous) responses is the binary logistic regression model, developed primarily by Cox (1958) and Walker and Duncan (1967). Thus logistic regression is a mathematical modeling approach that can be used to describe the relationship of several independent variables to (say) a binary (dichotomous) dependent variable.
Later on, models to deal with polytomous (multinomial) responses evolved.

2. Assumptions of Multiple Linear Regression Model

Let the response variable be denoted by $Y$ and the set of predictor variables by $X_1, X_2, \ldots, X_p$, where $p$ denotes the number of predictor variables. The true relationship between $Y$ and $(X_1, X_2, \ldots, X_p)$ can be approximated by the regression model

$$Y = f(X_1, X_2, \ldots, X_p) + \varepsilon \qquad (2.1)$$

An example is the linear regression model

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon \qquad (2.2)$$

Some assumptions are made about the model given in (2.2). They are:

2.1 Assumptions About the Form of the Model

The model that relates the response $Y$ to the predictors $X_1, X_2, \ldots, X_p$ is assumed to be linear in the regression parameters $\beta_0, \beta_1, \ldots, \beta_p$, namely,

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon.$$

This implies that the $i$th observation can be written as

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, 2, \ldots, n \qquad (2.3)$$

This is referred to as the linearity assumption. When the linearity assumption does not hold, transformation of the data can sometimes lead to linearity.

2.2 Assumptions About the Errors

The errors $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ in (2.3) are assumed to be independently and identically distributed (iid) normal random variables, each with mean zero and a common variance $\sigma^2$. That is:

The errors $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ have the same (but unknown) variance $\sigma^2$. This is the constant variance assumption, also known as the homogeneity or homoscedasticity assumption. When this assumption does not hold, the problem is called the heterogeneity or heteroscedasticity problem.

The errors $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are independent of each other (their pairwise covariances are zero). This is known as the independent-errors assumption.

2.3 Assumptions About the Predictors

The assumptions concerning the predictor variables are:

The predictor variables $X_1, X_2, \ldots, X_p$ are nonrandom, that is, the values $x_{1j}, x_{2j}, \ldots, x_{nj}$; $j = 1, 2, \ldots, p$ are assumed fixed or selected in advance.

The values $x_{1j}, x_{2j}, \ldots, x_{nj}$; $j = 1, 2, \ldots, p$ are measured without error.

2.4 Assumptions About the Observations

All observations are equally reliable and have an approximately equal role in determining the regression results and in influencing conclusions.

3. Violation of Assumptions of Linear Regression Model when Response is Qualitative

In order to explain the constraints in using a linear regression model when the response variable is qualitative, let us consider, for simplicity, the simple linear regression model with a single predictor variable and a binary response variable,

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$

i.e., the case where the outcome $Y_i$ is binary, taking on the value 0 or 1.
Let $\pi_i$ denote the probability that $Y_i = 1$ when $X_i = x$. If we use the standard linear model to describe $\pi_i$, then our model for the probability would be

$$\pi_i = P(Y_i = 1 \mid X_i = x) = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad (3.1)$$

Let us consider the constraints in using this model as a linear regression model:

Since $\pi_i$ is a probability, it must lie between 0 and 1. The linear function on the R.H.S. of (3.1) is unbounded and hence cannot be used to model probabilities.

Now since $E(\varepsilon_i) = 0$, the expected value of the response variable is

$$E(y_i) = 1 \cdot \pi_i + 0 \cdot (1 - \pi_i) = \pi_i.$$

If the response is binary, then the error terms $\varepsilon_i$ can take on only two values, namely,

$$\varepsilon_i = 1 - \pi_i \text{ when } y_i = 1, \qquad \varepsilon_i = -\pi_i \text{ when } y_i = 0.$$

Because the error is dichotomous, it cannot be even approximately normally distributed.

Second, the error variance is not constant, since

$$\sigma^2_{y_i} = E\{y_i - E(y_i)\}^2 = (1 - \pi_i)^2 \pi_i + (0 - \pi_i)^2 (1 - \pi_i) = \pi_i (1 - \pi_i).$$

From the above equation we can see that the variance is a function of the $\pi_i$'s. Therefore, the assumption of equal variance (homoscedasticity) does not hold.

4. Logistic Regression Model

The relationship between the probability $\pi$ and $X$ in (3.1) can often be represented by a logistic response function. It resembles an S-shaped curve, a sketch of which is given in Figure 4.1.

Figure 4.1: The Logistic Function ($\pi$ plotted against $x$)

The probability $\pi$ initially increases slowly with increase in $X$, then the increase accelerates, and finally it stabilizes, but it does not increase beyond 1. The shape of the S-curve given in Figure 4.1 can be reproduced if we model the probabilities as follows:

$$\pi = P(Y = 1 \mid X = x) = \frac{e^{z}}{1 + e^{z}} \qquad (4.1)$$

where $z = \beta_0 + \beta_1 x$, and $e$ is the base of the natural logarithm. The probabilities here are modeled by the distribution function (cumulative probability function) of the logistic distribution. The logistic model can be generalized directly to the situation where we have several predictor variables. The probability $\pi$ is modeled as

$$\pi = P(Y = 1 \mid X_1 = x_1, \ldots, X_p = x_p) = \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}} \qquad (4.2)$$

where $z = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$. The equation in (4.2) is called the logistic regression function. It is nonlinear in the parameters $\beta_0, \beta_1, \ldots, \beta_p$. Modeling the response probabilities by the logistic distribution and estimating the parameters of the model given in (4.2) constitutes fitting a logistic regression. The method of estimation used is maximum likelihood.

To explain the popularity of logistic regression, let us consider the mathematical form on which the logistic model is based. This function, called $f(z)$, is given by

$$f(z) = \frac{1}{1 + e^{-z}}, \quad -\infty < z < \infty.$$

As $z \to -\infty$, $f(z) \to 0$, and as $z \to \infty$, $f(z) \to 1$. Thus the range of $f(z)$ is 0 to 1. So the logistic model is popular because the logistic function, on which the model is based, provides:

Estimates that lie in the range between zero and one.

An appealing S-shaped description of the combined effect of several explanatory variables on the probability of an event (see Figure 4.1).

5. The Logit Model

The logistic regression model can be linearized by the logit transformation. Instead of working directly with $\pi$, a transformed value of $\pi$ is sometimes preferred. In logistic regression the fitting is carried out by working with the logit, as the logit transformation produces a model that is linear in the parameters.
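As a small numerical illustration of the logistic function $f(z)$ and of the logit transformation that undoes it, the following Python sketch may help. The function names `logistic` and `logit` are chosen here for illustration and are not taken from the original talk.

```python
import math


def logistic(z: float) -> float:
    """Logistic function f(z) = 1 / (1 + exp(-z)).

    The two branches are algebraically identical; splitting on the sign
    of z only avoids overflow in exp() for large negative z.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p: float) -> float:
    """Log odds log(p / (1 - p)) for 0 < p < 1; the inverse of logistic()."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


# Tabulate the S-shaped curve and check that logit() recovers z.
for z in (-6.0, -2.0, 0.0, 2.0, 6.0):
    p = logistic(z)
    print(f"z = {z:+.1f}   f(z) = {p:.4f}   logit(f(z)) = {logit(p):+.4f}")
```

The printed values stay strictly between 0 and 1 and increase with $z$, which is the S-shaped behaviour described above, while the last column returns the original $z$.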
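If $\pi$ is the probability of an event happening, the ratio $\pi / (1 - \pi)$ is called the odds (sometimes loosely termed the odds ratio) for the event. Since

$$1 - \pi = P(Y = 0 \mid X_1 = x_1, \ldots, X_p = x_p) = \frac{1}{1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p)},$$

it follows that

$$\frac{\pi}{1 - \pi} = \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p) \qquad (5.1)$$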

Taking the natural logarithm of both sides of (5.1), we obtain

$$g(x_1, \ldots, x_p) = \text{logit}\{Y = 1 \mid X\} = \log\frac{\pi}{1 - \pi} = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p \qquad (5.2)$$

which has the form of the usual multiple linear regression model. The logarithm of the odds is called the logit. It can be seen from (5.2) that the logit transformation produces a linear function of the parameters $\beta_0, \beta_1, \ldots, \beta_p$. While the range of the values of $\pi$ in (4.2) is between 0 and 1, the range of values of $\log(\pi / (1 - \pi))$ is between $-\infty$ and $\infty$, which makes the logits more appropriate for linear regression fitting.

6. Model Assumptions and Interpretation of Parameters

The logistic regression assumptions are most easily understood by transforming $\pi$ to make a model that is linear in $X\beta$:

$$\text{logit}\{Y = 1 \mid X\} = \text{logit}\,\pi = \log[\pi / (1 - \pi)] = X\beta \qquad (6.1)$$

where $\pi = \text{Prob}\{Y = 1 \mid X\}$. Thus the model is a regression model in the log odds that $Y = 1$, since $\text{logit}(\pi)$ is a weighted sum of the $X$'s. If all the effects are additive (i.e., no interactions are present), the model assumes that for predictor $X_j$,

$$\text{logit}\{Y = 1 \mid X\} = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p = \beta_j x_j + C \qquad (6.2)$$

where, if all other factors are held constant, $C$ is a constant given by

$$C = \beta_0 + \beta_1 x_1 + \cdots + \beta_{j-1} x_{j-1} + \beta_{j+1} x_{j+1} + \cdots + \beta_p x_p \qquad (6.3)$$

The parameter $\beta_j$ is then the change in the log odds per unit change in $X_j$, provided $X_j$ represents a single factor that is linear and does not interact with other factors, and all the other factors are held constant.

7. Maximum Likelihood Estimation

For simplicity, consider a simple binary logistic regression model. Let the binary response variable $Y_i$ take only two values, (say) 0 and 1. Since each $Y_i$ observation is an ordinary Bernoulli random variable, where

$$P(Y_i = 1) = \pi_i, \qquad P(Y_i = 0) = 1 - \pi_i,$$

we can represent its probability distribution function as follows:

$$f_i(Y_i) = \pi_i^{Y_i} (1 - \pi_i)^{1 - Y_i}, \quad Y_i = 0, 1; \; i = 1, \ldots, n \qquad (7.1)$$

Since the $Y_i$ observations are independent, their joint probability function is:

$$g(Y_1, \ldots, Y_n) = \prod_{i=1}^{n} f_i(Y_i) = \prod_{i=1}^{n} \pi_i^{Y_i} (1 - \pi_i)^{1 - Y_i}$$

Taking logarithms,

$$\log_e g(Y_1, \ldots, Y_n) = \log_e \prod_{i=1}^{n} \pi_i^{Y_i} (1 - \pi_i)^{1 - Y_i} = \sum_{i=1}^{n} \left[ Y_i \log_e\!\left( \frac{\pi_i}{1 - \pi_i} \right) \right] + \sum_{i=1}^{n} \log_e(1 - \pi_i) \qquad (7.2)$$

Since $E\{Y_i\} = \pi_i$ for a binary variable, it follows that

$$1 - \pi_i = [1 + \exp(\beta_0 + \beta_1 X_i)]^{-1}$$

We obtain

$$\log_e[\pi_i / (1 - \pi_i)] = \beta_0 + \beta_1 X_i$$

Hence (7.2) can be expressed as follows:

$$\log_e L(\beta_0, \beta_1) = \sum_{i=1}^{n} Y_i (\beta_0 + \beta_1 X_i) - \sum_{i=1}^{n} \log_e[1 + \exp(\beta_0 + \beta_1 X_i)],$$

where $L(\beta_0, \beta_1)$ replaces $g(Y_1, \ldots, Y_n)$ to show explicitly that the function can now be viewed as the likelihood function of the parameters to be estimated, given the sample observations.

The maximum likelihood estimates of $\beta_0$ and $\beta_1$ in the simple logistic regression model are those values of $\beta_0$ and $\beta_1$ that maximize the log-likelihood function in (7.2). No closed-form solution exists for the values of $\beta_0$ and $\beta_1$ in (7.2) that maximize the log-likelihood function. Computer-intensive numerical search procedures are therefore required to find the maximum likelihood estimates $b_0$ and $b_1$. Standard statistical software programs such as SAS (PROC LOGISTIC) and SPSS (Analyze-Regression-Binary Logistic) provide the maximum likelihood estimates $b_0$ and $b_1$ for logistic regression. Once the maximum likelihood estimates $b_0$ and $b_1$ are found, the fitted response function, say $\hat{\pi}_i$, can be obtained by substituting these values into the logistic response function (4.1). The fitted response function is as follows:

$$\hat{\pi} = \frac{\exp(b_0 + b_1 X)}{1 + \exp(b_0 + b_1 X)} \qquad (7.3)$$

Utilizing the logit transformation, the fitted response function in (7.3) can also be expressed in linear form on the logit scale.
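The numerical search mentioned above can be sketched in a few lines. The Python code below is only a minimal illustration, assuming a Newton-Raphson iteration (one common choice; the text does not say which search procedure SAS or SPSS uses) and a small made-up data set `TOY_X`, `TOY_Y` that does not come from any study cited here. All function names are hypothetical.

```python
import math


def fitted_probability(b0, b1, x):
    """Fitted response exp(b0 + b1*x) / (1 + exp(b0 + b1*x)), as in (7.3)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def log_likelihood(b0, b1, xs, ys):
    """log L(b0, b1) = sum y_i (b0 + b1 x_i) - sum log(1 + exp(b0 + b1 x_i))."""
    return sum(y * (b0 + b1 * x) - math.log1p(math.exp(b0 + b1 * x))
               for x, y in zip(xs, ys))


def fit_simple_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Newton-Raphson search for the values (b0, b1) maximising log L."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # score vector (gradient of log L)
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for x, y in zip(xs, ys):
            p = fitted_probability(b0, b1, x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system (information) * step = score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Made-up illustrative data (assumption): one predictor, binary response.
TOY_X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
TOY_Y = [0, 0, 1, 0, 1, 0, 1, 1]

b0_hat, b1_hat = fit_simple_logistic(TOY_X, TOY_Y)
print("b0 =", round(b0_hat, 4), " b1 =", round(b1_hat, 4))
print("log L at the estimates:", round(log_likelihood(b0_hat, b1_hat, TOY_X, TOY_Y), 4))
```

At the returned estimates the two score equations, $\sum (Y_i - \hat{\pi}_i) = 0$ and $\sum (Y_i - \hat{\pi}_i) X_i = 0$, hold up to rounding, which is one practical way to check that the search has converged. The iteration can fail when the data are perfectly separated, a case this sketch does not handle.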

On the logit scale, the fitted response function is

$$\hat{\pi}' = b_0 + b_1 X \qquad (7.4)$$

where $\hat{\pi}' = \log_e[\hat{\pi} / (1 - \hat{\pi})]$. We call (7.4) the fitted logit response function.

8. Applications

Example 8.1: Johnson et al. (1996) examined the relationship between weather and outbreaks of potato late blight in south-central Washington over a 25-year period using logistic regression analysis. The response variable was a year either with or without a late blight outbreak. The predictor variables used in the study, covering 1970 to 1994, include: the number of days with rain during April and May ($R_{am}$), the number of days with rain during July and August ($R_{ja}$), the precipitation during May when the daily minimum temperature was $\geq$ 5 °C ($P_m$), and the presence of a late blight outbreak during the preceding year ($Y_p$; 1 = yes, 0 = no). They developed two separate models, with model I depending on $Y_p$, $R_{am}$ and $R_{ja}$, and model II depending on $Y_p$, $R_{am}$ and $P_m$. The data set used is presented in Table 8.1.

Table 8.1
Year    $Y$    $R_{am}$ (days)    $R_{ja}$ (days)    $P_m$ (mm)

where
$Y$ = whether affected by potato late blight or not (1 = affected, 0 = unaffected)
$R_{am}$ = number of days with rain during April and May
$R_{ja}$ = number of days with rain during July and August
$P_m$ = precipitation during May when the daily minimum temperature was $\geq$ 5 °C.

The logistic regression models obtained were given by:

Model I: $P(Y = 1) = 1 / \{1 + \exp(\ldots\, R_{am} + .59\, R_{ja})\}$
Model II: $P(Y = 1) = 1 / \{1 + \exp(\ldots\, R_{am} - .69\, P_m)\}$

If $P(Y = 1) < 0.5$, then the year was classified as a nonoutbreak year; otherwise the year was classified as an outbreak year.

Table 8.2: Accuracy, sensitivity and specificity of the models with independent variables ($Y_p$, $R_{am}$, $R_{ja}$) and ($Y_p$, $R_{am}$, $P_m$)

where
$Y_p$ = the presence of a late blight outbreak during the preceding year
$R_{am}$ = number of days with rain during April and May
$R_{ja}$ = number of days with rain during July and August
$P_m$ = precipitation during May when the daily minimum temperature was $\geq$ 5 °C
Accuracy = percentage of years with or without a late blight outbreak classified correctly
Sensitivity = percentage of years with late blight outbreaks classified correctly
Specificity = percentage of years without late blight outbreaks classified correctly

The logistic function correctly classified 88% of the years as outbreak or nonoutbreak for both models. Sensitivity and specificity were high for both models. The misclassified years were 1978, 1988 and a year in the early 1990s for both models.

The values of the three variables in model II are available on 1 June. Using model II developed in this study, forecasts of late blight could therefore be made each year on 1 June. This is 4- weeks after planting and 4 days before late blight was observed in any year studied. If more criteria for the occurrence of a late blight outbreak were fulfilled before 1 June, a forecast for the potential occurrence of an outbreak could be made earlier. Model I could be used through July and August if late blight had not been observed earlier. The value of the third variable from

model I, i.e., $R_{ja}$, would not be available until 31 August, but the model could be used by solving for the value of $R_{ja}$ needed for an outbreak to occur and comparing it to the normal (mean for the 25-year period) and expected occurrences of rainy days during July and August based on weather forecasts. The calculations could be made repeatedly, and the forecast could be updated as weather and crop conditions changed during July and August.

These empirically based forecasts for the presence or absence of late blight would be general in nature, advising growers at the beginning of the growing season of the likelihood of an outbreak. Such a warning is important in Washington, where late blight is sporadic in occurrence and growers are not accustomed to applying fungicides early in the season. When a late blight outbreak is likely, growers can monitor individual fields more thoroughly and initiate fungicide sprays in areas with a history of early occurrence of late blight.

Example 8.2: Misra et al. (2004) used weather data from the Kakori and Malihabad mango (Mangifera indica L.) belt (Lucknow) of Uttar Pradesh to develop logistic regression models for forewarning powdery mildew caused by Oidium mangiferae Berthet, and validated the models using data of recent years. The forewarning system thus obtained forewarns satisfactorily, with the results comparing well with the observed year-wise responses. The status of the powdery mildew (its epidemic and spread) in the years studied and in 1997 is given in Table 8.3, with the occurrence of the epidemic denoted by 1 and 0 otherwise. The variables used were maximum temperature ($X_1$) and relative humidity ($X_2$). The model is given by

$$P(Y = 1) = \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)\}}$$

Table 8.3
Year    Third week ($Y$)    Average weather data in the II week of March: $X_1$, $X_2$

Logistic regression models were developed using the maximum likelihood estimation procedure. Consider the model based on the average weather data of the II week of March, from which the forewarning probability is obtained for the year 1997. The forewarning probability of occurrence of powdery mildew in mango for 1997 using logistic regression modeling is given in Table 8.4. The logistic regression model yielded good results. The parameter estimates obtained are:

$\hat{\beta}_0 = 7.36$, $\hat{\beta}_1 = -.843$, $\hat{\beta}_2 = -.94$

Then the model becomes

$$P(Y = 1) = 1 / \{1 + \exp(-(7.36 + (-.843\, x_1) + (-.94\, x_2)))\}$$

Plugging in the values $x_1 = 3.5$ and $x_2 = 68.9$ of the year 1997, it can be seen that $P(Y = 1) = .33$.

Table 8.4
Model developed upon the years    To forewarn for year    Observed epidemic status    Forewarning probabilities of epidemic using average weather data of March II week

If $P(Y = 1) < 0.5$, then the probability that the epidemic will occur is minimal; otherwise there is more chance of occurrence of the epidemic. This can be taken as an objective procedure for forewarning the disease. As it was known that there was no epidemic during the year 1997, it can be seen that the probability obtained from the logistic regression model forewarns the actual status correctly.

9. $R^2$-type statistics in logistic regression

The use of $R^2$ in linear regression is reasonably straightforward, although its use is not without criticism. As pointed out by Cox and Wermuth (1992), if we simply use the linear regression form of $R^2$ when the response is binary, we can obtain a value close to zero even when the model fits the data very well. Various alternative $R^2$-type statistics have been proposed, and these are reviewed and critiqued by Menard (2000). He compared the statistics relative to the eight properties of a good $R^2$ statistic proposed by Kvalseth (1985), and also compared them for an actual data set. Kvalseth (1985) felt that $R^2$ should: have an intuitively reasonable interpretation and have utility as a measure of goodness of fit,

be dimensionless, be well-defined with endpoints that correspond to perfect fit and complete lack of fit, weight positive and negative residuals equally, be comparable across different models, be compatible with statistics derived from other acceptable measures of fit, be applicable to any type of model, and not be confined to any particular modeling technique.

Menard (2000) gave desirable properties for an $R^2$ statistic in logistic regression, stating that it should be applicable to both dichotomous and polytomous models, be based on the quantity the model is trying to maximize or minimize, have a proportional change interpretation, and naturally vary between 0 and 1. Menard (2000) proposed the following $R^2$-type statistics for logistic regression:

1. The ordinary least squares $R^2$: $R_O^2 = 1 - \sum (y - \hat{y})^2 / \sum (y - \bar{y})^2$
2. The log likelihood ratio $R^2$: $R_L^2 = 1 - [\ln(L_M) / \ln(L_O)]$
3. The geometric mean squared improvement per observation $R^2$: $R_M^2 = 1 - (L_O / L_M)^{2/n}$
4. The adjusted geometric mean squared improvement $R^2$: $R_N^2 = [1 - (L_O / L_M)^{2/n}] / [1 - (L_O)^{2/n}]$
5. The contingency coefficient $R^2$: $R_C^2 = G_M / (G_M + n)$

where,
$n$ = total sample size
$L_O$ = the likelihood function for the model containing only the intercept
$L_M$ = the likelihood function for the model containing all of the predictors
$G_M = -2[\ln(L_O) - \ln(L_M)]$ = the model chi-square statistic
$\hat{y}$ = the predicted value of the dependent variable obtained from the model
$y$ = the observed value of the dependent variable, coded as a discrete integer value
$\bar{y}$ = the mean value of the dependent variable.
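To make the five definitions above concrete, here is a short Python sketch that evaluates them from the two log-likelihoods $\ln(L_O)$ and $\ln(L_M)$. The function names and the numbers in the usage lines are hypothetical and are not taken from Menard (2000); working with log-likelihoods rather than likelihoods is simply a numerical convenience, since $(L_O / L_M)^{2/n} = \exp\{(2/n)[\ln(L_O) - \ln(L_M)]\}$.

```python
import math


def r2_ols(y, y_hat):
    """Ordinary least squares R^2: 1 - sum (y - y_hat)^2 / sum (y - y_bar)^2."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def r2_likelihood_statistics(loglik_null, loglik_model, n):
    """R_L^2, R_M^2, R_N^2, R_C^2 and G_M from ln(L_O), ln(L_M) and n."""
    g_m = -2.0 * (loglik_null - loglik_model)           # model chi-square
    r2_l = 1.0 - loglik_model / loglik_null
    r2_m = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    r2_n = r2_m / (1.0 - math.exp(2.0 * loglik_null / n))
    r2_c = g_m / (g_m + n)
    return {"G_M": g_m, "R2_L": r2_l, "R2_M": r2_m, "R2_N": r2_n, "R2_C": r2_c}


# Hypothetical inputs (assumption): n = 100 cases with a 50% base rate, so the
# intercept-only log-likelihood is 100 * ln(0.5); the fitted model's
# log-likelihood of -45.0 is made up for illustration.
example = r2_likelihood_statistics(100 * math.log(0.5), -45.0, 100)
for name, value in example.items():
    print(f"{name} = {value:.4f}")
```

With these inputs every statistic lies between 0 and 1, and $R_N^2$ exceeds $R_M^2$ because it rescales $R_M^2$ by its maximum attainable value.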

Menard (2000) expresses a preference for the $R_L^2$ statistic, which has the above-mentioned desirable properties under the logistic regression setup. The data set used by Menard (2000) had multiple dependent variables, and the author used them individually so as to create seven logistic regression models, each of which utilized all of the available predictors. The data are taken from the National Youth Survey (NYS), a national household probability sample of individuals who were 11-17 years old in 1976. In this study, the author focuses on data from 1980, from 97 respondents who were 15-18 years old at the time of the interview. All data involve self-reports of attitudes, behavior of friends, one's own behavior, and sociodemographic characteristics. The dependent variables are different types of illicit drug use and other forms of illegal behavior, deliberately selected to range from low (approximately 1%) to high (approximately 50%) base rates, where the base rate is the proportion of cases for which $y = 1$ (or for $y = 0$, if that is the smaller of the two categories).

The predictors for all of the models are (a) gender (male or female); (b) ethnicity; (c) an index of how wrong the respondent believes it is to commit different illegal acts, coded as consecutive values, with high values indicating strong beliefs against violating the law; and (d) an index of how many of one's friends have committed different illegal acts, coded as consecutive integers, with high values indicating extensive illegal behavior among one's friends. Summary information for the models for the seven dichotomous dependent variables for the 15-18 year old sample is presented in Table 9.1.

Table 9.1: Classification tables and predictive efficiency
Dependent variable    Prediction tables (observed by predicted)    Predictive efficiency

Dependent variable    Base rate b    I       % correct
1                     .%             .       99.
2                     .%             .
3                     9.8%           -.      9
4                     3.3%           .74
5                     3.5%           .
6                     4.%            .46
7                     49.5%          .397    7.

Here,
$I$ = index of predictive efficiency = $1 - (n - \sum f_{ii}) / (n - n_{mode})$
$n$ = sample size
$n_{mode}$ = observed number of cases in the modal category of the dependent variable
$f_{ii}$ = number of cases for which the predicted value is equal to the observed value
$b$ = base rate = proportion of cases for which $y = 1$ (or for $y = 0$, if that is the smaller of the two categories)

Dependent variables: 1 = sexual assault, 2 = selling illicit hard drugs, 3 = felony assault, 4 = hard illicit drug use, 5 = serious non-drug offending, 6 = general illicit drug use, 7 = general offending.

Table 9.2: Coefficients of determination for models using different dependent variables.

Explained variation by dependent variable (base rate*): (.%), (.%), (9.8%), (3.3%), (3.5%), (4.%), (49.5%)
Rows: $R_O^2$, $R_L^2$, $R_M^2$, $R_N^2$, $R_C^2$
* The values in parentheses indicate the base rate.

All of the $R^2$ analogs reviewed in Table 9.2 could reasonably be used to compare models involving different predictors but the same dependent variable and the same sample. For some purposes there appears to be little reason for preferring one coefficient of determination over another.

Table 9.3: Correlations among base rate and coefficients of determination, r (n = 7)
Rows and columns: Base rate, $I$, $R_O^2$, $R_L^2$, $R_M^2$, $R_N^2$, $R_C^2$; final row: r with base rate

Hence, $R_L^2$ seems preferable to the other $R^2$ analogs in two respects:

$R_L^2$ has the most intuitively reasonable interpretation as a proportional reduction in error measure, parallel to $R_O^2$.

$R_L^2$ stands out for its independence from the base rate, relative to the other $R^2$ analogs, making it the most generally applicable and consistently useful of the $R^2$ analogs.

10. Polytomous (Multinomial) Logistic Regression Model

When the response variable has more than two levels (polytomous), logistic regression can be employed by means of a polytomous (multinomial) logistic regression model. An approximate way of carrying out polytomous regression analysis is to fit several individual binary logistic regression models. The estimates of the logistic regression coefficients so obtained are consistent estimates of the polytomous logistic regression parameters. For simplicity, let us consider the case where the response variable $Y$ has three levels, indexed (say) by $j = 1, 2, 3$ and coded so that level $j$ corresponds to $Y = j - 1$. Let $\pi_j(X)$ denote the classification probabilities $\Pr(Y = j - 1 \mid X)$ of the response $Y$, $j = 1, 2, 3$, at value $X^T = (X_1, X_2, \ldots, X_p)$ for a set of explanatory variables $X_1, X_2, \ldots, X_p$. The interest is centered on relating $\pi^T = (\pi_1(X), \pi_2(X), \pi_3(X))$ to the predictor $X$. Since the three response levels have a natural ordering, an ordinal logistic regression model is considered, and the logit models should utilize that ordering. Here, the proportional-odds model is used, which is described below. The ordered multiple response model assumes the relationship

$$\text{logit}[\Pr(Y \leq j - 1 \mid X)] = \alpha_j + \beta^T X, \quad j = 1, 2,$$

where $\alpha_1, \alpha_2$ are two intercept parameters and $\beta^T = (\beta_1, \ldots, \beta_p)$ is the slope parameter vector not including the intercept terms. By construction, $\alpha_1 < \alpha_2$ holds. The model fits a common-slopes cumulative model, that is, a parallel lines regression model based on the cumulative probabilities of the response levels. The multinomial logistic regression model for the three-level response case is given as follows:

$$\text{logit}(\pi_1) = \log\frac{\pi_1}{1 - \pi_1} = \alpha_1 + \beta_1 x_1 + \cdots + \beta_p x_p,$$
$$\text{logit}(\pi_1 + \pi_2) = \log\frac{\pi_1 + \pi_2}{1 - \pi_1 - \pi_2} = \alpha_2 + \beta_1 x_1 + \cdots + \beta_p x_p,$$

where

$$\pi_1(X) = \frac{\exp(\alpha_1 + \beta^T X)}{1 + \exp(\alpha_1 + \beta^T X)}, \qquad \pi_1(X) + \pi_2(X) = \frac{\exp(\alpha_2 + \beta^T X)}{1 + \exp(\alpha_2 + \beta^T X)},$$

and $\pi_1 + \pi_2 + \pi_3 = 1$.

The estimates of the $\alpha$'s and $\beta$'s are obtained by the method of maximum likelihood. Thus, the estimated probabilities are given as:

$$\hat{\pi}_1(X) = \frac{\exp(\hat{\alpha}_1 + \hat{\beta}^T X)}{1 + \exp(\hat{\alpha}_1 + \hat{\beta}^T X)}, \qquad \hat{\pi}_1(X) + \hat{\pi}_2(X) = \frac{\exp(\hat{\alpha}_2 + \hat{\beta}^T X)}{1 + \exp(\hat{\alpha}_2 + \hat{\beta}^T X)}.$$

For each $X$, $\pi_j(X) = \Pr(Y = j - 1 \mid X)$ can then be estimated by $\hat{\pi}_j$.

Example 10.1: Kim et al. collected data on fire ant mating flight activities daily over a 456-day period running from April 1 to June 30 of the following year. The data collection was set up with 3 response variables, viz. Worker, Winged Male and Winged Female. Each of these response variables had 3 levels: 0 for not active, 1 for moderately active, and 2 for very active. The observations on these variables were taken from the morning to 5 p.m., the necessary duration of a day that would cover all possible flight activities. They were taken at 3-minute intervals, usually 7 days a week, by observing several mounds near the Texas A&M campus, U.S.A. Only the nests that participated in mating flights were studied. On 59 days in the 456-day duration, winged male/female fire ant activities (1's and/or 2's) were observed. Table 10.1 reports the value of the combined variable (MF) of Male and Female, defined to be the maximum value of the male and female activities, since it is a good measure of overall mating flight activity. The resulting combined variable has the frequencies given in Table 10.1.
Hereinafter, the example concentrates on studying this combined characteristic only.
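Before turning to the fitted model, it may help to see how the two cumulative logits of the proportional-odds model translate into the three category probabilities. The Python sketch below is an illustration only: the function name `proportional_odds_probabilities` and the parameter values in the usage lines are hypothetical and are not the estimates of Kim et al.

```python
import math


def logistic(z):
    """exp(z) / (1 + exp(z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def proportional_odds_probabilities(alphas, beta, x):
    """Return (pi_1, pi_2, pi_3) for a three-level ordinal response.

    alphas : (alpha_1, alpha_2) with alpha_1 < alpha_2
    beta   : common slope vector shared by both cumulative logits
    x      : predictor values, same length as beta
    """
    alpha_1, alpha_2 = alphas
    if not alpha_1 < alpha_2:
        raise ValueError("alpha_1 must be smaller than alpha_2")
    if len(beta) != len(x):
        raise ValueError("beta and x must have the same length")
    eta = sum(b * xi for b, xi in zip(beta, x))   # beta^T X
    cum_1 = logistic(alpha_1 + eta)               # pi_1 = Pr(Y <= 0 | X)
    cum_2 = logistic(alpha_2 + eta)               # pi_1 + pi_2 = Pr(Y <= 1 | X)
    return cum_1, cum_2 - cum_1, 1.0 - cum_2


# Hypothetical parameter values (assumption), one predictor.
pi_1, pi_2, pi_3 = proportional_odds_probabilities((-1.0, 1.0), [0.5], [2.0])
print(round(pi_1, 4), round(pi_2, 4), round(pi_3, 4))
```

Because $\alpha_1 < \alpha_2$ and the slopes are shared, the second cumulative probability is always the larger one, so all three category probabilities are positive and sum to one.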

Table 10.1: Frequencies of the combined variable MF.
Activity    MF

The predictor variables used are: the square of standardized temperature ($X_1$), the square of barometric pressure ($X_2$), the square of wind speed ($X_3$), the squared standardized humidity ($X_4$), change of barometric pressure ($X_5$), and rain ($X_6$). The multinomial logistic regression model setup is

$$\text{logit}(\pi_1) = \alpha_1 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6$$

and

$$\text{logit}(\pi_1 + \pi_2) = \alpha_2 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6$$

Tables 10.2 and 10.3 report the parameter estimates of the fitted multinomial logistic regression model and the estimated probabilities for the three categories of the combined variable MF.

Table 10.2: Parameter estimates for the combined variable MF
Parameter    $\alpha_1$    $\alpha_2$    $\beta_1$    $\beta_2$    $\beta_3$    $\beta_4$    $\beta_5$    $\beta_6$
Estimate

Table 10.3: Estimated $\pi$'s for the combined variable MF
$X_1$    $X_2$    $X_3$    $X_4$    $X_5$    $X_6$    $\hat{\pi}_1$    $\hat{\pi}_2$    $\hat{\pi}_3$

11. Other Related Methods

The other methods that have been used to analyze qualitative response data include the probit model, which writes $\pi$ in terms of the cumulative normal distribution, and discriminant analysis. Probit regression, although assuming a similar shape to the logistic function for the regression relationship between $X\beta$ and $\pi$, involves more cumbersome calculations, and there is no natural interpretation of its regression parameters. In the past, discriminant analysis has been the predominant method since it is the simplest computationally. It assumes that the predictors are each normally distributed and that jointly the predictors have a multivariate normal distribution. These assumptions are unlikely to be met in practice, especially when one of the predictors is a discrete variable. When the discriminant analysis assumptions are violated, logistic regression yields more accurate estimates.

References

Cox, D.R. (1958). The regression analysis of binary sequences (with discussion). Journal of the Royal Statistical Society B, 20, 215-242.

Cox, D.R. and Wermuth, N. (1992). A comment on the coefficient of determination for binary responses. The American Statistician, 46, 1-4.

Fox, J. (1984). Linear Statistical Models and Related Methods with Applications to Social Research. Wiley, New York.

Johnson, D.A., Alldredge, J.R., and Vakoch, D.L. (1996). Potato late blight forecasting models for the semiarid environment of south-central Washington. American Phytopathology, 86.

Kim, H., Li, J., and Wang, S. Ordinal logistic regression modeling to predict mating flights through meteorological cues. Texas A&M University, Personal Communication.

Kleinbaum, D.G. (1994). Logistic Regression: A Self-Learning Text, 2nd ed. Springer-Verlag, New York.

Kvalseth, T.O. (1985). Cautionary note about $R^2$. The American Statistician, 39.

Menard, S. (2000). Coefficients of determination for multiple logistic regression analysis. The American Statistician, 54(1), 17-24.

Misra, A.K., Om Prakash, and Ramasubramanian, V. (2004). Forewarning powdery mildew caused by Oidium mangiferae in mango (Mangifera indica) using logistic regression models. Indian Journal of Agricultural Sciences, 74.

Walker, S.H. and Duncan, D.B. (1967). Estimation of the probability of an event as a function of several independent variables. Biometrika, 54.
