Diagnostics and Remedial Measures: An Overview

1. Diagnostics and Remedial Measures: An Overview

- Residuals
- Model diagnostics
  - Graphical techniques
  - Hypothesis testing
- Remedial measures
  - Transformation
- Later: more about all this for multiple regression

W. Zhou (Colorado State University), STAT 540, July 6th

2. Model Assumptions

Recall the simple linear regression model:
Y_i = β_0 + β_1 X_i + ɛ_i,  ɛ_i iid N(0, σ²),  for i = 1, ..., n.

- A straight-line relationship between E(Y) and X: E(Y_i) = β_0 + β_1 X_i
- Homogeneous variance: Var(ɛ_i) = σ² for all i
- Independence: the ɛ_i are mutually independent
- Normal distribution: each ɛ_i is normally distributed

3. Ramifications If Assumptions Are Violated

- Nonlinearity
  - The linear model will fit poorly
  - Parameter estimates may be meaningless
- Non-independence
  - Parameter estimates are still unbiased
  - Standard errors are a problem, and thus so is inference
- Nonconstant variance
  - Parameter estimates are still unbiased
  - Standard errors are a problem
- Non-normality
  - Least important; why? Inference is fairly robust to non-normality
  - Important effects on prediction intervals

4. Model Diagnostics

- Reliable inference hinges on reasonable adherence to the model assumptions
- Hence it is important to evaluate the FOUR model assumptions, that is, to perform model diagnostics
- The main approach to model diagnostics is to examine the residuals (thanks to the additive error assumption)
- Consider two approaches:
  - Graphical techniques: more subjective, but quick and very informative for an expert
  - Hypothesis tests: more objective and comfortable for amateurs, but outcomes depend on assumptions and sensitivity; there is a tendency to use them as a crutch

5. Graphical Techniques

At this point in the analysis, you have already done EDA:
- 1D exploration of X and Y
- 2D exploration of X and Y
- Not very effective for model diagnostics except in drastic cases

Recall the definition of the residual: e_i = Y_i − Ŷ_i, for i = 1, ..., n.
- e_i can be treated as an estimate of the true error ɛ_i = Y_i − E(Y_i), where ɛ_i iid N(0, σ²)
- e_i can be used to check normality, homoscedasticity, linearity, and independence

6. Properties of Residuals

- Mean: ē = (1/n) Σ e_i = 0
- Variance: MSE = SSE/(n − 2) = Σ e_i² / (n − 2) = Σ (e_i − ē)² / (n − 2) = s²
- Nonindependence: the residuals are not independent (they satisfy constraints such as Σ e_i = 0); when the sample size n is large, however, the residuals can be treated as independent

7. Standardized Residuals

- For diagnostics there are superior choices to the ordinary residuals
- Standardized (KNNL: "semi-studentized") residuals: Var(ɛ_i) = σ², therefore it is natural to apply the standardization e_i* = e_i / √MSE
- But each e_i has a different variance; use this fact to derive a superior type of residual below

8. Hat Values

The fitted values are linear combinations of the observations:
Ŷ_i = Σ_j h_ij Y_j,  where (for simple linear regression)
h_ij = 1/n + (X_i − X̄)(X_j − X̄) / Σ_k (X_k − X̄)².
The h_ij are called hat values.
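
A minimal sketch of the diagonal hat values (leverages) for simple linear regression, on made-up x values:

```python
# Sketch (standard library only): leverages for simple linear regression,
# h_ii = 1/n + (x_i - xbar)^2 / Sxx, on hypothetical x values.
from statistics import mean

x = [1.0, 2.0, 3.0, 4.0, 10.0]       # hypothetical predictor values
n = len(x)
xbar = mean(x)
sxx = sum((xi - xbar) ** 2 for xi in x)

leverages = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]  # h_ii

# The leverages sum to the number of regression parameters (here 2),
# and the unusual x = 10 point has by far the largest leverage.
print([round(h, 3) for h in leverages])
```

Note the design choice: h_ii depends only on the x configuration, not on Y, which is why leverage is a measure of potential (not actual) influence.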

9. Deriving the Variance of Residuals

Using Ŷ_i = Σ_j h_ij Y_j, we obtain
e_i = Y_i − Ŷ_i = (1 − h_ii) Y_i − Σ_{j≠i} h_ij Y_j.
Therefore (since the Y's are independent),
Var{e_i} = σ² [ (1 − h_ii)² + Σ_{j≠i} h_ij² ].

10. Continuing to Derive the Variance of Residuals

Using Var{e_i} = σ² [ (1 − h_ii)² + Σ_{j≠i} h_ij² ] and the fact that Σ_j h_ij² = h_ii (show it in HW), we have

Var{e_i} = σ² [ (1 − h_ii)² + Σ_{j≠i} h_ij² ]
         = σ² [ 1 − 2h_ii + h_ii² + Σ_{j≠i} h_ij² ]
         = σ² [ 1 − 2h_ii + Σ_j h_ij² ]
         = σ² (1 − 2h_ii + h_ii)
         = σ² (1 − h_ii)

11. Studentized Residuals

- Now we may scale each residual separately by its own standard deviation
- The (internally) studentized residual is r_i = e_i / √(MSE (1 − h_ii))
- There is still a problem: imagine that Y_i is a severe outlier
  - Y_i will strongly pull the regression line toward it
  - e_i will understate the distance between Y_i and the true regression line
- The solution is to use externally studentized residuals...

13. Studentized Deleted Residuals

- To eliminate the influence of Y_i on the misfit at the ith point, fit the regression line based on all points except the ith
- Define the prediction at X_i using this deleted regression as Ŷ_{i(i)}
- The deleted residual is d_i = Y_i − Ŷ_{i(i)}
- The studentized deleted residual is
  t_i = d_i / ŝ{d_i} = (Y_i − Ŷ_{i(i)}) / √( MSE_{(i)} / (1 − h_ii) )
- There is no need to fit n deleted regressions; we can show that
  d_i = e_i / (1 − h_ii)  and  (n − 2) MSE = (n − 3) MSE_{(i)} + e_i² / (1 − h_ii)
- Also, t_i has a t-distribution: t_i ∼ t_{n−3}

14. Residual Plots

- The residual plot is the primary graphical diagnostic method
- Departures from model assumptions can be difficult to detect directly from X and Y
- Use the externally studentized residuals
- Some key residual plots:
  - Plot t_i against the predicted values Ŷ_i (not Y_i) to detect nonconstant variance, nonlinearity, and outliers
  - Plot t_i against X_i; in simple linear regression this is the same as above (why?), and in multiple regression it will be useful to detect partial correlation
  - Plot t_i versus other possible predictors (e.g., time) to detect important lurking variables
  - Plot t_i versus lagged residuals to detect correlated errors
  - QQ-plot or normal probability (PP-) plot of t_i to detect non-normality

15. Nonlinearity of the Regression Function

- Plot t_i against Ŷ_i (and X_i for multiple linear regression)
- Random scatter indicates no serious departure from linearity; a "banana" shape indicates a departure from linearity
- Could fit a nonparametric smoother to the residual plot to aid detection
- Example: curved relationship (KNNL Figure 3.4(a))
- Plotting Y vs. X is not nearly as effective for detecting nonlinearity because the trend has not been removed
- Logically, you are investigating model assumptions, not the marginal effect

17. Nonconstant Error Variance

- Plot t_i against Ŷ_i (and X_i for multiple linear regression)
- Random scatter indicates no serious departure from constant variance
- Could fit a nonparametric smoother to this plot to aid detection
- A "funnel" shape indicates nonconstant variance
- Example: KNNL Figure 3.4(c)
- Often both nonconstant variance and nonlinearity exist

18. Nonindependence of Error Terms

- Possible causes of nonindependence:
  - Observations collected over time and/or across space
  - Study done on sets of siblings
- Departures from independence, for example:
  - Trend effect (KNNL Figures 3.4(d), 3.8(a))
  - Cyclical nonindependence (KNNL Figure 3.8(b))
- Plot t_i against another covariate, such as time
- Autocorrelation function plot (acf() in R)
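
A minimal sketch of the quantity acf() plots at lag 1, on a hypothetical residual sequence that drifts over time:

```python
# Sketch (hypothetical residuals): a quick numeric check for correlated
# errors is the lag-1 sample autocorrelation of the residual sequence,
# the first bar of an acf() plot.
from statistics import mean

def lag1_autocorr(res):
    """Sample autocorrelation of a sequence at lag 1."""
    m = mean(res)
    num = sum((res[t] - m) * (res[t - 1] - m) for t in range(1, len(res)))
    den = sum((r - m) ** 2 for r in res)
    return num / den

trended = [-3, -2, -2, -1, 0, 0, 1, 2, 2, 3]   # residuals drifting over time
print(round(lag1_autocorr(trended), 3))         # strongly positive
```

A value near zero is consistent with independent errors; a large positive value suggests a trend effect like KNNL Figure 3.8(a).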

19. Nonnormality of Error Terms

- Box plot, histogram, stem-and-leaf plot of t_i
- QQ (quantile-quantile) plot:
  1. Order the residuals: t_(1) ≤ t_(2) ≤ ... ≤ t_(n)
  2. Find the corresponding "rankits" z_(1) ≤ z_(2) ≤ ... ≤ z_(n), where for k = 1, ..., n,
     z_(k) = √MSE · z( (k − 0.375) / (n + 0.25) ),
     with z(·) the standard normal quantile function; z_(k) is an approximation of the expected value of the kth smallest observation in a normal random sample
  3. Plot t_(k) against z_(k)
- The QQ plot should be approximately linear if normality holds
  - An "S" shape means the distribution of the residuals has light ("short") tails
  - A backwards "S" means heavy tails
  - A "C" or backwards "C" means skew
- It is a good idea to examine other possible problems first
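
A sketch of step 2 above (without the √MSE scale factor, i.e., as used with studentized residuals), using the standard library's normal quantile function:

```python
# Sketch: the "rankits" z_(k) = inverse-normal((k - 0.375)/(n + 0.25)) that
# form the x-axis of a normal QQ plot (scale factor of 1 assumed here).
from statistics import NormalDist

def rankits(n):
    nd = NormalDist()                     # standard normal
    return [nd.inv_cdf((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]

z = rankits(5)
# Rankits are increasing and symmetric about 0.
print([round(v, 3) for v in z])
```

Plotting the ordered residuals against these values gives the QQ plot; departures from a straight line are read off as described above.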

20. Presence of Outliers

- An outlier refers to an extreme observation
- Some diagnostic methods:
  - Box plot of t_i
  - Plot t_i against Ŷ_i (and X_i)
  - t_i that are very unlikely compared to the reference t-distribution could be called outliers
  - Modern cluster analysis methods
- Outliers may convey important information: an error, a different mechanism at work, or a significant discovery
- There is a temptation to throw away outliers because they may strongly influence parameter estimates
  - That doesn't mean the model is right and the data point is wrong
  - Perhaps the data point is right and the model is wrong

21. Graphical Techniques: Remarks

- We generally do not plot the residuals (t_i) against the response (Y_i). Why?
- Residual plots may provide evidence against model assumptions, but do not generally validate assumptions
- For data analysis in practice:
  - Fit the model and check the model assumptions (an iterative process)
  - Generally do not include residual plots in a report, but include a sentence or two such as "Standard diagnostics did not indicate any violations of the assumptions for this model."
  - For this class, always include residual plots in homework assignments so you can learn the methods
- No magic formulas: the decision may be difficult for small sample sizes; as much art as science

22. Diagnostic Methods Based on Hypothesis Testing

- Tests for linearity: F test for lack of fit (Section 3.7)
- Tests for constancy of variance (Section 3.6):
  - Brown-Forsythe test
  - Breusch-Pagan test
  - Levene's test
  - Bartlett's test
- Tests for independence (Chapter 12):
  - Runs test
  - Durbin-Watson test
- Tests for normality (Section 3.5):
  - χ² test
  - Kolmogorov-Smirnov test
- Tests for outliers (Chapter 10)

23. F Test for Lack of Fit

- Residual plots can be used to assess the adequacy of a simple linear regression model; a more formal procedure is a test for lack of fit using "pure error"
- This requires repeat groups: multiple observations on Y at the same X
- For a given data set, suppose we have fitted a simple linear regression model and computed the error sum of squares
  SSE = Σ_{i=1}^n (Y_i − Ŷ_i)²
- These deviations Y_i − Ŷ_i could be due either to random fluctuation around the line or to an inadequate model

24. Pure Error and Lack of Fit

- The main idea is to take several independent observations on Y at the same X, to distinguish the error due to random fluctuation around the line from the error due to lack of fit of the simple linear regression model
- The variation among the repeated measurements is called pure error; the remaining error variation is called lack of fit
- Thus we can partition the error sum of squares into two parts:
  SSE = SSPE + SSLF,
  where SSPE = SS(pure error) and SSLF = SS(lack of fit)
- In effect, we are comparing the linear model with a more flexible separate-means model

25. Pure Error and Lack of Fit

- One possibility is that pure error is a comparatively large part of the SSE; then the linear model seems adequate
- The other possibility is that pure error is a comparatively small part of the SSE, so that error due to lack of fit is a large part; then the linear model seems inadequate
- In the latter case, there may be significant evidence of lack of fit

26. Notation

Models (R notation):
- Null (N): Y ~ 1, the common-mean model
- Reduced (R): Y ~ X, the linear regression model
- Full (F): Y ~ factor(X), the separate-means (ANOVA) model

Notation:
- Y_ij are the data, where j indexes groups and i indexes individuals (sums are taken over all available indices)
- Ȳ is the grand mean
- Ȳ_j is the jth group mean
- Ŷ_ij are the fitted values using the regression line
- Note that the Ȳ_j are the fitted values under the ANOVA model that fits group means, Y ~ factor(X)

27. Sums of Squares

Recall (all sums are over both i and j except as noted):

SSTO = Σ (Y_ij − Ȳ)²
SSR_R = Σ (Ŷ_ij − Ȳ)²
SSE_R = Σ (Y_ij − Ŷ_ij)²
SSTO = SSR_R + SSE_R

SSPE = SSE_F = Σ (Y_ij − Ȳ_j)²
SSLF = Σ (Ȳ_j − Ŷ_ij)² = Σ_j n_j (Ȳ_j − Ŷ_j)²
SSE_R = SSPE + SSLF

28. LOF ANOVA Table

One way to summarize the LOF test is by an ANOVA table (r = number of distinct X values):

Source       | df    | SS   | MS
-------------|-------|------|---------------------
Regression   | 1     | SSR  | MSR = SSR/1
Lack of fit  | r − 2 | SSLF | MSLF = SSLF/(r − 2)
Pure error   | n − r | SSPE | MSPE = SSPE/(n − r)
Total        | n − 1 | SSTO |

E(MSPE) = σ² and
E(MSLF) = σ² + Σ_{i=1}^{r} n_i (µ_i − (β_0 + β_1 X_i))² / (r − 2).
The F-test for lack of fit is therefore F = MSLF/MSPE, compared with F(r − 2, n − r).

29. LOF as Model Comparison

In fact, the above lack-of-fit test is a model comparison of our desired model Y ~ X to the potentially better model Y ~ factor(X), which would be required if the linear model fit poorly. Apply the general linear test (GLT) to compare these two models (are they nested?):

F_LOF = [ (SSE_R − SSE_F) / (df_R − df_F) ] / [ SSE_F / df_F ]

Notice SSE_R − SSE_F = SSE_R − SSPE = SSLF and SSE_F = SSPE, so
F_LOF = MSLF / MSPE, and the LOF ANOVA F-test is the same as model comparison by F_LOF.

30. Lack of Fit in R

anova(reduced.lm)
            Df  Sum Sq Mean Sq F value  Pr(>F)
x            1     ...     ...     ...  ...e-09   #<-- SSR_R
Residuals   14  4.4196     ...                    #<-- SSE_R

full.lm = lm(y ~ factor(x), purerr)
anova(full.lm)
            Df  Sum Sq Mean Sq F value  Pr(>F)
factor(x)    7     ...     ...     ...  ...e-10   #<-- SSR_F
Residuals    8  0.0983     ...                    #<-- SSE_F = SSPE

anova(reduced.lm, full.lm)
Model 1: y ~ x
Model 2: y ~ factor(x)
  Res.Df    RSS Df Sum of Sq   F  Pr(>F)
1     14 4.4196
2      8 0.0983  6    4.3213 ... ...e-06   #<-- F_LOF (= F_GLT here)

Therefore
F_LOF = [ (SSE_R − SSE_F) / (df_SSE,R − df_SSE,F) ] / [ SSE_F / df_SSE,F ]
      = [ 4.3213 / (14 − 8) ] / [ 0.0983 / 8 ] = (4.3213/6) / (0.0983/8) ≈ 58.6

31. LOF p-value

Compare F ≈ 58.6 with F(6, 8): p-value = P(F(6, 8) ≥ 58.6) < 0.0001. There is very strong evidence of a lack of fit.

ANOVA table for LOF:

Source       | df | SS            | MS
-------------|----|---------------|---------------
Regression   | 1  | SSR = ...     | MSR = ...
Lack of fit  | 6  | SSLF = 4.3213 | MSLF = 0.7202
Pure error   | 8  | SSPE = 0.0983 | MSPE = 0.0123
Total        | 15 | SSTO = ...    |

32. LOF by Hand

The data consist of 16 observations with X repeated at several values. [Table of (X, Y) values and scatterplot of Y versus X not recoverable from the transcription.]

33. Computing SS Pure Error by Hand

There are 5 repeat groups out of the 8 distinct X values. [Table of group index i, X, Y, and group size n_i not recoverable from the transcription.]

Compute SSPE as
SSPE = Σ_{i=1}^{3} (Y_i1 − Ȳ_1)² + Σ_{i=1}^{2} (Y_i2 − Ȳ_2)² + Σ_{i=1}^{3} (Y_i3 − Ȳ_3)² + Σ_{i=1}^{3} (Y_i4 − Ȳ_4)² + Σ_{i=1}^{2} (Y_i5 − Ȳ_5)²

34. Computing SS Pure Error by Hand

- Y_ij: the ith observation in the jth group
- n_j: the number of observations in the jth group
- c: the number of repeat groups

In general,
SSPE = Σ_{j=1}^{c} Σ_{i=1}^{n_j} (Y_ij − Ȳ_j)²,
df_PE = Σ_{j=1}^{c} (n_j − 1) = n − r.

In the example, n_1 = 3, n_2 = 2, n_3 = 3, n_4 = 3, n_5 = 2, c = 5, and hence
df_PE = 2 + 1 + 2 + 2 + 1 = 8 and SSPE = 0.0983.

35. Computing SS Lack of Fit by Hand

By subtraction. In the example, SSE = 4.42 from the regression ANOVA table on df = 14, and SSPE = 0.0983 on df = 8. Thus
SSLF = SSE − SSPE = 4.42 − 0.0983 ≈ 4.32,  df_LF = 14 − 8 = 6.

Note
SSLF = Σ_{j=1}^{r} n_j (Ȳ_j − Ŷ_j)²,
where r denotes the number of distinct X values (here r = 8).

36. Lack of Fit Test by Hand

For testing H_0: no lack of fit (here, simple linear regression is adequate) versus H_a: lack of fit (simple linear regression is inadequate), use the fact that, under H_0,
F = (SSLF/df_LF) / (SSPE/df_PE) ∼ F(df_LF, df_PE).

In the example, the observed F test statistic is
F = (4.32/6) / (0.0983/8) ≈ 58.6.

37. Lack of Fit Test: Remarks

- Note that R² = 93.2% is high, but according to the LOF test the model is still inadequate; a possible remedy is polynomial regression
- The repeats need to be independent measurements
- If there are no repeats at all, some practitioners form approximate repeat groups by binning X values that are close to one another; in that case the LOF test is an approximate test

38. Remedial Measures

For simple linear regression, consider two basic approaches:
- Abandon the current model and look for a better one
- Transform the data so that the simple linear regression model is appropriate

Nonlinearity of the regression function:
- Transformation (of X, Y, or both)
- Polynomial regression
- Nonlinear regression

Nonconstancy of error variance:
- Transformation (of Y)
- Weighted least squares

39. Remedial Measures

Simultaneous nonlinearity and nonconstant variance; sometimes it works to:
- Transform Y to fix the variance, then
- Transform X to fix linearity

Nonindependence of error terms:
- First-order differencing
- Models with correlated error terms

Nonnormality of error terms:
- Transformation
- Generalized linear models

Presence of outliers:
- Removal of outliers (with extreme caution)
- Analyze both with and without the outliers
- Robust estimation
- A new model

40. Example: Bacteria

Data consist of the number of surviving bacteria after exposure to X-rays for different periods of time. Let t denote time (in number of 6-minute intervals) and let n denote the number of surviving bacteria (in 100s) after exposure to X-rays for time t. [Table of (t, n) values not recoverable from the transcription.]

41. Example: Bacteria

We fit a simple linear regression model to the data. [Scatterplot of n versus t and residuals-vs-fitted plot omitted.] It appears that the straight-line model is not adequate: the assumption of a correct model seems to be violated. What to do?

42. Example: Bacteria

Consider the nonlinear model (why nonlinear?)
n_t = n_0 e^{βt},
where t is time, n_t is the number of bacteria at time t, n_0 is the number of bacteria at t = 0, and β < 0 is a decay rate. Taking natural logs of both sides of the model and setting α = ln(n_0), we have
ln(n_t) = ln(n_0) + ln(e^{βt}) = ln(n_0) + βt = α + βt.
That is, we log-transformed n_t and the result is the usual straight-line model!

43. Example: Bacteria

The transformed data are as follows. [Table of (t, ln(n)) values, scatterplot of log(n) versus t, and residuals-vs-fitted plot omitted.]

44. Example: Bacteria

Based on the log-transformed counts, we can fit the model to get the LS estimates
α̂ = 6.029,  β̂ = −0.222,  s = ... ,  R² = ... .

- Inference for β is straightforward. [Units? Interpretation?]
- Inference for α is straightforward. [Units? Interpretation?]
- Inference for n_0 is not straightforward:
  - Since α = ln(n_0), n_0 = e^α
  - Given α̂ = 6.029, we obtain the estimate n̂_0 = e^{α̂} ≈ 415.3
  - But the estimate n̂_0 is biased (i.e., E(n̂_0) ≠ n_0)

45. Transformation: Remarks

The purpose of transformation is to meet the assumptions of the linear regression analysis.

Linear models:
- (L1) Y = β_0 + β_1 X
- (L2) Y = β_0 + β_1 X²
- (L3) Y = β_0 + β_1 e^X

Nonlinear models:
- (N1) Y = α e^{βX}  →  ln(Y) = ln(α) + βX
- (N2) Y = α X^β  →  ln(Y) = ln(α) + β ln(X)
- (N3) Y = α + e^{βX}

In (L2) and (L3), the relationship between X and Y is not linear, but the model is linear in the parameters and hence the model is linear. In (N1) and (N2), the model is nonlinear but can be log-transformed to a linear model (i.e., linearized). In (N3), the model is nonlinear and cannot be linearized.

46. Transformation: Remarks

- Transformation can be applied to X, or Y, or both
- Common transformations are log10(Z), ln(Z), √Z, and Z²; less common transformations include 1/Z, 1/Z², arcsin(√Z), and log2(Z)
- Another use of transformation is to control unequal variance:
  - Example: if the Y are counts, then larger variances are often associated with larger counts; in this case the √Y transformation can help stabilize the variance
  - Example: if the Y are proportions (of successes among trials), then Var(Y) = π(1 − π)/n, which depends on the true success rate π; residual plots would reveal the unequal-variance problem, and the arcsin(√Y) transformation can help stabilize the variance
- Rules of thumb: positive data, use log(Y); proportions, use arcsin(√Y); counts, use √Y
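
The proportions example can be checked by simulation. A sketch with made-up parameters (n = 500 trials, success rates 0.1 and 0.5) showing that arcsin(√p̂) roughly equalizes the spread:

```python
# Sketch (made-up simulation parameters): the arcsin(sqrt(p)) transform
# roughly equalizes the variability of sample proportions across different
# true success rates, whereas the raw proportions do not.
import random
from math import asin, sqrt
from statistics import pstdev

random.seed(1)
n_trials, reps = 500, 2000

def sample_props(p):
    return [sum(random.random() < p for _ in range(n_trials)) / n_trials
            for _ in range(reps)]

props_lo, props_hi = sample_props(0.1), sample_props(0.5)
sd_raw = (pstdev(props_lo), pstdev(props_hi))
sd_trans = (pstdev([asin(sqrt(p)) for p in props_lo]),
            pstdev([asin(sqrt(p)) for p in props_hi]))

# Raw proportions: sd is noticeably larger at p = 0.5 than at p = 0.1.
# After arcsin(sqrt(.)): both sds are close to 1/(2*sqrt(n_trials)).
print(sd_raw[1] / sd_raw[0] > sd_trans[1] / sd_trans[0])
```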

47. Transformation: Remarks

- Ideally, theory should dictate what transformation to use, as in the bacteria count example; in practice, the transformation is usually chosen empirically
- Transforming Y can affect both linearity and variance homogeneity, but transforming X can affect only linearity
- Sometimes solving one problem creates another: for example, transforming Y to stabilize the variance may cause a curved relationship
- Usually it is best to start with a simple transformation and experiment; it often happens that a simple transformation allows the use of the linear regression model
- When needed, use more complicated methods such as nonlinear regression
- Transformations are useful not only for simple linear regression, but also for multiple linear regression and designed experiments

48. Variance Stabilizing Transformations

If Var(Y) = g(E(Y)), then a variance stabilizing transform is
h(y) ∝ ∫^y (g(z))^{−1/2} dz.

Examples:
- If var ∝ mean, then g(z) = z and h(y) = √y
- If var ∝ mean², then g(z) = z² and h(y) = ln(y)
- If var ∝ mean(1 − mean), then g(z) = z(1 − z) and h(y) = sin^{−1}(√y)

49. Box-Cox Transformation

Consider a transformation ladder for Z = X or Y:

λ   | −2   | −1  | −1/2 | 0      | 1/2 | 1 | 2
Z^λ | 1/Z² | 1/Z | 1/√Z | log(Z) | √Z  | Z | Z²

- Moving up or down the ladder (starting at λ = 1) changes the residual plots in a consistent manner; use only these choices for a manual search, too
- The Box-Cox method is a formal approach to selecting λ to transform Y. The idea is to consider Y_i^λ = β_0 + β_1 X_i + ɛ_i and estimate λ (along with β_0, β_1, σ²) using maximum likelihood
- The Box-Cox method may give a non-round λ̂; round to the nearest interpretable value (e.g., λ̂ = 0.5). If λ̂ ≈ 1, do not transform
- In R, boxcox gives a 95% CI for λ; choose an interpretable λ̂ within the CI

50. Box-Cox Transformation

Box-Cox family of transformations:
Z = Y^λ if λ ≠ 0;  Z = ln(Y) if λ = 0.

Estimate λ using maximum likelihood, or use the variance-mean relationship. Using the variance-mean relationship to estimate g(Y) works for 2 groups or many groups (ANOVA, presented in the near future):
- Compute Ȳ and S_Y for each group
- Regress log(S_Y) on log(Ȳ) and estimate the slope β
- Use the transformation Y^λ with λ = 1 − β

51. Box-Cox Transformation: When Does This Work?

Model for the variability: σ = √Var(Y) = k µ^β, or Var(Y) = σ² = (k µ^β)² =: g(µ).

Use the delta method to obtain the transformation Z = h(Y) = Y^λ. Consider the Taylor expansion
Z = h(Y) ≈ h(µ) + (Y − µ) h′(µ);
then an approximation for Var(h(Y)) is
Var(h(Y)) ≈ [h′(µ)]² Var(Y),
which is the famous delta method.

52. Box-Cox Transformation

For Z = h(Y) = Y^λ we have dZ/dY = h′(Y) = λ Y^{λ−1}. From the delta method,
Var(Z) ≈ (λ µ^{λ−1})² (k µ^β)² = k² λ² µ^{2(λ−1+β)}.
When λ = 1 − β, Var(Z) ≈ k² λ² is approximately constant.

- Analyze the transformed data: e.g., Z_11 = ln(Y_11), Z_12 = ln(Y_12), ..., Z_{2,n_2} = ln(Y_{2,n_2})
- Usually round λ to a reasonable value if it is not an integer, as discussed above
- Caution: some researchers estimate the slope β from the regression of ln(Var(Y)) on ln(Ȳ) and then use the transform Z = Y^λ with λ = 1 − β/2.

W. Zhou (Colorado State University), STAT 540

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

Diagnostics and Remedial Measures

Diagnostics and Remedial Measures Diagnostics and Remedial Measures Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Diagnostics and Remedial Measures 1 / 72 Remedial Measures How do we know that the regression

More information

Chapter 3. Diagnostics and Remedial Measures

Chapter 3. Diagnostics and Remedial Measures Chapter 3. Diagnostics and Remedial Measures So far, we took data (X i, Y i ) and we assumed Y i = β 0 + β 1 X i + ǫ i i = 1, 2,..., n, where ǫ i iid N(0, σ 2 ), β 0, β 1 and σ 2 are unknown parameters,

More information

3. Diagnostics and Remedial Measures

3. Diagnostics and Remedial Measures 3. Diagnostics and Remedial Measures So far, we took data (X i, Y i ) and we assumed where ɛ i iid N(0, σ 2 ), Y i = β 0 + β 1 X i + ɛ i i = 1, 2,..., n, β 0, β 1 and σ 2 are unknown parameters, X i s

More information

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects Contents 1 Review of Residuals 2 Detecting Outliers 3 Influential Observations 4 Multicollinearity and its Effects W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 32 Model Diagnostics:

More information

Remedial Measures, Brown-Forsythe test, F test

Remedial Measures, Brown-Forsythe test, F test Remedial Measures, Brown-Forsythe test, F test Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 7, Slide 1 Remedial Measures How do we know that the regression function

More information

STAT5044: Regression and Anova

STAT5044: Regression and Anova STAT5044: Regression and Anova Inyoung Kim 1 / 49 Outline 1 How to check assumptions 2 / 49 Assumption Linearity: scatter plot, residual plot Randomness: Run test, Durbin-Watson test when the data can

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

STAT 571A Advanced Statistical Regression Analysis. Chapter 3 NOTES Diagnostics and Remedial Measures

STAT 571A Advanced Statistical Regression Analysis. Chapter 3 NOTES Diagnostics and Remedial Measures STAT 571A Advanced Statistical Regression Analysis Chapter 3 NOTES Diagnostics and Remedial Measures 2015 University of Arizona Statistics GIDP. All rights reserved, except where previous rights exist.

More information

Outline. Remedial Measures) Extra Sums of Squares Standardized Version of the Multiple Regression Model

Outline. Remedial Measures) Extra Sums of Squares Standardized Version of the Multiple Regression Model Outline 1 Multiple Linear Regression (Estimation, Inference, Diagnostics and Remedial Measures) 2 Special Topics for Multiple Regression Extra Sums of Squares Standardized Version of the Multiple Regression

More information

Concordia University (5+5)Q 1.

Concordia University (5+5)Q 1. (5+5)Q 1. Concordia University Department of Mathematics and Statistics Course Number Section Statistics 360/1 40 Examination Date Time Pages Mid Term Test May 26, 2004 Two Hours 3 Instructor Course Examiner

More information

The Model Building Process Part I: Checking Model Assumptions Best Practice

The Model Building Process Part I: Checking Model Assumptions Best Practice The Model Building Process Part I: Checking Model Assumptions Best Practice Authored by: Sarah Burke, PhD 31 July 2017 The goal of the STAT T&E COE is to assist in developing rigorous, defensible test

More information

One-way ANOVA Model Assumptions

One-way ANOVA Model Assumptions One-way ANOVA Model Assumptions STAT:5201 Week 4: Lecture 1 1 / 31 One-way ANOVA: Model Assumptions Consider the single factor model: Y ij = µ + α }{{} i ij iid with ɛ ij N(0, σ 2 ) mean structure random

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO.

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO. Analysis of variance approach to regression If x is useless, i.e. β 1 = 0, then E(Y i ) = β 0. In this case β 0 is estimated by Ȳ. The ith deviation about this grand mean can be written: deviation about

More information

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for

More information

The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1)

The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) Authored by: Sarah Burke, PhD Version 1: 31 July 2017 Version 1.1: 24 October 2017 The goal of the STAT T&E COE

More information

Lecture 10 Multiple Linear Regression

Lecture 10 Multiple Linear Regression Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable

More information

22s:152 Applied Linear Regression. Take random samples from each of m populations.

22s:152 Applied Linear Regression. Take random samples from each of m populations. 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

STAT Checking Model Assumptions

STAT Checking Model Assumptions STAT 704 --- Checking Model Assumptions Recall we assumed the following in our model: (1) The regression relationship between the response and the predictor(s) specified in the model is appropriate (2)

More information

1 One-way Analysis of Variance

1 One-way Analysis of Variance 1 One-way Analysis of Variance Suppose that a random sample of q individuals receives treatment T i, i = 1,,... p. Let Y ij be the response from the jth individual to be treated with the ith treatment

More information

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation. Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1

More information

Diagnostics can identify two possible areas of failure of assumptions when fitting linear models.

Diagnostics can identify two possible areas of failure of assumptions when fitting linear models. 1 Transformations 1.1 Introduction Diagnostics can identify two possible areas of failure of assumptions when fitting linear models. (i) lack of Normality (ii) heterogeneity of variances It is important

More information

Math 5305 Notes. Diagnostics and Remedial Measures. Jesse Crawford. Department of Mathematics Tarleton State University

Math 5305 Notes. Diagnostics and Remedial Measures. Jesse Crawford. Department of Mathematics Tarleton State University Math 5305 Notes Diagnostics and Remedial Measures Jesse Crawford Department of Mathematics Tarleton State University (Tarleton State University) Diagnostics and Remedial Measures 1 / 44 Model Assumptions

More information

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

Chapter 2. Continued. Proofs For ANOVA Proof of ANOVA Identity. the product term in the above equation can be simplified as n

Chapter 2. Continued. Proofs For ANOVA Proof of ANOVA Identity. the product term in the above equation can be simplified as n Chapter 2. Continued Proofs For ANOVA Proof of ANOVA Identity We are going to prove that Writing SST SSR + SSE. Y i Ȳ (Y i Ŷ i ) + (Ŷ i Ȳ ) Squaring both sides summing over all i 1,...n, we get (Y i Ȳ

More information

Lecture 1: Linear Models and Applications

Lecture 1: Linear Models and Applications Lecture 1: Linear Models and Applications Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction to linear models Exploratory data analysis (EDA) Estimation

More information

Diagnostics of Linear Regression

Diagnostics of Linear Regression Diagnostics of Linear Regression Junhui Qian October 7, 14 The Objectives After estimating a model, we should always perform diagnostics on the model. In particular, we should check whether the assumptions

More information

REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES

REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES Lalmohan Bhar I.A.S.R.I., Library Avenue, Pusa, New Delhi 110 01 lmbhar@iasri.res.in 1. Introduction Regression analysis is a statistical methodology that utilizes

More information

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X. Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.

More information

Chapter 11 - Lecture 1 Single Factor ANOVA

Chapter 11 - Lecture 1 Single Factor ANOVA Chapter 11 - Lecture 1 Single Factor ANOVA April 7th, 2010 Means Variance Sum of Squares Review In Chapter 9 we have seen how to make hypothesis testing for one population mean. In Chapter 10 we have seen

More information

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing STAT763: Applied Regression Analysis Multiple linear regression 4.4 Hypothesis testing Chunsheng Ma E-mail: cma@math.wichita.edu 4.4.1 Significance of regression Null hypothesis (Test whether all β j =

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Topic 23: Diagnostics and Remedies

Topic 23: Diagnostics and Remedies Topic 23: Diagnostics and Remedies Outline Diagnostics residual checks ANOVA remedial measures Diagnostics Overview We will take the diagnostics and remedial measures that we learned for regression and

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

F-tests and Nested Models

F-tests and Nested Models F-tests and Nested Models Nested Models: A core concept in statistics is comparing nested s. Consider the Y = β 0 + β 1 x 1 + β 2 x 2 + ǫ. (1) The following reduced s are special cases (nested within)

More information

STAT 540: Data Analysis and Regression

STAT 540: Data Analysis and Regression STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State

More information

Statistics for Managers using Microsoft Excel 6 th Edition

Statistics for Managers using Microsoft Excel 6 th Edition Statistics for Managers using Microsoft Excel 6 th Edition Chapter 13 Simple Linear Regression 13-1 Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Advanced Regression Topics: Violation of Assumptions

Advanced Regression Topics: Violation of Assumptions Advanced Regression Topics: Violation of Assumptions Lecture 7 February 15, 2005 Applied Regression Analysis Lecture #7-2/15/2005 Slide 1 of 36 Today s Lecture Today s Lecture rapping Up Revisiting residuals.

More information

Applied Econometrics (QEM)

Applied Econometrics (QEM) Applied Econometrics (QEM) based on Prinicples of Econometrics Jakub Mućk Department of Quantitative Economics Jakub Mućk Applied Econometrics (QEM) Meeting #3 1 / 42 Outline 1 2 3 t-test P-value Linear

More information

6. Multiple Linear Regression

6. Multiple Linear Regression 6. Multiple Linear Regression SLR: 1 predictor X, MLR: more than 1 predictor Example data set: Y i = #points scored by UF football team in game i X i1 = #games won by opponent in their last 10 games X

More information

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Final Review. Yang Feng.   Yang Feng (Columbia University) Final Review 1 / 58 Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple

More information

Sociology 6Z03 Review II

Sociology 6Z03 Review II Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability

More information

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 16. Simple Linear Regression and dcorrelation Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Ch. 1: Data and Distributions

Ch. 1: Data and Distributions Ch. 1: Data and Distributions Populations vs. Samples How to graphically display data Histograms, dot plots, stem plots, etc Helps to show how samples are distributed Distributions of both continuous and

More information

We like to capture and represent the relationship between a set of possible causes and their response, by using a statistical predictive model.

We like to capture and represent the relationship between a set of possible causes and their response, by using a statistical predictive model. Statistical Methods in Business Lecture 5. Linear Regression We like to capture and represent the relationship between a set of possible causes and their response, by using a statistical predictive model.

More information

Homework 2: Simple Linear Regression

Homework 2: Simple Linear Regression STAT 4385 Applied Regression Analysis Homework : Simple Linear Regression (Simple Linear Regression) Thirty (n = 30) College graduates who have recently entered the job market. For each student, the CGPA

More information

holding all other predictors constant

holding all other predictors constant Multiple Regression Numeric Response variable (y) p Numeric predictor variables (p < n) Model: Y = b 0 + b 1 x 1 + + b p x p + e Partial Regression Coefficients: b i effect (on the mean response) of increasing

More information

PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES

PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES Normal Error RegressionModel : Y = β 0 + β ε N(0,σ 2 1 x ) + ε The Model has several parts: Normal Distribution, Linear Mean, Constant Variance,

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

Regression Models for Quantitative and Qualitative Predictors: An Overview

Regression Models for Quantitative and Qualitative Predictors: An Overview Regression Models for Quantitative and Qualitative Predictors: An Overview Polynomial regression models Interaction regression models Qualitative predictors Indicator variables Modeling interactions between

More information

Outline. Topic 20 - Diagnostics and Remedies. Residuals. Overview. Diagnostics Plots Residual checks Formal Tests. STAT Fall 2013

Outline. Topic 20 - Diagnostics and Remedies. Residuals. Overview. Diagnostics Plots Residual checks Formal Tests. STAT Fall 2013 Topic 20 - Diagnostics and Remedies - Fall 2013 Diagnostics Plots Residual checks Formal Tests Remedial Measures Outline Topic 20 2 General assumptions Overview Normally distributed error terms Independent

More information

Assessing Model Adequacy

Assessing Model Adequacy Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for inferences. In cases where some assumptions are violated, there are

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Stat 427/527: Advanced Data Analysis I

Stat 427/527: Advanced Data Analysis I Stat 427/527: Advanced Data Analysis I Review of Chapters 1-4 Sep, 2017 1 / 18 Concepts you need to know/interpret Numerical summaries: measures of center (mean, median, mode) measures of spread (sample

More information

Matrix Approach to Simple Linear Regression: An Overview

Matrix Approach to Simple Linear Regression: An Overview Matrix Approach to Simple Linear Regression: An Overview Aspects of matrices that you should know: Definition of a matrix Addition/subtraction/multiplication of matrices Symmetric/diagonal/identity matrix

More information

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013 Applied Regression Chapter 2 Simple Linear Regression Hongcheng Li April, 6, 2013 Outline 1 Introduction of simple linear regression 2 Scatter plot 3 Simple linear regression model 4 Test of Hypothesis

More information

Applied Regression Analysis

Applied Regression Analysis Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

Two-Way Factorial Designs

Two-Way Factorial Designs 81-86 Two-Way Factorial Designs Yibi Huang 81-86 Two-Way Factorial Designs Chapter 8A - 1 Problem 81 Sprouting Barley (p166 in Oehlert) Brewer s malt is produced from germinating barley, so brewers like

More information

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Spring 2010 The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative

More information

Mathematics for Economics MA course

Mathematics for Economics MA course Mathematics for Economics MA course Simple Linear Regression Dr. Seetha Bandara Simple Regression Simple linear regression is a statistical method that allows us to summarize and study relationships between

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20 Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)

More information

What to do if Assumptions are Violated?

What to do if Assumptions are Violated? What to do if Assumptions are Violated? Abandon simple linear regression for something else (usually more complicated). Some examples of alternative models: weighted least square appropriate model if the

More information

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 Lecture 3: Linear Models Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector of observed

More information

STAT 4385 Topic 06: Model Diagnostics

STAT 4385 Topic 06: Model Diagnostics STAT 4385 Topic 06: Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 1/ 40 Outline Several Types of Residuals Raw, Standardized, Studentized

More information

STAT2012 Statistical Tests 23 Regression analysis: method of least squares

STAT2012 Statistical Tests 23 Regression analysis: method of least squares 23 Regression analysis: method of least squares L23 Regression analysis The main purpose of regression is to explore the dependence of one variable (Y ) on another variable (X). 23.1 Introduction (P.532-555)

More information

Dr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines)

Dr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines) Dr. Maddah ENMG 617 EM Statistics 11/28/12 Multiple Regression (3) (Chapter 15, Hines) Problems in multiple regression: Multicollinearity This arises when the independent variables x 1, x 2,, x k, are

More information

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables Regression Analysis Regression: Methodology for studying the relationship among two or more variables Two major aims: Determine an appropriate model for the relationship between the variables Predict the

More information

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College An example ANOVA situation Example (Treating Blisters) Subjects: 25 patients with blisters Treatments: Treatment A, Treatment

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression September 24, 2008 Reading HH 8, GIll 4 Simple Linear Regression p.1/20 Problem Data: Observe pairs (Y i,x i ),i = 1,...n Response or dependent variable Y Predictor or independent

More information

Lec 3: Model Adequacy Checking

Lec 3: Model Adequacy Checking November 16, 2011 Model validation Model validation is a very important step in the model building procedure. (one of the most overlooked) A high R 2 value does not guarantee that the model fits the data

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models there are two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent

More information

Inference for the Regression Coefficient

Inference for the Regression Coefficient Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates

More information

Basic Business Statistics 6 th Edition

Basic Business Statistics 6 th Edition Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based

More information

STAT22200 Spring 2014 Chapter 8A

STAT22200 Spring 2014 Chapter 8A STAT22200 Spring 2014 Chapter 8A Yibi Huang May 13, 2014 81-86 Two-Way Factorial Designs Chapter 8A - 1 Problem 81 Sprouting Barley (p166 in Oehlert) Brewer s malt is produced from germinating barley,

More information

Chapter 4: Regression Models

Chapter 4: Regression Models Sales volume of company 1 Textbook: pp. 129-164 Chapter 4: Regression Models Money spent on advertising 2 Learning Objectives After completing this chapter, students will be able to: Identify variables,

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is

Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is Q = (Y i β 0 β 1 X i1 β 2 X i2 β p 1 X i.p 1 ) 2, which in matrix notation is Q = (Y Xβ) (Y

More information

Design & Analysis of Experiments 7E 2009 Montgomery

Design & Analysis of Experiments 7E 2009 Montgomery 1 What If There Are More Than Two Factor Levels? The t-test does not directly apply ppy There are lots of practical situations where there are either more than two levels of interest, or there are several

More information

Simple Linear Regression for the Advertising Data

Simple Linear Regression for the Advertising Data Revenue 0 10 20 30 40 50 5 10 15 20 25 Pages of Advertising Simple Linear Regression for the Advertising Data What do we do with the data? y i = Revenue of i th Issue x i = Pages of Advertisement in i

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

Regression diagnostics

Regression diagnostics Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model

More information

Formulary Applied Econometrics

Formulary Applied Econometrics Department of Economics Formulary Applied Econometrics c c Seminar of Statistics University of Fribourg Formulary Applied Econometrics 1 Rescaling With y = cy we have: ˆβ = cˆβ With x = Cx we have: ˆβ

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Correlation Analysis

Correlation Analysis Simple Regression Correlation Analysis Correlation analysis is used to measure strength of the association (linear relationship) between two variables Correlation is only concerned with strength of the

More information

Remedial Measures Wrap-Up and Transformations-Box Cox

Remedial Measures Wrap-Up and Transformations-Box Cox Remedial Measures Wrap-Up and Transformations-Box Cox Frank Wood October 25, 2011 Last Class Graphical procedures for determining appropriateness of regression fit - Normal probability plot Tests to determine

More information

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Multiple Regression Analysis. Part III. Multiple Regression Analysis Part III Multiple Regression Analysis As of Sep 26, 2017 1 Multiple Regression Analysis Estimation Matrix form Goodness-of-Fit R-square Adjusted R-square Expected values of the OLS estimators Irrelevant

More information

Confidence Interval for the mean response

Confidence Interval for the mean response Week 3: Prediction and Confidence Intervals at specified x. Testing lack of fit with replicates at some x's. Inference for the correlation. Introduction to regression with several explanatory variables.

More information

Applied linear statistical models: An overview

Applied linear statistical models: An overview Applied linear statistical models: An overview Gunnar Stefansson 1 Dept. of Mathematics Univ. Iceland August 27, 2010 Outline Some basics Course: Applied linear statistical models This lecture: A description

More information

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6 STA 8 Applied Linear Models: Regression Analysis Spring 011 Solution for Homework #6 6. a) = 11 1 31 41 51 1 3 4 5 11 1 31 41 51 β = β1 β β 3 b) = 1 1 1 1 1 11 1 31 41 51 1 3 4 5 β = β 0 β1 β 6.15 a) Stem-and-leaf

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information