Association studies and regression

CM226: Machine Learning for Bioinformatics, Fall 2016
Sriram Sankararaman
Acknowledgments: Fei Sha, Ameet Talwalkar

Administration
HW1 will be posted today. Due in two weeks.
Send me an email if you are still not enrolled!

Review of last lecture
Basics of statistical inference: parameter estimation, hypothesis testing, interval estimation.
Different types of mistakes; multiple hypothesis testing.
We would like to control FWER and FDR:
FWER = P(at least one false positive). The Bonferroni procedure controls FWER.
FDR = expected fraction of false discoveries. The Benjamini-Hochberg procedure controls FDR.

Outline
Motivation
Population Association Studies and GWAS
Linear regression: univariate solution; probabilistic interpretation; statistical properties of the MLE; computational and numerical optimization
Logistic regression: general setup; maximum likelihood estimation; gradient descent; Newton's method
Application to GWAS

Personalized genomics

Path to Personalized Genomics
Basic biology: phenotype is a function of genotype and environment, i.e., learn P = f(G, E).
How much of the phenotype is genetic vs. environmental?
Find the genetic and environmental factors associated with the phenotype (finding drug targets).
Disease prediction: given genetic data, predict the phenotype for an individual.

Finding genetic factors that influence a phenotype
We will be working with a population of individuals:
Genotypes of a population of individuals: SNPs (most common).
Phenotypes in the same set of individuals: binary (disease vs. healthy), ordinal (number of children, years of schooling), continuous (height, gene expression), or heterogeneous and high-dimensional (images, text, videos).

Population Association Studies
Unrelated population of individuals measured for a phenotype.

Population Association Studies
Find an association between a SNP and a phenotype. Really, we want to find a SNP that is causal.
Simplest form: does the genotype at a single SNP predict the phenotype?

Recall some definitions from lecture 1
Locus: position along the chromosome (could be a single base or longer).
Allele: set of variants at a locus.
Genotype: sequence of alleles along the loci of an individual.
Individual 1: (1, CT), (2, GG)
Individual 2: (1, TT), (2, GA)

Recall some definitions from lecture 1
Pick one allele as the reference; the other allele is called the alternate allele.
Represent a genotype as the number of copies of the reference allele, so each genotype at a single base can be 0/1/2.
Locus 1: C is the reference.
Individual 1 (genotype CT) has x_{1,1} = 1.
Individual 2 (genotype TT) has x_{2,1} = 0.
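
As a small illustration (a sketch, not course code; the genotype strings and reference alleles are made up), counting copies of the reference allele gives the 0/1/2 encoding:

import numpy as np

# Hypothetical unphased genotype calls for two individuals at two loci.
genotypes = [["CT", "GG"], ["TT", "GA"]]
reference = ["C", "G"]  # assumed reference allele at each locus

# x[i, j] = number of copies of the reference allele for individual i at locus j
x = np.array([[g.count(ref) for g, ref in zip(ind, reference)]
              for ind in genotypes])
print(x)  # [[1 2]
          #  [0 1]]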

Genome-wide Association Studies (GWAS)
Perform a population association study across the genome.

GWAS
Simplest form: univariate regression between a SNP and a continuous phenotype.
y_i: phenotype for individual i
x_{i,j}: genotype for individual i at SNP j

Linear regression
Goal: predict a continuous-valued output from inputs. (Classification refers to binary outputs.)
Output (response): phenotype. Input (covariate, predictor): genotype.
It is common practice to test the genotype at a single SNP (univariate regression). We will not make that assumption for now.

GWAS: single-SNP association testing
(Figure: phenotype plotted against genotype, AA / AG / GG.)
Model: phenotype = mean phenotype + effect size × genotype + noise.
Learn parameters that predict the phenotype well, measured by the squared difference (actual − predicted phenotype)².
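
To make the model concrete, here is a minimal simulation sketch (the allele frequency, effect size, and noise level are illustrative, not from the lecture):

import numpy as np

rng = np.random.default_rng(0)
n = 1000
maf = 0.3                            # assumed minor allele frequency
x = rng.binomial(2, maf, size=n)     # genotypes in {0, 1, 2}
beta0, beta1, sigma = 1.0, 0.5, 1.0  # hypothetical mean, effect size, noise sd
y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)  # mean + effect*genotype + noise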

Linear regression
Output: y ∈ R. Input: x ∈ R^m.
Find f: x → y that predicts y well.
Assume f(x) = β_0 + β^T x, which is linear in the parameters (hence the name).
β_0 is called the intercept or bias; β = (β_1, ..., β_m) are the weights, parameters, or parameter vector. Sometimes β = (β_0, ..., β_m) is called the parameters.
What does "well" mean?

Linear regression: setup
We have labeled (training) data D = {(x_i, y_i), i = 1, ..., n}.
i indexes individuals; j indexes SNPs.
y_i: phenotype of individual i.
x_{i,j}: genotype of individual i at SNP j, with x_{i,j} ∈ {0, 1, 2}.

Linear regression: what does "well" mean?
Minimize the residual sum of squares (RSS):

RSS(β) = Σ_{i=1}^n (y_i − f(x_i))²
       = Σ_{i=1}^n (y_i − (β_0 + β^T x_i))²
       = Σ_{i=1}^n (y_i − (β_0 + Σ_{j=1}^m β_j x_{i,j}))²

Ordinary least squares (OLS) estimator: β_OLS = argmin_β RSS(β).

A simple case: x is one-dimensional (m = 1)
We denote x_i = x_{i,1} (a scalar). Residual sum of squares:

RSS(β) = Σ_i [y_i − f(x_i)]² = Σ_i [y_i − (β_0 + β_1 x_i)]²

Identify stationary points by taking derivatives with respect to the parameters and setting them to zero:

∂RSS(β)/∂β_0 = 0  ⇒  −2 Σ_i [y_i − (β_0 + β_1 x_i)] = 0
∂RSS(β)/∂β_1 = 0  ⇒  −2 Σ_i [y_i − (β_0 + β_1 x_i)] x_i = 0

Simplify these expressions to get the normal equations:

Σ_i y_i = n β_0 + β_1 Σ_i x_i
Σ_i x_i y_i = β_0 Σ_i x_i + β_1 Σ_i x_i²

We have two equations and two unknowns! Some algebra gives:

β_1 = Σ_i (x_i − x̄)(y_i − ȳ) / Σ_i (x_i − x̄)²   and   β_0 = ȳ − β_1 x̄

where x̄ = (1/n) Σ_i x_i and ȳ = (1/n) Σ_i y_i.
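
These closed-form expressions translate directly into code. A minimal sketch, reusing the simulated x and y from the earlier snippet:

import numpy as np

def ols_univariate(x, y):
    """Closed-form OLS for y = beta0 + beta1 * x."""
    xbar, ybar = x.mean(), y.mean()
    beta1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    beta0 = ybar - beta1 * xbar
    return beta0, beta1

beta0_hat, beta1_hat = ols_univariate(x.astype(float), y)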

Why is minimizing RSS sensible? A probabilistic interpretation
Noisy observation model: Y = β_0 + β_1 X + ε, where ε ~ N(0, σ²) is a Gaussian random variable.
Likelihood of one training sample (x_i, y_i):

p(y_i | x_i, (β_0, β_1, σ²)) = N(β_0 + β_1 x_i, σ²)
                             = (1/√(2πσ²)) exp(−[y_i − (β_0 + β_1 x_i)]² / (2σ²))

Probabilistic interpretation
Log-likelihood of the training data D (assuming i.i.d. samples):

LL(β_0, β_1, σ²) ≜ log P(D | (β_0, β_1, σ²))
  = log Π_{i=1}^n p(y_i | x_i, (β_0, β_1, σ²))
  = Σ_i log p(y_i | x_i, (β_0, β_1, σ²))
  = Σ_i { −[y_i − (β_0 + β_1 x_i)]² / (2σ²) − log √(2πσ²) }
  = −(1/(2σ²)) Σ_i [y_i − (β_0 + β_1 x_i)]² − (n/2) log σ² − (n/2) log(2π)
  = −(1/2) { σ^{−2} Σ_i [y_i − (β_0 + β_1 x_i)]² + n log σ² } + const

What is the relationship between minimizing RSS and maximizing the log-likelihood?

Maximum likelihood estimation
Estimating σ, β_0, and β_1 can be done in two steps.
Maximize over β_0 and β_1:

argmax_{β_0, β_1} LL(β_0, β_1, σ²) ≡ argmin_{β_0, β_1} Σ_i [y_i − (β_0 + β_1 x_i)]²

That is RSS(β)!
Maximize over s = σ² (we could equivalently estimate σ directly):

∂LL(β_0, β_1, s)/∂s = −(1/2) { −(1/s²) Σ_i [y_i − (β_0 + β_1 x_i)]² + n/s } = 0
⇒ σ̂² = ŝ = (1/n) Σ_i [y_i − (β_0 + β_1 x_i)]² = RSS(β̂)/n
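
Continuing the earlier univariate sketch, the variance step is one line (note this is the MLE RSS/n; the unbiased estimator divides by n − 2 instead):

residuals = y - (beta0_hat + beta1_hat * x)
sigma2_hat = np.mean(residuals ** 2)  # MLE of sigma^2: RSS / n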

How does this probabilistic interpretation help us?
It gives a solid footing to our intuition: minimizing RSS(β) is a sensible thing to do under reasonable modeling assumptions.
Estimating σ tells us how much noise there could be in our predictions. For example, it allows us to place confidence intervals around our predictions.

Linear regression when x is m-dimensional
Probabilistic model:

y_i = β^T x_i + ε_i,   ε_i ~iid N(0, σ²)

where we have redefined some variables (by augmenting):

x ← [1 x_1 x_2 ... x_m]^T,   β ← [β_0 β_1 β_2 ... β_m]^T

The likelihood of the parameters θ = (β, σ²):

L(θ) = Π_{i=1}^n Pr(y_i | x_i, θ) = Π_{i=1}^n (1/√(2πσ²)) exp(−(y_i − β^T x_i)² / (2σ²))

Choose the parameters to maximize the likelihood.

Linear regression: maximum likelihood estimation

θ̂ = argmax_θ log L(θ)

LL(θ) ≜ log L(θ) = Σ_{i=1}^n log Pr(y_i | x_i, θ)
      = −Σ_{i=1}^n (y_i − β^T x_i)² / (2σ²) − (n/2) log(2πσ²)
      = −(1/(2σ²)) RSS(β) − (n/2) log(2πσ²)

Maximizing the likelihood is equivalent to minimizing the RSS, i.e., least squares.

Linear regression: computing the MLE
Set the derivatives to zero:

∂LL/∂σ²(θ̂) = 0,   ∇_β LL(θ̂) = 0

Linear regression: computing the MLE

∂LL/∂σ²(θ̂) = 0
⇒ ∂/∂σ² [ −(1/(2σ²)) RSS(β) − (n/2) log(2πσ²) ] = 0
⇒ (1/(2(σ²)²)) RSS(β) − n/(2σ²) = 0
⇒ σ̂² = RSS(β̂)/n

Linear regression: computing the MLE

∇_β LL(θ̂) = 0
⇒ ∇_β [ −(1/(2σ²)) RSS(β) − (n/2) log(2πσ²) ] = 0
⇒ ∇_β RSS(β) = 0

Linear regression: RSS(β) in matrix form

RSS(β) = Σ_i [y_i − (β_0 + Σ_j β_j x_{i,j})]² = Σ_i [y_i − β^T x_i]²

which leads to

RSS(β) = Σ_i (y_i − β^T x_i)(y_i − x_i^T β)
       = Σ_i { β^T x_i x_i^T β − 2 y_i x_i^T β + const }
       = β^T (Σ_i x_i x_i^T) β − 2 (Σ_i y_i x_i^T) β + const

RSS(β) in new notation
Design matrix and target vector:

X = [x_1^T; ...; x_n^T] ∈ R^{n×(m+1)},   y = (y_1, ..., y_n)^T

Compact expression:

RSS(β) = ‖Xβ − y‖²₂
       = (Xβ − y)^T (Xβ − y)
       = (β^T X^T − y^T)(Xβ − y)
       = β^T (X^T X) β − 2 (X^T y)^T β + const

Solution in matrix form
Compact expression:

RSS(β) = ‖Xβ − y‖²₂ = β^T (X^T X) β − 2 (X^T y)^T β + const

Gradients of linear and quadratic functions:

∇_x (b^T x) = b,   ∇_x (x^T A x) = 2Ax (for symmetric A)

Normal equation:

∇_β RSS(β) = 2 X^T X β − 2 X^T y = 0

This leads to the ordinary least squares (OLS) solution:

β̂ = (X^T X)^{−1} X^T y
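
In code, the normal equations are best solved directly rather than by forming an explicit inverse. A minimal sketch:

import numpy as np

def ols_fit(X, y):
    """Solve the normal equations X^T X beta = X^T y.

    np.linalg.lstsq is the more numerically stable choice in practice.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

# Augmented design matrix [1, x] for the single-SNP example:
# X = np.column_stack([np.ones_like(x, dtype=float), x])
# beta_hat = ols_fit(X, y)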

Remember the σ² parameter:

σ̂² = RSS(β̂)/n

Mini-summary
Linear regression is a linear combination of features: f: x → y with f(x) = β_0 + Σ_j β_j x_j = β_0 + β^T x.
If we minimize the residual sum of squares as our learning objective, we get a closed-form solution for the parameters.
Probabilistic interpretation: maximum likelihood, assuming the residual is Gaussian distributed.
MLE: β̂ = (X^T X)^{−1} X^T y and σ̂² = RSS(β̂)/n.

Linear regression: statistical properties of the least squares estimator

E[β̂ | X] = β,   Cov[β̂ | X] = σ² (X^T X)^{−1}

Assumptions:
The linear model is correct.
Exogenous covariates: E[ε | X] = 0.
Uncorrelated, homoskedastic errors: Cov[ε | X] = σ² I_n.
This does not assume normally distributed errors.
OLS is the best linear unbiased estimator (Gauss-Markov theorem), i.e., it has minimum variance among all linear unbiased estimators.

OLS is unbiased

y = Xβ + ε

E[β̂ | X] = E[(X^T X)^{−1} X^T y | X]
         = (X^T X)^{−1} X^T E[y | X]
         = (X^T X)^{−1} X^T E[Xβ + ε | X]
         = (X^T X)^{−1} X^T (E[Xβ | X] + E[ε | X])
         = (X^T X)^{−1} (X^T X β + X^T E[ε | X])
         = (X^T X)^{−1} X^T X β      (since E[ε | X] = 0)
         = β
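
A quick Monte Carlo check of unbiasedness (a sketch with a made-up design and parameters): averaging β̂ over many noise draws should recover β.

import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # fixed design with intercept
beta = np.array([1.0, 0.5, -0.3, 2.0])                      # hypothetical true beta

estimates = [np.linalg.solve(X.T @ X, X.T @ (X @ beta + rng.normal(0, 1.0, n)))
             for _ in range(2000)]
print(np.mean(estimates, axis=0))  # approximately [1.0, 0.5, -0.3, 2.0]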

Linear regression: statistical properties of the MLE
Exercise: use a similar argument to show that

Cov[β̂ | X] = σ² (X^T X)^{−1}

We can say more if we assume the errors are normally distributed:

β̂ ~ N(β, σ² (X^T X)^{−1})

Computational complexity
Bottleneck of computing the solution β̂ = (X^T X)^{−1} X^T y:
Forming the matrix product X^T X ∈ R^{(m+1)×(m+1)}.
Inverting the matrix X^T X.
How many operations do we need?
O(nm²) for the matrix multiplication.
O(m³) (e.g., using Gauss-Jordan elimination), or roughly O(m^2.37) with recent theoretical advances, for the matrix inversion.
Impractical for very large m or n.

Alternative method: numerical optimization
Use gradient descent (more details when we talk about logistic regression next).
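
For completeness, a minimal gradient-descent sketch for the RSS objective (the learning rate and iteration count are illustrative):

import numpy as np

def ols_gradient_descent(X, y, lr=0.1, iters=5000):
    """Minimize RSS(beta) = ||X beta - y||^2 by gradient descent."""
    beta = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(iters):
        grad = 2 * X.T @ (X @ beta - y) / n  # gradient of the mean squared error
        beta -= lr * grad
    return beta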

Mini-summary
Assuming the probabilistic model is correct, the MLE is unbiased.
Computing the MLE can be done by solving the normal equations or by numerical optimization.

Linear regression: after estimation, what?
Prediction: given a new input x, our best guess of y is

ŷ = β̂_0 + β̂^T x

Linear regression: after estimation, what?
Hypothesis testing: test whether β_j = 0 (Wald test).

β̂_j ~ N(β_j, σ² [(X^T X)^{−1}]_{j,j})

Under H_0: β_j = 0,

β̂_j ~ N(0, σ² [(X^T X)^{−1}]_{j,j})   ⇒   β̂_j / σ̂_j ~ N(0, 1)

where σ̂_j = √(σ̂² [(X^T X)^{−1}]_{j,j}).
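
A sketch of the Wald test in code (using the MLE σ̂² = RSS/n from these slides; statistical packages typically use the unbiased RSS/(n − p) instead):

import numpy as np
from scipy import stats

def wald_test(X, y):
    """Wald z-statistics and two-sided p-values for each coefficient."""
    n = len(y)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / n
    se = np.sqrt(sigma2_hat * np.diag(XtX_inv))
    z = beta_hat / se
    return z, 2 * stats.norm.sf(np.abs(z))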

Linear regression: after estimation, what?
(Generalized) likelihood ratio test; remember lecture 2.
β̂_0: the MLE with β_j = 0 (the null model).

Λ ≜ L(β̂_0) / L(β̂)
−2 log Λ = 2 [log L(β̂) − log L(β̂_0)] ~ χ²₁
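
A sketch of the likelihood ratio test for a single SNP against an intercept-only null; with Gaussian errors the statistic reduces to n log(RSS_null / RSS_alt):

import numpy as np
from scipy import stats

def lrt_single_snp(x, y):
    """LRT statistic and p-value for one SNP vs. the intercept-only model."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    rss_alt = np.sum((y - X @ beta_hat) ** 2)
    rss_null = np.sum((y - y.mean()) ** 2)
    lrt = n * np.log(rss_null / rss_alt)  # equals 2 * (LL_alt - LL_null)
    return lrt, stats.chi2.sf(lrt, df=1)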

Linear regression: model diagnostics
Do the observed statistics match the model assumptions?
Residuals: e_i = y_i − β̂^T x_i.
If the model assumptions hold, the residuals must be independent and normally distributed with equal variance. How do we check this?

Linear regression: model diagnostics
Are the residuals normally distributed (or, more generally, distributed according to a known distribution)? Use a Q-Q plot:
Plot empirical quantiles vs. theoretical quantiles. Points close to the diagonal imply the empirical distribution is close to the theoretical one.
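
A one-line sketch with scipy (the residuals would come from the fitted model in the earlier snippets):

from scipy import stats
import matplotlib.pyplot as plt

# residuals = y - X @ beta_hat
stats.probplot(residuals, dist="norm", plot=plt)  # Q-Q plot vs. normal quantiles
plt.show()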

Linear regression: model diagnostics
(Figure: two Q-Q plots of residuals vs. normal quantiles — one for normally distributed errors, one for Student-t distributed errors.)
