Association studies and regression

CM226: Machine Learning for Bioinformatics, Fall 2016
Sriram Sankararaman
Acknowledgments: Fei Sha, Ameet Talwalkar

Administration
HW1 will be posted today. Due in two weeks.
Send me an email if you are still not enrolled!

Review of last lecture
Basics of statistical inference: parameter estimation, hypothesis testing, interval estimation.
Different types of mistakes; multiple hypothesis testing.
We would like to control FWER and FDR:
FWER = P(at least one false positive). The Bonferroni procedure controls FWER.
FDR = expected fraction of false discoveries. The Benjamini-Hochberg procedure controls FDR.

Outline
Motivation
Population Association Studies and GWAS
Linear regression: univariate solution; probabilistic interpretation; statistical properties of the MLE; computational and numerical optimization
Logistic regression: general setup; maximum likelihood estimation; gradient descent; Newton's method
Application to GWAS

Personalized genomics

Path to Personalized Genomics
Basic biology: phenotype is a function of genotype and environment, i.e., learn P = f(G, E).
How much of the phenotype is genetic vs. environmental?
Find the genetic and environmental factors associated with the phenotype (finding drug targets).
Disease prediction: given genetic data, predict the phenotype for an individual.

Finding genetic factors that influence a phenotype
We will be working with a population of individuals:
Genotypes of a population of individuals: SNPs (most common).
Phenotypes in the same set of individuals: binary (disease vs. healthy), ordinal (number of children, years of schooling), continuous (height, gene expression), or heterogeneous and high-dimensional (images, text, videos).

Population Association Studies
Unrelated population of individuals measured for a phenotype.

Population Association Studies
Find an association between a SNP and a phenotype. Really, we want to find a SNP that is causal.
Simplest form: does the genotype at a single SNP predict the phenotype?

Recall some definitions from lecture 1
Locus: position along the chromosome (could be a single base or longer).
Allele: set of variants at a locus.
Genotype: sequence of alleles along the loci of an individual.
Individual 1: (1, CT), (2, GG)
Individual 2: (1, TT), (2, GA)

Recall some definitions from lecture 1
Pick one allele as the reference; the other allele is called the alternate allele.
Represent a genotype as the number of copies of the reference allele, so each genotype at a single base can be 0/1/2.
Locus 1: C is the reference.
Individual 1 (genotype CT) has x_{1,1} = 1.
Individual 2 (genotype TT) has x_{2,1} = 0.
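
As a small illustration (a sketch, not course code; the genotype strings and reference alleles are made up), counting copies of the reference allele gives the 0/1/2 encoding:

import numpy as np

# Hypothetical unphased genotype calls for two individuals at two loci.
genotypes = [["CT", "GG"], ["TT", "GA"]]
reference = ["C", "G"]  # assumed reference allele at each locus

# x[i, j] = number of copies of the reference allele for individual i at locus j
x = np.array([[g.count(ref) for g, ref in zip(ind, reference)]
              for ind in genotypes])
print(x)  # [[1 2]
          #  [0 1]]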

Genome-wide Association Studies (GWAS)
Perform a population association study across the genome.

GWAS
Simplest form: univariate regression between a SNP and a continuous phenotype.
y_i: phenotype for individual i
x_{i,j}: genotype for individual i at SNP j

Linear regression
Goal: predict a continuous-valued output from inputs. (Classification refers to binary outputs.)
Output (response): phenotype. Input (covariate, predictor): genotype.
It is common practice to test the genotype at a single SNP (univariate regression). We will not make that assumption for now.

GWAS: single-SNP association testing
(Figure: phenotype plotted against genotype, AA / AG / GG.)
Model: phenotype = mean phenotype + effect size × genotype + noise.
Learn parameters that predict the phenotype well, measured by the squared difference (actual − predicted phenotype)².
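
To make the model concrete, here is a minimal simulation sketch (the allele frequency, effect size, and noise level are illustrative, not from the lecture):

import numpy as np

rng = np.random.default_rng(0)
n = 1000
maf = 0.3                            # assumed minor allele frequency
x = rng.binomial(2, maf, size=n)     # genotypes in {0, 1, 2}
beta0, beta1, sigma = 1.0, 0.5, 1.0  # hypothetical mean, effect size, noise sd
y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)  # mean + effect*genotype + noise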

Linear regression
Output: y ∈ R. Input: x ∈ R^m.
Find f: x → y that predicts y well.
Assume f(x) = β_0 + β^T x, which is linear in the parameters (hence the name).
β_0 is called the intercept or bias; β = (β_1, ..., β_m) are the weights, parameters, or parameter vector. Sometimes β = (β_0, ..., β_m) is called the parameters.
What does "well" mean?

Linear regression: setup
We have labeled (training) data D = {(x_i, y_i), i = 1, ..., n}.
i indexes individuals; j indexes SNPs.
y_i: phenotype of individual i.
x_{i,j}: genotype of individual i at SNP j, with x_{i,j} ∈ {0, 1, 2}.

Linear regression: what does "well" mean?
Minimize the residual sum of squares (RSS):

RSS(β) = Σ_{i=1}^n (y_i − f(x_i))²
       = Σ_{i=1}^n (y_i − (β_0 + β^T x_i))²
       = Σ_{i=1}^n (y_i − (β_0 + Σ_{j=1}^m β_j x_{i,j}))²

Ordinary least squares (OLS) estimator: β_OLS = argmin_β RSS(β).

A simple case: x is one-dimensional (m = 1)
We denote x_i = x_{i,1} (a scalar). Residual sum of squares:

RSS(β) = Σ_i [y_i − f(x_i)]² = Σ_i [y_i − (β_0 + β_1 x_i)]²

Identify stationary points by taking derivatives with respect to the parameters and setting them to zero:

∂RSS(β)/∂β_0 = 0  ⇒  −2 Σ_i [y_i − (β_0 + β_1 x_i)] = 0
∂RSS(β)/∂β_1 = 0  ⇒  −2 Σ_i [y_i − (β_0 + β_1 x_i)] x_i = 0

Simplify these expressions to get the normal equations:

Σ_i y_i = n β_0 + β_1 Σ_i x_i
Σ_i x_i y_i = β_0 Σ_i x_i + β_1 Σ_i x_i²

We have two equations and two unknowns! Some algebra gives:

β_1 = Σ_i (x_i − x̄)(y_i − ȳ) / Σ_i (x_i − x̄)²   and   β_0 = ȳ − β_1 x̄

where x̄ = (1/n) Σ_i x_i and ȳ = (1/n) Σ_i y_i.
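
These closed-form expressions translate directly into code. A minimal sketch, reusing the simulated x and y from the earlier snippet:

import numpy as np

def ols_univariate(x, y):
    """Closed-form OLS for y = beta0 + beta1 * x."""
    xbar, ybar = x.mean(), y.mean()
    beta1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    beta0 = ybar - beta1 * xbar
    return beta0, beta1

beta0_hat, beta1_hat = ols_univariate(x.astype(float), y)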

Why is minimizing RSS sensible? A probabilistic interpretation
Noisy observation model: Y = β_0 + β_1 X + ε, where ε ~ N(0, σ²) is a Gaussian random variable.
Likelihood of one training sample (x_i, y_i):

p(y_i | x_i, (β_0, β_1, σ²)) = N(β_0 + β_1 x_i, σ²)
                             = (1/√(2πσ²)) exp(−[y_i − (β_0 + β_1 x_i)]² / (2σ²))

Probabilistic interpretation
Log-likelihood of the training data D (assuming i.i.d. samples):

LL(β_0, β_1, σ²) ≜ log P(D | (β_0, β_1, σ²))
  = log Π_{i=1}^n p(y_i | x_i, (β_0, β_1, σ²))
  = Σ_i log p(y_i | x_i, (β_0, β_1, σ²))
  = Σ_i { −[y_i − (β_0 + β_1 x_i)]² / (2σ²) − log √(2πσ²) }
  = −(1/(2σ²)) Σ_i [y_i − (β_0 + β_1 x_i)]² − (n/2) log σ² − (n/2) log(2π)
  = −(1/2) { σ^{−2} Σ_i [y_i − (β_0 + β_1 x_i)]² + n log σ² } + const

What is the relationship between minimizing RSS and maximizing the log-likelihood?

Maximum likelihood estimation
Estimating σ, β_0, and β_1 can be done in two steps.
Maximize over β_0 and β_1:

argmax_{β_0, β_1} LL(β_0, β_1, σ²) ≡ argmin_{β_0, β_1} Σ_i [y_i − (β_0 + β_1 x_i)]²

That is RSS(β)!
Maximize over s = σ² (we could equivalently estimate σ directly):

∂LL(β_0, β_1, s)/∂s = −(1/2) { −(1/s²) Σ_i [y_i − (β_0 + β_1 x_i)]² + n/s } = 0
⇒ σ̂² = ŝ = (1/n) Σ_i [y_i − (β_0 + β_1 x_i)]² = RSS(β̂)/n
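
Continuing the earlier univariate sketch, the variance step is one line (note this is the MLE RSS/n; the unbiased estimator divides by n − 2 instead):

residuals = y - (beta0_hat + beta1_hat * x)
sigma2_hat = np.mean(residuals ** 2)  # MLE of sigma^2: RSS / n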

How does this probabilistic interpretation help us?
It gives a solid footing to our intuition: minimizing RSS(β) is a sensible thing to do under reasonable modeling assumptions.
Estimating σ tells us how much noise there could be in our predictions. For example, it allows us to place confidence intervals around our predictions.

Linear regression when x is m-dimensional
Probabilistic model:

y_i = β^T x_i + ε_i,   ε_i ~iid N(0, σ²)

where we have redefined some variables (by augmenting):

x ← [1 x_1 x_2 ... x_m]^T,   β ← [β_0 β_1 β_2 ... β_m]^T

The likelihood of the parameters θ = (β, σ²):

L(θ) = Π_{i=1}^n Pr(y_i | x_i, θ) = Π_{i=1}^n (1/√(2πσ²)) exp(−(y_i − β^T x_i)² / (2σ²))

Choose the parameters to maximize the likelihood.

Linear regression: maximum likelihood estimation

θ̂ = argmax_θ log L(θ)

LL(θ) ≜ log L(θ) = Σ_{i=1}^n log Pr(y_i | x_i, θ)
      = −Σ_{i=1}^n (y_i − β^T x_i)² / (2σ²) − (n/2) log(2πσ²)
      = −(1/(2σ²)) RSS(β) − (n/2) log(2πσ²)

Maximizing the likelihood is equivalent to minimizing the RSS, i.e., least squares.

Linear regression: computing the MLE
Set the derivatives to zero:

∂LL/∂σ²(θ̂) = 0,   ∇_β LL(θ̂) = 0

Linear regression: computing the MLE

∂LL/∂σ²(θ̂) = 0
⇒ ∂/∂σ² [ −(1/(2σ²)) RSS(β) − (n/2) log(2πσ²) ] = 0
⇒ (1/(2(σ²)²)) RSS(β) − n/(2σ²) = 0
⇒ σ̂² = RSS(β̂)/n

Linear regression: computing the MLE

∇_β LL(θ̂) = 0
⇒ ∇_β [ −(1/(2σ²)) RSS(β) − (n/2) log(2πσ²) ] = 0
⇒ ∇_β RSS(β) = 0

Linear regression: RSS(β) in matrix form

RSS(β) = Σ_i [y_i − (β_0 + Σ_j β_j x_{i,j})]² = Σ_i [y_i − β^T x_i]²

which leads to

RSS(β) = Σ_i (y_i − β^T x_i)(y_i − x_i^T β)
       = Σ_i { β^T x_i x_i^T β − 2 y_i x_i^T β + const }
       = β^T (Σ_i x_i x_i^T) β − 2 (Σ_i y_i x_i^T) β + const

RSS(β) in new notation
Design matrix and target vector:

X = [x_1^T; ...; x_n^T] ∈ R^{n×(m+1)},   y = (y_1, ..., y_n)^T

Compact expression:

RSS(β) = ‖Xβ − y‖²₂
       = (Xβ − y)^T (Xβ − y)
       = (β^T X^T − y^T)(Xβ − y)
       = β^T (X^T X) β − 2 (X^T y)^T β + const

Solution in matrix form
Compact expression:

RSS(β) = ‖Xβ − y‖²₂ = β^T (X^T X) β − 2 (X^T y)^T β + const

Gradients of linear and quadratic functions:

∇_x (b^T x) = b,   ∇_x (x^T A x) = 2Ax (for symmetric A)

Normal equation:

∇_β RSS(β) = 2 X^T X β − 2 X^T y = 0

This leads to the ordinary least squares (OLS) solution:

β̂ = (X^T X)^{−1} X^T y
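
In code, the normal equations are best solved directly rather than by forming an explicit inverse. A minimal sketch:

import numpy as np

def ols_fit(X, y):
    """Solve the normal equations X^T X beta = X^T y.

    np.linalg.lstsq is the more numerically stable choice in practice.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

# Augmented design matrix [1, x] for the single-SNP example:
# X = np.column_stack([np.ones_like(x, dtype=float), x])
# beta_hat = ols_fit(X, y)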

Remember the σ² parameter:

σ̂² = RSS(β̂)/n

Mini-summary
Linear regression is a linear combination of features: f: x → y with f(x) = β_0 + Σ_j β_j x_j = β_0 + β^T x.
If we minimize the residual sum of squares as our learning objective, we get a closed-form solution for the parameters.
Probabilistic interpretation: maximum likelihood, assuming the residual is Gaussian distributed.
MLE: β̂ = (X^T X)^{−1} X^T y and σ̂² = RSS(β̂)/n.

Linear regression: statistical properties of the least squares estimator

E[β̂ | X] = β,   Cov[β̂ | X] = σ² (X^T X)^{−1}

Assumptions:
The linear model is correct.
Exogenous covariates: E[ε | X] = 0.
Uncorrelated, homoskedastic errors: Cov[ε | X] = σ² I_n.
This does not assume normally distributed errors.
OLS is the best linear unbiased estimator (Gauss-Markov theorem), i.e., it has minimum variance among all linear unbiased estimators.

OLS is unbiased

y = Xβ + ε

E[β̂ | X] = E[(X^T X)^{−1} X^T y | X]
         = (X^T X)^{−1} X^T E[y | X]
         = (X^T X)^{−1} X^T E[Xβ + ε | X]
         = (X^T X)^{−1} X^T (E[Xβ | X] + E[ε | X])
         = (X^T X)^{−1} (X^T X β + X^T E[ε | X])
         = (X^T X)^{−1} X^T X β      (since E[ε | X] = 0)
         = β
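
A quick Monte Carlo check of unbiasedness (a sketch with a made-up design and parameters): averaging β̂ over many noise draws should recover β.

import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # fixed design with intercept
beta = np.array([1.0, 0.5, -0.3, 2.0])                      # hypothetical true beta

estimates = [np.linalg.solve(X.T @ X, X.T @ (X @ beta + rng.normal(0, 1.0, n)))
             for _ in range(2000)]
print(np.mean(estimates, axis=0))  # approximately [1.0, 0.5, -0.3, 2.0]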

Linear regression: statistical properties of the MLE
Exercise: use a similar argument to show that

Cov[β̂ | X] = σ² (X^T X)^{−1}

We can say more if we assume the errors are normally distributed:

β̂ ~ N(β, σ² (X^T X)^{−1})

Computational complexity
Bottleneck of computing the solution β̂ = (X^T X)^{−1} X^T y:
Forming the matrix product X^T X ∈ R^{(m+1)×(m+1)}.
Inverting the matrix X^T X.
How many operations do we need?
O(nm²) for the matrix multiplication.
O(m³) (e.g., using Gauss-Jordan elimination), or roughly O(m^2.37) with recent theoretical advances, for the matrix inversion.
Impractical for very large m or n.

Alternative method: numerical optimization
Use gradient descent (more details when we talk about logistic regression next).
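
For completeness, a minimal gradient-descent sketch for the RSS objective (the learning rate and iteration count are illustrative):

import numpy as np

def ols_gradient_descent(X, y, lr=0.1, iters=5000):
    """Minimize RSS(beta) = ||X beta - y||^2 by gradient descent."""
    beta = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(iters):
        grad = 2 * X.T @ (X @ beta - y) / n  # gradient of the mean squared error
        beta -= lr * grad
    return beta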

Mini-summary
Assuming the probabilistic model is correct, the MLE is unbiased.
Computing the MLE can be done by solving the normal equations or by numerical optimization.

Linear regression: after estimation, what?
Prediction: given a new input x, our best guess of y is

ŷ = β̂_0 + β̂^T x

Linear regression: after estimation, what?
Hypothesis testing: test whether β_j = 0 (Wald test).

β̂_j ~ N(β_j, σ² [(X^T X)^{−1}]_{j,j})

Under H_0: β_j = 0,

β̂_j ~ N(0, σ² [(X^T X)^{−1}]_{j,j})   ⇒   β̂_j / σ̂_j ~ N(0, 1)

where σ̂_j = √(σ̂² [(X^T X)^{−1}]_{j,j}).
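
A sketch of the Wald test in code (using the MLE σ̂² = RSS/n from these slides; statistical packages typically use the unbiased RSS/(n − p) instead):

import numpy as np
from scipy import stats

def wald_test(X, y):
    """Wald z-statistics and two-sided p-values for each coefficient."""
    n = len(y)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / n
    se = np.sqrt(sigma2_hat * np.diag(XtX_inv))
    z = beta_hat / se
    return z, 2 * stats.norm.sf(np.abs(z))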

Linear regression: after estimation, what?
(Generalized) likelihood ratio test; remember lecture 2.
β̂_0: the MLE with β_j = 0 (the null model).

Λ ≜ L(β̂_0) / L(β̂)
−2 log Λ = 2 [log L(β̂) − log L(β̂_0)] ~ χ²₁
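
A sketch of the likelihood ratio test for a single SNP against an intercept-only null; with Gaussian errors the statistic reduces to n log(RSS_null / RSS_alt):

import numpy as np
from scipy import stats

def lrt_single_snp(x, y):
    """LRT statistic and p-value for one SNP vs. the intercept-only model."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    rss_alt = np.sum((y - X @ beta_hat) ** 2)
    rss_null = np.sum((y - y.mean()) ** 2)
    lrt = n * np.log(rss_null / rss_alt)  # equals 2 * (LL_alt - LL_null)
    return lrt, stats.chi2.sf(lrt, df=1)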

Linear regression: model diagnostics
Do the observed statistics match the model assumptions?
Residuals: e_i = y_i − β̂^T x_i.
If the model assumptions hold, the residuals must be independent and normally distributed with equal variance. How do we check this?

Linear regression: model diagnostics
Are the residuals normally distributed (or, more generally, distributed according to a known distribution)? Use a Q-Q plot:
Plot empirical quantiles vs. theoretical quantiles. Points close to the diagonal imply the empirical distribution is close to the theoretical one.
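
A one-line sketch with scipy (the residuals would come from the fitted model in the earlier snippets):

from scipy import stats
import matplotlib.pyplot as plt

# residuals = y - X @ beta_hat
stats.probplot(residuals, dist="norm", plot=plt)  # Q-Q plot vs. normal quantiles
plt.show()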

Linear regression: model diagnostics
(Figure: two Q-Q plots of residuals vs. normal quantiles — one for normally distributed errors, one for Student-t distributed errors.)
