Guarding against Spurious Discoveries in High Dimension. Jianqing Fan

1 Guarding against Spurious Discoveries in High Dimension. Jianqing Fan, Princeton University, with Wen-Xin Zhou. September 30, 2016

2 Outline: 1 Introduction; 2 Spurious correlation and random geometry; 3 Goodness of Spurious Fit (GOSF); 4 Asymptotic Distributions of GOSF; 5 Bootstrap and LAMM Algorithm; 6 Spurious Discoveries and Model Selection.

4 Introduction

5 Data revolution. Enormous datasets are routinely collected. Biological sciences: genomics, genetics, neuroscience, medicine. Natural sciences: astronomy, earth sciences, meteorology. Engineering: machine learning, surveillance videos, social media, networks. Social sciences: economics, finance, marketing, management. These characterize many contemporary scientific and decision problems.

6 Growth of Dimensionality and Data. Dimensionality grows rapidly with interactions: the m-way interactions of p variables already number $\binom{p}{m}$. Synergy of two genes: colon cancer in Hanczar et al. (2007), e.g., $Y = I(X_1 + X_2 > 3)$ yet $Y \perp X_1$ marginally. [Figure: two-gene scatter plot; white = patients, black = normal.]

7 False Discoveries. Over the last two decades, many exciting data mining and statistical machine learning techniques have been developed. Example: Lasso and SCAD were introduced to associate a set of covariates to a response from a large pool. Can our discoveries be spurious? Are our assumptions verifiable?

10 First Illustrative Example. Data: gene expressions of 90 Asians from the HapMap project. Response: expression of CHRNA6, cholinergic receptor, nicotinic, alpha 6 (554 SNPs within 1 MB). Covariates: all other expressions (p = 47292). Lasso: picks 25 genes! $\widehat{\mathrm{corr}}(\hat y_{\mathrm{Lasso}}, y) = 0.90$, $\widehat{\mathrm{corr}}(\hat y_{\mathrm{post\text{-}Lasso}}, y) = 0.92$. Any better than chance? The reference distribution depends on n, p, $\hat s$, and X under the null.

12 Any better than chance? Reference distribution: the maximum spurious correlation
$R_n(s,p) = \max_{\|\beta\|_0 = s} \widehat{\mathrm{corr}}_n(Y, \beta^T X)$ when $X \perp Y$.
[Figure: null distributions with 95%-tiles marked; (a) validating exogeneity via Lasso, (b) validating exogeneity via SCAD.]

13 Second Illustrative Example. Data: gene expression profiles from the German Neuroblastoma Trials. Response: 3-year event-free survival (binary outcome). Covariates: gene expressions (p = 10707). We need a new way to assess correlation. Measure of goodness of fit: the likelihood ratio (LR) statistic.

15 Spurious Correlation and Random Geometry

16 Big dimension ⇒ big spurious correlation. Confirmatory: $Y, X \sim N(0,1)$ with $\mathrm{corr}(Y,X) = 0.5$; the sample correlation $\widehat{\mathrm{corr}}(Y,X)$ over 50 i.i.d. observations comes out near 0.5. Conclusion: Y and X are reasonably correlated. Selective: $Y, X_1, \dots, X_{1000}$ i.i.d. N(0,1). Based on n = 50, $\widehat{\mathrm{corr}}(Y, X_{542}) = \max_{j \le 1000} \widehat{\mathrm{corr}}(Y, X_j)$ attains a large value. Wrong conclusion: Y and $X_{542}$ are reasonably correlated. We need methods to guard against spurious discoveries; see the sketch below.
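The contrast above is easy to reproduce. Below is a minimal Python sketch (not from the talk) of the confirmatory versus selective experiment with n = 50 and p = 1000; every variable is simulated independently in the selective case, so the large correlation it finds is spurious by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 1000

# Confirmatory: X and Y truly correlated at 0.5.
x = rng.standard_normal(n)
y = 0.5 * x + np.sqrt(1 - 0.25) * rng.standard_normal(n)
print("confirmatory corr:", round(np.corrcoef(x, y)[0, 1], 3))

# Selective: Y independent of all p covariates, yet the best of them
# still shows a sizable sample correlation.
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
j_star = int(np.argmax(np.abs(corrs)))
print(f"selective: max |corr| = {abs(corrs[j_star]):.3f} at covariate {j_star}")
```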

18 Big Data ⇒ Big Assumption. Assumption: $Y = X_S^T \beta_S + \varepsilon$ with $E(\varepsilon X_j) = 0$ for $j = 1, \dots, p$. Is the exogeneity assumption verifiable? For each fixed j under the null, $\widehat{\mathrm{corr}}(X_j, \varepsilon) \approx N(0, \tfrac{1}{n})$.
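A quick sanity check of the N(0, 1/n) approximation, assuming independent standard normal $X_j$ and $\varepsilon$ (a sketch, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 5000

# Sample correlations between an exogenous covariate and the error term.
vals = np.array([np.corrcoef(rng.standard_normal(n),
                             rng.standard_normal(n))[0, 1]
                 for _ in range(reps)])
print("empirical sd:", round(vals.std(), 4),
      " vs 1/sqrt(n):", round(n ** -0.5, 4))
```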

19 Relation with Random Geometry. Throw p points uniformly on the n-dimensional sphere. Distribution of the minimum angle? Distribution of the angles? Distribution of pairwise angles? (Cai, Fan, Jiang, 13). Mutual incoherence (Donoho and Huo, 01).

22 A more general statement. Distribution of the minimum principal angle among $\binom{p}{s}$ choices?
$R_n(s,p) = \max_{S:\, |S| = s} \widehat{\mathrm{corr}}_n(\varepsilon, \beta_S^T X_S)$.
Fan, Shao & Zhou (2015) show that $n R_n(s,p)^2 \stackrel{a}{\sim} Z^2_{(1)} + \cdots + Z^2_{(s)}$, the sum of the s largest squares of p independent standard normals. What if it is an ellipse?

23 An Experiment: Distribution of Spurious Correlations. Multiple spurious correlation: $R_s = \max_{\|\beta\|_0 = s} \widehat{\mathrm{corr}}(\beta^T X, \varepsilon)$. Example: n = 50, p = 1000, s = 1, 2, 5, 10. [Figure: estimated densities of $R_s$ for s = 1, 2, 5, 10.] The increments from s to s+1 behave like the order statistics of $\chi^2_1/n$ (Fan, Shao and Zhou, 15).
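For s = 1 the claim can be checked directly by Monte Carlo: under the null, $n R_n(1,p)^2$ should match the largest of p (approximately independent, since $\Sigma = I$) $\chi^2_1$ variables. A sketch under those assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, reps = 50, 1000, 500

stat = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, p))
    eps = rng.standard_normal(n)
    Xc = (X - X.mean(0)) / X.std(0)          # standardize columns
    ec = (eps - eps.mean()) / eps.std()
    corrs = Xc.T @ ec / n                    # all p marginal correlations
    stat[r] = n * np.abs(corrs).max() ** 2   # n * R_n(1, p)^2

ref = rng.chisquare(1, size=(reps, p)).max(axis=1)  # top chi^2_1 order stat
print("empirical 95%-tile:", round(np.quantile(stat, 0.95), 2),
      " reference 95%-tile:", round(np.quantile(ref, 0.95), 2))
```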

24 Asymptotic distribution of $R_n(s,p)$. Theorem 1 (Berry-Esseen bound; Fan, Shao and Zhou, 15):
$\sup_{t \ge 0} \bigl| P\{\sqrt{n}\, R_n(s,p) \le t\} - P\{R^*(s,p) \le t\} \bigr| \lesssim \bigl[s\{\log(\gamma_s p/s) + \log n\}\bigr]^{7/8} \big/ n^{1/8}$,
where $R^*(s,p) = \max_{S \subseteq \{1,\dots,p\},\, |S| = s} \bigl(Z_S^T \Sigma_{SS}^{-1} Z_S\bigr)^{1/2}$ and $Z \sim N_p(0, \Sigma)$.
Proof tool: Gaussian approximation (Chernozhukov, Chetverikov, and Kato, 13, 14).

25 Goodness of Spurious Fit Beyond Linear Models

26 Maximum Spurious Correlation. Data: i.i.d. sample $\{(Y_i, X_i^T)\}_{i=1}^n$ with $EY^2 = \sigma^2$ and $EXX^T = \Sigma$. Maximum spurious correlation: when Y and X are independent,
$R_n(s,p) = \max_{\beta \in \mathbb{R}^p:\, \|\beta\|_0 \le s} \widehat{\mathrm{corr}}_n(Y, \beta^T X)$
is the maximum spurious correlation over s-sparse linear combinations. Asymptotics of $R_n(s,p)$: Fan, Shao and Zhou (2015).
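To make the definition concrete, the following sketch computes $R_n(s,p)$ exactly by scanning all $\binom{p}{s}$ subsets; this is feasible only for toy sizes, which is precisely why asymptotics and the LAMM algorithm are needed later in the talk.

```python
import itertools
import numpy as np

def max_spurious_corr(X, y, s):
    """Exact R_n(s, p): max multiple correlation of y with any s columns."""
    yc = y - y.mean()
    best = 0.0
    for S in itertools.combinations(range(X.shape[1]), s):
        Xs = X[:, S] - X[:, S].mean(0)
        beta, *_ = np.linalg.lstsq(Xs, yc, rcond=None)
        fit = Xs @ beta                      # best s-sparse linear fit
        r = fit @ yc / np.sqrt((fit @ fit) * (yc @ yc))
        best = max(best, r)
    return best

rng = np.random.default_rng(3)
X, y = rng.standard_normal((50, 12)), rng.standard_normal(50)  # y independent of X
print([round(max_spurious_corr(X, y, s), 3) for s in (1, 2, 3)])
```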

27 Goodness of spurious fit. Normal model: $2\{\ell(0) - \ell(\hat\beta_S)\} = -n\log(1 - R_S^2)$ is the log maximum likelihood ratio of the null versus the fitted model. Quasi-likelihood: $L_n(\beta)$ from the sample $\{(Y_i, X_i^T)\}_{i=1}^n$. Best subset: $\hat\beta(s) = \operatorname{argmin}_{\beta \in \mathbb{R}^p:\, \|\beta\|_0 \le s} L_n(\beta)$. Goodness of Spurious Fit (GOSF): if $Y \perp X$, $LR_n(s,p) = L_n(0) - L_n(\hat\beta(s))$.

29 Examples. Generalized linear models: $L_n(\beta) = \sum_{i=1}^n \{b(X_i^T\beta) - Y_i X_i^T\beta\}$, where b is a model-dependent convex function. $L_1$ regression: $L_n(\beta) = \sum_{i=1}^n |Y_i - X_i^T\beta|$. Support vector machine (hinge loss): $L_n(\beta) = \sum_{i=1}^n (1 - Y_i X_i^T\beta)_+$. AdaBoost: $L_n(\beta) = \sum_{i=1}^n \exp(-Y_i X_i^T\beta)$.
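Written out in code, the four losses above look as follows (a sketch; the logistic choice $b(u) = \log(1 + e^u)$ is one standard instance of the GLM function b):

```python
import numpy as np

def glm_logistic(beta, X, y):   # b(u) = log(1 + exp(u)); y in {0, 1}
    u = X @ beta
    return np.sum(np.logaddexp(0.0, u) - y * u)

def l1_regression(beta, X, y):  # least absolute deviations
    return np.sum(np.abs(y - X @ beta))

def svm_hinge(beta, X, y):      # hinge loss; y in {-1, +1}
    return np.sum(np.maximum(0.0, 1.0 - y * (X @ beta)))

def adaboost(beta, X, y):       # exponential loss; y in {-1, +1}
    return np.sum(np.exp(-y * (X @ beta)))
```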

30 Asymptotic Distributions of GOSF

31 Conditions for GLIM.
Cond 1: X is sub-Gaussian, $EXX^T = \Sigma$ with $\mathrm{diag}(\Sigma) = I_p$: $A_0 := \sup_{\alpha \in S^{p-1}} \|\alpha^T(\Sigma^{-1/2}X)\|_{\psi_2} < \infty$.
Cond 2: the s-sparse condition number $\gamma_s = \sqrt{\lambda_{\max}(s)/\lambda_{\min}(s)}$ is well-defined, where $\lambda_{\max}(s) = \max_{\|\alpha\|_0 \le s} \frac{\alpha^T\Sigma\alpha}{\alpha^T\alpha}$ and $\lambda_{\min}(s) = \min_{\|\alpha\|_0 \le s} \frac{\alpha^T\Sigma\alpha}{\alpha^T\alpha}$.
Cond 3: $E e^{u(Y-\mu_Y)/\sigma_Y} \le e^{a_0 u^2/2}$ for all $u \in \mathbb{R}$ and some $a_0 > 0$, where $\mu_Y = EY$ and $\sigma_Y^2 = \mathrm{var}(Y)$.
Cond 4: $\min_{|u| \le 1} b''(u) \ge a_1$ and $\max_{|u| \le 1} b''(u) \le A_1$.

33 Asymptotics of $LR_n(s,p)$ for GLIM.
$R_0(s,p) = \max_{S \subseteq \{1,\dots,p\},\, |S| = s} \|\Sigma_{SS}^{-1/2} Z_S\|_2$ with $Z \sim N_p(0,\Sigma)$; this equals $\sqrt{Z^2_{(1)} + \cdots + Z^2_{(s)}}$ when $\Sigma = I$.
Theorem 2 (Berry-Esseen bound): under Conditions 1-4,
$\sup_{t \ge 0}\bigl|P\{2\, LR_n(s,p) \le t\} - P\{R_0^2(s,p) \le t\}\bigr| \lesssim \frac{\{s\log(\gamma_s pn)\}^{7/8}}{n^{1/8}} + \frac{\gamma_s^{1/2}\{s\log(\gamma_s pn)\}^2}{n^{1/2}}$.
Wilks' theorem: when s = p, $R_0^2(s,p) \sim \chi^2_p$.

35 Linear Median Regression. Model and data: i.i.d. sample from $(Y, X^T)$ with $Y = X^T\beta^* + \varepsilon$. Loss function: $L_n(\beta) = \sum_{i=1}^n |Y_i - X_i^T\beta|$. $\ell_0$-constrained LR: $LR_n(s,p) = \sum_{i=1}^n |Y_i| - \min_{\beta \in \mathbb{R}^p:\, \|\beta\|_0 \le s} L_n(\beta)$.

36 Asymptotics of $LR_n(s,p)$.
Cond 5: $E|\varepsilon|^\kappa < \infty$ for some $\kappa > 1$; there exist constants $a_2 < (E|\varepsilon|)^{-1}$, $A_2$, and $A_3$ such that $2\max\{1 - F_\varepsilon(u), F_\varepsilon(-u)\} \ge (1 + a_2 u)^{-1}$ for all $u \ge 0$, $\max_u f_\varepsilon(u) \le A_2$, and $\max_{|u| \le 1}\max\{|f_\varepsilon'(u+)|, |f_\varepsilon'(u-)|\} \le A_3$.
Theorem 3: under Conditions 1, 2 & 5,
$\sup_{t \ge 0}\bigl|P\{2\, LR_n(s,p) \le t\} - P\{R_0^2(s,p) \le 2 f_\varepsilon(0)\, t\}\bigr| \lesssim n^{1-\kappa} + \frac{\{s\log(\gamma_s pn)\}^{7/8}}{n^{1/8}} + \frac{\gamma_s^{1/4}\{s\log(\gamma_s pn)\}^{3/2}}{n^{1/4}}$.

37 Bootstrap and LAMM Algorithm

38 Multiplier bootstrap approximation. The distribution of $R_0(s,p)$ depends on the unknown $\Sigma$. Bootstrap approximation: generate $\hat Z \sim N_p(0, \hat\Sigma)$ via $\hat Z = n^{-1/2}\sum_{i=1}^n e_i X_i$, where $e_1, \dots, e_n$ are i.i.d. N(0,1). Compute the bootstrap counterpart of $R_0(s,p)$:
$R_n^*(s,p) = \max_{S \subseteq \{1,\dots,p\},\, |S| = s} \|\hat\Sigma_{SS}^{-1/2} \hat Z_S\|_2$.
Validity (Fan, Shao, Zhou, 15): if $s\log(\gamma_s pn) = o(n^{1/5})$, then $\sup_{t \ge 0}\bigl|P\{R_0(s,p) \le t\} - P\{R_n^*(s,p) \le t \mid X_1, \dots, X_n\}\bigr| \stackrel{P}{\to} 0$.
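For s = 1 the maximum over subsets reduces to a coordinate-wise maximum, so the bootstrap is cheap. A minimal sketch, assuming the columns of X are standardized so that $\mathrm{diag}(\hat\Sigma) = I_p$:

```python
import numpy as np

def bootstrap_quantile(X, alpha=0.1, B=1000, rng=None):
    """Multiplier-bootstrap (1 - alpha)-quantile of R_0(1, p) given X."""
    rng = rng if rng is not None else np.random.default_rng()
    n, _ = X.shape
    Xs = (X - X.mean(0)) / X.std(0)       # enforce diag(Sigma_hat) = I_p
    stats = np.empty(B)
    for b in range(B):
        e = rng.standard_normal(n)         # i.i.d. N(0, 1) multipliers
        Z_hat = Xs.T @ e / np.sqrt(n)      # Z_hat ~ N_p(0, Sigma_hat) given X
        stats[b] = np.abs(Z_hat).max()     # bootstrap R_n*(1, p)
    return np.quantile(stats, 1 - alpha)

rng = np.random.default_rng(4)
X = rng.standard_normal((50, 1000))
print("bootstrap 90%-tile of R_0(1, p):",
      round(bootstrap_quantile(X, rng=rng), 3))
```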

40 An LAMM Algorithm. Goal: solve $\min_{\|\beta\|_0 \le s} f(\beta)$ with $f(\beta) = L_n(\beta)$. MM: get $\beta^{(0)}$ by forward selection; compute $\beta^{(k+1)} = \operatorname{argmin}_{\|\beta\|_0 \le s} g(\beta \mid \beta^{(k)})$, where $g(\beta \mid \beta^{(k)})$ is a majorizing function (as with LQA and LLA). Non-increasing target value:
$f(\beta^{(k+1)}) \overset{\text{majorization}}{\le} g(\beta^{(k+1)} \mid \beta^{(k)}) \overset{\text{minimization}}{\le} g(\beta^{(k)} \mid \beta^{(k)}) \overset{\text{initialization}}{=} f(\beta^{(k)})$.

42 Implementation. Majorize $f(\beta)$ at $\beta^{(k)}$ by an isotropic quadratic:
$g_\lambda(\beta \mid \beta^{(k)}) = f(\beta^{(k)}) + \langle \nabla f(\beta^{(k)}), \beta - \beta^{(k)}\rangle + \frac{\lambda}{2}\|\beta - \beta^{(k)}\|_2^2$,
valid when $\lambda \ge \max_\beta \|\nabla^2 f(\beta)\|_2$. Analytic solution:
$\beta^{(k+1)}_\lambda = \operatorname{argmin}_{\|\beta\|_0 \le s}\, g_\lambda(\beta \mid \beta^{(k)}) = \{\beta^{(k)} - \lambda^{-1}\nabla f(\beta^{(k)})\}_{[1:s]}$,
i.e., keep the s largest entries in magnitude and set the rest to zero. LAMM: inflate $\lambda_l$ until $f(\beta^{(k+1)}_{\lambda_l}) \le g_{\lambda_l}(\beta^{(k+1)}_{\lambda_l} \mid \beta^{(k)})$. A sketch follows.
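A minimal LAMM sketch under these formulas, with generic callables f and grad standing in for the quasi-likelihood and its gradient (the helper names are illustrative, not from the paper):

```python
import numpy as np

def truncate(v, s):
    """Keep the s largest entries of v in magnitude; zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-s:]
    out[keep] = v[keep]
    return out

def lamm(f, grad, beta0, s, lam=1.0, gamma=2.0, iters=100):
    # (The talk initializes beta0 by forward selection; a truncated start
    # is used here for brevity.)
    beta = truncate(beta0, s)
    for _ in range(iters):
        g0, d = f(beta), grad(beta)
        while True:                        # inflate lambda until majorized
            cand = truncate(beta - d / lam, s)
            major = g0 + d @ (cand - beta) + 0.5 * lam * np.sum((cand - beta) ** 2)
            if f(cand) <= major:
                break
            lam *= gamma
        beta = cand
    return beta

# Example: sparse least squares, f(beta) = 0.5 * ||y - X beta||^2.
rng = np.random.default_rng(5)
X, y = rng.standard_normal((50, 20)), rng.standard_normal(50)
f = lambda b: 0.5 * np.sum((y - X @ b) ** 2)
grad = lambda b: -X.T @ (y - X @ b)
print("support:", np.nonzero(lamm(f, grad, np.zeros(20), s=3))[0])
```

Because each candidate minimizes the isotropic quadratic over the s-sparse set and the current iterate is feasible, the accepted step can never increase f, which is exactly the non-increasing-target property on slide 40.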

44 Spurious Discoveries and Model Selection

45 Spurious Discoveries? How large should a model be? Pick the models that fit better than spurious fit: check whether $2\, LR_s > q^2_{n,\alpha}(s,p)$, with the quantile taken from the null distribution (via multiplier bootstrap). $q_{n,\alpha}(s,p)$ provides a critical value and a yardstick for judging whether a discovery is spurious.

46 An Empirical Illustration. Data: 246 patients from the German Neuroblastoma Trials. Response: 3-year event-free survival, 56 positives. Covariates: gene expressions over p = 10707 probe sites. Using 10-fold CV, Lasso selects $\hat s_{cv} = 40$ probes.

47 Results: Selected models and spurious discoveries. [Table: for each λ on the Lasso path, columns $\hat s_\lambda$, $(2\, LR_\lambda)^{1/2}$, $q_{n,0.1}(\hat s_\lambda, p)$, $q_{n,0.05}(\hat s_\lambda, p)$, and mean CV error.] Based on 2000 bootstrap samples.

48 Simulation Setup for Detecting Spurious Discoveries. Logistic regression model with p = 400 and n = 120, 160, 200. $\beta^* = (3, -1, 3, -1, 3, 0, \dots, 0)^T \in \mathbb{R}^p$. $X_1, \dots, X_n$ drawn from $N(0, \Sigma)$ with $\Sigma = I_p$ or $\Sigma = (0.5^{|j-k|})_{1 \le j,k \le p}$. Repeat 200 times.
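A data-generating sketch of this design (the sign pattern of β is read off the garbled slide and may differ from the original deck):

```python
import numpy as np

def simulate(n, p=400, dependent=True, rng=None):
    """Logistic model with beta = (3, -1, 3, -1, 3, 0, ..., 0)."""
    rng = rng if rng is not None else np.random.default_rng()
    beta = np.zeros(p)
    beta[:5] = [3, -1, 3, -1, 3]
    if dependent:                          # Sigma = (0.5^|j-k|)
        Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
        X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    else:                                  # Sigma = I_p
        X = rng.standard_normal((n, p))
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))
    return X, y, beta

X, y, beta = simulate(n=120, rng=np.random.default_rng(6))
print(X.shape, y.mean())
```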

49 Simulation Results. $\hat\beta_{cv}$: 5-fold cross-validated Lasso estimator; $\hat s_{cv} = \|\hat\beta_{cv}\|_0$: number of selected variables. For $\alpha \in (0,1)$, the rejection probability (power) is $P\{2\, LR \ge q^2_{n,\alpha}(\hat s_{cv}, p)\}$. With α = 10%, $q_{n,\alpha}(\hat s_{cv}, p)$ is computed via 1000 bootstrap replications.

50 Rejection probability and median model size. [Table: rejection probability and median $\hat s_{cv}$ (values in parentheses: 13.43, 11.94, 13.81, 12.69, 14.18, 14.18) for n = 120, 160, 200 under independent and dependent Σ.] Lasso with CV gives a far larger model size than the truth ($s_{\text{true}} = 5$); the bias of the Lasso forces CV to choose a smaller λ.

51 Applications to Model Selection. Select among the models that fit better than spurious discoveries. For each λ on the Lasso path, compute $LR_\lambda$ and compare with $q_{n,\alpha}(s,p)\big|_{s = \hat s_\lambda}$ for α = 0.1. Starting from the largest λ, find $\lambda_{\text{fit}}$, the smallest λ satisfying $2\, LR_\lambda \ge q^2_{n,\alpha}(\hat s_\lambda, p)$, and use the selected model; a scanning sketch follows.
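In skeleton form, the scan might look as follows; fit_path, lr_statistic, and q_alpha are hypothetical placeholders for the Lasso path solver, the GOSF statistic, and the multiplier-bootstrap quantile from the earlier slides.

```python
import numpy as np

def select_lambda_fit(lambdas, fit_path, lr_statistic, q_alpha):
    """lambdas sorted from largest to smallest; return lambda_fit, the
    smallest lambda whose model still fits better than spurious fit."""
    chosen = (None, None)
    for lam in lambdas:
        beta_hat = fit_path(lam)                 # sparse fit at this penalty
        s_hat = int(np.count_nonzero(beta_hat))
        if s_hat == 0:
            continue
        if 2.0 * lr_statistic(beta_hat) >= q_alpha(s_hat) ** 2:
            chosen = (lam, beta_hat)             # still beats spurious fit
        else:
            break                                # first failure ends the scan
    return chosen
```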

52 Simulation Results: Continued. $(n,p) = (200,400)$, $\Sigma = (0.5^{|j-k|})_{1 \le j,k \le p}$. Take α = 0.1 in $q_{n,\alpha}(s,p)$; repeat 200 times. The median of $\hat s_{\text{fit}}$ is 9 with robust SD 1.87, and the power is 100%.

53 Summary. Define a generalized goodness of spurious fit, which extends the maximum spurious correlation in linear models. Develop the asymptotic distributions of this goodness of spurious fit for GLIM and $L_1$ regression. Propose a simple and effective LAMM algorithm. Guard against spurious discoveries. Perform model selection from the perspective of guarding against spurious discoveries.

55 The End. Thank you! Fan, J. and Zhou, W.-X. (2016). Guarding against Spurious Discoveries in High Dimension. Preprint.
