Selecting explanatory variables with the modified version of Bayesian Information Criterion


1 Selecting explanatory variables with the modified version of Bayesian Information Criterion

Institute of Mathematics and Computer Science, Wrocław University of Technology, Poland

in cooperation with:
J.K. Ghosh, R.W. Doerge, R. Cheng (Purdue University)
A. Baierl, F. Frommlet, A. Futschik (Vienna University)
A. Chakrabarti (Indian Statistical Institute)
P. Biecek, A. Ochman, M. Żak (Wrocław University of Technology)

Vienna, 24/07/2008

5 Searching large databases

Y - the quantitative variable of interest (fruit size, survival time, process yield)
Aim: identify the factors influencing Y
Properties of the database: the number of potential factors, m, may be much larger than the number of cases, n
Assumption of sparsity: only a small proportion of the potential explanatory variables influences Y

6 Specific application - Locating Quantitative Trait Loci

11 Data for QTL mapping in backcross populations and recombinant inbred lines

Only two genotypes are possible at a given locus
X_ij - dummy variable encoding the genotype of the i-th individual at locus j, X_ij ∈ {-1/2, 1/2}
Multiple regression model:

    Y_i = β_0 + Σ_{j=1}^m β_j X_ij + ε_i,    (0.1)

where i ∈ {1, ..., n} and ε_i ~ N(0, σ²)
Problem: estimation of the number of influential genes

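As an aside not in the talk, model (0.1) is easy to exercise numerically. The sketch below (all sizes, loci and effect values are illustrative choices) simulates a backcross-style design with X_ij ∈ {-1/2, 1/2}, a sparse β, and fits the full model by least squares.

```python
# Minimal sketch (illustrative parameters): simulate a backcross-style design
# and fit model (0.1), Y_i = beta_0 + sum_j beta_j X_ij + eps_i, by least squares.
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 50                                   # n cases, m candidate loci (m < n here)
X = rng.choice([-0.5, 0.5], size=(n, m))         # genotype coding X_ij in {-1/2, 1/2}
beta = np.zeros(m)
beta[[3, 17, 31]] = [1.0, -0.8, 1.2]             # sparsity: only 3 loci influence Y
y = 2.0 + X @ beta + rng.normal(0.0, 1.0, size=n)   # beta_0 = 2, sigma = 1

A = np.column_stack([np.ones(n), X])             # intercept + all candidate loci
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef[0])                                   # estimate of the intercept beta_0
print(np.argsort(-np.abs(coef[1:]))[:3])         # loci with the 3 largest |beta_j| estimates
```

With m much larger than n, as in the databases above, this full least-squares fit is no longer available, which is what motivates the model selection criteria that follow.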
17 Bayesian Information Criterion (1)

M_i - the i-th linear model, with k_i < n regressors
θ_i = (β_0, β_1, ..., β_{k_i}, σ) - vector of model parameters
Bayesian Information Criterion (Schwarz, 1978) - maximize

    BIC = log L(Y | M_i, θ̂_i) - (1/2) k_i log n

If m is fixed, n → ∞ and X'X/n → Q, where Q is a positive definite matrix, then BIC is consistent - the probability of choosing the proper model converges to 1.
When n ≥ 8, BIC never chooses more regressors than AIC and is usually considered one of the most restrictive model selection criteria.
Surprise? - Broman and Speed (JRSS B, 2002) report that BIC overestimates the number of regressors when applied to QTL mapping.

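The criterion above can be scored from the residual sum of squares alone, since the Gaussian log-likelihood at the MLE is -(n/2)(log(2π·RSS/n) + 1). A minimal sketch, with illustrative data and candidate models that are not from the talk:

```python
# Sketch (illustrative data): scoring BIC for Gaussian linear models.
import numpy as np

def bic(y, A):
    """log L(Y | M, theta_hat) - (1/2) k log n, where A holds an intercept
    column plus k regressors and the likelihood is profiled over (beta, sigma):
    log L = -(n/2) (log(2 pi RSS / n) + 1)."""
    n, k = A.shape[0], A.shape[1] - 1
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ coef) ** 2)
    return -0.5 * n * (np.log(2 * np.pi * rss / n) + 1) - 0.5 * k * np.log(n)

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
junk = rng.normal(size=(n, 10))                  # 10 irrelevant regressors
y = 1.0 + 2.0 * x + rng.normal(size=n)           # only x is a true regressor
ones = np.ones(n)
small = bic(y, np.column_stack([ones, x]))       # true model
big = bic(y, np.column_stack([ones, x, junk]))   # overfitted model
print(small > big)   # the (1/2) log n per-regressor penalty rejects the junk
```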
21 Explanation - Bayesian roots of BIC (1)

f(θ_i) - prior density of θ_i, π(M_i) - prior probability of M_i
m_i(Y) = ∫ L(Y | M_i, θ_i) f(θ_i) dθ_i - integrated likelihood of the data given the model M_i
Posterior probability of M_i: P(M_i | Y) ∝ m_i(Y) π(M_i)
BIC neglects π(M_i) and uses the approximation

    log m_i(Y) = log L(Y | M_i, θ̂_i) - (1/2)(k_i + 2) log n + R_i,

where R_i is bounded in n.

24 Explanation - Bayesian roots of BIC (2)

Neglecting π(M_i) amounts to assuming that all models have the same prior probability, which in turn assigns a large prior probability to the event that the true model contains approximately m/2 regressors.
Example: for m = 200 there are 200 models with one regressor, C(200, 2) = 19900 models with two regressors, and C(200, 100) ≈ 9 × 10^58 models with 100 regressors.

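The model counts quoted above can be checked directly with exact binomial coefficients:

```python
# Verifying the model counts for m = 200 candidate regressors.
from math import comb

print(comb(200, 1))              # 200 models with one regressor
print(comb(200, 2))              # 19900 models with two regressors
print(f"{comb(200, 100):.2e}")   # ~9e+58 models with 100 regressors
```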
28 Modified version of BIC, mBIC (1)

M. Bogdan, J.K. Ghosh, R.W. Doerge, Genetics (2004)
Proposed solution - supplementing BIC with an informative prior distribution on the set of possible models, as proposed in George and McCulloch (1993)
p - prior probability that a randomly chosen regressor influences Y

    π(M_i) = p^{k_i} (1 - p)^{m - k_i}
    log π(M_i) = m log(1 - p) - k_i log((1 - p)/p)

The modified version of BIC recommends choosing the model maximizing

    log L(Y | M_i, θ̂_i) - (1/2) k_i log n - k_i log((1 - p)/p)

32 mBIC (2)

c = mp - expected number of true regressors

    mBIC = log L(Y | M_i, θ̂_i) - (1/2) k_i log n - k_i log(m/c - 1)

The standard version of mBIC uses c = 4 to control the overall type I error at a level below 10%.
A similar log m penalty also appears in the RIC of Foster and George (1994).

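The mBIC score above differs from BIC only by the extra k·log(m/c - 1) term. A minimal sketch of the score as a function, with an illustrative comparison of the two per-regressor penalties (n = 200, m = 300 and c = 4 are example values):

```python
# Sketch: mBIC score for a Gaussian linear model with m candidate regressors.
import numpy as np

def mbic(y, A, m, c=4.0):
    """log L(Y | M, theta_hat) - (1/2) k log n - k log(m/c - 1), where A holds
    an intercept column plus k of the m candidate regressors."""
    n, k = A.shape[0], A.shape[1] - 1
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ coef) ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    return loglik - 0.5 * k * np.log(n) - k * np.log(m / c - 1)

# Per-regressor penalties for n = 200, m = 300, c = 4:
print(round(0.5 * np.log(200), 2))     # 2.65 - the BIC part
print(round(np.log(300 / 4 - 1), 2))   # 4.3  - the additional mBIC part
```

For these values the sparsity prior contributes more to the penalty than the sample-size term, which is exactly how mBIC counteracts BIC's overestimation when m is large.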
37 Relationship to multiple testing (1)

Orthogonal design:

    X^T X = n I_{(m+1)×(m+1)},    (1)

BIC chooses those X_j's for which n β̂_j² / σ² > log n
Under H_0j: β_j = 0, Z_j = √n β̂_j / σ ~ N(0, 1)
Since for c > 0, 1 - Φ(c) = (φ(c)/c)(1 + o_c), where o_c → 0 as c → ∞, it holds that for large values of n

    α_n = 2 P(Z_j > √(log n)) ≈ √2 / √(π n log n).

39 Relationship to multiple testing (2)

When n and m go to infinity and the number of true signals remains fixed, the expected number of false discoveries is of the rate m / √(n log n).
Corollary: BIC is not consistent when m ∝ √(n log n).

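This rate can be illustrated by direct simulation (the parameter values below are illustrative): under the global null each Z_j is N(0, 1) and BIC keeps regressor j when Z_j² > log n, so the average false-positive count should be of order m / √(n log n).

```python
# Sketch: expected number of BIC false positives under the global null,
# orthogonal design, compared with the asymptotic approximation above.
import numpy as np

rng = np.random.default_rng(2)
n, m, reps = 1000, 5000, 200
thr = np.sqrt(np.log(n))                     # BIC keeps j when |Z_j| > sqrt(log n)
fp = np.mean([(np.abs(rng.normal(size=m)) > thr).sum() for _ in range(reps)])
approx = m * np.sqrt(2.0) / np.sqrt(np.pi * n * np.log(n))   # m * alpha_n
print(fp, approx)   # both of order m / sqrt(n log n)
```

The simulated count sits somewhat below the leading-order approximation (the 1 + o_c factor), but both grow linearly in m, which is the source of BIC's inconsistency for large m.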
44 Bonferroni correction for multiple testing: α_{n,m} = α_n / m

Probability of detecting at least one false positive: FWER ≤ α_n

    2(1 - Φ(c_Bon)) = α_n / m
    c_Bon = √(2 log(m/α_n)) (1 + o_{n,m}) = √(log n + 2 log m) (1 + o_{n,m}),

where o_{n,m} converges to zero when n or m tends to infinity.

    c_mBIC = √(log n + 2 log(m/c - 1)) ≈ √(log n + 2 log m - 2 log c)

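A numerical comparison of the exact Bonferroni cutoff, its approximation, and the mBIC cutoff above (n, m, α and c are illustrative choices, with α playing the role of α_n):

```python
# Sketch: exact Bonferroni cutoff vs. its approximation vs. the mBIC cutoff.
from math import log, sqrt
from statistics import NormalDist

n, m, alpha, c = 200, 300, 0.1, 4.0
z = NormalDist()
c_bon = z.inv_cdf(1 - alpha / (2 * m))       # solves 2(1 - Phi(c)) = alpha/m
c_bon_approx = sqrt(2 * log(m / alpha))      # leading-order approximation
c_mbic = sqrt(log(n) + 2 * log(m / c - 1))   # mBIC cutoff
print(round(c_bon, 3), round(c_bon_approx, 3), round(c_mbic, 3))
```

The three cutoffs are of the same order, which is the point of this slide: mBIC behaves like a Bonferroni-corrected test with a data-independent significance level.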
48 Properties of mBIC

1. FWER ⪅ √2 c / √(π n (log n + 2 log m - 2 log c))
2. The power of detecting an explanatory variable with β_j ≠ 0 is given by

    1 - P(-√n β_j/σ - c_mBIC < √n(β̂_j - β_j)/σ < -√n β_j/σ + c_mBIC)
      > 1 - Φ(c_mBIC - √n β_j/σ) → 1,

Corollary: independently of the choice of c, mBIC is consistent.
The standard version of mBIC uses c = 4 to control FWER at a level below 10% when n ≥ 200.

52 Asymptotic optimality of mBIC (1)

γ_0 - cost of a false discovery, γ_A - cost of missing a true signal

    β_j ~ (1 - p) δ_0 + p N(0, τ²)

Expected value of the experiment cost:

    R = m(γ_0 t_1 (1 - p) + γ_A t_2 p),

where t_1 and t_2 are the type I and type II error probabilities.
Optimal rule (Bayes oracle):

    f_A(β̂_j) / f_0(β̂_j) > (1 - p) γ_0 / (p γ_A),

where f_A(β̂_j) ~ N(0, τ² + σ²/n) and f_0(β̂_j) ~ N(0, σ²/n)

56 Asymptotic optimality of mBIC (2)

Bayes oracle:

    n β̂_j² / σ² > ((σ² + n τ²)/(n τ²)) [log((n τ² + σ²)/σ²) + 2 log((1 - p)/p) + 2 log(γ_0/γ_A)].

Asymptotic optimality: the model selection rule V is asymptotically optimal if lim_{n,m→∞} R_V / R_BO = 1.
Theorem 1 (Bogdan, Chakrabarti, Ghosh, 2008). Under the orthogonal design (1), mBIC is asymptotically optimal when lim_{m→∞} mp = s, where s ∈ R.
Conjecture (Frommlet, Bogdan, 2008). Theorem 1 holds also when β_j ~ (1 - p) δ_0 + p F_A, where F_A has a positive density at 0.

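The oracle threshold above can be compared numerically with the mBIC threshold on the same n β̂_j²/σ² scale. All parameter values below (τ, σ, the costs, and p = c/m so that mp = c) are illustrative assumptions, not values from the talk:

```python
# Sketch: Bayes-oracle threshold vs. mBIC threshold on the n*betahat^2/sigma^2 scale.
from math import log

n, m, c = 200, 300, 4.0
sigma, tau = 1.0, 1.5
p = c / m                        # prior inclusion probability, so mp = c
g0, gA = 1.0, 1.0                # equal misclassification costs

oracle = ((sigma**2 + n * tau**2) / (n * tau**2)) * (
    log((n * tau**2 + sigma**2) / sigma**2)
    + 2 * log((1 - p) / p)
    + 2 * log(g0 / gA)
)
mbic_thr = log(n) + 2 * log(m / c - 1)   # mBIC rejects when n*betahat^2/sigma^2 exceeds this
print(round(oracle, 2), round(mbic_thr, 2))
```

For these values the two thresholds are close, which is the content of Theorem 1: with mp bounded, mBIC mimics the Bayes oracle asymptotically.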
58 Computer simulations (1)

Setting: n = 200, m = 300, entries of X ~ N(0, σ = 0.5), k ~ Binomial(m, p) with p = 1/30 (mp = 10), β_i ~ N(0, σ = 1.5), ε ~ N(0, 1), and Tukey's gross error model: ε ~ Tukey(0.95, 100, 1) = 0.95 N(0, 1) + 0.05 N(0, 10).
Characteristics: Power, FDR = FP/AP, MR = FP + FN, l₂ = Σ_{j=1}^m (β_j - β̂_j)², and d - the mean value of the absolute prediction error based on 50 additional observations.

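A compressed sketch of one replication of this setting, using greedy forward selection under the mBIC score (the actual study used 1000 replications and also evaluated rBIC and the Tukey noise; the greedy search strategy here is an illustrative choice):

```python
# Sketch: one replication of the simulation setting, forward selection under mBIC.
import numpy as np

rng = np.random.default_rng(3)
n, m, c = 200, 300, 4.0
X = rng.normal(0.0, 0.5, size=(n, m))
true = rng.random(m) < 1 / 30                 # k ~ Binomial(m, 1/30)
beta = np.where(true, rng.normal(0.0, 1.5, size=m), 0.0)
y = X @ beta + rng.normal(size=n)             # N(0, 1) noise

def mbic_score(cols):
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    rss = np.sum((y - A @ np.linalg.lstsq(A, y, rcond=None)[0]) ** 2)
    k = len(cols)
    return (-0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
            - 0.5 * k * np.log(n) - k * np.log(m / c - 1))

chosen, best = [], mbic_score([])
while True:                                   # greedy forward search
    score, j = max((mbic_score(chosen + [j]), j)
                   for j in range(m) if j not in chosen)
    if score <= best:
        break
    chosen, best = chosen + [j], score

tp = int(sum(true[j] for j in chosen))        # true positives among selections
print("selected", len(chosen), "true", int(true.sum()), "true positives", tp)
```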
59 Computer simulations (2)

Table: Results for 1000 replications. Criteria BIC, mBIC and rBIC compared under N(0, 1) and Tukey(0.95, 100, 1) noise; reported characteristics: FP, FN, Power, FDR, MR, l₂ and d (numerical entries not recovered in the transcription).

61 Applications for QTL mapping

    Y_i = μ + Σ_{j∈I} β_j X_ij + Σ_{(u,v)∈U} γ_uv X_iu X_iv + ε_i,

I - a certain subset of the set N = {1, ..., m}, U - a certain subset of N × N
Standard version of mBIC - minimize

    n log(RSS) + (p + r) log(n) + 2p log(m/2.2 - 1) + 2r log(N_e/2.2 - 1)

p - number of main effects, r - number of interactions, N_e = m(m-1)/2

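The criterion above can be transcribed directly as a scoring function. The data and the candidate model below are illustrative; in practice the sets I and U come from a search over markers and marker pairs:

```python
# Sketch: the mBIC criterion with epistatic (interaction) terms, to be minimized.
import numpy as np

def mbic_qtl(y, X, main_idx, inter_idx, m):
    """n log(RSS) + (p + r) log n + 2p log(m/2.2 - 1) + 2r log(Ne/2.2 - 1);
    main_idx lists the loci in I, inter_idx the pairs (u, v) in U."""
    n = len(y)
    ne = m * (m - 1) / 2
    cols = [np.ones(n)] + [X[:, j] for j in main_idx]
    cols += [X[:, u] * X[:, v] for u, v in inter_idx]
    A = np.column_stack(cols)
    rss = np.sum((y - A @ np.linalg.lstsq(A, y, rcond=None)[0]) ** 2)
    p, r = len(main_idx), len(inter_idx)
    return (n * np.log(rss) + (p + r) * np.log(n)
            + 2 * p * np.log(m / 2.2 - 1) + 2 * r * np.log(ne / 2.2 - 1))

rng = np.random.default_rng(4)
n, m = 100, 50
X = rng.choice([-0.5, 0.5], size=(n, m))
y = 2.0 * X[:, 0] + 4.0 * X[:, 1] * X[:, 2] + rng.normal(size=n)
s_true = mbic_qtl(y, X, [0], [(1, 2)], m)   # true main effect + true interaction
s_null = mbic_qtl(y, X, [], [], m)          # intercept-only model
print(s_true < s_null)   # the true terms pay for their penalty
```

Note the interaction penalty uses N_e rather than m: pairs are penalized according to the much larger number of candidate interactions.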
62 Further applications for QTL mapping

1. Extending to more complicated genetic scenarios + an iterative version of mBIC: Baierl, Bogdan, Frommlet, Futschik, Genetics, 2006
2. Robust versions based on M-estimates: Baierl, Futschik, Bogdan, Biecek, CSDA, 2007
3. Rank version: Żak, Baierl, Bogdan, Futschik, Genetics, 2007
4. Taking into account the correlations between neighboring markers: Bogdan, Frommlet, Biecek, Cheng, Ghosh, Doerge, Biometrics, 2008

63 Real Data Analysis (1)

Huttunen et al. (2004) - data on the variation in male courtship song characters in Drosophila virilis.

67 Real Data Analysis (2)

Drosophila "sing" by vibrating their wings. The most common song type is the pulse song, which consists of rapid transients (short-lived oscillations) of low frequency.
Quantitative trait: PN - the number of pulses in a pulse train.
Data: 24 markers on three chromosomes, n = 520 males.
Huttunen et al. (2004) used single marker analysis and composite interval mapping. They found one QTL on chromosome 2, five QTL on chromosome 3 (not sure if there are only 2) and another QTL on chromosome 4.
We use mBIC supplied with Haley and Knott regression. We impute the genotypes inside intermarker intervals so that the distance between tested positions does not exceed 10 cM, and penalize these imputed locations like real markers. In the result m = 59 and N_e = 1711.

68 Real Data Analysis (3)

71 Real Data Analysis (4)

Zeng et al. (2000) - data on the morphological differences between two species of Drosophila, Drosophila simulans and Drosophila mauritiana.
Trait: the size and the shape of the posterior lobe of the male genital arch, quantified by a morphometric descriptor.
n₁ = 471, n₂ = 491, m = 193; genotypes at neighboring positions are closely correlated, N_e = 18,528.

72 Real Data Analysis, BM

Table: QTL detected for trait BM - forward selection, mBIC and the results of Zeng et al. compared (numerical entries not recovered in the transcription).

73 Real Data Analysis, BS

Table: QTL detected for trait BS - forward selection, mBIC and the results of Zeng et al. compared (numerical entries not recovered in the transcription).

74 Further work

1. Relaxing the penalty so as to control FDR instead of FWER; expected optimality for a wider range of values of p - with F. Frommlet, J.K. Ghosh, A. Chakrabarti and M. Murawska.
2. Application to association mapping - with F. Frommlet and M. Murawska.
3. Application to GLM and zero-inflated generalized Poisson regression - with M. Żak, C. Czado, V. Erhardt.
4. Application to model selection in logic regression and comparison with Bayesian regression trees - with M. Malina, K. Ickstadt, H. Schwender.

75 References

1. Baierl, A., Bogdan, M., Frommlet, F., Futschik, A., 2006. On locating multiple interacting quantitative trait loci in intercross designs. Genetics 173.
2. Baierl, A., Futschik, A., Bogdan, M., Biecek, P., 2007. Locating multiple interacting quantitative trait loci using robust model selection. Computational Statistics and Data Analysis 51.
3. Bogdan, M., Ghosh, J.K., Doerge, R.W., 2004. Modifying the Schwarz Bayesian Information Criterion to locate multiple interacting quantitative trait loci. Genetics 167.
4. Bogdan, M., Frommlet, F., Biecek, P., Cheng, R., Ghosh, J.K., Doerge, R.W., 2008. Extending the Modified Bayesian Information Criterion (mBIC) to dense markers and multiple interval mapping. Biometrics.
5. Broman, K.W., Speed, T.P., 2002. A model selection approach for the identification of quantitative trait loci in experimental crosses. J. Roy. Stat. Soc. B 64.
6. George, E.I., McCulloch, R.E., 1993. Variable selection via Gibbs sampling. J. Amer. Statist. Assoc. 88.
7. Żak, M., Baierl, A., Bogdan, M., Futschik, A., 2007. Locating multiple interacting quantitative trait loci using rank-based model selection. Genetics 176.


More information

Stepwise Searching for Feature Variables in High-Dimensional Linear Regression

Stepwise Searching for Feature Variables in High-Dimensional Linear Regression Stepwise Searching for Feature Variables in High-Dimensional Linear Regression Qiwei Yao Department of Statistics, London School of Economics q.yao@lse.ac.uk Joint work with: Hongzhi An, Chinese Academy

More information

Exam: high-dimensional data analysis January 20, 2014

Exam: high-dimensional data analysis January 20, 2014 Exam: high-dimensional data analysis January 20, 204 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question not the subquestions on a separate piece of paper. - Finish

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

BAYESIAN MAPPING OF MULTIPLE QUANTITATIVE TRAIT LOCI

BAYESIAN MAPPING OF MULTIPLE QUANTITATIVE TRAIT LOCI BAYESIAN MAPPING OF MULTIPLE QUANTITATIVE TRAIT LOCI By DÁMARIS SANTANA MORANT A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR

More information

Analysis Methods for Supersaturated Design: Some Comparisons

Analysis Methods for Supersaturated Design: Some Comparisons Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs

More information

A Statistical Framework for Expression Trait Loci (ETL) Mapping. Meng Chen

A Statistical Framework for Expression Trait Loci (ETL) Mapping. Meng Chen A Statistical Framework for Expression Trait Loci (ETL) Mapping Meng Chen Prelim Paper in partial fulfillment of the requirements for the Ph.D. program in the Department of Statistics University of Wisconsin-Madison

More information

1 Hypothesis Testing and Model Selection

1 Hypothesis Testing and Model Selection A Short Course on Bayesian Inference (based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta) Module 6: From Chapter 6 of GDS 1 Hypothesis Testing and Model Selection

More information

Anumber of statistical methods are available for map- 1995) in some standard designs. For backcross populations,

Anumber of statistical methods are available for map- 1995) in some standard designs. For backcross populations, Copyright 2004 by the Genetics Society of America DOI: 10.1534/genetics.104.031427 An Efficient Resampling Method for Assessing Genome-Wide Statistical Significance in Mapping Quantitative Trait Loci Fei

More information

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q)

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q) Supplementary information S7 Testing for association at imputed SPs puted SPs Score tests A Score Test needs calculations of the observed data score and information matrix only under the null hypothesis,

More information

MOST complex traits are influenced by interacting

MOST complex traits are influenced by interacting Copyright Ó 2009 by the Genetics Society of America DOI: 10.1534/genetics.108.099556 Hierarchical Generalized Linear Models for Multiple Quantitative Trait Locus Mapping Nengjun Yi*,1 and Samprit Banerjee

More information

False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data

False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data Ståle Nygård Trial Lecture Dec 19, 2008 1 / 35 Lecture outline Motivation for not using

More information

Holzmann, Min, Czado: Validating linear restrictions in linear regression models with general error structure

Holzmann, Min, Czado: Validating linear restrictions in linear regression models with general error structure Holzmann, Min, Czado: Validating linear restrictions in linear regression models with general error structure Sonderforschungsbereich 386, Paper 478 (2006) Online unter: http://epub.ub.uni-muenchen.de/

More information

Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control

Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control Xiaoquan Wen Department of Biostatistics, University of Michigan A Model

More information

Penalized Loss functions for Bayesian Model Choice

Penalized Loss functions for Bayesian Model Choice Penalized Loss functions for Bayesian Model Choice Martyn International Agency for Research on Cancer Lyon, France 13 November 2009 The pure approach For a Bayesian purist, all uncertainty is represented

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

what is Bayes theorem? posterior = likelihood * prior / C

what is Bayes theorem? posterior = likelihood * prior / C who was Bayes? Reverend Thomas Bayes (70-76) part-time mathematician buried in Bunhill Cemetary, Moongate, London famous paper in 763 Phil Trans Roy Soc London was Bayes the first with this idea? (Laplace?)

More information

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES Saurabh Ghosh Human Genetics Unit Indian Statistical Institute, Kolkata Most common diseases are caused by

More information

Model Selection. Frank Wood. December 10, 2009

Model Selection. Frank Wood. December 10, 2009 Model Selection Frank Wood December 10, 2009 Standard Linear Regression Recipe Identify the explanatory variables Decide the functional forms in which the explanatory variables can enter the model Decide

More information

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n

More information

Model Selection for Multiple QTL

Model Selection for Multiple QTL Model Selection for Multiple TL 1. reality of multiple TL 3-8. selecting a class of TL models 9-15 3. comparing TL models 16-4 TL model selection criteria issues of detecting epistasis 4. simulations and

More information

GWAS IV: Bayesian linear (variance component) models

GWAS IV: Bayesian linear (variance component) models GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian

More information

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Consistent high-dimensional Bayesian variable selection via penalized credible regions Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

Minimum Description Length (MDL)

Minimum Description Length (MDL) Minimum Description Length (MDL) Lyle Ungar AIC Akaike Information Criterion BIC Bayesian Information Criterion RIC Risk Inflation Criterion MDL u Sender and receiver both know X u Want to send y using

More information

STAT 100C: Linear models

STAT 100C: Linear models STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 21 Model selection Choosing the best model among a collection of models {M 1, M 2..., M N }. What is a good model? 1. fits the data well (model

More information

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The

More information

MLR Model Selection. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project

MLR Model Selection. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project MLR Model Selection Author: Nicholas G Reich, Jeff Goldsmith This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en

More information

Seminar über Statistik FS2008: Model Selection

Seminar über Statistik FS2008: Model Selection Seminar über Statistik FS2008: Model Selection Alessia Fenaroli, Ghazale Jazayeri Monday, April 2, 2008 Introduction Model Choice deals with the comparison of models and the selection of a model. It can

More information

Marginal Screening and Post-Selection Inference

Marginal Screening and Post-Selection Inference Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2

More information

Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics

Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics arxiv:1603.08163v1 [stat.ml] 7 Mar 016 Farouk S. Nathoo, Keelin Greenlaw,

More information

Biostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences

Biostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences Biostatistics-Lecture 16 Model Selection Ruibin Xi Peking University School of Mathematical Sciences Motivating example1 Interested in factors related to the life expectancy (50 US states,1969-71 ) Per

More information

Bayesian methods in economics and finance

Bayesian methods in economics and finance 1/26 Bayesian methods in economics and finance Linear regression: Bayesian model selection and sparsity priors Linear Regression 2/26 Linear regression Model for relationship between (several) independent

More information

BTRY 4830/6830: Quantitative Genomics and Genetics

BTRY 4830/6830: Quantitative Genomics and Genetics BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements

More information

A Robust Approach to Regularized Discriminant Analysis

A Robust Approach to Regularized Discriminant Analysis A Robust Approach to Regularized Discriminant Analysis Moritz Gschwandtner Department of Statistics and Probability Theory Vienna University of Technology, Austria Österreichische Statistiktage, Graz,

More information

Computational statistics

Computational statistics Computational statistics Combinatorial optimization Thierry Denœux February 2017 Thierry Denœux Computational statistics February 2017 1 / 37 Combinatorial optimization Assume we seek the maximum of f

More information

On High-Dimensional Cross-Validation

On High-Dimensional Cross-Validation On High-Dimensional Cross-Validation BY WEI-CHENG HSIAO Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan hsiaowc@stat.sinica.edu.tw 5 WEI-YING

More information

Chapter 2 The Naïve Bayes Model in the Context of Word Sense Disambiguation

Chapter 2 The Naïve Bayes Model in the Context of Word Sense Disambiguation Chapter 2 The Naïve Bayes Model in the Context of Word Sense Disambiguation Abstract This chapter discusses the Naïve Bayes model strictly in the context of word sense disambiguation. The theoretical model

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

g-priors for Linear Regression

g-priors for Linear Regression Stat60: Bayesian Modeling and Inference Lecture Date: March 15, 010 g-priors for Linear Regression Lecturer: Michael I. Jordan Scribe: Andrew H. Chan 1 Linear regression and g-priors In the last lecture,

More information

MS-C1620 Statistical inference

MS-C1620 Statistical inference MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents

More information

SGN Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection

SGN Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection SG 21006 Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection Ioan Tabus Department of Signal Processing Tampere University of Technology Finland 1 / 28

More information

Bi-level feature selection with applications to genetic association

Bi-level feature selection with applications to genetic association Bi-level feature selection with applications to genetic association studies October 15, 2008 Motivation In many applications, biological features possess a grouping structure Categorical variables may

More information

Day 4: Shrinkage Estimators

Day 4: Shrinkage Estimators Day 4: Shrinkage Estimators Kenneth Benoit Data Mining and Statistical Learning March 9, 2015 n versus p (aka k) Classical regression framework: n > p. Without this inequality, the OLS coefficients have

More information

Lecture 9. QTL Mapping 2: Outbred Populations

Lecture 9. QTL Mapping 2: Outbred Populations Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred

More information

Generalized Linear Models and Its Asymptotic Properties

Generalized Linear Models and Its Asymptotic Properties for High Dimensional Generalized Linear Models and Its Asymptotic Properties April 21, 2012 for High Dimensional Generalized L Abstract Literature Review In this talk, we present a new prior setting for

More information

Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping

Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping Huang et al. BMC Genetics 2013, 14:5 METHODOLOGY ARTICLE Open Access Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping Anhui Huang 1, Shizhong Xu 2 and Xiaodong Cai 1*

More information

Module 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline

Module 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline Module 4: Bayesian Methods Lecture 9 A: Default prior selection Peter Ho Departments of Statistics and Biostatistics University of Washington Outline Je reys prior Unit information priors Empirical Bayes

More information

How the mean changes depends on the other variable. Plots can show what s happening...

How the mean changes depends on the other variable. Plots can show what s happening... Chapter 8 (continued) Section 8.2: Interaction models An interaction model includes one or several cross-product terms. Example: two predictors Y i = β 0 + β 1 x i1 + β 2 x i2 + β 12 x i1 x i2 + ɛ i. How

More information

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30 MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 7. Models with binary response II GLM (Spring, 2018) Lecture 7 1 / 13 Existence of estimates Lemma (Claudia Czado, München, 2004) The log-likelihood ln L(β) in logistic

More information

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004 Estimation in Generalized Linear Models with Heterogeneous Random Effects Woncheol Jang Johan Lim May 19, 2004 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure

More information

Inferring Genetic Architecture of Complex Biological Processes

Inferring Genetic Architecture of Complex Biological Processes Inferring Genetic Architecture of Complex Biological Processes BioPharmaceutical Technology Center Institute (BTCI) Brian S. Yandell University of Wisconsin-Madison http://www.stat.wisc.edu/~yandell/statgen

More information

Variable Selection for Highly Correlated Predictors

Variable Selection for Highly Correlated Predictors Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu arxiv:1709.04840v1 [stat.me] 14 Sep 2017 Abstract Penalty-based variable selection methods are powerful in selecting relevant covariates

More information

A prior distribution over model space p(m) (or hypothesis space ) can be updated to a posterior distribution after observing data y.

A prior distribution over model space p(m) (or hypothesis space ) can be updated to a posterior distribution after observing data y. June 2nd 2011 A prior distribution over model space p(m) (or hypothesis space ) can be updated to a posterior distribution after observing data y. This is implemented using Bayes rule p(m y) = p(y m)p(m)

More information

Model selection in logistic regression using p-values and greedy search

Model selection in logistic regression using p-values and greedy search odel selection in logistic regression using p-values and greedy search Jan ielniczuk 1, and Pawe l Teisseyre 1 1 Institute of Computer Science, Polish Academy of Sciences, Ordona 1, 01 37 Warsaw, Poland

More information

Model Choice Lecture 2

Model Choice Lecture 2 Model Choice Lecture 2 Brian Ripley http://www.stats.ox.ac.uk/ ripley/modelchoice/ Bayesian approaches Note the plural I think Bayesians are rarely Bayesian in their model selection, and Geisser s quote

More information

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28 Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:

More information

Bayesian Assessment of Hypotheses and Models

Bayesian Assessment of Hypotheses and Models 8 Bayesian Assessment of Hypotheses and Models This is page 399 Printer: Opaque this 8. Introduction The three preceding chapters gave an overview of how Bayesian probability models are constructed. Once

More information

An Extended BIC for Model Selection

An Extended BIC for Model Selection An Extended BIC for Model Selection at the JSM meeting 2007 - Salt Lake City Surajit Ray Boston University (Dept of Mathematics and Statistics) Joint work with James Berger, Duke University; Susie Bayarri,

More information