A Sequential Split-Conquer-Combine Approach for Analysis of Big Spatial Data


1 A Sequential Split-Conquer-Combine Approach for Analysis of Big Spatial Data
Min-ge Xie
Department of Statistics & Biostatistics, Rutgers, The State University of New Jersey
In collaboration with Xuying Chen, Ying Hung and Chengrui Li
SAMSI Workshop on Distributed and Parallel Data Analysis (DPDA), Raleigh, North Carolina; September 2016
Research supported in part by grants from NSF, DHS and IBM

2 Introduction - era of data science/big data
Today, the integration of computer technology into science and daily life has enabled the collection of big data across all fields. The data are extremely large: big n, big p, big t, and growing.
This imposes difficulties/challenges in the analysis of extremely large data:
- Memory/storage issues: too large to fit into a single computer or a single site.
- Computing issues: too expensive to perform any computationally intensive analysis.
- Statistical issues: heterogeneity (in designs, information, and more), sparsity, non-conventional data (e.g., image/voice/text/network), missing data, random/stochastic components, etc.
But big data also lead to opportunities in many fields, including statistics!

3 Introduction - split-conquer-combine approach
The split-conquer-combine (SCC) method is natural, effective, and generally applicable.
Recent publications (partial list): Battey et al. (2015), Chen & Xie (2012/14), Dai et al. (2015), Huang & Huo (2015), Song & Liang (2015), Wang & Dunson (2013), Zhang et al. (2013), Zhao et al. (2014), ...
Versus parallel computing in CS: we emphasize the statistical performance/properties and seek the best possible outcomes.

4 Introduction - our first SCC project
Port-of-entry inspection project (DHS/DIMACS/CCICADA)
Background: Standardized shipping containers transport 95% of U.S. imports by tonnage; they are highly vulnerable to smuggling of nuclear/radiological and other illegal materials.
Inspection goal: Find shipments of high risk and identify important influencers (variables).
Approach: Analyze historical data and search for patterns.
Problem: A huge amount of data (available data: two weeks of shipments; big n, big p).
(Chen 2012, Chen and Xie 2012/14)

5 Introduction - our first SCC project
Setting: penalized regression/variable selection in GLMs
- Feasibility: The result obtained using the SCC method is the same as, or asymptotically equivalent to, the result of analyzing the entire data all at once.
- Computational efficiency: The SCC method can substantially reduce computing time and computer memory requirements when a computationally intensive algorithm is used.
- Better error bounds: As a byproduct, the SCC method is more resistant to false model selections caused by spurious correlations.
[Figure: mean computing time (seconds) ± 2SD versus n, the number of observations, for the all-data analysis and for SCC with K = 2 and more splits]
(Chen 2012, Chen and Xie 2012/14)

6 New (second) project - motivating example of spatial data
IBM data center thermal management project
A data center is an integrated facility housing multiple-unit servers, providing application services or management for data processing.
Goal: Design a data center with an efficient heat-removal mechanism.
Computational Fluid Dynamics (CFD) simulation (n = 26820; p = 9).
Figure: Heat map for the IBM T. J. Watson Data Center

7 Gaussian process model
Gaussian process (GP) model: y = Xβ + Z(x), where
- y: n x 1 vector of observations (e.g., room temperatures)
- X: n x p design matrix
- β: p x 1 vector of unknown parameters
- Z(x): a GP with mean 0 and covariance σ^2 Σ(θ)
- Σ(θ): n x n correlation matrix with correlation parameters θ
Correlation matrix: the ijth element of Σ(θ) is defined by a power exponential function,
  corr(Z(x_i), Z(x_j)) = exp(-θ' |x_i - x_j|) = exp(-Σ_{k=1}^p θ_k |x_ik - x_jk|).
Remark: Assume σ^2 is known for simplicity in this talk.
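As a concrete illustration (our sketch, not code from the talk), this correlation matrix can be computed in a few lines; the function name and the small uniform design are ours:

```python
import numpy as np

def corr_matrix(X, theta):
    # Sigma[i, j] = exp(-sum_k theta_k * |x_ik - x_jk|)
    diffs = np.abs(X[:, None, :] - X[None, :, :])   # (n, n, p) pairwise distances
    return np.exp(-diffs @ np.asarray(theta))        # contract over k

rng = np.random.default_rng(0)
X = rng.uniform(size=(5, 2))                         # n = 5 points, p = 2
Sigma = corr_matrix(X, theta=[1.5, 2.0])
print(np.round(Sigma, 3))                            # symmetric, unit diagonal
```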

8 Estimation and prediction
Likelihood inference:
  l(β, θ, σ^2) = -(1/(2σ^2)) (y - Xβ)' Σ^{-1}(θ) (y - Xβ) - (1/2) log|Σ(θ)| - (n/2) log(σ^2)
So,
  β̂_θ = argmax_β l(β | θ, σ^2) = (X' Σ^{-1}(θ) X)^{-1} X' Σ^{-1}(θ) y,
  θ̂_{β,σ^2} = argmax_θ l(θ | β, σ^2).
The GP prediction, say y_0, at a new point x_0, given parameters (β, θ), follows a normal distribution with mean p_0(β, θ) and variance m_0(β, θ), where
  p_0(β, θ) = x_0'β + γ(θ)' Σ^{-1}(θ) (y - Xβ),
  m_0(β, θ) = σ^2 (1 - γ(θ)' Σ^{-1}(θ) γ(θ)),
and γ(θ) is an n x 1 vector whose ith element equals φ(|x_i - x_0|; θ).
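The estimation and prediction formulas translate directly into a short sketch (ours); we use Cholesky solves in place of the explicit inverse Σ^{-1}(θ), and the names gls_beta and gp_predict are illustrative:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gls_beta(X, y, Sigma):
    # beta_hat(theta) = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y
    cf = cho_factor(Sigma, lower=True)               # one O(n^3) factorization
    SiX, Siy = cho_solve(cf, X), cho_solve(cf, y)
    return np.linalg.solve(X.T @ SiX, X.T @ Siy)

def gp_predict(x0, gamma, X, y, Sigma, beta, sigma2=1.0):
    # p0 = x0' beta + gamma' Sigma^{-1} (y - X beta)
    # m0 = sigma2 * (1 - gamma' Sigma^{-1} gamma)
    cf = cho_factor(Sigma, lower=True)
    p0 = x0 @ beta + gamma @ cho_solve(cf, y - X @ beta)
    m0 = sigma2 * (1.0 - gamma @ cho_solve(cf, gamma))
    return p0, m0
```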

9 Two challenges in GP modeling
Computational issue: Estimation and prediction involve Σ^{-1} and |Σ|, at a cost of order O(n^3): not feasible when n is large.
Uncertainty quantification of the GP predictor:
- The plug-in predictive distribution is widely used (i.e., the unknown parameters in the formula are replaced by their point estimates).
- It underestimates the uncertainty (cf. Santner et al. 2003).

10 Existing methods/efforts
For the computational issue:
- Change the model to one that is computationally convenient: Rue and Held (2005), Cressie and Johannesson (2008).
- Approximate the likelihood function: Stein et al. (2004), Furrer et al. (2006), Fuentes (2007), Kaufman et al. (2008).
These approaches do not focus on uncertainty quantification (or they bring in additional uncertainty), and they are still not feasible when n is super large.
For uncertainty quantification of GP prediction:
- Bayesian predictive distribution: Kennedy and O'Hagan (2001).
- Bootstrap approach: Sjöstedt-de Luna (2003), Santner et al. (2003).
These approaches are computationally intensive.

11 Our solution
Can we solve both problems within a unified framework?
Yes: we develop a sequential split-conquer-combine method, building on our understanding from the previously developed split-conquer-combine method (for independent data) and on confidence distribution developments.

12 Introduction to confidence distribution (CD)
Statistical inference (parameter estimation):
- Point estimate
- Interval estimate
- Distribution estimate (e.g., confidence distribution)
Example: X_1, ..., X_n i.i.d. N(μ, 1)
- Point estimate: x̄_n = (1/n) Σ_{i=1}^n x_i
- Interval estimate: (x̄_n - 1.96/√n, x̄_n + 1.96/√n)
- Distribution estimate: N(x̄_n, 1/n)

13 Introduction to confidence distribution (CD) (cont.)
The idea of the CD approach is to use a sample-dependent distribution (or density) function to estimate the parameter of interest.
It provides "simple and interpretable summaries of what can reasonably be learned from data" (Cox 2013), and also meaningful answers for all inference questions (Xie and Singh 2013).
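To make the three forms of estimate concrete, here is a small illustrative computation (ours, not from the slides): the CD N(x̄_n, 1/n) is a single object from which the point estimate, the interval, and p-values can all be read off.

```python
import numpy as np
from scipy import stats

# CD for mu when X_1, ..., X_n are i.i.d. N(mu, 1): the N(xbar_n, 1/n)
# distribution, used as a distribution estimator of mu.
rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, size=100)
n, xbar = len(x), x.mean()
cd = stats.norm(loc=xbar, scale=1.0 / np.sqrt(n))

print("point estimate (CD median):", cd.median())
print("95% interval (CD quantiles):", cd.ppf([0.025, 0.975]))
print("one-sided p-value for H0: mu <= 0:", cd.cdf(0.0))
```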

14 Inference using CD: Point estimators, Intervals, p-values & more

15 Inference using CD: Point estimators, Intervals, p-values & more (cont.)
Statistical Dream: Bayesian, Fiducial and Frequentist = BFF = Best Friends Forever

16 Efficient estimation using CD and Split-Conquer-Combine
CD is useful in combining information (meta-analysis, fusion learning, ...).
Chen and Xie (2012/14) considered a computationally efficient approach using Split-Conquer-Combine and CD.
[Diagram: the entire dataset is split into blocks D_1, D_2, ..., D_m; each block is conquered to produce an estimator θ̂_1, ..., θ̂_m; the estimators are then combined into θ̂_c]
The result is asymptotically equivalent to the one using the entire dataset.
But if SCC is directly applied to dependent data, it will lose information.
Question: How to extend SCC to big data with dependent structures, such as a GP?

17 Sequential Split-Conquer-Combine for GP
Our solution:
Figure: Sequential Split-Conquer-Combine Approach

18 Ingredients
- Divide the entire dataset into (correlated) subsets
- Perform a transformation sequentially to create independent subsets
- Combine the estimators using CD
- Quantify prediction uncertainty using CD
Assumption (weak): compact-support correlation in one dimension (i.e., taper Σ to Σ_t); tapering is needed only on one variable (say X_1). After sorting the indices according to the X_1 values, Σ_t is block tridiagonal:
  Σ_t =
  [ Σ_11    Σ_12    0            ...           0        ]
  [ Σ_21    Σ_22    Σ_23         ...           0        ]
  [ ...             ...                        ...      ]
  [ 0       ...     Σ_(m-1)(m-2) Σ_(m-1)(m-1)  Σ_(m-1)m ]
  [ 0       ...     0            Σ_m(m-1)      Σ_mm     ]   (n x n)
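A rough sketch (ours) of the tapering step: sort the observations by their X_1 values and keep only the diagonal and first off-diagonal blocks. In practice the zero blocks come from a compactly supported correlation on X_1; zeroing an existing Σ is only a stand-in here:

```python
import numpy as np

def taper_block_tridiagonal(Sigma, order, m):
    """Keep only the diagonal and first off-diagonal blocks of Sigma
    after reordering rows/columns by the values of one coordinate."""
    S = Sigma[np.ix_(order, order)].copy()             # sort by X_1 values
    n = S.shape[0]
    bounds = np.linspace(0, n, m + 1).astype(int)      # m contiguous blocks
    block = np.searchsorted(bounds, np.arange(n), side="right") - 1
    far = np.abs(block[:, None] - block[None, :]) > 1  # beyond neighbor blocks
    S[far] = 0.0
    return S, bounds

# Usage: order = np.argsort(X[:, 0])
#        Sigma_t, bounds = taper_block_tridiagonal(Sigma, order, m=5)
```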

19 Sequentially update data
Divide the entire dataset into subsets y = {y_a}, a = 1, ..., m.
Transform y to y* by sequentially updating (with y*_1 = y_1 and D_1 = Σ_11):
  y*_a = y_a - L_{a(a-1)} y*_{a-1},
where L_{(a+1)a} = Σ_{(a+1)a} D_a^{-1}, D_a = Σ_{aa} - L_{a(a-1)} D_{a-1} L_{a(a-1)}', and the Σ_ab are the blocks of Σ_t.
The y*_a are independent, with covariance D_a. Also, with
  D = diag(D_1, D_2, ..., D_m)  and  L =
  [ I                      ]
  [ L_21   I               ]
  [ ...           ...      ]
  [ L_m1   L_m2   ...   I  ],
Σ_t = L D L' is the decomposition of the compact-support correlation matrix.
Sequential updates are computationally efficient.
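The recursion maps directly to code. A minimal sketch (our naming) that takes the blocks of Σ_t and returns the whitened responses y*_a together with their covariances D_a:

```python
import numpy as np

def sequential_update(y_blocks, Sigma_diag, Sigma_sub):
    """Sequentially whiten correlated blocks.

    y_blocks:   list of m response blocks y_1, ..., y_m
    Sigma_diag: diagonal blocks Sigma_{aa} of Sigma_t
    Sigma_sub:  subdiagonal blocks Sigma_{a(a-1)}, a = 2, ..., m
    Returns y*_a = y_a - L_{a(a-1)} y*_{a-1} and covariances D_a.
    """
    y_star, D = [y_blocks[0]], [Sigma_diag[0]]        # y*_1 = y_1, D_1 = Sigma_11
    for a in range(1, len(y_blocks)):
        # L_{a(a-1)} = Sigma_{a(a-1)} D_{a-1}^{-1}, via a solve (D is symmetric)
        L_a = np.linalg.solve(D[a - 1], Sigma_sub[a - 1].T).T
        y_star.append(y_blocks[a] - L_a @ y_star[a - 1])
        D.append(Sigma_diag[a] - L_a @ D[a - 1] @ L_a.T)
    return y_star, D
```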

20 Estimation from each subset
(Assume θ is fixed/known.)
The MLE from the ath subset is
  β̂_a = argmax_β l_a(β) = (C_a' D_a^{-1} C_a)^{-1} C_a' D_a^{-1} y*_a,
where C_a denotes the correspondingly transformed design-matrix block.
This is a significant computational reduction, because D_a is much smaller than the original covariance matrix.

21 Estimation from each subset (cont.)
An individual CD (confidence distribution) for the ath updated subset is (cf. Xie and Singh 2013)
  H_a(β) = Φ( S_a^{-1/2} (β - β̂_a) ),
where S_a = Cov(β̂_a) and Φ is the CDF of the p-dimensional multivariate standard normal distribution.

22 CD combining
Following Singh, Xie and Strawderman (2005), Liu, Liu and Xie (2014) and Yang et al. (2014), a combined CD H_c(β) is
  H_c(β) = Φ( S_SSCC^{-1/2} (β - β̂_SSCC) ),
where β̂_SSCC = (Σ_a W_a)^{-1} (Σ_a W_a β̂_a) with weights W_a = C_a' D_a^{-1} C_a, and S_SSCC = Cov(β̂_SSCC).
A similar framework can be applied to all the parameters (β, θ): iteratively loop between β̂_θ and θ̂_β.
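A sketch of the conquer-and-combine computation (illustrative, not the authors' implementation). It uses the identity W_a β̂_a = C_a' D_a^{-1} y*_a, so the per-block estimates never need to be formed explicitly:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def combine_cd(C_blocks, y_star, D_blocks):
    """CD (inverse-variance) combination of the per-block GLS estimates:
    beta_sscc = (sum_a W_a)^{-1} sum_a W_a beta_a,  W_a = C_a' D_a^{-1} C_a."""
    p = C_blocks[0].shape[1]
    W_sum, Wb_sum = np.zeros((p, p)), np.zeros(p)
    for C_a, y_a, D_a in zip(C_blocks, y_star, D_blocks):
        cf = cho_factor(D_a, lower=True)
        W_sum += C_a.T @ cho_solve(cf, C_a)       # information in block a
        Wb_sum += C_a.T @ cho_solve(cf, y_a)      # equals W_a @ beta_a
    beta_sscc = np.linalg.solve(W_sum, Wb_sum)
    S_sscc = np.linalg.inv(W_sum)                 # Cov(beta_sscc)
    return beta_sscc, S_sscc
```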

23 CD combining (cont.)
Theorem 1. Under some regularity assumptions, when the number of blocks m = o(n^{1/2}) and n → ∞, the SSCC estimator λ̂_c = (β̂_SSCC, θ̂_SSCC) is asymptotically as efficient as the MLE λ̂_mle = (β̂_mle, θ̂_mle).

24 GP predictive distribution
The GP prediction y_0 at a new point x_0, given parameters (β, θ), follows a normal distribution with mean p_0(β, θ) and variance m_0(β, θ), where
  p_0(β, θ) = x_0'β + γ(θ)' Σ^{-1}(θ) (y - Xβ),
  m_0(β, θ) = σ^2 (1 - γ(θ)' Σ^{-1}(θ) γ(θ)),
and γ(θ) is an n x 1 vector whose ith element equals φ(|x_i - x_0|; θ).
Drawbacks:
- The computational issue again!
- The conventional plug-in predictive distribution underestimates the uncertainty.

25 Quantify GP prediction uncertainty using SSCC and CD
An easy-to-compute GP prediction:
  p_1(β, θ) = x_0'β + Σ_{a=1}^m γ*_a(θ)' D_a^{-1}(θ) y*_a - Σ_{a=1}^m γ*_a(θ)' D_a^{-1}(θ) C_a(θ) β,
  m_1(β, θ) = σ^2 (1 - Σ_{a=1}^m γ*_a(θ)' D_a^{-1}(θ) γ*_a(θ)).
A better way to quantify uncertainty: use a CD-based predictive distribution function
  Q(y_0; y) = ∫_{λ ∈ Θ} G_λ(y_0) dH_c(λ; y),
where G_λ(·) is the CDF of N(p_1(λ), m_1(λ)) and H_c(λ; y) is the joint CD of λ = (β, θ) obtained from our SSCC approach.
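A sketch of p_1 and m_1 (our code); the updated blocks γ*_a of γ(θ) are assumed to be supplied, e.g., produced by the same sequential update applied to γ:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def sscc_predict(x0, beta, gamma_star, C_blocks, y_star, D_blocks, sigma2=1.0):
    """p1 = x0' beta + sum_a g_a' D_a^{-1} (y*_a - C_a beta)
       m1 = sigma2 * (1 - sum_a g_a' D_a^{-1} g_a)"""
    p1, quad = float(x0 @ beta), 0.0
    for g_a, C_a, y_a, D_a in zip(gamma_star, C_blocks, y_star, D_blocks):
        cf = cho_factor(D_a, lower=True)
        p1 += g_a @ cho_solve(cf, y_a - C_a @ beta)
        quad += g_a @ cho_solve(cf, g_a)
    return p1, sigma2 * (1.0 - quad)
```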

26 Theoretical result on prediction
Theorem 2. Under some regularity assumptions, when the number of blocks m = o(n^{1/2}) and n → ∞,
  p_1(β, θ) → p_0(β, θ),  m_1(β, θ) → m_0(β, θ).
- The easy-to-compute GP prediction is asymptotically equivalent to the original prediction.
- Compared with the plug-in predictive distribution, the CD-based predictive distribution function has a smaller average Kullback-Leibler distance to the true predictive distribution (Shen, Liu, and Xie 2016).

27 Quantify GP prediction uncertainty using SSCC and CD
Monte Carlo algorithm: repeat the following two steps T times to obtain T copies of simulated y_0 from Q(·; y), denoted y_0^(t), t = 1, ..., T:
1. Generate ξ_t | y ~ H_c(λ; y).
2. Generate y_0^(t) | ξ_t ~ N(p_1(ξ_t), m_1(ξ_t)).
These T copies y_0^(t) can be used to make predictive inference for y_0.
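A minimal sketch of this two-step sampler, under the simplifying assumption that θ is held fixed so that H_c reduces to the normal CD of β; the callable p1_m1 wraps the prediction formulas from slide 25:

```python
import numpy as np

def predictive_draws(beta_sscc, S_sscc, p1_m1, T=1000, rng=None):
    """Monte Carlo draws from the CD-based predictive distribution Q(.; y)."""
    rng = rng or np.random.default_rng()
    draws = np.empty(T)
    for t in range(T):
        xi = rng.multivariate_normal(beta_sscc, S_sscc)  # step 1: draw from the CD
        p1, m1 = p1_m1(xi)                               # prediction at this draw
        draws[t] = rng.normal(p1, np.sqrt(m1))           # step 2: predictive draw
    return draws

# e.g., a 95% predictive interval: np.quantile(draws, [0.025, 0.975])
```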

28 Simulation study: setup
GP model: y = Xβ + Z(x)
- Sample sizes: n = 1000, 1500, 2000; four covariates (p = 4)
- True parameter values: β = (3, 2, 1, 2, 1.5), θ = (15, 1.5, 2, 3), and σ^2 = 1
- The ijth element of Σ(θ) is computed by σ_ij = exp(-Σ_{k=1}^p θ_k |x_ik - x_jk|)
- Number of splits: m = 5
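A sketch of generating one replicate under this setup; the uniform design and the placement of the five coefficients (intercept first) are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(2016)
n, p = 1000, 4
theta = np.array([15.0, 1.5, 2.0, 3.0])
beta = np.array([3.0, 2.0, 1.0, 2.0, 1.5])          # five coefficients, as on the slide

X = rng.uniform(size=(n, p))                         # assumed design
C = np.column_stack([np.ones(n), X])                 # assumed: intercept + 4 covariates
Sigma = np.exp(-np.abs(X[:, None, :] - X[None, :, :]) @ theta)
L = np.linalg.cholesky(Sigma + 1e-10 * np.eye(n))    # small jitter for stability
y = C @ beta + L @ rng.standard_normal(n)            # sigma^2 = 1
```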

29 Simulation study: analysis results

            n = 1000                            n = 1500                            n = 2000
      MLE         Taper        SSCC        MLE         Taper        SSCC        MLE         Taper        SSCC
β̂    (0.33)      1.96(0.33)   1.96(0.33)  2.00(0.30)  2.00(0.30)   2.00(0.30)  2.08(0.23)  2.08(0.23)   2.08(0.23)
β̂    (0.43)      3.02(0.43)   3.02(0.43)  3.01(0.36)  3.01(0.36)   3.01(0.36)  2.90(0.28)  2.90(0.28)   2.90(0.28)
β̂    (0.23)      1.00(0.23)   1.00(0.23)  0.99(0.21)  0.99(0.21)   0.99(0.21)  1.04(0.20)  1.04(0.20)   1.04(0.20)
β̂    (0.24)      2.04(0.24)   2.04(0.24)  1.97(0.25)  1.97(0.25)   1.97(0.25)  1.98(0.24)  1.98(0.24)   1.98(0.24)
β̂    (0.25)      1.53(0.25)   1.53(0.25)  1.52(0.28)  1.52(0.28)   1.52(0.28)  1.49(0.18)  1.49(0.18)   1.49(0.18)
θ̂    (0.42)      14.72(0.42)  14.80(0.43) 14.65(0.39) 14.65(0.39)  14.65(0.49) 14.69(0.45) 14.74(0.45)  14.79(0.46)
θ̂    (0.06)      1.49(0.06)   1.51(0.06)  1.49(0.06)  1.49(0.06)   1.50(0.10)  1.49(0.06)  1.49(0.06)   1.49(0.10)
θ̂    (0.06)      2.00(0.06)   2.01(0.06)  1.98(0.06)  1.98(0.06)   2.00(0.06)  1.99(0.07)  1.99(0.07)   2.00(0.10)
θ̂    (0.06)      3.01(0.06)   2.99(0.06)  3.00(0.06)  3.00(0.06)   3.00(0.08)  3.01(0.06)  3.01(0.06)   3.00(0.07)
CT   32.46(1.29)  30.61(1.34)  4.54(0.11)  99.66(3.83) 95.90(5.44)  10.39(0.53) (6.96)      (9.33)       20.32(0.95)

Table: Simulation results for n = 1000, 1500, 2000 with 100 replications; entries are mean (SD), and CT is the computing time in seconds.

30 Simulation study: computing time
[Figure: computing time (seconds) versus sample size n, for MLE, Taper, and SSCC]
Remark: The computational complexity is reduced from O(n^3) to O(Σ_a l_a^3), where l_a is the size of the ath subset and Σ_a l_a = n.

31 Real data example: IBM data center thermal management project

Variable                          β̂_mle        β̂_taper      β̂_scc        β̂_sscc
X1  CRAC unit 1 flow rate         (0.101)      (0.101)      (0.561)      (0.101)
X2  CRAC unit 2 flow rate         (0.099)      (0.099)      (0.581)      (0.099)
X3  CRAC unit 3 flow rate         (0.101)      (0.101)      (0.839)      (0.101)
X4  CRAC unit 4 flow rate         (0.113)      (0.113)      (1.406)      (0.113)
X5  Room temperature              (0.138)      (0.138)      (3.116)      (0.138)
X6  Tile open area percentage     (0.106)      (0.106)      (0.894)      (0.106)
X7  Location in x-axis            (0.054)      (0.054)      (0.086)      (0.054)
X8  Location in y-axis            1.672(0.080) 1.681(0.080) 1.783(0.484) 1.681(0.080)
X9  Height                        (0.194)      (0.194)      (6.437)      (0.194)
Computing time (in seconds)

Table: Using a subsample of n = 9000; only β is estimated, with θ given (fixed at the values estimated in Chen et al. 2014).

32 Real data example: IBM data center thermal management project
[Figure: computing time, using a subsample of n = 9000; only β is estimated, with θ given (fixed at the values estimated in Chen et al. 2014)]

33 Real data example: IBM data center thermal management project

                                       n = 1800                                n = 3600    n = 26820
Variable                     Par.      MLE          Taper        SSCC
CRAC unit 1 flow rate        β̂1       -8.28(0.10)  -8.29(0.10)  -8.29(0.10)   (0.09)      (0.08)
                             θ̂1       0.84(0.01)   0.84(0.01)   0.86(0.01)    (0.01)      (0.01)
CRAC unit 2 flow rate        β̂2       -9.00(0.10)  -8.99(0.10)  -8.99(0.10)   (0.09)      (0.08)
                             θ̂2       0.76(0.01)   0.76(0.01)   0.79(0.01)    (0.01)      (0.01)
CRAC unit 3 flow rate        β̂3       -6.44(0.10)  -6.45(0.10)  -6.45(0.10)   (0.09)      (0.09)
                             θ̂3       1.20(0.01)   1.20(0.01)   1.19(0.01)    (0.01)      (0.01)
CRAC unit 4 flow rate        β̂4       -5.42(0.11)  -5.41(0.11)  -5.41(0.11)   (0.10)      (0.10)
                             θ̂4       1.90(0.01)   1.90(0.01)   1.80(0.01)    (0.01)      (0.01)
Room temperature             β̂5       -0.08(0.13)  -0.07(0.13)  -0.07(0.13)   (0.13)      (0.13)
                             θ̂5       3.50(0.01)   3.50(0.01)   3.40(0.01)    (0.01)      (0.01)
Tile open area percentage    β̂6       -1.98(0.10)  -1.97(0.10)  -1.97(0.10)   (0.10)      (0.09)
                             θ̂6       1.29(0.01)   1.29(0.01)   1.20(0.01)    (0.01)      (0.01)
Location in x-axis           β̂7       -3.39(0.06)  -3.41(0.06)  -3.41(0.06)   (0.04)      (0.03)
                             θ̂7       0.20(0.01)   0.20(0.01)   0.17(0.01)    (0.01)      (0.01)
Location in y-axis           β̂8       2.80(0.08)   2.80(0.08)   2.80(0.08)    (0.07)      (0.06)
                             θ̂8       0.60(0.01)   0.60(0.01)   0.50(0.01)    (0.01)      (0.01)
Location in z-axis           β̂9       (0.18)       22.35(0.18)  22.35(0.18)   (0.18)      (0.18)
                             θ̂9       (0.03)       21.90(0.03)  21.45(0.03)   (0.03)      (0.02)
Computing time (in seconds)

Table: CFD data results with n = 1800, 3600, and 26820; estimating both β and θ.

34 Summary
We introduce a unified SSCC framework for GP modeling that can dramatically reduce computation and also provide a better quantification of the prediction uncertainty:
- A Sequential Split-Conquer-Combine (SSCC) procedure for big spatial data
- CD combining
- Predictive distributions based on CDs
It provides asymptotically equivalent estimation and reduces the computational complexity from O(n^3) to O(Σ_a l_a^3), where l_a is the size of the ath subset and Σ_a l_a = n.
