A Sequential Split-Conquer-Combine Approach for Analysis of Big Spatial Data
1 A Sequential Split-Conquer-Combine Approach for Analysis of Big Spatial Data
Min-ge Xie
Department of Statistics & Biostatistics, Rutgers, The State University of New Jersey
In collaboration with Xuying Chen, Ying Hung and Chengrui Li
SAMSI Workshop on Distributed and Parallel Data Analysis (DPDA), Raleigh, North Carolina; September 2016
Research supported in part by grants from NSF, DHS and IBM
2 Introduction - era of data science/big data
Today, the integration of computer technology into science and daily life has enabled the collection of big data across all fields.
- Extremely large data: big n, big p, big t, and growing
- Difficulties/challenges in the analysis of extremely large data:
  - Memory/storage issues: too large to fit into a single computer or a single site
  - Computing issues: too expensive to perform any computationally intensive analysis
  - Statistical issues: heterogeneity (in designs, information, & more), sparsity, non-conventional (e.g., image/voice/text/network) data, missing data, random/stochastic components, etc.
- But big data also leads to opportunities in many fields, including statistics!
3 Introduction - split-conquer-combine approach
The split-conquer-combine (SCC) method is natural, effective, and generally applicable.
Recent publications (partial list): Battey et al. (2015), Chen & Xie (2012/14), Dai et al. (2015), Huang & Huo (2015), Song & Liang (2015), Wang & Dunson (2013), Zhang et al. (2013), Zhao et al. (2014), ...
Versus parallel computing in CS: the emphasis here is on statistical performance/properties, seeking the best possible outcomes.
4 Introduction - our first SCC project
Port-of-entry inspection project (DHS/DIMACS/CCICADA)
Background: standardized shipping containers transport 95% of U.S. imports by tonnage; they are highly vulnerable to smuggling of nuclear/radiological and other illegal materials.
Inspection goal: find shipments of high risk and identify important influencers (variables).
Approach: analyze historical data and search for patterns.
Problem: a huge amount of data (available data: two weeks of shipments; big n, big p)
(Chen 2012, Chen and Xie 2012/14)
5 Introduction - our first SCC project
Setting: penalized regression/variable selection in GLMs
- Feasibility: the result obtained using the SCC method is the same as / asymptotically equivalent to the result of analyzing the entire data all at once
- Computational efficiency: the SCC method can substantially reduce computing time and computer memory requirements if a computationally intensive algorithm is used
- Better error bounds: as a byproduct, the SCC method is more resistant to false model selections caused by spurious correlations
(Figure: mean computing time in seconds ± 2SD, for all data and K = 2, ..., against n, the number of observations.)
(Chen 2012, Chen and Xie 2012/14)
6 New (second) project - motivating example of spatial data
IBM data center thermal management project
A data center is an integrated facility housing multiple-unit servers, providing application services or management for data processing.
Goal: design a data center with an efficient heat removal mechanism.
Computational Fluid Dynamics (CFD) simulation (n = 26820; p = 9)
Figure: Heat map for IBM T. J. Watson Data Center
7 Gaussian process model
Gaussian process (GP) model: y = Xβ + Z(x), where
- y: n×1 vector of observations (e.g., room temperatures)
- X: n×p design matrix
- β: p×1 unknown parameters
- Z(x): a GP with mean 0 and covariance σ²Σ(θ)
- Σ(θ): n×n correlation matrix with correlation parameters θ
Correlation matrix: the ij-th element of Σ(θ) is defined by a power exponential function,
  corr(Z(x_i), Z(x_j)) = exp(-θ'|x_i - x_j|) = exp(-Σ_{k=1}^p θ_k |x_{ik} - x_{jk}|).
Remark: σ² is assumed known for simplicity in this talk.
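The power exponential correlation above takes only a few lines of numpy. This is a minimal sketch: the helper name `power_exp_corr` is ours, and the θ values are borrowed from the simulation study later in the talk.

```python
import numpy as np

def power_exp_corr(X, theta):
    """Power-exponential correlation matrix for a GP.

    Entry (i, j) is exp(-sum_k theta_k * |x_ik - x_jk|), matching
    the slide's formula.  X is n x p, theta has length p.
    """
    # |x_ik - x_jk| for every pair (i, j) and every coordinate k
    diffs = np.abs(X[:, None, :] - X[None, :, :])   # shape n x n x p
    return np.exp(-(diffs * theta).sum(axis=2))     # shape n x n

rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 4))
theta = np.array([15.0, 1.5, 2.0, 3.0])             # values from the simulation study
Sigma = power_exp_corr(X, theta)
```

The resulting matrix is symmetric with unit diagonal, as a correlation matrix must be; building it densely like this is exactly what becomes infeasible for large n, which motivates the rest of the talk.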
8 Estimation and prediction
Likelihood inference:
  l(β, θ, σ²) = -(1/(2σ²)) (y - Xβ)' Σ(θ)^{-1} (y - Xβ) - (1/2) log|Σ(θ)| - (n/2) log(σ²)
So,
  β̂_θ = argmax_β l(β | θ, σ²) = (X' Σ(θ)^{-1} X)^{-1} X' Σ(θ)^{-1} y
  θ̂_{β,σ²} = argmax_θ l(θ | β, σ²)
The GP prediction, say y_0, at a new point x_0, given parameters (β, θ), follows a normal distribution with mean p_0(β, θ) and variance m_0(β, θ), where
  p_0(β, θ) = x_0'β + γ(θ)' Σ(θ)^{-1} (y - Xβ)
  m_0(β, θ) = σ² (1 - γ(θ)' Σ(θ)^{-1} γ(θ)),
and γ(θ) is an n×1 vector whose i-th element equals φ(|x_i - x_0|; θ).
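The profile estimate β̂_θ and the plug-in predictor can be written directly with numpy. A minimal sketch under our own naming: `gls_beta` and `gp_plugin_predict` are hypothetical helpers, and the correlation vector γ is passed in rather than computed from a specific φ.

```python
import numpy as np

def gls_beta(X, y, Sigma):
    """Profile MLE given theta: (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y."""
    Si_X = np.linalg.solve(Sigma, X)            # Sigma^{-1} X, without forming the inverse
    return np.linalg.solve(X.T @ Si_X, Si_X.T @ y)

def gp_plugin_predict(x0, X, y, Sigma, gamma, beta, sigma2=1.0):
    """Plug-in GP predictor at x0: mean p0 and variance m0 from the slide."""
    p0 = x0 @ beta + gamma @ np.linalg.solve(Sigma, y - X @ beta)
    m0 = sigma2 * (1.0 - gamma @ np.linalg.solve(Sigma, gamma))
    return p0, m0

# Sanity check with Sigma = I: GLS reduces to ordinary least squares
rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(size=n)
beta_hat = gls_beta(X, y, np.eye(n))
p0, m0 = gp_plugin_predict(np.zeros(p), X, y, np.eye(n), np.zeros(n), beta_hat)
```

Each `np.linalg.solve` on the full Σ costs O(n³), which is the computational bottleneck discussed on the next slide.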
9 Two challenges in GP modeling
Computational issue: estimation and prediction involve Σ(θ)^{-1} and |Σ(θ)|, at a cost of order O(n³): not feasible when n is large.
Uncertainty quantification of the GP predictor: the plug-in predictive distribution is widely used (i.e., unknown parameters in the formula are replaced by their point estimates), but it underestimates the uncertainty (cf. Santner et al. 2003).
10 Existing methods/efforts
For the computational issue:
- Change the model to one that is computationally convenient: Rue and Held (2005), Cressie and Johannesson (2008)
- Approximate the likelihood function: Stein et al. (2004), Furrer et al. (2006), Fuentes (2007), Kaufman et al. (2008)
- These do not focus on uncertainty quantification / bring in additional uncertainty, and are still not feasible when n is super large
For uncertainty quantification of GP prediction:
- Bayesian predictive distribution: Kennedy and O'Hagan (2001)
- Bootstrap approach: Sjostedt-de Luna (2003), Santner et al. (2003)
- Computationally intensive
11 Our solution
Can we solve both problems within a unified framework?
Yes - we develop a sequential split-conquer-combine method, based on our understanding from the previously developed split-conquer-combine method (for independent data) and confidence distribution developments.
13 Introduction to confidence distribution (CD)
Statistical inference (parameter estimation): point estimate, interval estimate, distribution estimate (e.g., confidence distribution)
Example: X_1, ..., X_n i.i.d. following N(μ, 1)
- Point estimate: x̄_n = (1/n) Σ_{i=1}^n x_i
- Interval estimate: (x̄_n - 1.96/√n, x̄_n + 1.96/√n)
- Distribution estimate: N(x̄_n, 1/n)
The idea of the CD approach is to use a sample-dependent distribution (or density) function to estimate the parameter of interest. It is to provide "simple and interpretable summaries of what can reasonably be learned from data" (Cox 2013) and also meaningful answers for all inference questions (Xie and Singh 2013).
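The three kinds of estimates in the normal example can be computed with the Python standard library; here `statistics.NormalDist` plays the role of the CD N(x̄_n, 1/n). A small sketch, with simulated data:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(5)
mu_true, n = 1.0, 400
x = rng.normal(mu_true, 1.0, size=n)

xbar = x.mean()                                            # point estimate
ci = (xbar - 1.96 / np.sqrt(n), xbar + 1.96 / np.sqrt(n))  # interval estimate
cd = NormalDist(mu=xbar, sigma=1.0 / np.sqrt(n))           # distribution estimate N(xbar, 1/n)

# Any confidence interval can be read off the CD by inverting its CDF
ci_from_cd = (cd.inv_cdf(0.025), cd.inv_cdf(0.975))
```

This illustrates the point of the slide: the distribution estimate contains the point estimate (its center) and every confidence interval (its quantiles) as special cases.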
15 Inference using CD: Point estimators, Intervals, p-values & more
Statistical Dream: Bayesian, Fiducial and Frequentist = BFF = Best Friends Forever
16 Efficient estimation using CD and Split-Conquer-Combine
CD is useful in combining information (meta-analysis, fusion learning, ...).
Chen and Xie (2012/14) considered a computationally efficient approach using Split-Conquer-Combine and CD.
(Diagram: the entire dataset is split into subsets D_1, D_2, ..., D_m; each is conquered to produce an estimate; the estimates are combined into a single combined estimate.)
The result is asymptotically equivalent to the one using the entire dataset.
But if SCC is directly applied to dependent data, it will lose information.
Question: how to extend SCC to big data with dependent structures, such as GP?
17 Sequential Split-Conquer-Combine for GP
Our solution:
Figure: Sequential Split-Conquer-Combine Approach
18 Ingredients
- Divide the entire dataset into (correlated) subsets
- Perform a transformation sequentially to create independent subsets
- Combine estimators using CD
- Quantify prediction uncertainty using CD
Assumption (weak): compact-support correlation in one dimension (i.e., taper Σ to Σ_t); tapering is needed only on one variable (say X_1). After sorting the index according to the X_1 values, Σ_t is block tridiagonal (n × n):
  Σ_t = [ Σ_11        Σ_12        O    ...  O
          Σ_21        Σ_22        ...       O
          ...
          O    ...  Σ_(m-1)(m-1)  Σ_(m-1)m
          O    ...  Σ_m(m-1)      Σ_mm     ]
19 Sequentially update data
Divide the entire dataset into subsets y = {y_a}, a = 1, ..., m. Transform y to y* by sequentially updating
  y*_a = y_a - L_{a(a-1)} y*_{a-1},
where L_{(a+1)a} = Σ_{t,(a+1)a} D_a^{-1} and D_a = Σ_{t,aa} - L_{a(a-1)} D_{a-1} L_{a(a-1)}'.
The y*_a are independent, with covariances D_a. Also, with
  D = blockdiag(D_1, D_2, ..., D_m) and L the unit lower block-triangular matrix with sub-diagonal blocks L_21, ..., L_m(m-1),
the compact-support correlation matrix factors as Σ_t = L D L'.
Sequential updates are computationally efficient.
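The recursion on this slide is a block-LDL' factorization of the tapered covariance, applied on the fly to the data blocks. The sketch below, under our own naming (`sequential_decorrelate`) and with a toy 3-block tridiagonal covariance, is an assumed minimal implementation of that idea:

```python
import numpy as np

def sequential_decorrelate(y_blocks, Sig_diag, Sig_sub):
    """Sequentially transform correlated blocks into independent ones.

    For a block-tridiagonal Sigma_t = L D L' (slide 19):
      D_1 = Sigma_11,
      L_{a,a-1} = Sigma_{a,a-1} D_{a-1}^{-1},
      D_a = Sigma_aa - L_{a,a-1} D_{a-1} L_{a,a-1}',
      y*_a = y_a - L_{a,a-1} y*_{a-1}.
    Returns the updated blocks y*_a and their covariances D_a.
    """
    y_star, D = [y_blocks[0]], [Sig_diag[0]]
    for a in range(1, len(y_blocks)):
        L_sub = Sig_sub[a - 1] @ np.linalg.inv(D[a - 1])
        D.append(Sig_diag[a] - L_sub @ D[a - 1] @ L_sub.T)
        y_star.append(y_blocks[a] - L_sub @ y_star[a - 1])
    return y_star, D

# Toy block-tridiagonal SPD covariance: 3 blocks of size 2
m_blocks, b = 3, 2
n = m_blocks * b
dist = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
Sigma_t = np.where(dist == 0, 2.0, np.where(dist == 1, 0.5, np.where(dist == 2, 0.2, 0.0)))
Sig_diag = [Sigma_t[a*b:(a+1)*b, a*b:(a+1)*b] for a in range(m_blocks)]
Sig_sub = [Sigma_t[(a+1)*b:(a+2)*b, a*b:(a+1)*b] for a in range(m_blocks - 1)]
rng = np.random.default_rng(2)
y = rng.normal(size=n)
y_star, D = sequential_decorrelate([y[a*b:(a+1)*b] for a in range(m_blocks)], Sig_diag, Sig_sub)
```

The key point is that each step touches only one block and its neighbor, so the cost is driven by the block sizes rather than by n.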
21 Estimation from each subset
(Assume θ is fixed/known.) MLE from the a-th updated subset:
  β̂_a = argmax_β l_a(β) = (C_a' D_a^{-1} C_a)^{-1} C_a' D_a^{-1} y*_a
Significant computational reduction, because D_a is much smaller than the original covariance matrix.
An individual CD (confidence distribution) for the a-th updated subset is (cf. Xie and Singh 2013):
  H_a(β) = Φ(S_a^{-1/2} (β - β̂_a)),
where S_a = Cov(β̂_a) and Φ is the CDF of the p-dimensional multivariate standard normal.
23 CD combining
Following Singh, Xie and Strawderman (2005), Liu, Liu and Xie (2014) and Yang et al. (2014), a combined CD H_c(β) is
  H_c(β) = Φ(S_SSCC^{-1/2} (β - β̂_SSCC)),
where β̂_SSCC = (Σ_a W_a)^{-1} (Σ_a W_a β̂_a) with W_a = C_a' D_a^{-1} C_a, and S_SSCC = Cov(β̂_SSCC).
A similar framework can be applied to all the parameters (β, θ): iteratively loop between β | θ and θ | β.
Theorem 1. Under some regularity assumptions, when the number of blocks m = o(n^{1/2}) and n → ∞, the SSCC estimator λ̂_c = (β̂_SSCC, θ̂_SSCC) is asymptotically as efficient as the MLE λ̂_MLE = (β̂_MLE, θ̂_MLE).
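The combining step is inverse-variance weighting of the block estimates, with W_a read as the precision C_a' D_a^{-1} C_a. A sketch under our own naming (`cd_combine`), with the sanity check that when the blocks are independent and D_a = I the combined estimator coincides with the full-data least-squares fit:

```python
import numpy as np

def cd_combine(beta_hats, precisions):
    """CD combination: beta_SSCC = (sum_a W_a)^{-1} sum_a W_a beta_hat_a."""
    W_sum = sum(precisions)
    return np.linalg.solve(W_sum, sum(W @ b for W, b in zip(precisions, beta_hats)))

rng = np.random.default_rng(3)
p, m, b = 3, 4, 50
beta_true = np.array([1.0, -2.0, 0.5])
beta_hats, Ws, X_all, y_all = [], [], [], []
for a in range(m):
    C = rng.normal(size=(b, p))                 # block design matrix C_a
    D = np.eye(b)                               # block covariance after the sequential update
    y_a = C @ beta_true + rng.normal(size=b)
    W = C.T @ np.linalg.solve(D, C)             # W_a = C_a' D_a^{-1} C_a
    beta_hats.append(np.linalg.solve(W, C.T @ np.linalg.solve(D, y_a)))
    Ws.append(W)
    X_all.append(C)
    y_all.append(y_a)
beta_sscc = cd_combine(beta_hats, Ws)
```

Because Σ_a W_a β̂_a = Σ_a C_a' D_a^{-1} y*_a, the combination is exact algebra, not an approximation, for fixed θ; the asymptotics in Theorem 1 are needed only once θ is also estimated.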
24 GP predictive distribution
The GP prediction y_0 at a new point x_0, given parameters (β, θ), follows a normal distribution with mean p_0(β, θ) and variance m_0(β, θ), where
  p_0(β, θ) = x_0'β + γ(θ)' Σ(θ)^{-1} (y - Xβ)
  m_0(β, θ) = σ² (1 - γ(θ)' Σ(θ)^{-1} γ(θ)),
and γ(θ) is an n×1 vector whose i-th element equals φ(|x_i - x_0|; θ).
Drawbacks:
- Computational issue again!
- The conventional plug-in predictive distribution underestimates the uncertainty.
25 Quantify GP prediction uncertainty using SSCC and CD
An easy-to-compute GP prediction:
  p_1(β, θ) = x_0'β + Σ_{a=1}^m γ*_a(θ)' D_a^{-1}(θ) (y*_a - C_a(θ)β)
  m_1(β, θ) = σ² (1 - Σ_{a=1}^m γ*_a(θ)' D_a^{-1}(θ) γ*_a(θ))
A better way to quantify uncertainty: use a CD-based predictive distribution function,
  Q(y_0; y) = ∫_{λ∈Θ} G_λ(y_0) dH_c(λ; y),
where G_λ(·) is the CDF of N(p_1(λ), m_1(λ)), and H_c(λ; y) is the joint CD of λ = (β, θ) obtained from our SSCC approach.
26 Theoretical result on prediction
Theorem 2. Under some regularity assumptions, when the number of blocks m = o(n^{1/2}) and n → ∞,
  p_1(β, θ) → p_0(β, θ) and m_1(β, θ) → m_0(β, θ).
The easy-to-compute GP prediction is asymptotically equivalent to the original prediction.
Compared with the plug-in predictive distribution, the CD-based predictive distribution function has smaller average Kullback-Leibler distance to the true predictive distribution (Shen, Liu, and Xie 2016).
27 Quantify GP prediction uncertainty using SSCC and CD
Monte-Carlo algorithm: repeat the following two steps T times to obtain T copies of simulated y_0 from Q(·; y), denoted by y_0^(t), t = 1, ..., T:
  1. Generate ξ_t | y ~ H_c(λ; y);
  2. Generate y_0^(t) | ξ_t ~ N(p_1(ξ_t), m_1(ξ_t)).
These T copies of y_0^(t) can be used to make predictive inference for y_0.
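The two-step algorithm can be sketched generically. Everything problem-specific is abstracted away here: the CD draw is approximated by a normal centered at the parameter estimate, and `p1`/`m1` are hypothetical callables standing in for the formulas on slide 25.

```python
import numpy as np

def cd_predictive_draws(rng, T, lam_hat, lam_cov, p1, m1):
    """Monte-Carlo draws from the CD-based predictive distribution.

    Step 1: xi_t ~ H_c(lambda; y), approximated here by N(lam_hat, lam_cov);
    Step 2: y0_t | xi_t ~ N(p1(xi_t), m1(xi_t)).
    """
    draws = np.empty(T)
    for t in range(T):
        xi = rng.multivariate_normal(lam_hat, lam_cov)   # step 1: parameter draw from the CD
        draws[t] = rng.normal(p1(xi), np.sqrt(m1(xi)))   # step 2: prediction given xi
    return draws

# Toy illustration with a scalar parameter and made-up p1, m1
rng = np.random.default_rng(4)
draws = cd_predictive_draws(
    rng, T=5000,
    lam_hat=np.array([2.0]), lam_cov=np.array([[0.04]]),
    p1=lambda xi: xi[0],     # hypothetical predictive mean: the parameter itself
    m1=lambda xi: 1.0,       # hypothetical constant predictive variance
)
```

In this toy setup the spread of the draws is close to 1.04 rather than the plug-in value 1.0: the extra 0.04 is exactly the parameter uncertainty that the plug-in predictive distribution ignores.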
28 Simulation study: setup
GP model: y = Xβ + Z(x)
Sample sizes: n = 1000, 1500, 2000; four covariates (p = 4)
True parameter values: β = (3, 2, 1, 2, 1.5), θ = (15, 1.5, 2, 3), and σ² = 1
The ij-th element of Σ(θ) is computed by σ_ij = exp(-Σ_{k=1}^p θ_k |x_{ik} - x_{jk}|)
The number of splits: m = 5
29 Simulation study: analysis results
Columns: MLE / Taper / SSCC, for n = 1000; n = 1500; n = 2000. Entries are mean (SD).
β̂: (0.33) 1.96(0.33) 1.96(0.33); 2.00(0.30) 2.00(0.30) 2.00(0.30); 2.08(0.23) 2.08(0.23) 2.08(0.23)
β̂: (0.43) 3.02(0.43) 3.02(0.43); 3.01(0.36) 3.01(0.36) 3.01(0.36); 2.90(0.28) 2.90(0.28) 2.90(0.28)
β̂: (0.23) 1.00(0.23) 1.00(0.23); 0.99(0.21) 0.99(0.21) 0.99(0.21); 1.04(0.20) 1.04(0.20) 1.04(0.20)
β̂: (0.24) 2.04(0.24) 2.04(0.24); 1.97(0.25) 1.97(0.25) 1.97(0.25); 1.98(0.24) 1.98(0.24) 1.98(0.24)
β̂: (0.25) 1.53(0.25) 1.53(0.25); 1.52(0.28) 1.52(0.28) 1.52(0.28); 1.49(0.18) 1.49(0.18) 1.49(0.18)
θ̂: (0.42) 14.72(0.42) 14.80(0.43); 14.65(0.39) 14.65(0.39) 14.65(0.49); 14.69(0.45) 14.74(0.45) 14.79(0.46)
θ̂: (0.06) 1.49(0.06) 1.51(0.06); 1.49(0.06) 1.49(0.06) 1.50(0.10); 1.49(0.06) 1.49(0.06) 1.49(0.10)
θ̂: (0.06) 2.00(0.06) 2.01(0.06); 1.98(0.06) 1.98(0.06) 2.00(0.06); 1.99(0.07) 1.99(0.07) 2.00(0.10)
θ̂: (0.06) 3.01(0.06) 2.99(0.06); 3.00(0.06) 3.00(0.06) 3.00(0.08); 3.01(0.06) 3.01(0.06) 3.00(0.07)
CT: 32.46(1.29) 30.61(1.34) 4.54(0.11); 99.66(3.83) 95.90(5.44) 10.39(0.53); (6.96) (9.33) 20.32(0.95)
Table: Simulation results for n = 1000, 1500, 2000 with 100 replications
30 Simulation study: computing time
(Figure: computing time in seconds vs sample size n, for MLE, Taper and SSCC.)
Remark: reduction of computational complexity from O(n³) to O(Σ_a l_a³), where l_a is the size of the a-th subset and Σ_a l_a = n.
31 Real data example: IBM data center thermal management project
Variable | β̂_mle | β̂_taper | β̂_scc | β̂_sscc
X1 CRAC unit 1 flow rate | (0.101) | (0.101) | (0.561) | (0.101)
X2 CRAC unit 2 flow rate | (0.099) | (0.099) | (0.581) | (0.099)
X3 CRAC unit 3 flow rate | (0.101) | (0.101) | (0.839) | (0.101)
X4 CRAC unit 4 flow rate | (0.113) | (0.113) | (1.406) | (0.113)
X5 Room temperature | (0.138) | (0.138) | (3.116) | (0.138)
X6 Tile open area percentage | (0.106) | (0.106) | (0.894) | (0.106)
X7 Location in x-axis | (0.054) | (0.054) | (0.086) | (0.054)
X8 Location in y-axis | 1.672(0.080) | 1.681(0.080) | 1.783(0.484) | 1.681(0.080)
X9 Height | (0.194) | (0.194) | (6.437) | (0.194)
Computing time (in seconds)
Table: Using a subsample of n = 9000; only estimating β, with θ given (fixed at the values estimated in Chen et al. 2014).
32 Real data example: IBM data center thermal management project
Figure: Computing time. Using a subsample of n = 9000; only estimating β, with θ given (fixed at the values estimated in Chen et al. 2014).
33 Real data example: IBM data center thermal management project
Columns: MLE / Taper / SSCC, for n = 1800; n = 3600; n = 26820.
CRAC unit 1 flow rate: β̂1 -8.28(0.10) -8.29(0.10) -8.29(0.10); (0.09); (0.08)
  θ̂1 0.84(0.01) 0.84(0.01) 0.86(0.01); (0.01); (0.01)
CRAC unit 2 flow rate: β̂2 -9.00(0.10) -8.99(0.10) -8.99(0.10); (0.09); (0.08)
  θ̂2 0.76(0.01) 0.76(0.01) 0.79(0.01); (0.01); (0.01)
CRAC unit 3 flow rate: β̂3 -6.44(0.10) -6.45(0.10) -6.45(0.10); (0.09); (0.09)
  θ̂3 1.20(0.01) 1.20(0.01) 1.19(0.01); (0.01); (0.01)
CRAC unit 4 flow rate: β̂4 -5.42(0.11) -5.41(0.11) -5.41(0.11); (0.10); (0.10)
  θ̂4 1.90(0.01) 1.90(0.01) 1.80(0.01); (0.01); (0.01)
Room temperature: β̂5 -0.08(0.13) -0.07(0.13) -0.07(0.13); (0.13); (0.13)
  θ̂5 3.50(0.01) 3.50(0.01) 3.40(0.01); (0.01); (0.01)
Tile open area percentage: β̂6 -1.98(0.10) -1.97(0.10) -1.97(0.10); (0.10); (0.09)
  θ̂6 1.29(0.01) 1.29(0.01) 1.20(0.01); (0.01); (0.01)
Location in x-axis: β̂7 -3.39(0.06) -3.41(0.06) -3.41(0.06); (0.04); (0.03)
  θ̂7 0.20(0.01) 0.20(0.01) 0.17(0.01); (0.01); (0.01)
Location in y-axis: β̂8 2.80(0.08) 2.80(0.08) 2.80(0.08); (0.07); (0.06)
  θ̂8 0.60(0.01) 0.60(0.01) 0.50(0.01); (0.01); (0.01)
Location in z-axis: β̂9 22.35(0.18) 22.35(0.18) 22.35(0.18); (0.18); (0.18)
  θ̂9 21.90(0.03) 21.90(0.03) 21.45(0.03); (0.03); (0.02)
Computing Time (in seconds)
Table: CFD data results with n = 1800, 3600 and 26820; estimating both β and θ.
34 Summary
We introduce a unified SSCC framework for GP modeling that can dramatically reduce computation and also provide a better quantification of the prediction uncertainty:
- A Sequential Split-Conquer-Combine (SSCC) procedure for big spatial data
- CD combining
- Predictive distribution based on CDs
It provides asymptotically equivalent estimation, and reduces the computational complexity from O(n³) to O(Σ_a l_a³), where l_a is the size of the a-th subset and Σ_a l_a = n.
More informationBias-Variance Tradeoff
What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff
More informationMCMC algorithms for fitting Bayesian models
MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationBayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling
Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Jon Wakefield Departments of Statistics and Biostatistics University of Washington 1 / 37 Lecture Content Motivation
More informationSTAT 135 Lab 5 Bootstrapping and Hypothesis Testing
STAT 135 Lab 5 Bootstrapping and Hypothesis Testing Rebecca Barter March 2, 2015 The Bootstrap Bootstrap Suppose that we are interested in estimating a parameter θ from some population with members x 1,...,
More informationEM Algorithm II. September 11, 2018
EM Algorithm II September 11, 2018 Review EM 1/27 (Y obs, Y mis ) f (y obs, y mis θ), we observe Y obs but not Y mis Complete-data log likelihood: l C (θ Y obs, Y mis ) = log { f (Y obs, Y mis θ) Observed-data
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationSupplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles
Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles John Novembre and Montgomery Slatkin Supplementary Methods To
More informationBayesian Dynamic Linear Modelling for. Complex Computer Models
Bayesian Dynamic Linear Modelling for Complex Computer Models Fei Liu, Liang Zhang, Mike West Abstract Computer models may have functional outputs. With no loss of generality, we assume that a single computer
More informationStatistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation
Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence
More informationQuasi-likelihood Scan Statistics for Detection of
for Quasi-likelihood for Division of Biostatistics and Bioinformatics, National Health Research Institutes & Department of Mathematics, National Chung Cheng University 17 December 2011 1 / 25 Outline for
More informationMonitoring Wafer Geometric Quality using Additive Gaussian Process
Monitoring Wafer Geometric Quality using Additive Gaussian Process Linmiao Zhang 1 Kaibo Wang 2 Nan Chen 1 1 Department of Industrial and Systems Engineering, National University of Singapore 2 Department
More informationTime Series and Dynamic Models
Time Series and Dynamic Models Section 1 Intro to Bayesian Inference Carlos M. Carvalho The University of Texas at Austin 1 Outline 1 1. Foundations of Bayesian Statistics 2. Bayesian Estimation 3. The
More informationVariable selection for model-based clustering
Variable selection for model-based clustering Matthieu Marbac (Ensai - Crest) Joint works with: M. Sedki (Univ. Paris-sud) and V. Vandewalle (Univ. Lille 2) The problem Objective: Estimation of a partition
More informationRegression with correlation for the Sales Data
Regression with correlation for the Sales Data Scatter with Loess Curve Time Series Plot Sales 30 35 40 45 Sales 30 35 40 45 0 10 20 30 40 50 Week 0 10 20 30 40 50 Week Sales Data What is our goal with
More informationModification and Improvement of Empirical Likelihood for Missing Response Problem
UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu
More informationKriging and Alternatives in Computer Experiments
Kriging and Alternatives in Computer Experiments C. F. Jeff Wu ISyE, Georgia Institute of Technology Use kriging to build meta models in computer experiments, a brief review Numerical problems with kriging
More informationLabel Switching and Its Simple Solutions for Frequentist Mixture Models
Label Switching and Its Simple Solutions for Frequentist Mixture Models Weixin Yao Department of Statistics, Kansas State University, Manhattan, Kansas 66506, U.S.A. wxyao@ksu.edu Abstract The label switching
More informationWhen using physical experimental data to adjust, or calibrate, computer simulation models, two general
A Preposterior Analysis to Predict Identifiability in Experimental Calibration of Computer Models Paul D. Arendt Northwestern University, Department of Mechanical Engineering 2145 Sheridan Road Room B214
More informationHierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets
Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets Abhirup Datta 1 Sudipto Banerjee 1 Andrew O. Finley 2 Alan E. Gelfand 3 1 University of Minnesota, Minneapolis,
More informationMultistate Modeling and Applications
Multistate Modeling and Applications Yang Yang Department of Statistics University of Michigan, Ann Arbor IBM Research Graduate Student Workshop: Statistics for a Smarter Planet Yang Yang (UM, Ann Arbor)
More informationEcon 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines
Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the
More informationUnivariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation
Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation PRE 905: Multivariate Analysis Spring 2014 Lecture 4 Today s Class The building blocks: The basics of mathematical
More informationMaster s Written Examination
Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationSome Curiosities Arising in Objective Bayesian Analysis
. Some Curiosities Arising in Objective Bayesian Analysis Jim Berger Duke University Statistical and Applied Mathematical Institute Yale University May 15, 2009 1 Three vignettes related to John s work
More informationFast Likelihood-Free Inference via Bayesian Optimization
Fast Likelihood-Free Inference via Bayesian Optimization Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki Aalto University Helsinki Institute for Information Technology
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationGeneralized Linear Models
Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.
More informationKarhunen-Loeve Expansion and Optimal Low-Rank Model for Spatial Processes
TTU, October 26, 2012 p. 1/3 Karhunen-Loeve Expansion and Optimal Low-Rank Model for Spatial Processes Hao Zhang Department of Statistics Department of Forestry and Natural Resources Purdue University
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationLawrence D. Brown* and Daniel McCarthy*
Comments on the paper, An adaptive resampling test for detecting the presence of significant predictors by I. W. McKeague and M. Qian Lawrence D. Brown* and Daniel McCarthy* ABSTRACT: This commentary deals
More informationOptimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.
Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may
More informationPart 8: GLMs and Hierarchical LMs and GLMs
Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course
More informationEstimating subgroup specific treatment effects via concave fusion
Estimating subgroup specific treatment effects via concave fusion Jian Huang University of Iowa April 6, 2016 Outline 1 Motivation and the problem 2 The proposed model and approach Concave pairwise fusion
More informationPrimer on statistics:
Primer on statistics: MLE, Confidence Intervals, and Hypothesis Testing ryan.reece@gmail.com http://rreece.github.io/ Insight Data Science - AI Fellows Workshop Feb 16, 018 Outline 1. Maximum likelihood
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More information