An application of the GAM-PCA-VAR model to respiratory disease and air pollution data
Márton Ispány (Faculty of Informatics, University of Debrecen, Hungary)
Joint work with Juliana Bottoni de Souza, Valdério A. Reisen, Glaura C. Franco, Pascal Bondon and Jane Meri Santos
International Work-Conference on Time Series, Granada, Spain, September 18th-20th, 2017
Márton Ispány was supported by the EFOP project. The project is co-financed by the European Union and the European Social Fund.
ITISE2017 Márton Ispány Respiratory diseases and air pollution data
Outline
1. Respiratory diseases and air pollution in Vitória (Brazil)
2. Dependence in the data: inter- and temporal correlation
3. Generalized additive models (GAM) and relative risk (RR)
4. Simulation study of GAM under dependent covariates
5. GAM-PCA model
6. GAM-PCA-VAR model
7. Goodness-of-fit and RR
8. Conclusions
1. Data
Time period: January 1, 2005 to December 31, 2010; sample size n = 2191.
Discrete response Y: the number of hospital admissions for respiratory diseases (RD), obtained from the main children's emergency department in the Metropolitan Area (Hospital Infantil Nossa Senhora da Gloria).
Continuous covariates X: the atmospheric pollutants particulate matter (PM10), sulphur dioxide (SO2), nitrogen dioxide (NO2), ozone (O3) and carbon monoxide (CO).
Confounding variables: temperature and relative humidity.
The concentrations of the pollutants considered exceeded neither the primary air quality standard recommended by the Brazilian National Council for the Environment (CONAMA) nor the guidelines suggested by the World Health Organization (WHO).
1. Descriptive statistics
Pollutants are measured as the 24-hour average concentration for PM10 and SO2, the 8-hour moving average concentration for CO and O3, and the 24-hour maximum concentration for NO2, each taken as a daily average over the monitoring stations.
Table columns: mean, standard deviation, minimum, maximum and percentiles (numerical entries were not preserved in the transcription).
Table rows: PM10 (µg/m3), SO2 (µg/m3), O3 (µg/m3), NO2 (µg/m3), CO (µg/m3), minimum temperature (°C), average temperature (°C), maximum temperature (°C), air relative humidity (%), number of treatments for RD.
1. Concentrations of CO, NO2, SO2, PM10 and O3 and the number of treatments for RD
[Figure: six time-series panels, one each for CO, NO2, SO2, PM10, O3 and the treatments for respiratory diseases; horizontal axis: time; vertical axis: concentration (admissions in the last panel).]
2. Correlation between pollutants, meteorological variables and number of treatments
Although some sample correlations appear numerically small, the non-parametric Pearson correlation test indicated that the correlation among the atmospheric pollutants is significant for all pairs of variables at the 5% level and for most pairs at the 0.1% level.
Table: correlation matrix of PM10, SO2, NO2, CO, O3, T(max), T(min), RH and the number of treatments (numerical entries were not preserved in the transcription).
T = temperature (°C); RH = air relative humidity (%). All correlations were significant at the 5% level.
2. Sample ACF of the pollutants
[Figure: sample autocorrelation function against lag, one panel each for CO, O3, SO2, NO2 and PM10.]
2. Result of the exploratory data analysis
Evidence for:
- interdependence between the continuous covariates (pollutants);
- serial (temporal) dependence in the multivariate time series of covariates (pollutants).
The aim of the analysis: quantify the association between respiratory diseases and air pollution concentrations, in particular PM10, SO2, NO2, CO and O3.
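The serial and cross dependence referred to here is usually inspected through the sample autocorrelation and cross-correlation functions. The following is a minimal sketch of those two estimators in plain Python; the function names and the toy alternating series are illustrative assumptions, not part of the talk.

```python
def sample_ccf(x, y, lag):
    """Sample cross-correlation between x_t and y_{t+lag}, for lag >= 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Lagged cross-covariance, normalised by n (the usual biased estimator).
    cov = sum((x[t] - mean_x) * (y[t + lag] - mean_y) for t in range(n - lag)) / n
    sd_x = (sum((v - mean_x) ** 2 for v in x) / n) ** 0.5
    sd_y = (sum((v - mean_y) ** 2 for v in y) / n) ** 0.5
    return cov / (sd_x * sd_y)


def sample_acf(x, lag):
    """Sample autocorrelation: the cross-correlation of a series with itself."""
    return sample_ccf(x, x, lag)


# Toy series that alternates in sign, so its lag-1 autocorrelation is strongly negative.
series = [1.0, -1.0] * 5
print(sample_acf(series, 0), sample_acf(series, 1))
```

Applied to each pollutant series and each pair of series, these functions give the quantities plotted in ACF and CCF panels.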
3. Generalized additive models (GAM)
Hastie and Tibshirani (1990)
$\{Y_t\} \equiv \{Y_t\}_{t \in \mathbb{Z}}$ is a count time series with conditional Poisson distribution of mean $\mu_t$:
$$P(Y_t = y_t \mid \mathcal{F}_{t-1}) = \frac{e^{-\mu_t} \mu_t^{y_t}}{y_t!}, \qquad y_t = 0, 1, \ldots$$
For a sample $Y_1, \ldots, Y_n$ of $\{Y_t\}$, the conditional log-likelihood function is given by
$$\ell(\mu) \propto \sum_{t=1}^{n} (Y_t \ln \mu_t - \mu_t),$$
where $\mu_t$ depends on the pollutant covariates and confounding variables through the canonical link
$$\ln(\mu_t) = \sum_{j=0}^{q} \beta_j X_{jt} + \sum_{j=q+1}^{p} f_j(X_{jt}), \qquad q < p.$$
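As a small illustration of the Poisson log-likelihood and the log link, the sketch below evaluates the kernel $\sum_t (Y_t \ln \mu_t - \mu_t)$ for the linear part of the predictor only. The smooth terms $f_j$ are left out, and the counts, covariate values and coefficients are made-up numbers chosen for illustration.

```python
import math


def poisson_loglik(y, mu):
    """Poisson log-likelihood up to the additive constant -sum(log(y_t!))."""
    return sum(yt * math.log(mt) - mt for yt, mt in zip(y, mu))


def log_link_mean(beta0, beta, x_row):
    """Mean mu_t = exp(beta0 + sum_j beta_j * x_jt) for the linear part of the link."""
    return math.exp(beta0 + sum(b * x for b, x in zip(beta, x_row)))


# Hypothetical counts and a single hypothetical covariate.
y = [3, 0, 2, 5, 1]
X = [[0.2], [-1.0], [0.1], [1.3], [-0.4]]

mu = [log_link_mean(0.5, [0.6], row) for row in X]
ll = poisson_loglik(y, mu)
print(ll)
```

For an intercept-only model the kernel is maximised when every $\mu_t$ equals the sample mean of the counts, which gives a quick sanity check of the function.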
3. Relative risk (RR)
Frequently used in epidemiological studies to measure the impact of atmospheric pollutant concentrations on the health of the exposed population, Baxter et al. (1997).
The relative change in the expected count of respiratory disease events per $\xi$-unit change in the covariate, keeping the other covariates fixed:
$$RR_{X_j}(\xi) := \frac{E(Y \mid X_j = \xi,\ X_i = x_i,\ i \neq j)}{E(Y \mid X_j = 0,\ X_i = x_i,\ i \neq j)}$$
For Poisson regression: $RR_{X_j}(\xi) = \exp(\beta_j \xi)$.
RR and its CI are estimated as follows:
$$\widehat{RR}_{X_j}(\xi) = \exp(\hat\beta_j \xi), \qquad CI(RR_{X_j}(\xi)) = \exp\left(\hat\beta_j \xi \pm z_{\alpha/2}\, se(\hat\beta_j)\, \xi\right)$$
Hypothesis testing: $H_0: RR_{X_j}(1) = 1$ against $H_1: RR_{X_j}(1) > 1$.
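The RR estimate and its confidence interval follow directly from a coefficient estimate and its standard error. A minimal sketch is given below; the coefficient, standard error and increment passed in the example are hypothetical values, not results from the study.

```python
import math


def relative_risk(beta_hat, se_beta, xi, z=1.959963984540054):
    """RR for a xi-unit change and its CI, exp(beta*xi -/+ z * se * xi).

    The default z is the standard normal quantile for a 95% interval.
    """
    rr = math.exp(beta_hat * xi)
    lower = math.exp((beta_hat - z * se_beta) * xi)
    upper = math.exp((beta_hat + z * se_beta) * xi)
    return rr, lower, upper


# Hypothetical coefficient per unit of concentration, evaluated for a 10-unit change.
rr, lower, upper = relative_risk(beta_hat=0.002, se_beta=0.0005, xi=10.0)
print(rr, lower, upper)
```

In the talk the increment $\xi$ is the interquartile range of each pollutant, so the same function would be called once per pollutant with its own $\xi$.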
4. Simulation study of GAM
Scenarios:
- S1: independent data;
- S2: the dependent variable is a time series and the covariates are independent random vectors in time;
- S3: both the dependent and independent variables are time series.
Independent means N(0, 1); dependent means an AR(1) process.
For the three scenarios, the data were generated from a conditional Poisson model, $Y_t \mid X_t \sim \mathrm{Po}(\mu_t)$, with canonical link function.
The sample size was n = 100; the number of Monte Carlo simulations was not preserved in the transcription. A similar result was obtained in Dionisio et al. (2016).
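One way such data could be generated is sketched below with NumPy: an AR(1) covariate drives a conditionally Poisson response through the log link. This is only an approximation of scenario S3 under stated assumptions; the coefficients, the seed and the large sample size (used here to make the sample autocorrelation stable) are illustrative choices, not the settings of the study.

```python
import numpy as np


def simulate_ar1_poisson(n, phi, beta0, beta1, rng):
    """AR(1) covariate x_t = phi * x_{t-1} + e_t and y_t | x_t ~ Po(exp(beta0 + beta1 * x_t))."""
    x = np.zeros(n)
    # Draw the first value from the stationary distribution of the AR(1) process.
    x[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - phi**2))
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    mu = np.exp(beta0 + beta1 * x)
    y = rng.poisson(mu)
    return x, y


rng = np.random.default_rng(2017)
x, y = simulate_ar1_poisson(n=5000, phi=0.5, beta0=1.0, beta1=0.3, rng=rng)

# Lag-1 sample autocorrelation of the covariate; it should be close to phi.
xc = x - x.mean()
lag1 = float(np.sum(xc[:-1] * xc[1:]) / np.sum(xc**2))
print(lag1)
```

Repeating the simulation many times and refitting the model each time gives the Monte Carlo mean, bias and MSE of the coefficient estimates.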
4. Simulation results for a single covariate
Table columns: Model, Parameter, Mean, Bias, MSE (numerical entries, including the true parameter values, were not preserved in the transcription).
Table rows, each reporting β0 and β1:
- S1: independent
- S2: ϕ = 0.1
- S2: ϕ = 0.5
- S2: ϕ = 0.9
- S3: ϕ = 0.1
- S3: ϕ = 0.5
- S3: ϕ = 0.9
4. Simulation results for two covariates, X1 and X2
Table columns: Model, Parameter, Mean, Bias, MSE (numerical entries, including the true parameter values, were not preserved in the transcription).
Table rows, each reporting β0, β1 and β2:
- S1: independent
- S2
- S3: φ11 = 0.7, φ12 = 0, φ21 = 0, φ22 = 0.5
- S3: φ11 = 0.7, φ12 = 0.4, φ21 = 0, φ22 = 0.5
5. PCA for the pollutants
Principal component analysis (PCA) for a stationary vector time series $\{X_t\}$ with covariance matrix $\Sigma_X$ and eigenvalue/eigenvector pairs $(\lambda_i, a_i)$, $i = 1, \ldots, q$:
$$Z_{it} = a_i^\top X_t, \qquad i = 1, \ldots, q$$
Properties:
$$\mathrm{Cov}(Z_{it}, Z_{j(t+h)}) = a_i^\top \mathrm{Cov}(X_t, X_{t+h})\, a_j = a_i^\top \Gamma_X(h)\, a_j$$
Table: standard deviation, proportion of variance and cumulative proportion of variance for PC1 to PC5, followed by the loadings of CO, NO2, O3, PM10 and SO2 (numerical entries were not preserved in the transcription; an asterisk appears in each pollutant row).
The standard deviation is the square root of the eigenvalue.
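The PCA step described here can be sketched with an eigendecomposition of the sample covariance matrix. The code below is a minimal NumPy illustration on made-up correlated data; the function name `pca` and the mixing matrix are assumptions for the example.

```python
import numpy as np


def pca(X):
    """Eigenvalues (descending), eigenvectors a_i (columns) and scores Z = Xc @ A."""
    Xc = X - X.mean(axis=0)
    sigma = np.cov(Xc, rowvar=False)
    eigval, eigvec = np.linalg.eigh(sigma)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    return eigval, eigvec, Xc @ eigvec


# Hypothetical data: three correlated variables built from independent noise.
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.5, 0.2], [0.0, 1.0, 0.3], [0.0, 0.0, 1.0]])
X = rng.normal(size=(500, 3)) @ mixing

eigval, A, Z = pca(X)
std_dev = np.sqrt(eigval)            # "standard deviation" row of a PCA summary table
proportion = eigval / eigval.sum()   # proportion of variance
cumulative = np.cumsum(proportion)   # cumulative proportion of variance
print(std_dev, proportion, cumulative)
```

The scores are uncorrelated at lag 0 by construction. The covariance property stated on the slide shows that nothing forces them to be uncorrelated at other lags when $\Gamma_X(h)$ is non-zero.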
5. GAM-PCA - Generalized additive modelling and principal component analysis
A probabilistic latent variable model defined by
$$Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Po}(\mu_t) \quad \text{and} \quad X_t = A Z_t$$
with link function
$$\mu_t(\upsilon_0, \upsilon, A) = \exp\Big\{\sum_{i=0}^{r} \upsilon_i Z_{it}\Big\} = \exp\big\{\upsilon_0 + \upsilon^\top A_r^\top X_t\big\},$$
where the latent variables $\{Z_t\} \sim WN_q(\Lambda)$, $\Lambda$ is a diagonal variance matrix of dimension $q$ and $A$ is an orthogonal matrix of dimension $q \times q$.
Given a sample $(X_1, Y_1), \ldots, (X_n, Y_n)$, the log-likelihood:
$$\ell \propto \sum_{t=1}^{n} (Y_t \ln \mu_t - \mu_t) - \frac{1}{2} \sum_{t=1}^{n} (A^\top X_t)^\top \Lambda^{-1} (A^\top X_t) - \frac{n}{2} \ln \det \Lambda$$
5. Fitting GAM-PCA model and RR
GAM-PCA, as a two-stage model, can be fitted by a two-stage procedure:
1. The parameter matrices $A$ and $\Lambda$ are estimated by applying PCA to the estimated covariance matrix $\hat\Sigma_X$.
2. The parameters $\upsilon_0$ and $\upsilon$ are estimated by fitting the GAM with link function using the first $r$ PCs.
The estimate of RR per $\xi$-unit change in the pollutant concentration is given by
$$\widehat{RR}_{X_j}(\xi) = \exp(\hat\beta_j \xi), \qquad \text{where} \quad \hat\beta_j := \sum_{i=1}^{r} \hat a_{ji}\, \hat\upsilon_i.$$
The standard error of $\hat\beta_j$ can be derived as
$$se^2(\hat\beta_j) = \sum_{i=1}^{r} \hat a_{ji}^2\, se^2(\hat\upsilon_i).$$
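The back-transformation from PC coefficients to pollutant-level coefficients is a pair of matrix products. The sketch below applies the two formulas for $\hat\beta_j$ and $se^2(\hat\beta_j)$ to hypothetical loadings and estimates; none of the numbers come from the study.

```python
import numpy as np


def pollutant_effects(A_r, upsilon, se_upsilon):
    """beta_j = sum_i a_ji * v_i and se^2(beta_j) = sum_i a_ji^2 * se^2(v_i)."""
    beta = A_r @ upsilon
    se_beta = np.sqrt((A_r**2) @ (se_upsilon**2))
    return beta, se_beta


# Hypothetical loadings (2 pollutants x 2 PCs) and PC coefficient estimates.
A_r = np.array([[0.6, 0.8], [0.8, -0.6]])
upsilon = np.array([0.01, 0.02])
se_upsilon = np.array([0.003, 0.004])

beta, se_beta = pollutant_effects(A_r, upsilon, se_upsilon)
print(beta, se_beta)
```

The variance formula contains no covariance terms between the $\hat\upsilon_i$, which is consistent with treating the PC coefficient estimates as uncorrelated and the loadings as fixed.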
5. CCF of the main PCs of the pollutants
[Figure: sample ACF of PC1 and sample CCFs PC1 x PC2, PC1 x PC3 and PC2 x PC3, plotted against lag.]
A counterexample to page 299 in Jolliffe (2002), where the author argues that when the main objective of PCA is only descriptive, complications such as (temporal) non-independence do not seriously affect this objective.
5. Impact of PCA for temporally dependent (VAR(1)) covariates
Sample autocorrelation function (ACF) and cross-correlation function (CCF) of the PCs for simulated data:
[Figure: panels PC1, PC1 x PC2, PC2 x PC1 and PC2, plotted against lag.]
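The effect can be reproduced in a few lines. The sketch below simulates a bivariate VAR(1) process using the autoregressive coefficients of one of the two-covariate scenarios (φ11 = 0.7, φ12 = 0.4, φ21 = 0, φ22 = 0.5), applies PCA, and computes lagged correlations of the PCs. The standard normal innovations, the sample size and the seed are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = np.array([[0.7, 0.4], [0.0, 0.5]])
n = 5000

# Simulate X_t = Phi X_{t-1} + e_t with independent standard normal innovations.
X = np.zeros((n, 2))
for t in range(1, n):
    X[t] = Phi @ X[t - 1] + rng.normal(size=2)

# PCA of the simulated covariates (columns reversed so that PC1 comes first).
Xc = X - X.mean(axis=0)
eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
Z = Xc @ eigvec[:, ::-1]


def lagged_corr(u, v, lag):
    """Sample correlation between u_t and v_{t+lag}, for lag >= 1."""
    return float(np.corrcoef(u[:-lag], v[lag:])[0, 1])


contemporaneous = float(np.corrcoef(Z[:, 0], Z[:, 1])[0, 1])
acf_pc1 = lagged_corr(Z[:, 0], Z[:, 0], 1)
ccf_pc1_pc2 = lagged_corr(Z[:, 0], Z[:, 1], 1)
print(contemporaneous, acf_pc1, ccf_pc1_pc2)
```

The PCs are uncorrelated at lag 0, but PC1 keeps a strong lag-1 autocorrelation, so PCA on its own does not remove the temporal dependence of the covariates.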
6. GAM-PCA-VAR - GAM-PCA and vector autoregressive modelling
A probabilistic latent variable model defined by
$$Y_t \mid \mathcal{F}_{t-1} \sim \mathrm{Po}(\mu_t) \quad \text{and} \quad X_t = \Phi X_{t-1} + A Z_t$$
with link function
$$\ln(\mu_t) = \sum_{i=0}^{r} \upsilon_i Z_{it} = \upsilon_0 + \upsilon^\top A_r^\top X_t - \upsilon^\top A_r^\top \Phi X_{t-1},$$
where the latent variables $\{Z_t\} \sim WN_q(\Lambda)$, $\Lambda$ is a diagonal variance matrix of dimension $q$, $A$ is an orthogonal matrix of dimension $q \times q$ and $\Phi$ is a matrix of dimension $q \times q$.
The quintuplet $(\upsilon_0, \upsilon, A, \Lambda, \Phi)$ forms the parameters of the GAM-PCA-VAR model.
By the link function, $Y_t$ depends on both $X_t$ and $X_{t-1}$, demonstrating the presence of serial dependence in the GAM-PCA-VAR model.
6. Fitting GAM-PCA-VAR model
Given a sample $(X_1, Y_1), \ldots, (X_n, Y_n)$, the log-likelihood:
$$\ell \propto \sum_{t=2}^{n} (Y_t \ln \mu_t - \mu_t) - \frac{1}{2} \sum_{t=2}^{n} \varepsilon_t^\top A \Lambda^{-1} A^\top \varepsilon_t - \frac{n-1}{2} \ln \det \Lambda,$$
where $\varepsilon_t = X_t - \Phi X_{t-1}$ is the error term.
GAM-PCA-VAR is fitted by a three-stage procedure:
1. An (S)VAR(1) model is fitted to the original covariates by applying standard time series techniques.
2. PCA is applied to the residuals $\hat\varepsilon_t = X_t - \hat\Phi X_{t-1}$, $t = 2, \ldots, n$, where $\hat\Phi$ denotes the estimated autoregressive coefficient matrix in the fitted VAR(1) model, and the first $r$ PCs are computed.
3. The GAM is fitted using these PCs by maximizing the Poisson part of the log-likelihood.
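The three stages can be sketched end to end with NumPy. The code below is a simplified illustration rather than the authors' implementation: it assumes a plain (non-seasonal) VAR(1) fitted by least squares, and it replaces the GAM by a Poisson log-linear model without smooth terms, fitted by iteratively reweighted least squares. All function names, the simulated data and the parameter values are hypothetical.

```python
import numpy as np


def fit_var1(X):
    """Stage 1: least-squares VAR(1), X_t = Phi X_{t-1} + eps_t, for centred X (n x q)."""
    past, present = X[:-1], X[1:]
    coef, *_ = np.linalg.lstsq(past, present, rcond=None)  # present ~ past @ coef
    return coef.T, present - past @ coef                   # Phi_hat, residuals


def leading_pcs(resid, r):
    """Stage 2: loadings A_r and scores of the first r PCs of the VAR residuals."""
    centred = resid - resid.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(centred, rowvar=False))
    A_r = eigvec[:, np.argsort(eigval)[::-1][:r]]
    return A_r, centred @ A_r


def fit_poisson_loglinear(Z, y, n_iter=100, tol=1e-10):
    """Stage 3 (simplified): IRLS for ln(mu_t) = v0 + v'Z_t, with no smooth terms."""
    D = np.column_stack([np.ones(len(y)), Z])
    coef = np.zeros(D.shape[1])
    coef[0] = np.log(y.mean())
    for _ in range(n_iter):
        eta = D @ coef
        mu = np.exp(eta)
        working = eta + (y - mu) / mu  # working response for the log link
        new = np.linalg.solve(D.T @ (D * mu[:, None]), D.T @ (mu * working))
        converged = np.max(np.abs(new - coef)) < tol
        coef = new
        if converged:
            break
    return coef


# Hypothetical data: three covariates following a VAR(1) process; the counts
# depend on the first component of the innovation with log-scale effect 0.3.
rng = np.random.default_rng(42)
n, q = 3000, 3
Phi = np.array([[0.6, 0.2, 0.0], [0.0, 0.5, 0.1], [0.0, 0.0, 0.4]])
eps = rng.normal(size=(n, q))
X = np.zeros((n, q))
for t in range(1, n):
    X[t] = Phi @ X[t - 1] + eps[t]
y = rng.poisson(np.exp(1.0 + 0.3 * eps[:, 0]))

Phi_hat, resid = fit_var1(X - X.mean(axis=0))
A_r, Z = leading_pcs(resid, r=q)
coef = fit_poisson_loglinear(Z, y[1:])  # residuals exist for t = 2, ..., n only
beta_hat = A_r @ coef[1:]               # back to covariate-level coefficients
print(Phi_hat.round(2), beta_hat.round(3))
```

With all $q$ PCs retained, the back-transformed coefficients should sit close to the values used in the simulation. Keeping fewer PCs trades some of that accuracy for a lower-dimensional model.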
6. PCA for the filtered pollutants
The time structure of the pollutants did not alter the cumulative proportion of the variance, but the clustering of the pollutants by factor loadings resulted in a different interpretation, which is more coherent with the behaviour of the variables considered.
Table: standard deviation, proportion of variance and cumulative proportion of variance for PC1 to PC5, followed by the loadings of CO, NO2, O3, PM10 and SO2 (numerical entries were not preserved in the transcription).
The standard deviation is the square root of the eigenvalue.
6. CCF of the main PCs of the filtered pollutants
The figure shows that fitting the seasonal VAR(1) model practically eliminated the autocorrelation of PC1 and the cross-correlations, as expected from the preceding discussion.
[Figure: panels PC1, PC1 x PC2, PC1 x PC3 and PC2 x PC3.]
7. Goodness-of-fit statistics
The AIC and BIC information criteria indicate that the GAM-PCA-VAR model gives the best fit to the data.
Table: MSE, AIC and BIC for the GAM, GAM-PCA and GAM-PCA-VAR models (numerical entries were not preserved in the transcription).
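For reference, the three criteria in the table can be computed as sketched below, using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The log-likelihood, parameter count and series in the example are placeholders. For a GAM, k would typically be the effective degrees of freedom reported by the fitting software, which is an assumption here and not something stated in the talk.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik


def mse(observed, fitted):
    """Mean squared error between observed counts and fitted means."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted)) / len(observed)


# Placeholder values purely for illustration.
print(aic(-100.0, 3), bic(-100.0, 3, 100), mse([1, 2, 3], [1, 2, 5]))
```

Lower values of all three criteria indicate a better fit, and BIC penalises additional parameters more heavily than AIC once n exceeds about 7.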
7. Fitted GAM-PCA-VAR model to the number of treatments for RD
The graph shows that the model provided a good fit to the number of daily treatments for children under 6 years old in the Vitória metropolitan area.
[Figure: number of hospital treatments against time, showing the original series and the adjusted series.]
7. Relative risk for air pollutants
Relative risk (RR) and 95% confidence intervals for treatments for respiratory diseases in children under 6 years old for an interquartile variation in the pollutants PM10, SO2, NO2, O3 and CO in the Metropolitan Area from Jan 2005 to Dec 2010. Point estimates marked [lost] were not preserved in the transcription.

Pollutant   GAM                      GAM-PCA                  GAM-PCA-VAR
PM10        [lost] (1.010, 1.039)    1.029 (1.001, 1.090)     1.075 (1.001, 1.092)
SO2         [lost] (1.010, 1.080)    0.982 (0.972, 1.001)     1.027 (1.010, 1.040)
CO          1.020 (1.010, 1.030)     1.048 (1.002, 1.071)     1.077 (1.020, 1.100)
NO2         [lost] (0.990, 1.020)    1.028 (1.010, 1.040)     1.012 (1.010, 1.030)
O3          [lost] (0.972, 1.001)    1.081 (1.003, 1.093)     0.992 (0.992, 1.020)

The estimated excess risk for PM10 increased from 2% (GAM) to 3% (GAM-PCA) and 7% (GAM-PCA-VAR). Substantial increases in the RR estimates were also observed for CO: 1.020 (GAM), 1.048 (GAM-PCA) and 1.077 (GAM-PCA-VAR).
8. Conclusions
- Clear evidence was found for inter- and temporal correlation between the pollutant covariates.
- A simulation study was performed to examine how inter- and temporal correlation between the covariates affects the estimation of the GAM parameters.
- A novel hybrid GAM-PCA-VAR model was proposed.
- The GAM-PCA-VAR model was fitted to the data. With this new model, more significant (and more realistic) RR estimates were obtained.
References
Juliana Bottoni de Souza, Valdério A. Reisen, Glaura C. Franco, Márton Ispány, Pascal Bondon, Jane Meri Santos (2018). Generalized additive model with principal component analysis: An application to time series of respiratory disease and air pollution data. Journal of the Royal Statistical Society: Series C (Applied Statistics). DOI: /rssc
Thank you for your attention!
Climate Change Impact Analysis Patrick Breach M.E.Sc Candidate pbreach@uwo.ca Outline July 2, 2014 Global Climate Models (GCMs) Selecting GCMs Downscaling GCM Data KNN-CAD Weather Generator KNN-CADV4 Example
More informationCo-integration in Continuous and Discrete Time The State-Space Error-Correction Model
Co-integration in Continuous and Discrete Time The State-Space Error-Correction Model Bernard Hanzon and Thomas Ribarits University College Cork School of Mathematical Sciences Cork, Ireland E-mail: b.hanzon@ucc.ie
More informationStochastic decadal simulation: Utility for water resource planning
Stochastic decadal simulation: Utility for water resource planning Arthur M. Greene, Lisa Goddard, Molly Hellmuth, Paula Gonzalez International Research Institute for Climate and Society (IRI) Columbia
More informationSpatio-temporal correlations in fuel pin simulation : prediction of true uncertainties on local neutron flux (preliminary results)
Spatio-temporal correlations in fuel pin simulation : prediction of true uncertainties on local neutron flux (preliminary results) Anthony Onillon Neutronics and Criticality Safety Assessment Department
More informationGauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA
JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter
More informationApplied Econometrics. Professor Bernard Fingleton
Applied Econometrics Professor Bernard Fingleton 1 Causation & Prediction 2 Causation One of the main difficulties in the social sciences is estimating whether a variable has a true causal effect Data
More informationSpatio-temporal modeling of weekly malaria incidence in children under 5 for early epidemic detection in Mozambique
Spatio-temporal modeling of weekly malaria incidence in children under 5 for early epidemic detection in Mozambique Katie Colborn, PhD Department of Biostatistics and Informatics University of Colorado
More informationBootstrap Simulation Procedure Applied to the Selection of the Multiple Linear Regressions
JKAU: Sci., Vol. 21 No. 2, pp: 197-212 (2009 A.D. / 1430 A.H.); DOI: 10.4197 / Sci. 21-2.2 Bootstrap Simulation Procedure Applied to the Selection of the Multiple Linear Regressions Ali Hussein Al-Marshadi
More informationPaper Review: NONSTATIONARY COVARIANCE MODELS FOR GLOBAL DATA
Paper Review: NONSTATIONARY COVARIANCE MODELS FOR GLOBAL DATA BY MIKYOUNG JUN AND MICHAEL L. STEIN Presented by Sungkyu Jung April, 2009 Outline 1 Introduction 2 Covariance Models 3 Application: Level
More informationDeposited on: 07 September 2010
Lee, D. and Shaddick, G. (2008) Modelling the effects of air pollution on health using Bayesian dynamic generalised linear models. Environmetrics, 19 (8). pp. 785-804. ISSN 1180-4009 http://eprints.gla.ac.uk/36768
More informationHypothesis testing:power, test statistic CMS:
Hypothesis testing:power, test statistic The more sensitive the test, the better it can discriminate between the null and the alternative hypothesis, quantitatively, maximal power In order to achieve this
More informationChapter 5. Analysis of Multiple Time Series. 5.1 Vector Autoregressions