Revista INGENIERÍA UC, ISSN 1316-6832, Universidad de Carabobo, Venezuela
Vega-González, Cristóbal E. "Nota Técnica: Selección de modelos en regresión lineal" [Technical Note: Model selection in linear regression]. Revista INGENIERÍA UC, vol. 20, no. 1, January-April 2013. Universidad de Carabobo, Valencia, Venezuela. Available through redalyc.org (Sistema de Información Científica, Red de Revistas Científicas de América Latina, el Caribe, España y Portugal), a non-profit academic project developed under the open-access initiative.
REVISTA INGENIERÍA UC, Vol. 20, No. 1, Abril 2013

Nota Técnica: Selección de modelos en regresión lineal
(Technical Note: Model selection in linear regression)

Cristóbal E. Vega González
Instituto de Matemática y Cálculo Aplicado (IMYCA), Facultad de Ingeniería, Universidad de Carabobo, Valencia, Venezuela. Correo-e: cvega@uc.edu.ve

Abstract.- Motivated by reading several scientific papers with serious flaws in their ordinary least squares fits, the author set out to write this note. The note reviews and recommends several references on ordinary least squares. It then discusses model selection procedures, developing and recommending model selection by the minimum description length (MDL) principle.

Keywords: Ordinary least squares, Model selection criteria, MDL principle

Received: January 2013. Accepted: April 2013.

1. Introduction

Recently, several scientific papers came to our attention in which the authors fit a curve to a dataset by ordinary least squares (OLS). These papers share the following characteristics:

- The researchers fit a calibration curve by OLS to a dataset in which, for l concentration values C_1, ..., C_l, s instrumental-amplitude replicates {A_{j,1}, ..., A_{j,s}} are measured at each concentration (an l × s point dataset). However, instead of using all l · s pairs, they use only the l points (C_j, Ā_j), where Ā_j = Σ_k A_{j,k} / s. The resulting model underfits the data, and the reported t values and residuals are misleading.
- The researchers apply OLS to small datasets, with fewer than 30 points.
- The researchers favor high-degree polynomial fits by OLS, for instance degree 12 with only 20 points.
- To select among the candidate polynomials, the authors use the minimum mean squared error (MSE).

These practices strongly caught our attention, so this note was written with a purely educational intent.

Example 1: An instance of the first observation is a dataset of 30 points: for the concentration values {1, 2, 3, 4, 5, 6}, the authors take 5 amplitude replicates of each. The instrumental calibration curve is the OLS fit of Table 1 and Figure 1. But the authors reported instead an OLS fit to only the six averaged points (C_j, Ā_j) (see Table 2 and Figure 2), and in this second fit they ignore the heteroscedasticity of the residuals.

Table 1: OLS with observations 1-30, dependent variable A. (Coefficient, standard deviation, t value and p value for C; mean and standard deviation of the dependent variable; regression sum of squares; R²; corrected R²; F(1, 28) with p value 1.55e-18.)

Figura 1: Linear fit to the full dataset.

Table 2: OLS with observations 1-6, dependent variable A. (Coefficient, standard deviation, t value and p value for C; summary statistics; F test with p value 6.73e-07.)

Figura 2: Linear fit to the averaged dataset.

Example 2: A critical case is fitting a linear regression to the dataset

A = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5)}

and then comparing the result with a linear regression fit to the averaged points

Ā = {(1, 3), (2, 3), (3, 3), (4, 3), (5, 3), (6, 3)}.

This note first describes ordinary least squares, the assumptions OLS requires, the need to estimate t values, and the validation tests on the residuals. It then describes model selection criteria, including a development of the MDL principle.

2. Ordinary least squares

Ordinary least squares (OLS) is a method for estimating the unknown parameters of a linear regression. It minimizes the sum of squared vertical distances between the responses observed in the dataset and the responses predicted by the linear approximation. The resulting estimator can be expressed by a simple formula [1].

The OLS estimator is consistent when the regressors are exogenous and there is no perfect multicollinearity, and it is optimal in the class of linear unbiased estimators when the errors are homoscedastic and serially uncorrelated. Under these conditions, OLS provides minimum-variance mean-unbiased estimation when the errors have finite variances. Under the additional assumption that the errors are normally distributed, OLS is the maximum likelihood estimator. OLS is used in economics (econometrics), political science and electrical engineering (control theory and signal processing), among many other areas of application. This consistency, however, cannot be demonstrated with a small dataset.

The first clear and concise exposition of the method of least squares was published by Legendre in 1805 [2]. The technique is described as an algebraic procedure for fitting linear equations to data, and Legendre demonstrates the new method by analyzing the same data as Laplace for the shape of the earth. The value of Legendre's method of least squares was immediately recognized by the leading astronomers and geodesists of the time.

With more than one explanatory variable, the method is called multiple linear regression. In multiple linear regression there are several independent variables, or functions of independent variables, x_1, x_2, ..., x_n (potential predictors). Associate with each predictor x_j a binary variable γ_j, and consider the models given by

    y = Σ_{j : γ_j = 1} β_j x_j + ε.    (1)

Replacing each term x_j by the power x^j turns the regression into a polynomial regression [3].

2.1. Estimation of the t values

A priority task in OLS estimation is the estimation of the t values. These indicate whether the calculated coefficients are significantly different from zero.
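The averaging problem of Examples 1 and 2 can be sketched in a few lines of Python (an illustration only; the original analyses were done with statistical packages, and this closed-form code handles just the single-regressor case):

```python
import math

# Example 2 dataset: every combination of C in {1..6} with A in {1..5}.
full = [(c, a) for c in range(1, 7) for a in range(1, 6)]
# Averaging the 5 replicates at each concentration collapses it to 6 points.
avg = [(c, sum(a for cc, a in full if cc == c) / 5) for c in range(1, 7)]

def ols_line(data):
    """Simple-linear-regression OLS: returns slope, intercept, residual sum
    of squares, and the t value of the slope (None when RSS is zero)."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in data)
    df = n - 2
    if df <= 0 or rss == 0:
        return b1, b0, rss, None
    se_b1 = math.sqrt(rss / df / sxx)
    return b1, b0, rss, b1 / se_b1

slope_f, icpt_f, rss_f, t_f = ols_line(full)
slope_a, icpt_a, rss_a, t_a = ols_line(avg)

# The full 30 points give a flat line with a large residual sum of squares;
# the 6 averaged points lie exactly on that line, so RSS = 0 and the fit
# looks spuriously perfect (the slope's t statistic is not even defined).
print(slope_f, rss_f)
print(slope_a, rss_a)
```

The averaged fit hides all of the replicate variability, which is exactly why the t values and residuals reported from it are misleading.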
Suppose one is fitting a linear model. It is desired to test the null hypothesis that a parameter equals 0, in which case the hypothesis is that the corresponding variable and the response are unrelated. If a scientific paper does not report the t values, the estimation is worthless.

2.2. Residuals study

After the model is fitted, a study of the residuals is necessary (see, for example, [4]). This study should show that the residuals are independent and identically distributed random variables with mean zero and constant variance. For this, some of the following tests are necessary:

(a) A t test, with null hypothesis: mean = 0.0; alternative: not equal.
(b) A sign test, with null hypothesis: median = 0.0; alternative: not equal.
(c) A signed rank test, with null hypothesis: median = 0.0; alternative: not equal.
(d) A normality test, such as the Shapiro-Wilk test.

If the data come from a stochastic process, such as a time series, the following randomness tests on the residuals are also necessary:

(e) Runs above and below the median.
(f) Runs up and down.
(g) The Box-Pierce test for autocorrelations.
(h) A comparative t test of the means of the first and second halves of the data.
(i) A comparative F test of the variances of the first and second halves of the data.

These residual tests lend rigor to the work.

2.3. Sample size

For the residual tests to work, more than 30 data points are needed; for this reason the dataset size should exceed 30. The OLS fit is a matrix method, which requires the design matrix to be non-singular. When the objective is to estimate n parameters, the dataset size should be greater than 2n + 1. It is therefore not statistically plausible to estimate a polynomial of degree nine or higher from a dataset of only twenty points.
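Two of the residual checks listed in Section 2.2 can be sketched in Python (a minimal illustration, not the paper's implementation; real work would use a statistics package): the one-sample t statistic for zero mean, and the runs-above/below-the-median count, whose expectation under randomness is 2·n1·n2/(n1 + n2) + 1.

```python
import math

def t_statistic_mean_zero(res):
    """t statistic for H0: mean(residuals) = 0."""
    n = len(res)
    m = sum(res) / n
    s2 = sum((r - m) ** 2 for r in res) / (n - 1)  # sample variance
    return m / math.sqrt(s2 / n)

def runs_above_below_median(res):
    """Observed and expected number of runs above/below the median."""
    med = sorted(res)[len(res) // 2]  # simple median (ties are dropped below)
    signs = [r > med for r in res if r != med]
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n1 = sum(signs)
    n2 = len(signs) - n1
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    return runs, expected

# A strongly patterned residual sequence (invented for illustration):
# the mean is near zero, but the run count is far below its expectation,
# so the residuals fail the randomness check.
res = [-1.0, -1.2, -0.8, -1.1, 1.0, 1.2, 0.9, 1.1]
runs, expected = runs_above_below_median(res)
print(runs, expected)  # far fewer runs than expected under randomness
```

A full study would add the sign, signed-rank, Shapiro-Wilk and Box-Pierce tests from library implementations rather than hand-rolled code.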
3. Model selection criteria

3.1. Minimum mean squared error

A terminological distinction arises around the expression "mean squared error" (MSE). The mean squared error of a regression is a number computed from the sum of squares of the computed residuals, not of the unobservable errors. If that sum of squares is divided by n, the number of observations, the result is the mean of the squared residuals. Since this is a biased estimate of the variance of the unobserved errors, the bias is removed by multiplying the mean of the squared residuals by n/df, where df is the number of degrees of freedom (n minus the number of parameters being estimated). This latter formula serves as an unbiased estimate of the variance of the unobserved errors and is called the mean squared error [5].

The OLS method minimizes the sum of squared vertical distances between the observed responses in the dataset and the responses predicted by the linear approximation; that is, OLS minimizes the quadratic error, and with it the mean squared error. Selecting models by minimum MSE therefore amounts to measuring the quality of the tool with the tool itself.

What, then, should a model selection criterion look like? There are several criteria for model selection, namely the Akaike information criterion (AIC) [6], [7], the Bayesian information criterion (BIC) of Schwarz [8], the Hannan-Quinn information criterion (HQC) [9], and Rissanen's minimum description length (MDL) principle [10], [11], [12], [13]. The author of a research paper should choose one of them for model selection. AIC, BIC and HQC are well developed in the literature, and they are implemented in several software packages such as Matlab, Scilab, R (CRAN) and GRETL. Attention here will focus on Rissanen's MDL principle.

3.2. MDL principle

Rissanen [10] condenses the principle of parsimony into his MDL principle: opt for the model that gives the shortest description of the dataset. In this context a parsimonious model is one that is easy to describe.
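The distinction drawn in Section 3.1, the mean of the squared residuals versus the degrees-of-freedom-corrected MSE, can be sketched numerically (residual values invented for illustration):

```python
# Sum of squared residuals from some fitted model with p estimated parameters.
residuals = [0.5, -0.3, 0.1, -0.4, 0.2, -0.1]
p = 2                 # e.g. intercept and slope
n = len(residuals)
ssr = sum(r * r for r in residuals)

biased = ssr / n      # mean of squared residuals: biased for the error variance
mse = ssr / (n - p)   # df-corrected MSE, i.e. (ssr / n) * (n / (n - p))

print(biased, mse)    # the corrected estimate is always the larger one
```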
Accordingly, a good fit implies that the model captures or describes the important characteristics evident in the data.

Let A be a finite set and C a set of codewords; a code C on A is a simple mapping C : A → C. Usually binary codes are considered, so that each codeword is a string of 0's and 1's. In the discrete case, A is a finite set and Q denotes a probability distribution on A. The fundamental premise of the MDL principle is that −log₂ Q can be viewed as the code length of a binary code for the elements, or symbols, of A.

A linear code of length n and rank k is a linear subspace B of dimension k of the vector space F_q^n, where F_q is the finite field with q elements. Such a code is called a q-ary code. If q = 2 or q = 3, the code is described as a binary code or a ternary code, respectively. The vectors in C are called codewords. The size of a code is the number of codewords and equals q^k. The weight of a codeword is the number of its elements that are non-zero, and the distance between two codewords is the Hamming distance between them, that is, the number of elements in which they differ. The distance d of a linear code is the minimum weight of its non-zero codewords, or equivalently, the minimum distance between distinct codewords. A linear code of length n, dimension k, and distance d is called an [n, k, d] code.

In the coding context, the MDL principle suggests choosing the model that provides the shortest description of the dataset. Describing data is formally equivalent to coding it. Thus, in implementing MDL, the focus is on statistical modeling as a means of generating codes, and the resulting code lengths provide a metric by which competing models are compared. As a broad principle, MDL has rich connections with more traditional frameworks for statistical estimation. In classical parametric statistics, for example, we want to estimate the parameter θ of a given model (class) M = { f(x^n | θ); θ ∈ Θ ⊆ R^k } based on observations x^n = (x_1, ..., x_n).
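The premise that −log₂ Q(a) behaves like a code length can be checked on a toy distribution (the symbol probabilities here are invented for illustration):

```python
import math

# Hypothetical distribution Q on the alphabet A = {a, b, c, d}.
Q = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Ideal binary code lengths: -log2 Q(x). Dyadic probabilities give
# integer lengths, matching a Huffman-style prefix code exactly.
lengths = {sym: -math.log2(p) for sym, p in Q.items()}
print(lengths)  # a: 1 bit, b: 2 bits, c and d: 3 bits each

# Kraft inequality: a decodable code satisfies sum of 2^(-length) <= 1;
# equality means the code wastes no capacity.
kraft = sum(2 ** -L for L in lengths.values())
print(kraft)
```

Frequent symbols receive short codewords and rare ones long codewords, which is precisely why a model assigning high probability to the observed data yields a short description of it.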
From a coding perspective, assume that both sender and receiver know which member f_θ of the parametric family M generated a data string x^n (equivalently, both sides know θ). Then Shannon's source coding theorem states that the best description length of x^n (in an average sense) is simply −log f_θ(x^n), because on average the code based on f_θ achieves the entropy lower bound. Adding the cost L(θ) of describing θ itself, one arrives at a code length −log f_θ(x^n) + L(θ) for the data string x^n.

The binary vector γ = (γ_1, ..., γ_M) ∈ {0, 1}^M serves as a simple index for the 2^M possible models given by Equation 1. Let β_γ and X_γ denote the vector of coefficients and the design matrix associated with those variables x_j for which γ_j = 1. Applying the MDL principle to the problem of model selection is then equivalent to identifying one or more vectors γ that yield the best, or nearly best, models for y in Equation 1. In many cases not all of the 2^M possibilities make sense, so the search may be confined to a subset of the index vectors γ.

Forms of minimum description length for regression

For regression, MDL criteria can be written as a sum of two code lengths,

    L(γ) + L(y | X_γ, γ).    (2)

The first summand in Equation 2 penalizes the model description length, and the second the residual energy. The residual energy under OLS is given by

    L(y | X_γ, γ) = (n/2) log( Σ (ε̂_γ)² ),

where the ε̂_γ are the residuals of the estimated model γ. The model description length is given by

    L(γ) = (η k / 2) log n,

where k is the number of non-null parameters and η ∈ {2, 2.5, 3} is a weight that prevents spurious regressions. In applications to regression models, the value η = 2.5 reported in [13] will be used.

4. A polynomial example

Figura 3: Example dataset.

For an illustrative example, take the dataset in Figure 3.
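Before examining the candidate models, the two terms of Equation 2 can be combined into a computable score; a minimal sketch with η = 2.5 as in the text (the residual values below are synthetic, not the paper's data):

```python
import math

def mdl_score(residuals, k, n, eta=2.5):
    """MDL criterion of Equation 2: (n/2)*log(sum of squared residuals)
    plus the penalty (eta*k/2)*log(n), with k non-null parameters."""
    ssr = sum(r * r for r in residuals)
    return (n / 2) * math.log(ssr) + (eta * k / 2) * math.log(n)

# Two competing fits to the same n = 20 points: a small model with a
# modest fit, and a larger model whose extra parameters buy only a tiny
# reduction in residual energy (residuals invented for illustration).
n = 20
res_small = [0.30] * n   # k = 2 parameters
res_large = [0.29] * n   # k = 6 parameters, slightly smaller residuals

score_small = mdl_score(res_small, k=2, n=n)
score_large = mdl_score(res_large, k=6, n=n)
print(score_small, score_large)  # the smaller model has the lower score
```

The larger model always has the smaller SSR, but its penalty term grows with k, so MDL prefers the parsimonious model unless the fit improvement is substantial. This is exactly the behavior exploited in the example below.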
This figure also shows the fitted linear regression y = c₀ + 0.026055 t, together with the t values of its parameters (constant and slope). A simple inspection of Figure 3 shows that the dataset does not follow a linear fit. For this reason, the following alternative polynomial models are determined:

M1 Linear, Figure 3.
M2 Quadratic, Figure 4, Table 3.
M3 y = c₀ + c₁t + c₂t² + c₃t³, Table 4.
M4 y = c₀ + c₁t + c₃t³, Figure 5, Table 5.
M5 y = c₂t² + c₃t³ + c₄t⁴, Table 6.
M6 y = c₁t + c₄t⁴, Figure 6, Table 7.
M7 y = c₂t² + c₃t³ + c₅t⁵, Table 8.

Figura 4: Example quadratic fit.
Figura 5: Example y = c₀ + c₁t + c₃t³ fit.
Figura 6: Example y = c₁t + c₄t⁴ (M6) fit.

Table 3: Example M2 fit (y = c₀ + c₁t + c₂t²).
Table 4: Example M3 fit (y = c₀ + c₁t + c₂t² + c₃t³).
Table 5: Example M4 fit (y = c₀ + c₁t + c₃t³).
Table 6: Example M5 fit (y = c₂t² + c₃t³ + c₄t⁴).
Table 7: Example M6 fit (y = c₁t + c₄t⁴).
Table 8: Example M7 fit (y = c₂t² + c₃t³ + c₅t⁵).

The M3 fit reported in Table 4 is not statistically significant: some of its t values are very small, so their p values exceed any established significance level, say 5 %. This model cannot be reported in a scientific paper.

The models M1, M2, M4, M5, M6 and M7, on the other hand, are statistically significant. How should the model that best fits the dataset be selected?

Table 9: MSE and DL for the example model fits (models M1, M2, M4, M5, M6, M7).

Table 9 shows the MSE and the description length (DL) of the fitted models; with this information a model selection is possible. Model M5 (y = c₂t² + c₃t³ + c₄t⁴) has the minimum MSE, while model M6 (y = c₁t + c₄t⁴, with fitted c₁ = 0.1329) has the minimum description length. M5 has more parameters, which guarantees that it reduces the residual energy, and therefore the MSE; but the number of model parameters must be penalized, which is achieved by the description length. For this reason, the model that best fits the data is M6. Additionally, among the statistically significant models obtained, M6 is the only one that passes the residual randomness tests. We must remember that this example comes from simulated data, and M6 corresponds to the polynomial used as the simulation source.

5. Conclusion

The authors hope that this technical note clears up doubts regarding dataset size and the model selection criterion. For any further inquiry, please contact the Stochastic Processes Laboratory (Laboratorio de Procesos Estocásticos) at the Instituto de Matemática y Cálculo Aplicado (IMYCA), Facultad de Ingeniería, Universidad de Carabobo.

Acknowledgements

This work was partially financed by the project for the creation of the Stochastic Processes Laboratory (FONACIT) and by the Research Direction of the Faculty of Engineering at the Universidad de Carabobo. The author particularly thanks Jhoseth Rodríguez for reviewing the proofs.

Referencias

[1] Burnham, Kenneth P.; Anderson, David (2002). Model Selection and Multi-Model Inference (2nd ed.). Springer.
[2] Legendre, Adrien-Marie (1805). Nouvelles méthodes pour la détermination des orbites des comètes [New Methods for the Determination of the Orbits of Comets] (in French). Paris: F. Didot.
[3] Hazewinkel, Michiel, ed. (2001). "Regression analysis", Encyclopedia of Mathematics. Springer.
[4] DeWayne R.
Derryberry (2014). Basic Data Analysis for Time Series with R. Wiley.
[5] Steel, Robert G. D.; Torrie, James H. (1960). Principles and Procedures of Statistics, with Special Reference to the Biological Sciences. McGraw-Hill.
[6] Akaike, Hirotugu (1974). "A new look at the statistical model identification". IEEE Transactions on Automatic Control 19 (6).
[7] Akaike, Hirotugu (1980). "Likelihood and the Bayes procedure", in Bernardo, J. M.; et al., Bayesian Statistics. Valencia: University Press.
[8] Schwarz, Gideon E. (1978). "Estimating the dimension of a model". Annals of Statistics 6 (2).
[9] Hannan, E. J.; Quinn, B. G. (1979). "The Determination of the Order of an Autoregression". Journal of the Royal Statistical Society, Series B, 41.
[10] Rissanen, J. (1978). "Modeling by shortest data description". Automatica 14 (5).
[11] Hansen, Mark H.; Yu, Bin (2001). "Model Selection and the Principle of Minimum Description Length". Journal of the American Statistical Association 96 (454).
[12] Vega, Cristóbal (2003). Aplicación de las técnicas wavelets a series temporales. Tesis Doctoral, Universidad de Granada, Granada, España.
[13] Rissanen, J. (2007). Information and Complexity in Statistical Modeling. Springer.
Introduction to Econometrics T H I R D E D I T I O N Global Edition James H. Stock Harvard University Mark W. Watson Princeton University Boston Columbus Indianapolis New York San Francisco Upper Saddle
More informationG. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication
G. S. Maddala Kajal Lahiri WILEY A John Wiley and Sons, Ltd., Publication TEMT Foreword Preface to the Fourth Edition xvii xix Part I Introduction and the Linear Regression Model 1 CHAPTER 1 What is Econometrics?
More informationThe Simple Regression Model. Part II. The Simple Regression Model
Part II The Simple Regression Model As of Sep 22, 2015 Definition 1 The Simple Regression Model Definition Estimation of the model, OLS OLS Statistics Algebraic properties Goodness-of-Fit, the R-square
More informationBristol Business School
Bristol Business School Academic Year: 10/11 Examination Period: January Module Leader: Module Code: Title of Module: John Paul Dunne Econometrics UMEN3P-15-M Examination Date: 12 January 2011 Examination
More informationMeasures of Fit from AR(p)
Measures of Fit from AR(p) Residual Sum of Squared Errors Residual Mean Squared Error Root MSE (Standard Error of Regression) R-squared R-bar-squared = = T t e t SSR 1 2 ˆ = = T t e t p T s 1 2 2 ˆ 1 1
More informationDynamic Time Series Regression: A Panacea for Spurious Correlations
International Journal of Scientific and Research Publications, Volume 6, Issue 10, October 2016 337 Dynamic Time Series Regression: A Panacea for Spurious Correlations Emmanuel Alphonsus Akpan *, Imoh
More informationThe Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models
The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population Health
More informationLinear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77
Linear Regression Chapter 3 September 27, 2016 Chapter 3 September 27, 2016 1 / 77 1 3.1. Simple linear regression 2 3.2 Multiple linear regression 3 3.3. The least squares estimation 4 3.4. The statistical
More informationA NEW INFORMATION THEORETIC APPROACH TO ORDER ESTIMATION PROBLEM. Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.
A EW IFORMATIO THEORETIC APPROACH TO ORDER ESTIMATIO PROBLEM Soosan Beheshti Munther A. Dahleh Massachusetts Institute of Technology, Cambridge, MA 0239, U.S.A. Abstract: We introduce a new method of model
More informationAutoregressive Moving Average (ARMA) Models and their Practical Applications
Autoregressive Moving Average (ARMA) Models and their Practical Applications Massimo Guidolin February 2018 1 Essential Concepts in Time Series Analysis 1.1 Time Series and Their Properties Time series:
More information9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures
FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models
More informationBusiness Economics BUSINESS ECONOMICS. PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS MODULE No. : 3, GAUSS MARKOV THEOREM
Subject Business Economics Paper No and Title Module No and Title Module Tag 8, Fundamentals of Econometrics 3, The gauss Markov theorem BSE_P8_M3 1 TABLE OF CONTENTS 1. INTRODUCTION 2. ASSUMPTIONS OF
More informationOn Autoregressive Order Selection Criteria
On Autoregressive Order Selection Criteria Venus Khim-Sen Liew Faculty of Economics and Management, Universiti Putra Malaysia, 43400 UPM, Serdang, Malaysia This version: 1 March 2004. Abstract This study
More informationStatistics 262: Intermediate Biostatistics Model selection
Statistics 262: Intermediate Biostatistics Model selection Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Today s class Model selection. Strategies for model selection.
More informationFinancial Econometrics
Financial Econometrics Multivariate Time Series Analysis: VAR Gerald P. Dwyer Trinity College, Dublin January 2013 GPD (TCD) VAR 01/13 1 / 25 Structural equations Suppose have simultaneous system for supply
More informationApplied Econometrics. Professor Bernard Fingleton
Applied Econometrics Professor Bernard Fingleton Regression A quick summary of some key issues Some key issues Text book JH Stock & MW Watson Introduction to Econometrics 2nd Edition Software Gretl Gretl.sourceforge.net
More informationLecture#17. Time series III
Lecture#17 Time series III 1 Dynamic causal effects Think of macroeconomic data. Difficult to think of an RCT. Substitute: different treatments to the same (observation unit) at different points in time.
More informationAn Introduction to Econometrics. A Self-contained Approach. Frank Westhoff. The MIT Press Cambridge, Massachusetts London, England
An Introduction to Econometrics A Self-contained Approach Frank Westhoff The MIT Press Cambridge, Massachusetts London, England How to Use This Book xvii 1 Descriptive Statistics 1 Chapter 1 Prep Questions
More informationEstimating multilevel models for categorical data via generalized least squares
Revista Colombiana de Estadística Volumen 28 N o 1. pp. 63 a 76. Junio 2005 Estimating multilevel models for categorical data via generalized least squares Minerva Montero Díaz * Valia Guerra Ones ** Resumen
More informationAdditive Outlier Detection in Seasonal ARIMA Models by a Modified Bayesian Information Criterion
13 Additive Outlier Detection in Seasonal ARIMA Models by a Modified Bayesian Information Criterion Pedro Galeano and Daniel Peña CONTENTS 13.1 Introduction... 317 13.2 Formulation of the Outlier Detection
More informationIntroduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017
Introduction to Regression Analysis Dr. Devlina Chatterjee 11 th August, 2017 What is regression analysis? Regression analysis is a statistical technique for studying linear relationships. One dependent
More informationEconometric Forecasting Overview
Econometric Forecasting Overview April 30, 2014 Econometric Forecasting Econometric models attempt to quantify the relationship between the parameter of interest (dependent variable) and a number of factors
More informationMinimum Message Length Autoregressive Model Order Selection
Minimum Message Length Autoregressive Model Order Selection Leigh J. Fitzgibbon School of Computer Science and Software Engineering, Monash University Clayton, Victoria 38, Australia leighf@csse.monash.edu.au
More informationTESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST
Econometrics Working Paper EWP0402 ISSN 1485-6441 Department of Economics TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Lauren Bin Dong & David E. A. Giles Department
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In
More informationREVISTA INVESTIGACION OPERACIONAL VOL. 38, NO. 3, , 2017
REVISTA INVESTIGACION OPERACIONAL VOL. 38, NO. 3, 247-251, 2017 LINEAR REGRESSION: AN ALTERNATIVE TO LOGISTIC REGRESSION THROUGH THE NON- PARAMETRIC REGRESSION Ernesto P. Menéndez*, Julia A. Montano**
More informationUNIVERSIDAD CARLOS III DE MADRID ECONOMETRICS Academic year 2009/10 FINAL EXAM (2nd Call) June, 25, 2010
UNIVERSIDAD CARLOS III DE MADRID ECONOMETRICS Academic year 2009/10 FINAL EXAM (2nd Call) June, 25, 2010 Very important: Take into account that: 1. Each question, unless otherwise stated, requires a complete
More informationBootstrap Simulation Procedure Applied to the Selection of the Multiple Linear Regressions
JKAU: Sci., Vol. 21 No. 2, pp: 197-212 (2009 A.D. / 1430 A.H.); DOI: 10.4197 / Sci. 21-2.2 Bootstrap Simulation Procedure Applied to the Selection of the Multiple Linear Regressions Ali Hussein Al-Marshadi
More informationMultiple Regression. Peerapat Wongchaiwat, Ph.D.
Peerapat Wongchaiwat, Ph.D. wongchaiwat@hotmail.com The Multiple Regression Model Examine the linear relationship between 1 dependent (Y) & 2 or more independent variables (X i ) Multiple Regression Model
More information1. The Multivariate Classical Linear Regression Model
Business School, Brunel University MSc. EC550/5509 Modelling Financial Decisions and Markets/Introduction to Quantitative Methods Prof. Menelaos Karanasos (Room SS69, Tel. 08956584) Lecture Notes 5. The
More information10. Time series regression and forecasting
10. Time series regression and forecasting Key feature of this section: Analysis of data on a single entity observed at multiple points in time (time series data) Typical research questions: What is the
More informationHeteroskedasticity. Part VII. Heteroskedasticity
Part VII Heteroskedasticity As of Oct 15, 2015 1 Heteroskedasticity Consequences Heteroskedasticity-robust inference Testing for Heteroskedasticity Weighted Least Squares (WLS) Feasible generalized Least
More informationTesting and Model Selection
Testing and Model Selection This is another digression on general statistics: see PE App C.8.4. The EViews output for least squares, probit and logit includes some statistics relevant to testing hypotheses
More informationEstimating AR/MA models
September 17, 2009 Goals The likelihood estimation of AR/MA models AR(1) MA(1) Inference Model specification for a given dataset Why MLE? Traditional linear statistics is one methodology of estimating
More informationIntroduction to Eco n o m et rics
2008 AGI-Information Management Consultants May be used for personal purporses only or by libraries associated to dandelon.com network. Introduction to Eco n o m et rics Third Edition G.S. Maddala Formerly
More informationAnswer all questions from part I. Answer two question from part II.a, and one question from part II.b.
B203: Quantitative Methods Answer all questions from part I. Answer two question from part II.a, and one question from part II.b. Part I: Compulsory Questions. Answer all questions. Each question carries
More informationLeast Squares Estimation-Finite-Sample Properties
Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions
More informationThe Simple Linear Regression Model
The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate
More informationHow the mean changes depends on the other variable. Plots can show what s happening...
Chapter 8 (continued) Section 8.2: Interaction models An interaction model includes one or several cross-product terms. Example: two predictors Y i = β 0 + β 1 x i1 + β 2 x i2 + β 12 x i1 x i2 + ɛ i. How
More informationA MULTIVARIATE MODEL FOR COMPARISON OF TWO DATASETS AND ITS APPLICATION TO FMRI ANALYSIS
A MULTIVARIATE MODEL FOR COMPARISON OF TWO DATASETS AND ITS APPLICATION TO FMRI ANALYSIS Yi-Ou Li and Tülay Adalı University of Maryland Baltimore County Baltimore, MD Vince D. Calhoun The MIND Institute
More informationEcon 510 B. Brown Spring 2014 Final Exam Answers
Econ 510 B. Brown Spring 2014 Final Exam Answers Answer five of the following questions. You must answer question 7. The question are weighted equally. You have 2.5 hours. You may use a calculator. Brevity
More information7. Estimation and hypothesis testing. Objective. Recommended reading
7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing
More informationECON 4160, Spring term Lecture 12
ECON 4160, Spring term 2013. Lecture 12 Non-stationarity and co-integration 2/2 Ragnar Nymoen Department of Economics 13 Nov 2013 1 / 53 Introduction I So far we have considered: Stationary VAR, with deterministic
More informationFinal Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58
Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple
More informationECON 4160, Lecture 11 and 12
ECON 4160, 2016. Lecture 11 and 12 Co-integration Ragnar Nymoen Department of Economics 9 November 2017 1 / 43 Introduction I So far we have considered: Stationary VAR ( no unit roots ) Standard inference
More informationSTAT 100C: Linear models
STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 21 Model selection Choosing the best model among a collection of models {M 1, M 2..., M N }. What is a good model? 1. fits the data well (model
More informationReview of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley
Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate
More information