REGRESSION WITH TRUNCATED DATA

ROBUST 2004, © JČMF 2004

Marek Brabec

Keywords: Regression, truncated data, spectral estimation.

Abstract: In this paper, we present an example of the analysis of historical height data. First, we discuss some conceptual problems arising from the fact that no representative surveys are available for the historical periods of interest (18th and 19th centuries). Then we state a statistical model that can be used to correct the available data (measurements of Swedish soldiers) for their selectivity with respect to the height distribution of the general population. The model is based on truncated normal regression. We concentrate on periodicity in the height data, whose time series spans more than a century. In addition to presenting some exploratory spectral estimates, we discuss some problems and features related to maximum likelihood estimation in the model.

1 Introduction

In this paper, we present an interesting practical problem encountered by anthropologists and historians. The goal is to estimate the dynamics of human population mean heights in the past from historical data (18th and 19th centuries). As one can expect, the task is complicated by the fact that informative data of high quality are difficult to obtain. Namely, no reasonably representative height surveys were organized then. Small samples are available from historical records here and there (e.g. family records of aristocrats), but their relevance for the population height distribution is dubious, to say the very least. On the other hand, large amounts of systematically and rather precisely measured data are available from military records. Since the soldiers came from various social strata, geographical areas etc., historians think that military drafts spatially covered the concurrent (healthy adult male) population to a reasonable extent, [8], [9]. Even if we take this for granted, one substantial problem remains, however.
To illustrate it, consider the following fact. Recent adult population heights are known to follow the normal distribution rather closely, and there is no substantial reason why their historical counterparts should not behave similarly. Yet the military sample consistently shows substantial positive skewness. Boxplots in Figure 1 demonstrate the situation for one particular sub-group of soldiers (born in Swedish rural areas) from the dataset that was collected by Richard Steckel and Lars Sandberg, [7], and which we will analyze subsequently. Why is this happening? The answer is simple: the military sample, as it stands, is not representative of the healthy male population as a whole. It is highly selective in the sense that the army disliked short men. Its preference for taller soldiers was embodied in a simple policy: men shorter than

a prescribed value (minimum height requirement, MHR) should not be drafted. Therefore, in theory, only those at or above the MHR should appear in the military sample.

Figure 1: Height of rural-born soldiers, distributional summaries by birth year (vertical axis: height in cm; horizontal axis: birth year).

Comparing the boxplots in Figure 1 with a presupposed MHR of 165.76 cm (horizontal line), we can see that while the policy really led to the apparent right skew, there are some data below the MHR as well. We checked with prof. Komlos (an economic historian from the University of Munich, who introduced us to the substantive problem of historical height distribution estimation) and made sure that these are not coding errors. Although it was educational to learn that (even in the Kingdom of Sweden...) a few individuals managed to get into the draft while being (even seriously) below the MHR, one quickly realizes that while data obtained from these individuals can tell something about how effectively the policy was implemented, they do not tell much about the general male population distribution. Therefore, it became customary among historians to discard them and analyze only those above the MHR, [6].

While saturated (i.e. year-by-year) analyses for this data type have been attempted, [6] (including analyses of this particular dataset, [7]), the current interest of economic and anthropometric historians focuses on analysis of some generalizable long-term properties of the yearly (indexed by birth year) height time series (like trends, periodicity etc.). Here, we will focus on the periodicity properties (after some simple trend and inter-regional

corrections). This was the task brought to us by our customer (prof. Komlos), who was interested in getting a picture of the periodicity properties in an exploratory style. This interest is connected to the general attention that economic historians pay to height (and other anthropological variables that they see as possible indicators of the "biological standard of living"), see e.g. [11]. The ultimate idea is that biological changes (like height fluctuations) might, to some extent, reflect changes in economic conditions (e.g. through food prices and hence food availability) and hence can potentially be used as economic indicators more relevant to human well-being than traditional econometric characteristics like GDP. Although one can be a bit skeptical about this upshot, many economic historians take this route, see e.g. [4].

The present investigation was motivated by the somewhat more modest goal of comparing the periodic properties of the height series with those of some economic (e.g. food price) series, as a rough check of the sensibility of previous attempts to use heights as indicators (in that case, for instance, the periodicity properties should be "roughly similar"). Obviously, such comparisons lead straightforwardly to the need to estimate the spectra. Nevertheless, it is immediately clear that discarding all the heights below the MHR precludes standard statistical/spectral analysis and calls for a model that corrects for this complication. We will formulate one such model in Section 2.

2 Model

The available data consist of about 17 thousand measurements of adult Swedish army soldiers (22 years or older at the time of measurement, born between 1711 and 1864). To assess the periodicity properties of the Swedish healthy male population height time series (indexed by birth year), we outlined the following simple (linear) model (1) with normally distributed errors.
Its form was proposed after a certain amount of data exploration and discussions with anthropometric history experts:

$$Y_{tij} = \mu + \alpha_i + \beta t + \sum_{k=1}^{F} \left( \delta_{1k} \cos(2\pi t f_k) + \delta_{2k} \sin(2\pi t f_k) \right) + \epsilon_{tij} \qquad (1)$$

where:

$Y_{tij}$ is the height of the $j$-th man from the general Swedish population, born in year $t$ at the $i$-th birth location.

Time is indexed by birth year; $t = 1$ corresponds to 1711.

Birth location is indexed as $i = 0$ for unknown, $i = 1$ for rural, $i = 2$ for urban.

$\epsilon_{tij} \sim N(0, \sigma^2)$, independently across $t$, $i$, $j$'s.
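The mean structure of model (1) can be written as a design matrix. The following is a minimal sketch under stated assumptions: the function name `harmonic_design`, the toy data, and the two illustrative frequencies are hypothetical and are not taken from the paper. It only shows how the intercept, location shifts (with $\alpha_0 = 0$), linear trend and cosine/sine pairs line up as regression columns, and that ordinary least squares would apply if untruncated population data were available.

```python
import numpy as np

def harmonic_design(t, loc, freqs):
    """Columns: intercept (mu), rural and urban shifts (alpha_1, alpha_2;
    alpha_0 = 0 for unknown location), linear trend (beta), then one cosine
    column per frequency (delta_1k) followed by one sine column per
    frequency (delta_2k)."""
    t = np.asarray(t, dtype=float)
    loc = np.asarray(loc)
    freqs = np.asarray(freqs, dtype=float)
    cols = [np.ones_like(t),
            (loc == 1).astype(float),
            (loc == 2).astype(float),
            t]
    angle = 2.0 * np.pi * np.outer(t, freqs)   # shape (n, F)
    cols.extend(np.cos(angle).T)
    cols.extend(np.sin(angle).T)
    return np.column_stack(cols)

# Hypothetical toy data, not the Swedish military records.
rng = np.random.default_rng(0)
n = 500
t = rng.integers(1, 155, size=n)        # t = 1 is 1711, t = 154 is 1864
loc = rng.integers(0, 3, size=n)        # 0 unknown, 1 rural, 2 urban
freqs = np.array([1 / 75, 2 / 75])      # illustrative choice, F = 2
X = harmonic_design(t, loc, freqs)

# Assumed parameter values: mu, alpha_1, alpha_2, beta, delta_11, delta_12,
# delta_21, delta_22.
theta = np.array([168.0, -0.5, 0.8, 0.01, 0.6, -0.4, 0.3, 0.2])
y = X @ theta + rng.normal(0.0, 6.0, size=n)

# Without truncation, model (1) is an ordinary linear regression.
theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because the birth years are sampled unevenly, the trigonometric columns are not orthogonal here, which is one reason a regression formulation is used instead of a plain periodogram.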

Due to its linear (additive) structure, interpretation of the model is rather simple. It tries to assess the amount of variability associated with periodic movements of various frequencies $f_1, \ldots, f_F$ after correcting for possible birth location differences and for a possible (common) linear trend (as a simple form of non-stationarity). It is precisely this correction, together with the fact that the data are not balanced (having different sample sizes for different birth years), which calls for a formulation in regression style and which precludes straightforward periodogram estimation based on standard estimators, [2].

Note that in the non-trigonometric part, the model resembles analysis of covariance. It fits a common linear trend shifted up or down differently at different birth locations, allowing for different average heights at rural/urban locations, a phenomenon which has been well documented for both historical and recent heights, [5]. We use a particular parametrization with $\alpha_0 = 0$, which means that the mean height of a man born at an unknown location (either rural or urban) is given by $\mu$ plus linear and trigonometric terms in time. $\alpha_1$ corresponds to the difference between the mean heights of men born in the same year at rural and unknown locations. Similarly, $\alpha_2$ corresponds to the difference between urban and unknown locations.

In order to roughly mimic periodogram estimation of variability at Fourier frequencies, we choose frequencies $f_k = \frac{1}{75}, \frac{1.5}{75}, \frac{2}{75}, \ldots, \frac{37}{75}$, which are close to what the Fourier frequencies would be if we had a single series of length 150 (with frequency added, based on preliminary data explorations).

Model (1) is nicely interpretable and would be easily fitted to data from historical Swedish population-representative male height surveys. The only flaw is that, unfortunately, no such surveys were performed. The available military data are non-representative of the general male population due to the MHR enforcement (and the discarding of the below-MHR measurements, see Section 1).
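The frequency grid can be written out explicitly. This is a sketch under an assumption: the ellipsis in the list above is read as the integer multiples $2/75, 3/75, \ldots, 37/75$, which together with $1/75$ and $1.5/75$ gives the 38 frequencies indexed by $k = 1, \ldots, 38$ in Section 3. The variable names are illustrative.

```python
import numpy as np

# Assumed reading of the grid: multiples 1, 1.5, 2, 3, ..., 37 of 1/75.
multiples = np.concatenate(([1.0, 1.5], np.arange(2.0, 38.0)))
freqs = multiples / 75.0            # cycles per year
periods = 1.0 / freqs               # years per cycle

# Fourier frequencies of a single series of length 150, for comparison.
fourier = np.arange(1, 76) / 150.0

# Every chosen frequency coincides with one of these Fourier frequencies.
on_fourier_grid = np.isclose(freqs[:, None], fourier[None, :]).any(axis=1)
```

Under this reading, the longest periods on the grid are 75, 50, 37.5 and 25 years, and the shortest is 75/37, just above 2 years.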
Nevertheless, the mis-representation can be corrected rather easily if we think of the military sample as a sample from the general population which is left-truncated at the MHR. Then, for the available military data $Y^*_{tij}$ (the asterisk distinguishes the observed military measurement from the population height $Y_{tij}$ of model (1)), we get the truncated regression model (2) from the original OLS model (1):

$$
\begin{aligned}
& Y^*_{tij} \text{ remains unobserved if } Y_{tij} < \tau_t, \\
& Y^*_{tij} = Y_{tij} \text{ if } Y_{tij} \ge \tau_t, \\
& Y_{tij} = \mu + \alpha_i + \beta t + \sum_{k=1}^{F} \left( \delta_{1k} \cos(2\pi t f_k) + \delta_{2k} \sin(2\pi t f_k) \right) + \epsilon_{tij}, \\
& \epsilon_{tij} \sim N(0, \sigma^2), \\
& \alpha_0 = 0.
\end{aligned} \qquad (2)
$$

Note that the MHR ($\tau_t$) can generally vary in time. Ideally, it should be known from military regulations. In practice, it is not completely known and has been expertly estimated by historians (prof. Komlos). We have used both constant and time-varying MHR estimates and found that, in terms of the main goal of the analysis, i.e. rough assessment of the spectral shape, there are no substantial differences. The more careful version with time-varying MHR was nevertheless used to get the results in Section 3.
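Under model (2), an observed height $y \ge \tau$ with mean $m$ contributes $\log \phi((y - m)/\sigma) - \log \sigma - \log(1 - \Phi((\tau - m)/\sigma))$ to the log-likelihood. The sketch below illustrates this with SciPy on simulated data. It is not the S-plus routine used in the paper: the function names, the BFGS optimizer, the log-sigma parametrization and the toy design (intercept and rescaled trend only) are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def truncated_negloglik(params, X, y, tau):
    """Negative log-likelihood of left-truncated normal regression.
    params holds the regression coefficients followed by log(sigma)."""
    coef, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    mean = X @ coef
    log_density = norm.logpdf(y, loc=mean, scale=sigma)
    log_retained = norm.logsf(tau, loc=mean, scale=sigma)  # log P(Y >= tau)
    return -np.sum(log_density - log_retained)

def fit_truncated(X, y, tau):
    """Numerical MLE, started from the (biased) least-squares fit."""
    coef0, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma0 = np.std(y - X @ coef0)
    start = np.append(coef0, np.log(sigma0))
    return minimize(truncated_negloglik, start, args=(X, y, tau), method="BFGS")

# Hypothetical simulation: latent population heights, then truncation.
rng = np.random.default_rng(1)
n = 5000
t = rng.integers(1, 155, size=n)
X_all = np.column_stack([np.ones(n), (t - 77.5) / 100.0])  # rescaled trend
true_params = np.array([170.0, 2.0, np.log(6.0)])          # assumed values
latent = X_all @ true_params[:2] + rng.normal(0.0, 6.0, size=n)

tau = 165.76                        # constant MHR; an array tau_t also works
keep = latent >= tau                # only these men enter the sample
X_obs, y_obs = X_all[keep], latent[keep]

fit = fit_truncated(X_obs, y_obs, tau)
sigma_hat = np.exp(fit.x[-1])
```

Ignoring the truncation and averaging `y_obs` directly overstates the population mean, which is the selectivity problem described in Section 1. The truncated likelihood corrects for it at the price of a nonlinear fit.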

We use the maximum likelihood approach to estimate the unknown parameters $(\mu, \alpha_1, \alpha_2, \beta, \sigma^2, \delta_{11}, \ldots, \delta_{1F}, \delta_{21}, \ldots, \delta_{2F})$ of model (2). Because of truncation, the model becomes nonlinear and no explicit formulas for the MLEs are available. Therefore, we maximize the log-likelihood (which is still rather easy to write down) numerically, using a Newton-Raphson-like routine. We use S-plus, especially the censorReg environment, [10], for the necessary computations.

3 Results and discussion

Before discussing the periodicity properties, we did some tests of model (2). Namely, we used the asymptotic likelihood ratio test (LRT) to check whether: i) birth-location-specific intercepts are necessary (p < for $H_0: \alpha_1 = \alpha_2 = 0$), ii) a linear time trend is necessary (p < 0.0001 for $H_0: \beta = 0$). No opening for a substantial simplification of the model structure was detected here.

To assess periodicity, one can look at the MLE estimates $\hat{\delta}_{1k}, \hat{\delta}_{2k}$, $k = 1, 2, \ldots, 38$, or, more conveniently, at $\hat{\gamma}_k = \hat{\delta}_{1k}^2 + \hat{\delta}_{2k}^2$, $k = 1, 2, \ldots, 38$ (respectively at their simple transformation to decibels, $10 \log_{10}(\hat{\gamma}_k)$). They are analogous to the periodogram estimate of the raw spectrum. Figure 2 compares the raw $\hat{\gamma}_k$'s (dots) with their smoothed versions (solid line for the smooth, dotted lines for pointwise 95% confidence interval limits). Smoothing is based on loess (locally linear) regression, namely its robust version, [3], with unequal weighting of the $\hat{\gamma}_k$'s according to their asymptotic variances, obtained as a byproduct of the MLE fitting procedure.

Figure 2: Spectrum estimate (vertical axis: smoothed spectrum in dB; horizontal axis: frequency).

From there, we can see that the raw estimates fall reasonably within the confidence limits, except for estimates at two high frequencies. (The limits are necessarily too narrow to be thought of as simultaneous confidence bands, since their construction guarantees only pointwise and not simultaneous coverage.) The overall shape of the spectrum is non-trivial. Low frequencies are especially prominent (most likely remnants of some long-term trend that is not estimated separately in model (2) and hence is confounded with the long-period trigonometric part). Frequencies around 0.25 (periods of about four years) seem to contribute less to the changes in the height series. On the other hand, the high-frequency part of the spectrum is substantially represented (especially periods shorter than, say, 3 years). This is interesting, since this picture resembles the results of [11], who did spectral analysis on other height data (not truncated and collected in a completely different way, at different times and locations). To get additional checks of the results concerning the overall spectral shape, we re-estimated it under several alternative model modifications, in the style of a sensitivity analysis.
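The raw spectral quantities $\hat{\gamma}_k$ and their decibel transform follow directly from the fitted harmonic coefficients. The sketch below also attaches an approximate variance to each $\hat{\gamma}_k$ by the delta method. The text does not say how the asymptotic variances of the $\hat{\gamma}_k$'s were derived, so the delta method, the function name `harmonic_power` and the toy numbers are assumptions.

```python
import numpy as np

def harmonic_power(delta1, delta2, cov):
    """Return gamma_k = delta1_k^2 + delta2_k^2, its decibel transform,
    and a delta-method variance for each k.

    cov has shape (F, 2, 2): the covariance matrix of the pair
    (delta1_k, delta2_k) for each frequency k."""
    delta1 = np.asarray(delta1, dtype=float)
    delta2 = np.asarray(delta2, dtype=float)
    gamma = delta1**2 + delta2**2
    grad = 2.0 * np.stack([delta1, delta2], axis=1)    # d gamma / d delta
    var = np.einsum("ki,kij,kj->k", grad, cov, grad)   # grad' V grad
    return gamma, 10.0 * np.log10(gamma), var

# Hypothetical coefficient estimates for two frequencies.
d1 = np.array([3.0, 0.1])
d2 = np.array([4.0, 0.0])
V = np.tile(0.01 * np.eye(2), (2, 1, 1))
gamma, gamma_db, gamma_var = harmonic_power(d1, d2, V)
```

Inverse variances of this kind could serve as smoothing weights, so that imprecisely estimated $\hat{\gamma}_k$'s pull less on the smooth.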
We have tried: i) both time-varying and constant expert estimates of the MHR, ii) free and $\sigma^2$-restricted models ($\sigma^2 = \sigma_0^2$ with an externally expert-supplied $\sigma_0^2$, an approach that has been advocated in the past, [1]) as a way to circumvent the correlation between mean-related and scale parameter estimates introduced by truncation, iii) a combination of left truncation and additional interval censoring that can be suspected in connection with rounding, iv) simultaneous left and right truncation (to assess the possibility of over-representation of extremely tall men in the military sample in addition to the MHR complication), v) addition of a quadratic trend, vi) omission of any trend whatsoever. Variants i) through vi) influenced the non-trigonometric part of the model and the absolute values of the spectrum to various extents. Nevertheless, the spectrum shape remained remarkably similar and insensitive to the model perturbations considered.

In general, we note that while the intercept-like coefficients $(\mu, \alpha_1, \alpha_2)$ are rather difficult to estimate precisely, the slope-type coefficients ($\beta$, $\delta$'s) are estimated much more precisely. This is because the likelihood surface has a near-ridge, e.g. in the $\mu$-$\sigma^2$ plane (see Figure 3 for the situation in one particular birth year, 1751). Consequently, periodicity properties can be appreciated much better than the intercepts determining average height per se. The vertical shift is more uncertain, and hence it is much harder to answer questions about absolute height than questions about the shape of its changes in time.

Figure 3: Likelihood surface example (axes: mu and sigma).

Apart from the overall (smoothed) spectrum estimate (which was required by the customer for exploratory purposes), one can think of testing the contributions of individual harmonic terms (e.g. $H_0: \delta_{1k} = \delta_{2k} = 0$). For simplicity, the screening tests can be performed as Wald tests (using the asymptotic variance-covariance matrix of the parameter estimates). Only a few components were significant, namely those corresponding to periods of 75, 50, 25 and years. The resulting reduced model (with the full non-trigonometric part and four harmonic components only) can be tested against the original model via LRT, and it yields p = 0.47. From this, one would get the impression that the original model can be dramatically simplified. We do not recommend such simplification, however. Coefficient estimates for different sine and cosine terms are correlated here (unlike in classical Fourier analysis, [2]). This lack of orthogonality is due to the fact that the data are unbalanced, that we do not work exactly with Fourier frequencies, and to truncation.

It is of practical interest that the smooth spectral density estimate $\hat{s}(f)$ can be integrated as

$$I_{(p_i, p_{i+1})} = \int_{1/p_{i+1}}^{1/p_i} \hat{s}(f)\, df$$

to get some idea about the amount of periodic variability in various period intervals $(p_i, p_{i+1})$. These can be standardized to get proportions. We have estimated the proportions of variability in five sub-intervals of the (2, 15) years interval (which has been intensively investigated in connection with the business cycle in the past, [11]). They are: $I_{(2,3)} = 0.58$, $I_{[3,5)} = 0.20$, $I_{[5,7)} = 0.09$, $I_{[7,10)} = 0.07$, $I_{[10,15)} = 0.06$, corresponding rather nicely to the proportions estimated in [11] by different methods under different circumstances.
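The band integrals and their standardization can be sketched numerically. A period interval $(p_i, p_{i+1})$ maps to the frequency interval $(1/p_{i+1}, 1/p_i)$, and the integral is approximated here by the trapezoid rule. The function name `band_proportions`, the grid size and the flat test spectrum are assumptions for illustration. The paper's smoothed estimate $\hat{s}(f)$ is not reproduced.

```python
import numpy as np

def band_proportions(spectrum, period_edges, n_grid=2001):
    """Share of the integrated spectrum falling in each period band.

    spectrum is a callable s(f) on frequencies in cycles per year;
    period_edges are increasing periods in years, e.g. [2, 3, 5, 7, 10, 15]."""
    edges = np.asarray(period_edges, dtype=float)
    areas = []
    for p_lo, p_hi in zip(edges[:-1], edges[1:]):
        f = np.linspace(1.0 / p_hi, 1.0 / p_lo, n_grid)   # periods -> frequencies
        s = spectrum(f)
        areas.append(np.sum(0.5 * (s[1:] + s[:-1]) * np.diff(f)))  # trapezoid rule
    areas = np.array(areas)
    return areas / areas.sum()

edges = [2, 3, 5, 7, 10, 15]

# Reference case: a flat spectrum, for which the shares are simply
# proportional to the band widths on the frequency axis.
flat_shares = band_proportions(lambda f: np.ones_like(f), edges)
```

For a flat spectrum the (2, 3) band gets a share of about 0.38. The reported 0.58 for that band is therefore consistent with the remark that the high-frequency part of the spectrum is substantially represented.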

References

[1] A'Hearn B. (2004). A restricted maximum likelihood estimator for truncated height samples. Economics and Human Biology 2, 1, 5–19.
[2] Brockwell P.J., Davis R.A. (1991). Time series: Theory and methods. Springer, New York.
[3] Cleveland W.S., Devlin S.J. (1988). Locally-weighted regression: An approach to regression analysis by local fitting. J. Am. Statist. Assoc. 83,
[4] Easterlin R.A. (2000). The worldwide standard of living since J. Econ. Perspect. 14, 7–26.
[5] Floud R., Wachtel K., Gregory A. (1990). Height, health and history. Nutritional status in the United Kingdom, Cambridge University Press, Cambridge.
[6] Heintel M. (1996). Historical height samples with shortfall, a computational approach. History and Computing 8, 1,
[7] Heintel M., Sandberg L., Steckel R. (1998). Swedish historical heights revisited: New estimation techniques and results. Proceedings of the conference The biological standard of living in comparative perspective. Komlos J., Baten J. (eds.) Franz Steiner Verlag, Stuttgart.
[8] Komlos J. (1989). Nutrition and economic development in the eighteenth-century Habsburg monarchy: An anthropometric history. Princeton University Press, Princeton, NJ.
[9] Komlos J. (1993). The secular trend in the biological standard of living in the United Kingdom, Economic History Review 46,
[10] Meeker W.Q., Duke S.D. (1981). CENSOR - A user-oriented computer program for life data analysis. The American Statistician 35, 2, 112.
[11] Woitek U. (2003). Height cycles in the 18th and 19th centuries. Economics and Human Biology 2,

Address: M. Brabec, Department of Biostatistics and Computing Services, National Institute of Public Health, Šrobárova 48, Praha 10, Czech Republic
mbrabec@szu.cz


More information

The Wave Function. Chapter The Harmonic Wave Function

The Wave Function. Chapter The Harmonic Wave Function Chapter 3 The Wave Function On the basis of the assumption that the de Broglie relations give the frequency and wavelength of some kind of wave to be associated with a particle, plus the assumption that

More information

Lecture 22 Survival Analysis: An Introduction

Lecture 22 Survival Analysis: An Introduction University of Illinois Department of Economics Spring 2017 Econ 574 Roger Koenker Lecture 22 Survival Analysis: An Introduction There is considerable interest among economists in models of durations, which

More information

Research Note: A more powerful test statistic for reasoning about interference between units

Research Note: A more powerful test statistic for reasoning about interference between units Research Note: A more powerful test statistic for reasoning about interference between units Jake Bowers Mark Fredrickson Peter M. Aronow August 26, 2015 Abstract Bowers, Fredrickson and Panagopoulos (2012)

More information

SF2943: TIME SERIES ANALYSIS COMMENTS ON SPECTRAL DENSITIES

SF2943: TIME SERIES ANALYSIS COMMENTS ON SPECTRAL DENSITIES SF2943: TIME SERIES ANALYSIS COMMENTS ON SPECTRAL DENSITIES This document is meant as a complement to Chapter 4 in the textbook, the aim being to get a basic understanding of spectral densities through

More information

Economics 672 Fall 2017 Tauchen. Jump Regression

Economics 672 Fall 2017 Tauchen. Jump Regression Economics 672 Fall 2017 Tauchen 1 Main Model In the jump regression setting we have Jump Regression X = ( Z Y where Z is the log of the market index and Y is the log of an asset price. The dynamics are

More information

DEPARTMENT OF ENGINEERING MANAGEMENT. Two-level designs to estimate all main effects and two-factor interactions. Pieter T. Eendebak & Eric D.

DEPARTMENT OF ENGINEERING MANAGEMENT. Two-level designs to estimate all main effects and two-factor interactions. Pieter T. Eendebak & Eric D. DEPARTMENT OF ENGINEERING MANAGEMENT Two-level designs to estimate all main effects and two-factor interactions Pieter T. Eendebak & Eric D. Schoen UNIVERSITY OF ANTWERP Faculty of Applied Economics City

More information

Volume vs. Diameter. Teacher Lab Discussion. Overview. Picture, Data Table, and Graph

Volume vs. Diameter. Teacher Lab Discussion. Overview. Picture, Data Table, and Graph 5 6 7 Middle olume Length/olume vs. Diameter, Investigation page 1 of olume vs. Diameter Teacher Lab Discussion Overview Figure 1 In this experiment we investigate the relationship between the diameter

More information

Chapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a

Chapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a Chapter 9 Regression with a Binary Dependent Variable Multiple Choice ) The binary dependent variable model is an example of a a. regression model, which has as a regressor, among others, a binary variable.

More information

Special Theory of Relativity Prof. Shiva Prasad Department of Physics Indian Institute of Technology, Bombay. Lecture - 15 Momentum Energy Four Vector

Special Theory of Relativity Prof. Shiva Prasad Department of Physics Indian Institute of Technology, Bombay. Lecture - 15 Momentum Energy Four Vector Special Theory of Relativity Prof. Shiva Prasad Department of Physics Indian Institute of Technology, Bombay Lecture - 15 Momentum Energy Four Vector We had started discussing the concept of four vectors.

More information

Logistic Regression. Advanced Methods for Data Analysis (36-402/36-608) Spring 2014

Logistic Regression. Advanced Methods for Data Analysis (36-402/36-608) Spring 2014 Logistic Regression Advanced Methods for Data Analysis (36-402/36-608 Spring 204 Classification. Introduction to classification Classification, like regression, is a predictive task, but one in which the

More information

Session-Based Queueing Systems

Session-Based Queueing Systems Session-Based Queueing Systems Modelling, Simulation, and Approximation Jeroen Horters Supervisor VU: Sandjai Bhulai Executive Summary Companies often offer services that require multiple steps on the

More information

Lawrence D. Brown, T. Tony Cai and Anirban DasGupta

Lawrence D. Brown, T. Tony Cai and Anirban DasGupta Statistical Science 2005, Vol. 20, No. 4, 375 379 DOI 10.1214/088342305000000395 Institute of Mathematical Statistics, 2005 Comment: Fuzzy and Randomized Confidence Intervals and P -Values Lawrence D.

More information

Chapter 6. Exploring Data: Relationships. Solutions. Exercises:

Chapter 6. Exploring Data: Relationships. Solutions. Exercises: Chapter 6 Exploring Data: Relationships Solutions Exercises: 1. (a) It is more reasonable to explore study time as an explanatory variable and the exam grade as the response variable. (b) It is more reasonable

More information

A Better Way to Do R&R Studies

A Better Way to Do R&R Studies The Evaluating the Measurement Process Approach Last month s column looked at how to fix some of the Problems with Gauge R&R Studies. This month I will show you how to learn more from your gauge R&R data

More information

An Introduction to Parameter Estimation

An Introduction to Parameter Estimation Introduction Introduction to Econometrics An Introduction to Parameter Estimation This document combines several important econometric foundations and corresponds to other documents such as the Introduction

More information

Approximate Median Regression via the Box-Cox Transformation

Approximate Median Regression via the Box-Cox Transformation Approximate Median Regression via the Box-Cox Transformation Garrett M. Fitzmaurice,StuartR.Lipsitz, and Michael Parzen Median regression is used increasingly in many different areas of applications. The

More information

Small-Area Population Forecasting Using a Spatial Regression Approach

Small-Area Population Forecasting Using a Spatial Regression Approach Small-Area Population Forecasting Using a Spatial Regression Approach Guangqing Chi and Paul R. Voss Applied Population Laboratory Department of Rural Sociology University of Wisconsin-Madison Extended

More information

Spatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions

Spatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions Spatial inference I will start with a simple model, using species diversity data Strong spatial dependence, Î = 0.79 what is the mean diversity? How precise is our estimate? Sampling discussion: The 64

More information

Biostat 2065 Analysis of Incomplete Data

Biostat 2065 Analysis of Incomplete Data Biostat 2065 Analysis of Incomplete Data Gong Tang Dept of Biostatistics University of Pittsburgh October 20, 2005 1. Large-sample inference based on ML Let θ is the MLE, then the large-sample theory implies

More information

Time Series: Theory and Methods

Time Series: Theory and Methods Peter J. Brockwell Richard A. Davis Time Series: Theory and Methods Second Edition With 124 Illustrations Springer Contents Preface to the Second Edition Preface to the First Edition vn ix CHAPTER 1 Stationary

More information

II. MATCHMAKER, MATCHMAKER

II. MATCHMAKER, MATCHMAKER II. MATCHMAKER, MATCHMAKER Josh Angrist MIT 14.387 Fall 2014 Agenda Matching. What could be simpler? We look for causal effects by comparing treatment and control within subgroups where everything... or

More information

Week 8: Correlation and Regression

Week 8: Correlation and Regression Health Sciences M.Sc. Programme Applied Biostatistics Week 8: Correlation and Regression The correlation coefficient Correlation coefficients are used to measure the strength of the relationship or association

More information

Trip Distribution Modeling Milos N. Mladenovic Assistant Professor Department of Built Environment

Trip Distribution Modeling Milos N. Mladenovic Assistant Professor Department of Built Environment Trip Distribution Modeling Milos N. Mladenovic Assistant Professor Department of Built Environment 25.04.2017 Course Outline Forecasting overview and data management Trip generation modeling Trip distribution

More information

WISE International Masters

WISE International Masters WISE International Masters ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are

More information

V. Properties of estimators {Parts C, D & E in this file}

V. Properties of estimators {Parts C, D & E in this file} A. Definitions & Desiderata. model. estimator V. Properties of estimators {Parts C, D & E in this file}. sampling errors and sampling distribution 4. unbiasedness 5. low sampling variance 6. low mean squared

More information

WISE International Masters

WISE International Masters WISE International Masters ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are

More information

Inference with few assumptions: Wasserman s example

Inference with few assumptions: Wasserman s example Inference with few assumptions: Wasserman s example Christopher A. Sims Princeton University sims@princeton.edu October 27, 2007 Types of assumption-free inference A simple procedure or set of statistics

More information

Measurement Independence, Parameter Independence and Non-locality

Measurement Independence, Parameter Independence and Non-locality Measurement Independence, Parameter Independence and Non-locality Iñaki San Pedro Department of Logic and Philosophy of Science University of the Basque Country, UPV/EHU inaki.sanpedro@ehu.es Abstract

More information

Appendix A. Review of Basic Mathematical Operations. 22Introduction

Appendix A. Review of Basic Mathematical Operations. 22Introduction Appendix A Review of Basic Mathematical Operations I never did very well in math I could never seem to persuade the teacher that I hadn t meant my answers literally. Introduction Calvin Trillin Many of

More information

The number of distributions used in this book is small, basically the binomial and Poisson distributions, and some variations on them.

The number of distributions used in this book is small, basically the binomial and Poisson distributions, and some variations on them. Chapter 2 Statistics In the present chapter, I will briefly review some statistical distributions that are used often in this book. I will also discuss some statistical techniques that are important in

More information

Name: Date: Period: #: Chapter 1: Outline Notes What Does a Historian Do?

Name: Date: Period: #: Chapter 1: Outline Notes What Does a Historian Do? Name: Date: Period: #: Chapter 1: Outline Notes What Does a Historian Do? Lesson 1.1 What is History? I. Why Study History? A. History is the study of the of the past. History considers both the way things

More information

Introduction to Linear Regression

Introduction to Linear Regression Introduction to Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Introduction to Linear Regression 1 / 46

More information

Gaussian processes. Basic Properties VAG002-

Gaussian processes. Basic Properties VAG002- Gaussian processes The class of Gaussian processes is one of the most widely used families of stochastic processes for modeling dependent data observed over time, or space, or time and space. The popularity

More information

What Makes the XmR Chart Work?

What Makes the XmR Chart Work? Quality Digest Daily, December 3, 212 Manuscript 2 How does it separate the signals from the noise? Donald J. Wheeler There are two basic ideas or principles that need to be respected when creating a chart

More information

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Linear Models 1. Isfahan University of Technology Fall Semester, 2014 Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and

More information

Forecasting. Simon Shaw 2005/06 Semester II

Forecasting. Simon Shaw 2005/06 Semester II Forecasting Simon Shaw s.c.shaw@maths.bath.ac.uk 2005/06 Semester II 1 Introduction A critical aspect of managing any business is planning for the future. events is called forecasting. Predicting future

More information

The Wave Function. Chapter The Harmonic Wave Function

The Wave Function. Chapter The Harmonic Wave Function Chapter 3 The Wave Function On the basis of the assumption that the de Broglie relations give the frequency and wavelength of some kind of wave to be associated with a particle, plus the assumption that

More information

An algorithm for robust fitting of autoregressive models Dimitris N. Politis

An algorithm for robust fitting of autoregressive models Dimitris N. Politis An algorithm for robust fitting of autoregressive models Dimitris N. Politis Abstract: An algorithm for robust fitting of AR models is given, based on a linear regression idea. The new method appears to

More information

Further extensions: Non-nested models and generalized linear models

Further extensions: Non-nested models and generalized linear models Further extensions: Non-nested models and generalized linear models Patrick Breheny April 2 Patrick Breheny BST 71: Bayesian Modeling in Biostatistics 1/25 Flight simulator study Today we will consider

More information

Bayesian Linear Regression [DRAFT - In Progress]

Bayesian Linear Regression [DRAFT - In Progress] Bayesian Linear Regression [DRAFT - In Progress] David S. Rosenberg Abstract Here we develop some basics of Bayesian linear regression. Most of the calculations for this document come from the basic theory

More information

Bayesian Inference for DSGE Models. Lawrence J. Christiano

Bayesian Inference for DSGE Models. Lawrence J. Christiano Bayesian Inference for DSGE Models Lawrence J. Christiano Outline State space-observer form. convenient for model estimation and many other things. Bayesian inference Bayes rule. Monte Carlo integation.

More information

CHAPTER 5 FUNCTIONAL FORMS OF REGRESSION MODELS

CHAPTER 5 FUNCTIONAL FORMS OF REGRESSION MODELS CHAPTER 5 FUNCTIONAL FORMS OF REGRESSION MODELS QUESTIONS 5.1. (a) In a log-log model the dependent and all explanatory variables are in the logarithmic form. (b) In the log-lin model the dependent variable

More information

SHOPPING FOR EFFICIENT CONFIDENCE INTERVALS IN STRUCTURAL EQUATION MODELS. Donna Mohr and Yong Xu. University of North Florida

SHOPPING FOR EFFICIENT CONFIDENCE INTERVALS IN STRUCTURAL EQUATION MODELS. Donna Mohr and Yong Xu. University of North Florida SHOPPING FOR EFFICIENT CONFIDENCE INTERVALS IN STRUCTURAL EQUATION MODELS Donna Mohr and Yong Xu University of North Florida Authors Note Parts of this work were incorporated in Yong Xu s Masters Thesis

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Introductory Quantum Chemistry Prof. K. L. Sebastian Department of Inorganic and Physical Chemistry Indian Institute of Science, Bangalore

Introductory Quantum Chemistry Prof. K. L. Sebastian Department of Inorganic and Physical Chemistry Indian Institute of Science, Bangalore Introductory Quantum Chemistry Prof. K. L. Sebastian Department of Inorganic and Physical Chemistry Indian Institute of Science, Bangalore Lecture - 4 Postulates Part 1 (Refer Slide Time: 00:59) So, I

More information

Calibration of ECMWF forecasts

Calibration of ECMWF forecasts from Newsletter Number 142 Winter 214/15 METEOROLOGY Calibration of ECMWF forecasts Based on an image from mrgao/istock/thinkstock doi:1.21957/45t3o8fj This article appeared in the Meteorology section

More information

Gravity and the Hungarian Railway Network Csaba Gábor Pogonyi

Gravity and the Hungarian Railway Network Csaba Gábor Pogonyi Statistical Methods in Network Science Gravity and the Hungarian Railway Network Csaba Gábor Pogonyi Table of Contents 1 Introduction... 2 2 Theory The Gravity Model... 2 3 Data... 4 3.1 Railway network

More information

AGEC 661 Note Fourteen

AGEC 661 Note Fourteen AGEC 661 Note Fourteen Ximing Wu 1 Selection bias 1.1 Heckman s two-step model Consider the model in Heckman (1979) Y i = X iβ + ε i, D i = I {Z iγ + η i > 0}. For a random sample from the population,

More information

10 Model Checking and Regression Diagnostics

10 Model Checking and Regression Diagnostics 10 Model Checking and Regression Diagnostics The simple linear regression model is usually written as i = β 0 + β 1 i + ɛ i where the ɛ i s are independent normal random variables with mean 0 and variance

More information

So far we have limited the discussion to state spaces of finite dimensions, but it turns out that, in

So far we have limited the discussion to state spaces of finite dimensions, but it turns out that, in Chapter 0 State Spaces of Infinite Dimension So far we have limited the discussion to state spaces of finite dimensions, but it turns out that, in practice, state spaces of infinite dimension are fundamental

More information

Optimization Problems

Optimization Problems Optimization Problems The goal in an optimization problem is to find the point at which the minimum (or maximum) of a real, scalar function f occurs and, usually, to find the value of the function at that

More information