A Count Data Frontier Model


This is an incomplete draft. Cite only as a working paper.

Richard A. Hofler (rhofler@bus.ucf.edu)
David Scrogin
Both of the Department of Economics, University of Central Florida, Orlando, FL

There are many cases in which a count variable is being either maximized or minimized. We propose one method for estimating the extent of inefficiency of a maximizing process that produces a nonnegative integer variable. It is based on the beta-binomial/negative binomial distribution model of Schmittlein et al. (1985). We show how this model can estimate the unobserved frontier (maximum potential) number of items for each observed count value and, consequently, estimate the extent of inefficiency for each observed count value. This model's estimation by ML is illustrated on a sample of data in which individuals are attempting to maximize the number of items assembled.

I. Introduction

There are many cases in which a discrete variable is being either maximized or minimized. Examples of the former case include the number of new patents by firms, the number of wins by a sports team, a person's years of education, and the number of weeks worked by an individual. Conversely, it is reasonable to believe that minimization behavior occurs in situations involving the number of accidents along a certain stretch of roadway, the number of patient accidents in a medical care facility, the number of failures in a new product, the number of incorrect results (false positives and false negatives) from a medical test, the number of errors in the air space around an airport (i.e., letting airplanes get too close to each other), and the number of times a person is arrested during his or her lifetime.

Given the plethora of such optimizing situations involving nonnegative discrete variables, it is natural to ask how close decision-makers come to the very best performance that they can attain. Since 1977 (Aigner, Lovell, and Schmidt; Meeusen and van den Broeck) a large and continually expanding literature on stochastic frontier analysis has investigated the extents and causes of inefficient behaviors and developed many models for such investigations. [1]

[1] Data Envelopment Analysis (DEA) is an alternative method for exploring the extents and causes of inefficient behaviors. It is a mathematical programming approach, whereas stochastic frontiers are econometric in nature.

One feature common to all of the publications in this literature is that they address continuous variables that are being maximized or minimized. No one (to our knowledge) has

proposed a model for count data frontier analysis when the count variable is being maximized. (Fe-Rodriguez (2007) has proposed a frontier model for minimizing a count.) It is into this unexplored territory that this paper ventures. Specifically, we propose one method for estimating the extent of inefficiency of a maximizing process that produces a nonnegative integer variable. In other words, we propose a count data frontier model.

II. The Count Data Frontier Model

To make the rest of this paper easier to follow, we first explain our context: producing a count variable. By this we mean that an entity (individual or firm) is engaged in a process (a maximizing one, in this model) whose outcome is a count variable (e.g., number of patents, number of wins, number of weeks worked during a period). In this process, there is a latent maximum possible (frontier) number of items (patents, wins, etc.) that can be produced. However, due to inefficiency, some percentage of that unobserved frontier number is not produced. The items that are produced are observed. The shortfall between the frontier output and the observed output is the extent of inefficiency.

For instance, assume that a firm is attempting to generate as many patents as possible each year. Further, imagine that the maximum possible number of patents that it can produce in a year is 17. Suppose that the firm generates only 9 patents in that year. Then the frontier outcome is 17 patents, the observed (produced) output is 9 patents, and the extent of inefficiency is 8 unobserved (not produced) patents.

The four foundations (below) of this count data frontier model follow those listed in Schmittlein et al. (1985). That paper is one in a relatively small literature on underreporting (or undercounting) of discrete variables. We revise their context from imperfectly recording purchases into inefficiently maximizing a count variable. Thus, when this literature refers to

recorded purchases, we translate that concept into observed or produced output. Whereas this literature talks about the actual number of purchases (the sum of those that are recorded and those that are not), we discuss the frontier output or outcome.

i. The unobserved (maximum potential = frontier) count for an entity during a specified time period is Poisson distributed with mean λ.

(1)  $P(N = n \mid \lambda) = \dfrac{\lambda^{n} e^{-\lambda}}{n!}, \quad n = 0, 1, 2, \ldots; \ \lambda > 0$

ii. The distribution of λ is a two-parameter gamma with pdf

(2)  $f(\lambda) = \dfrac{\alpha^{r} \lambda^{r-1} e^{-\alpha\lambda}}{\Gamma(r)}, \quad \lambda > 0; \ \alpha, r > 0$

iii. With probability p, a count item is observed. That is, a specific count item (the first item, the second item, etc.) may be either produced (so it is observed) or not produced (it is not observed). Thus, the number of observed counts (x) is distributed binomial with pmf

(3)  $P(X = x \mid n, p) = \dbinom{n}{x} p^{x} (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n; \ 0 < p < 1$

iv. The distribution of p is beta with pdf

(4)  $g(p) = \dfrac{1}{B(a, b)}\, p^{a-1} (1-p)^{b-1}, \quad 0 < p < 1; \ a, b > 0$

Points iii. and iv. reflect the fact that this model inherently takes the view that production has a binary dimension. For each entity there exists a frontier number of (count) items that can be produced. Starting from the first of those items, each one can be either produced (observed) or not. As a result, point iii. is a plausible way to model the binary situation of either producing an item (e.g., a patent, a win, etc.) or not. Point iv. captures the heterogeneity in production success (efficiency) across entities.
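The four assumptions describe a generative process, which can be sketched as a small simulation. The Python below uses only the standard library. The function names (`draw_poisson`, `simulate_entity`), the Knuth-style Poisson sampler, and whatever parameter values are passed in are illustrative assumptions, not anything specified in the paper.

```python
import math
import random


def draw_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_entity(r, alpha, a, b, rng):
    """Return (n, x): a frontier count and an observed count for one entity."""
    # (ii) lambda ~ gamma with shape r and rate alpha (scale 1 / alpha)
    lam = rng.gammavariate(r, 1.0 / alpha)
    # (i) frontier count N ~ Poisson(lambda)
    n = draw_poisson(lam, rng)
    # (iv) production probability p ~ beta(a, b)
    p = rng.betavariate(a, b)
    # (iii) each of the n frontier items is produced with probability p
    x = sum(rng.random() < p for _ in range(n))
    return n, x
```

By construction every simulated pair satisfies x ≤ n. Over many draws, the averages of n and x should settle near r/α and ra/(α(a+b)), the means implied by these assumptions.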

As is standard, combining assumptions (i) and (ii) gives us a negative binomial distribution (NBD).

(5)  $P(N = n) = \dfrac{\Gamma(r+n)}{\Gamma(r)\, n!} \left(\dfrac{\alpha}{\alpha+1}\right)^{r} \left(\dfrac{1}{\alpha+1}\right)^{n}, \quad n = 0, 1, 2, \ldots; \ r, \alpha > 0$

For future reference, recall the usual result for the NBD that

(6)  $E[N] = \dfrac{r}{\alpha}$

In this case, the NBD describes the distribution of frontier counts, which are the sum of observed and unobserved counts. The unobserved counts are the result of a process that is attempting to maximize a count variable but falls short by the amount of the unobserved count value. In other words, the unobserved count value reflects the inefficiency of the production process.

Similarly, assumptions (iii) and (iv) give us a beta-binomial (BB) model for the distribution of the observed counts given the frontier counts.

(7)  $P(X = x \mid n) = \dbinom{n}{x} \dfrac{B(a + x,\, b + n - x)}{B(a, b)}, \quad x = 0, 1, 2, \ldots, n; \ n = 0, 1, 2, \ldots$

Finally, Schmittlein et al. (1985) derive the marginal distribution of observed counts

(8)  $P(X = x) = \dfrac{\Gamma(r+x)}{\Gamma(r)\, x!} \left(\dfrac{\alpha}{\alpha+1}\right)^{r} \left(\dfrac{1}{\alpha+1}\right)^{x} \dfrac{\Gamma(a+x)\,\Gamma(a+b)}{\Gamma(a)\,\Gamma(a+b+x)}\; {}_2F_1\!\left(r+x,\ b;\ a+b+x;\ \dfrac{1}{\alpha+1}\right), \quad x = 0, 1, 2, \ldots; \ a, b, r, \alpha > 0$

where ${}_2F_1(\cdot)$ is the Gauss hypergeometric function.
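As a rough numerical companion to (8), the Python sketch below evaluates the BB/NBD probability with the standard library only. The names `hyp2f1` and `bbnbd_pmf` and the series truncation settings are illustrative assumptions, not part of the paper. Because the argument 1/(α+1) lies strictly between 0 and 1, the Gauss series converges and can be summed term by term.

```python
import math


def hyp2f1(a, b, c, z, tol=1e-12, max_terms=10000):
    """Gauss series sum_k (a)_k (b)_k / ((c)_k k!) z^k, intended for 0 <= z < 1."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def bbnbd_pmf(x, r, alpha, a, b):
    """P(X = x) following equation (8), assembled on the log scale."""
    z = 1.0 / (alpha + 1.0)
    # NBD part: Gamma(r+x) / (Gamma(r) x!) * (alpha/(alpha+1))^r * (1/(alpha+1))^x
    log_nbd = (math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
               + r * math.log(alpha * z) + x * math.log(z))
    # Gamma(a+x) Gamma(a+b) / (Gamma(a) Gamma(a+b+x))
    log_gamma_ratio = (math.lgamma(a + x) + math.lgamma(a + b)
                       - math.lgamma(a) - math.lgamma(a + b + x))
    return math.exp(log_nbd + log_gamma_ratio) * hyp2f1(r + x, b, a + b + x, z)
```

A quick sanity check under any chosen parameter values is that `bbnbd_pmf` summed over a long range of x is close to one.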

Schmittlein et al. (1985) recognize that this is a beta-binomial/negative binomial (BB/NBD) distribution. The mean of this distribution is

(9)  $E[X] = \dfrac{r a}{\alpha (a + b)}$

Finally, it can be shown (Fader and Hardie, 2000) that the distribution of unobserved (frontier) counts, conditional on the observed counts, is given by

(10)  $P(N = n \mid X = x) = \dfrac{\Gamma(r+n)}{\Gamma(r+x)\,(n-x)!} \left(\dfrac{1}{\alpha+1}\right)^{n-x} \dfrac{\Gamma(a+b+x)\,\Gamma(b+n-x)}{\Gamma(a+b+n)\,\Gamma(b)}\; \dfrac{1}{{}_2F_1\!\left(r+x,\ b;\ a+b+x;\ \frac{1}{\alpha+1}\right)}, \quad n = 0, 1, 2, \ldots; \ x = 0, 1, 2, \ldots, n; \ a, b, r, \alpha > 0$

The expected value of frontier counts, conditional on the observed counts, is given by

(11)  $E[N \mid X = x] = x + \dfrac{r+x}{\alpha+1}\, \dfrac{B(a+x,\, b+1)}{B(a+x,\, b)}\; \dfrac{{}_2F_1\!\left(r+x+1,\ b+1;\ a+b+x+1;\ \frac{1}{\alpha+1}\right)}{{}_2F_1\!\left(r+x,\ b;\ a+b+x;\ \frac{1}{\alpha+1}\right)}$

A model of maximizing behavior should possess the characteristic that the observed outcomes can never be greater than the frontier outcomes. Recall from (6) and (9) that $E[N] = r/\alpha$ and $E[X] = ra / (\alpha(a+b))$. Thus, $E[X] = \frac{a}{a+b} E[N]$. Since a > 0 and b > 0, the factor a/(a+b) is strictly less than one, so on average the observed outcomes are less than the frontier outcomes. Furthermore, it can easily be shown by repeatedly evaluating (10) for values of x > n, that

P(N = n | X = x) = 0.00 for all values of x > n. Based on these two pieces of evidence, it appears that this model possesses the required characteristic that the observed outcomes can never be greater than the frontier outcomes.

III. Estimating the Count Data Frontier Model

The parameters of this model can be estimated by maximum likelihood. Let us assume that we have data on the counts $x_i$, i = 1, 2, ..., I, where $x_i$ is the number of observed counts for entity i. Assuming that the observations are independent, the likelihood is the product of the probabilities P(X = x) over all observations, and the log-likelihood is given by:

(12)  $\ln L(a, b, r, \alpha \mid X) = \sum_{x=0}^{x^{*}} f_x \ln P(X = x \mid a, b, r, \alpha)$

where $x^{*} = \max\{x_1, x_2, \ldots, x_I\}$ and $f_x$ is the number of entities whose observed count equals x, so that the sum over count values equals the sum over observations. See Fader and Hardie (2000) for more.

IV. Empirical Illustration

An electronics firm in the South asks job applicants for its assembly operation to take a test as part of the application process. Applicants are given written instructions about how to assemble a certain item and are then taken into a test room where they are faced with a large number of those items, unassembled. They are told that they have a specified amount of time to assemble as many items as they can. Their performance will be assessed in two ways: (i) how many items they assemble and (ii) how well they complete each assembly. They are told that the more items they correctly assemble, the better their chance of getting a job offer. This phase of the application process is designed to test cognitive ability, dexterity, and the applicant's ability to handle pressure.
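The log-likelihood in (12) can be sketched in Python as a frequency-weighted sum of log probabilities from (8). This is a minimal illustration using only the standard library. The function names and the series settings are assumptions, and the paper does not say which numerical optimizer was used to maximize the function, so none is shown.

```python
import math
from collections import Counter


def hyp2f1(a, b, c, z, tol=1e-12, max_terms=10000):
    """Gauss series sum_k (a)_k (b)_k / ((c)_k k!) z^k, intended for 0 <= z < 1."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def log_pmf(x, r, alpha, a, b):
    """ln P(X = x | a, b, r, alpha) following equation (8)."""
    z = 1.0 / (alpha + 1.0)
    return (math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(alpha * z) + x * math.log(z)
            + math.lgamma(a + x) + math.lgamma(a + b)
            - math.lgamma(a) - math.lgamma(a + b + x)
            + math.log(hyp2f1(r + x, b, a + b + x, z)))


def log_likelihood(counts, r, alpha, a, b):
    """Equation (12): sum over distinct count values of f_x * ln P(X = x)."""
    freq = Counter(counts)  # f_x: number of entities with observed count x
    return sum(f * log_pmf(x, r, alpha, a, b) for x, f in freq.items())
```

Count values that never occur have f_x = 0 and contribute nothing, so iterating over the observed values only is equivalent to summing from 0 to x*. Since all four parameters must be positive, one common choice (an assumption here, not a statement about the paper) is to optimize over their logarithms.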

We have the counts of how many items were assembled by each of 80 randomly selected applicants. The sample mean is 6.63, the sample mode is five, and the sample variance is [value missing]. The number of assembled items ranges from zero (two occurrences) up to 17 (one person). Table 1 contains the estimation results.

Table 1. MLE Results for Item Assembly Count Data

Parameter         Estimate    Standard Error    Significance
a                                               <.01
b                                               <.01
r                                               <.01
α                                               <.01
log likelihood

One immediate use to which these estimates can be put is to calculate several sample-wide mean values. These are the mean frontier count, the mean shortfall of observed counts below the estimated frontier count, and the mean percentage inefficiency. First, the mean frontier count is obtained by evaluating (6) using the estimates. This gives a value of 8.49 items that could have been assembled by each applicant, on average. The actual mean number of items assembled is 6.63, yielding an average shortfall of assembled items equal to 1.86. Finally, the mean applicant was 21.92% inefficient, which corresponds to the shortfall of 1.86 divided by 8.49. In other words, that applicant could have assembled 1.86 more items than were actually assembled, a shortfall of nearly 22% of the frontier count.

These means, while informative, likely obscure deeper insights that can be gained by examining the numbers of frontier counts (N) for different observed count (X) values. Table 2

shows the values of X from 0 to 17 (the largest observed sample value), the value of E[N | X = x] corresponding to each observed value, and the percentage inefficiency for each X value. One feature is immediately apparent when looking at this table: inefficiency varies greatly across the number of items actually assembled. [2] The largest percentage inefficiency (other than the obvious 100% when no items are assembled) is 77.4% for those who assembled only one item. The smallest shortfall is 1.1 items at the upper end of the distribution. This is a 6.3% inefficiency rate. These applicants assembled the most items, yet still could have done better. Perhaps not surprisingly, the inefficiency rate declines monotonically as the actual number of items assembled rises.

Table 2. Values of X, E[N | X = x] and Percentage Inefficiency for Each X Value

x    E[N | X = x]    % Inefficiency

[2] % inefficiency is calculated from E[·] values to four decimals (not rounded).
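Quantities of the kind reported in Table 2 can be computed from (11) once parameter estimates are available. The sketch below is an illustration under stated assumptions. The function names are hypothetical, the estimates themselves are not reproduced here, and percentage inefficiency is taken to be the shortfall divided by the conditional frontier mean, which is the definition consistent with the 100% figure at x = 0 and with the sample-wide calculation of 1.86 divided by 8.49.

```python
import math


def hyp2f1(a, b, c, z, tol=1e-12, max_terms=10000):
    """Gauss series sum_k (a)_k (b)_k / ((c)_k k!) z^k, intended for 0 <= z < 1."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def conditional_mean(x, r, alpha, a, b):
    """E[N | X = x] following equation (11)."""
    z = 1.0 / (alpha + 1.0)
    beta_ratio = b / (a + b + x)  # equals B(a + x, b + 1) / B(a + x, b)
    f_ratio = (hyp2f1(r + x + 1, b + 1, a + b + x + 1, z)
               / hyp2f1(r + x, b, a + b + x, z))
    return x + (r + x) * z * beta_ratio * f_ratio


def pct_inefficiency(x, frontier):
    """Shortfall (frontier - x) as a percentage of the frontier count."""
    return 100.0 * (frontier - x) / frontier
```

Under this definition, an inefficiency of 77.4% at x = 1 corresponds to a conditional frontier mean of roughly 1 / (1 - 0.774), or about 4.4 items.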

Even more can be learned by digging deeper into these results. The values for the frontier counts, given the actual number of items assembled (E[N | X = x]), are, after all, the means of a distribution of the potential number of items that each individual could have assembled. Figure 1, showing the conditional distributions for different observed item counts, contains two examples of additional information that can be gleaned from these data. We chose to display the distributions for an observed count of zero items assembled and for five items (the modal number of items assembled).

[Figure 1. Conditional Distributions for Two Observed Counts. Top panel: P[N = n | X = 0]. Bottom panel: P[N = n | X = 5].]
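Conditional distributions like those in Figure 1 can be tabulated without evaluating the hypergeometric function in (10). The sketch below multiplies the NBD probability (5) by the beta-binomial probability (7) and normalizes numerically over n. This is a swapped-in numerical route rather than the closed form, and the function names and the truncation point `n_max` are assumptions.

```python
import math


def log_joint(n, x, r, alpha, a, b):
    """ln of P(N = n) * P(X = x | N = n), using equations (5) and (7)."""
    log_nbd = (math.lgamma(r + n) - math.lgamma(r) - math.lgamma(n + 1)
               + r * math.log(alpha / (alpha + 1.0)) - n * math.log(alpha + 1.0))
    log_bb = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
              + math.lgamma(a + x) + math.lgamma(b + n - x) - math.lgamma(a + b + n)
              + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    return log_nbd + log_bb


def conditional_pmf(x, r, alpha, a, b, n_max=2000):
    """Return {n: P(N = n | X = x)} for n = x, ..., n_max, normalized numerically."""
    weights = {n: math.exp(log_joint(n, x, r, alpha, a, b))
               for n in range(x, n_max + 1)}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}
```

With a dictionary `pmf = conditional_pmf(0, r, alpha, a, b)`, the entry `pmf[0]` is the share of zero-count applicants performing at capability, and `sum(p for n, p in pmf.items() if n >= 7)` is the share who could have completed seven or more items.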

The top panel of Figure 1 shows the conditional distribution of potential completed items for those who assembled zero items. This distribution reveals that only a little under 12% of those who failed to complete any items were performing at their capability. That is, this conditional distribution has only approximately 12% of its mass at zero, the number actually assembled by these applicants. The remaining 88% of those who did not assemble even one item could have assembled at least one. In fact, more than one-quarter of them could have assembled two or three items. Furthermore, approximately 20% of those who assembled no items could have completed seven or more. The information contained in this one conditional distribution shows the extent of the underachievement (inefficiency) exhibited by many of those in this group.

Similarly, the conditional distribution for 5 items assembled shows a range of potential performances. About one-third of the applicants who completed five items performed up to their potential. Obviously, then, two-thirds did not. Fully 20% of those who completed this modal number of items (5) could potentially have assembled 9 or more.

V. Conclusion

This paper proposes one method for estimating the extent of inefficiency for cases in which a count variable is being maximized. We show how this model can estimate a number of values relating to inefficiency in producing counts. First, the researcher can calculate the sample-wide mean extent of inefficiency and the mean shortfall of actual counts below frontier (maximum potential) counts. Second, one can determine the extent of inefficiency for every observed value of the count variable being maximized. Beyond that, one can derive and examine the distribution of the number of frontier counts for each value that was actually

produced. Thus, this model provides a rich and informative set of results about the frontier number of items that can be produced and various aspects of inefficiency.

This model omits covariates in favor of representing heterogeneity in production and efficiency through the assumption of specific distributions for frontier counts, observed/produced counts, and the probability of producing a count item. However, it seems that covariates could be introduced in a straightforward manner if they would strengthen this model and add to its ability to inform researchers about efficiency in count variable maximizing processes.

References

Fader, P.S. and B.G.S. Hardie (2000). A note on modeling underreported Poisson counts. Journal of Applied Statistics, 27(8).

Fe-Rodriguez, E. (2007). Exploring a stochastic frontier model when the dependent variable is a count. The School of Economics Discussion Paper Series, The University of Manchester.

Schmittlein, D.C., A.C. Bemmaor, and D.G. Morrison (1985). Why does the NBD model work? Robustness in representing product purchases, brand purchases and imperfectly recorded purchases. Marketing Science, 4(3).


More information

Closed book and notes. 60 minutes. Cover page and four pages of exam. No calculators.

Closed book and notes. 60 minutes. Cover page and four pages of exam. No calculators. IE 230 Seat # Closed book and notes. 60 minutes. Cover page and four pages of exam. No calculators. Score Exam #3a, Spring 2002 Schmeiser Closed book and notes. 60 minutes. 1. True or false. (for each,

More information

Plotting data is one method for selecting a probability distribution. The following

Plotting data is one method for selecting a probability distribution. The following Advanced Analytical Models: Over 800 Models and 300 Applications from the Basel II Accord to Wall Street and Beyond By Johnathan Mun Copyright 008 by Johnathan Mun APPENDIX C Understanding and Choosing

More information

5.1 Introduction. # of successes # of trials. 5.2 Part 1: Maximum Likelihood. MTH 452 Mathematical Statistics

5.1 Introduction. # of successes # of trials. 5.2 Part 1: Maximum Likelihood. MTH 452 Mathematical Statistics MTH 452 Mathematical Statistics Instructor: Orlando Merino University of Rhode Island Spring Semester, 2006 5.1 Introduction An Experiment: In 10 consecutive trips to the free throw line, a professional

More information

Statistics 427: Sample Final Exam

Statistics 427: Sample Final Exam Statistics 427: Sample Final Exam Instructions: The following sample exam was given several quarters ago in Stat 427. The same topics were covered in the class that year. This sample exam is meant to be

More information

Chapter (4) Discrete Probability Distributions Examples

Chapter (4) Discrete Probability Distributions Examples Chapter (4) Discrete Probability Distributions Examples Example () Two balanced dice are rolled. Let X be the sum of the two dice. Obtain the probability distribution of X. Solution When the two balanced

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

STAT 302 Introduction to Probability Learning Outcomes. Textbook: A First Course in Probability by Sheldon Ross, 8 th ed.

STAT 302 Introduction to Probability Learning Outcomes. Textbook: A First Course in Probability by Sheldon Ross, 8 th ed. STAT 302 Introduction to Probability Learning Outcomes Textbook: A First Course in Probability by Sheldon Ross, 8 th ed. Chapter 1: Combinatorial Analysis Demonstrate the ability to solve combinatorial

More information

Estimation of Theoretically Consistent Stochastic Frontier Functions in R

Estimation of Theoretically Consistent Stochastic Frontier Functions in R of ly in R Department of Agricultural Economics University of Kiel, Germany Outline ly of ( ) 2 / 12 Production economics Assumption of traditional empirical analyses: all producers always manage to optimize

More information

CONTINUOUS RANDOM VARIABLES

CONTINUOUS RANDOM VARIABLES the Further Mathematics network www.fmnetwork.org.uk V 07 REVISION SHEET STATISTICS (AQA) CONTINUOUS RANDOM VARIABLES The main ideas are: Properties of Continuous Random Variables Mean, Median and Mode

More information

Chapter Three. Hypothesis Testing

Chapter Three. Hypothesis Testing 3.1 Introduction The final phase of analyzing data is to make a decision concerning a set of choices or options. Should I invest in stocks or bonds? Should a new product be marketed? Are my products being

More information

Northwestern University Department of Electrical Engineering and Computer Science

Northwestern University Department of Electrical Engineering and Computer Science Northwestern University Department of Electrical Engineering and Computer Science EECS 454: Modeling and Analysis of Communication Networks Spring 2008 Probability Review As discussed in Lecture 1, probability

More information

CDA6530: Performance Models of Computers and Networks. Chapter 2: Review of Practical Random Variables

CDA6530: Performance Models of Computers and Networks. Chapter 2: Review of Practical Random Variables CDA6530: Performance Models of Computers and Networks Chapter 2: Review of Practical Random Variables Two Classes of R.V. Discrete R.V. Bernoulli Binomial Geometric Poisson Continuous R.V. Uniform Exponential,

More information

Implementing the Pareto/NBD Model Given Interval-Censored Data

Implementing the Pareto/NBD Model Given Interval-Censored Data Implementing the Pareto/NBD Model Given Interval-Censored Data Peter S. Fader www.petefader.com Bruce G. S. Hardie www.brucehardie.com November 2005 Revised August 2010 1 Introduction The Pareto/NBD model

More information

Chapters 3.2 Discrete distributions

Chapters 3.2 Discrete distributions Chapters 3.2 Discrete distributions In this section we study several discrete distributions and their properties. Here are a few, classified by their support S X. There are of course many, many more. For

More information

by Dimitri P. Bertsekas and John N. Tsitsiklis

by Dimitri P. Bertsekas and John N. Tsitsiklis INTRODUCTION TO PROBABILITY by Dimitri P. Bertsekas and John N. Tsitsiklis CHAPTER 2: ADDITIONAL PROBLEMS SECTION 2.2. Probability Mass Functions Problem 1. The probability of a royal flush in poker is

More information

Definition: A random variable X is a real valued function that maps a sample space S into the space of real numbers R. X : S R

Definition: A random variable X is a real valued function that maps a sample space S into the space of real numbers R. X : S R Random Variables Definition: A random variable X is a real valued function that maps a sample space S into the space of real numbers R. X : S R As such, a random variable summarizes the outcome of an experiment

More information

Mathematical statistics

Mathematical statistics October 18 th, 2018 Lecture 16: Midterm review Countdown to mid-term exam: 7 days Week 1 Chapter 1: Probability review Week 2 Week 4 Week 7 Chapter 6: Statistics Chapter 7: Point Estimation Chapter 8:

More information

STA 256: Statistics and Probability I

STA 256: Statistics and Probability I Al Nosedal. University of Toronto. Fall 2017 My momma always said: Life was like a box of chocolates. You never know what you re gonna get. Forrest Gump. There are situations where one might be interested

More information

Statistical Methods in Particle Physics

Statistical Methods in Particle Physics Statistical Methods in Particle Physics Lecture 3 October 29, 2012 Silvia Masciocchi, GSI Darmstadt s.masciocchi@gsi.de Winter Semester 2012 / 13 Outline Reminder: Probability density function Cumulative

More information

Lecture 6: Gaussian Mixture Models (GMM)

Lecture 6: Gaussian Mixture Models (GMM) Helsinki Institute for Information Technology Lecture 6: Gaussian Mixture Models (GMM) Pedram Daee 3.11.2015 Outline Gaussian Mixture Models (GMM) Models Model families and parameters Parameter learning

More information

Random Variables. Definition: A random variable (r.v.) X on the probability space (Ω, F, P) is a mapping

Random Variables. Definition: A random variable (r.v.) X on the probability space (Ω, F, P) is a mapping Random Variables Example: We roll a fair die 6 times. Suppose we are interested in the number of 5 s in the 6 rolls. Let X = number of 5 s. Then X could be 0, 1, 2, 3, 4, 5, 6. X = 0 corresponds to the

More information

Reading Material for Students

Reading Material for Students Reading Material for Students Arnab Adhikari Indian Institute of Management Calcutta, Joka, Kolkata 714, India, arnaba1@email.iimcal.ac.in Indranil Biswas Indian Institute of Management Lucknow, Prabandh

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

CSCI-567: Machine Learning (Spring 2019)

CSCI-567: Machine Learning (Spring 2019) CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March

More information

IEOR 3106: Introduction to Operations Research: Stochastic Models. Professor Whitt. SOLUTIONS to Homework Assignment 2

IEOR 3106: Introduction to Operations Research: Stochastic Models. Professor Whitt. SOLUTIONS to Homework Assignment 2 IEOR 316: Introduction to Operations Research: Stochastic Models Professor Whitt SOLUTIONS to Homework Assignment 2 More Probability Review: In the Ross textbook, Introduction to Probability Models, read

More information

Introduction to Statistical Data Analysis Lecture 3: Probability Distributions

Introduction to Statistical Data Analysis Lecture 3: Probability Distributions Introduction to Statistical Data Analysis Lecture 3: Probability Distributions James V. Lambers Department of Mathematics The University of Southern Mississippi James V. Lambers Statistical Data Analysis

More information

Introduction to Statistical Data Analysis Lecture 4: Sampling

Introduction to Statistical Data Analysis Lecture 4: Sampling Introduction to Statistical Data Analysis Lecture 4: Sampling James V. Lambers Department of Mathematics The University of Southern Mississippi James V. Lambers Statistical Data Analysis 1 / 30 Introduction

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

Estimation of Quantiles

Estimation of Quantiles 9 Estimation of Quantiles The notion of quantiles was introduced in Section 3.2: recall that a quantile x α for an r.v. X is a constant such that P(X x α )=1 α. (9.1) In this chapter we examine quantiles

More information

Probability - Lecture 4

Probability - Lecture 4 1 Introduction Probability - Lecture 4 Many methods of computation physics and the comparison of data to a mathematical representation, apply stochastic methods. These ideas were first introduced in the

More information

MTH4451Test#2-Solutions Spring 2009

MTH4451Test#2-Solutions Spring 2009 Pat Rossi Instructions. MTH4451Test#2-Solutions Spring 2009 Name Show CLEARLY how you arrive at your answers. 1. A large jar contains US coins. In this jar, there are 350 pennies ($0.01), 300 nickels ($0.05),

More information

Prof. Thistleton MAT 505 Introduction to Probability Lecture 13

Prof. Thistleton MAT 505 Introduction to Probability Lecture 13 Prof. Thistleton MAT 55 Introduction to Probability Lecture 3 Sections from Text and MIT Video Lecture: Sections 5.4, 5.6 http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-4- probabilisticsystems-analysis-and-applied-probability-fall-2/video-lectures/lecture-8-continuousrandomvariables/

More information

Math 151. Rumbos Spring Solutions to Review Problems for Exam 3

Math 151. Rumbos Spring Solutions to Review Problems for Exam 3 Math 151. Rumbos Spring 2014 1 Solutions to Review Problems for Exam 3 1. Suppose that a book with n pages contains on average λ misprints per page. What is the probability that there will be at least

More information

Chapter 3 Single Random Variables and Probability Distributions (Part 1)

Chapter 3 Single Random Variables and Probability Distributions (Part 1) Chapter 3 Single Random Variables and Probability Distributions (Part 1) Contents What is a Random Variable? Probability Distribution Functions Cumulative Distribution Function Probability Density Function

More information

Common Discrete Distributions

Common Discrete Distributions Common Discrete Distributions Statistics 104 Autumn 2004 Taken from Statistics 110 Lecture Notes Copyright c 2004 by Mark E. Irwin Common Discrete Distributions There are a wide range of popular discrete

More information

Business Statistics. Chapter 6 Review of Normal Probability Distribution QMIS 220. Dr. Mohammad Zainal

Business Statistics. Chapter 6 Review of Normal Probability Distribution QMIS 220. Dr. Mohammad Zainal Department of Quantitative Methods & Information Systems Business Statistics Chapter 6 Review of Normal Probability Distribution QMIS 220 Dr. Mohammad Zainal Chapter Goals After completing this chapter,

More information

Statistics for Managers Using Microsoft Excel/SPSS Chapter 4 Basic Probability And Discrete Probability Distributions

Statistics for Managers Using Microsoft Excel/SPSS Chapter 4 Basic Probability And Discrete Probability Distributions Statistics for Managers Using Microsoft Excel/SPSS Chapter 4 Basic Probability And Discrete Probability Distributions 1999 Prentice-Hall, Inc. Chap. 4-1 Chapter Topics Basic Probability Concepts: Sample

More information

Lecture-19: Modeling Count Data II

Lecture-19: Modeling Count Data II Lecture-19: Modeling Count Data II 1 In Today s Class Recap of Count data models Truncated count data models Zero-inflated models Panel count data models R-implementation 2 Count Data In many a phenomena

More information

39.3. Sums and Differences of Random Variables. Introduction. Prerequisites. Learning Outcomes

39.3. Sums and Differences of Random Variables. Introduction. Prerequisites. Learning Outcomes Sums and Differences of Random Variables 39.3 Introduction In some situations, it is possible to easily describe a problem in terms of sums and differences of random variables. Consider a typical situation

More information

STAT 135 Lab 3 Asymptotic MLE and the Method of Moments

STAT 135 Lab 3 Asymptotic MLE and the Method of Moments STAT 135 Lab 3 Asymptotic MLE and the Method of Moments Rebecca Barter February 9, 2015 Maximum likelihood estimation (a reminder) Maximum likelihood estimation Suppose that we have a sample, X 1, X 2,...,

More information

Introduction to Probability Theory for Graduate Economics Fall 2008

Introduction to Probability Theory for Graduate Economics Fall 2008 Introduction to Probability Theory for Graduate Economics Fall 008 Yiğit Sağlam October 10, 008 CHAPTER - RANDOM VARIABLES AND EXPECTATION 1 1 Random Variables A random variable (RV) is a real-valued function

More information

Probability Distributions for Continuous Variables. Probability Distributions for Continuous Variables

Probability Distributions for Continuous Variables. Probability Distributions for Continuous Variables Probability Distributions for Continuous Variables Probability Distributions for Continuous Variables Let X = lake depth at a randomly chosen point on lake surface If we draw the histogram so that the

More information

STA 247 Solutions to Assignment #1

STA 247 Solutions to Assignment #1 STA 247 Solutions to Assignment #1 Question 1: Suppose you throw three six-sided dice (coloured red, green, and blue) repeatedly, until the three dice all show different numbers. Assuming that these dice

More information

Be sure that your work gives a clear indication of reasoning. Use notation and terminology correctly.

Be sure that your work gives a clear indication of reasoning. Use notation and terminology correctly. MATH 232 Fall 2009 Test 1 Name: Instructions. Be sure that your work gives a clear indication of reasoning. Use notation and terminology correctly. No mystry numbers: If you use sage, Mathematica, or your

More information

Final Exam. Economics 835: Econometrics. Fall 2010

Final Exam. Economics 835: Econometrics. Fall 2010 Final Exam Economics 835: Econometrics Fall 2010 Please answer the question I ask - no more and no less - and remember that the correct answer is often short and simple. 1 Some short questions a) For each

More information

The Random Variable for Probabilities Chris Piech CS109, Stanford University

The Random Variable for Probabilities Chris Piech CS109, Stanford University The Random Variable for Probabilities Chris Piech CS109, Stanford University Assignment Grades 10 20 30 40 50 60 70 80 90 100 10 20 30 40 50 60 70 80 90 100 Frequency Frequency 10 20 30 40 50 60 70 80

More information

Peter Fader Professor of Marketing, The Wharton School Co-Director, Wharton Customer Analytics Initiative

Peter Fader Professor of Marketing, The Wharton School Co-Director, Wharton Customer Analytics Initiative DATA-DRIVEN DONOR MANAGEMENT Peter Fader Professor of Marketing, The Wharton School Co-Director, Wharton Customer Analytics Initiative David Schweidel Assistant Professor of Marketing, University of Wisconsin-

More information

Probability Midterm Exam 2:15-3:30 pm Thursday, 21 October 1999

Probability Midterm Exam 2:15-3:30 pm Thursday, 21 October 1999 Name: 2:15-3:30 pm Thursday, 21 October 1999 You may use a calculator and your own notes but may not consult your books or neighbors. Please show your work for partial credit, and circle your answers.

More information