A Count Data Frontier Model
This is an incomplete draft. Cite only as a working paper.

Richard A. Hofler (rhofler@bus.ucf.edu)
David Scrogin
Both of the Department of Economics, University of Central Florida, Orlando, FL

There are many cases in which a count variable is being either maximized or minimized. We propose one method for estimating the extent of inefficiency of a maximizing process that produces a nonnegative integer variable. It is based on the beta-binomial/negative binomial distribution model of Schmittlein et al. (1985). We show how this model can estimate the unobserved frontier (maximum potential) number of items for each observed count value and, consequently, estimate the extent of inefficiency for each observed count value. This model's estimation by ML is illustrated on a sample of data in which individuals are attempting to maximize the number of items assembled.
I. Introduction

There are many cases in which a discrete variable is being either maximized or minimized. Examples of the former case include the number of new patents by firms, the number of wins by a sports team, a person's years of education, and the number of weeks worked by an individual. Conversely, it is reasonable to believe that minimization behavior occurs in situations involving the number of accidents along a certain stretch of roadway, the number of patient accidents in a medical care facility, the number of failures in a new product, the number of incorrect results (false positives and false negatives) from a medical test, the number of errors in the air space around an airport (i.e., letting airplanes get too close to each other), and the number of times a person is arrested during his or her lifetime.

Given the plethora of such optimizing situations involving nonnegative discrete variables, it is natural to ask how close decision-makers come to the very best performance that they can attain. Since 1977 (Aigner, Lovell, and Schmidt; Meeusen and van den Broeck) a large and continually expanding literature on stochastic frontier analysis has investigated the extents and causes of inefficient behaviors and developed many models for such investigations.[1]

[1] Data Envelopment Analysis (DEA) is an alternative method for exploring the extents and causes of inefficient behaviors. It is a mathematical programming approach, whereas stochastic frontiers are econometric in nature.

One feature common to all of the publications in this literature is that they address continuous variables that are being maximized or minimized. No one (to our knowledge) has
proposed a model for count data frontier analysis when the count variable is being maximized. (Fe-Rodriguez (2007) has proposed a frontier model for minimizing a count.) It is into this unexplored territory that this paper ventures. Specifically, we propose one method for estimating the extent of inefficiency of a maximizing process that produces a nonnegative integer variable. In other words, we propose a count data frontier model.

II. The Count Data Frontier Model

To make the rest of this paper easier to follow, we first explain our context: producing a count variable. By this we mean that an entity (individual or firm) is engaged in a process (a maximizing one, in this model) whose outcome is a count variable (e.g., number of patents, number of wins, number of weeks worked during a period). In this process, there is a latent maximum possible (frontier) number of items (patents, wins, etc.) that can be produced. However, due to inefficiency, some percentage of that unobserved frontier outcome is not produced. The items that are produced are observed. The shortfall between the frontier output and the observed output is the extent of inefficiency.

For instance, assume that a firm is attempting to generate as many patents as possible each year. Further, imagine that the maximum possible number of patents that it can produce in a year is 17. Suppose that the firm generates only 9 patents in that year. The frontier outcome is then 17 patents, the observed (produced) output is 9 patents, and the extent of inefficiency is 8 unobserved (not produced) patents.

The four foundations (below) of this count data frontier model follow those listed in Schmittlein et al. (1985). That paper is one in a relatively small literature on underreporting (or undercounting) of discrete variables. We revise their context from imperfectly recording purchases into inefficiently maximizing a count variable. Thus, when this literature refers to
recorded purchases, we translate that concept into observed or produced output. Whereas this literature talks about the actual number of purchases (the sum of those that are recorded and those that are not), we discuss the frontier output or outcome.

i. The unobserved (maximum potential = frontier) count for an entity during a specified time period is Poisson distributed with mean λ.

\[ P(N = n \mid \lambda) = \frac{\lambda^{n} e^{-\lambda}}{n!}, \qquad n = 0, 1, 2, \ldots;\ \lambda > 0 \tag{1} \]

ii. The distribution of λ is a two-parameter gamma with pdf

\[ f(\lambda) = \frac{\alpha^{r} \lambda^{r-1} e^{-\alpha\lambda}}{\Gamma(r)}, \qquad \lambda > 0;\ \alpha, r > 0 \tag{2} \]

iii. With probability p, a count item is observed. That is, a specific count item (the first item, the second item, etc.) may be either produced (so it is observed) or not produced (it is not observed). Thus, the number of observed counts (x) is distributed binomial with pmf

\[ P(X = x \mid n, p) = \binom{n}{x} p^{x} (1-p)^{n-x}, \qquad x = 0, 1, 2, \ldots, n;\ 0 < p < 1 \tag{3} \]

iv. The distribution of p is beta with pdf

\[ g(p) = \frac{1}{B(a,b)}\, p^{a-1} (1-p)^{b-1}, \qquad 0 < p < 1;\ a, b > 0 \tag{4} \]

Points iii. and iv. reflect the fact that this model inherently takes the view that production has a binary dimension. First of all, there exists for each entity a frontier number of (count) items that can be produced. Starting from the first of those items, each one can be either produced (observed) or not. As a result, number iii. is a plausible way to model the binary situation of either producing an item (e.g., a patent, a win, etc.) or not. Number iv. captures the heterogeneity in production success (efficiency) across entities.
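The four assumptions describe a data-generating process that can be simulated directly. The following is a minimal sketch using only the Python standard library; the function names and the parameter values (r = 2, α = 0.25, a = 3, b = 1) are illustrative assumptions, not estimates or code from the paper.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw a Poisson(lam) variate by multiplying uniforms (Knuth's method).

    Adequate for the moderate means used here; exp(-lam) underflows for very large lam.
    """
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_entity(r, alpha, a, b, rng):
    """Return (frontier count n, observed count x) for one entity."""
    lam = rng.gammavariate(r, 1.0 / alpha)        # (ii) gamma with shape r and rate alpha
    n = poisson_draw(lam, rng)                    # (i) latent frontier count
    p = rng.betavariate(a, b)                     # (iv) entity-specific success probability
    x = sum(rng.random() < p for _ in range(n))   # (iii) each frontier item is produced with prob. p
    return n, x


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [simulate_entity(2.0, 0.25, 3.0, 1.0, rng) for _ in range(20000)]
    mean_n = sum(n for n, _ in draws) / len(draws)
    mean_x = sum(x for _, x in draws) / len(draws)
    print("mean frontier count:", round(mean_n, 2))
    print("mean observed count:", round(mean_x, 2))
```

By construction every simulated observed count is at most its frontier count, which is the property a maximizing frontier model needs.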
As is standard, combining assumptions (i) and (ii) gives us a negative binomial distribution (NBD).

\[ P(N = n) = \frac{\Gamma(r+n)}{\Gamma(r)\, n!} \left(\frac{\alpha}{\alpha+1}\right)^{r} \left(\frac{1}{\alpha+1}\right)^{n}, \qquad n = 0, 1, 2, \ldots;\ r, \alpha > 0 \tag{5} \]

For future reference, recall the usual result for the NBD that

\[ E[N] = \frac{r}{\alpha} \tag{6} \]

In this case, this NBD describes the distribution of frontier counts, which are the sum of observed and unobserved counts. The unobserved counts are the result of a process that is attempting to maximize a count variable but falls short by the amount of the unobserved count value. In other words, that unobserved count value reflects the inefficiency of the production process.

Similarly, assumptions (iii) and (iv) give us a beta-binomial (BB) model for the distribution of the observed counts given the frontier counts.

\[ P(X = x \mid n) = \binom{n}{x} \frac{B(a + x,\, b + n - x)}{B(a, b)}, \qquad x = 0, 1, 2, \ldots, n;\ n = 0, 1, 2, \ldots \tag{7} \]

Finally, Schmittlein et al. (1985) derive the marginal distribution of observed counts

\[ P(X = x) = \frac{\Gamma(r+x)}{\Gamma(r)\, x!} \left(\frac{\alpha}{\alpha+1}\right)^{r} \left(\frac{1}{\alpha+1}\right)^{x} \frac{\Gamma(a+x)\,\Gamma(a+b)}{\Gamma(a)\,\Gamma(a+b+x)}\; {}_2F_1\!\left(r+x,\, b;\, a+b+x;\, \frac{1}{\alpha+1}\right), \qquad x = 0, 1, 2, \ldots;\ a, b, r, \alpha > 0 \tag{8} \]

where \({}_2F_1(\cdot)\) is the Gauss hypergeometric function.
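Equations (5), (7) and (8) can be checked against one another numerically, since (8) should equal the sum over n of (5) times (7). The sketch below uses only the standard library and evaluates the Gauss hypergeometric function by its power series, which converges here because the argument 1/(α+1) lies strictly between 0 and 1. All function names and parameter values are illustrative assumptions.

```python
import math


def hyp2f1(a, b, c, z, tol=1e-13, max_terms=100000):
    """Gauss hypergeometric function via its power series (requires |z| < 1)."""
    term = total = 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def log_beta(u, v):
    """Logarithm of the beta function B(u, v)."""
    return math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v)


def nbd_pmf(n, r, alpha):
    """Equation (5): NBD pmf of the frontier count N."""
    return math.exp(math.lgamma(r + n) - math.lgamma(r) - math.lgamma(n + 1)
                    + r * math.log(alpha / (alpha + 1)) - n * math.log(alpha + 1))


def bb_pmf(x, n, a, b):
    """Equation (7): beta-binomial pmf of the observed count X given N = n."""
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return math.exp(log_choose + log_beta(a + x, b + n - x) - log_beta(a, b))


def bbnbd_pmf(x, r, alpha, a, b):
    """Equation (8): marginal BB/NBD pmf of the observed count X."""
    log_nbd_part = (math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
                    + r * math.log(alpha / (alpha + 1)) - x * math.log(alpha + 1))
    log_gamma_ratio = (math.lgamma(a + x) + math.lgamma(a + b)
                       - math.lgamma(a) - math.lgamma(a + b + x))
    return (math.exp(log_nbd_part + log_gamma_ratio)
            * hyp2f1(r + x, b, a + b + x, 1.0 / (alpha + 1)))


if __name__ == "__main__":
    r, alpha, a, b = 2.0, 0.25, 3.0, 1.0   # hypothetical values for illustration
    for x in (0, 1, 5):
        mixed = sum(nbd_pmf(n, r, alpha) * bb_pmf(x, n, a, b) for n in range(x, x + 600))
        print(x, bbnbd_pmf(x, r, alpha, a, b), mixed)
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma functions for large counts.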
Schmittlein et al. (1985) recognize that this is a beta-binomial/negative binomial (BB/NBD) distribution. The mean of this distribution is

\[ E[X] = \frac{r a}{\alpha (a + b)} \tag{9} \]

Finally, it can be shown (Fader and Hardie, 2000) that the distribution of unobserved (frontier) counts, conditional on the observed counts, is given by

\[ P(N = n \mid X = x) = \frac{\Gamma(r+n)}{\Gamma(r+x)\,(n-x)!} \left(\frac{1}{\alpha+1}\right)^{n-x} \frac{\Gamma(a+b+x)\,\Gamma(b+n-x)}{\Gamma(a+b+n)\,\Gamma(b)}\; \frac{1}{{}_2F_1\!\left(r+x,\, b;\, a+b+x;\, \frac{1}{\alpha+1}\right)}, \qquad n = 0, 1, 2, \ldots;\ x = 0, 1, 2, \ldots, n;\ a, b, r, \alpha > 0 \tag{10} \]

The expected value of frontier counts, conditional on the observed counts, is given by

\[ E[N \mid X = x] = x + \frac{r+x}{\alpha+1}\, \frac{B(a+x,\, b+1)}{B(a+x,\, b)}\; \frac{{}_2F_1\!\left(r+x+1,\, b+1;\, a+b+x+1;\, \frac{1}{\alpha+1}\right)}{{}_2F_1\!\left(r+x,\, b;\, a+b+x;\, \frac{1}{\alpha+1}\right)} \tag{11} \]

A model of maximizing behavior should possess the characteristic that the observed outcomes can never be greater than the frontier outcomes. Recall from (6) and (9) that \(E[N] = r/\alpha\) and \(E[X] = ra/(\alpha(a+b))\). Thus,

\[ E[X] = \frac{a}{a+b}\, E[N]. \]

Since a > 0 and b > 0, the factor a/(a+b) is strictly less than one, so on average the observed outcomes are always less than the frontier outcomes. Furthermore, it can easily be shown by repeatedly evaluating (10) for values of x > n that
P(N = n | X = x) = 0 for all values of x > n. Based on these two pieces of evidence, it appears that this model possesses the required characteristic that the observed outcomes can never be greater than the frontier outcomes.

III. Estimating the Count Data Frontier Model

The parameters of this model can be estimated by maximum likelihood. Let us assume that we have data on the counts x_i, i = 1, 2, ..., I, where x_i is the number of observed counts for entity i. Assuming that the observations are independent, the likelihood is the product of the probabilities P(X = x_i) over all observations. Grouping the observations by count value, the log-likelihood is given by

\[ \ln L(a, b, r, \alpha \mid X) = \sum_{x=0}^{x^{*}} f_{x} \ln P(X = x \mid a, b, r, \alpha) \tag{12} \]

where f_x is the number of entities with observed count x and x* = max{x_1, x_2, ..., x_I}. This is the same as summing ln P(X = x_i | a, b, r, α) over i = 1, ..., I. See Fader and Hardie (2000) for more.

IV. Empirical Illustration

An electronics firm in the South asks job applicants for its assembly operation to take a test as part of the application process. Applicants are given written instructions about how to assemble a certain item and are then taken into a test room where they are faced with a large number of those items, unassembled. They are told that they have a specified amount of time to assemble as many items as they can. Their performance will be assessed in two ways: (i) how many items they assemble and (ii) how well they complete each assembly. They are told that the more items they correctly assemble, the better their chance of getting a job offer. This phase of the application process is designed to test cognitive ability, dexterity, and the applicant's ability to handle pressure.
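Returning to the log-likelihood in (12), it can be sketched in a few lines once the marginal pmf (8) is available. The code below is an illustration under stated assumptions: the function names, the toy counts and the parameter values are hypothetical, and the sketch only evaluates the log-likelihood rather than maximizing it. Any general-purpose numerical optimizer could be applied to its negative, for instance over the logarithms of the four parameters so that they stay positive.

```python
import math
from collections import Counter


def hyp2f1(a, b, c, z, tol=1e-13, max_terms=100000):
    """Gauss hypergeometric function via its power series (requires |z| < 1)."""
    term = total = 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def log_bbnbd_pmf(x, r, alpha, a, b):
    """Natural log of the BB/NBD pmf in equation (8)."""
    log_nbd_part = (math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
                    + r * math.log(alpha / (alpha + 1)) - x * math.log(alpha + 1))
    log_gamma_ratio = (math.lgamma(a + x) + math.lgamma(a + b)
                       - math.lgamma(a) - math.lgamma(a + b + x))
    log_hyp = math.log(hyp2f1(r + x, b, a + b + x, 1.0 / (alpha + 1)))
    return log_nbd_part + log_gamma_ratio + log_hyp


def log_likelihood(params, counts):
    """Equation (12): sum over distinct count values x of f_x * ln P(X = x)."""
    r, alpha, a, b = params
    if min(params) <= 0:
        return float("-inf")          # parameters must be strictly positive
    freq = Counter(counts)            # f_x: number of entities with observed count x
    return sum(f * log_bbnbd_pmf(x, r, alpha, a, b) for x, f in freq.items())


if __name__ == "__main__":
    toy_counts = [0, 0, 1, 3, 5, 5, 5, 6, 7, 9, 17]   # made-up data, not the paper's sample
    print(log_likelihood((2.0, 0.25, 3.0, 1.0), toy_counts))
```

Count values that never occur have f_x = 0 and contribute nothing, so iterating over the observed values only is equivalent to the sum from 0 to x*.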
We have the counts of how many items were assembled by each of 80 randomly selected applicants. The sample mean is 6.63, the sample mode is five, and the sample variance is [missing]. The number of assembled items ranges from zero (two occurrences) up to 17 (one person). Table 1 contains the estimation results.

Table 1. MLE Results for Item Assembly Count Data

Parameter        Estimate    Standard Error    Significance
a                [missing]   [missing]         <.01
b                [missing]   [missing]         <.01
r                [missing]   [missing]         <.01
α                [missing]   [missing]         <.01
log likelihood   [missing]

One immediate use to which these estimates can be put is to calculate several sample-wide mean values. These are the mean frontier count, the mean shortfall of observed counts below the estimated frontier count, and the mean percentage inefficiency. First of all, the mean frontier count is obtained by evaluating (6) using the estimates. This gives a value of 8.49 items that could have been assembled by each applicant, on average. The actual mean number of items assembled is 6.63, yielding an average shortfall of assembled items equal to 1.86. Finally, the mean applicant was 21.92% inefficient, which corresponds to the shortfall of 1.86 divided by the mean frontier count of 8.49. In other words, that applicant could have assembled 1.86 more items than were actually assembled, a shortfall of nearly 22% of the frontier count.

These means, while informative, likely obscure deeper insights that can be gained by examining the numbers of frontier counts (N) for different observed count (X) values. Table 2
shows the values of X from 0 to 17 (the largest observed sample value), the value of E[N | X = x] corresponding to each observed value, and the percentage inefficiency for each X value. One feature is immediately apparent when looking at this table: inefficiency varies greatly across the number of items actually assembled.[2] The largest percentage inefficiency (other than the obvious 100% when no items are assembled) is 77.4% for those who assembled only one item. The smallest shortfall is 1.1 items at the upper end of the distribution. This is a 6.3% inefficiency rate. These applicants assembled the most items, yet still could have done better. Perhaps not surprisingly, the inefficiency rate declines monotonically as the actual number of items assembled rises.

Table 2. Values of X, E[N | X = x] and Percentage Inefficiency for Each X Value

x    E[N | X = x]    % Inefficiency
[table values missing]

[2] % inefficiency is calculated from E[N | X = x] values to four decimals (not rounded).
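The two computed columns of Table 2 follow from (11): the conditional frontier mean E[N | X = x], and the percentage inefficiency, taken here to be (E[N | X = x] − x) / E[N | X = x], which is consistent with the 100% figure for x = 0. The sketch below shows one way to compute them with the standard library. The function names and parameter values are illustrative assumptions, so the printed numbers are not those of Table 2.

```python
import math


def hyp2f1(a, b, c, z, tol=1e-13, max_terms=100000):
    """Gauss hypergeometric function via its power series (requires |z| < 1)."""
    term = total = 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def log_beta(u, v):
    """Logarithm of the beta function B(u, v)."""
    return math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v)


def frontier_mean(x, r, alpha, a, b):
    """Equation (11): expected frontier count given an observed count x."""
    z = 1.0 / (alpha + 1)
    beta_ratio = math.exp(log_beta(a + x, b + 1) - log_beta(a + x, b))
    hyp_ratio = (hyp2f1(r + x + 1, b + 1, a + b + x + 1, z)
                 / hyp2f1(r + x, b, a + b + x, z))
    return x + (r + x) / (alpha + 1) * beta_ratio * hyp_ratio


def pct_inefficiency(x, r, alpha, a, b):
    """Shortfall below the conditional frontier mean, as a percentage of that mean."""
    expected_frontier = frontier_mean(x, r, alpha, a, b)
    return 100.0 * (expected_frontier - x) / expected_frontier


if __name__ == "__main__":
    r, alpha, a, b = 2.0, 0.25, 3.0, 1.0   # hypothetical values, not the paper's estimates
    print(" x   E[N|X=x]   % inefficiency")
    for x in range(0, 18):
        print(f"{x:2d}   {frontier_mean(x, r, alpha, a, b):8.4f}   "
              f"{pct_inefficiency(x, r, alpha, a, b):8.2f}")
```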
Still more can be learned by digging deeper into these results. The values for the frontier counts, given the actual number of items assembled (E[N | X = x]), are, after all, the means of a distribution of the potential number of items that each individual could have assembled. Figure 1, showing the conditional distributions for different observed item counts, contains two examples of additional information that can be gleaned from these data. We chose to display the distributions for an observed count of zero items assembled and for five items (the modal number of items assembled).

[Figure 1. Conditional Distributions for Two Observed Counts. Panels: P[N = n | X = 0] and P[N = n | X = 5].]
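Conditional distributions of the kind plotted in Figure 1 can be computed from (10). The sketch below is a standard-library illustration with hypothetical names and parameter values, so its output does not reproduce the figure. It returns zero whenever n < x, and the probability at n = x (the share performing at the frontier) reduces to the reciprocal of the hypergeometric term.

```python
import math


def hyp2f1(a, b, c, z, tol=1e-13, max_terms=100000):
    """Gauss hypergeometric function via its power series (requires |z| < 1)."""
    term = total = 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def cond_pmf(n, x, r, alpha, a, b):
    """Equation (10): P(N = n | X = x); zero when the frontier would fall below x."""
    if n < x:
        return 0.0
    log_num = (math.lgamma(r + n) - math.lgamma(r + x) - math.lgamma(n - x + 1)
               - (n - x) * math.log(alpha + 1)
               + math.lgamma(a + b + x) + math.lgamma(b + n - x)
               - math.lgamma(a + b + n) - math.lgamma(b))
    return math.exp(log_num) / hyp2f1(r + x, b, a + b + x, 1.0 / (alpha + 1))


if __name__ == "__main__":
    r, alpha, a, b = 2.0, 0.25, 3.0, 1.0   # hypothetical values, not the paper's estimates
    for x in (0, 5):
        at_frontier = cond_pmf(x, x, r, alpha, a, b)
        four_or_more_short = sum(cond_pmf(n, x, r, alpha, a, b) for n in range(x + 4, x + 800))
        print(f"x={x}: P(N=x|X=x)={at_frontier:.3f}, P(N>=x+4|X=x)={four_or_more_short:.3f}")
```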
The top panel of Figure 1 shows the conditional distribution of potential completed items for those who assembled zero items. This distribution reveals that only a little under 12% of those who failed to complete any items were performing at their capability. That is, this conditional distribution has only approximately 12% of its probability at zero, the number actually assembled by these applicants. The remaining 88% of those who did not assemble even one item could have assembled at least one. In fact, more than one-quarter of them could have assembled two or three items. Furthermore, approximately 20% of those who assembled no items could have completed seven or more. The information contained in this one conditional distribution shows the extent of the underachievement (inefficiency) exhibited by many of those in this group.

Similarly, the conditional distribution for 5 items assembled shows a range of potential performances. About one-third of the applicants who completed five items performed up to their potential. Obviously, then, two-thirds did not. Fully 20% of those who completed this modal number of items (5) could potentially have assembled 9 or more.

V. Conclusion

This paper proposes one method for estimating the extent of inefficiency for cases in which a count variable is being maximized. We show how this model can estimate a number of values relating to inefficiency in producing counts. First, the researcher can calculate the sample-wide mean extent of inefficiency and the mean shortfall of actual counts below frontier (maximum potential) counts. Second, one can determine the extent of inefficiency for every observed value of the count variable being maximized. Beyond that, one can derive and examine the distribution of the number of frontier counts for each value that was actually
produced. Thus, this model provides a rich and informative set of results about the frontier number of items that can be produced and various aspects of inefficiency.

This model omits covariates in favor of representing heterogeneity in production and efficiency through the assumption of specific distributions for frontier counts, observed/produced counts and the probability of producing a count item. However, it seems that covariates could be introduced in a straightforward manner if they would strengthen this model and add to its ability to inform researchers about efficiency in count variable maximizing processes.
References

Fader, P.S. and B.G.S. Hardie. 2000. A note on modeling underreported Poisson counts. Journal of Applied Statistics, 27(8).

Fe-Rodriguez, E. 2007. Exploring a stochastic frontier model when the dependent variable is a count. The School of Economics Discussion Paper Series, The University of Manchester.

Schmittlein, D.C., A.C. Bemmaor, and D.G. Morrison. 1985. Why does the NBD model work? Robustness in representing product purchases, brand purchases and imperfectly recorded purchases. Marketing Science, 4(3).
More informationCDA6530: Performance Models of Computers and Networks. Chapter 2: Review of Practical Random Variables
CDA6530: Performance Models of Computers and Networks Chapter 2: Review of Practical Random Variables Two Classes of R.V. Discrete R.V. Bernoulli Binomial Geometric Poisson Continuous R.V. Uniform Exponential,
More informationImplementing the Pareto/NBD Model Given Interval-Censored Data
Implementing the Pareto/NBD Model Given Interval-Censored Data Peter S. Fader www.petefader.com Bruce G. S. Hardie www.brucehardie.com November 2005 Revised August 2010 1 Introduction The Pareto/NBD model
More informationChapters 3.2 Discrete distributions
Chapters 3.2 Discrete distributions In this section we study several discrete distributions and their properties. Here are a few, classified by their support S X. There are of course many, many more. For
More informationby Dimitri P. Bertsekas and John N. Tsitsiklis
INTRODUCTION TO PROBABILITY by Dimitri P. Bertsekas and John N. Tsitsiklis CHAPTER 2: ADDITIONAL PROBLEMS SECTION 2.2. Probability Mass Functions Problem 1. The probability of a royal flush in poker is
More informationDefinition: A random variable X is a real valued function that maps a sample space S into the space of real numbers R. X : S R
Random Variables Definition: A random variable X is a real valued function that maps a sample space S into the space of real numbers R. X : S R As such, a random variable summarizes the outcome of an experiment
More informationMathematical statistics
October 18 th, 2018 Lecture 16: Midterm review Countdown to mid-term exam: 7 days Week 1 Chapter 1: Probability review Week 2 Week 4 Week 7 Chapter 6: Statistics Chapter 7: Point Estimation Chapter 8:
More informationSTA 256: Statistics and Probability I
Al Nosedal. University of Toronto. Fall 2017 My momma always said: Life was like a box of chocolates. You never know what you re gonna get. Forrest Gump. There are situations where one might be interested
More informationStatistical Methods in Particle Physics
Statistical Methods in Particle Physics Lecture 3 October 29, 2012 Silvia Masciocchi, GSI Darmstadt s.masciocchi@gsi.de Winter Semester 2012 / 13 Outline Reminder: Probability density function Cumulative
More informationLecture 6: Gaussian Mixture Models (GMM)
Helsinki Institute for Information Technology Lecture 6: Gaussian Mixture Models (GMM) Pedram Daee 3.11.2015 Outline Gaussian Mixture Models (GMM) Models Model families and parameters Parameter learning
More informationRandom Variables. Definition: A random variable (r.v.) X on the probability space (Ω, F, P) is a mapping
Random Variables Example: We roll a fair die 6 times. Suppose we are interested in the number of 5 s in the 6 rolls. Let X = number of 5 s. Then X could be 0, 1, 2, 3, 4, 5, 6. X = 0 corresponds to the
More informationReading Material for Students
Reading Material for Students Arnab Adhikari Indian Institute of Management Calcutta, Joka, Kolkata 714, India, arnaba1@email.iimcal.ac.in Indranil Biswas Indian Institute of Management Lucknow, Prabandh
More informationStatistics 3858 : Maximum Likelihood Estimators
Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationIEOR 3106: Introduction to Operations Research: Stochastic Models. Professor Whitt. SOLUTIONS to Homework Assignment 2
IEOR 316: Introduction to Operations Research: Stochastic Models Professor Whitt SOLUTIONS to Homework Assignment 2 More Probability Review: In the Ross textbook, Introduction to Probability Models, read
More informationIntroduction to Statistical Data Analysis Lecture 3: Probability Distributions
Introduction to Statistical Data Analysis Lecture 3: Probability Distributions James V. Lambers Department of Mathematics The University of Southern Mississippi James V. Lambers Statistical Data Analysis
More informationIntroduction to Statistical Data Analysis Lecture 4: Sampling
Introduction to Statistical Data Analysis Lecture 4: Sampling James V. Lambers Department of Mathematics The University of Southern Mississippi James V. Lambers Statistical Data Analysis 1 / 30 Introduction
More informationLogistic Regression: Regression with a Binary Dependent Variable
Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression
More informationEstimation of Quantiles
9 Estimation of Quantiles The notion of quantiles was introduced in Section 3.2: recall that a quantile x α for an r.v. X is a constant such that P(X x α )=1 α. (9.1) In this chapter we examine quantiles
More informationProbability - Lecture 4
1 Introduction Probability - Lecture 4 Many methods of computation physics and the comparison of data to a mathematical representation, apply stochastic methods. These ideas were first introduced in the
More informationMTH4451Test#2-Solutions Spring 2009
Pat Rossi Instructions. MTH4451Test#2-Solutions Spring 2009 Name Show CLEARLY how you arrive at your answers. 1. A large jar contains US coins. In this jar, there are 350 pennies ($0.01), 300 nickels ($0.05),
More informationProf. Thistleton MAT 505 Introduction to Probability Lecture 13
Prof. Thistleton MAT 55 Introduction to Probability Lecture 3 Sections from Text and MIT Video Lecture: Sections 5.4, 5.6 http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-4- probabilisticsystems-analysis-and-applied-probability-fall-2/video-lectures/lecture-8-continuousrandomvariables/
More informationMath 151. Rumbos Spring Solutions to Review Problems for Exam 3
Math 151. Rumbos Spring 2014 1 Solutions to Review Problems for Exam 3 1. Suppose that a book with n pages contains on average λ misprints per page. What is the probability that there will be at least
More informationChapter 3 Single Random Variables and Probability Distributions (Part 1)
Chapter 3 Single Random Variables and Probability Distributions (Part 1) Contents What is a Random Variable? Probability Distribution Functions Cumulative Distribution Function Probability Density Function
More informationCommon Discrete Distributions
Common Discrete Distributions Statistics 104 Autumn 2004 Taken from Statistics 110 Lecture Notes Copyright c 2004 by Mark E. Irwin Common Discrete Distributions There are a wide range of popular discrete
More informationBusiness Statistics. Chapter 6 Review of Normal Probability Distribution QMIS 220. Dr. Mohammad Zainal
Department of Quantitative Methods & Information Systems Business Statistics Chapter 6 Review of Normal Probability Distribution QMIS 220 Dr. Mohammad Zainal Chapter Goals After completing this chapter,
More informationStatistics for Managers Using Microsoft Excel/SPSS Chapter 4 Basic Probability And Discrete Probability Distributions
Statistics for Managers Using Microsoft Excel/SPSS Chapter 4 Basic Probability And Discrete Probability Distributions 1999 Prentice-Hall, Inc. Chap. 4-1 Chapter Topics Basic Probability Concepts: Sample
More informationLecture-19: Modeling Count Data II
Lecture-19: Modeling Count Data II 1 In Today s Class Recap of Count data models Truncated count data models Zero-inflated models Panel count data models R-implementation 2 Count Data In many a phenomena
More information39.3. Sums and Differences of Random Variables. Introduction. Prerequisites. Learning Outcomes
Sums and Differences of Random Variables 39.3 Introduction In some situations, it is possible to easily describe a problem in terms of sums and differences of random variables. Consider a typical situation
More informationSTAT 135 Lab 3 Asymptotic MLE and the Method of Moments
STAT 135 Lab 3 Asymptotic MLE and the Method of Moments Rebecca Barter February 9, 2015 Maximum likelihood estimation (a reminder) Maximum likelihood estimation Suppose that we have a sample, X 1, X 2,...,
More informationIntroduction to Probability Theory for Graduate Economics Fall 2008
Introduction to Probability Theory for Graduate Economics Fall 008 Yiğit Sağlam October 10, 008 CHAPTER - RANDOM VARIABLES AND EXPECTATION 1 1 Random Variables A random variable (RV) is a real-valued function
More informationProbability Distributions for Continuous Variables. Probability Distributions for Continuous Variables
Probability Distributions for Continuous Variables Probability Distributions for Continuous Variables Let X = lake depth at a randomly chosen point on lake surface If we draw the histogram so that the
More informationSTA 247 Solutions to Assignment #1
STA 247 Solutions to Assignment #1 Question 1: Suppose you throw three six-sided dice (coloured red, green, and blue) repeatedly, until the three dice all show different numbers. Assuming that these dice
More informationBe sure that your work gives a clear indication of reasoning. Use notation and terminology correctly.
MATH 232 Fall 2009 Test 1 Name: Instructions. Be sure that your work gives a clear indication of reasoning. Use notation and terminology correctly. No mystry numbers: If you use sage, Mathematica, or your
More informationFinal Exam. Economics 835: Econometrics. Fall 2010
Final Exam Economics 835: Econometrics Fall 2010 Please answer the question I ask - no more and no less - and remember that the correct answer is often short and simple. 1 Some short questions a) For each
More informationThe Random Variable for Probabilities Chris Piech CS109, Stanford University
The Random Variable for Probabilities Chris Piech CS109, Stanford University Assignment Grades 10 20 30 40 50 60 70 80 90 100 10 20 30 40 50 60 70 80 90 100 Frequency Frequency 10 20 30 40 50 60 70 80
More informationPeter Fader Professor of Marketing, The Wharton School Co-Director, Wharton Customer Analytics Initiative
DATA-DRIVEN DONOR MANAGEMENT Peter Fader Professor of Marketing, The Wharton School Co-Director, Wharton Customer Analytics Initiative David Schweidel Assistant Professor of Marketing, University of Wisconsin-
More informationProbability Midterm Exam 2:15-3:30 pm Thursday, 21 October 1999
Name: 2:15-3:30 pm Thursday, 21 October 1999 You may use a calculator and your own notes but may not consult your books or neighbors. Please show your work for partial credit, and circle your answers.
More information