Simulating Realistic Ecological Count Data
Lisa Madsen and Dave Birkes, Oregon State University
Statistics Department Seminar, May 2, 2011
Outline
1. Motivation; Example: Weed Counts
2. Pearson Correlation; Spearman Correlation; Limits to Dependence
3.
4. Simulating Outliers; Simulation Results
Why Simulate Data?
Simulation studies are useful for:
- Assessing the performance of analytical procedures
- Power analysis or sample size determination
- Finding a good sampling design
Weed Counts vs. Soil Magnesium (Heijting et al., 2007)
[Figure: weed counts against soil magnesium (mg/kg), outliers marked]
Maps of Weed Counts and Magnesium
[Figure: maps of weed counts (outliers and zeros marked) and soil magnesium over the field (x, y)]
Pearson Correlation
The usual measure of dependence between X and Y is the Pearson product-moment correlation coefficient:

    \rho(X, Y) = \frac{E(XY) - E(X)E(Y)}{[\operatorname{var}(X)\,\operatorname{var}(Y)]^{1/2}}.

Estimate \rho(X, Y) from a sample (X_1, Y_1), \ldots, (X_n, Y_n) as

    \hat\rho(X, Y) = \frac{\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)}{[\sum_{i=1}^n (X_i - \bar X)^2 \sum_{i=1}^n (Y_i - \bar Y)^2]^{1/2}}.
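As a quick sketch, the estimator \hat\rho can be computed directly from its definition; the helper name pearson_corr is ours, and the data are arbitrary illustrative values:

```python
import math

def pearson_corr(x, y):
    """Sample Pearson product-moment correlation of paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

xs = [float(i) for i in range(10)]
print(pearson_corr(xs, [2 * a + 1 for a in xs]))    # linear in X: correlation 1
print(pearson_corr(xs, [math.exp(a) for a in xs]))  # monotone but nonlinear: below 1
```

The second call previews the point of the next slides: a perfectly monotone but nonlinear transformation still has Pearson correlation strictly below 1.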
Pearson Correlation Measures Linear Dependence
[Figure: X plotted against X; \rho(X, X) = 1]
[Figure: exp(X) plotted against X; \rho(X, e^X) < 1]
Pearson Correlation Measures Linear Dependence
For bivariate normal X and Y, \rho(X, Y) completely characterizes dependence. For non-normal X and Y, other measures of dependence may be more appropriate.
Spearman Correlation
The Spearman correlation coefficient is

    \rho_S(X, Y) = 3\{P[(X - X_1)(Y - Y_1) > 0] - P[(X - X_1)(Y - Y_1) < 0]\},

where X_1 \stackrel{d}{=} X and Y_1 \stackrel{d}{=} Y, with X_1 and Y_1 independent of one another and of (X, Y).
Estimating Spearman Correlation
Given a bivariate sample (X_1, Y_1), \ldots, (X_n, Y_n), calculate the ranks r(X_i) and r(Y_i). Then

    \hat\rho_S(X, Y) = \frac{\sum_{i=1}^n [r(X_i) - (n+1)/2][r(Y_i) - (n+1)/2]}{n(n^2 - 1)/12},

the sample Pearson correlation coefficient of the ranked data.
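For distinct (continuous) values the estimator reduces to ranking and applying the formula above; a small sketch with assumed helper names ranks and spearman_corr, on arbitrary data:

```python
import math

def ranks(x):
    """Ranks 1..n (values assumed distinct, as for a continuous sample)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def spearman_corr(x, y):
    """Sample Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    c = (n + 1) / 2
    num = sum((a - c) * (b - c) for a, b in zip(rx, ry))
    return num / (n * (n * n - 1) / 12)

# A monotone transformation leaves the ranks, hence Spearman correlation, unchanged.
xs = [0.3, -1.2, 2.5, 0.9, -0.4]
print(spearman_corr(xs, [math.exp(a) for a in xs]))  # 1.0
```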
Spearman Correlation Measures Monotone Dependence
\rho_S(X, e^X) = \rho_S(X, X) = 1, provided X is continuous.
Correcting for Ties
When X is discrete, one can construct X and Y so that X = Y almost surely but \rho_S(X, Y) < 1. Rescale \rho_S so that it again ranges between -1 and 1:

    \rho_{RS}(X, Y) = \frac{\rho_S(X, Y)}{\{[1 - \sum_x p(x)^3][1 - \sum_y q(y)^3]\}^{1/2}},

where p(x) = P(X = x) and q(y) = P(Y = y) (Nešlehová, 2007).
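The effect of the rescaling can be checked by brute force for a Bernoulli(0.4) margin with Y = X almost surely (an illustrative margin, not from the weed data): plain \rho_S falls short of 1, and the correction restores it.

```python
# X ~ Bernoulli(0.4), Y = X almost surely.
p = [0.6, 0.4]                          # pmf of X on {0, 1}
F = [0.6, 1.0]                          # P(X <= x)
Fm = [0.0, 0.6]                         # P(X < x)
S = [1 - f for f in F]                  # P(X > x)

# P[(X - X1)(Y - Y1) > 0] and < 0, with X1, Y1 independent copies and Y = X:
# both copies below X, or both above, give a positive product.
pos = sum(p[i] * (Fm[i] ** 2 + S[i] ** 2) for i in range(2))
neg = sum(p[i] * 2 * Fm[i] * S[i] for i in range(2))
rho_s = 3 * (pos - neg)
rho_rs = rho_s / (1 - sum(q ** 3 for q in p))  # equal margins, so the root collapses
print(rho_s, rho_rs)  # approximately 0.72 and 1.0
```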
Ties in Sample Ranks
Two common methods for handling ties in a sample X_1, \ldots, X_n:
- Random ranks: when u tied values would occupy ranks p_1, \ldots, p_u if they were distinct, randomly assign these u ranks to the tied values.
- Midranks: assign each tied value the average rank, \frac{1}{u}\sum_{k=1}^u p_k.
Rescaled Spearman Correlation and Midranks
For a sample (X_1, Y_1), \ldots, (X_n, Y_n), let the distribution of (X, Y) be the empirical distribution function of the sample. Then \rho_{RS}(X, Y) coincides with the sample Pearson correlation coefficient of the midranks (Nešlehová, 2007).
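The midrank assignment can be sketched in a few lines; this is the "average" tie-handling rule found in standard statistics packages:

```python
def midranks(x):
    """Assign each tied value the average of the ranks the ties would occupy."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    r = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and x[order[j + 1]] == x[order[i]]:
            j += 1                      # extend the run of ties
        avg = (i + j) / 2 + 1           # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

print(midranks([3, 1, 2, 2, 3]))  # [4.5, 1.0, 2.5, 2.5, 4.5]
```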
Limits to Dependence
For X and Y with joint CDF H(x, y) and marginal CDFs F(x) and G(y), the Fréchet-Hoeffding bounds are

    \max[F(x) + G(y) - 1, 0] \le H(x, y) \le \min[F(x), G(y)].

These bounds induce margin-dependent bounds on \rho(X, Y) and \rho_S(X, Y). For example, for X ~ Bernoulli(p_X) and Y ~ Bernoulli(p_Y) with p_X \ge p_Y,

    \rho(X, Y) \le \{(1 - p_X) p_Y / [p_X (1 - p_Y)]\}^{1/2}.
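The Bernoulli bound can be verified numerically: the Fréchet upper bound gives the comonotone coupling P(X = 1, Y = 1) = min(p_X, p_Y), whose correlation matches the closed form. The p values below are arbitrary illustrative choices.

```python
import math

p_x, p_y = 0.7, 0.3                   # p_x >= p_y
p11 = min(p_x, p_y)                   # comonotone coupling from the upper bound
cov = p11 - p_x * p_y
rho_max = cov / math.sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))
bound = math.sqrt((1 - p_x) * p_y / (p_x * (1 - p_y)))
print(rho_max, bound)  # both approximately 0.4286
```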
Simulation
Suppose we want to simulate a dependent vector Y = [Y_1, \ldots, Y_N] in which Y_i has marginal CDF F_i.

1. Simulate a multivariate standard normal vector Z. Its variance-covariance matrix \Sigma_Z will determine the dependence among the Y_i. (Aside: to simulate maximally dependent Y_i and Y_j, set the corresponding element of \Sigma_Z equal to 1.)
2. Transform each element of Z to obtain the desired marginals: Y_i = F_i^{-1}\{\Phi(Z_i)\}, where \Phi(\cdot) is the standard normal CDF.
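A minimal sketch of the two-step recipe for a single pair (Y_i, Y_j), using an arbitrary three-point discrete margin and a normal correlation of 0.8 (both assumptions for illustration; the generalized-inverse helper inv_cdf is ours):

```python
import random
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def inv_cdf(u, values, cdf):
    """Generalized inverse F^{-1}(u) = inf{x : F(x) >= u} for a discrete margin."""
    for x, F in zip(values, cdf):
        if F >= u:
            return x
    return values[-1]

random.seed(7)
rho_z = 0.8                                   # the (i, j) element of Sigma_Z
values, cdf = [0, 1, 2], [0.5, 0.8, 1.0]      # assumed toy margin
n = 20000
ys = []
for _ in range(n):
    z1 = random.gauss(0.0, 1.0)               # step 1: correlated normals
    z2 = rho_z * z1 + sqrt(1 - rho_z ** 2) * random.gauss(0.0, 1.0)
    ys.append((inv_cdf(Phi(z1), values, cdf), # step 2: transform the marginals
               inv_cdf(Phi(z2), values, cdf)))

# Each margin matches the target CDF, and the pair is positively dependent.
p0 = sum(1 for a, b in ys if a == 0) / n
print(round(p0, 2))  # close to the target P(Y = 0) = 0.5
```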
Inverse CDF for Discrete Distributions

    F_i^{-1}(u) = \inf\{x : F_i(x) \ge u\}

[Figure: Bernoulli(0.4) CDF F(x) against x]
Weed Data
[Figure: weed counts against soil magnesium (mg/kg), outliers marked]
A Plausible Marginal Model
The negative binomial hurdle model is a Bernoulli mixture of a point mass at 0 and a negative binomial left-truncated at 1:

    P(Y = y) = \pi,  y = 0,

    P(Y = y) = (1 - \pi)\, \frac{\Gamma(\theta + y)}{\Gamma(\theta)\Gamma(y + 1)} \left(\frac{\theta}{\theta + \mu}\right)^{\theta} \left(\frac{\mu}{\theta + \mu}\right)^{y} \Big/ \left[1 - \left(\frac{\theta}{\theta + \mu}\right)^{\theta}\right],  y \ge 1.

Model \pi and the negative binomial mean \mu as functions of the covariate, x = soil magnesium.
Negative Binomial Hurdle CDF
The target CDF for Y_i is then

    F_i(y) = \pi_i + \frac{1 - \pi_i}{1 - g_i(0 \mid \mu_i, \theta)}\,\{G_i(y \mid \mu_i, \theta) - g_i(0 \mid \mu_i, \theta)\}  for y \ge 0,

where G_i(\cdot \mid \mu_i, \theta) and g_i(\cdot \mid \mu_i, \theta) are the negative binomial CDF and PDF, with \log(\mu_i) = \beta_0 + \beta_1 x_i and \operatorname{logit}(\pi_i) = \gamma_0 + \gamma_1 x_i.
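The hurdle pmf and CDF can be sketched directly from these formulas; the parameter values below are illustrative, not the fitted ones:

```python
import math

def nb_pmf(y, mu, theta):
    """Negative binomial pmf parameterized by mean mu and dispersion theta."""
    return math.exp(
        math.lgamma(theta + y) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )

def hurdle_pmf(y, pi, mu, theta):
    """NB hurdle: point mass pi at zero, NB left-truncated at 1 above it."""
    if y == 0:
        return pi
    return (1 - pi) * nb_pmf(y, mu, theta) / (1 - nb_pmf(0, mu, theta))

def hurdle_cdf(y, pi, mu, theta):
    """F(y) built by summing the pmf; matches the closed form above."""
    return sum(hurdle_pmf(k, pi, mu, theta) for k in range(y + 1))

print(hurdle_cdf(0, 0.3, 2.0, 1.5))    # 0.3, the hurdle's zero mass
print(hurdle_cdf(200, 0.3, 2.0, 1.5))  # approaches 1.0
```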
Weed Data With Fitted Means
[Figure: weed data and fitted NB hurdle mean against soil magnesium (mg/kg)]
Comments
- Unlike in data analysis, the goal is a tractable but flexible model.
- The marginal CDFs can be determined by other methods.
- Different marginals need not come from the same family.
The Principle of Spatial Dependence
Dependence between observations is higher when they are close together.
[Figure: dependence decaying with distance]
The Variogram
\operatorname{var}(Y_i - Y_j) is small if Y_i and Y_j are dependent.
[Figure: variogram increasing with distance]
Stationarity
A typical spatial data set represents a single incomplete sample of size N = 1 from a spatial random process. To make inference feasible, we assume stationarity, i.e.

    E(Y_i) = E(Y_j)  and  \operatorname{var}(Y_i - Y_j) = 2\gamma(h_{ij}),

where h_{ij} is the vector between the locations of Y_i and Y_j, and \gamma(\cdot) is called the semivariogram.

Weed counts are not stationary: means differ, and larger means are associated with larger variances. The stationarity assumption is more reasonable for ranks than for counts.
Ranking Spatial Data
The estimator of \rho_S uses a sample (X_1, Y_1), \ldots, (X_n, Y_n), but a spatial sample has no replication.

Kruskal (1958): the population analog of the rank r(Y_i) is F(Y_i). For each Y_i, we can estimate its CDF F_i by plugging in point estimates of the parameters. If Y_i is unusually large (or small) given its estimated distribution, \hat F_i(Y_i) will also be unusually large (or small), but \hat F_1(Y_1), \ldots, \hat F_n(Y_n) will all be on the same scale.
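The common-scale point can be illustrated with plug-in CDF values; Poisson margins stand in for the fitted NB hurdle CDFs here, purely for brevity:

```python
import math

def poisson_cdf(y, lam):
    """P(Y <= y) for a Poisson(lam) count, summed term by term."""
    term = math.exp(-lam)
    total = term
    for k in range(1, y + 1):
        term *= lam / k
        total += term
    return total

# Two sites with very different (estimated) means: the raw counts 7 and 38 are
# not comparable, but both plug-in CDF values land near 1 on the common 0-1
# scale, flagging each count as unusually large for its own site.
f_small = poisson_cdf(7, 2.0)    # count 7 at a site with mean 2
f_large = poisson_cdf(38, 20.0)  # count 38 at a site with mean 20
print(round(f_small, 3), round(f_large, 3))
```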
Estimating Spatial Dependence
Fit a parametric semivariogram model to the ranked spatial counts. For Y_i and Y_j separated by distance h_{ij}, fitting an exponential model to the semivariance \frac{1}{2}\operatorname{var}[\hat F_i(Y_i) - \hat F_j(Y_j)] gives the estimated rank correlation

    \hat\rho_{RS}(Y_i, Y_j) = 0.47\, e^{-h_{ij}/1.36}.

[Figure: empirical and fitted semivariance against distance]
Calculating \Sigma_Z
1. For each pair i, j, obtain

    \hat\rho_S(Y_i, Y_j) = \left\{\left[1 - \sum_{r=0}^{\infty} \hat f_i(r)^3\right]\left[1 - \sum_{s=0}^{\infty} \hat f_j(s)^3\right]\right\}^{1/2} \hat\rho_{RS}(Y_i, Y_j),

where \hat f_i and \hat f_j are the estimated PMFs of Y_i and Y_j.
2. Then numerically solve for \delta = \rho(Z_i, Z_j) in

    \hat\rho_S(Y_i, Y_j) = 3 \sum_{r=0}^{\infty} \sum_{s=0}^{\infty} \hat f_i(r) \hat f_j(s) \big( \Phi_\delta\{\Phi^{-1}[\hat F_i(r-1)], \Phi^{-1}[\hat F_j(s-1)]\} + \Phi_\delta\{\Phi^{-1}[1 - \hat F_i(r)], \Phi^{-1}[1 - \hat F_j(s)]\} - \Phi_\delta\{\Phi^{-1}[\hat F_i(r-1)], \Phi^{-1}[1 - \hat F_j(s)]\} - \Phi_\delta\{\Phi^{-1}[1 - \hat F_i(r)], \Phi^{-1}[\hat F_j(s-1)]\} \big),

where \Phi_\delta is the bivariate standard normal CDF with correlation \delta.
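Solving for \delta is one-dimensional root finding. A toy check uses Bernoulli(0.5) margins, where the thresholds \Phi^{-1}(F(0)) are 0, the matching equation collapses to orthant probabilities, and the closed-form answer is \delta = \sin(\pi/4). The quadrature-based biv_norm_cdf below is a sketch of ours, not the authors' implementation:

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def biv_norm_cdf(a, b, delta, n=2000):
    """P(Z1 <= a, Z2 <= b) for standard bivariate normals with correlation
    delta, by trapezoid quadrature of phi(z) * Phi((b - delta*z)/sqrt(1-delta^2))."""
    lo = -8.0
    h = (a - lo) / n
    s = math.sqrt(1 - delta * delta)
    total = 0.0
    for i in range(n + 1):
        z = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        total += w * phi * Phi((b - delta * z) / s)
    return total * h

# Bisection: find delta with P(Z_i <= 0, Z_j <= 0) = 0.375, i.e. a joint
# success probability of 0.375 for the two Bernoulli(0.5) margins.
target = 0.375
lo_d, hi_d = -0.99, 0.99
for _ in range(60):
    mid = (lo_d + hi_d) / 2
    if biv_norm_cdf(0.0, 0.0, mid) < target:
        lo_d = mid
    else:
        hi_d = mid
delta = (lo_d + hi_d) / 2
print(round(delta, 3))  # approximately 0.707 = sin(pi/4)
```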
Apply
- Retain the locations and covariate values from the data set.
- Simulate a multivariate standard normal vector Z with correlation matrix \Sigma_Z.
- Set Y_i = \hat F_i^{-1}\{\Phi(Z_i)\}.
Two Outlier Processes
[Figure: weed counts against soil magnesium (mg/kg), the two outlier groups marked]
Outliers Localized
[Figure: maps of weed counts (outliers and zeros marked) and magnesium over (x, y)]
Empirical Observations About Outliers
- Outliers occur in the region between y = 17 and y = 33 meters.
- Outliers associated with mg between 250 and 300 are between 12.9 and 14.9 larger than the target means, whereas outliers associated with mg above 330 are between 2.6 and 10.3 larger.
Augmenting the Simulated Data with Outliers
From the 139 simulated weed counts:
- Randomly select 4 to 6 locations with y-coordinates between 17 and 33 and mg between 250 and 300. Set these counts equal to the integer part of the target mean plus a random uniform on (12, 15).
- Randomly select another 4 to 6 points with y-coordinates between 17 and 33 and mg exceeding 330. Set these to the integer part of the target mean plus a random uniform on (2, 11).
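The augmentation recipe can be sketched as follows, with synthetic coordinates and a flat target mean; reading "integer part of target mean plus a uniform" as int(mean + U) is our interpretation:

```python
import random

def add_outliers(counts, ycoord, mg, target_mean, rng):
    """Overwrite a few counts with outliers, following the two-process recipe."""
    out = list(counts)
    band = [i for i in range(len(out)) if 17 <= ycoord[i] <= 33]
    for in_region, lo, hi in [
        (lambda i: 250 <= mg[i] <= 300, 12.0, 15.0),  # larger outliers
        (lambda i: mg[i] > 330, 2.0, 11.0),           # smaller outliers
    ]:
        idx = [i for i in band if in_region(i)]
        k = min(rng.randint(4, 6), len(idx))          # 4 to 6 per process
        for i in rng.sample(idx, k):
            out[i] = int(target_mean[i] + rng.uniform(lo, hi))
    return out

rng = random.Random(2011)
n = 139
ycoord = [rng.uniform(0, 50) for _ in range(n)]   # synthetic field coordinates
mg = [rng.uniform(200, 400) for _ in range(n)]    # synthetic magnesium values
target = [3.0] * n                                # flat target means, illustrative
counts = [0] * n
aug = add_outliers(counts, ycoord, mg, target, rng)
changed = [i for i in range(n) if aug[i] != counts[i]]
print(len(changed))  # typically 8 to 12 points replaced
```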
Simulated Data vs. Observed Data
[Figure: simulated datasets and the observed weed counts against soil magnesium (mg/kg)]
[Figure: weed data with simulated means and target means against soil magnesium (mg/kg)]
[Figure: rank correlation against distance, simulated vs. target]
A Couple of Simulated Maps
[Figure: two simulated weed-count maps over (x, y), zeros marked]
References
S. Heijting, W. Van Der Werf, A. Stein, and M.J. Kropff (2007), Are weed patches stable in location? Application of an explicitly two-dimensional methodology, Weed Research 47(5).
W.H. Kruskal (1958), Ordinal measures of association, Journal of the American Statistical Association 53.
L. Madsen and D. Birkes (2013), Simulating dependent discrete data, Journal of Statistical Computation and Simulation 83(4).
J. Nešlehová (2007), On rank correlation measures for non-continuous random variables, Journal of Multivariate Analysis 98.
More informationt x 1 e t dt, and simplify the answer when possible (for example, when r is a positive even number). In particular, confirm that EX 4 = 3.
Mathematical Statistics: Homewor problems General guideline. While woring outside the classroom, use any help you want, including people, computer algebra systems, Internet, and solution manuals, but mae
More informationProbability and Stochastic Processes
Probability and Stochastic Processes A Friendly Introduction Electrical and Computer Engineers Third Edition Roy D. Yates Rutgers, The State University of New Jersey David J. Goodman New York University
More informationMonte Carlo Studies. The response in a Monte Carlo study is a random variable.
Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationQuick Tour of Basic Probability Theory and Linear Algebra
Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra CS224w: Social and Information Network Analysis Fall 2011 Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra Outline Definitions
More informationDependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.
Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,
More informationHigh-Throughput Sequencing Course
High-Throughput Sequencing Course DESeq Model for RNA-Seq Biostatistics and Bioinformatics Summer 2017 Outline Review: Standard linear regression model (e.g., to model gene expression as function of an
More informationPractice Examination # 3
Practice Examination # 3 Sta 23: Probability December 13, 212 This is a closed-book exam so do not refer to your notes, the text, or any other books (please put them on the floor). You may use a single
More informationCopula modeling for discrete data
Copula modeling for discrete data Christian Genest & Johanna G. Nešlehová in collaboration with Bruno Rémillard McGill University and HEC Montréal ROBUST, September 11, 2016 Main question Suppose (X 1,
More informationENGG2430A-Homework 2
ENGG3A-Homework Due on Feb 9th,. Independence vs correlation a For each of the following cases, compute the marginal pmfs from the joint pmfs. Explain whether the random variables X and Y are independent,
More informationCopulas. MOU Lili. December, 2014
Copulas MOU Lili December, 2014 Outline Preliminary Introduction Formal Definition Copula Functions Estimating the Parameters Example Conclusion and Discussion Preliminary MOU Lili SEKE Team 3/30 Probability
More informationEE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm
EE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm 1. Feedback does not increase the capacity. Consider a channel with feedback. We assume that all the recieved outputs are sent back immediately
More informationFundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner
Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization
More informationNotes for Math 324, Part 19
48 Notes for Math 324, Part 9 Chapter 9 Multivariate distributions, covariance Often, we need to consider several random variables at the same time. We have a sample space S and r.v. s X, Y,..., which
More informationP (x). all other X j =x j. If X is a continuous random vector (see p.172), then the marginal distributions of X i are: f(x)dx 1 dx n
JOINT DENSITIES - RANDOM VECTORS - REVIEW Joint densities describe probability distributions of a random vector X: an n-dimensional vector of random variables, ie, X = (X 1,, X n ), where all X is are
More informationParametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1
Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson
More informationSTAT/MATH 395 PROBABILITY II
STAT/MATH 395 PROBABILITY II Bivariate Distributions Néhémy Lim University of Washington Winter 2017 Outline Distributions of Two Random Variables Distributions of Two Discrete Random Variables Distributions
More informationMultivariate negative binomial models for insurance claim counts
Multivariate negative binomial models for insurance claim counts Peng Shi (Northern Illinois University) and Emiliano A. Valdez (University of Connecticut) 9 November 0, Montréal, Quebec Université de
More informationJoint Probability Distributions, Correlations
Joint Probability Distributions, Correlations What we learned so far Events: Working with events as sets: union, intersection, etc. Some events are simple: Head vs Tails, Cancer vs Healthy Some are more
More informationLehrstuhl für Statistik und Ökonometrie. Diskussionspapier 87 / Some critical remarks on Zhang s gamma test for independence
Lehrstuhl für Statistik und Ökonometrie Diskussionspapier 87 / 2011 Some critical remarks on Zhang s gamma test for independence Ingo Klein Fabian Tinkl Lange Gasse 20 D-90403 Nürnberg Some critical remarks
More informationLecture 2: Review of Probability
Lecture 2: Review of Probability Zheng Tian Contents 1 Random Variables and Probability Distributions 2 1.1 Defining probabilities and random variables..................... 2 1.2 Probability distributions................................
More informationSTA 2201/442 Assignment 2
STA 2201/442 Assignment 2 1. This is about how to simulate from a continuous univariate distribution. Let the random variable X have a continuous distribution with density f X (x) and cumulative distribution
More informationSTAT 512 sp 2018 Summary Sheet
STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}
More informationEXAMINATIONS OF THE HONG KONG STATISTICAL SOCIETY GRADUATE DIPLOMA, Statistical Theory and Methods I. Time Allowed: Three Hours
EXAMINATIONS OF THE HONG KONG STATISTICAL SOCIETY GRADUATE DIPLOMA, 008 Statistical Theory and Methods I Time Allowed: Three Hours Candidates should answer FIVE questions. All questions carry equal marks.
More informationStatistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach
Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score
More informationECE302 Exam 2 Version A April 21, You must show ALL of your work for full credit. Please leave fractions as fractions, but simplify them, etc.
ECE32 Exam 2 Version A April 21, 214 1 Name: Solution Score: /1 This exam is closed-book. You must show ALL of your work for full credit. Please read the questions carefully. Please check your answers
More informationChapter 4 Multiple Random Variables
Review for the previous lecture Theorems and Examples: How to obtain the pmf (pdf) of U = g ( X Y 1 ) and V = g ( X Y) Chapter 4 Multiple Random Variables Chapter 43 Bivariate Transformations Continuous
More informationSTAT 430/510: Lecture 16
STAT 430/510: Lecture 16 James Piette June 24, 2010 Updates HW4 is up on my website. It is due next Mon. (June 28th). Starting today back at section 6.7 and will begin Ch. 7. Joint Distribution of Functions
More informationQualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama
Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama Instructions This exam has 7 pages in total, numbered 1 to 7. Make sure your exam has all the pages. This exam will be 2 hours
More informationInferential Statistics
Inferential Statistics Eva Riccomagno, Maria Piera Rogantin DIMA Università di Genova riccomagno@dima.unige.it rogantin@dima.unige.it Part G Distribution free hypothesis tests 1. Classical and distribution-free
More informationStatistics STAT:5100 (22S:193), Fall Sample Final Exam B
Statistics STAT:5 (22S:93), Fall 25 Sample Final Exam B Please write your answers in the exam books provided.. Let X, Y, and Y 2 be independent random variables with X N(µ X, σ 2 X ) and Y i N(µ Y, σ 2
More information3. Probability and Statistics
FE661 - Statistical Methods for Financial Engineering 3. Probability and Statistics Jitkomut Songsiri definitions, probability measures conditional expectations correlation and covariance some important
More informationMultiple Random Variables
Multiple Random Variables This Version: July 30, 2015 Multiple Random Variables 2 Now we consider models with more than one r.v. These are called multivariate models For instance: height and weight An
More informationIntroduction to Machine Learning
Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB
More informationProbability Distributions Columns (a) through (d)
Discrete Probability Distributions Columns (a) through (d) Probability Mass Distribution Description Notes Notation or Density Function --------------------(PMF or PDF)-------------------- (a) (b) (c)
More informationStatistics 427: Sample Final Exam
Statistics 427: Sample Final Exam Instructions: The following sample exam was given several quarters ago in Stat 427. The same topics were covered in the class that year. This sample exam is meant to be
More informationThe Binomial distribution. Probability theory 2. Example. The Binomial distribution
Probability theory Tron Anders Moger September th 7 The Binomial distribution Bernoulli distribution: One experiment X i with two possible outcomes, probability of success P. If the experiment is repeated
More informationChapter 3 sections. SKIP: 3.10 Markov Chains. SKIP: pages Chapter 3 - continued
Chapter 3 sections Chapter 3 - continued 3.1 Random Variables and Discrete Distributions 3.2 Continuous Distributions 3.3 The Cumulative Distribution Function 3.4 Bivariate Distributions 3.5 Marginal Distributions
More informationPh.D. Qualifying Exam Friday Saturday, January 6 7, 2017
Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a
More informationMonte Carlo Methods. Leon Gu CSD, CMU
Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte
More information18 Bivariate normal distribution I
8 Bivariate normal distribution I 8 Example Imagine firing arrows at a target Hopefully they will fall close to the target centre As we fire more arrows we find a high density near the centre and fewer
More informationLet X and Y denote two random variables. The joint distribution of these random
EE385 Class Notes 9/7/0 John Stensby Chapter 3: Multiple Random Variables Let X and Y denote two random variables. The joint distribution of these random variables is defined as F XY(x,y) = [X x,y y] P.
More informationLecture 3. Discrete Random Variables
Math 408 - Mathematical Statistics Lecture 3. Discrete Random Variables January 23, 2013 Konstantin Zuev (USC) Math 408, Lecture 3 January 23, 2013 1 / 14 Agenda Random Variable: Motivation and Definition
More information