Statistics for scientists and engineers

February 2006

Contents

1 Introduction
  1.1 Motivation - why study statistics?
  1.2 Examples
  1.3 Statistics

2 Probability theory
  2.1 Random variables
  2.2 Vectors of random variables
  2.3 Conditioning
  2.4 Moments and expected values
  2.5 Transformation of random variables
      2.5.1 One-dimensional case
      2.5.2 Multi-variate case
      2.5.3 Example
      2.5.4 Special case: affine transformations

3 Some useful distributions
  3.1 Continuous RVs
      3.1.1 Dirac delta distribution
      3.1.2 Uniform distribution
      3.1.3 Normal distribution
      3.1.4 Multi-variate normal distribution
      3.1.5 Exponential distribution
      3.1.6 Chi-squared distribution
      3.1.7 Beta distribution
      3.1.8 Gamma distribution
  3.2 Discrete RVs
      3.2.1 Kronecker delta distribution
      3.2.2 Bernoulli distribution
      3.2.3 Binomial distribution
      3.2.4 Poisson distribution

4 Some important relations
  4.1 Orthonormal transformation of independent normal RVs
  4.2 Statistics of normal random variables
  4.3 Chi-squared distribution
  4.4 Relation between Chi-squared and Gamma distribution
  4.5 Relation between Gamma and Beta distributions
  4.6 Statistics from normal RVs

1 Introduction

1.1 Motivation - why study statistics?

Statistical modeling and analysis, including the collection and interpretation of data, form an essential part of the scientific method in diverse fields, including the social, biological, and physical sciences. Statistical theory is primarily based on the mathematical theory of probability, and covers a wide range of topics, from highly abstract areas to topics directly relevant for applications. The main goal of the theory of statistics is to draw information from data. Data can come in a variety of forms, including signals in continuous time and lists of discrete-time values in N dimensions. Important aspects include:

1. Model construction: to get insight into a problem, we build a generally drastically oversimplified model of the problem. This model should capture the properties in which we are interested and abstract away everything else. Example: Shannon's BSC in information theory.

2. Methods: given a certain model, we derive methods for extracting useful information from the data. Example: MMSE estimation.

3. Performance comparison: different methods need to be compared in terms of certain performance criteria. We also introduce notions of optimality. Example: the information inequality.

4. Algorithm design: in most cases, estimation methods do not lead to closed-form solutions. We need to develop clever numerical methods to solve these problems. Examples: Newton-Raphson, Expectation-Maximization, Turbo codes.

1.2 Examples

- A sequence of elements from an assembly line. An unknown fraction θ of these elements is defective. We would like to know what θ is, but we do not have the time or resources to investigate each of the elements. We choose to draw n elements without replacement and try to determine θ.

- We wish to study how the income of a large population (e.g., grad students) is distributed. An exhaustive study of the entire population is impossible. We base our study on n samples.

- We make n observations to determine a constant µ. The observations are corrupted by random fluctuations. For instance, we transmit a symbol $b \in \{-1, +1\}$ n times and try to recover that bit in the presence of thermal noise.

To solve these problems, we first need to construct a model. Let us consider some simple models:

- Define a random variable X_k indicating whether or not item k out of the n items is defective, so X_k ∈ {defective, OK}. A possible model could be independent observations with $p_{X_k}(\text{defective}) = \theta$ and $p_{X_k}(\text{OK}) = 1 - \theta$. This model may not be correct: for instance, we may have drawn our samples just after a machine in the assembly line was repaired.

- We introduce a random variable collecting the incomes of n people, $X = [X_1, ..., X_n]$. The joint distribution of these incomes is given by $p_X(x)$. Let us assume the incomes are independent, so that $p_X(x) = \prod_{k=1}^{n} p_{X_k}(x_k)$. Finally, let us model $p_{X_k}(x_k)$ as a normal distribution with mean µ and variance σ², both independent of k.

- The observation is given by a random variable $X = [X_1, ..., X_n]$, with $X_k = \mu + W_k$, where W_k is a noise sample. Clearly $p_X(x)$ can be found from $p_W(w)$. Now we can introduce some additional assumptions regarding $p_W(w)$. We can say that the noise at time k does not depend on the noise at time l ≠ k, i.e., the noise samples are independent. In that case $p_W(w) = \prod_{k=1}^{n} p_{W_k}(w_k)$. We can also assume the noise samples are identically distributed, so that $p_{W_k}(w_k)$ does not depend on k. The specific distribution $p_W(w)$ depends on the physical properties of the noise.
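As a concrete illustration of the last model, here is a minimal simulation sketch of n noisy observations $X_k = \mu + W_k$. The Gaussian noise assumption and the numerical values of µ, σ and n are my own illustrative choices, not fixed by the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma, n = 1.0, 0.5, 1000        # illustrative values only
w = rng.normal(0.0, sigma, size=n)   # iid noise samples W_k
x = mu + w                           # observations X_k = mu + W_k

# A natural guess for mu is the sample mean of the observations.
print("estimate of mu:", x.mean())
```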

We see that we always need to introduce some simplifying assumptions. These are generally based on knowledge regarding the observations. For instance, when some observations look like they are normally distributed, we model them that way. These models are not always 100% accurate. Look at the income distribution: a normal distribution can result in some people having negative incomes!

1.3 Statistics

Definition: A statistic T is a map from an observation space Ω to some space of values J. T(x) is usually what you compute after you observe x. J is commonly (a subset of) a Euclidean space. The choice of the statistic is closely related to what we are trying to infer from the data. Examples:

- the fraction of defective items out of the n samples, T(x);
- the sample mean $T(x) = \frac{1}{n} \sum_{k=1}^{n} x_k = \bar{x}$;
- the sample variance $T(x) = \frac{1}{n} \sum_{k=1}^{n} (x_k - \bar{x})^2$.

2 Probability theory

We now give a not-too-rigorous introduction to the basics of probability.

2.1 Random variables

We are given a sample space Ω (e.g., heads or tails: Ω = {H, T}). A random variable (RV) X is a mapping from Ω to the real numbers R. When Ω is finite or countably infinite, X is said to be a discrete RV. Otherwise X is a continuous RV. With each RV we can associate a cumulative distribution function (CDF):

$$F_X(x) = P\{\omega \in \Omega : X(\omega) \le x\}$$

where P{E} is the probability of some event E. We generally abuse the notation and write P{X ≤ x} instead of P{ω ∈ Ω : X(ω) ≤ x}. We will often use probability density functions (pdfs) $p_X(x)$, defined (with the same slight abuse of notation) through

$$\int_a^b p_X(x)\, dx = P\{a \le X \le b\} = P\{\omega \in \Omega : a \le X(\omega) \le b\}.$$

Note that $p_X(x)$ and $P\{X = x\}$ are not necessarily the same thing! Note also that $\int_{-\infty}^{+\infty} p_X(x)\, dx = 1$. For continuous RVs, when $F_X(x)$ is differentiable,

$$\frac{d}{dx} F_X(x) = p_X(x).$$

Similarly,

$$F_X(x) = \int_{-\infty}^{x} p_X(u)\, du.$$

In the case of discrete RVs, we use a slightly different terminology: $F_X(x)$ is the cumulative mass function (CMF), while $p_X(x)$ is the probability mass function (pmf). In that case, we have $p_X(x) = P\{X = x\}$. For discrete RVs all integrals should be replaced by summations, and statements involving differentiation cannot be applied.
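The pdf/CDF relations above can be checked numerically; the sketch below uses scipy.stats with an arbitrarily chosen normal RV (nothing in the notes prescribes this particular choice).

```python
import numpy as np
from scipy import stats, integrate

X = stats.norm(loc=1.0, scale=2.0)        # an arbitrary continuous RV

# P{a <= X <= b} = integral of p_X over [a, b] = F_X(b) - F_X(a)
a, b = -1.0, 3.0
lhs, _ = integrate.quad(X.pdf, a, b)
print(lhs, X.cdf(b) - X.cdf(a))           # the two values agree

# The pdf integrates to 1 over the whole real line
total, _ = integrate.quad(X.pdf, -np.inf, np.inf)
print(total)                              # approximately 1
```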

2.2 Vectors of random variables

The concept of an RV is easily extended to the multi-dimensional case. We can group n random variables X_1, ..., X_n in a vector X. This vector is again a random variable, with probability density function $p_X(x)$. X_1, ..., X_n are said to be mutually independent when

$$p_X(x) = \prod_{k=1}^{n} p_{X_k}(x_k).$$

X_1, ..., X_n are said to be identically distributed when $p_{X_k}(x) = p_{X_l}(x)$ for any k and l. In many problems we will consider variables which are independent and identically distributed (iid). The marginal distributions $p_{X_k}(x_k)$, k = 1, ..., n, can be obtained as follows:

$$p_{X_k}(x_k) = \int \cdots \int p_X(x)\, dx_1 \cdots dx_{k-1}\, dx_{k+1} \cdots dx_n.$$

2.3 Conditioning

Given two random variables X and Y, the conditional probability function $p_{X|Y}(x|y)$ is given by

$$p_{X|Y}(x|y) = \frac{p_{X,Y}(x,y)}{p_Y(y)}$$

for $p_Y(y) > 0$. This is to be interpreted as the probability function of x, given that Y = y. In $p_{X|Y}(x|y)$, x is the random variable, while y should be interpreted as a parameter. Also, $\int p_{X|Y}(x|y)\, dx = 1$ for all y, while $\int p_{X|Y}(x|y)\, dy = g(x)$ for some function g (not necessarily equal to 1).

Examples: What happens when X and Y are independent? Take two dice, with X_k being the number of eyes of die k and Y_k ∈ {even, odd} the parity of X_k. Determine $p_{X_1,X_2}(x_1,x_2)$ and $p_{X_1,Y_1}(x_1,y_1)$, as well as the conditional probability functions.

Bayes' Rule

Probably one of the most useful results is Bayes' rule. Looking back at the definition of the conditional probability function, we easily find that

$$p_{X,Y}(x,y) = p_{X|Y}(x|y)\, p_Y(y) = p_{Y|X}(y|x)\, p_X(x)$$

so that

$$p_{X|Y}(x|y) = \frac{p_{Y|X}(y|x)\, p_X(x)}{p_Y(y)}.$$

Learn this rule by heart. You will be using it a lot! As a variation, note that $p_Y(y) = \int p_{X,Y}(x,y)\, dx = \int p_{Y|X}(y|x)\, p_X(x)\, dx$, so that

$$p_{X|Y}(x|y) = \frac{p_{Y|X}(y|x)\, p_X(x)}{\int p_{Y|X}(y|x)\, p_X(x)\, dx}$$

which is known as Bayes' Theorem.
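The dice exercise can be worked out mechanically with Bayes' rule; the sketch below builds the joint pmf of X_1 (number of eyes of a fair die) and Y_1 (its parity) and conditions on Y_1. The array layout is an implementation choice of mine, not something the notes prescribe.

```python
import numpy as np

# p_{X1}(x): a fair die, x = 1, ..., 6
p_x = np.full(6, 1 / 6)

# p_{Y1|X1}(y|x): y = 0 for 'even', 1 for 'odd'; parity is determined by x
p_y_given_x = np.zeros((2, 6))
for i, x in enumerate(range(1, 7)):
    p_y_given_x[x % 2, i] = 1.0

# Joint pmf p_{X1,Y1}(x,y) = p_{Y1|X1}(y|x) p_{X1}(x), and marginal p_{Y1}(y)
p_xy = p_y_given_x * p_x
p_y = p_xy.sum(axis=1)

# Bayes' rule: p_{X1|Y1}(x|y) = p_{Y1|X1}(y|x) p_{X1}(x) / p_{Y1}(y)
p_x_given_y = p_xy / p_y[:, None]

print("p(X1 = x | Y1 = even):", p_x_given_y[0])   # mass 1/3 on x = 2, 4, 6
```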

2.4 Moments and expected values

We introduce the expectation operator on an RV X: given a function g : R → Γ, the expectation (or: expected value) of g with respect to $p_X(x)$ is given by

$$E_X\{g(X)\} = \int g(x)\, p_X(x)\, dx.$$

Observe that $E_X\{g(X)\}$ is just an element of Γ, no longer dependent on x. Special cases are the moments and the central moments:

$$\mu_n = E_X\{X^n\}, \qquad \bar{\mu}_n = E_X\{(X - \mu_1)^n\}.$$

The mean and variance of a distribution are given by $\mu_1$ and $\bar{\mu}_2$, respectively. We will sometimes denote the mean by µ and the variance by σ². The standard deviation is given by σ.

Properties of the expectation operator

- Linearity: $E_X\{g_1(X) + g_2(X)\} = E_X\{g_1(X)\} + E_X\{g_2(X)\}$.

- Uncorrelated RVs: X and Y are said to be uncorrelated when $E_{X,Y}\{XY\} = E_X\{X\}\, E_Y\{Y\}$. Show that independent RVs are uncorrelated. Show that uncorrelated RVs are not necessarily independent.

- Conditional expectation: $E_{X|Y}\{g(X)\} = \int g(x)\, p_{X|Y}(x|Y)\, dx$, which is a function of the RV Y.

- Iterated expectations: $E_{X,Y}\{g(X,Y)\} = E_Y\{E_{X|Y}\{g(X,Y)\}\}$.

- Expectations and functions: let Y = g(X); then $E_X\{g(X)\} = E_Y\{Y\}$, so we can evaluate $E_Y\{Y\}$ without explicit knowledge of $p_Y(y)$.

2.5 Transformation of random variables

The discrete case is trivial, so we will focus on continuous RVs.

2.5.1 One-dimensional case

Take an RV X and an invertible function f : R → R. We wish to determine the probability distribution of Y = f(X). We see that $X = f^{-1}(Y) = g(Y)$. It is easily verified that

$$p_Y(y) = p_X(g(y)) \left| \frac{d}{dy} g(y) \right|.$$

When f is not invertible, we use a different technique:

$$F_Y(y) = P\{Y \le y\} = P\{f(X) \le y\},$$

which should be evaluated further and then differentiated with respect to y.
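A quick Monte Carlo check of the one-dimensional transformation formula. The particular choice f(x) = e^x with X ~ N(0, 1) (so that g(y) = ln y and |g'(y)| = 1/y) is mine, for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

x = rng.normal(size=200_000)
y = np.exp(x)                        # Y = f(X), f invertible and increasing

# Change of variables: p_Y(y) = p_X(ln y) * |d/dy ln y| = p_X(ln y) / y
grid = np.linspace(0.1, 5.0, 50)
p_y_formula = stats.norm.pdf(np.log(grid)) / grid

# Histogram estimate of p_Y on [0, 5], normalized by the full sample size
counts, edges = np.histogram(y, bins=200, range=(0.0, 5.0))
p_y_hist = counts / (y.size * np.diff(edges))
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(np.interp(grid, centers, p_y_hist) - p_y_formula)))  # small
```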

2.5.2 Multi-variate case

Given real-valued RVs X_1, X_2, ..., X_N and functions h_1, ..., h_N with h_k : R^N → R, we define $Y = [Y_1, ..., Y_N]^T$ and $X = [X_1, ..., X_N]^T$ where $Y_n = h_n(X)$, or simply Y = h(X). We assume h is one-to-one (invertible), so that $X = h^{-1}(Y)$ with $X_n = g_n(Y)$. We now introduce the Jacobian as the determinant of the matrix J(y) with

$$[J(y)]_{k,n} = \frac{\partial}{\partial y_n} g_k(y).$$

Then

$$p_Y(y) = p_X(h^{-1}(y))\, |\det J(y)|.$$

2.5.3 Example

Problem: Given X_1 and X_2 with known $p_{X_1,X_2}(x_1,x_2)$, and $Y_1 = X_1 + X_2$, determine $p_{Y_1}(y_1)$.

Solution 1: We see that the transformation is not one-to-one. So we first introduce $Y_2 = X_1 - X_2$. Now Y = h(X) is invertible: given Y_1 and Y_2, we find X_1 and X_2 as

$$X_1 = \frac{Y_1 + Y_2}{2} = g_1(Y_1, Y_2), \qquad X_2 = \frac{Y_1 - Y_2}{2} = g_2(Y_1, Y_2),$$

so that

$$[J(y)]_{1,1} = \frac{\partial g_1}{\partial y_1} = \frac{1}{2}, \quad [J(y)]_{1,2} = \frac{\partial g_1}{\partial y_2} = \frac{1}{2}, \quad [J(y)]_{2,1} = \frac{\partial g_2}{\partial y_1} = \frac{1}{2}, \quad [J(y)]_{2,2} = \frac{\partial g_2}{\partial y_2} = -\frac{1}{2},$$

so that

$$J(y) = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{bmatrix}$$

and $|\det J(y)| = 1/2$. Hence

$$p_{Y_1,Y_2}(y_1,y_2) = \frac{1}{2}\, p_{X_1,X_2}\!\left( \frac{y_1+y_2}{2}, \frac{y_1-y_2}{2} \right).$$

And finally

$$p_{Y_1}(y_1) = \int p_{Y_1,Y_2}(y_1,y_2)\, dy_2.$$

Solution 2: Assuming X_1 and X_2 are independent,

$$F_{Y_1|X_2}(y_1|x_2) = P\{Y_1 \le y_1 \mid X_2 = x_2\} = P\{X_1 + X_2 \le y_1 \mid X_2 = x_2\} = P\{X_1 \le y_1 - x_2\} = F_{X_1}(y_1 - x_2),$$

so that $p_{Y_1|X_2}(y_1|x_2) = p_{X_1}(y_1 - x_2)$ and

$$p_{Y_1}(y_1) = \int p_{X_1}(y_1 - x_2)\, p_{X_2}(x_2)\, dx_2,$$

which can be interpreted as a convolution of two pdfs.

2.5.4 Special case: affine transformations

Introduce an N × N matrix A and an N × 1 vector c, and define

$$Y = h(X) = AX + c;$$

then h(·) is an affine transformation. When A is invertible, $X = A^{-1}(Y - c)$, $J(y) = A^{-1}$, and

$$p_Y(y) = p_X(A^{-1}(y-c))\, |\det(A^{-1})| = \frac{p_X(A^{-1}(y-c))}{|\det A|}.$$

When A is an invertible square matrix with $A A^T = A^T A = I$, we say that A is orthonormal. Orthonormal matrices are norm-preserving: $\|AX\| = \|X\|$, where $\|X\|^2 = \sum_k X_k^2$.
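The convolution result derived in the example of Section 2.5.3 can also be verified by simulation. Here I assume two independent exponential RVs (the notes leave $p_{X_1,X_2}$ generic), whose convolution is the Γ(2, 1) density.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

n = 500_000
x1 = rng.exponential(scale=1.0, size=n)    # X1 ~ Exp(1)
x2 = rng.exponential(scale=1.0, size=n)    # X2 ~ Exp(1), independent of X1
y1 = x1 + x2

# The convolution of two Exp(1) densities is the Gamma(k=2, theta=1) density.
grid = np.linspace(0.05, 8.0, 40)
p_formula = stats.gamma.pdf(grid, a=2, scale=1.0)

counts, edges = np.histogram(y1, bins=160, range=(0.0, 8.0))
p_hist = counts / (y1.size * np.diff(edges))
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(np.interp(grid, centers, p_hist) - p_formula)))  # small
```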

3 Some useful distributions

For a more exhaustive list, go to http://mathworld.wolfram.com/topics/statisticaldistributions.html. With each distribution we will provide the mean µ and the variance σ².

3.1 Continuous RVs

3.1.1 Dirac delta distribution

The Dirac delta distribution is used when we have absolute certainty regarding a random variable. We write $p_X(x) = \delta(x)$, where $\delta(x)$ is defined through $\int f(x)\, \delta(x)\, dx = f(0)$, and has µ = σ² = 0.

3.1.2 Uniform distribution

X ~ U(a, b), with b > a; then

$$p_X(x) = \begin{cases} \frac{1}{b-a} & a < x < b \\ 0 & \text{else.} \end{cases}$$

Also,

$$\mu = \frac{a+b}{2}, \qquad \sigma^2 = \frac{(b-a)^2}{12}.$$

3.1.3 Normal distribution

Probably the most important distribution. X ~ N(µ, σ²) with

$$p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right).$$

3.1.4 Multi-variate normal distribution

$X = [X_1, ..., X_N]$ is said to have a multi-variate normal distribution, X ~ N(m, Σ), if its probability function has the form

$$p_X(x) = \frac{1}{(2\pi)^{N/2} \sqrt{\det \Sigma}} \exp\!\left( -\frac{1}{2} (x-m)^T \Sigma^{-1} (x-m) \right)$$

where $E_X\{X\} = m$ and $E_X\{(X-m)(X-m)^T\} = \Sigma$.

Properties:

- When X ~ N(m, Σ), then $X_k \sim N(\mu_k, \sigma_k^2)$. Determine $\mu_k$ and $\sigma_k^2$.
- When X ~ N(m, Σ) and X_k and X_l are uncorrelated, then they are also independent.
- $X_k \sim N(\mu_k, \sigma_k^2)$ for every k does not imply X ~ N(m, Σ)! (Note: N(·, ·) is not a function!)
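A small sketch checking the multivariate normal properties E{X} = m and E{(X−m)(X−m)^T} = Σ against samples; the particular m and Σ are illustrative values of my choosing.

```python
import numpy as np

rng = np.random.default_rng(4)

m = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

x = rng.multivariate_normal(m, Sigma, size=200_000)

print("sample mean      :", x.mean(axis=0))             # close to m
print("sample covariance:\n", np.cov(x, rowvar=False))  # close to Sigma
```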

3.1.5 Exponential distribution

X has an exponential distribution with rate parameter λ > 0 when

$$p_X(x) = \begin{cases} \lambda e^{-\lambda x} & x > 0 \\ 0 & \text{else.} \end{cases}$$

Then

$$\mu = \frac{1}{\lambda}, \qquad \sigma^2 = \frac{1}{\lambda^2}.$$

3.1.6 Chi-squared distribution

When $Y_k \sim N(0,1)$, with Y_1, ..., Y_n independent, then $X = \sum_{k=1}^{n} Y_k^2$ has a chi-squared distribution with n degrees of freedom. We write $X \sim \chi^2_n$ with

$$p_X(x) = \frac{x^{n/2-1} e^{-x/2}}{\Gamma(n/2)\, 2^{n/2}}, \quad x > 0,$$

where Γ(z) is the Gamma function. Also,

$$\mu = n, \qquad \sigma^2 = 2n.$$

Properties:

- If $X_k \sim \chi^2_{n_k}$ with X_1, ..., X_L independent, then $\sum_{k=1}^{L} X_k \sim \chi^2_{\sum_k n_k}$.
- Γ(z+1) = zΓ(z) and Γ(1) = 1, so Γ(n) = (n−1)! and Γ(1/2) = √π.

3.1.7 Beta distribution

X ~ β(r, s) is defined for x ∈ [0, 1] with

$$p_X(x) = \frac{x^{r-1} (1-x)^{s-1}}{B(r,s)}$$

where B(r, s) is the beta function, given by

$$B(r,s) = \frac{\Gamma(r)\, \Gamma(s)}{\Gamma(r+s)}.$$

Mean and variance are given by

$$\mu = \frac{r}{r+s}, \qquad \sigma^2 = \frac{rs}{(r+s)^2 (r+s+1)}.$$

3.1.8 Gamma distribution

X ~ Γ(k, θ) for x > 0, k > 0 and θ > 0, with

$$p_X(x) = \frac{x^{k-1} e^{-x/\theta}}{\Gamma(k)\, \theta^k}.$$

Mean and variance are given by µ = θk and σ² = θ²k. Somewhat confusingly, you will sometimes see λ = 1/θ but written as X ~ Γ(k, λ), with $p_X(x) = x^{k-1} e^{-\lambda x} \lambda^k / \Gamma(k)$. Beware!
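The parameterization warning above also matters in software. In scipy.stats the Gamma distribution takes the shape k and the scale θ (not the rate λ); the snippet below, with illustrative parameter values, checks the stated means and variances for the Gamma, exponential, and chi-squared distributions.

```python
from scipy import stats

k, theta = 3.0, 2.0
g = stats.gamma(a=k, scale=theta)       # 'scale' corresponds to theta, NOT lambda
print(g.mean(), g.var())                # theta*k = 6, theta^2*k = 12

lam = 0.5
e = stats.expon(scale=1 / lam)          # exponential: mean 1/lam, variance 1/lam^2
print(e.mean(), e.var())

n = 4
c = stats.chi2(df=n)                    # chi-squared: mean n, variance 2n
print(c.mean(), c.var())
```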

3.2 Discrete RVs

3.2.1 Kronecker delta distribution

This is the discrete counterpart of the Dirac delta distribution: $p_X(x) = \delta_x$ with

$$\delta_x = \begin{cases} 1 & x = 0 \\ 0 & x \neq 0 \end{cases}$$

and µ = σ² = 0.

3.2.2 Bernoulli distribution

There are two possible outcomes, 0 (failure) and 1 (success), with probability 1 − p and p, respectively:

$$p_X(x) = \begin{cases} 1-p & x = 0 \\ p & x = 1 \end{cases}$$

with

$$\mu = p, \qquad \sigma^2 = p(1-p).$$

3.2.3 Binomial distribution

X is the number of successes out of N Bernoulli trials:

$$p_X(x) = \binom{N}{x} p^x (1-p)^{N-x}$$

for x ∈ {0, 1, ..., N}, where

$$\binom{N}{x} = \frac{N!}{x!\, (N-x)!},$$

and

$$\mu = Np, \qquad \sigma^2 = Np(1-p).$$

3.2.4 Poisson distribution

Events occur with a known average rate λ, expressed in events per unit of time. Then the number of events X in a unit of time has the following distribution, for x ∈ {0, 1, 2, ...}:

$$p_X(x) = \frac{e^{-\lambda} \lambda^x}{x!}$$

with

$$\mu = \lambda, \qquad \sigma^2 = \lambda.$$
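A corresponding sanity check for the discrete distributions; the values of p, N and λ below are arbitrary illustrative choices.

```python
from scipy import stats

p = 0.3
b = stats.bernoulli(p)
print(b.mean(), b.var())                 # p and p(1-p)

N = 10
binom = stats.binom(n=N, p=p)
print(binom.mean(), binom.var())         # N*p and N*p*(1-p)

lam = 2.5
poisson = stats.poisson(mu=lam)
print(poisson.mean(), poisson.var())     # lambda and lambda
print(poisson.pmf(3))                    # e^(-lam) * lam^3 / 3!
```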

4 Some important relations

4.1 Orthonormal transformation of independent normal RVs

Theorem: $Z = (Z_1, Z_2, ..., Z_n)^T$ has independent, normally distributed elements with the same variance σ² and expected value E{Z} = d. Let Y = g(Z) = AZ + c, where A is an n × n orthonormal matrix. Then Y has independent normal components with the same variance σ² and E{Y} = Ad + c.

Proof: We know that

$$p_Y(y) = p_Z(A^{-1}(y-c))\, |\det A^{-1}| = p_Z(A^{-1}(y-c)),$$

since $|\det A| = 1$ for orthonormal matrices. Now, since

$$p_Z(z) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\!\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (z_i - d_i)^2 \right) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\!\left( -\frac{\|z-d\|^2}{2\sigma^2} \right),$$

we find

$$p_Y(y) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\!\left( -\frac{\|A^{-1}(y-c) - d\|^2}{2\sigma^2} \right) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\!\left( -\frac{\|y - (Ad+c)\|^2}{2\sigma^2} \right),$$

where we have used the fact that A and $A^{-1}$ are norm-preserving. This shows that the Y_k's are independent and normally distributed with variance σ² and E{Y} = Ad + c. QED.

4.2 Statistics of normal random variables

Theorem: Let $Z = [Z_1, ..., Z_N]$ be an iid sample from an N(µ, σ²) population. Then

1. $\bar{Z} = \frac{1}{N} \sum_i Z_i$ and $\sum_i (Z_i - \bar{Z})^2$ are independent;
2. $\bar{Z} \sim N(\mu, \sigma^2/N)$;
3. $\frac{1}{\sigma^2} \sum_i (Z_i - \bar{Z})^2 \sim \chi^2_{N-1}$.

Proof: We introduce an orthonormal matrix A and Y = AZ with $Y_1 = \sqrt{N}\, \bar{Z}$ and $\sum_{i=2}^{N} Y_i^2 = \sum_i (Z_i - \bar{Z})^2$. This matrix is constructed as follows: select the first row as $[1/\sqrt{N}, 1/\sqrt{N}, ..., 1/\sqrt{N}]$; the remaining rows are then obtained by the Gram-Schmidt orthogonalization procedure. Then

$$Y_1 = \frac{1}{\sqrt{N}} \sum_i Z_i = \sqrt{N}\, \bar{Z}$$

and, since Y = AZ with A norm-preserving,

$$\sum_i Y_i^2 = \|Y\|^2 = \|AZ\|^2 = \|Z\|^2 = \sum_i Z_i^2.$$

We find that

$$\sum_{i=2}^{N} Y_i^2 = \sum_i Z_i^2 - Y_1^2 = \sum_i Z_i^2 - N \bar{Z}^2 = \sum_i (Z_i - \bar{Z})^2.$$

Since A is an orthonormal matrix, $a_i^T a_j = \delta_{i-j}$, where $a_i^T$ denotes the i-th row of A. Since $a_1^T = [1/\sqrt{N}, ..., 1/\sqrt{N}]$, we see that for j ≠ 1

$$a_j^T a_1 = \frac{1}{\sqrt{N}} \sum_i a_{ji} = 0,$$

so that for j ≠ 1

$$E\{Y_j\} = E\{a_j^T Z\} = a_j^T E\{Z\} = \mu \sum_i a_{ji} = 0.$$

We can draw the following conclusions:

- Y_1, Y_2, ..., Y_N are independent normal RVs with variance σ², with E{Y_j} = 0 for j ≠ 1 and $E\{Y_1\} = \sqrt{N}\, \mu$.
- $Y_1 \sim N(\sqrt{N}\mu, \sigma^2)$, so that $\bar{Z} = Y_1/\sqrt{N} \sim N(\mu, \sigma^2/N)$. This proves the second part of the theorem.
- $Y_2, ..., Y_N \sim N(0, \sigma^2)$ iid, so that $Y_2/\sigma, ..., Y_N/\sigma \sim N(0,1)$. We see that $\frac{1}{\sigma^2} \sum_{i=2}^{N} Y_i^2 \sim \chi^2_{N-1}$. This proves the third part of the theorem.
- Since Y_2, ..., Y_N are independent of Y_1, it follows that $\bar{Z}$ is independent of $\sum_i (Z_i - \bar{Z})^2$. This proves the first part of the theorem.
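A Monte Carlo sanity check of the three statements of the theorem; the sample size N, the values of µ and σ, and the number of trials are illustrative choices of mine.

```python
import numpy as np

rng = np.random.default_rng(5)

mu, sigma, N, trials = 3.0, 2.0, 5, 200_000
z = rng.normal(mu, sigma, size=(trials, N))

zbar = z.mean(axis=1)                                   # sample mean per trial
ssq = ((z - zbar[:, None]) ** 2).sum(axis=1) / sigma**2  # (1/sigma^2) sum (Z_i - Zbar)^2

print(zbar.mean(), zbar.var())       # approx mu and sigma^2 / N       (part 2)
print(ssq.mean(), ssq.var())         # approx N-1 and 2(N-1), chi^2_{N-1} (part 3)
print(np.corrcoef(zbar, ssq)[0, 1])  # approx 0, consistent with part 1
```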

4.3 Chi-squared distribution

Problem: $Z \sim N(\mu, \sigma^2)$. Determine the distribution of $X = (Z-\mu)^2/\sigma^2$. Relate the result to the $\chi^2_1$ distribution.

Solution: Let $Y = (Z-\mu)/\sigma$, so that $Y \sim N(0, 1)$. Since $X = Y^2$ is a non-invertible function of Y, we cannot use the Jacobian. However,

$$F_X(x) = P\{X \le x\} = P\{Y^2 \le x\} = P\{-\sqrt{x} \le Y \le \sqrt{x}\} = F_Y(\sqrt{x}) - F_Y(-\sqrt{x}).$$

Taking the derivative with respect to x, and noting that $p_Y(y)$ is an even function:

$$p_X(x) = \frac{p_Y(\sqrt{x})}{2\sqrt{x}} + \frac{p_Y(-\sqrt{x})}{2\sqrt{x}} = \frac{p_Y(\sqrt{x})}{\sqrt{x}} = \frac{1}{\sqrt{2\pi x}} \exp\!\left(-\frac{x}{2}\right).$$

If we consider the $\chi^2_1$ distribution, this gives us the same result:

$$\frac{x^{1/2-1} e^{-x/2}}{\Gamma(1/2)\, 2^{1/2}} = \frac{e^{-x/2}}{\sqrt{2\pi x}}.$$

4.4 Relation between Chi-squared and Gamma distribution

Problem: How are the gamma distribution and the $\chi^2$ distribution related?

Solution: Γ(k, θ) with k = n/2 and θ = 2 yields

$$p_X(x) = \frac{x^{n/2-1} e^{-x/2}}{\Gamma(n/2)\, 2^{n/2}},$$

which is clearly equal to the $\chi^2_n$ distribution; that is, $\chi^2_n = \Gamma(n/2, 2)$.

4.5 Relation between Gamma and Beta distributions

Problem: $X_1 \sim \Gamma(k_1, \theta)$ and $X_2 \sim \Gamma(k_2, \theta)$, independent.

(A) Determine the distribution of $Y_1 = X_1 + X_2$ and of $Y_2 = X_1/(X_1 + X_2)$.
(B) Show that Y_1 and Y_2 are independent.
(C) Show from this result that the sum of n squared iid zero-mean, unit-variance normal RVs has a $\chi^2_n$ distribution.

Solution: (A) Since we can always introduce $Z_i = X_i/\theta$, with $Z_i \sim \Gamma(k_i, 1)$, we can assume θ = 1 without loss of generality. Since X_1 and X_2 are independent,

$$p_{X_1,X_2}(x_1,x_2) = \frac{x_1^{k_1-1} e^{-x_1}}{\Gamma(k_1)} \cdot \frac{x_2^{k_2-1} e^{-x_2}}{\Gamma(k_2)}.$$

Since $Y_1 = X_1 + X_2$ and $Y_2 = X_1/(X_1+X_2)$, we have $X_1 = Y_1 Y_2$ and $X_2 = Y_1 - Y_1 Y_2$. Hence

$$J(y) = \begin{bmatrix} y_2 & y_1 \\ 1 - y_2 & -y_1 \end{bmatrix}, \qquad \det J(y) = -y_1 y_2 - y_1 + y_1 y_2 = -y_1,$$

so $|\det J(y)| = y_1$. We find that

$$p_{Y_1,Y_2}(y_1,y_2) = p_{X_1,X_2}(y_1 y_2,\, y_1 - y_1 y_2)\, y_1 = \frac{(y_1 y_2)^{k_1-1} e^{-y_1 y_2}}{\Gamma(k_1)} \cdot \frac{(y_1 - y_1 y_2)^{k_2-1} e^{-y_1 + y_1 y_2}}{\Gamma(k_2)} \cdot y_1$$

$$= \frac{y_1^{k_1+k_2-1} e^{-y_1}\, y_2^{k_1-1} (1-y_2)^{k_2-1}}{\Gamma(k_1)\, \Gamma(k_2)} = \frac{y_1^{k_1+k_2-1} e^{-y_1}}{\Gamma(k_1+k_2)} \cdot \frac{\Gamma(k_1+k_2)}{\Gamma(k_1)\, \Gamma(k_2)}\, y_2^{k_1-1} (1-y_2)^{k_2-1} = \frac{y_1^{k_1+k_2-1} e^{-y_1}}{\Gamma(k_1+k_2)} \cdot \frac{y_2^{k_1-1} (1-y_2)^{k_2-1}}{B(k_1,k_2)},$$

so that $Y_1 \sim \Gamma(k_1+k_2, 1)$ and $Y_2 \sim \beta(k_1, k_2)$.

(B) follows from (A), since $p_{Y_1,Y_2}(y_1,y_2)$ can be written as $p_{Y_1}(y_1)\, p_{Y_2}(y_2)$.

(C) When $X_l \sim \Gamma(1/2, 2)$, independent, we now know that $\sum_{l=1}^{n} X_l \sim \Gamma(n/2, 2)$. When we introduce $Y_l \sim N(0, 1)$ and $X_l = Y_l^2$, then $X_l \sim \chi^2_1 = \Gamma(1/2, 2)$. And since $\Gamma(n/2, 2) = \chi^2_n$, we conclude

$$\sum_{l=1}^{n} Y_l^2 = \sum_{l=1}^{n} X_l \sim \Gamma\!\left(\frac{n}{2}, 2\right) = \chi^2_n.$$
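A simulation sketch of the Section 4.5 result: for independent $X_1 \sim \Gamma(k_1, \theta)$ and $X_2 \sim \Gamma(k_2, \theta)$, the sum should behave like $\Gamma(k_1+k_2, \theta)$ and the ratio $X_1/(X_1+X_2)$ like $\beta(k_1, k_2)$, uncorrelated with the sum. The parameter values are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

k1, k2, theta, n = 2.0, 3.5, 1.7, 300_000
x1 = rng.gamma(shape=k1, scale=theta, size=n)
x2 = rng.gamma(shape=k2, scale=theta, size=n)

y1 = x1 + x2
y2 = x1 / (x1 + x2)

print(y1.mean(), y1.var())        # approx (k1+k2)*theta and (k1+k2)*theta^2
print(y2.mean())                  # approx k1/(k1+k2), the Beta(k1,k2) mean
print(np.corrcoef(y1, y2)[0, 1])  # approx 0, consistent with independence

# KS test of Y2 against Beta(k1,k2): p-value is typically not small,
# since the null hypothesis is exactly true here.
print(stats.kstest(y2, "beta", args=(k1, k2)).pvalue)
```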

4.6 Statistics from normal RVs

Problem: $X = [X_1, ..., X_N]$ is a vector of iid RVs with $X_k \sim N(\mu, \sigma^2)$. Compute the expected value of the following RVs:

(A) the sample mean $M = \frac{1}{N} \sum_k X_k$;
(B) the sample variance with known mean, $R = \frac{1}{N} \sum_k (X_k - \mu)^2$;
(C) the sample variance with unknown mean, $S = \frac{1}{N} \sum_k (X_k - M)^2$.

Solution:

(A)

$$E_M\{M\} = \frac{1}{N} \sum_k E_{X_k}\{X_k\} = \mu.$$

(B)

$$E_R\{R\} = \frac{1}{N} \sum_k E_{X_k}\{(X_k - \mu)^2\} = \frac{1}{N} \sum_k E_{X_k}\{X_k^2 - 2\mu X_k + \mu^2\} = \frac{1}{N} \sum_k \left( \mu^2 + \sigma^2 - 2\mu^2 + \mu^2 \right) = \sigma^2.$$

(C)

$$E_S\{S\} = \frac{1}{N} E_X\!\left\{ \sum_k (X_k - M)^2 \right\} = \frac{1}{N} E_X\!\left\{ \sum_k X_k^2 - 2M \sum_k X_k + N M^2 \right\} = \frac{1}{N} E_X\!\left\{ \sum_k X_k^2 - N M^2 \right\}$$

$$= \frac{1}{N} \sum_k E_X\{X_k^2\} - E_X\{M^2\} = \mu^2 + \sigma^2 - \left( \mu^2 + \frac{\sigma^2}{N} \right) = \frac{N-1}{N}\, \sigma^2.$$

Verify that $E_X\{X_k^2\} = \mu^2 + \sigma^2$ and that $E_X\{M^2\} = \frac{\sigma^2}{N} + \mu^2$.
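Finally, a quick simulation of the Section 4.6 expectations, illustrating in particular that E{S} = σ²(N−1)/N, so that S is a biased estimator of σ². The parameter values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)

mu, sigma, N, trials = 1.0, 3.0, 8, 300_000
x = rng.normal(mu, sigma, size=(trials, N))

M = x.mean(axis=1)                              # sample mean
R = ((x - mu) ** 2).mean(axis=1)                # sample variance, known mean
S = ((x - M[:, None]) ** 2).mean(axis=1)        # sample variance, unknown mean

print(M.mean())                                 # approx mu
print(R.mean())                                 # approx sigma^2
print(S.mean(), sigma**2 * (N - 1) / N)         # E{S} = sigma^2 (N-1)/N
```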