Analysis of Experimental Designs


Gilles Lamothe
Mathematics and Statistics, University of Ottawa

Review of Probability

- A short introduction to SAS
- Probability
- Random variables
- The expectation operator E{}
- The variance operator σ²{}
- Independence
- A few important models

Short Intro to SAS

Here is the URL for the course Web page: http://aix1.uottawa.ca/~glamothe/mat3378spring2012
Scroll down to the updates on the course Web page and you will find a link to an Introduction to SAS.
Set up a computer account at the CUBE computer lab on the 2nd floor.
The SAS windowing environment is composed of five types of windows: Results, Explorer, Log, Output and Enhanced Editor.

Short Intro to SAS

Consider a standard normal random variable Z ~ N(0,1). Its p.d.f. is

f(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right), \quad -\infty < z < \infty.

We say that the quantile 1.96 is the 97.5th percentile of N(0,1), since P(Z < 1.96) = 0.975.

Short Intro to SAS

SAS program: We enter a SAS program into the Editor window. Consider the following program:

data quantiles;
  z = probit(0.975);
run;

proc print data=quantiles;
run;

Submitting: In the menu, select Run and then select Submit. The output will be sent to the Output window. We got:

Obs          z
  1    1.95996

Short Intro to SAS

Suppose now that we want to compute P(Z > 1.96) = 1 − P(Z ≤ 1.96).

SAS program:

data prob;
  p = 1 - cdf('normal', 1.96);
run;

proc print data=prob;
run;

Submitting: In the menu, select Run and then select Submit. The output will be sent to the Output window. We got:

Obs           p
  1    0.024998

Thus, P(Z > 1.96) = 0.025.

Short Intro to SAS

Example: Write a SAS program to compute the probability that a standard normal random variable will be as far away from 0 as 1.56.

We want 2P(Z > 1.56) = 2[1 − P(Z ≤ 1.56)].

SAS program:

data prob;
  p = 2*(1 - cdf('normal', 1.56));
run;

proc print data=prob;
run;

From the output, we get 2P(Z > 1.56) = 0.11876.

Terminology from Probability Theory

Consider a random experiment, that is, a process which will result in an outcome that we cannot predict with certainty.

S = the set of all possible outcomes, called the sample space. A subset E of the sample space S is called an event.

Example: Consider the set of possible grades for this course, S = {A+, A, A−, ..., E, F}. The event that you get at least an A− is E = {A+, A, A−}. We say that the event has occurred if the observed outcome is in the event.

Terminology from Probability Theory

The goal of probability theory is to construct measures of the chances that an event will occur. According to Kolmogorov, such a measure should satisfy the following axioms.

Probability Axioms: Suppose that we can construct a function P such that
[Certainty] P(S) = 1;
[Positivity] P(E) ≥ 0 for all events E;
[Additivity] if E_1, E_2, ... are mutually exclusive events, then P(E_1 ∪ E_2 ∪ ...) = P(E_1) + P(E_2) + ...;
then we say that P is a probability measure.

Terminology from Probability Theory

Consequences of the axioms:

- P(A^c) = 1 − P(A)
- if A ⊆ B, then P(A) ≤ P(B)
- 0 ≤ P(E) ≤ 1
- Addition Rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
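To see how the first consequence follows from the axioms, here is a short derivation (my addition; the slide states the result without proof). Since A and A^c are mutually exclusive and A ∪ A^c = S, Certainty and Additivity give

1 = P(S) = P(A \cup A^c) = P(A) + P(A^c) \quad\Rightarrow\quad P(A^c) = 1 - P(A).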

Terminology from Probability Theory

Law of Large Numbers: Suppose that we can repeat n independent trials of the experiment. Let Y be the number of times that E occurs and p = P(E). Then, for all ɛ > 0,

\lim_{n \to \infty} P\left( |Y/n - p| \geq \epsilon \right) = 0.

The above result leads to the following approach that allows us to interpret probabilities.

Frequentist Approach: A probability refers to a limiting relative frequency.

Example: Suppose we say that the probability that a salesman will sell a car to the next customer is 20%. The frequentist interpretation is that, in the long run, this salesman sells cars to about 20% of his customers.
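We can illustrate this empirically. The following is a minimal SAS sketch (my own, not from the slides) that simulates Bernoulli trials with p = 0.2 and tracks the relative frequency Y/n; the seed and the checkpoints are arbitrary choices:

data lln;
  call streaminit(3378);               /* arbitrary seed, for reproducibility */
  do n = 1 to 100000;
    success + rand('BERNOULLI', 0.2);  /* accumulate the number of times E occurs */
    relfreq = success / n;             /* relative frequency Y/n after n trials */
    if n in (10, 100, 1000, 10000, 100000) then output;
  end;
run;

proc print data=lln;
  var n relfreq;
run;

As n grows, relfreq should settle near p = 0.2, in line with the law of large numbers.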

Terminology from Probability Theory

A random variable X is a function that associates a real number X(s) with each outcome s. We use a cumulative distribution function (c.d.f.) F_X to specify probabilities associated with X:

F_X(x) = P(X ≤ x) = P({s ∈ S : X(s) ≤ x})

Discrete case: F(x) = \sum_{x_i : x_i \leq x} f(x_i)

Continuous case: F(x) = \int_{-\infty}^{x} f(y)\, dy

Terminology from Probability Theory

We can also specify the probabilities with f_X:

- for X discrete, we use the probability mass function: f(x_i) = P(X = x_i);
- for X continuous, we use the probability density function: f(x) = \frac{d}{dx} F_X(x).

Recall: Areas under the density are probabilities.
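For example, an area under the standard normal density can be computed as a difference of c.d.f. values. A small SAS sketch in the style of the earlier examples (my own, not from the slides):

data area;
  p = cdf('normal', 1) - cdf('normal', -1);  /* P(-1 < Z < 1), the area between -1 and 1 */
run;

proc print data=area;
run;

This prints approximately 0.6827, the familiar 68% from the empirical rule.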

Terminology from Probability Theory

The expectation of X (also called its mean) is defined as a weighted average of the values in its support (i.e. the possible values x taken by X):

µ_X = E\{X\} = \sum_{x} x f(x) (discrete case), or µ_X = E\{X\} = \int_{-\infty}^{\infty} x f(x)\, dx (continuous case).

Remark: µ_X = E{X} is a measure of central tendency and also of location. It is a linear operator: E{aX + b} = aE{X} + b.
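As a concrete illustration (my own, not on the slide), let X be the face showing on a fair die, so f(x) = 1/6 for x in {1, ..., 6}:

E\{X\} = \sum_{x=1}^{6} x \cdot \tfrac{1}{6} = \tfrac{21}{6} = 3.5,
\qquad
E\{2X + 1\} = 2\,E\{X\} + 1 = 8.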

Normal Model

f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}

The mean µ is a measure of location and of central tendency.

[Figures: normal densities centred at µ and at µ_1, showing the curve shifted to the right when µ_1 > µ and to the left when µ_1 < µ.]

Terminology from Probability Theory

The variance of X is defined as the mean squared deviation from the mean:

σ²{X} = E{(X − µ_X)²} = E{X²} − µ_X²

Remark: σ²{X} is a measure of dispersion and variability. It is not a linear operator; in fact,

σ²{aX + b} = a² σ²{X}.

Squared units: Since the units of the variance are squared, we often refer to the standard deviation σ{X} = \sqrt{σ²{X}}.

Remark: σ{X} is a measure of scale. Of course, it also measures dispersion and variability.
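A one-line check of the scaling rule, using the linearity of E (my addition; the slide states the rule without proof):

\sigma^2\{aX + b\} = E\{[(aX+b) - (a\mu_X + b)]^2\} = E\{a^2 (X - \mu_X)^2\} = a^2\, \sigma^2\{X\}.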

Normal Model

f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}

The standard deviation σ is a measure of dispersion and variability.

[Figures: normal densities with the same mean µ, showing a flatter, more spread-out curve when σ_1 > σ and a taller, more concentrated curve when σ_1 < σ.]

Joint Distributions

Consider two random variables X_1 and X_2. Their joint c.d.f. is

F(x_1, x_2) = P({X_1 ≤ x_1} ∩ {X_2 ≤ x_2}).

Discrete case: The joint probability mass function is

f_{X_1,X_2}(x_1, x_2) = P({X_1 = x_1} ∩ {X_2 = x_2}).

Frequentist interpretation: f_{X_1,X_2}(1,5) = 0.2 means that if we repeat the experiment a large number of times, then in about 20% of the trials we will observe {X_1 = 1, X_2 = 5}.

Continuous case: The joint probability density function is

f_{X_1,X_2}(x_1, x_2) = \frac{\partial^2}{\partial x_1\, \partial x_2} F_{X_1,X_2}(x_1, x_2).

Expectation

The expectation of h(X_1, ..., X_n) is

E\{h(X_1,\dots,X_n)\} =
\begin{cases}
\sum h(x_1,\dots,x_n)\, f(x_1,\dots,x_n), & \text{discrete case} \\
\int \cdots \int h(x_1,\dots,x_n)\, f(x_1,\dots,x_n)\, dx_1 \cdots dx_n, & \text{continuous case}
\end{cases}

It is a linear operator: E\{b + \sum_{i=1}^n a_i X_i\} = b + \sum_{i=1}^n a_i E\{X_i\}.

Independence

Consider two random variables X and Y. We say that they are independent if

P({X ∈ A} ∩ {Y ∈ B}) = P(X ∈ A) P(Y ∈ B)

for all events {X ∈ A} and {Y ∈ B}.

Why is this independence? Assuming that P(X ∈ A | Y ∈ B) and P(Y ∈ B | X ∈ A) exist, independence is equivalent to the following:

P(X ∈ A | Y ∈ B) = P(X ∈ A)
P(Y ∈ B | X ∈ A) = P(Y ∈ B)

Consequences of independence:

f(x, y) = f_X(x) f_Y(y)
E{h_1(X) h_2(Y)} = E{h_1(X)} E{h_2(Y)}

Independent Random Variables

We can accumulate the variance from independent random variables: let X and Y be independent random variables, then

σ²{X + Y} = E{[(X + Y) − (µ_X + µ_Y)]²}
          = E{(X − µ_X)²} + E{(Y − µ_Y)²} + 2 E{(X − µ_X)(Y − µ_Y)}
          = E{(X − µ_X)²} + E{(Y − µ_Y)²} + 2 E{X − µ_X} E{Y − µ_Y}   (by independence)
          = E{(X − µ_X)²} + E{(Y − µ_Y)²}                             (since E{X − µ_X} = 0)
          = σ²{X} + σ²{Y}

Linear functions of independent random variables: Consider independent random variables Y_1, Y_2, ..., Y_n. Then

σ²{b + \sum_{i=1}^n a_i Y_i} = \sum_{i=1}^n a_i² σ²{Y_i}.
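A quick numerical illustration (my own made-up variances): if X and Y are independent with σ²{X} = 4 and σ²{Y} = 9, then

\sigma^2\{2X - 3Y + 7\} = 2^2 \cdot 4 + (-3)^2 \cdot 9 = 16 + 81 = 97.

Note how the constant 7 contributes nothing and the signs of the coefficients disappear after squaring.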

Random Sample

Consider a random sample X_1, ..., X_n from a population with mean µ and variance σ². In other words, the random variables are independent and E{X_i} = µ and σ²{X_i} = σ².

X̄ = (1/n) \sum_{i=1}^n X_i is the sample mean and

S² = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n - 1} = \frac{\sum_{i=1}^n X_i^2 - n\bar{X}^2}{n - 1}

is the sample variance.

Exercise: Show that E{X̄} = µ and σ²{X̄} = σ²/n. Furthermore, show that E{S²} = σ².

Hint: E{X²} = σ²{X} + (E{X})².
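A sketch of the solution (standard argument; the slide leaves it as an exercise). By linearity of E and the variance rule for independent variables,

E\{\bar{X}\} = \frac{1}{n}\sum_{i=1}^n E\{X_i\} = \mu,
\qquad
\sigma^2\{\bar{X}\} = \frac{1}{n^2}\sum_{i=1}^n \sigma^2\{X_i\} = \frac{\sigma^2}{n}.

Applying the hint to each X_i and to X̄ in the second form of S²,

E\{S^2\} = \frac{n(\sigma^2 + \mu^2) - n(\sigma^2/n + \mu^2)}{n - 1} = \frac{(n-1)\sigma^2}{n - 1} = \sigma^2.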

A Few Probability Models

There are many important probability models that are used in statistics. We will present a few of these models:

- discrete uniform
- normal
- chi-square
- Student's t
- Snedecor's F

A Few Probability Models

Discrete uniform (important for rank-based statistics): We say that X follows a discrete uniform distribution on {1, 2, ..., m} if its probability mass function is f(x) = 1/m for x ∈ {1, 2, ..., m}.

E\{X\} = \sum_{i=1}^m i\,(1/m) = \frac{m+1}{2}

\sigma^2\{X\} = E\{X^2\} - \mu_X^2 = \sum_{i=1}^m i^2\,(1/m) - \left[\frac{m+1}{2}\right]^2 = \frac{m^2 - 1}{12}

Example: Uniform on {1,2,3,4,5}: E{X} = (5+1)/2 = 3 and σ²{X} = (5² − 1)/12 = 2.

Normal Model

A normal random variable X with mean µ and variance σ², i.e. X ~ N(µ, σ²), has the following density:

f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x < \infty

The normal (Gaussian) distribution is often used as an approximate model for quantitative variables. It is often a reasonable model, since many variables are the result of many different additive effects.

Normal Model

Is the normal curve a reasonable model? [Portraits: Abraham de Moivre, Pierre-Simon Laplace]

CAUTION: Assuming normality is not always appropriate! However, due to the Central Limit Theorem, the assumption of normality is often reasonable.

A Few Probability Models

The family of normal distributions is closed under linear transformations of independent normal random variables: if Y_i ~ N(µ_i, σ_i²) for i = 1, ..., n, and Y_1, ..., Y_n are independent, then

W = b + \sum_{i=1}^n a_i Y_i \sim N(E\{W\}, \sigma^2\{W\}),

where E{W} = b + \sum_{i=1}^n a_i µ_i and σ²{W} = \sum_{i=1}^n a_i² σ_i².

Consequences:
[Standardizing] If X ~ N(µ, σ²), then Z = (X − µ)/σ = (1/σ)X − µ/σ ~ N(0,1).
[Sample mean] If X_1, ..., X_n is a random sample from N(µ, σ²), then X̄ = \sum_{i=1}^n X_i/n ~ N(µ, σ²/n).
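The sample-mean result makes probabilities about X̄ easy to compute in SAS. A small sketch with my own numbers (µ = 10, σ = 2, n = 25, so X̄ ~ N(10, 0.4²)):

data xbar;
  /* P(Xbar > 10.5) when Xbar ~ N(10, 0.4^2); here 0.4 = sigma/sqrt(n) = 2/5 */
  p = 1 - cdf('normal', 10.5, 10, 0.4);
run;

proc print data=xbar;
run;

This prints approximately 0.1056, i.e. P(Z > 1.25) after standardizing.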

A Few Probability Models

The following three distributions are the distributions that we will use the most throughout this course:

- chi-square distribution
- t distribution
- F distribution

On the following slides we will remind you how to construct these distributions from the normal distribution.

Chi-Square: χ²

Let Z ~ N(0,1); then Z² ~ χ²(1), that is, a chi-square distribution with ν = 1 degree of freedom.

Theorem: Let U_i be a chi-square random variable with ν = ν_i degrees of freedom, with notation χ²(ν_i). If U_1, ..., U_n are independent, then

U_1 + \dots + U_n \sim \chi^2(\nu_1 + \dots + \nu_n).

Remark: Assume that X_i ~ N(µ, σ²), where µ is known. We can construct a χ² random variable to collect information concerning the unknown variance:

\frac{\sum_{i=1}^n (X_i - \mu)^2}{\sigma^2} = \sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(n).

Chi-Square: χ²

Notation: We will use χ²(ν) to denote a χ² distribution with ν degrees of freedom. However, it will also be used as the notation for a random variable with that distribution. For example, P(χ²(6) > 12.6) = 0.05.

Quantile notation: χ²(A; ν) is the quantity such that P(χ²(ν) ≤ χ²(A; ν)) = A. Thus χ²(0.95; 6) = 12.6.
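SAS produces these quantiles directly, in the style of the earlier probit example (a sketch of mine, not from the slides):

data chisq;
  q = quantile('chisq', 0.95, 6);  /* 95th percentile of a chi-square with 6 df */
run;

proc print data=chisq;
run;

This prints approximately 12.5916, matching the tabled value 12.6.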

Student's t Distribution

Theorem: Let Z ~ N(0,1) and U ~ χ²(ν) be independent random variables; then

T = \frac{Z}{\sqrt{U/\nu}} \sim t(\nu).

Notation: We use t(ν) to denote both a t distribution with ν degrees of freedom and also a random variable with this distribution. Here P(t(10) > 1.812) = 0.05.

Quantile notation: t(0.95; 10) = 1.812.
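The same SAS function covers the t distribution (my sketch; it also confirms the quantile above):

data tq;
  q = quantile('t', 0.95, 10);  /* 95th percentile of a t with 10 df */
run;

proc print data=tq;
run;

This prints approximately 1.8125.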

Random Sample from N(µ, σ²)

Consider X_1, ..., X_n, a random sample from a N(µ, σ²).

The sample mean: X̄ = (1/n) \sum_{i=1}^n X_i
The sample variance: S² = \sum_{i=1}^n (X_i − X̄)²/(n − 1)

Theorem:
- X̄ ~ N(µ, σ²/n)
- (n − 1)S²/σ² ~ χ²(n − 1)
- X̄ and S² are independent.

Standard application from your Intro to Stats class:

T = \frac{\bar{X} - \mu}{\sqrt{S^2/n}} = \frac{(\bar{X} - \mu)/\sqrt{\sigma^2/n}}{\sqrt{[(n-1)S^2/\sigma^2]/(n-1)}} \sim t(n-1).

Snedecor's F Distribution

Theorem: Let U_1 ~ χ²(ν_1) and U_2 ~ χ²(ν_2) be independent random variables; then

F = \frac{U_1/\nu_1}{U_2/\nu_2} \sim F(\nu_1, \nu_2).

Notation: We use F(ν_1, ν_2) to denote both an F distribution with ν_1 degrees of freedom in the numerator and ν_2 degrees of freedom in the denominator, and also a random variable with this distribution. Here P(F(10, 15) > 2.54) = 0.05.

Quantile notation: F(0.95; 10, 15) = 2.54.
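And the F quantile, as a SAS check (my sketch):

data fq;
  q = quantile('F', 0.95, 10, 15);  /* 95th percentile of F(10, 15) */
run;

proc print data=fq;
run;

This prints approximately 2.54, in agreement with the quantile above.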

Snedecor's F Distribution

Let X_1, ..., X_n be a random sample from a N(µ, σ²) population. We showed that

T = \frac{\bar{X} - \mu}{\sqrt{S^2/n}} \sim t(n-1).

Exercise: Show that

T^2 = \left[\frac{\bar{X} - \mu}{\sqrt{S^2/n}}\right]^2 \sim F(1, n-1).
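A sketch of the argument (standard, though the slide leaves it as an exercise): write T = Z/\sqrt{U/(n-1)}, where Z = (X̄ − µ)/\sqrt{σ²/n} ~ N(0,1) and U = (n − 1)S²/σ² ~ χ²(n − 1) are independent. Then

T^2 = \frac{Z^2}{U/(n-1)} = \frac{Z^2/1}{U/(n-1)} \sim F(1,\, n-1),

since Z² ~ χ²(1) and Z² is independent of U.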