REVIEW: Midterm Exam Spring 2012
Introduction Important Definitions: - Data - Statistics - A Population - A census - A sample
Types of Data
- Parameter (describes a characteristic of the population)
- Statistic (describes a characteristic of the sample)
- QUALITATIVE DATA (categorical or attribute data)
- QUANTITATIVE DATA: discrete or continuous
Levels of Measurement:
- Nominal
- Ordinal
- Interval
- Ratio
Design of Experiments
- An observational study (we don't attempt to modify the subjects)
- An experiment (treatment group vs. control group)
Types of Observational Studies:
- Cross-sectional
- Retrospective (or case-control)
- Prospective (or longitudinal or cohort)
Problems
- Confounding (confusion of the variables' effects)
How to solve this problem?
- Blinding (placebo effect, single-blind, double-blind)
- Blocking
- Randomization: completely randomized design, randomized block design
Sampling strategies
- Random sample
- Simple random sample (assumed throughout the book)
- Systematic sampling
- Convenience sampling
- Stratified sampling
- Cluster sampling
Sampling error: difference between a sample result and the true population result
Nonsampling error: sample data incorrectly collected
Important characteristics of data Center Variation Distribution Outliers Time
Frequency distribution
Counts of data values, individually or by groups of intervals
Other forms:
- Relative frequency distribution (divide each class frequency by the total of all frequencies)
- Cumulative frequency distribution (cumulative totals)
Histogram: graphical representation of the frequency distribution
Other graphs
- Relative frequency histogram
- Frequency polygon
- Dotplots
- Stem-and-leaf plots
- Pareto chart
- Pie charts
- Scatter diagrams
- Time series graphs
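As a minimal sketch of the three forms listed on this slide, the following builds a frequency, relative frequency, and cumulative frequency table for individual values. The data set and the function name `frequency_table` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(values):
    """Return rows of (value, frequency, relative frequency, cumulative frequency)."""
    counts = Counter(values)
    total = sum(counts.values())
    ordered = sorted(counts)                  # distinct values in ascending order
    freqs = [counts[v] for v in ordered]
    cumulative = list(accumulate(freqs))      # running totals of the frequencies
    return [(v, f, f / total, c) for v, f, c in zip(ordered, freqs, cumulative)]


# Hypothetical data: number of siblings reported by 10 students
rows = frequency_table([0, 1, 1, 2, 2, 2, 3, 1, 0, 2])
for row in rows:
    print(row)
```

The last cumulative frequency always equals the total number of values, which is a quick check on the table.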
Examples: Histogram and Scatter plot
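Measures of center
Sample mean: $\bar{x} = \frac{\sum x}{n}$
Median: middle value
Mode: most frequent value (a data set can be bimodal, multimodal, or have no mode)
Midrange = $\frac{\text{maximum value} + \text{minimum value}}{2}$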
Skewed distributions
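Measures of Variation
Range = (maximum value) - (minimum value)
Sample standard deviation (variation from the mean): $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Population standard deviation: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Measures of Variation (Cont.)
Sample coefficient of variation: $CV = \frac{s}{\bar{x}} \cdot 100\%$
Population coefficient of variation: $CV = \frac{\sigma}{\mu} \cdot 100\%$
Range Rule of Thumb: $s \approx \frac{\text{range}}{4}$
Minimum usual value: (mean) - 2 x (standard deviation)
Maximum usual value: (mean) + 2 x (standard deviation)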
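The measures of variation can be computed with Python's standard `statistics` module, as in this short sketch. The data values are hypothetical; `stdev` divides by n - 1 (sample) and `pstdev` divides by N (population).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical sample

mean = statistics.mean(data)              # sample mean
s = statistics.stdev(data)                # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(data)           # population standard deviation (divides by N)
cv = s / mean * 100                       # sample coefficient of variation, in percent
data_range = max(data) - min(data)
range_rule_s = data_range / 4             # range rule of thumb estimate of s
min_usual = mean - 2 * s
max_usual = mean + 2 * s

print(mean, s, sigma, cv, range_rule_s, min_usual, max_usual)
```

For this small data set the range rule of thumb gives only a rough estimate of s, which is all it is meant to provide.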
Rule of data with Bell-Shaped distribution About 68% of all values fall within 1 standard deviation of the mean About 95% of all values fall within 2 standard deviations of the mean About 99.7% of all values fall within 3 standard deviations of the mean
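The three percentages can be checked against the standard normal distribution. This sketch assumes the bell-shaped distribution is exactly normal and uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} standard deviation(s): {area:.4f}")
```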
Z Scores
Sample: $z = \frac{x - \bar{x}}{s}$
Population: $z = \frac{x - \mu}{\sigma}$
Ordinary values: $-2 \le z \text{ score} \le 2$
Unusual values: z score < -2 or z score > 2
Quartiles and Percentiles
Quartiles: separate a data set into four parts
- Q1 (first): separates the bottom 25% of the sorted values from the top 75%
- Q2 (second): same as the median
- Q3 (third): separates the bottom 75% of the sorted values from the top 25%
Percentiles: separate the data into 100 parts ($P_1, P_2, \ldots, P_{99}$)
Percentile value of x = $\frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100$
Interquartile range = $Q_3 - Q_1$
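A z score and the "unusual value" test translate directly into code. The mean of 100 and standard deviation of 15 below are hypothetical values used only to exercise the functions.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def is_unusual(x, mean, sd):
    """A value is unusual when its z score is below -2 or above 2."""
    return abs(z_score(x, mean, sd)) > 2


# Hypothetical distribution with mean 100 and standard deviation 15
print(z_score(135, 100, 15), is_unusual(135, 100, 15))
print(z_score(110, 100, 15), is_unusual(110, 100, 15))
```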
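The percentile of a value and the interquartile range can be sketched as follows. The data are hypothetical. Note that quartile conventions differ between textbooks and software: `statistics.quantiles` uses its own default interpolation method, so its Q1 and Q3 may not match a hand calculation done with the textbook procedure.

```python
import statistics


def percentile_of(x, values):
    """Percentile of x: share of the values that are less than x, times 100."""
    below = sum(1 for v in values if v < x)
    return below / len(values) * 100


data = list(range(1, 21))                         # hypothetical sorted data: 1, 2, ..., 20

pct = percentile_of(16, data)                     # 15 of the 20 values are below 16
q1, q2, q3 = statistics.quantiles(data, n=4)      # one of several quartile conventions
iqr = q3 - q1

print(pct, q1, q2, q3, iqr)
```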
Boxplots
Probability Definitions: An event A simple event The Sample Space Notation P: Probability A,B and C: specific events P(A): Probability of event A occurring
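Definitions of Probability
Frequency approximation: $P(A) = \frac{\text{number of times A occurred}}{\text{number of times the procedure was repeated}}$
Classical approach (equally likely outcomes): $P(A) = \frac{\text{number of ways A can occur}}{\text{number of different simple events}}$
Subjective probability: P(A) is estimated from knowledge of the relevant circumstances
LAW OF LARGE NUMBERS: as a procedure is repeated many times, the relative frequency probability tends to the actual probability
Properties of probability
- Probability of an impossible event is 0
- Probability of an event that is certain is 1
- For any event A, $0 \le P(A) \le 1$
- P(complement of event A) = $P(\bar{A}) = 1 - P(A)$
Addition Rule: P(A or B) = P(in a single trial, event A occurs or event B occurs or they both occur) = P(A) + P(B) - P(A and B)
Or: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Events A and B are disjoint if $P(A \cap B) = 0$
Multiplication Rule
P(A and B) = P(event A occurs in a first trial and event B occurs in a second trial) = $P(A) \cdot P(B \mid A)$
Independent events: $P(B \mid A) = P(B)$
If A and B are independent: $P(A \text{ and } B) = P(A) \cdot P(B)$
Conditional probability: $P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$
Bayes Theorem
$P(A \mid B) = \frac{P(A) \cdot P(B \mid A)}{[P(A) \cdot P(B \mid A)] + [P(\bar{A}) \cdot P(B \mid \bar{A})]}$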
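A short worked example of the complement and addition rules, using exact fractions. The card-drawing scenario (one card from a standard 52-card deck) is an illustration that is not part of the slides.

```python
from fractions import Fraction


def p_complement(p_a):
    """P(not A) = 1 - P(A)."""
    return 1 - p_a


def p_or(p_a, p_b, p_a_and_b):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


# Draw one card: A = ace, B = heart, (A and B) = ace of hearts
p_ace = Fraction(4, 52)
p_heart = Fraction(13, 52)
p_ace_of_hearts = Fraction(1, 52)

p_ace_or_heart = p_or(p_ace, p_heart, p_ace_of_hearts)
print(p_ace_or_heart, p_complement(p_ace))
```

Subtracting P(A and B) keeps the ace of hearts from being counted twice.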
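The multiplication rule and conditional probability can be illustrated with exact fractions. The scenario (drawing two cards without replacement from a standard deck) is a hypothetical example, not taken from the slides.

```python
from fractions import Fraction

# A = first card is an ace, B = second card is an ace (no replacement)
p_a = Fraction(4, 52)
p_b_given_a = Fraction(3, 51)        # one ace is gone, 51 cards remain

# Multiplication rule: P(A and B) = P(A) * P(B | A)
p_a_and_b = p_a * p_b_given_a

# Conditional probability recovered from the joint: P(B | A) = P(A and B) / P(A)
recovered = p_a_and_b / p_a

# The unconditional P(B) is also 4/52, so P(B | A) != P(B): the draws are dependent
independent = (p_b_given_a == Fraction(4, 52))

print(p_a_and_b, recovered, independent)
```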
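Bayes' theorem for an event A and its complement can be written as a small function. The numbers in the example (a 1% prevalence, a 95% true-positive rate, and a 5% false-positive rate) are hypothetical.

```python
def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) from P(A), P(B | A) and P(B | not A)."""
    numerator = p_a * p_b_given_a
    denominator = numerator + (1 - p_a) * p_b_given_not_a
    return numerator / denominator


# Hypothetical screening test: A = has the condition, B = test is positive
p_condition_given_positive = bayes(p_a=0.01, p_b_given_a=0.95, p_b_given_not_a=0.05)
print(round(p_condition_given_positive, 4))
```

Even with an accurate test, the posterior probability stays low here because the condition is rare.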
Probability distributions
Definitions:
- Random variable (x): numerical value given to an outcome of a procedure. Example: number of mountain lions seen on the UCSC campus last year
- Probability distribution (P(x)): gives the probability of each value of the random variable
Types of random variables:
- Discrete
- Continuous
Requirements of a Probability distribution
- $\sum P(x) = 1$ (discrete case)
- $0 \le P(x) \le 1$
Expected value of a discrete random variable: $E = \sum [x \cdot P(x)]$
Discrete distributions:
- Binomial
- Poisson
Binomial distribution
Requirements:
- Fixed number of trials
- Trials are independent
- Each trial can be a success or a failure
- Probabilities remain constant
Random variable: x = number of successes among n trials
$P(x) = \frac{n!}{(n - x)! \, x!} \cdot p^x \cdot q^{n - x}$ (you can also use the Binomial Table)
n = number of trials
p = probability of success in one trial
q = probability of failure in one trial (q = 1 - p)
Mean, Variance and Standard deviation of the Binomial distribution
Mean: $\mu = np$
Variance: $\sigma^2 = npq$
Standard deviation: $\sigma = \sqrt{npq}$
Maximum usual value: $\mu + 2\sigma$
Minimum usual value: $\mu - 2\sigma$
Poisson distribution
Requirements:
- Random variable x is the number of occurrences of an event over some interval
- The occurrences must be random
- The occurrences must be independent
$P(x) = \frac{\mu^x \cdot e^{-\mu}}{x!}$
The Poisson distribution only depends on $\mu$ (the mean of the process)!
Mean, Variance and Standard deviation of the Poisson distribution
Mean: $\mu$
Variance: $\sigma^2 = \mu$
Standard deviation: $\sigma = \sqrt{\mu}$
Maximum usual value: $\mu + 2\sigma$
Minimum usual value: $\mu - 2\sigma$
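Both requirements and the expected value can be checked in a few lines. The distribution below (number of heads in two tosses of a fair coin) is a hypothetical example; the function names are illustrative.

```python
import math


def is_valid_distribution(dist):
    """Check that every P(x) is in [0, 1] and that the probabilities sum to 1."""
    probs = dist.values()
    return all(0 <= p <= 1 for p in probs) and math.isclose(sum(probs), 1.0)


def expected_value(dist):
    """E = sum of x * P(x) over all values x."""
    return sum(x * p for x, p in dist.items())


# x = number of heads in two tosses of a fair coin
heads = {0: 0.25, 1: 0.5, 2: 0.25}
print(is_valid_distribution(heads), expected_value(heads))
```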
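A minimal sketch of the binomial probability formula and its mean, standard deviation, and usual-value limits, using `math.comb` for n! / ((n - x)! x!). The values n = 10 and p = 0.5 are hypothetical.

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    """P(x successes in n independent trials with success probability p)."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)


n, p = 10, 0.5                      # hypothetical: 10 tosses of a fair coin
q = 1 - p

mu = n * p                          # mean
sigma = sqrt(n * p * q)             # standard deviation
min_usual = mu - 2 * sigma
max_usual = mu + 2 * sigma

print(binomial_pmf(5, n, p), mu, sigma, min_usual, max_usual)
```

With these values, getting 9 or 10 heads falls above the maximum usual value and would count as unusual.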
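The Poisson formula needs only the mean. This sketch assumes a hypothetical mean of 2 occurrences per interval.

```python
from math import exp, factorial, sqrt


def poisson_pmf(x, mu):
    """P(x occurrences in an interval) when the mean number of occurrences is mu."""
    return mu**x * exp(-mu) / factorial(x)


mu = 2                              # hypothetical mean number of occurrences per interval
sigma = sqrt(mu)                    # standard deviation is the square root of the mean
min_usual = mu - 2 * sigma
max_usual = mu + 2 * sigma

print(poisson_pmf(0, mu), poisson_pmf(2, mu), sigma, min_usual, max_usual)
```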
Continuous distributions
- Uniform distribution
- Normal distribution
Density curve: graph of a continuous distribution
Properties:
- The area below the curve is equal to 1
- All points on the curve are greater than or equal to zero
Uniform and Normal distributions
Sampling distributions
Sampling variability: variation of the value of a statistic from sample to sample
- Sampling distribution of the sample mean
- Sampling distribution of the sample proportion
CENTRAL LIMIT THEOREM: The random variable x has a distribution (normal or not) with mean $\mu$ and standard deviation $\sigma$. The distribution of the sample means approaches a normal distribution as the sample size increases.
Mean and standard deviation of the sample mean
Mean: $\mu_{\bar{x}} = \mu$
Standard deviation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Normal approximation to the Binomial
If $np \ge 5$ and $nq \ge 5$, a Binomial random variable x can be approximated with a Normal distribution with mean and standard deviation:
Mean: $\mu = np$
Standard deviation: $\sigma = \sqrt{npq}$
Confidence Interval for the Population Proportion (p)
p = population proportion
$\hat{p} = \frac{x}{n}$ = sample proportion of successes
$\hat{q} = 1 - \hat{p}$ = sample proportion of failures
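The two results for the sample mean can be verified exactly on a tiny population by listing every possible sample. This sketch assumes sampling with replacement from a hypothetical population (the faces of a die) and samples of size n = 2.

```python
from itertools import product
from math import isclose, sqrt
from statistics import mean, pstdev

population = [1, 2, 3, 4, 5, 6]      # hypothetical population: faces of a fair die
n = 2                                # sample size

# Every possible sample of size n, drawn with replacement (36 samples)
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)
sigma = pstdev(population)
mu_xbar = mean(sample_means)         # mean of the sample means
sigma_xbar = pstdev(sample_means)    # standard deviation of the sample means

print(mu, mu_xbar)
print(sigma / sqrt(n), sigma_xbar)
print(isclose(sigma_xbar, sigma / sqrt(n)))
```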
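The quality of the approximation can be checked by comparing an exact binomial probability with its normal counterpart. The values n = 100 and p = 0.5 are hypothetical. The 0.5 added to the cutoff is a continuity correction, which is an assumption of this sketch and is not stated on the slide.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5                      # hypothetical binomial parameters
q = 1 - p

if n * p < 5 or n * q < 5:
    raise ValueError("normal approximation requirement np >= 5 and nq >= 5 not met")

mu = n * p
sigma = sqrt(n * p * q)

# Exact P(x <= 55) from the binomial formula
exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(0, 56))

# Normal approximation with a continuity correction (assumption, see lead-in)
approx = NormalDist(mu, sigma).cdf(55 + 0.5)

print(mu, sigma, exact, approx)
```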
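Procedure to build a CI of confidence level $(1 - \alpha)$
1) Check the normal approximation to the Binomial distribution ($np \ge 5$ and $nq \ge 5$)
2) Get the critical value $z_{\alpha/2}$
3) Evaluate the margin of error: $E = z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}\hat{q}}{n}}$
4) Confidence interval: $\hat{p} - E < p < \hat{p} + E$, or $\hat{p} \pm E$, or $(\hat{p} - E, \hat{p} + E)$
5) Interpret results
Sample size for estimating proportion p
$\hat{p}$ is given: $n = \frac{[z_{\alpha/2}]^2 \, \hat{p}\hat{q}}{E^2}$
$\hat{p}$ is not given ($\hat{p}$ is assumed = 0.5): $n = \frac{[z_{\alpha/2}]^2 \cdot 0.25}{E^2}$
Finding $\hat{p}$ (point estimate) and E from the Confidence Interval
Point estimate: $\hat{p} = \frac{(\text{upper confidence limit}) + (\text{lower confidence limit})}{2}$
Margin of error: $E = \frac{(\text{upper confidence limit}) - (\text{lower confidence limit})}{2}$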
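Steps 1 to 4 of the procedure can be sketched as one function. The counts (60 successes in 100 trials) are hypothetical, and since the population proportion is unknown, the requirement check uses the sample proportion in place of p, which is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95):
    """Return (lower limit, upper limit, margin of error) for a population proportion."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # Step 1: requirement check, using the sample proportion as a stand-in for p
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("normal approximation requirement not met")
    # Step 2: critical value z_{alpha/2}
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 3: margin of error
    e = z * sqrt(p_hat * q_hat / n)
    # Step 4: confidence interval
    return p_hat - e, p_hat + e, e


low, high, e = proportion_ci(x=60, n=100, confidence=0.95)
print(round(low, 3), round(high, 3), round(e, 3))
```

Step 5, the interpretation, stays with the reader: here we would be 95% confident that the interval from `low` to `high` contains the true population proportion.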
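Both sample size cases fit in one function. The margin of error of 0.03 and the prior estimate of 0.2 are hypothetical inputs; rounding the result up to the next whole number is the usual convention and is an assumption here, since the slide does not state it.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(e, confidence=0.95, p_hat=None):
    """Sample size needed to estimate a proportion with margin of error e."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # With no prior estimate, p_hat * q_hat is replaced by 0.25 (p_hat assumed 0.5)
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return ceil(z**2 * pq / e**2)    # round up (assumed convention)


n_unknown = sample_size_proportion(e=0.03)              # no prior estimate of p
n_known = sample_size_proportion(e=0.03, p_hat=0.2)     # hypothetical prior estimate
print(n_unknown, n_known)
```

Using 0.25 gives the largest possible sample size, so it is the safe choice when nothing is known about p.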
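Recovering the point estimate and the margin of error from a reported interval is simple arithmetic. The limits 0.504 and 0.696 are hypothetical.

```python
from math import isclose


def point_estimate_and_margin(lower, upper):
    """Return (point estimate, margin of error) from the two confidence limits."""
    estimate = (upper + lower) / 2    # midpoint of the interval
    margin = (upper - lower) / 2      # half the width of the interval
    return estimate, margin


p_hat, e = point_estimate_and_margin(lower=0.504, upper=0.696)
print(p_hat, e)
print(isclose(p_hat, 0.6), isclose(e, 0.096))
```

The same function applies to an interval for a population mean, where the midpoint is the sample mean.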
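Confidence Interval for the Population Mean ($\mu$): $\sigma$ known
Check requirements:
- Sample is a simple random sample
- Population standard deviation $\sigma$ is known
- Population is normally distributed or n > 30
Procedure
1) Check normality requirements
2) Get the critical value $z_{\alpha/2}$
3) Evaluate the margin of error: $E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
4) Confidence interval: $\bar{x} - E < \mu < \bar{x} + E$, or $\bar{x} \pm E$, or $(\bar{x} - E, \bar{x} + E)$
5) Interpret results
Sample size for estimating Mean $\mu$
$n = \left[\frac{z_{\alpha/2} \cdot \sigma}{E}\right]^2$
Values of $z_{\alpha/2}$, $\sigma$ and E are given.
Confidence Interval for the Population Mean ($\mu$): $\sigma$ not known
In this case we use the Student t distribution with n-1 degrees of freedom
Check requirements:
- Sample is a simple random sample
- Population standard deviation is estimated by s (sample standard deviation)
- Population is normally distributed or n > 30
Procedure
1) Check normality requirements
2) Get the critical value $t_{\alpha/2}$ with n-1 degrees of freedom
3) Evaluate the margin of error: $E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$
4) Confidence interval: $\bar{x} - E < \mu < \bar{x} + E$, or $\bar{x} \pm E$, or $(\bar{x} - E, \bar{x} + E)$
5) Interpret results
Finding point estimate and E from Confidence Interval
Point estimate of $\mu$: $\bar{x} = \frac{(\text{upper confidence limit}) + (\text{lower confidence limit})}{2}$
Margin of error: $E = \frac{(\text{upper confidence limit}) - (\text{lower confidence limit})}{2}$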
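A sketch of the procedure when the population standard deviation is known. The sample mean, standard deviation, and sample size below are hypothetical, and the function assumes the requirements have already been checked.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_sigma_known(x_bar, sigma, n, confidence=0.95):
    """Return (lower limit, upper limit, margin of error) using the z critical value."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)    # critical value z_{alpha/2}
    e = z * sigma / sqrt(n)                    # margin of error
    return x_bar - e, x_bar + e, e


# Hypothetical sample: mean 50 from n = 36 values, known sigma = 12
low, high, e = mean_ci_sigma_known(x_bar=50, sigma=12, n=36)
print(round(low, 2), round(high, 2), round(e, 2))
```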
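The sample size formula for a mean, with hypothetical inputs (standard deviation 15, desired margin of error 3). Rounding up to the next whole number is the usual convention and is an assumption of this sketch.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma, e, confidence=0.95):
    """Sample size needed to estimate a mean with margin of error e."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / e) ** 2)    # round up (assumed convention)


n_required = sample_size_mean(sigma=15, e=3)
print(n_required)
```

Halving the margin of error roughly quadruples the required sample size, because E appears squared in the denominator.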
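A sketch of the same procedure with the Student t distribution. The Python standard library has no t distribution, so the critical value is passed in as an argument, looked up in a t table. The sample values are hypothetical, and 2.064 is the table value for a 95% confidence level with 24 degrees of freedom.

```python
from math import sqrt


def mean_ci_sigma_unknown(x_bar, s, n, t_critical):
    """Return (lower limit, upper limit, margin of error) using a t critical value.

    t_critical must be looked up for n - 1 degrees of freedom (it is not computed here).
    """
    e = t_critical * s / sqrt(n)    # margin of error
    return x_bar - e, x_bar + e, e


# Hypothetical sample from a normal population: n = 25, mean 50, s = 10
# t_{0.025} with 24 degrees of freedom, from a t table
low, high, e = mean_ci_sigma_unknown(x_bar=50, s=10, n=25, t_critical=2.064)
print(round(low, 3), round(high, 3), round(e, 3))
```

Because n is below 30 in this example, the interval is only valid if the population is normally distributed.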