Lecture 6 Exploratory data analysis Point and interval estimation

1 Lecture 6: Exploratory data analysis; Point and interval estimation
Dr. Wim P. Krijnen, Lecturer Statistics
University of Groningen, Faculty of Mathematics and Natural Sciences
Johann Bernoulli Institute for Mathematics and Computer Science
October 26, 2010

2 Lecture overview
- Exploratory data analysis
  - Numerical summaries: mean, median
  - Measures of association between variables (Pearson product-moment correlation, Spearman's rank correlation coefficient)
  - Brief overview of exploratory data visualizations: histogram and density plot, quantile-quantile plot, empirical cumulative distribution function, box-and-whiskers plot
- Point estimation by Maximum Likelihood
- Interval estimation

3 Exploratory data analysis
- generates hypotheses (inductive)
- let the data speak
- think well about the data you (don't) have
Assumptions:
1. random samples
2. finite variance
3. population density unchanged under sampling

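4 Numerical summaries
- sample $X_1, \dots, X_n$ (r.v.) has realizations $x_1, \dots, x_n$ ($\in \mathbb{R}$)
- sample mean $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ is a random variable (r.v.) with a distribution
- sample mean with size $n$ possibly infinite: $\hat{\mu} = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ is a fixed number without a distribution; always $E[\bar{X}] = \mu$

| statistic | population | sample estimator (r.v.) | sample estimate (fixed) |
|---|---|---|---|
| mean | $\mu$ | $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ | $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ |
| variance | $\sigma^2$ | $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$ | $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$ |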
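The estimates in the table can be checked against R's built-in functions. The following is a minimal sketch; the data vector is hypothetical and only serves to show that `var()` uses the divisor n - 1.

```r
# Hypothetical data, made up for illustration only
x <- c(2.1, 3.4, 1.9, 5.0, 4.2)
n <- length(x)

x.bar <- sum(x) / n                      # sample mean
s2    <- sum((x - x.bar)^2) / (n - 1)    # sample variance with divisor n - 1

# Both pairs should agree
c(x.bar = x.bar, mean = mean(x), s2 = s2, var = var(x))
```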

5 Determinations of copper in wholemeal flour
- chem: 24 determinations of copper in wholemeal flour (ppm)
- Large study suggests $\mu = 3.68$ (Venables & Ripley, 2002)
- Median = middle value of the data (50% above, 50% below)
- Trimmed mean = mean after leaving out a percentage of the extreme data

> library(MASS)
> c(mean(chem), median(chem))
[1]
> x <- sort(chem, decreasing = TRUE, index.return = TRUE)
> x$x
[1]
[13]
> plot(x$x)
> mean(chem, trim = 1/24)   # exclude smallest and largest value
[1]
> x$x
[1]
[13]
> mean(x$x[2:23])   #=

6 [Figure: sorted chem data]

7 Measure of spread of data
- Range = largest minus smallest
- Sample variance $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$
- Interquartile Range (IQR) = upper quartile - lower quartile
- the lower/upper quartile has 25% / 75% of the values below it

> range(chem)
[1]
> c(var(chem), var(x$x[2:23]))
[1]   # great difference!
> summary(chem)
Min. 1st Qu. Median Mean 3rd Qu. Max.
> IQR(chem)
[1]
> quantile(chem, 3/4) - quantile(chem, 1/4)

8 Chebyshev and empirical rules
$P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2}$
- the probability is at least $1 - 1/k^2$ that X takes a value within k standard deviations of the mean
- it is general, but often imprecise
Empirical rule for approximately normal data
- 68% of observations within 1 standard deviation of the mean
- 95% of observations within 2 sd of the mean
- 99.7% of observations within 3 sd of the mean

> pnorm(1) - pnorm(-1)
[1] 0.6826895
> pnorm(2) - pnorm(-2)
[1] 0.9544997
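To see how loose the Chebyshev bound is for normal data, one can compare it with observed proportions. This is a sketch on simulated standard normal data; the seed and the sample size are arbitrary assumptions.

```r
set.seed(1)                      # arbitrary seed
x <- rnorm(10000)                # simulated standard normal data (assumption)
k <- 1:3

# observed proportion of values within k sample sd of the sample mean
within.k  <- sapply(k, function(kk) mean(abs(x - mean(x)) < kk * sd(x)))
chebyshev <- 1 - 1 / k^2         # Chebyshev lower bound (0 for k = 1)

rbind(k = k, observed = within.k, chebyshev = chebyshev)
```

The observed proportions should lie close to the empirical rule and well above the Chebyshev bound, which holds for any distribution with finite variance.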

9 Measures of association between variables
Correlation coefficient: measure of strength of linear relationship (Pearson)
$\rho = \frac{COV(X, Y)}{\sqrt{V[X]\, V[Y]}} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sqrt{E[(X - \mu_X)^2]\, E[(Y - \mu_Y)^2]}}$
Sample version:
$\hat{\rho} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \cdot \sum_{i=1}^{n} (y_i - \bar{y})^2}}$
Properties
- $-1 \le \rho \le 1$; bounded measure of linear relationship
- if $\rho = \pm 1$ there are $a$ and $b$ such that $Y = aX + b$
- $\rho > 0$: X and Y increase/decrease together
- $\rho$ is not robust against outliers
- under normality $\rho$ measures stochastic dependence; $\rho = 0 \Leftrightarrow$ independence
- $\rho$ is symmetric: $\rho(X, Y) = \rho(Y, X)$
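The sample formula and two of the listed properties can be verified numerically. A minimal sketch on simulated data; the seed, the sample size and the linear relation are assumptions made for illustration.

```r
set.seed(3)                      # arbitrary seed
x <- rnorm(20)
y <- 2 * x + rnorm(20)           # hypothetical linear relation plus noise

# sample correlation written out as in the formula
rho.hat <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

c(by.formula = rho.hat, cor = cor(x, y))   # should agree
cor(y, x)                                  # symmetry
cor(x, 3 * x + 1)                          # exact linear relation gives 1
```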

10 Teaching Demonstrations
Interactive graphical visualizations of the correlation coefficient
Minimize other screens and interactively use the Tk slider

library(TeachingDemos)
run.cor2.examp(n = 500, wait = FALSE)

Sensitivity to outliers: put a few points in a small circle, then add one far away

put.points.demo()

Conclusion: an extreme outlier can have a large influence on a (non-robust) statistic

11 Spearman's rank correlation coefficient
In case of outliers: use the rank correlation coefficient
$\hat{\rho} = \frac{12}{n(n-1)(n+1)} \sum_{i=1}^{n} \left( \mathrm{rank}(x_i) - \frac{n+1}{2} \right) \left( \mathrm{rank}(y_i) - \frac{n+1}{2} \right)$
Example: Is there a correlation between Hunter's L measure of lightness (x) and the consumer panel scores averaged over 80 (y) for 9 lots of canned tuna?

> x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
> y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)

12 Assessment of tuna quality (Hollander & Wolfe, 1973)

> n <- length(x)
> sumr <- sum((rank(x) - (n+1)/2) * (rank(y) - (n+1)/2))
> (rhohat <- 12 * sumr / (n * (n-1) * (n+1)))
[1] 0.6   # value of Spearman rho
> cor.test(x, y, method = "spearman")
Spearman's rank correlation rho
data: x and y
S = 48, p-value =
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.6

- $\hat{\rho}$ has asymptotically a normal distribution (CLT), which gives the p-value
- Conclusion: $H_0: \rho = 0$ not rejected
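As a cross-check of the value 0.6 and of S = 48, Spearman's coefficient can also be obtained as the Pearson correlation of the ranks, or from the sum of squared rank differences. A sketch using the tuna data from the slides; the shortcut formula below assumes there are no ties, which holds for these data.

```r
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6,  3.1,  2.5,  5.0,  3.6,  4.0,  5.2,  2.8,  3.8)
n <- length(x)

rho.rank     <- cor(rank(x), rank(y))             # Pearson correlation of the ranks
rho.spearman <- cor(x, y, method = "spearman")    # built-in Spearman coefficient

S     <- sum((rank(x) - rank(y))^2)               # statistic reported by cor.test
rho.S <- 1 - 6 * S / (n * (n^2 - 1))              # no-ties shortcut formula

c(S = S, rho.rank = rho.rank, rho.spearman = rho.spearman, rho.S = rho.S)
```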

13 Comparing Pearson p.m.c.c. with Spearman's r.c.c.

set.seed(110)
x <- rnorm(15); y <- rnorm(15)   # rho = 0
x[16] <- 10; y[16] <- 10
> c(cor.test(x, y, method = "pearson")$estimate,
+   cor.test(x, y, method = "spearman")$estimate)
cor rho

- Spearman's rank correlation coefficient is more robust against outliers than the Pearson correlation coefficient
- in case of suspicion: check for differences by computation or plotting
- here the normality assumption does not hold

14 Basic visualizations of univariate data sets
- Histogram: estimates the density by presenting (relative) frequencies in consecutive intervals (bins) as heights of bars (hist)
- Density plot: smooth graph representing estimated proportions per bin (plot(density(x)))
- Quantile-Quantile plot: represents as points the quantiles of the first distribution (x-coordinate) against the same quantiles of the second (theoretical) distribution; all points on the line y = x imply a perfect match (qqplot)
- Empirical cumulative distribution function: step function $F_n$ that jumps $i/n$ at each observed value, where $i$ is the number of tied observations at that value (plot(ecdf(x)))
- Box-and-whiskers plot: box between $Q_1$ and $Q_3$, with a line segment for the median $Q_2$, whiskers for minimum and maximum

15 R code for basic visualizations

par(mfrow = c(2, 2))
x <- rnorm(100)
hist(x, freq = FALSE)
qqnorm(x); qqline(x)
plot(ecdf(x))
boxplot(x)
par(mfrow = c(1, 1))

16 [Figure: histogram, normal Q-Q plot, ECDF and box plot]

17 Illustrations of Box-and-whisker plot
- five-number summary of data: minimum, first quartile $Q_1$, median $Q_2$, third quartile $Q_3$, maximum
- box between $Q_1$ and $Q_3$; whiskers from minimum to $Q_1$ and from $Q_3$ to maximum
Example: pulse measures 62, 64, 68, 70, 70, 74, 74, 76, 76, 78, 78, 80
min = 62, $Q_1 = 69$, median = 74, $Q_3 = 77$, max = 80

18 Outlier
Description of outlier: data point far away from the bulk of the data
How far?
- outlier $< Q_1 - 1.5 \cdot IQR$
- outlier $> Q_3 + 1.5 \cdot IQR$
Example: 12 pulse measures 62, 64, 68, 70, 70, 74, 74, 76, 76, 78, 78, 80
$Q_1 = 69$, $Q_3 = 77$, IQR = 77 - 69 = 8
- outlier $< 69 - 1.5 \cdot 8 = 57$
- outlier $> 77 + 1.5 \cdot 8 = 89$
Conclusion: there are no outliers

19 Example of outlier
Example: radish growth in mm after 3 days: 3, 5, 5, 7, 7, 8, 9, 10, 10, 10, 10, 14, 20, 21
median = (9 + 10)/2 = 9.5
$Q_1 = 7$, $Q_3 = 10$, IQR = 10 - 7 = 3
- outlier $< Q_1 - 1.5 \cdot IQR = 7 - 4.5 = 2.5$
- outlier $> Q_3 + 1.5 \cdot IQR = 10 + 4.5 = 14.5$
The outliers 20 and 21 are plotted as small circles
Remark: there are statistical tests for outliers (see library outliers)
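The outlier rule for the pulse and radish examples can be reproduced in a few lines. This sketch takes the quartiles as Tukey's hinges via `fivenum()`, which reproduces the quartiles used on the slides; the helper names `fences` and `outliers` are hypothetical.

```r
# 1.5 * IQR rule with quartiles taken as Tukey's hinges
fences <- function(x, coef = 1.5) {
  q   <- fivenum(x)              # min, lower hinge, median, upper hinge, max
  iqr <- q[4] - q[2]
  c(lower = q[2] - coef * iqr, upper = q[4] + coef * iqr)
}

# values falling outside the fences
outliers <- function(x) {
  f <- fences(x)
  x[x < f["lower"] | x > f["upper"]]
}

pulse  <- c(62, 64, 68, 70, 70, 74, 74, 76, 76, 78, 78, 80)
radish <- c(3, 5, 5, 7, 7, 8, 9, 10, 10, 10, 10, 14, 20, 21)

fences(pulse);  outliers(pulse)    # fences 57 and 89, no outliers
fences(radish); outliers(radish)   # fences 2.5 and 14.5, outliers 20 and 21
```

Note that `quantile()` with its default settings interpolates differently and can give slightly different quartiles for small samples, for example 69.5 instead of 69 for the lower quartile of the pulse data.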

20 Series of box plots
Use a factor with m groups to produce m box plots

> data(PlantGrowth)
> boxplot(PlantGrowth$weight ~ PlantGrowth$group)

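21 Point and interval estimation: Notation
Sample $X_1, \dots, X_n$ (r.v.) with realizations $x_1, \dots, x_n$

| Parameter | population | estimator | sample | estimate |
|---|---|---|---|---|
| type of variable | fixed | random | random | fixed |
| Mean | $\mu$ | $\hat{\mu}$ | $\bar{X}$ | $\bar{x}$ |
| Variance | $\sigma^2$ | $\hat{\sigma}^2$ | $S^2$ | $s^2$ |
| Standard Deviation | $\sigma$ | $\hat{\sigma}$ | $S$ | $s$ |
| Proportion | $\pi$ | $\hat{\pi}$ | $p$ | $p$ |
| Intensity | $\lambda$ | $\hat{\lambda}$ | $l$ | $l$ |

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$, $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$, $S = \sqrt{S^2}$
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$, $s = \sqrt{s^2}$
$p := \frac{\text{number of successes in the sample}}{\text{sample size}} = \frac{n_S}{n}$
$l := \frac{\text{number of counts in the sample}}{\text{sample size}} = \frac{n_C}{n}$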

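22 Maximum likelihood estimation (optional)
- $n$ observations $x_1, \dots, x_n$ from $X_1, \dots, X_n$, iid r.v.
- likelihood of the data given model parameters $\theta$: $L(\theta \mid x) = \prod_{i=1}^{n} P(X_i = x_i \mid \theta)$
- the log likelihood equals $\mathcal{L}(\theta \mid x) = \log L(\theta \mid x) = \sum_{i=1}^{n} \log P(X = x_i \mid \theta)$
- $\hat{\theta}$ is the Maximum Likelihood Estimator (MLE) of $\theta$ if it maximizes the log likelihood
- $\hat{\theta}$ is a statistic: a function of $X_1, \dots, X_n$, hence a random variable
- if the sample size $n$ is large enough, then $\mathcal{L}$ has a maximum
- if $\mathcal{L}$ is differentiable, try to solve $\frac{\partial}{\partial \theta_i} \mathcal{L}(\theta \mid x) = 0$, $i = 1, \dots, m$, or maximize $\mathcal{L}$ numerically (mainly Newton-type algorithms)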

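23 Maximum likelihood estimation (optional)
$\sqrt{n}(\hat{\theta} - \theta) \to N\left(0, I(\theta)^{-1}\right)$, where
$I(\theta) = E\left[ \frac{\partial}{\partial \theta} \log f(X) \right]^2 = \int \left[ \frac{f'(x)}{f(x)} \right]^2 f(x)\, dx$
denotes the information number (matrix): the amount of information about $\theta$ contained in $X$
Example: MLE of the Poisson intensity parameter, $\hat{\lambda} = \bar{X}$
$P(X = x \mid \lambda) = f(x) = \frac{\lambda^x}{x!} e^{-\lambda}$
$\frac{\partial}{\partial \lambda} \log f(x) = \frac{\partial}{\partial \lambda} \log \left( \frac{\lambda^x}{x!} e^{-\lambda} \right) = \frac{\partial}{\partial \lambda} \left( x \log \lambda - \log x! - \lambda \right) = \frac{x}{\lambda} - 1 = \frac{x - \lambda}{\lambda}$
$I(\lambda) = E\left[ \frac{X - \lambda}{\lambda} \right]^2 = \frac{E(X - \lambda)^2}{\lambda^2} = \frac{\lambda}{\lambda^2} = \frac{1}{\lambda}$
$\sqrt{n}(\hat{\lambda} - \lambda) \to N(0, \lambda)$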
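The Poisson example can be checked numerically: maximizing the log likelihood should return the sample mean. A minimal sketch on simulated data; the true intensity, sample size, seed and search interval are all assumptions.

```r
set.seed(42)                     # arbitrary seed
lambda <- 3; n <- 200            # hypothetical true intensity and sample size
x <- rpois(n, lambda)

# minus log-likelihood of an iid Poisson sample
minus.log.l <- function(l) -sum(dpois(x, l, log = TRUE))

# one-dimensional numerical minimization over an assumed search interval
fit <- optimize(minus.log.l, interval = c(0.01, 20))
lambda.hat <- fit$minimum

# approximate standard error from I(lambda) = 1/lambda
se <- sqrt(lambda.hat / n)

c(numerical = lambda.hat, closed.form = mean(x), se = se)
```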

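24 Parameter estimation for normal distribution (optional)
$L(\theta \mid x) = \prod_{i=1}^{n} \frac{1}{\sigma \sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{x_i - \mu}{\sigma} \right)^2 \right\} = \left( \frac{1}{\sigma \sqrt{2\pi}} \right)^n \exp\left\{ -\frac{1}{2} \sum_{i=1}^{n} \left( \frac{x_i - \mu}{\sigma} \right)^2 \right\}$
$\mathcal{L}(\theta \mid x) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2} \sum_{i=1}^{n} \left( \frac{x_i - \mu}{\sigma} \right)^2 = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$
$\frac{\partial}{\partial \mu} \mathcal{L}(\theta \mid x) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0$
$\frac{\partial}{\partial \sigma^2} \mathcal{L}(\theta \mid x) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0$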

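25 (continued)
$\frac{\partial}{\partial \mu} \mathcal{L}(\theta \mid x) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0$
$\Rightarrow \sum_{i=1}^{n} (x_i - \mu) = 0 \Rightarrow \sum_{i=1}^{n} x_i - n\mu = 0 \Rightarrow \mu = \frac{1}{n} \sum_{i=1}^{n} x_i$
$\Rightarrow \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} X_i = \bar{X}$
$\frac{\partial}{\partial \sigma^2} \mathcal{L}(\theta \mid x) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0$
$\Rightarrow \frac{n}{2\sigma^2} = \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 \Rightarrow \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$
$\Rightarrow \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{n-1}{n} S^2$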
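The closed-form solutions can be compared with a numerical maximization of the normal log likelihood. A sketch on simulated data; the seed, the true parameters, the starting values and the log parametrization of the variance are assumptions chosen for illustration.

```r
set.seed(7)                               # arbitrary seed
x <- rnorm(50, mean = 10, sd = 2)         # simulated data (assumption)
n <- length(x)

# closed-form MLEs from the score equations
mu.hat     <- mean(x)
sigma2.hat <- mean((x - mu.hat)^2)        # divisor n, not n - 1

# minus log-likelihood in (mu, log sigma^2); the log keeps sigma^2 positive
minus.log.l <- function(par) {
  mu <- par[1]; sigma2 <- exp(par[2])
  n / 2 * log(2 * pi * sigma2) + sum((x - mu)^2) / (2 * sigma2)
}

# Nelder-Mead search started from robust guesses
fit <- optim(c(median(x), log(mad(x)^2)), minus.log.l)

c(mu.hat = mu.hat, mu.num = fit$par[1],
  sigma2.hat = sigma2.hat, sigma2.num = exp(fit$par[2]),
  from.S2 = (n - 1) / n * var(x))
```

The last entry illustrates the relation between the MLE of the variance and the usual sample variance: they differ only by the factor (n - 1)/n.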

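26 Desirable properties of estimators
- minimal mean squared error (MSE): $E[\hat{\theta} - \theta]^2 = V[\hat{\theta}] + (E[\hat{\theta}] - \theta)^2$
- no bias: $E[\hat{\theta}] = \theta$ ("no systematic error")
- minimal variance: $V[\hat{\theta}] = E(\hat{\theta} - \theta)^2$ (for an unbiased estimator)
- $\hat{\theta}_1$ is more precise (efficient) than $\hat{\theta}_2$ if $V[\hat{\theta}_1] < V[\hat{\theta}_2]$
- $\hat{\theta}$ is efficient if $V[\hat{\theta}]$ is the smallest possible
- $\hat{\theta}$ is consistent if $V[\hat{\theta}] \to 0$ as $n \to \infty$ (LLN holds!)
- the MLE may be slightly biased, but it is consistent and efficient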
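The MSE decomposition follows by adding and subtracting $E[\hat{\theta}]$ inside the square. A short derivation, written out as a sketch:

```latex
\begin{aligned}
E\big[(\hat{\theta} - \theta)^2\big]
  &= E\big[(\hat{\theta} - E[\hat{\theta}] + E[\hat{\theta}] - \theta)^2\big] \\
  &= E\big[(\hat{\theta} - E[\hat{\theta}])^2\big]
     + 2\,(E[\hat{\theta}] - \theta)\,E\big[\hat{\theta} - E[\hat{\theta}]\big]
     + (E[\hat{\theta}] - \theta)^2 \\
  &= V[\hat{\theta}] + (E[\hat{\theta}] - \theta)^2
\end{aligned}
```

The cross term vanishes because $E\big[\hat{\theta} - E[\hat{\theta}]\big] = 0$, so the MSE is the variance plus the squared bias.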

27 Confidence interval on the mean, $\sigma$ known
- $Z$ is normally distributed with mean 0 and variance 1
- the probability that $Z$ takes values between $z_{\alpha/2}$ and $z_{1-\alpha/2}$ is $P\{ z_{\alpha/2} \le Z \le z_{1-\alpha/2} \} = 1 - \alpha$
- this is the basis of confidence intervals!
- remember $\Phi(z \mid 0, 1) = P(Z \le z)$ and $z_{\alpha/2} = \Phi^{-1}(\alpha/2)$ = qnorm(α/2, 0, 1)
If $\alpha = .05$, then
$z_{0.025} = \Phi^{-1}(0.025)$ = qnorm(0.025) = -1.96
$z_{0.975} = \Phi^{-1}(0.975)$ = qnorm(0.975) = 1.96
$P\{ z_{\alpha/2} \le Z \le z_{1-\alpha/2} \} = P\{ -1.96 \le Z \le 1.96 \} = 0.95$

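28 Confidence Interval for $\mu$
- $X_1, \dots, X_n$ iid r.v. from a normal population with mean $\mu$ and variance $\sigma^2$
- $E[\hat{\mu}] = E[\bar{X}] = \mu$ and $V[\hat{\mu}] = V[\bar{X}] = \sigma^2/n$, so that $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is normally distributed with mean 0 and variance 1
$P\left\{ z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{1-\alpha/2} \right\} = 1 - \alpha$
or, equivalently, after some algebra,
$P\left\{ \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \right\} = 1 - \alpha$
- for $\alpha = 0.05$ the interval $\bar{X} \pm z_{1-\alpha/2}\, \sigma/\sqrt{n}$ contains $\mu$ in 95% of repeated samples of size $n$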

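29 Algebra of rewriting the interval
$P\left\{ z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{1-\alpha/2} \right\}$
$= P\left\{ z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \right\}$
$= P\left\{ z_{\alpha/2} \frac{\sigma}{\sqrt{n}} - \bar{X} \le -\mu \le z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} - \bar{X} \right\}$
$= P\left\{ -z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} + \bar{X} \le \mu \le -z_{\alpha/2} \frac{\sigma}{\sqrt{n}} + \bar{X} \right\}$
$= P\left\{ \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \right\}$
using that $z_{\alpha/2} = -z_{1-\alpha/2}$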
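The statement that the interval contains $\mu$ with probability $1 - \alpha$ refers to repeated sampling, which can be illustrated by simulation. A sketch with hypothetical parameter values, an arbitrary seed and an arbitrary number of repetitions.

```r
set.seed(2010)                                  # arbitrary seed
alpha <- 0.05; mu <- 10; sigma <- 2; n <- 35    # hypothetical settings
reps  <- 2000                                   # number of simulated samples
z     <- qnorm(1 - alpha / 2)

# for each simulated sample, record whether the interval covers mu
covered <- replicate(reps, {
  x.bar <- mean(rnorm(n, mu, sigma))
  half  <- z * sigma / sqrt(n)                  # half-width, sigma known
  (x.bar - half <= mu) && (mu <= x.bar + half)
})

coverage <- mean(covered)
coverage                                        # should be close to 1 - alpha
```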

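30 CI Notation in the literature
$P\left\{ \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \right\} = 1 - \alpha$
The $1 - \alpha$ confidence interval $I$ is equal to
$\left[ \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \right]$
$= \left[ \hat{\mu} + \Phi^{-1}(\alpha/2 \mid 0, 1) \frac{\sigma}{\sqrt{n}},\ \hat{\mu} + \Phi^{-1}(1 - \alpha/2 \mid 0, 1) \frac{\sigma}{\sqrt{n}} \right]$
$= \left[ \Phi^{-1}\left( \alpha/2 \mid \hat{\mu}, \frac{\sigma}{\sqrt{n}} \right),\ \Phi^{-1}\left( 1 - \alpha/2 \mid \hat{\mu}, \frac{\sigma}{\sqrt{n}} \right) \right]$
- this confidence interval is denoted by $I_{1-\alpha}(X \mid \hat{\mu})$
- the unknown population mean $\mu$ is estimated by $\hat{\mu}$ on the basis of data $x_1, \dots, x_n$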

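31 Three confidence intervals
True interval:
$I_{1-\alpha}(X \mid \mu, \sigma^2) = \left[ \mu + z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \mu + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \right] = \left[ \Phi^{-1}\left( \alpha/2 \mid \mu, \frac{\sigma}{\sqrt{n}} \right),\ \Phi^{-1}\left( 1 - \alpha/2 \mid \mu, \frac{\sigma}{\sqrt{n}} \right) \right]$
Estimated interval, $\sigma$ known:
$I_{1-\alpha}(X \mid \hat{\mu}, \sigma^2) = \left[ \hat{\mu} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \hat{\mu} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \right] = \left[ \Phi^{-1}\left( \alpha/2 \mid \hat{\mu}, \frac{\sigma}{\sqrt{n}} \right),\ \Phi^{-1}\left( 1 - \alpha/2 \mid \hat{\mu}, \frac{\sigma}{\sqrt{n}} \right) \right]$
Estimated interval, $\sigma$ unknown (most relevant!):
$I_{1-\alpha}(X \mid \hat{\mu}, \hat{\sigma}^2) = \left[ \hat{\mu} + z_{\alpha/2} \frac{\hat{\sigma}}{\sqrt{n}},\ \hat{\mu} + z_{1-\alpha/2} \frac{\hat{\sigma}}{\sqrt{n}} \right] = \left[ \Phi^{-1}\left( \alpha/2 \mid \hat{\mu}, \frac{\hat{\sigma}}{\sqrt{n}} \right),\ \Phi^{-1}\left( 1 - \alpha/2 \mid \hat{\mu}, \frac{\hat{\sigma}}{\sqrt{n}} \right) \right]$
Illustration by simulation example from book: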

32
alpha <- 0.05; mu <- 10; sigma <- 2; n <- 35
set.seed(222); x <- rnorm(n, mu, sigma)
mu.hat <- mean(x); s <- sd(x)
# true interval: mu and sigma known
I.mu <- c(low  = qnorm(alpha/2, mu, sigma/sqrt(n)),
          high = qnorm(1 - alpha/2, mu, sigma/sqrt(n)))
# estimated interval: mu estimated, sigma known
I.mu.hat <- c(low  = qnorm(alpha/2, mu.hat, sigma/sqrt(n)),
              high = qnorm(1 - alpha/2, mu.hat, sigma/sqrt(n)))
# estimated interval: mu and sigma estimated
I.mu.sigma.hat <- c(low  = qnorm(alpha/2, mu.hat, s/sqrt(n)),
                    high = qnorm(1 - alpha/2, mu.hat, s/sqrt(n)))
round(rbind("true interval" = I.mu,
            "estimated interval, sigma known" = I.mu.hat,
            "estimated interval, sigma unknown" = I.mu.sigma.hat), 2)
                                  low high
true interval
estimated interval, sigma known
estimated interval, sigma unknown

33 Computing CI by MLE for Geometric parameter
- Sample 10 trials x until first success occurs
- Estimate $\pi$ and its SE by MLE
- Construct CI

library(MASS)
pi <- 0.1; alpha <- 0.05; n <- 10   # note: this masks R's built-in constant pi
x <- rgeom(n, pi)
fit <- fitdistr(x, "geometric")
pihat <- fit$estimate
se <- fit$sd
> pihat + c(-1, 1) * qnorm(1 - alpha/2) * se
[1]

34 Example MLE: CI for mean daily energy intake
Daily energy intake (Altman, 1991, p.183) of a group of women; recommended intake 7725 kJ

library(MASS)
alpha <- 0.05
x <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515,
       8230, 8770)
fit <- fitdistr(x, "normal")
muhat <- as.numeric(fit$estimate[1])
semuhat <- fit$sd[1]
lower <- as.numeric(muhat + qnorm(alpha/2) * semuhat)
upper <- as.numeric(muhat + qnorm(1 - alpha/2) * semuhat)
> round(c(muhat = muhat, lower = lower, upper = upper), 1)
muhat lower upper

Conclusion: We are 95% certain that the population mean is in (6110.1, )

35 Example MLE: estimation using mle
Daily energy intake (Altman, 1991, p.183) of a group of women; recommended intake 7725 kJ

library(stats4)
x <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515,
       8230, 8770)
log.l <- function(mu = 7000, sigma = 1000) {
  # minus log-likelihood of the normal density
  n <- length(x)
  return(n * log(2 * pi * sigma^2) / 2 +
         sum((x - mu)^2 / (2 * sigma^2)))
}
fit <- mle(log.l)

36 Example MLE: output estimates and CI
Recommended energy intake is 7725 kJ

> summary(fit)
Maximum likelihood estimation
Call:
mle(minuslogl = log.l)
Coefficients:
      Estimate Std. Error
mu
sigma
-2 log L:
> confint(fit)
Profiling...
      2.5 % 97.5 %
mu
sigma

Conclusion: We are 95% certain the population mean energy intake is in ( , )

37 Remarks on Confidence Interval
Remarks on CI
- the true interval, centered around $\mu$, is fixed
- the estimated intervals ($\sigma$ known or unknown), centered around $\hat{\mu}$, have random limits converging to the true ones
Effects on CI
- $\alpha$ decreases: confidence level $1 - \alpha$ increases, CI length increases
- $n$ increases: standard error $s/\sqrt{n}$ decreases, CI length decreases
Teaching demonstration of CI
Interactive graphical visualization of confidence intervals:

library(TeachingDemos)
run.ci.examp(reps = 100, method = "z", n = 35)

38 Proportions
- sex ratio, success ratio, ratio of surviving patients
- $N$ population size, $N_S$ number of successes in the population
- $n$ sample size, $n_S$ number of successes in the sample
$\pi = \frac{N_S}{N}$, $\hat{\pi} = p = \frac{n_S}{n}$
- the number of successes in the sample has a binomial density
- the proportion $p$ is approximated by a normal density if $np \ge 5$ and $n(1 - p) \ge 5$
- $E[p] = \pi$, $V[p] = \frac{\pi(1 - \pi)}{n} = \frac{\sigma^2}{n}$
- by the central limit theorem: density of $p$ $\approx$ normal density $\phi\left( p \mid \pi, \sqrt{\frac{\pi(1 - \pi)}{n}} \right)$

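39 CI for Proportions
$Z = \frac{\hat{\pi} - \pi}{\sigma/\sqrt{n}}$ tends to normal with mean 0 and variance 1
$P\left\{ z_{\alpha/2} \le \frac{\hat{\pi} - \pi}{\sigma/\sqrt{n}} \le z_{1-\alpha/2} \right\} = 1 - \alpha$
after some algebra
$P\left\{ \hat{\pi} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \pi \le \hat{\pi} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \right\} = 1 - \alpha$
For $\alpha = 0.05$ the interval $\hat{\pi} \pm z_{1-\alpha/2}\, \sigma/\sqrt{n}$ contains $\pi$ in 95% of repeated samples of size $n$;
$I_{1-\alpha}(X \mid \hat{\pi}, \sigma^2) = \left[ \hat{\pi} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \hat{\pi} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \right] = \left[ \Phi^{-1}\left( \alpha/2 \mid \hat{\pi}, \sqrt{\frac{\pi(1 - \pi)}{n}} \right),\ \Phi^{-1}\left( 1 - \alpha/2 \mid \hat{\pi}, \sqrt{\frac{\pi(1 - \pi)}{n}} \right) \right]$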

40 Computation of CI for Proportions
$I_{1-\alpha}(X \mid \hat{\pi}, \sigma^2) = \left[ \Phi^{-1}\left( \alpha/2 \mid \hat{\pi}, \sqrt{\frac{\pi(1 - \pi)}{n}} \right),\ \Phi^{-1}\left( 1 - \alpha/2 \mid \hat{\pi}, \sqrt{\frac{\pi(1 - \pi)}{n}} \right) \right]$
estimated by

c(qnorm(alpha/2, pi.hat, sqrt(pi.hat*(1-pi.hat)/n)),
  qnorm(1-alpha/2, pi.hat, sqrt(pi.hat*(1-pi.hat)/n)))

Example: 39 patients out of 215 have asthma (Altman, 1991)
Confidence interval for proportion

n <- 215; n.s <- 39; pi.hat <- n.s/n; alpha <- 0.05
round(c(low  = qnorm(alpha/2, pi.hat, sqrt(pi.hat*(1-pi.hat)/n)),
        high = qnorm(1-alpha/2, pi.hat, sqrt(pi.hat*(1-pi.hat)/n))), 3)
low high

41 Comparison of CI for Proportions

> library(Hmisc)
> round(binconf(n.s, n, method = "all"), 2)
           PointEst Lower Upper
Exact
Wilson
Asymptotic

Recommendation: Use Wilson (cf. L.D. Brown, T.T. Cai and A. DasGupta (2001). Interval estimation for a binomial proportion (with discussion). Statistical Science, 16.)
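For readers without Hmisc, the Wilson interval can be sketched in base R. The formula below is the standard Wilson score interval, and the comparison relies on `prop.test()` without continuity correction, which computes its confidence interval by inverting the score test. The variable names are assumptions; the asthma counts are those from the slides.

```r
n <- 215; n.s <- 39; alpha <- 0.05          # asthma example from the slides
pi.hat <- n.s / n
z <- qnorm(1 - alpha / 2)

# Wilson score interval written out
centre <- (pi.hat + z^2 / (2 * n)) / (1 + z^2 / n)
half   <- z * sqrt(pi.hat * (1 - pi.hat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
wilson <- c(centre - half, centre + half)

# score-test interval from base R, no continuity correction
score <- as.numeric(prop.test(n.s, n, correct = FALSE)$conf.int)

# asymptotic (Wald) interval as used on the previous slide
wald <- pi.hat + c(-1, 1) * z * sqrt(pi.hat * (1 - pi.hat) / n)

round(rbind(wilson = wilson, prop.test = score, asymptotic = wald), 3)
```

The Wilson interval is not centered at the point estimate: its centre is pulled slightly toward 1/2, which is one reason it behaves better for small samples or extreme proportions.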

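42 Bootstrap
- take 1000 random samples from the sample
- compute $\hat{\theta}_i$ from each re-sample
- compute the mean of $\hat{\theta}_1, \dots, \hat{\theta}_{1000}$
- compute quantiles of $\hat{\theta}_1, \dots, \hat{\theta}_{1000}$
- compute a histogram or density from $\hat{\theta}_1, \dots, \hat{\theta}_{1000}$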

43 Example: Daily energy intake
Daily energy intake (Altman, 1991, p.183) of a group of women; recommended intake 7725 kJ

x <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515,
       8230, 8770)
n <- length(x)
nboot <- 1000; bs <- double(nboot)
for (i in 1:nboot) {
  resample <- x[sample(1:n, replace = TRUE)]
  bs[i] <- mean(resample)   # boot statistic
}
mu.0 <- 7725; x.bar <- mean(x); x.bar.boot <- mean(bs)
> round(c(mu.0 = mu.0, x.bar = x.bar, x.bar.boot = x.bar.boot))
mu.0 x.bar x.bar.boot

Sample mean and bootstrap mean are much smaller than the recommended intake

44
hist(bs, freq = FALSE, xlim = c(5500, 8000), col = "lightblue",
     main = "Histogram and density curve",
     sub = "bootstrap mea")   # subtitle is cut off in the source
lines(density(bs)); abline(v = 7725)
mtext("7725", side = 1, at = 7725, cex = 1)
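The "compute quantiles" step of the bootstrap recipe gives a simple interval estimate. The following sketch computes a percentile bootstrap interval for the mean energy intake; the seed is arbitrary, and the percentile method is one of several bootstrap interval constructions, not necessarily the one intended in the lecture.

```r
x <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515,
       8230, 8770)

set.seed(123)                                   # arbitrary seed
nboot <- 1000

# bootstrap distribution of the sample mean
bs <- replicate(nboot, mean(sample(x, replace = TRUE)))

# percentile interval: the 2.5% and 97.5% quantiles of the bootstrap means
ci <- quantile(bs, c(0.025, 0.975))

c(x.bar = mean(x), x.bar.boot = mean(bs), ci)
```

Comparing the upper limit with the recommended 7725 kJ gives a bootstrap counterpart to the normal-theory interval computed earlier.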


More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Empirical Likelihood

Empirical Likelihood Empirical Likelihood Patrick Breheny September 20 Patrick Breheny STA 621: Nonparametric Statistics 1/15 Introduction Empirical likelihood We will discuss one final approach to constructing confidence

More information

Chapter 3. Data Description

Chapter 3. Data Description Chapter 3. Data Description Graphical Methods Pie chart It is used to display the percentage of the total number of measurements falling into each of the categories of the variable by partition a circle.

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives F78SC2 Notes 2 RJRC Algebra It is useful to use letters to represent numbers. We can use the rules of arithmetic to manipulate the formula and just substitute in the numbers at the end. Example: 100 invested

More information

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

More information

Bootstrapping Spring 2014

Bootstrapping Spring 2014 Bootstrapping 18.05 Spring 2014 Agenda Bootstrap terminology Bootstrap principle Empirical bootstrap Parametric bootstrap January 1, 2017 2 / 16 Empirical distribution of data Data: x 1, x 2,..., x n (independent)

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Describing Distributions with Numbers

Describing Distributions with Numbers Topic 2 We next look at quantitative data. Recall that in this case, these data can be subject to the operations of arithmetic. In particular, we can add or subtract observation values, we can sort them

More information

Hypothesis testing: theory and methods

Hypothesis testing: theory and methods Statistical Methods Warsaw School of Economics November 3, 2017 Statistical hypothesis is the name of any conjecture about unknown parameters of a population distribution. The hypothesis should be verifiable

More information

CSE 312: Foundations of Computing II Quiz Section #10: Review Questions for Final Exam (solutions)

CSE 312: Foundations of Computing II Quiz Section #10: Review Questions for Final Exam (solutions) CSE 312: Foundations of Computing II Quiz Section #10: Review Questions for Final Exam (solutions) 1. (Confidence Intervals, CLT) Let X 1,..., X n be iid with unknown mean θ and known variance σ 2. Assume

More information

Some Assorted Formulae. Some confidence intervals: σ n. x ± z α/2. x ± t n 1;α/2 n. ˆp(1 ˆp) ˆp ± z α/2 n. χ 2 n 1;1 α/2. n 1;α/2

Some Assorted Formulae. Some confidence intervals: σ n. x ± z α/2. x ± t n 1;α/2 n. ˆp(1 ˆp) ˆp ± z α/2 n. χ 2 n 1;1 α/2. n 1;α/2 STA 248 H1S MIDTERM TEST February 26, 2008 SURNAME: SOLUTIONS GIVEN NAME: STUDENT NUMBER: INSTRUCTIONS: Time: 1 hour and 50 minutes Aids allowed: calculator Tables of the standard normal, t and chi-square

More information

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu Home Work: 1 1. Describe the sample space when a coin is tossed (a) once, (b) three times, (c) n times, (d) an infinite number of times. 2. A coin is tossed until for the first time the same result appear

More information

Common Discrete Distributions

Common Discrete Distributions Common Discrete Distributions Statistics 104 Autumn 2004 Taken from Statistics 110 Lecture Notes Copyright c 2004 by Mark E. Irwin Common Discrete Distributions There are a wide range of popular discrete

More information

MATH4427 Notebook 4 Fall Semester 2017/2018

MATH4427 Notebook 4 Fall Semester 2017/2018 MATH4427 Notebook 4 Fall Semester 2017/2018 prepared by Professor Jenny Baglivo c Copyright 2009-2018 by Jenny A. Baglivo. All Rights Reserved. 4 MATH4427 Notebook 4 3 4.1 K th Order Statistics and Their

More information

Deccan Education Society s FERGUSSON COLLEGE, PUNE (AUTONOMOUS) SYLLABUS UNDER AUTOMONY. SECOND YEAR B.Sc. SEMESTER - III

Deccan Education Society s FERGUSSON COLLEGE, PUNE (AUTONOMOUS) SYLLABUS UNDER AUTOMONY. SECOND YEAR B.Sc. SEMESTER - III Deccan Education Society s FERGUSSON COLLEGE, PUNE (AUTONOMOUS) SYLLABUS UNDER AUTOMONY SECOND YEAR B.Sc. SEMESTER - III SYLLABUS FOR S. Y. B. Sc. STATISTICS Academic Year 07-8 S.Y. B.Sc. (Statistics)

More information

HT Introduction. P(X i = x i ) = e λ λ x i

HT Introduction. P(X i = x i ) = e λ λ x i MODS STATISTICS Introduction. HT 2012 Simon Myers, Department of Statistics (and The Wellcome Trust Centre for Human Genetics) myers@stats.ox.ac.uk We will be concerned with the mathematical framework

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

CHAPTER 1. Introduction

CHAPTER 1. Introduction CHAPTER 1 Introduction Engineers and scientists are constantly exposed to collections of facts, or data. The discipline of statistics provides methods for organizing and summarizing data, and for drawing

More information

Statistics I Chapter 2: Univariate data analysis

Statistics I Chapter 2: Univariate data analysis Statistics I Chapter 2: Univariate data analysis Chapter 2: Univariate data analysis Contents Graphical displays for categorical data (barchart, piechart) Graphical displays for numerical data data (histogram,

More information

Loglikelihood and Confidence Intervals

Loglikelihood and Confidence Intervals Stat 504, Lecture 2 1 Loglikelihood and Confidence Intervals The loglikelihood function is defined to be the natural logarithm of the likelihood function, l(θ ; x) = log L(θ ; x). For a variety of reasons,

More information

Introduction to Error Analysis

Introduction to Error Analysis Introduction to Error Analysis Part 1: the Basics Andrei Gritsan based on lectures by Petar Maksimović February 1, 2010 Overview Definitions Reporting results and rounding Accuracy vs precision systematic

More information

MAT Mathematics in Today's World

MAT Mathematics in Today's World MAT 1000 Mathematics in Today's World Last Time 1. Three keys to summarize a collection of data: shape, center, spread. 2. Can measure spread with the fivenumber summary. 3. The five-number summary can

More information

Statistical Computing with R MATH , Set 6 (Monte Carlo Methods in Statistical Inference)

Statistical Computing with R MATH , Set 6 (Monte Carlo Methods in Statistical Inference) Statistical Computing with R MATH 6382 1, Set 6 (Monte Carlo Methods in Statistical Inference) Tamer Oraby UTRGV tamer.oraby@utrgv.edu 1 Based on textbook. Last updated November 14, 2016 Tamer Oraby (University

More information

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1 Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 3 Statistics for Describing, Exploring, and Comparing Data 3-1 Overview 3-2 Measures

More information

Robust statistics. Michael Love 7/10/2016

Robust statistics. Michael Love 7/10/2016 Robust statistics Michael Love 7/10/2016 Robust topics Median MAD Spearman Wilcoxon rank test Weighted least squares Cook's distance M-estimators Robust topics Median => middle MAD => spread Spearman =>

More information

Introduction and Single Predictor Regression. Correlation

Introduction and Single Predictor Regression. Correlation Introduction and Single Predictor Regression Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Correlation A correlation

More information

Unit 14: Nonparametric Statistical Methods

Unit 14: Nonparametric Statistical Methods Unit 14: Nonparametric Statistical Methods Statistics 571: Statistical Methods Ramón V. León 8/8/2003 Unit 14 - Stat 571 - Ramón V. León 1 Introductory Remarks Most methods studied so far have been based

More information

Statistics I Chapter 2: Univariate data analysis

Statistics I Chapter 2: Univariate data analysis Statistics I Chapter 2: Univariate data analysis Chapter 2: Univariate data analysis Contents Graphical displays for categorical data (barchart, piechart) Graphical displays for numerical data data (histogram,

More information

Descriptive Data Summarization

Descriptive Data Summarization Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identify the presence of noise or outliers, which is useful for successful data cleaning

More information

Announcements. Lecture 1 - Data and Data Summaries. Data. Numerical Data. all variables. continuous discrete. Homework 1 - Out 1/15, due 1/22

Announcements. Lecture 1 - Data and Data Summaries. Data. Numerical Data. all variables. continuous discrete. Homework 1 - Out 1/15, due 1/22 Announcements Announcements Lecture 1 - Data and Data Summaries Statistics 102 Colin Rundel January 13, 2013 Homework 1 - Out 1/15, due 1/22 Lab 1 - Tomorrow RStudio accounts created this evening Try logging

More information

Stat 427/527: Advanced Data Analysis I

Stat 427/527: Advanced Data Analysis I Stat 427/527: Advanced Data Analysis I Review of Chapters 1-4 Sep, 2017 1 / 18 Concepts you need to know/interpret Numerical summaries: measures of center (mean, median, mode) measures of spread (sample

More information

Probability and Estimation. Alan Moses

Probability and Estimation. Alan Moses Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.

More information

IE 230 Probability & Statistics in Engineering I. Closed book and notes. 120 minutes.

IE 230 Probability & Statistics in Engineering I. Closed book and notes. 120 minutes. Closed book and notes. 10 minutes. Two summary tables from the concise notes are attached: Discrete distributions and continuous distributions. Eight Pages. Score _ Final Exam, Fall 1999 Cover Sheet, Page

More information

1. Exploratory Data Analysis

1. Exploratory Data Analysis 1. Exploratory Data Analysis 1.1 Methods of Displaying Data A visual display aids understanding and can highlight features which may be worth exploring more formally. Displays should have impact and be

More information

Mathematical statistics

Mathematical statistics October 4 th, 2018 Lecture 12: Information Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation Chapter

More information

Overview. Confidence Intervals Sampling and Opinion Polls Error Correcting Codes Number of Pet Unicorns in Ireland

Overview. Confidence Intervals Sampling and Opinion Polls Error Correcting Codes Number of Pet Unicorns in Ireland Overview Confidence Intervals Sampling and Opinion Polls Error Correcting Codes Number of Pet Unicorns in Ireland Confidence Intervals When a random variable lies in an interval a X b with a specified

More information

BNG 495 Capstone Design. Descriptive Statistics

BNG 495 Capstone Design. Descriptive Statistics BNG 495 Capstone Design Descriptive Statistics Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential statistical methods, with a focus

More information

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests Chapters 3.5.1 3.5.2, 3.3.2 Prof. Tesler Math 283 Fall 2018 Prof. Tesler z and t tests for mean Math

More information

Elements of statistics (MATH0487-1)

Elements of statistics (MATH0487-1) Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium November 12, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -

More information

Class 26: review for final exam 18.05, Spring 2014

Class 26: review for final exam 18.05, Spring 2014 Probability Class 26: review for final eam 8.05, Spring 204 Counting Sets Inclusion-eclusion principle Rule of product (multiplication rule) Permutation and combinations Basics Outcome, sample space, event

More information

Basic Concepts of Inference

Basic Concepts of Inference Basic Concepts of Inference Corresponds to Chapter 6 of Tamhane and Dunlop Slides prepared by Elizabeth Newton (MIT) with some slides by Jacqueline Telford (Johns Hopkins University) and Roy Welsch (MIT).

More information

Chapter 2 Descriptive Statistics

Chapter 2 Descriptive Statistics Chapter 2 Descriptive Statistics Lecture 1: Measures of Central Tendency and Dispersion Donald E. Mercante, PhD Biostatistics May 2010 Biostatistics (LSUHSC) Chapter 2 05/10 1 / 34 Lecture 1: Descriptive

More information

GEOMETRIC -discrete A discrete random variable R counts number of times needed before an event occurs

GEOMETRIC -discrete A discrete random variable R counts number of times needed before an event occurs STATISTICS 4 Summary Notes. Geometric and Exponential Distributions GEOMETRIC -discrete A discrete random variable R counts number of times needed before an event occurs P(X = x) = ( p) x p x =,, 3,...

More information

20 Hypothesis Testing, Part I

20 Hypothesis Testing, Part I 20 Hypothesis Testing, Part I Bob has told Alice that the average hourly rate for a lawyer in Virginia is $200 with a standard deviation of $50, but Alice wants to test this claim. If Bob is right, she

More information

Exploratory data analysis

Exploratory data analysis Exploratory data analysis November 29, 2017 Dr. Khajonpong Akkarajitsakul Department of Computer Engineering, Faculty of Engineering King Mongkut s University of Technology Thonburi Module III Overview

More information

Package jmuoutlier. February 17, 2017

Package jmuoutlier. February 17, 2017 Type Package Package jmuoutlier February 17, 2017 Title Permutation Tests for Nonparametric Statistics Version 1.3 Date 2017-02-17 Author Steven T. Garren [aut, cre] Maintainer Steven T. Garren

More information

Nonparametric hypothesis tests and permutation tests

Nonparametric hypothesis tests and permutation tests Nonparametric hypothesis tests and permutation tests 1.7 & 2.3. Probability Generating Functions 3.8.3. Wilcoxon Signed Rank Test 3.8.2. Mann-Whitney Test Prof. Tesler Math 283 Fall 2018 Prof. Tesler Wilcoxon

More information

Lecture 12: Small Sample Intervals Based on a Normal Population Distribution

Lecture 12: Small Sample Intervals Based on a Normal Population Distribution Lecture 12: Small Sample Intervals Based on a Normal Population MSU-STT-351-Sum-17B (P. Vellaisamy: MSU-STT-351-Sum-17B) Probability & Statistics for Engineers 1 / 24 In this lecture, we will discuss (i)

More information