Class 04 - Statistical Inference

Question 1: What parameters control the shape of the normal distribution? Make some histograms of different normal distributions; in each, alter the parameter values in a systematic way to understand how these control the shape of the distribution. Interpret your results in words using the terms precision and central tendency.

**Google: "How do I make a normal distribution in R". The first item that comes up is the R help. The second is a page from R-bloggers (a great site for learning statistics); it has a table with runnable code and explains all of rnorm, dnorm, etc.

All the information you need for this question is provided:
1. You will need to make histograms (see the Data Visualization in R section in the R Course Condensed documentation).
2. I ask you to evaluate precision and central tendency. Big clue here: this is the var/sd and the mean.

# Here is some code to help you. You will copy the code and paste it in -
# I have written it as a function...
mean.eval.fun <- function(mean. = seq(1, 5)) {
  n.val <- 1000  # draws per distribution (assumed; the original value was lost in extraction)
  sim.mat <- matrix(NA, nrow = n.val, ncol = length(mean.))
  par(mfrow = c(ceiling(length(mean.) / 2), 2))
  # Simulate one column of draws for each mean value
  for (j in 1:length(mean.)) {
    sim.mat[, j] <- rnorm(n = n.val, mean = mean.[j], sd = 1)
  }
  # Plot each histogram on a shared x-axis and mark the true mean in red
  for (j in 1:length(mean.)) {
    hist(sim.mat[, j], xlim = range(sim.mat),
         main = paste("mean value of distribution = ", mean.[j]))
    abline(v = mean.[j], col = "red", lwd = 2)
  }
}

# Copy this code into your console. You will notice that in your
# history window mean.eval.fun will show up as a function...
# Now you can change the argument "mean."
mean.eval.fun(mean. = c(1, 3, 5))
mean.eval.fun(mean. = seq(10, 14))

[Figure: three histograms on a shared x-axis, titled "Mean value of distribution = 1", "Mean value of distribution = 3", and "Mean value of distribution = 5"]

# I would recommend not plotting too many at one time...

[Figure: five histograms on a shared x-axis, titled "Mean value of distribution = 10" through "Mean value of distribution = 14"]

So, now we can see what happens to the distribution when we change the mean: the mean is the measure of central tendency.

Here is a function to evaluate how changing the sd impacts the distribution.

# Here is some code to help you. You will copy the code and paste it in -
# I have written it as a function...
sd.eval.fun <- function(sd. = seq(1, 5)) {
  n.val <- 1000  # draws per distribution (assumed; the original value was lost in extraction)
  sim.mat <- matrix(NA, nrow = n.val, ncol = length(sd.))
  par(mfrow = c(ceiling(length(sd.) / 2), 2))
  # Simulate one column of draws for each sd value, holding the mean at 0
  for (j in 1:length(sd.)) {
    sim.mat[, j] <- rnorm(n = n.val, mean = 0, sd = sd.[j])
  }
  # Plot each histogram on a shared x-axis so the change in spread is visible
  for (j in 1:length(sd.)) {
    hist(sim.mat[, j], xlim = range(sim.mat),
         main = paste("st. Dev. value of distribution = ", sd.[j]))
  }
}

# Copy this code into your console. You will notice that in your
# history window sd.eval.fun will show up as a function...
# Now you can change the argument "sd."
sd.eval.fun(sd. = c(1, 3, 5))
sd.eval.fun(sd. = seq(10, 14))

[Figure: three histograms on a shared x-axis, titled "St. Dev. value of distribution = 1", "St. Dev. value of distribution = 3", and "St. Dev. value of distribution = 5"]

# I would recommend not plotting too many at one time...
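To put numbers on what the histograms show, here is a minimal sketch that simulates a few normal distributions and reports the sample mean (central tendency), the sample sd, and the precision. Precision is taken here as 1/variance, which is one common convention; the handout itself only points to the var/sd. The seed, the number of draws, and the parameter pairs are illustration values, not values from the handout.

```r
set.seed(42)  # hypothetical seed, only so the sketch is reproducible

# Assumed illustration values; none of these come from the handout
n.draws <- 10000
params <- data.frame(mean = c(0, 5, 0), sd = c(1, 1, 3))

# For each (mean, sd) pair, simulate draws and record the sample mean,
# the sample sd, and the precision (1 / sample variance)
check.mat <- t(sapply(seq_len(nrow(params)), function(i) {
  x <- rnorm(n = n.draws, mean = params$mean[i], sd = params$sd[i])
  c(samp.mean = mean(x), samp.sd = sd(x), precision = 1 / var(x))
}))

# One row per distribution: true parameters next to simulated summaries
print(cbind(params, round(check.mat, 2)))
```

Changing the mean moves the center without changing the spread, while a larger sd widens the distribution and lowers the precision.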

[Figure: five histograms on a shared x-axis, titled "St. Dev. value of distribution = 10" through "St. Dev. value of distribution = 14"]

Question 2: Create a vector consisting of random draws from a normal distribution with (mean = 2, sd = 1) with at least 2 samples.
a. Take 5 samples (without replacement) from this distribution and calculate some summary statistics.
b. Now take an increasingly large number of samples (without replacement), n = 8, 1,15,.2. For each iteration of random draws record the summary statistics.

I gave you some starting values, but you may want to play around. In the function below I find that taking more samples seems to give a more satisfactory result.

Okay, so basically this is an evaluation of how sample size influences summary statistics. Summary stats are descriptive statistics that describe the characteristics of distributions.

# Here is another function to help us evaluate this:
# (the population size and default sample sizes below are assumed;
# the original values were lost in extraction)
samp.eval.fun <- function(samples. = seq(100, 5000, by = 200), mean.val = 2, sd.val = 1) {
  norm.dist <- rnorm(n = 10000, mean = mean.val, sd = sd.val)
  sum.mat <- matrix(NA, nrow = length(samples.), ncol = 4)
  # For each sample size, draw without replacement and store mean, sd, min, max
  for (j in 1:length(samples.)) {
    samp.vect <- sample(x = norm.dist, size = samples.[j], replace = FALSE)
    sum.mat[j, 1] <- mean(samp.vect)
    sum.mat[j, 2] <- sd(samp.vect)
    sum.mat[j, 3] <- min(samp.vect)
    sum.mat[j, 4] <- max(samp.vect)
  }
  # Sample mean against sample size, with the true mean in red
  plot(samples., sum.mat[, 1], xlab = "Number of Samples",
       ylab = "Mean of Samples", type = "b")
  abline(h = mean.val, col = "red", lwd = 2)
  # Sample sd against sample size, with the true sd in red
  plot(samples., sum.mat[, 2], xlab = "Number of Samples",
       ylab = "SD of Samples", type = "b")
  abline(h = sd.val, col = "red", lwd = 2)
}

# Copy this code into your console. You will notice that in your
# history window samp.eval.fun will show up as a function...
# Now you can change the argument "samples."
samp.eval.fun()

[Figure: "Mean of Samples" (y-axis) plotted against "Number of Samples" (x-axis)]
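The sample mean settles toward the true mean as the sample size grows because its standard error shrinks as sd / sqrt(n). The sketch below checks this by simulation. It is not the handout's method: it draws fresh samples directly with rnorm rather than sampling without replacement from a stored vector, and the seed, sample sizes, and number of repetitions are assumed values chosen for illustration.

```r
set.seed(1)  # hypothetical seed, only so the sketch is reproducible

mean.val <- 2                       # distribution used in Question 2
sd.val <- 1
sample.sizes <- c(10, 100, 1000)    # assumed sizes, chosen for illustration
n.reps <- 2000                      # assumed number of repeated samples per size

# Empirical standard error: the sd of the sample mean across repeated samples
emp.se <- sapply(sample.sizes, function(n) {
  sd(replicate(n.reps, mean(rnorm(n, mean = mean.val, sd = sd.val))))
})

# Theoretical standard error of the mean: sd / sqrt(n)
theo.se <- sd.val / sqrt(sample.sizes)

print(data.frame(n = sample.sizes,
                 empirical = round(emp.se, 3),
                 theoretical = round(theo.se, 3)))
```

The two columns should track each other closely, and both fall by roughly a factor of sqrt(10) each time n increases tenfold.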

[Figure: "SD of Samples" (y-axis) plotted against "Number of Samples" (x-axis)]

Question 3: Make a vector of 1 randomly drawn numbers from a normal distribution.
a. Calculate the z-scores of each value.
b. Plot the z-scores and the vector of numbers.

**So, z-scores are not mysterious... this is just asking about the relationship between values and z-scores.

# Here is another function to help us evaluate this:
vals.zscores <- function() {
  rnorm.vect <- rnorm(10)  # vector length assumed; the original value was lost in extraction
  # z-score = (value - mean) / sd, plotted against the original values
  plot(rnorm.vect, (rnorm.vect - mean(rnorm.vect)) / sd(rnorm.vect),
       xlab = "Original Vector", ylab = "Z-score")
  abline(h = 0, col = "red", lwd = 2)                 # z-scores are centered on 0
  abline(v = mean(rnorm.vect), col = "red", lwd = 2)  # mean of the original values
}

# Copy this code into your console. You will notice that in your
# history window vals.zscores will show up as a function...
vals.zscores()

[Figure: scatterplot of "Z-score" (y-axis) against "Original Vector" (x-axis)]
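The relationship between values and z-scores can also be checked numerically. The sketch below computes z-scores by hand and with base R's scale(), then confirms that they have mean 0 and sd 1 and that they are a perfectly linear function of the original values. The seed and the example vector (100 draws with mean = 2, sd = 1) are assumptions for illustration, not values from the handout.

```r
set.seed(7)  # hypothetical seed, only so the sketch is reproducible

x <- rnorm(100, mean = 2, sd = 1)  # assumed example vector

# z-score by hand: subtract the sample mean, divide by the sample sd
z.manual <- (x - mean(x)) / sd(x)

# scale() centers on the mean and divides by the sd by default;
# as.numeric() drops the matrix shape and attributes it returns
z.scale <- as.numeric(scale(x))

print(all.equal(z.manual, z.scale))               # both approaches agree
print(c(mean = mean(z.manual), sd = sd(z.manual)))  # mean near 0, sd of 1
print(cor(x, z.manual))                           # perfectly linear relationship
```

Because a z-score is only a shift and a rescale of the original value, the scatterplot of z-scores against values is always a straight line with slope 1/sd(x).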