Class 04 - Statistical Inference

Question 1: What parameters control the shape of the normal distribution? Make some histograms of different normal distributions; in each, alter the parameter values in a systematic way to understand how they control the shape of the distribution. Interpret your results in words using the terms precision and central tendency.

**Google "How do I make a normal distribution in R". The first item that comes up is the R help. The second is from R-bloggers (a great site for learning statistics); it has a table with runnable code and explains rnorm, dnorm, etc.

All the information you need for this question is provided:
1. You will need to make histograms (see the Data Visualization in R section in the R Course Condensed documentation).
2. I ask you to evaluate precision and central tendency. Big clue here: these are the var/sd and the mean.

    # Here is some code to help you. You will copy the code and paste it in -
    # I have written it as a function...
    mean.eval.fun <- function(mean. = seq(1, 5)) {
      # Number of draws per distribution. The value is garbled in this copy;
      # 1000 is an assumed stand-in.
      n.val <- 1000
      sim.mat <- matrix(NA, nrow = n.val, ncol = length(mean.))
      par(mfrow = c(ceiling(length(mean.) / 2), 2))
      # Simulate one column of draws per mean value
      for (j in 1:length(mean.)) {
        sim.mat[, j] <- rnorm(n = n.val, mean = mean.[j], sd = 1)
      }
      # Plot every histogram on a common x-axis and mark the true mean in red
      for (j in 1:length(mean.)) {
        hist(sim.mat[, j], xlim = range(sim.mat),
             main = paste("Mean value of distribution = ", mean.[j]))
        abline(v = mean.[j], col = "red", lwd = 2)
      }
    }

    # Copy this code into your console. You will notice that in your
    # history window mean.eval.fun will show up as a function...
    # Now you can change the argument "mean."
    mean.eval.fun(mean. = c(1, 3, 5))
    mean.eval.fun(mean. = seq(10, 14))

[Figure: histograms for mean = 1, 3, and 5, each with a red vertical line at the mean]

    # I would recommend not plotting too many at one time...

[Figure: histograms for the larger mean values (panels up to mean = 14)]

So, now we can see what happens to the distribution when we change the mean: the whole distribution shifts along the x-axis while its shape stays the same. The mean is the measure of central tendency.

Here is a function to evaluate how changing the sd impacts the distribution:

    # Here is some code to help you. You will copy the code and paste it in -
    # I have written it as a function...
    sd.eval.fun <- function(sd. = seq(1, 5)) {
      # Number of draws per distribution. The value is garbled in this copy;
      # 1000 is an assumed stand-in.
      n.val <- 1000
      sim.mat <- matrix(NA, nrow = n.val, ncol = length(sd.))
      par(mfrow = c(ceiling(length(sd.) / 2), 2))
      # Simulate one column of draws per sd value, all centred on 0
      for (j in 1:length(sd.)) {
        sim.mat[, j] <- rnorm(n = n.val, mean = 0, sd = sd.[j])
      }
      # Plot every histogram on a common x-axis so the spreads are comparable
      for (j in 1:length(sd.)) {
        hist(sim.mat[, j], xlim = range(sim.mat),
             main = paste("St. Dev. value of distribution = ", sd.[j]))
      }
    }

    # Copy this code into your console. You will notice that in your
    # history window sd.eval.fun will show up as a function...
    # Now you can change the argument "sd."
    sd.eval.fun(sd. = c(1, 3, 5))
    sd.eval.fun(sd. = seq(10, 14))

[Figure: histograms for St. Dev. = 1, 3, and 5 on a common x-axis]

    # I would recommend not plotting too many at one time...
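As a numerical complement to the histograms, the sketch below summarises draws from normal distributions with different means and standard deviations. It is only an illustration: the helper name `norm.summary`, the sample size, and the reading of "precision" as 1/variance (a common convention that the handout does not spell out) are all assumptions.

```r
# Hypothetical helper (not part of the handout): summarise draws from a
# normal distribution with the given mean and sd.
norm.summary <- function(mean.val, sd.val, n.val = 10000) {
  x <- rnorm(n = n.val, mean = mean.val, sd = sd.val)
  c(mean = mean(x),          # central tendency
    sd = sd(x),              # spread
    precision = 1 / var(x))  # precision, assumed here to mean 1/variance
}

set.seed(1)  # for reproducibility

# Changing the mean moves the centre; sd and precision stay roughly constant.
res.mean <- t(sapply(c(1, 3, 5), norm.summary, sd.val = 1))

# Changing the sd leaves the centre near 0; precision falls as the sd grows.
res.sd <- t(sapply(c(1, 3, 5), norm.summary, mean.val = 0))

print(round(res.mean, 3))
print(round(res.sd, 3))
```

Each row of the two tables corresponds to one parameter value, so the effect of the mean and of the sd can be read down the columns.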

[Figure: histograms for the larger sd values (panels up to St. Dev. = 14)]

Question 2: Create a vector consisting of random draws from a normal distribution (mean = 2, sd = 1) with at least 2 samples.
a. Take 5 samples (without replacement) from this distribution and calculate some summary statistics.
b. Now take an increasingly large number of samples (without replacement), n = 8, 1,15,.2. For each iteration of random draws, record the summary statistics.

I gave you some starting values, but you may want to play around. With the function below, I find that taking more samples seems to give a more satisfactory result.

Okay, so basically this is an evaluation of how sample size influences summary statistics. Summary stats are descriptive statistics that describe the characteristics of distributions.

    # Here is another function to help us evaluate this.
    # The default sample sizes and the population size are garbled in this copy;
    # seq(10, 500, by = 20) and 10000 are assumed stand-ins.
    samp.eval.fun <- function(samples. = seq(10, 500, by = 20),
                              mean.val = 2, sd.val = 1) {
      # The "population" that every sample is drawn from
      norm.dist <- rnorm(n = 10000, mean = mean.val, sd = sd.val)
      # One row per sample size; columns hold mean, sd, min, max
      sum.mat <- matrix(NA, nrow = length(samples.), ncol = 4)
      for (j in 1:length(samples.)) {
        samp.vect <- sample(x = norm.dist, size = samples.[j], replace = FALSE)
        sum.mat[j, 1] <- mean(samp.vect)
        sum.mat[j, 2] <- sd(samp.vect)
        sum.mat[j, 3] <- min(samp.vect)
        sum.mat[j, 4] <- max(samp.vect)
      }
      # Sample mean against sample size, with the true mean in red
      plot(samples., sum.mat[, 1], xlab = "Number of Samples",
           ylab = "Mean of Samples", type = "b")
      abline(h = mean.val, col = "red", lwd = 2)
      # Sample sd against sample size, with the true sd in red
      plot(samples., sum.mat[, 2], xlab = "Number of Samples",
           ylab = "SD of Samples", type = "b")
      abline(h = sd.val, col = "red", lwd = 2)
    }

    # Copy this code into your console. You will notice that in your
    # history window samp.eval.fun will show up as a function...
    # Now you can change the argument "samples."
    samp.eval.fun()

[Figure: Mean of Samples versus Number of Samples, with a red horizontal line at the true mean]
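One way to see why the estimates settle down as the sample grows is the standard error of the mean, sd / sqrt(n). The sketch below is an illustration under stated assumptions, not part of the handout: the name `se.check`, the sample sizes, and the number of replicates are made up, and it draws directly from the normal distribution rather than sampling without replacement from a fixed vector.

```r
# Hypothetical check (not from the handout): how the spread of the sample
# mean shrinks with sample size, compared with sd / sqrt(n).
se.check <- function(samples. = c(10, 40, 160, 640), mean.val = 2,
                     sd.val = 1, n.rep = 2000) {
  out <- sapply(samples., function(n) {
    # n.rep independent samples of size n, keeping one mean per sample
    means <- replicate(n.rep, mean(rnorm(n = n, mean = mean.val, sd = sd.val)))
    c(n = n,
      empirical.se = sd(means),             # observed spread of the means
      theoretical.se = sd.val / sqrt(n))    # standard error formula
  })
  t(out)  # one row per sample size
}

set.seed(42)  # for reproducibility
se.tab <- se.check()
print(round(se.tab, 3))
```

Quadrupling the sample size halves the standard error, which matches the way the plotted means tighten around the red line as the number of samples increases.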

[Figure: SD of Samples versus Number of Samples, with a red horizontal line at the true sd]

Question 3: Make a vector of 1 randomly drawn numbers from a normal distribution.
a. Calculate the z-scores of each value.
b. Plot the z-scores and the vector of numbers.

**So, z-scores are not mysterious... this is just asking about the relationship between values and z-scores.

    # Here is another function to help us evaluate this:
    vals.zscores <- function() {
      # Vector length is garbled in this copy; 100 is an assumed stand-in.
      rnorm.vect <- rnorm(100)
      # z-score = (value - mean) / sd, plotted against the original values
      plot(rnorm.vect, (rnorm.vect - mean(rnorm.vect)) / sd(rnorm.vect),
           xlab = "Original Vector", ylab = "Z-score")
      abline(h = 0, col = "red", lwd = 2)                 # z-score of the mean
      abline(v = mean(rnorm.vect), col = "red", lwd = 2)  # sample mean
    }

    # Copy this code into your console. You will notice that in your
    # history window vals.zscores will show up as a function...
    vals.zscores()

[Figure: scatter plot of Z-score (y-axis) against Original Vector (x-axis)]
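The relationship in the plot follows from the definition z = (x - mean) / sd, which is a linear function of x. As a small sketch (the vector length of 100 and the variable names are assumptions), base R's scale() performs the same standardisation, and the correlation between the values and their z-scores is 1:

```r
set.seed(7)  # for reproducibility
x <- rnorm(100)                     # vector length assumed for illustration

z.manual <- (x - mean(x)) / sd(x)   # z-score by hand, as in vals.zscores()
z.scale <- as.vector(scale(x))      # base R: centre on the mean, divide by sd

print(all.equal(z.manual, z.scale))             # the two computations agree
print(c(mean = mean(z.manual), sd = sd(z.manual)))  # mean near 0, sd of 1
print(cor(x, z.manual))                         # linear transform of x
```

Because the z-score only shifts and rescales the data, the points fall on a straight line that crosses z = 0 at the sample mean.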