Using R in Undergraduate and Graduate Probability and Mathematical Statistics Courses*


Amy G. Froelich and Michael D. Larsen, Iowa State University

*The work presented in this talk was partially supported by a grant from the ISU College of Liberal Arts and Sciences Computer Advisory Committee.

Stat 341 at ISU
Statistics, Math, and Math Ed majors
Prereq: Calculus I, II and III
Course description: Probability; distribution functions and their properties; classical discrete and continuous distribution functions; moment generating functions; multivariate probability distributions and their properties; transformations of random variables.

Stat 342 at ISU
Statistics majors (some math majors)
Prereq: Stat 341, Linear Algebra
Course description: Sampling distributions; confidence intervals and hypothesis testing; theory of estimation and hypothesis tests; linear model theory; enumerative data.

Stat 542 at ISU
Graduate students in Statistics (Master's level)
Prereq: Stat 341, Real Analysis or Advanced Calculus
Course description: Sample spaces, probability, conditional probability, random variables, univariate distributions, expectation, moment generating functions; common theoretical distributions; joint distributions, conditional distributions and independence, covariance; probability laws and transformations, introduction to the multivariate normal distribution, sampling distributions, order statistics, convergence concepts, the central limit theorem and delta method, basics of stochastic simulation.

Stat 543 at ISU
Graduate students in Statistics (Master's level)
Prereq: Stat 542
Course description: Point estimation including method of moments, maximum likelihood estimation, exponential family, Bayes estimators, loss function and Bayesian optimality, unbiasedness, sufficiency, completeness; interval estimation including confidence intervals, prediction intervals, Bayesian interval estimation; hypothesis testing including Neyman-Pearson Lemma, uniformly most powerful tests, likelihood ratio tests, Bayesian tests; nonparametric methods; bootstrap.

Abstract
Statistical computing software packages, such as R, have been used primarily in applied statistics courses at both the undergraduate and graduate level. We have found that incorporating R into the calculus-based probability and mathematical statistics courses at both levels can facilitate the instruction of many concepts and principles typically covered in these courses and allow for expansion to other topics as well. In this talk, we will present a few of the ways we have used R in these courses to allow students to connect, explore, visualize, and expand different concepts in probability and mathematical statistics.

Connect
*Observed and Theoretical Probabilities and Distributions
Simulation versus theory
Theory versus data
Simulation versus data
Observed Moments to Theoretical Moments
Univariate Normal Distribution to Multivariate Normal Distribution
Univariate to linear regression
Multivariate to linear regression
Models versus simulation versus data

Observed and Theoretical Probabilities and Distributions
Opening Activity:
Roll a die 30 times. Keep track of the number of 1s, 2s, 3s, etc.
Use R to plot results from one group.
Use R to plot results from entire class.
Compare observed distributions of number on die to theoretical distribution.
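The plotting step of the opening activity might look like the sketch below. It is only an illustration: the 30 rolls are simulated with `sample()` as a stand-in for one group's hand-recorded data, and the seed, variable names, and bar chart layout are assumptions, not taken from the talk.

```r
set.seed(341)  # arbitrary seed, only for reproducibility

# Stand-in for one group's data: 30 simulated rolls of a fair die
rolls <- sample(1:6, size = 30, replace = TRUE)

# Observed relative frequency of each face (faces never rolled count as 0)
observed <- as.vector(table(factor(rolls, levels = 1:6))) / length(rolls)
theoretical <- rep(1 / 6, 6)

# Side-by-side bars: observed versus theoretical probability for each face
barplot(rbind(observed, theoretical), beside = TRUE, names.arg = 1:6,
        legend.text = c("Observed", "Theoretical"),
        xlab = "Number on die", ylab = "Proportion of rolls")
```

Replacing `rolls` with the pooled rolls from the whole class gives the second plot in the activity.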

Observed and Theoretical Probabilities and Distributions
Birthday Problem
Guess the probability of a shared birthday for a class of 40 students.
Simulate:
    birthdays <- c(1:365)
    sample(birthdays, 40, replace = TRUE)
Make a histogram of birthdays, using breaks to count the number of students with each birthday.
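A minimal sketch of how the birthday simulation could be repeated many times is given below. The use of `replicate()` and `duplicated()`, the seed, and the variable names are assumptions for illustration. The exact probability is added for comparison and assumes 365 equally likely birthdays.

```r
set.seed(40)  # arbitrary seed, only for reproducibility

birthdays <- 1:365
n_students <- 40
n_sims <- 10000

# One simulated class: how many distinct birthdays occur more than once?
one_class <- sample(birthdays, n_students, replace = TRUE)
n_repeated_days <- sum(table(one_class) > 1)

# Many simulated classes: does each class contain at least one shared birthday?
shared <- replicate(
  n_sims,
  any(duplicated(sample(birthdays, n_students, replace = TRUE)))
)
simulated_prob <- mean(shared)

# Exact probability of at least one shared birthday (about 0.891)
exact_prob <- 1 - prod((365 - 0:(n_students - 1)) / 365)

c(simulated = simulated_prob, exact = exact_prob)
```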

In this sample, there are 3 birthdays that appear twice in the 40 students.
Simulate: In 8897 of 10000 samples there is at least one shared birthday.

Observed and Theoretical Probabilities and Distributions
iPod shuffle feature: Is it random?
Autofill feature in iTunes fills smallest iPod with approx. 120 songs.
"The first few times..., I found some disturbing clusters in the songs chosen. More than once the random playlist included three tracks from the same album! Since there are more than 3000 tunes in my library, this seemed to defy the odds."
Steven Levy, Newsweek Magazine, January 31, 2005.

Is Levy correct? Is the probability of 3 or more songs from ANY one album small?
The probability of 3 or more songs from ONE SPECIFIC album is small: approx. 1%.
Assume 3000 songs = 250 albums x 12 songs per album.
    library <- rep(1:250, 12)
Simulate selection of 120 songs from library.
In this sample, 3 songs were selected from each of 3 albums.
Simulate: In 9453 out of 10000 samples at least 3 songs were selected from at least one album.
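The two probabilities contrasted on this slide can be sketched as follows. The talk does not say how the 120 songs were drawn, so sampling without replacement is an assumption here, as are the seed and the variable names. The name `library_albums` is used instead of `library` only to avoid confusion with R's `library()` function.

```r
set.seed(2005)  # arbitrary seed, only for reproducibility

# Album label for each of the 3000 songs: 250 albums, 12 songs each
library_albums <- rep(1:250, times = 12)
n_sims <- 10000

# For each simulated playlist, record the largest number of songs
# that came from any single album
max_per_album <- replicate(n_sims, {
  playlist <- sample(library_albums, 120, replace = FALSE)
  max(tabulate(playlist, nbins = 250))
})

# Estimated probability that ANY album contributes 3 or more songs
prob_any_album <- mean(max_per_album >= 3)

# Probability that ONE SPECIFIC album contributes 3 or more songs
# (hypergeometric: 12 songs from that album, 2988 others, 120 drawn)
prob_specific_album <- 1 - phyper(2, m = 12, n = 2988, k = 120)

c(any_album = prob_any_album, specific_album = prob_specific_album)
```

The gap between the two numbers is the point of the example: a cluster in one named album is rare, but with 250 albums a cluster somewhere is the usual outcome.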

Explore
Law of Large Numbers
*Transformations of Random Variables
Central Limit Theorem
Sampling Distributions
Confidence Intervals
Hypothesis testing - Type I and Type II error rates and test procedures

Transformations of Random Variables
U is uniform(0,1). Y = -3 ln(1 - U). What is the distribution of Y?
Simulate y:
    u <- runif(10000, 0, 1); y <- -3*log(1-u)
    mean(y)
    [1] 2.98441
    var(y)
    [1] 9.38336
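One way to follow up on this simulation is to overlay a candidate density on a histogram of the simulated values. The candidate used in this sketch is the exponential distribution with mean 3, which follows from P(Y <= y) = P(U <= 1 - exp(-y/3)) = 1 - exp(-y/3) for y >= 0. The seed and plotting choices are assumptions for illustration.

```r
set.seed(8)  # arbitrary seed, only for reproducibility

u <- runif(10000, 0, 1)
y <- -3 * log(1 - u)

# Histogram on the density scale so a density curve can be overlaid
hist(y, breaks = 50, freq = FALSE,
     main = "Simulated Y = -3 log(1 - U)", xlab = "y")

# Candidate distribution: exponential with mean 3, i.e. rate 1/3
curve(dexp(x, rate = 1 / 3), add = TRUE, lwd = 2)

c(mean = mean(y), var = var(y))
```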

Right-skewed distribution
Mean approx. 3
Variance approx. 9 = 3^2
Exponential??

Transformations of Random Variables
Y1, Y2, Y3 and Y4 are independent Poisson r.v.s with means 3, 2, 5 and 4 respectively. What is the distribution of Y1 + Y2 + Y3 + Y4?
Simulate:
    sums <- rpois(10000, 3) + rpois(10000, 2) + rpois(10000, 5) + rpois(10000, 4)
What distribution does the sum have? Is it Poisson?
    mean(sums)
    [1] 14.0018
    var(sums)
    [1] 13.97699
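A sketch of how the simulated sums could be compared with a Poisson distribution with mean 14 (the sum of the four means) is shown below. The seed, the use of `tabulate()`, and the plot layout are assumptions for illustration.

```r
set.seed(9)  # arbitrary seed, only for reproducibility

sums <- rpois(10000, 3) + rpois(10000, 2) + rpois(10000, 5) + rpois(10000, 4)

# Observed relative frequency of each value 0, 1, ..., max(sums)
k <- 0:max(sums)
observed <- tabulate(sums + 1, nbins = length(k)) / length(sums)

# Poisson(14) probabilities for the same values
theoretical <- dpois(k, lambda = 14)

# Vertical lines: observed proportions; points: Poisson(14) probabilities
plot(k, observed, type = "h", xlab = "Sum", ylab = "Probability")
points(k, theoretical, pch = 16)

# Largest gap between observed and theoretical probabilities
max_gap <- max(abs(observed - theoretical))
max_gap
```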

Is the observed distribution of the sum similar to the distribution of a Poisson r.v. with mean 14?

Transformations of Random Variables
Y1, Y2, Y3 and Y4 are independent exponential r.v.s with mean 3. What is the distribution of Y1 + Y2 + Y3 + Y4?
Simulate:
    sums2 <- rexp(10000, 1/3) + rexp(10000, 1/3) + rexp(10000, 1/3) + rexp(10000, 1/3)
    mean(sums2)
    [1] 12.03085
    var(sums2)
    [1] 36.44002
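One way to explore which distribution the sum follows is to overlay two candidate densities on the histogram. This sketch uses an exponential with mean 12 and a gamma with shape 4 and rate 1/3. The second candidate is the standard result for a sum of four independent exponentials with a common rate. The seed and plotting choices are assumptions for illustration.

```r
set.seed(10)  # arbitrary seed, only for reproducibility

sums2 <- rexp(10000, 1/3) + rexp(10000, 1/3) + rexp(10000, 1/3) + rexp(10000, 1/3)

hist(sums2, breaks = 50, freq = FALSE,
     main = "Sum of four exponentials with mean 3", xlab = "Sum")

# Candidate 1 (dashed): exponential with mean 12, whose variance would be 144
curve(dexp(x, rate = 1 / 12), add = TRUE, lty = 2)

# Candidate 2 (solid): gamma with shape 4 and rate 1/3 (mean 12, variance 36)
curve(dgamma(x, shape = 4, rate = 1 / 3), add = TRUE, lwd = 2)

c(mean = mean(sums2), var = var(sums2))
```

Only the gamma candidate matches both the simulated mean and the simulated variance.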

Right-skewed distribution
Mean approx. 12
Variance approx. 36
Exponential?? Other Related Distribution??

Visualize
Distribution Functions of Random Variables
*Order Statistics
Likelihood Functions
Asymptotic Normality of Maximum Likelihood Estimators

Order Statistics
X = min(Y1, Y2, ..., Y5) where Yi are independent exponential r.v.s with mean 10. Distribution of X?
Simulate:
    x <- numeric(10000)
    for (i in 1:10000) x[i] <- min(rexp(5, 1/10))
    mean(x)
    [1] 2.011339
    var(x)
    [1] 4.106261
Right-skewed distribution
Mean approx. 2
Variance approx. 4
Exponential??
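The question on this slide can be checked visually with a sketch like the one below. It uses `replicate()` in place of the explicit loop and overlays an exponential density with mean 2. That candidate comes from the standard result that the minimum of five independent exponentials with rate 1/10 is exponential with rate 5/10. The seed and variable names are assumptions for illustration.

```r
set.seed(12)  # arbitrary seed, only for reproducibility

# 10000 simulated minima of 5 exponentials with mean 10
mins <- replicate(10000, min(rexp(5, rate = 1 / 10)))

hist(mins, breaks = 50, freq = FALSE,
     main = "Minimum of 5 exponentials with mean 10", xlab = "Minimum")

# Candidate distribution: exponential with rate 5/10, i.e. mean 2
curve(dexp(x, rate = 5 / 10), add = TRUE, lwd = 2)

c(mean = mean(mins), var = var(mins))
```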

Order Statistics
X = max(Y1, Y2, ..., Y10) where Yi are independent Uniform(0, 1) r.v.s. Distribution of X?
Simulate:
    x <- numeric(10000)
    for (i in 1:10000) x[i] <- max(runif(10, 0, 1))
    mean(x)
    [1] 0.9098818
    var(x)
    [1] 0.006769303
Left-skewed distribution
Mean approx. 0.91
Variance approx. 0.007
Distribution???
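A sketch for identifying the distribution of the maximum is given below. The candidate is Beta(10, 1), which follows from P(X <= x) = x^10 on (0, 1) and has mean 10/11 and variance 10/1452. The seed and variable names are assumptions for illustration.

```r
set.seed(13)  # arbitrary seed, only for reproducibility

# 10000 simulated maxima of 10 Uniform(0, 1) draws
maxs <- replicate(10000, max(runif(10, 0, 1)))

hist(maxs, breaks = 50, freq = FALSE,
     main = "Maximum of 10 Uniform(0, 1) draws", xlab = "Maximum")

# Candidate distribution: Beta(10, 1), with density 10 * x^9 on (0, 1)
curve(dbeta(x, shape1 = 10, shape2 = 1), add = TRUE, lwd = 2)

# Simulated moments next to the Beta(10, 1) moments
rbind(simulated = c(mean = mean(maxs), var = var(maxs)),
      beta_10_1 = c(mean = 10 / 11, var = 10 / 1452))
```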

Order Statistics
X = median(Y1, Y2, ..., Yn) where Yi are independent Normal r.v.s with mean 0 and variance 1.
Vary n: n = 5, n = 25, n = 75, n = 125
Distribution of X? Depends on n?
Simulate (for each n):
    xn <- numeric(10000)
    for (i in 1:10000) xn[i] <- median(rnorm(n, 0, 1))
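The loop above can be run for all four sample sizes at once, as in the sketch below. The use of `sapply()` and `replicate()`, the seed, and the two reference rows are assumptions added for illustration: 1/n is the variance of the sample mean, and pi/(2n) is the large-sample variance of the median for standard normal data.

```r
set.seed(14)  # arbitrary seed, only for reproducibility

n_values <- c(5, 25, 75, 125)

# One column per sample size: simulated mean and variance of the median,
# followed by two reference variances
results <- sapply(n_values, function(n) {
  medians <- replicate(10000, median(rnorm(n, 0, 1)))
  c(mean = mean(medians),
    var = var(medians),
    var_of_sample_mean = 1 / n,
    asymptotic_var_of_median = pi / (2 * n))
})
colnames(results) <- paste0("n = ", n_values)

round(results, 4)
```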

Observed Means and Variances of the Simulated Medians

        n = 5           n = 25          n = 75           n = 125
Mean    -0.0075         0.0010          -0.0005          -0.0005
Var     0.2834 (0.20)   0.0612 (0.04)   0.0205 (0.013)   0.0124 (0.008)

Expand
Additional Probability Distributions
Non-Central Distributions and Power for Hypothesis Testing
Randomization Tests
*Role of Assumptions in Statistical Testing

Role of Assumptions in Statistical Inference
Y1, Y2, ..., Yn are independent Bernoulli r.v.s with probability of success p.
Approx. 95% CI for p: $\hat{p} \pm 1.96 \sqrt{\hat{p}(1 - \hat{p})/n}$
Why do we say this 95% CI is approx.? Based on Normal Distribution.
When does this approximation work?
Assumptions: $np \ge 10$ and $n(1 - p) \ge 10$
What happens when this assumption is not true?
Ex. n = 10; p = 0.5 or p = 0.1

p = 0.5; n = 10: 8927 out of 10000 CIs contain true p.
p = 0.1; n = 10: 6463 out of 10000 CIs contain true p.
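Coverage counts like these can be estimated with a short simulation. The sketch below is an assumption about how that could be done, not the authors' code: `wald_coverage()` is a hypothetical helper that draws binomial counts, forms the interval with the multiplier 1.96, and reports the proportion of intervals that contain the true p.

```r
# Hypothetical helper: estimated coverage of the traditional 95% CI for p
wald_coverage <- function(n, p, n_sims = 10000, z = 1.96) {
  x <- rbinom(n_sims, size = n, prob = p)       # number of successes per sample
  p_hat <- x / n
  half_width <- z * sqrt(p_hat * (1 - p_hat) / n)
  # Proportion of simulated intervals that contain the true p
  mean(p_hat - half_width <= p & p <= p_hat + half_width)
}

set.seed(17)  # arbitrary seed, only for reproducibility
coverage_p05 <- wald_coverage(n = 10, p = 0.5)
coverage_p01 <- wald_coverage(n = 10, p = 0.1)

c(p_0.5 = coverage_p05, p_0.1 = coverage_p01)
```

With p = 0.1 and n = 10, a sample with no successes gives the degenerate interval [0, 0], which never contains p. That case alone occurs in roughly 35% of samples.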

Role of Assumptions in Statistical Inference
Is the coverage rate for the 95% CI for p close to 95% if the assumption holds?
How does the coverage rate for the traditional 95% CI for p compare to the Plus 4 method 95% CI for p?
Two cases: p = 0.1, n = 100 and p = 0.5, n = 1000

p = 0.1; n = 100
Traditional 95% CI: 9316 out of 10000 CIs contain true p.
Plus 4 Method 95% CI: 9546 out of 10000 CIs contain true p.

p = 0.5; n = 1000
Traditional 95% CI: 9424 out of 10000 CIs contain true p.
Plus 4 Method 95% CI: 9484 out of 10000 CIs contain true p.
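The comparison of the two intervals can be sketched as below. The slides do not define the Plus 4 method, so this sketch assumes the usual textbook version: add two successes and two failures, then apply the traditional formula with n + 4 in place of n. The helper `coverage()`, the seed, and the layout of the results are hypothetical.

```r
# Hypothetical helper: estimated coverage of a 95% CI for p
# method = "wald"  : traditional interval based on x / n
# method = "plus4" : same formula applied to (x + 2) / (n + 4)
coverage <- function(n, p, method = c("wald", "plus4"),
                     n_sims = 10000, z = 1.96) {
  method <- match.arg(method)
  x <- rbinom(n_sims, size = n, prob = p)
  if (method == "plus4") {
    x <- x + 2   # add two successes ...
    n <- n + 4   # ... and two failures
  }
  est <- x / n
  half_width <- z * sqrt(est * (1 - est) / n)
  mean(est - half_width <= p & p <= est + half_width)
}

set.seed(18)  # arbitrary seed, only for reproducibility
cases <- data.frame(p = c(0.1, 0.5), n = c(100, 1000))
cases$wald <- mapply(coverage, n = cases$n, p = cases$p,
                     MoreArgs = list(method = "wald"))
cases$plus4 <- mapply(coverage, n = cases$n, p = cases$p,
                      MoreArgs = list(method = "plus4"))
cases
```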