AMS 132: Discussion Section 2

Similar documents
Motion II. Goals and Introduction

LAB 2 - ONE DIMENSIONAL MOTION

Designing a Quilt with GIMP 2011

Lab 1 Uniform Motion - Graphing and Analyzing Motion

Chem 1 Kinetics. Objectives. Concepts

Kinetics of Crystal Violet Bleaching

Confidence Intervals for the Mean of Non-normal Data Class 23, Jeremy Orloff and Jonathan Bloom

Physics 212E Spring 2004 Classical and Modern Physics. Computer Exercise #2

PHY 111L Activity 2 Introduction to Kinematics

Students will explore Stellarium, an open-source planetarium and astronomical visualization software.

Using Microsoft Excel

Using web-based Java pplane applet to graph solutions of systems of differential equations

Two problems to be solved. Example Use of SITATION. Here is the main menu. First step. Now. To load the data.

module, with the exception that the vials are larger and you only use one initial population size.

Linear Motion with Constant Acceleration

Experiment 1: The Same or Not The Same?

Computational Chemistry Lab Module: Conformational Analysis of Alkanes

(THIS IS AN OPTIONAL BUT WORTHWHILE EXERCISE)

The Geodatabase Working with Spatial Analyst. Calculating Elevation and Slope Values for Forested Roads, Streams, and Stands.

Probability and Discrete Distributions

A GUI FOR EVOLVE ZAMS

SteelSmart System Cold Formed Steel Design Software Download & Installation Instructions

SuperCELL Data Programmer and ACTiSys IR Programmer User s Guide

Math 132. Population Growth: Raleigh and Wake County

Connect the Vernier spectrometer to your lap top computer and power the spectrometer if necessary. Start LoggerPro on your computer.

Physics E-1ax, Fall 2014 Experiment 3. Experiment 3: Force. 2. Find your center of mass by balancing yourself on two force plates.

Introduction to Computer Tools and Uncertainties

Math Lab 10: Differential Equations and Direction Fields Complete before class Wed. Feb. 28; Due noon Thu. Mar. 1 in class

To factor an expression means to write it as a product of factors instead of a sum of terms. The expression 3x

pka AND MOLAR MASS OF A WEAK ACID

let s examine pupation rates. With the conclusion of that data collection, we will go on to explore the rate at which new adults appear, a process

A SHORT INTRODUCTION TO ADAMS

Lab 2 Worksheet. Problems. Problem 1: Geometry and Linear Equations

Gravity: How fast do objects fall? Teacher Advanced Version (Grade Level: 8 12)

ES205 Analysis and Design of Engineering Systems: Lab 1: An Introductory Tutorial: Getting Started with SIMULINK

Algebra. Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Water tank. Fortunately there are a couple of objectors. Why is it straight? Shouldn t it be a curve?

Introduction to Special Relativity

EXPERIMENT 2 Reaction Time Objectives Theory

Partner s Name: EXPERIMENT MOTION PLOTS & FREE FALL ACCELERATION

Simulating Future Climate Change Using A Global Climate Model

Zetasizer Nano-ZS User Instructions

EOS 102: Dynamic Oceans Exercise 1: Navigating Planet Earth

D.T.M: TRANSFER TEXTBOOKS FROM ONE SCHOOL TO ANOTHER

Photoelectric Photometry of the Pleiades Student Manual

Model Building An Introduction to Atomistic Simulation

The data for this lab comes from McDonald Forest. We will be working with spatial data representing the forest boundary, streams, roads, and stands.

Assignment #0 Using Stellarium

Overlay Analysis II: Using Zonal and Extract Tools to Transfer Raster Values in ArcMap

PageRank: The Math-y Version (Or, What To Do When You Can t Tear Up Little Pieces of Paper)

Bayes Factor Single Arm Time-to-event User s Guide (Version 1.0.0)

EXPERIMENT 7: ANGULAR KINEMATICS AND TORQUE (V_3)

TEACHER NOTES MATH NSPIRED

Lab Exploration #4: Solar Radiation & Temperature Part II: A More Complex Computer Model

LABORATORY 1: KINEMATICS written by Melissa J. Wafer '95 June 1993

Sequential Logic (3.1 and is a long difficult section you really should read!)

Imaging with the 70AT and the Meade electronic eyepiece CCD imager By Ted Wilbur 2/4/02

Watershed Modeling Orange County Hydrology Using GIS Data

Electric Fields and Equipotentials

Design and Optimization of Energy Systems Prof. C. Balaji Department of Mechanical Engineering Indian Institute of Technology, Madras

Using SkyTools to log Texas 45 list objects

Communication Engineering Prof. Surendra Prasad Department of Electrical Engineering Indian Institute of Technology, Delhi

DISCRETE RANDOM VARIABLES EXCEL LAB #3

A Brief Introduction To. GRTensor. On MAPLE Platform. A write-up for the presentation delivered on the same topic as a part of the course PHYS 601

This will mark the bills as Paid in QuickBooks and will show us with one balance number in the faux RJO Exchange Account how much we owe RJO.

Week 2: Review of probability and statistics

Relative Photometry with data from the Peter van de Kamp Observatory D. Cohen and E. Jensen (v.1.0 October 19, 2014)

41. Sim Reactions Example

PHY 221 Lab 7 Work and Energy

Standards-Based Quantification in DTSA-II Part II

Simple Linear Regression

Infinity Unit 2: Chaos! Dynamical Systems

BOND LENGTH WITH HYPERCHEM LITE

We're in interested in Pr{three sixes when throwing a single dice 8 times}. => Y has a binomial distribution, or in official notation, Y ~ BIN(n,p).

Confidence intervals

Administrivia. Lab notebooks from Lab 0 are due at 2 PM in Hebb 42 at the start of Lab 1.

Intensity of Light and Heat. The second reason that scientists prefer the word intensity is Well, see for yourself.

CE 365K Exercise 1: GIS Basemap for Design Project Spring 2014 Hydraulic Engineering Design

PHY 221 Lab 9 Work and Energy

Conformational Analysis of n-butane

Best Pair II User Guide (V1.2)

Introduction to Coastal GIS

General Physics I Lab. M1 The Atwood Machine

Experiment 2: The Beer-Lambert Law for Thiocyanatoiron (III)

v Prerequisite Tutorials GSSHA WMS Basics Watershed Delineation using DEMs and 2D Grid Generation Time minutes

Introduction to Algebra: The First Week

Lesson Plan 2 - Middle and High School Land Use and Land Cover Introduction. Understanding Land Use and Land Cover using Google Earth

Basic Data Analysis. Stephen Turnbull Business Administration and Public Policy Lecture 9: June 6, Abstract. Regression and R.

ISIS/Draw "Quick Start"

Note that we are looking at the true mean, μ, not y. The problem for us is that we need to find the endpoints of our interval (a, b).

PHY221 Lab 2 - Experiencing Acceleration: Motion with constant acceleration; Logger Pro fits to displacement-time graphs

F denotes cumulative density. denotes probability density function; (.)

Matlab and RREF. next page. close. exit. Math 45 Linear Algebra. David Arnold.

More on Input Distributions

Simulation. Where real stuff starts

APBS electrostatics in VMD - Software. APBS! >!Examples! >!Visualization! >! Contents

University of Utah Electrical & Computer Engineering Department ECE 3510 Lab 9 Inverted Pendulum

Learning to Use Scigress Wagner, Eugene P. (revised May 15, 2018)

LAB 3: WORK AND ENERGY

Experiment 2: The Beer-Lambert Law for Thiocyanatoiron (III)

Transcription:

Prof. David Draper Department of Applied Mathematics and Statistics University of California, Santa Cruz AMS 132: Discussion Section 2 All computer operations in this course will be described for the Windows operating system OS); if you use a different OS, you ll need to figure out with the help of the TA) how to do the same things in your OS environment. All mouse clicks in this course are left-clicks unless otherwise indicated.) This discussion section involves you working in real time with R, so if you haven t already done so: On your laptop, point your favorite browser at https://cran.r-project.org/. R is available for 3 OSs): Linux, Mac OS X, and Windows. If you re running some other OS on your laptop, you ll either need to run R inside an emulator of one of those 3 OSs or work with another AMS 132 student in your discussion section whose laptop is R-compatible. Assuming that you can proceed on your machine, click on Download R for, where is your OS. As noted above, I ll describe the next steps for Windows; for Linux or Mac OS X, get your Discussion Section TA or somebody else in your discussion section who s computer-savvy to help you. When you get to the R for Windows page, click on Install R for the first time. As of this writing, this will take you to the R-3.3.2 for Windows 32/64 bit) page, at which you can click on Download R 3.3.2 for Windows. In the lower left-corner of your browser a new icon will appear that says R-3.2.2-win.exe; wait for the download to complete into this icon, and then left-click once on it. You ll be asked if you want to let the R program install itself on your machine; say Yes. You ll now be led through a long series of small pop-up windows; you should reply OK and/or Next whenever it s waiting for you to choose something. Eventually you ll get to a window called Installing; R will now install, and the installation process will be completed when you get to the window that says Completing the R for Windows 3.3.2 Setup Wizard, at which point you click on Finish and R should now be successfully installed on your machine. On your desktop you should now see two new icons called Ri386 3.3.2 and Rx64 3.3.2; you should use the first of these icons if your laptop only supports 32-bit computing and the second if you re 64-bit compatible. If you don t know which applies to your laptop, double-left-click on the Rx64 icon; if you re not 64-bit compatible, an error message will occur, but the other icon should work for you. Once you ve started an R session, you ll typically want to resize the window to take up less than 100% of your screen. In the R Console window, type demo graphics ) and hit Enter repeatedly, stopping before each new Enter to look at the demo results this will show you some of the graphical descriptive capabilities in R. When the demo is finished, you ll get a nearly blank line that begins with >; this is the command prompt in R, and you can now begin doing some actual computations. Every R session ends with the q ) command, at which point R will ask you whether you want to save the R workspace you ve created; if you have code that will completely and quickly recreate the entire session as in this discussion section), you typically don t need to 1

save the workspace. In later discussion sections we ll talk about how to save and restore R workspaces.) Many people feel that the graphical user interface GUI) supplied by R is less than terrific. After having installed R, if you want you can go to https://www.rstudio.com/products/rstudio/download/ and download an open-source freeware package called RStudio that has a better GUI for R; your TA can show you how to do this, and how to use RStudio. In what follows, for simplicity, I m going to tell you how to run R in the same way that I do in class: with an R session filling half of the screen and a text file using Notepad) filling the other half, and with you copying and pasting back and forth from one window to the other. At least one of the TAs will tell you that this is somewhat clumsy and old-school when compared to RStudio; he s right.) 1. In the Kaiser Intensive Care Unit ICU) case study from class, the statistical model was Y i θ) IID Bernoulliθ) for i = 1,..., n), 1) with n = 112 and Y = Y 1,..., Y n ), and we found it useful to keep track of the sample sum S = n i=1 Y i and the sample mean Ȳ = 1 n n i=1 Y i; recall that in the actual data set y = y 1,..., y n ) S took on the value s = 4, leading to an observed sample mean of ȳ = 1 n n i=1 y i = 4. = 0.0357. 112 Since the mean EY i θ) of a Bernoulliθ) random variable is θ, we agreed in class that an intuitively reasonable estimate of θ is the sample mean: ˆθ = Ȳ. Neyman s confidence interval CI) logic then gave us the large-sample approximate 1001 α)% CI ˆθ ± Φ 1 1 α ) ˆθ1 ˆθ), 2) 2 n in which Φ is the cumulative distribution function CDF) of the standard normal distribution; for example, for a 95% nominal confidence level, α = 0.05 and Φ 1 0.975). = 1.96, as you can verify with the R command qnorm 0.975 ). I wrote some R code that suggested that interval 2) was not well-calibrated in this problem, by which I mean that the actual percentage of time across repeated-sampling data sets simulated from model 1)) that the interval 2) included the true θ at nominal confidence level 95% was less than 95%. We then worked out a possible small-sample improvement to interval 2), by noticing that the repeated-sampling distribution of the sample mean ˆθ with a value of n as small as 112 and a true θ near 0 was not very close to the normal curve suggested by the Central Limit Theorem CLT: because the Theorem only says what will happen for large n). One way to understand why the CLT doesn t provide a very good approximation in this case study is to note that θ lives in 0, 1) but the normal curve lives on the entire real line R. This suggested trying to find a transformed version of θ that lives on the whole real line, which led us to the logit transform ˆη log ˆθ 1 ˆθ ), in which θ s+ 1 2 n+1 nudges θ away from 0 and 1 to keep from dividing by 0 or trying to take the log of 0). Using the Delta Method we showed that ˆη was approximately normally distributed with approximate mean and variance Eˆη). = η and V ˆη). = 2 1 n ˆθ 1 ˆθ ). 3)

Using Neyman s logic on η gives the approximate 1001 α)% CI ˆη ± Φ 1 1 α ) Φ 1 ) 1 α ŜEˆη) = ˆη ± 2 2 n ˆθ 1 ˆθ, 4) ) and by back-transforming the endpoints of this interval, using the inverse logit transformation ) p η = log if and only if p = eη 1 p 1 + e = 1, 5) η 1 + e η we finally got the hopefully improved) small-sample approximate 1001 α)% CI ) 1 1 + e, 1, 6) η A) 1 + e η+a) in which A = Φ 1 1 α 2 ) n ˆθ 1 ˆθ ). I ve written some new R code that implements the two approximate CIs 2) and 6). Go to the course web page and click on the file called https://ams132-winter17-01.courses.soe.ucsc.edu/ Kaiser case study calibration R code Your screen will fill with a text file. Right click inside your browser and you ll get a menu; choose Save as.... A new window will appear called Save As; on my laptop the default place to save is the Desktop on yours it might be Documents, or somewhere else). Pay attention to where the system is about to save this file and click Save. Find this text file and double-left-click on it to open it; it will look like the following: checking the calibration of the large-sample and small-sample intervals in the Kaiser ICU case study with a large number of simulation replications, to get definitive calibration answers rm list = ls ) ) this removes everything in the R workspace so that you can start with a clean slate N <- 8561 n <- 112 population size sample size this is a probability calculation, so in visualizing what s going on it s helpful to create the population data set population <- c rep 1, 306 ), rep 0, 8255 ) ) 3

print population.theta <- mean population ) ) 0.03574349 n.null <- 0 n.negative.large.sample <- 0 n.negative.small.sample <- 0 n.hit.large.sample <- 0 n.hit.small.sample <- 0 M.calibration.large.sample <- 1000000 seed <- 1 set.seed seed ) i ran the code twice, with seed = 1 and seed = 2; results are given below alpha <- 0.05 confidence level 100 1 - alpha ) % system.time for m in 1:M.calibration.large.sample ) { y.star <- sample population, n, replace = T ) another way to do this would be to sample directly from the Bernoulli distribution with the rbinom function R doesn t have a built-in rbernoulli function, but you can use the fact that a Bernoulli random variable is just a binomial random variable in which only 1 success/failure trial occurs): in place of the y.star <- sample population, n, replace = T ) command above you could just say y.star <- rbinom n, 1, 306 / 8561 ) with this approach you don t have to create a population data set at all) 4

s.star <- sum y.star ) if s.star == 0 ) { n.null <- n.null + 1 theta.hat <- s.star / n se.theta.hat <- sqrt theta.hat * 1 - theta.hat ) / n ) large.sample.confidence.interval <- c theta.hat - qnorm 1 - alpha / 2 ) * se.theta.hat, theta.hat + qnorm 1 - alpha / 2 ) * se.theta.hat ) if large.sample.confidence.interval[ 1 ] < 0 ) { n.negative.large.sample <- n.negative.large.sample + 1 if large.sample.confidence.interval[ 1 ] < population.theta ) & large.sample.confidence.interval[ 2 ] > population.theta ) ) { n.hit.large.sample <- n.hit.large.sample + 1 theta.hat.prime <- s.star + 0.5 ) / n + 1 ) eta.hat <- log theta.hat.prime / 1 - theta.hat.prime ) ) se.eta.hat <- sqrt 1 / n * theta.hat.prime * 1 - theta.hat.prime ) ) ) eta.interval <- c eta.hat - qnorm 1 - alpha / 2 ) * se.eta.hat, eta.hat + qnorm 1 - alpha / 2 ) * se.eta.hat ) small.sample.confidence.interval <- 1 / 1 + exp - eta.interval ) ) if small.sample.confidence.interval[ 1 ] < 0 ) { n.negative.small.sample <- n.negative.small.sample + 1 5

if small.sample.confidence.interval[ 1 ] < population.theta ) & small.sample.confidence.interval[ 2 ] > population.theta ) ) { n.hit.small.sample <- n.hit.small.sample + 1 ) user system elapsed 29.36 0.00 29.38 on my laptop 1,000,000 simulation iterations takes about 30 seconds of clock time print paste seed =, seed, ; proportion of all-0 data sets =, signif n.null / M.calibration.large.sample, "seed = 1 ; proportion of all-0 data sets = 0.01685" "seed = 2 ; proportion of all-0 data sets = 0.01698" dbinom 0, n, population.theta ) 0.01696559 1 - population.theta )^n 0.01696559 print paste seed =, seed, ; proportion of large-sample negative intervals =, signif n.negative.large.sample / M.calibration.large.sample, "seed = 1 ; proportion of large-sample negative intervals = 0.412" "seed = 2 ; proportion of large-sample negative intervals = 0.4118" print paste seed =, seed, ; large-sample interval hit rate =, signif n.hit.large.sample / M.calibration.large.sample, "seed = 1 ; large-sample interval hit rate = 0.9058" "seed = 2 ; large-sample interval hit rate = 0.9056" 6

print paste seed =, seed, ; proportion of small-sample negative intervals =, signif n.negative.small.sample / M.calibration.large.sample, "seed = 1 ; proportion of small-sample negative intervals = 0" "seed = 2 ; proportion of small-sample negative intervals = 0" print paste seed =, seed, ; small-sample interval hit rate =, signif n.hit.small.sample / M.calibration.large.sample, "seed = 1 ; small-sample interval hit rate = 0.9521" "seed = 2 ; small-sample interval hit rate = 0.9518" a) Study this R code, and work with your TA to resolve any questions you may have about how it works. Your goal is to completely understand how every command and function does what it does, because later this quarter you ll be writing your own R code and you need to know how things work. b) Start an R session. Copy the entire block of R code in your text file and paste it into your R window after pasting, you may need to hit Enter to get all of it into R). The code will begin running, and R will show you that it s computing with a little wheel of death; on my laptop with M = 1,000,000 simulation replications the code takes about 30 clock seconds to run. You should get exactly the same results I did with seed = 1 that s the point of setting a random number seed: the output of the simulation is random, but to promote replicability) when the code is run twice with the same random number seed it will yield exactly the same output). I ran the code twice, with seed = 1 and seed = 2, to get some idea of the Monte Carlo accuracy with M = 1,000,000. Pick a few more values of seed of your own choosing, edit them into the text window at the appropriate place one by one), and rerun the code to see what kind of Monte Carlo noise you get. Try varying the α value to get different nominal confidence levels, to go along with the α = 0.05 value in the code; I would suggest α = 0.2, α = 0.1 and α = 0.01. Complete a table like the following: CI Coverage n = 112, θ 0.0357) Large-Sample Small-Sample Nominal Actual Actual 80% 90% 95% 90.6% 95.2% 99% What conclusions do you draw about the calibration of the large-sample intervals with n = 112 and a true population) θ value of about 0.0357? How about the small-sample intervals? If you have time you can also try different values of n e.g., even smaller than 112) and population.theta e.g., even closer to 0).) 7