Matematisk statistik allmän kurs, MASA01:A, HT-15 Laborationer


Lunds universitet
Matematikcentrum
Matematisk statistik

General information on labs

During the first half of the course MASA01 we will have two labs. The laboratory work is based on programming in R and will be done in groups of 2-4. It is compulsory to submit a written report for each lab in the week following the lab. The report should be written so that it can be read independently, without the need to consult the computer exercise. All graphs should have descriptive labels on the axes. All source code should contain comments and must be included in the report. The report will be due in 2 weeks' time. Exercises marked with an asterisk (*) can be solved either partially or fully without R. In case of doubt, please contact Debleena Thacker, debleena@maths.lth.se.

Computer Exercise 1

The purpose of this lab is to introduce R as a tool to investigate different densities.

1 Introduction to R

The following commands are useful for working with different distributions:

    Distribution   Random numbers   Probability/density function   Distribution function   Quantile
    Normal         rnorm            dnorm                          pnorm                   qnorm
    Uniform        runif            dunif                          punif                   qunif
    Binomial       rbinom           dbinom                         pbinom                  qbinom
    Poisson        rpois            dpois                          ppois                   qpois
    Exponential    rexp             dexp                           pexp                    qexp
    Gamma          rgamma           dgamma                         pgamma                  qgamma

The quantile functions (fifth column) will be used only in the second part of the course.

A useful command in R is ? or help. Type

    help(sum)

or

    ?sum

An extended search is conducted through help.search:

    help.search("sum")

Typical examples:

    1+2+3

A variable is defined by

    v <- 2
    w <- -7.5

The following is an example of how one can use this:

    u <- (v-w)/5
    exp(u)
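The naming scheme in the table of distribution commands can be tried out directly. The following is a minimal sketch, assuming the standard normal distribution and a Bin(10, 0.5) example; the seed and the variable names (x_sim, dens0, p_up, q_up, pb) are chosen here for illustration and do not come from the lab text.

```r
# Sketch of the r/d/p/q naming scheme, using the standard normal N(0, 1).
set.seed(1)               # seed chosen here only to make the draw repeatable
x_sim <- rnorm(5)         # r: five simulated values
dens0 <- dnorm(0)         # d: density at 0, i.e. 1/sqrt(2*pi)
p_up  <- pnorm(1.96)      # p: distribution function, P(X <= 1.96)
q_up  <- qnorm(p_up)      # q: quantile function, the inverse of pnorm

# For a discrete distribution the d-function is the probability function.
pb <- dbinom(3, size = 10, prob = 0.5)   # P(X = 3) for X ~ Bin(10, 0.5)

print(c(dens0 = dens0, p_up = p_up, q_up = q_up, pb = pb))
```

Because qnorm inverts pnorm, q_up returns the value 1.96 that was passed to pnorm. The same pattern applies to the other rows of the table.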

Matrices and vectors

Vectors can be defined with the function c():

    a <- c(12,-4.3,9,5/2)
    a

Sometimes the following are useful:

    b <- c(6:9)
    c <- rep(3,4)
    d <- seq(from = -2, to = 3, by = 0.3)

Indexing is done by

    d[4]

The length of a vector can be determined with the length() function:

    length(d)

Element-wise operations can be done as follows:

    a*b
    3*a-1
    f <- a+b

The scalar product of two vectors can be computed as

    g <- a%*%b

or

    sum(a*b)

One can use the built-in functions on vectors:

    exp(a)
    sin(3*pi)-exp(c)

Matrices can be defined as follows:

    fill_number <- c(1:10)
    m1 <- matrix(fill_number, ncol = 2)
    m2 <- matrix(fill_number, ncol = 2, byrow = TRUE)
    m1
    m2

What is the difference between m1 and m2? Also try

    m3 <- cbind(a,b)
    m4 <- rbind(a,b)

The dimensions of a matrix:

    dim(m3)
    nrow(m3)

    ncol(m3)

Matrices can be indexed by:

    m1[1,]
    m1[1,2]
    m1[,1]

The matrix product is computed with %*%:

    m3%*%m4
    m3%*%t(m3)

Logical operations are defined element-wise:

    a <= b
    a[a<=b]

Often one wants to assign the values 1 and 0 to TRUE and FALSE. This can be done with as.numeric, which is useful for defining indicator variables:

    as.numeric(a <= b)

Matrix inversion can be done with solve:

    diag_m <- diag(3, nrow = 4)
    inv <- solve(diag_m)
    diag_m%*%inv

The type of a variable is given by the str function:

    names <- c("hugo", "Lasse", "Line")
    str(names)
    str(d)
    str(m1)

Graphics/Diagrams

    x <- seq(-10,10,0.1)
    n <- 15
    gaussians <- c(rnorm(100, mean = 0, sd = 1))
    y1 <- pnorm(x, mean = 0, sd = 1)   # distribution function
    y2 <- dnorm(x, mean = 0, sd = 1)   # density
    plot(x,y1, main = "Standard normal distribution", xlab = "function arguments", ylab = "function values",col = "red", type = "l", xlim = c(-5,5))
    lines(x,y2, col = "green", type = "l")
    legend(-5,0.9, c("distr.function", "density"), col = c("red", "green"), text.col = "blue", lty = c(1, 1))

    points(gaussians, rep(0,length(gaussians)), pch = 4)
    mean(gaussians)
    sd(gaussians)
    var(gaussians)
    help(legend)

One can plot multiple figures in the same window in the following manner:

    par(mfrow = c(2,1))
    plot(x, sin(x), type = "b")
    plot(x, sin(x^2), type = "l")

One can remove the divisions in the window by

    dev.off()

or

    par(mfrow = c(1,1))

One can end the R session by typing

    q()

at which point you will be prompted as to whether or not you want to save your workspace. If you do not save it, it will be lost. To save your workspace, type y.

2 Distributions

Problem* 2.1. What is the definition of the distribution of a random variable?

Problem* 2.2. Define the density and probability functions of a random variable.

Problem* 2.3. Let $X$ be a continuous random variable with density function $f_X(x) = \frac{1}{2}\, x\, 1_{[0,2]}(x)$. Determine $P(1 \le X \le 2)$. ($1_A(\cdot)$ denotes the indicator function of $A$, where $1_A(x) = 1$ for $x \in A$ and $1_A(x) = 0$ for $x \notin A$.)

Problem* 2.4. Let $X$ be a discrete random variable with probability function $p_X(k) = c \cdot 0.4^k$, $k = 0, 1, 2, \ldots$ Determine $c$. Calculate $P(1 \le X < 4)$.

Problem* 2.5. Give an example of an experiment whose outcome can be described by a binomial random variable $X \sim \mathrm{Bin}(n, p)$.

Problem* 2.6. What is the difference between the hypergeometric and binomial distributions?
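The indicator notation in Problem 2.3 maps directly onto R's logical operations. The following is a minimal sketch, assuming the density of Problem 2.3; the function name f_X and the variable total are chosen here for illustration. It only checks that the function is a valid density and does not compute the probability asked for in the problem.

```r
# Density from Problem 2.3: f(x) = x/2 on [0, 2] and 0 elsewhere.
# The logical test plays the role of the indicator function 1_[0,2](x).
f_X <- function(x) 0.5 * x * as.numeric(x >= 0 & x <= 2)

f_X(c(-1, 0.5, 1, 3))     # zero outside [0, 2]

# A density must integrate to 1; integrate() gives a numerical check.
total <- integrate(f_X, lower = 0, upper = 2)$value
total
```

Multiplying by the 0/1 indicator is what makes f_X vanish outside the interval, so the same function can be evaluated on any vector of arguments.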

3 Practicals with R

Problem 3.1. Let $X$ be exponentially distributed with $E(X) = 4$. Plot (using R) the distribution function $F_X(\cdot)$ of $X$.

Problem 3.2. Determine $P(X\ [?]\ 4)$ where $X \sim \mathrm{Poisson}(3)$. Compute $P(X\ [?]\ 3)$ for the same $X$ using R.

Problem 3.3. Compute (using R) $P(X\ [?]\ 5)$ where $X \sim N(\mu = 2, \sigma^2 = 3)$. Note that the parameters in dnorm are the expectation $\mu$ and the standard deviation $\sigma$ (and not the variance).

Problem 3.4. Compute (using R) $P(X\ [?]\ 4)$ where $X \sim \mathrm{Bin}(10, 0.6)$.

Problem 3.5. Plot the probability functions for $\mathrm{Poisson}(10)$ and $\mathrm{Bin}(n, p_n)$ where $(n, p_n) \in \{(11, 10/11), (20, 1/2), (100, 0.1), (1000, 0.01)\}$. Plot all of them in the same window. For example, start with:

    x <- seq(0,20,1)
    plot(x, dpois(x, 10), ylim = c(0,0.5), col = "red", main = "comparison of densities", type = "l")

One can use the command lines to add a curve to an existing plot window. Use different colors for different densities (type colors() for a complete list of available colors). What do you observe? Can you say something about when a binomial distribution can be described with the help of a Poisson distribution?

Problem 3.6. Plot the densities of $N(5, 2)$, $\mathrm{unif}(0, 10)$ and $\mathrm{Exp}(\text{rate} = 1/5)$ in the same window. Use different colors. Remember to label them.

Histogram

A histogram describes how the values in a data set are distributed over an interval. The interval is divided into a (freely chosen) number of subintervals, and we count the number of values in each subinterval. The histogram can then be visualized as a bar chart.

Use the R function rexp to simulate a vector X of 1000 exponential random numbers with mean 3. To see how to use the function rexp and its parameters, type help(rexp) and read the help text.

    X <- rexp(1000, rate = 1/3)

We want to draw a histogram of X with 30 classes. Let's type

    h <- hist(X, breaks = 30, ylab = "counts", main = "Histogram of X ~ Exp(1/3)", col = "lightblue")

The break points of the subintervals used by the function hist can be obtained by

    bryt <- h$breaks

and the length of bryt by length(bryt). hist may not use exactly as many break points as you request. Instead, it chooses break points in a "nice" way: it picks about 30 (usually not exactly 30) equally spaced break points such that the distance between them is a round value. This is done by the function pretty; type help(pretty) if you want to know more. If you want control over the bin edges, the best way is to choose the break points yourself. You can for example write

    bins <- seq(0, max(X), length.out = 30)
    h2 <- hist(X, breaks = bins, ylab = "counts", col = "lightblue", main = "Histogram of random numbers from Exp(rate = 1/3)")

Type str(h) to get an overview of all values returned by hist. One can obtain the number of data points in each subinterval by

    h$counts

The histogram shows the number of data points in each subinterval. However, we are interested in comparing the histogram with the theoretical density function, and hence the histogram needs to be normalised.

Problem 3.7 (Plot normalised histogram). What are the characteristics of a probability (density) function, and how can one scale a histogram in order to relate it to a density? Observe that hist(...)$counts gives the values of the histogram and hist(...)$breaks gives the subinterval break points. One can plot a bar diagram using barplot. Hint: the command diff(hist(...)$breaks) gives the differences between consecutive break points.

Problem 3.8. Generate a sample of size 1000 from $N(20, \sigma^2 = 6)$. Draw a normalised histogram of this sample. Plot the density of $N(20, 6)$ in the same graph, and compare it with the histogram.

An alternative way to plot a normalised histogram of a data vector x is to use freq = FALSE in the hist function:

    hist(x, 30, freq = FALSE, main = "normalised histogram")

Problem 3.9. Let $X$ be a normal random variable with mean $\mu = 1$ and variance $\sigma^2 = 2$. Assume $Y = 2X + 3$. What is the distribution of $Y$? Generate two samples of size 5000 each, one from the distribution of $X$ and another from the distribution of $Y$. Draw the respective normalised histograms using hist and freq = FALSE. Plot the densities of $X$ and $Y$ in the same figure.
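The relation between counts, break points and a normalised histogram can be checked numerically. The following is a minimal sketch, assuming the exponential sample with mean 3 from the histogram example; the seed and the names width, height and t_grid are chosen here for illustration, and this is one possible way to do the scaling rather than the intended solution of any particular problem.

```r
set.seed(2015)                            # arbitrary seed for repeatability
X <- rexp(1000, rate = 1/3)               # exponential sample with mean 3

h <- hist(X, breaks = 30, plot = FALSE)   # compute the bins without drawing
width  <- diff(h$breaks)                  # bin widths
height <- h$counts / (sum(h$counts) * width)   # relative frequency per unit length

sum(height * width)                       # total bar area, equals 1
all.equal(height, h$density)              # hist stores the same scaling

# Overlay the theoretical density on the normalised histogram.
hist(X, breaks = 30, freq = FALSE, main = "Normalised histogram")
t_grid <- seq(0, max(X), length.out = 200)
lines(t_grid, dexp(t_grid, rate = 1/3), col = "red")
```

Dividing by both the sample size and the bin width makes the bar areas sum to one, which is the property that allows a direct comparison with a density curve. For a normal sample the same overlay would use dnorm with sd set to the square root of the variance.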