R STATISTICAL COMPUTING

Some R Examples
Dennis
Friday 2nd and Saturday 3rd May, 2014.

Topics covered
Vector and matrix operations.
File operations.
Evaluation of probability density functions.
Testing of hypotheses, i.e. t, z, F, chi-square.
Linear regression.

Vectors I
A numeric vector is an ordered sequence of numbers. The three functions commonly used to create vectors are c(), seq() and rep().
The function c()
The letter c means concatenate, i.e. to join together.
> c(0, 270, 250, 300, 210) # Creating a numeric vector.
> Saturday.sales <- c(0, 300, 3) # Saving a vector in a variable.
> Sunday.sales <- c(1, 230, 0)
> weekend.sales <- c(Saturday.sales, Sunday.sales) # Combining two vectors.
> cities <- c("Kerala", "Delhi", "Chennai") # Creating a vector of characters.

Vectors II
The functions cbind(), rbind() and seq()
> rows.sales <- cbind(Saturday.sales, Sunday.sales) # Binds the vectors together as columns (a 3 x 2 matrix).
> columns.sales <- rbind(Saturday.sales, Sunday.sales) # Binds the vectors together as rows (a 2 x 3 matrix).
> colnames(columns.sales) <- cities # Adds column titles to the matrix.
The abbreviation seq means sequence, i.e. it is used for equidistant series of numbers.
> seq(4, 9) # Creates a sequence of numbers from 4 to 9 at intervals of 1.
> seq(3, 10, by = 2) # Creates a sequence of numbers from 3 up to at most 10 at intervals of 2.

Vectors III
The function rep()
The abbreviation rep means replicate, i.e. it is used to generate repeated values. It is used in two variants, depending on whether the second argument is a single number or a vector. For example, to repeat a series of numbers;
> x = c(4, 5)
> rep(x, 3) # This will repeat x three times.
Consider the following R code;
> rep(1:2, c(10, 15)) # This means repeat 1 ten times and 2 fifteen times.
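The two variants of rep() can be compared side by side. The following is a minimal sketch; the `each` argument and the variable names `whole`, `each_rep` and `groups` are illustrative additions that do not appear in the slides.

```r
x <- c(4, 5)

# Single number as second argument: the whole vector is repeated.
whole <- rep(x, times = 3)        # 4 5 4 5 4 5

# The `each` argument repeats every element in turn instead.
each_rep <- rep(x, each = 3)      # 4 4 4 5 5 5

# Vector as second argument: one repeat count per element.
groups <- rep(1:2, times = c(10, 15))
table(groups)                     # counts of 1s and 2s
```

The vector form is convenient for building group labels of unequal size, for example before a two-sample comparison.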

Matrices I
A matrix is a two-dimensional array of numbers.
Example
> x <- 1:12
> dim(x) <- c(3, 4); x
The dim assignment function sets or changes the dimension attribute of x, causing R to treat the vector of 12 numbers as a 3 x 4 matrix, filled column by column.
Alternatively, the function matrix() can be used as follows;
> matrix(1:12, nrow = 3, byrow = TRUE) # byrow = TRUE fills the matrix row by row instead.
Useful functions that operate on matrices include rownames(), colnames(), and the transposition function t().
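The two constructions produce different matrices because they fill in different orders. A short sketch of the difference (the name `m` is illustrative):

```r
x <- 1:12
dim(x) <- c(3, 4)                          # fills column by column

m <- matrix(1:12, nrow = 3, byrow = TRUE)  # fills row by row

x[1, ]   # first row under column-wise filling: 1 4 7 10
m[1, ]   # first row under row-wise filling:    1 2 3 4

# Without byrow = TRUE, matrix() fills column by column, like dim().
all(x == matrix(1:12, nrow = 3))
```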

Matrices II
Example
> x <- matrix(1:12, nrow = 3, byrow = TRUE)
> rownames(x) <- LETTERS[1:3]; x
> colnames(x) <- letters[1:4]; x # Naming rows and columns (x has four columns).
The character vector LETTERS is a built-in variable that contains the capital letters A to Z. Similar useful vectors are letters, month.name and month.abb, which contain the lowercase letters a to z, month names, and abbreviated month names respectively.
Finding the transpose and the inverse of a matrix.
> y <- t(x); y # Gives the transpose of a matrix.
The inverse exists only for a square, non-singular matrix, so solve() cannot be applied to the 3 x 4 matrix x. For a square matrix, e.g.
> s <- matrix(c(2, 1, 1, 3), nrow = 2)
> z <- solve(s); z # Gives the inverse of a matrix.
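A quick way to check an inverse is to multiply it back against the original matrix. The sketch below uses a small illustrative 2 x 2 matrix `s`, chosen only because it is square and non-singular.

```r
s <- matrix(c(2, 1, 1, 3), nrow = 2)   # determinant 2*3 - 1*1 = 5, so invertible
s_inv <- solve(s)

s %*% s_inv                            # identity matrix, up to rounding error

# The transpose swaps the dimensions: a 3 x 4 matrix becomes 4 x 3.
x <- matrix(1:12, nrow = 3, byrow = TRUE)
dim(t(x))
```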

Matrices III
Matrix addition, subtraction and multiplication
Create two matrices, say, A and B.
> C = c(rep(0, 3), seq(1, 6, 1), rep(seq(1, 6, 1), 2))
> A = matrix(sample(C, 12), nrow = 3, byrow = TRUE); A
> B = matrix(sample(C, 12), nrow = 3, byrow = TRUE); B
> D = A + B; D # Adds the matrices.
> E = A - B; E # Subtracts matrix B from matrix A.
> F = A * B; F # Multiplies matrix A by matrix B element-wise.
> G = A %*% t(B); G # Computes the matrix product of A and the transpose of B.
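The difference between `*` and `%*%` shows up in the dimensions of the result. A minimal sketch, assuming a fixed seed (not in the slides) so that the random matrices are reproducible; the names `elementwise` and `product` are illustrative.

```r
set.seed(1)   # assumption: fixed seed for reproducibility
C <- c(rep(0, 3), seq(1, 6, 1), rep(seq(1, 6, 1), 2))
A <- matrix(sample(C, 12), nrow = 3, byrow = TRUE)
B <- matrix(sample(C, 12), nrow = 3, byrow = TRUE)

elementwise <- A * B        # 3 x 4: entry (i, j) is A[i, j] * B[i, j]
product <- A %*% t(B)       # 3 x 3: entry (i, j) is sum(A[i, ] * B[j, ])

dim(elementwise)
dim(product)
```

Note that `A %*% B` itself would fail here, since a 3 x 4 matrix cannot be multiplied by another 3 x 4 matrix; transposing B makes the inner dimensions agree.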

File operations
Reading data.
The two common functions used in reading data are scan() and read.table(), i.e.
> Sales = scan(file = "filepath")
> SALES = read.table(file = "filepath", header = FALSE)
To merge two data frames according to their common column names or row names, we use the function merge();
> AB = merge(x, y) # Merges data frames x and y.
merge() takes two data frames at a time, so a third data frame z is merged in a second call;
> ABC = merge(AB, z)
To write output into a file, use the function write(), i.e.
> write(x, file = "filepath", ncolumns = 1, append = FALSE) # "filepath" is the path where the data is saved.
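A round trip through a temporary file shows how write(), scan() and merge() fit together. This is a sketch only: the temporary file, the data frames `df_a` and `df_b`, and their column names are made up for illustration.

```r
# Write a numeric vector to a temporary file and read it back.
tmp <- tempfile(fileext = ".txt")
x <- c(10, 20, 30)
write(x, file = tmp, ncolumns = 1, append = FALSE)
back <- scan(file = tmp, quiet = TRUE)

# Merge two hypothetical data frames on their shared column "id".
df_a <- data.frame(id = 1:3, sales = c(5, 7, 9))
df_b <- data.frame(id = 2:4, city = c("Delhi", "Chennai", "Kerala"))
merged <- merge(df_a, df_b, by = "id")   # keeps only ids present in both
merged

unlink(tmp)                              # remove the temporary file
```

By default merge() keeps only the rows whose key appears in both data frames; the arguments `all`, `all.x` and `all.y` change that behaviour.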

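Some probability distributions I
Normal distribution
The full list of functions and options is found with the command > help(Normal).
dnorm
> x <- seq(-20, 20, by = 0.1)
> y <- dnorm(x) # Density of the standard normal distribution.
> plot(x, y)
pnorm
> x <- seq(-20, 20, by = 0.1)
> y <- pnorm(x, mean = 3, sd = 4) # Cumulative distribution function.
> plot(x, y)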
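The relationship between the two functions is that pnorm is the area under dnorm to the left of a point. The sketch below checks this numerically with integrate(); the cut-off 1.96 and the variable names are illustrative choices.

```r
# Area under the standard normal density up to 1.96 ...
area <- integrate(dnorm, lower = -Inf, upper = 1.96)$value

# ... agrees with the cumulative distribution function at 1.96.
cdf <- pnorm(1.96)
c(area = area, cdf = cdf)

# At the mean, the cumulative probability is one half for any sd.
pnorm(3, mean = 3, sd = 4)
```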

Some probability distributions II
qnorm
The next function we look at is qnorm, which is the inverse of pnorm. The idea behind qnorm is that you give it a probability, and it returns the number whose cumulative probability matches it.
> x <- seq(0, 1, by = 0.05)
> y <- qnorm(x)
> plot(x, y)
> y <- qnorm(x, mean = 3, sd = 2)
> plot(x, y)
rnorm
> y <- rnorm(200, mean = 2) # Generates 200 random normal values.
> hist(y)
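That qnorm undoes pnorm can be confirmed directly. A brief sketch, with illustrative probabilities and an assumed seed for the random draws:

```r
p <- c(0.025, 0.5, 0.975)

# Quantiles of a normal distribution with mean 3 and sd 2.
q <- qnorm(p, mean = 3, sd = 2)

# Feeding the quantiles back into pnorm recovers the probabilities.
pnorm(q, mean = 3, sd = 2)

# Random draws; the seed is an assumption added for reproducibility.
set.seed(42)
y <- rnorm(200, mean = 2)
length(y)
```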

t-distribution I

t-distribution II
Help is available at > help(TDist).
dt
> x <- seq(-20, 20, by = 0.5)
> y <- dt(x, df = 10)
pt
> x = c(3, 4, 2, 1)
> pt((mean(x) - 2)/sd(x), df = 20)
qt
> v <- c(0.005, 0.025, 0.05)
> qt(v, df = 253)
rt
> rt(3, df = 10)
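One common use of pt is turning a t statistic into a p-value. The sketch below is an illustration under stated assumptions: unlike the slide's expression, it divides by the standard error sd(x)/sqrt(n) and uses n - 1 degrees of freedom, which is the usual one-sample t statistic; the names `t_stat`, `p_two_sided` and `crit` are made up here.

```r
x <- c(3, 4, 2, 1)
n <- length(x)

# One-sample t statistic for the hypothesised mean 2.
t_stat <- (mean(x) - 2) / (sd(x) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_two_sided <- 2 * pt(-abs(t_stat), df = n - 1)

# The built-in test gives the same p-value.
t.test(x, mu = 2)$p.value

# qt is the inverse of pt: these quantiles map back to the probabilities.
v <- c(0.005, 0.025, 0.05)
crit <- qt(v, df = 253)
pt(crit, df = 253)
```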

Chi-square I

Chi-square II
Help is available at > help(Chisquare).
dchisq
> x <- seq(-20, 20, by = 0.5)
> y <- dchisq(x, df = 10)
pchisq
> x = c(2, 4, 5, 6)
> pchisq(x, df = 20)
qchisq
> v <- c(0.005, 0.025, 0.05); qchisq(v, df = 253)
rchisq
> rchisq(3, df = 20)
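For hypothesis tests the upper tail of the chi-square distribution is usually the quantity of interest. A small sketch, with illustrative variable names and an assumed seed:

```r
x <- c(2, 4, 5, 6)

lower <- pchisq(x, df = 20)                       # P(X <= x)
upper <- pchisq(x, df = 20, lower.tail = FALSE)   # P(X > x)
lower + upper                                     # each pair sums to 1

# Critical value for a 5% test with 1 degree of freedom.
crit <- qchisq(0.95, df = 1)
crit

# The mean of a chi-square variable equals its degrees of freedom.
set.seed(1)   # assumption: seed added for reproducibility
draws <- rchisq(1000, df = 20)
mean(draws)
```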

Testing of Hypotheses I
In the one-sample t-test, we compare a sample mean to a known or hypothesized population mean. The null hypothesis is that the expected difference between the sample mean and the population mean is zero, or in other words, that the expected value of the sample mean is equal to the population mean.
t-test example 1.
> rnorm1 = rnorm(50, 500, 100); rnorm1 # Generates 50 random numbers from a normal distribution with µ = 500, σ = 100.
> summary(rnorm1) # Gives the descriptive summary.
In the run shown here the sample mean is 517.3 (the value changes from run to run because the numbers are random). This is higher than µ = 500, but is it significantly higher? To test this hypothesis, we use the function t.test().
> t.test(rnorm1, mu = 500)
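What t.test() computes in the one-sample case can be reproduced by hand. The following is a sketch under the assumption of a fixed seed (not in the slides), so the sample differs from the one that gave 517.3; the names `sample1`, `t_manual` and `p_manual` are illustrative.

```r
set.seed(123)   # assumption: seed added for reproducibility
sample1 <- rnorm(50, mean = 500, sd = 100)

res <- t.test(sample1, mu = 500)

# The same statistic by hand: (sample mean - mu) / standard error.
n <- length(sample1)
t_manual <- (mean(sample1) - 500) / (sd(sample1) / sqrt(n))

# Two-sided p-value with n - 1 degrees of freedom.
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

c(t = t_manual, p = p_manual)
res$conf.int    # 95% confidence interval for the population mean
```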

Testing of Hypotheses II
t-test example 2.
Load the built-in library datasets.
> library(datasets)
> head(mtcars)
Quiz
Assuming that the data in mtcars follow the normal distribution, find the 95% confidence interval estimate of the difference between the mean gas mileage of manual and automatic transmissions.

Testing of Hypotheses III
Solution
> L = mtcars$am == 0
> mpg.auto = mtcars[L, ]$mpg; mpg.auto # automatic transmission mileage.
> mpg.manual = mtcars[!L, ]$mpg; mpg.manual # manual transmission mileage.
> t.test(mpg.auto, mpg.manual)
Answer
In mtcars, the mean mileage of automatic transmission is 17.147 mpg and that of manual transmission is 24.392 mpg. The 95% confidence interval of the difference in mean gas mileage is between 3.2097 and 11.2802 mpg. Because t.test() reports the difference as automatic minus manual, the interval is printed with negative signs, from -11.2802 to -3.2097.
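The pieces of the answer can be pulled out of the object that t.test() returns instead of being read off the printed output. A minimal sketch (the name `res` is illustrative); by default t.test() runs the Welch version, which does not assume equal variances.

```r
library(datasets)

mpg.auto   <- mtcars$mpg[mtcars$am == 0]   # automatic transmission
mpg.manual <- mtcars$mpg[mtcars$am == 1]   # manual transmission

res <- t.test(mpg.auto, mpg.manual)        # Welch two-sample t-test

res$estimate   # group means: about 17.147 and 24.392
res$conf.int   # about -11.2802 to -3.2097 (automatic minus manual)
```

Since the interval excludes zero, the difference in mean mileage is significant at the 5% level.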

Testing of Hypotheses IV
Chi-squared test
Consider the following example;
> female = c(18, 102)
> male = c(10, 110)
> migraine = cbind(female, male); migraine
> chisq.test(migraine)
From the results we find that the chi-square test fails to reject the null hypothesis that gender and migraine susceptibility are independent. If the ratio of female to male migraine sufferers is indeed 18:6, then our result is not what we had hoped for. We may have an unusual sample or we may simply need a larger sample to obtain statistical significance.
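The reasoning behind the test becomes clearer when the expected counts are inspected. In the sketch below the row labels are an assumption (the slide does not name the rows; the first row is taken to be migraine sufferers), and `res` and `res_nocorr` are illustrative names.

```r
female <- c(18, 102)
male   <- c(10, 110)
migraine <- cbind(female, male)
rownames(migraine) <- c("migraine", "no migraine")   # assumed labels

# For a 2 x 2 table chisq.test() applies Yates' continuity correction by default.
res <- chisq.test(migraine)
res$expected    # counts expected if gender and migraine were independent
res$statistic
res$p.value     # above 0.05, so independence is not rejected

# The uncorrected Pearson statistic, for comparison.
res_nocorr <- chisq.test(migraine, correct = FALSE)
res_nocorr$statistic
```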

Regression I
Simple Linear Regression
An example of grade point averages (GPAs) of students along with the number of hours each student studies per week. We use the lm() function to find the slope and intercept terms.
> Hours <- c(10, 12, 10, 15, 14, 12, 13, 15, 16, 14, 13, 12, 11, 10, 13, 13, 14, 18, 17, 14)
> GPA <- c(3.33, 2.92, 2.56, 3.08, 3.57, 3.31, 3.45, 3.93, 3.82, 3.70, 3.26, 3.00, 2.74, 2.85, 3.33, 3.29, 3.58, 3.85, 4.00, 3.50)
> results <- lm(GPA ~ Hours); summary(results)
The correct interpretation of the regression equation is that a student who did not study would have an estimated GPA of 1.3728 (an extrapolation, since no student in the sample studied fewer than 10 hours), and that for every 1-hour increase in study time, the estimated GPA would increase by 0.1489 points.
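The fitted coefficients can be extracted and used for prediction. A short sketch using the data above; the prediction at 15 hours and the name `pred15` are illustrative.

```r
Hours <- c(10, 12, 10, 15, 14, 12, 13, 15, 16, 14,
           13, 12, 11, 10, 13, 13, 14, 18, 17, 14)
GPA <- c(3.33, 2.92, 2.56, 3.08, 3.57, 3.31, 3.45, 3.93, 3.82, 3.70,
         3.26, 3.00, 2.74, 2.85, 3.33, 3.29, 3.58, 3.85, 4.00, 3.50)

results <- lm(GPA ~ Hours)
coef(results)    # intercept about 1.3728, slope about 0.1489

# Estimated GPA for a student who studies 15 hours per week.
pred15 <- predict(results, newdata = data.frame(Hours = 15))
pred15
```

Predictions are most trustworthy inside the observed range of 10 to 18 hours.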

Regression II
Example
Consider the data frame on cars, mtcars, in library(datasets). Test if mpg (miles per gallon) is affected by disp (displacement) and hp (horsepower).
> myvariables = c("mpg", "disp", "hp")
> mycars = mtcars[myvariables]
> mymodel = lm(mpg ~ disp + hp, data = mycars)
> summary(mymodel)
From the summary of the model, we can see that the estimated coefficients of both independent variables, disp (engine displacement) and hp (horsepower), are negative, i.e. with the other variable held fixed, an increase in either displacement or horsepower is associated with a decrease in the miles per gallon. Another important graphical way to assess relationship between