Introduction to Statistics and R
Mayo-Illinois Computational Genomics Workshop (2018)
Ruoqing Zhu, Ph.D.
Department of Statistics, UIUC
rqzhu@illinois.edu
June 18, 2018

Abstract

This document is a supplementary file for the lecture slides. The .pdf file is created using R Markdown with a .rmd source file, and compiled through RStudio. Note that this will require installing several additional packages.

1 Using R and RStudio

Most of the basic R mathematical and statistical functions are self-explanatory. Note that lines with a # in the front are comments, which will not be executed. Lines with ## in the front are outputs.

    # Basic mathematical operations
    ## [1] 4
    3*5
    ## [1] 15
    3^5
    ## [1] 243
    exp(2)
    ## [1] 7.389056
    log(3)
    ## [1] 1.098612
    log2(3)
    ## [1] 1.584963
    factorial(5)
    ## [1] 120

Data objects can be a complicated topic for people who have never used R before. The most common data objects are vector, matrix, list, and data.frame. Operations on vectors and matrices are fairly intuitive.

    # creating a vector
    c(1,2,3,4)
    ## [1] 1 2 3 4
    c("a", "b", "c")
    ## [1] "a" "b" "c"
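The list and data.frame objects named above are not shown in the notes, so here is a minimal sketch of both. The object names (patient, df) and all values are made up for illustration and do not come from the workshop data.

```r
# A list can hold elements of different types and lengths.
patient = list(id = "p01", age = 63, scores = c(6, 7, 7))
patient$age          # extract one element by name
patient[["scores"]]  # double brackets also extract a single element

# A data.frame is a table whose columns are vectors of equal length.
df = data.frame(age = c(50, 63, 71), svi = c(0, 1, 0))
df$age               # one column, returned as a vector
df[df$svi == 1, ]    # the rows that satisfy a condition
dim(df)              # number of rows and columns
```

The prostate data used later in these notes is a data.frame, which is why its columns are accessed with the $ operator.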
    # creating matrix from a vector
    matrix(c(1,2,3,4), 2, 2)
    ##      [,1] [,2]
    ## [1,]    1    3
    ## [2,]    2    4
    x = c(1,1,1,0,0,0); y = c(1,0,1,0,1,0)
    cbind(x,y)
    ##      x y
    ## [1,] 1 1
    ## [2,] 1 0
    ## [3,] 1 1
    ## [4,] 0 0
    ## [5,] 0 1
    ## [6,] 0 0
    # matrix multiplication using '%*%'
    matrix(c(1,2,3,4), 2, 2) %*% t(cbind(x,y))
    ##      [,1] [,2] [,3] [,4] [,5] [,6]
    ## [1,]    4    1    4    0    3    0
    ## [2,]    6    2    6    0    4    0

Simple mathematical operations on vectors and matrices are usually element-wise. You can easily extract certain elements of them by using the [] operator, much like array indexing in C (although R indices start at 1).

    # some simple operations
    x[3]
    ## [1] 1
    x[2:5]
    ## [1] 1 1 0 0
    cbind(x,y)[1:2, ]
    ##      x y
    ## [1,] 1 1
    ## [2,] 1 0
    (x + y)^2
    ## [1] 4 1 4 0 1 0
    length(x)
    ## [1] 6
    dim(cbind(x,y))
    ## [1] 6 2
    # A warning will be issued when R detects something wrong. Results may still be produced.
    x + c(1,2,3,4)
    ## Warning in x + c(1, 2, 3, 4): longer object length is not a multiple of shorter object length
    ## [1] 2 3 4 4 1 2

If you want to see more information about a particular function or operator in R, the easiest way is to get
the reference document. Put a question mark in front of a function name:

    # In a default R console window, this will open up a web browser.
    # In RStudio, this will be displayed in the Help window in the bottom-right panel.
    ?log2
    ?matrix

2 Basic Statistical Concepts

Now, let's start to work on some real datasets. Load the Prostate Cancer Data (prostate) from the ElemStatLearn package. R packages are collections of functions and data sets. They are developed and shared by the community and extend the functionality of base R. ElemStatLearn is a package developed for the textbook The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. This dataset comes from a study by Stamey et al. (1989) that examined the prostate-specific antigen (PSA) and some clinical measures in 97 men. If you have never installed this package before, type the following code in your R console window.

    install.packages("ElemStatLearn")

Now, let's load this package and have a look at the dataset.

    library(ElemStatLearn)
    data(prostate)
    # we will remove the last column since it is not used in this analysis
    prostate = prostate[, -10]
    dim(prostate)
    ## [1] 97 9
    head(round(prostate, 3))
    ##   lcavol lweight age lbph svi lcp gleason pgg45 lpsa

The dataset contains 8 clinical variables and 1 outcome (lpsa). Among them, svi and gleason are categorical variables.

    Name      Attribute
    lcavol    log cancer volume
    lweight   log prostate weight
    age       age in years
    lbph      log of benign prostatic hyperplasia amount
    svi       seminal vesicle invasion
    lcp       log of capsular penetration
    gleason   Gleason score
    pgg45     percent of Gleason scores 4 or 5
    lpsa      log of prostate specific antigen

We can first perform some basic descriptive statistics. The summary function provides a quick overview of the data and each of the variables. It is particularly useful for continuous variables, for which it reports the quartiles and the mean. For categorical variables, we can use the table function to summarize the count of each category.

    summary(prostate)
    ##      lcavol          lweight            age             lbph             svi
    ##  Min.   :        Min.   :2.375   Min.   :41.00   Min.   :        Min.   :
    ##  1st Qu.:        1st Qu.:        1st Qu.:        1st Qu.:        1st Qu.:
    ##  Median :        Median :3.623   Median :65.00   Median :        Median :
    ##  Mean   :        Mean   :3.629   Mean   :63.87   Mean   :        Mean   :
    ##  3rd Qu.:        3rd Qu.:        3rd Qu.:        3rd Qu.:        3rd Qu.:
    ##  Max.   :        Max.   :4.780   Max.   :79.00   Max.   :        Max.   :
    ##       lcp            gleason          pgg45            lpsa
    ##  Min.   :        Min.   :6.000   Min.   : 0.00   Min.   :
    ##  1st Qu.:        1st Qu.:        1st Qu.:        1st Qu.:
    ##  Median :        Median :7.000   Median :        Median :
    ##  Mean   :        Mean   :6.753   Mean   :        Mean   :
    ##  3rd Qu.:        3rd Qu.:        3rd Qu.:        3rd Qu.:
    ##  Max.   :        Max.   :9.000   Max.   :        Max.   :
    # the table() function can be used for categorical variables
    table(prostate$svi)
    table(prostate$gleason)
    # this statement creates multiple plots in the same row
    par(mfrow=c(1,2))
    # mean and sd are standard statistical functions
    mean(prostate$lcavol)
    sd(prostate$lcavol)
    # count frequencies for categorical variables
    table(prostate$gleason)
    # we can view a single continuous variable using a histogram
    hist(prostate$lcavol)
    hist(prostate$lpsa)

[Figure: histograms of prostate$lcavol and prostate$lpsa; y-axis: Frequency]

For two random variables, we usually use the correlation coefficient to describe how strongly the two variables are linearly related.

    plot(prostate$lcavol, prostate$lpsa, pch = 19, cex = 0.7,
         xlab = "cancer volume", ylab = "prostate specific antigen")
[Figure: scatter plot of prostate specific antigen against cancer volume]

    cor(prostate$lcavol, prostate$lpsa)
    # correlation matrix can be calculated for each pair of variables
    round(cor(prostate), 3)
    ##         lcavol lweight age lbph svi lcp gleason pgg45 lpsa

Some R packages provide really nice plots of the correlation matrix.

    library(corrplot)
    corrplot(cor(prostate), type = "upper", tl.col = "black", tl.srt = 35)
[Figure: upper-triangular correlation plot of lcavol, lweight, age, lbph, svi, lcp, gleason, pgg45, and lpsa]

    pairs(~., data = prostate[, -c(5, 7)], pch = 19, cex = 0.5, col = "deepskyblue")

[Figure: pairwise scatter plots of lcavol, lweight, age, lbph, lcp, pgg45, and lpsa]

To demonstrate different correlation coefficients, we simulate some random data.

    # I will set some random seed so that the results can be replicated.
    set.seed(1)
    # generate 1000 independent normal data
    n = 1000; X = rnorm(n)
    par(mfrow = c(2,3), oma = c(2,2,0,0) + 0.1, mar = c(2,1,2,1) + 0.1)
    # strong positive correlation
    plot(X, X + rnorm(n, 0, 0.5), pch = 19, cex = 0.5)
    title("Strong Positive Correlation (0.9)")
    # moderate positive correlation
    plot(X, X + rnorm(n, 0, 2), pch = 19, cex = 0.5)
    title("Moderate Positive Correlation (0.5)")
    # zero correlation due to symmetry
    plot(X, X^2 + rnorm(n, 0, 0.5), pch = 19, cex = 0.5)
    title("(Nearly) Zero Correlation")
    # zero correlation due to independence
    plot(X, rnorm(n), pch = 19, cex = 0.5)
    title("(Nearly) Zero Correlation")
    # moderate negative correlation
    plot(X, - X + rnorm(n, 0, 2), pch = 19, cex = 0.5)
    title("Moderate Negative Correlation (-0.5)")
    # strong negative correlation
    plot(X, - X + rnorm(n, 0, 0.5), pch = 19, cex = 0.5)
    title("Strong Negative Correlation (-0.9)")

[Figure: six scatter plots against X, titled Strong Positive Correlation (0.9), Moderate Positive Correlation (0.5), (Nearly) Zero Correlation, (Nearly) Zero Correlation, Moderate Negative Correlation (-0.5), and Strong Negative Correlation (-0.9)]
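The values quoted in the panel titles can be checked against a short calculation. If X has variance 1 and e is independent noise with standard deviation s, then cov(X, X + e) = 1 and var(X + e) = 1 + s^2, so cor(X, X + e) = 1 / sqrt(1 + s^2). For X^2 the covariance with X is E[X^3] = 0 by symmetry. The sketch below compares these values with sample correlations; the helper name theory is hypothetical, and the noise draws differ from those used for the figure.

```r
set.seed(1)
n = 1000; X = rnorm(n)

# Theoretical correlation between X and X + e when sd(e) = s
# ('theory' is an illustrative helper, not part of the original notes)
theory = function(s) 1 / sqrt(1 + s^2)

# Sample correlations for three of the simulated relationships
r_strong   = cor(X, X + rnorm(n, 0, 0.5))
r_moderate = cor(X, X + rnorm(n, 0, 2))
r_square   = cor(X, X^2 + rnorm(n, 0, 0.5))

round(c(theory(0.5), theory(2)), 3)   # 0.894 and 0.447
round(c(r_strong, r_moderate, r_square), 3)
```

The labels 0.9 and 0.5 in the titles are therefore rounded versions of 0.894 and 0.447. The X^2 case also illustrates that a correlation near zero does not imply independence.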
3 Hypothesis Testing

3.1 Example 1: t test

Let's perform a t test comparing the outcome lpsa between the two levels of the svi indicator.

    t.test(lpsa ~ svi, data = prostate)
    ##
    ##  Welch Two Sample t-test
    ##
    ## data:  lpsa by svi
    ## t = , df = , p-value = 7.879e-08
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ## sample estimates:
    ## mean in group 0 mean in group 1

The test shows a p-value less than 10^-4, which is considered highly significant. Note that this test can be done in many different ways depending on how your data is organized. For example, the following approach gives the same result.

    g1 = prostate$lpsa[prostate$svi == 1]
    g0 = prostate$lpsa[prostate$svi == 0]
    t.test(g0, g1)
    ##
    ##  Welch Two Sample t-test
    ##
    ## data:  g0 and g1
    ## t = , df = , p-value = 7.879e-08
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ## sample estimates:
    ## mean of x mean of y

3.2 Example 2: simple linear regression

An alternative approach for this question is to use a regression model, or ANOVA. We consider modeling the relationship using

    $\text{lpsa} \sim \beta_0 + \beta_1\,\text{svi}$
    fit = glm(lpsa ~ svi, data = prostate, family = "gaussian")
    summary(fit)
    ##
    ## Call:
    ## glm(formula = lpsa ~ svi, family = "gaussian", data = prostate)
    ##
    ## Deviance Residuals:
    ##    Min      1Q  Median      3Q     Max
    ##
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)                              < 2e-16 ***
    ## svi                                         e-09 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## (Dispersion parameter for gaussian family taken to be )
    ##
    ##     Null deviance:  on 96 degrees of freedom
    ## Residual deviance:  on 95 degrees of freedom
    ## AIC:
    ##
    ## Number of Fisher Scoring iterations:

3.3 Example 3: multiple linear regression

We can further incorporate more variables in a regression model, for example,

    $\text{lpsa} \sim \beta_0 + \beta_1\,\text{svi} + \beta_2\,\text{lcavol} + \beta_3\,\text{age}$

    fit = glm(lpsa ~ svi + lcavol + age, data = prostate, family = "gaussian")
    summary(fit)
    ##
    ## Call:
    ## glm(formula = lpsa ~ svi + lcavol + age, family = "gaussian", data = prostate)
    ##
    ## Deviance Residuals:
    ##    Min      1Q  Median      3Q     Max
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)                                      *
    ## svi                                              **
    ## lcavol                                      e-11 ***
    ## age
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## (Dispersion parameter for gaussian family taken to be )
    ##
    ##     Null deviance:  on 96 degrees of freedom
    ## Residual deviance:  on 93 degrees of freedom
    ## AIC:
    ##
    ## Number of Fisher Scoring iterations:

3.4 Example 4: multiple testing

Consider a situation in which two variables x and y are completely independent. If we repeat the test many times on newly generated samples, some of the tests will nevertheless turn out significant.

    pvalues = rep(NA, 1000)
    for (i in 1:1000) {
      x = rbinom(20, 1, 0.5)
      y = rnorm(20)
      pvalues[i] = t.test(y~x)$p.value
    }
    table(pvalues <= 0.05)
    ## FALSE  TRUE

Of course, in this example we expect to see around 5% significant results; that is how we designed our test. But this can be a big issue in genomic studies, where we need to test thousands of genes one by one. One solution is the Bonferroni correction, which divides the α level by the number of hypothesis tests we are performing.
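As a rough illustration of the Bonferroni correction mentioned above, the sketch below repeats the simulation and compares the number of rejections with and without the correction. It is a variation on the code above rather than the author's analysis: the group labels are fixed at ten observations per group (the names n_tests, group, and p_bonf are introduced here), so that every simulated data set has enough observations in both groups for t.test. The adjustment uses p.adjust from base R.

```r
set.seed(1)
n_tests = 1000
alpha = 0.05

# Fixed group labels: 10 observations in each group (an assumption made here)
group = rep(0:1, each = 10)

pvalues = rep(NA, n_tests)
for (i in 1:n_tests) {
  y = rnorm(20)                          # y is independent of the grouping
  pvalues[i] = t.test(y ~ group)$p.value
}

# Without correction, roughly 5% of the tests are expected to be rejected
sum(pvalues <= alpha)

# Bonferroni: compare each p-value with alpha / (number of tests)
sum(pvalues <= alpha / n_tests)

# Equivalent form: multiply the p-values by the number of tests (capped at 1)
p_bonf = p.adjust(pvalues, method = "bonferroni")
sum(p_bonf <= alpha)
```

The last two counts agree because comparing p with alpha / m is the same as comparing m * p with alpha. The correction is conservative, which is the price paid for controlling the chance of any false positive across all tests.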
3.5 Example 5: high-dimensional data

Consider a situation in which the number of variables is very large. The fitted model becomes very unstable: too many variables can appear significant, and the parameter estimates are unreliable. When the number of variables is larger than the number of observations, the standard linear model cannot even be fit uniquely (lm() returns NA for the coefficients it cannot estimate).

    # 55 observations with 50 variables
    set.seed(1)
    n = 55; p = 50
    x = matrix(rnorm(n*p), n, p)
    # y is independent of x
    y = rnorm(n)
    allcoef = summary(lm(y~x))$coefficients
    # just the variables that are estimated to be significant
    allcoef[allcoef[,4] < 0.05, ]
    ##     Estimate Std. Error t value Pr(>|t|)

[Output: 11 rows, one for each variable estimated to be significant; values not preserved]

Some machine learning models can help in this situation, for example Lasso regression. It incorporates a way to focus on the important variables (if there are any) and tries to eliminate variables that are pure noise. This is done using the glmnet package.

    library(glmnet)
    lasso.fit = cv.glmnet(x, y)
    # the lasso model estimates all coefficients to be 0 except the intercept term
    sum(as.numeric(coef(lasso.fit)) != 0)
    ## [1] 1

Some other machine learning models, such as random forests, can also be used to handle high-dimensional data and estimate the effect of each variable. For more details, please see the documentation of the R package randomForest.
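Returning to the case raised earlier in this example, where there are more variables than observations, the following sketch shows what base R does in that situation. The sizes (20 observations, 30 variables) are chosen here for illustration. lm() still returns a fit, but it can only estimate as many coefficients as there are observations and reports NA for the rest.

```r
# 20 observations with 30 variables, so p > n (sizes chosen for illustration)
set.seed(1)
n = 20; p = 30
x = matrix(rnorm(n * p), n, p)
y = rnorm(n)            # y is independent of x

fit = lm(y ~ x)

# 31 coefficients are requested (intercept plus 30 slopes), but the design
# matrix has rank 20, so 11 of them cannot be estimated.
fit$rank
sum(is.na(coef(fit)))

# The coefficients that were estimated reproduce y exactly, leaving no
# residual degrees of freedom for standard errors or p-values.
max(abs(residuals(fit)))
```

This is the motivation for penalized methods such as the lasso, which remain well defined when p exceeds n.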
Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.
More informationR Hints for Chapter 10
R Hints for Chapter 10 The multiple logistic regression model assumes that the success probability p for a binomial random variable depends on independent variables or design variables x 1, x 2,, x k.
More informationData Mining Stat 588
Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic
More informationRegression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102
Background Regression so far... Lecture 21 - Sta102 / BME102 Colin Rundel November 18, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical
More informationMODULE 6 LOGISTIC REGRESSION. Module Objectives:
MODULE 6 LOGISTIC REGRESSION Module Objectives: 1. 147 6.1. LOGIT TRANSFORMATION MODULE 6. LOGISTIC REGRESSION Logistic regression models are used when a researcher is investigating the relationship between
More information36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression
36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 22, 2015 Lecture 4: Linear Regression TCELL Simple Regression Example Male black wheatear birds carry stones to the nest as a form
More informationWeek 7 Multiple factors. Ch , Some miscellaneous parts
Week 7 Multiple factors Ch. 18-19, Some miscellaneous parts Multiple Factors Most experiments will involve multiple factors, some of which will be nuisance variables Dealing with these factors requires
More informationLecture 5 - Plots and lines
Lecture 5 - Plots and lines Understanding magic Let us look at the following curious thing: =rnorm(100) y=rnorm(100,sd=0.1)+ k=ks.test(,y) k Two-sample Kolmogorov-Smirnov test data: and y D = 0.05, p-value
More informationBMI 541/699 Lecture 22
BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based
More informationA brief introduction to mixed models
A brief introduction to mixed models University of Gothenburg Gothenburg April 6, 2017 Outline An introduction to mixed models based on a few examples: Definition of standard mixed models. Parameter estimation.
More informationA Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R 2nd Edition Brian S. Everitt and Torsten Hothorn CHAPTER 7 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, Colonic
More informationHow to work correctly statistically about sex ratio
How to work correctly statistically about sex ratio Marc Girondot Version of 12th April 2014 Contents 1 Load packages 2 2 Introduction 2 3 Confidence interval of a proportion 4 3.1 Pourcentage..............................
More informationLab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model
Lab 3 A Quick Introduction to Multiple Linear Regression Psychology 310 Instructions.Work through the lab, saving the output as you go. You will be submitting your assignment as an R Markdown document.
More informationHarvard University. Rigorous Research in Engineering Education
Statistical Inference Kari Lock Harvard University Department of Statistics Rigorous Research in Engineering Education 12/3/09 Statistical Inference You have a sample and want to use the data collected
More informationIntroduction to Linear Regression
Introduction to Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Introduction to Linear Regression 1 / 46
More informationNon-Gaussian Response Variables
Non-Gaussian Response Variables What is the Generalized Model Doing? The fixed effects are like the factors in a traditional analysis of variance or linear model The random effects are different A generalized
More informationSLR output RLS. Refer to slr (code) on the Lecture Page of the class website.
SLR output RLS Refer to slr (code) on the Lecture Page of the class website. Old Faithful at Yellowstone National Park, WY: Simple Linear Regression (SLR) Analysis SLR analysis explores the linear association
More informationRegression on Faithful with Section 9.3 content
Regression on Faithful with Section 9.3 content The faithful data frame contains 272 obervational units with variables waiting and eruptions measuring, in minutes, the amount of wait time between eruptions,
More informationBias-free Sparse Regression with Guaranteed Consistency
Bias-free Sparse Regression with Guaranteed Consistency Wotao Yin (UCLA Math) joint with: Stanley Osher, Ming Yan (UCLA) Feng Ruan, Jiechao Xiong, Yuan Yao (Peking U) UC Riverside, STATS Department March
More informationRegression Methods for Survey Data
Regression Methods for Survey Data Professor Ron Fricker! Naval Postgraduate School! Monterey, California! 3/26/13 Reading:! Lohr chapter 11! 1 Goals for this Lecture! Linear regression! Review of linear
More informationChapter 8 Conclusion
1 Chapter 8 Conclusion Three questions about test scores (score) and student-teacher ratio (str): a) After controlling for differences in economic characteristics of different districts, does the effect
More informationRegression in R. Seth Margolis GradQuant May 31,
Regression in R Seth Margolis GradQuant May 31, 2018 1 GPA What is Regression Good For? Assessing relationships between variables This probably covers most of what you do 4 3.8 3.6 3.4 Person Intelligence
More information