Maximum Likelihood Exercises SOLUTIONS

Exercise 1: Frog covariates

(1) Before we can fit the model, we have to do a little recoding of the data. Both ReedfrogPred$size and ReedfrogPred$pred are text data, so we'll need to convert them to dummy variables before we can use them in the model. R is nice about doing this for you when you use lm(), but it won't do it automatically for custom maximum likelihood estimation using mle2(). So to recode both categorical variables to 0/1 dummies:

# need to recode size to a dummy variable
ReedfrogPred$sizecode <- ifelse( ReedfrogPred$size=="big", 1, 0 )
# need to recode pred to a dummy variable
ReedfrogPred$predcode <- ifelse( ReedfrogPred$pred=="pred", 1, 0 )

Now we can define the first model. All you need to do is modify the code I provided to expand the linear model portion. What we are assuming is that the probability of an individual tadpole surviving is given by:

Pr(survive | parameters) = 1 / (1 + exp(linear model)).

So you can build the linear model part in the same way you'd build a linear regression. This code fits the model that includes parameters for baseline survival (the intercept), size, and predators present:

m <- mle2( ReedfrogPred$surv ~ dbinom(
        prob=1/(1+exp( m + s*ReedfrogPred$sizecode + p*ReedfrogPred$predcode )),
        size=ReedfrogPred$density ),
    start=list(m=0,s=0,p=0) )

The part that corresponds to the probability model above is:

prob=1/(1+exp( m + s*ReedfrogPred$sizecode + p*ReedfrogPred$predcode ))

This is the same as:

Pr(survive | m, s, p) = 1 / (1 + exp(m + s*sizecode + p*predcode)).

The estimates you come up with are the values mle2 reports for m, s, and p. What do these mean? To find out, you need to understand how the probability of survival responds to changes in these parameters. The answer is that negative estimates increase probability (as the value of the predictor increases), while positive estimates decrease probability. You can prove this for yourself (you should have) by simply plotting 1/(1 + exp(m)) for a range of m values:

curve( 1/(1+exp(x)), from=-5, to=5 )
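If you want a quick cross-check of the mle2() fit, the same model can be fit with R's built-in glm(). This sketch is not part of the original exercise; note that glm() parameterizes the probability as 1/(1 + exp(-linear model)), so its coefficients should come out as approximately -m, -s, and -p.

# optional cross-check with glm(); its sign convention is flipped
# relative to the 1/(1+exp(linear model)) form used above
m.glm <- glm( cbind(surv, density - surv) ~ sizecode + predcode,
    data=ReedfrogPred, family=binomial )
coef(m.glm) # compare against -coef(m)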

[Figure: the curve of 1/(1 + exp(x)) plotted against x.]

Note that probability is highest for the smallest values. So looking at our parameter estimates again, we can conclude that baseline survival, m, is equal to:

1 / (1 + exp(-2.92)) ≈ 0.95.

That's the probability of survival you get when sizecode is zero and predcode is zero. It's very high! The parameter estimates for the other two parameters, s and p, are positive, so each decreases the chance of survival as its predictor increases. Being bigger means dying more. The presence of predators means dying more.

There is actually a deep and precise relationship between these estimates, on the log scale, and the odds of the event happening. The odds of a tadpole surviving, considering only baseline survival m, are:

Pr(survive) / (1 - Pr(survive)) = 0.95 / 0.05 ≈ 18.6.

That means 18.6:1 odds of survival. Very good odds. Now consider the odds of survival when we add a predator. The chance of survival becomes:

1 / (1 + exp(m + p)) ≈ 0.56.

So the new odds are 0.56 / 0.44 ≈ 1.25. When predators are present, the odds of survival are 1.25:1. Substantially reduced.
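These probabilities and odds can be computed directly from the fitted coefficients, rather than by hand. A small sketch, using the model object m from above:

# survival probabilities and odds straight from the coefficients
cf <- coef(m)
pr.base <- 1/(1 + exp( cf["m"] )) # baseline: no predators
pr.pred <- 1/(1 + exp( cf["m"] + cf["p"] )) # predators present
pr.base / (1 - pr.base) # baseline odds, about 18.6
pr.pred / (1 - pr.pred) # odds with predators, about 1.25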

Now here's the magic. Compare 18.6:1 to 1.25:1. The proportional change in odds is:

1.25 / 18.6 ≈ 0.067.

This says that the new odds are about 6.7% of the previous odds. By coincidence (not really), if you evaluate exp(-p), where p is the parameter estimate for predation, you get exp(-p) ≈ 0.067. That's the same number as before, the proportional change in odds. So you can interpret these coefficients inside the logistic as proportional changes in odds. Just use the formula:

proportional change in odds = exp(-estimate).

You can prove that this formula is correct, with a little algebra. We want to know the number x that satisfies:

[1/(1 + exp(a + b))] / [1 - 1/(1 + exp(a + b))] = x * [1/(1 + exp(a))] / [1 - 1/(1 + exp(a))],

for any values a and b. Solving for x yields:

x = exp(-b),

which is the result we induced above, via arithmetic. In summary, to get the proportional change in odds induced by a parameter, per unit of its predictor variable, use exp(-k), where k is the parameter. For this reason, you will sometimes see these logit parameter estimates referred to as log-odds.
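You can also check the x = exp(-b) result numerically. This sketch uses arbitrary values for a and b; any values give the same agreement.

# numeric check that the odds ratio is exp(-b) in this parameterization
odds <- function(score) {
    p <- 1/(1 + exp(score)) # Pr(survive) at a given linear-model score
    p / (1 - p) # the corresponding odds
}
a <- 0.5
b <- 1.3 # arbitrary values
odds(a + b) / odds(a) # proportional change in odds
exp(-b) # the same number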

Coming back to our estimates, the odds changes induced by each are:

> exp( -coef(m) )

So you can read these as: (1) the baseline odds of survival are 18.6:1; (2) big tadpoles have half the odds of survival as small tadpoles; (3) when predators are present, a tadpole has 6.6% the odds of survival as when predators are absent.

Now let's try a model that interacts size and predation:

m2 <- mle2( ReedfrogPred$surv ~ dbinom(
        prob=1/(1+exp( m + s*ReedfrogPred$sizecode + p*ReedfrogPred$predcode
            + z*ReedfrogPred$sizecode*ReedfrogPred$predcode )),
        size=ReedfrogPred$density ),
    start=list(m=0,s=0,p=0,z=0) )

which evaluates the probability model:

Pr(survive | m, s, p, z) = 1 / (1 + exp(m + s*sizecode + p*predcode + z*sizecode*predcode)).

The estimates are the four values mle2 reports for m, s, p, and z. Remember, positive estimates decrease the chance of survival. So now the estimate for size has changed direction, and being big helps you survive, which makes more sense. This main effect applies in the absence of predators, so we might interpret this as saying that bigger individuals die slightly less when there are no predators around. The interaction of size and predation tells us that being big hurts your survival when predators are around. This implies that predators target big tadpoles. Now converting to odds:

> exp( -coef(m2) )

We start with 11.7:1 odds of survival. Being big increases the odds of survival by about 7%. Predators reduce the odds of survival by 1 - 0.12 = 0.88, an 88% reduction. When a tadpole is big and predators are present, the odds of survival (already accounting for the main effects of size and predation) are reduced by an additional 1 - 0.35 = 0.65, or 65%.

(2) Confidence intervals for the second model I estimated above, using confint(), give 2.5% and 97.5% bounds for each of m, s, p, and z. We can again convert these to changes in odds:

> exp( -confint(m2) )

So read the m line as: the baseline odds of survival lie between 18.7:1 and 7.8:1, with 95% confidence. The s line could be read as: the proportional change in odds induced by being big lies between 2.24 times and 0.62 times the original odds. Clearly the effect of size isn't estimated very precisely. The other two parameters, p and z, however, are consistent reductions in odds. So we can have more confidence about interpreting those effects.

To compute the quadratic estimate likelihood intervals, you can use the formula I provided in lecture (and in the source file for this week):

pse <- sqrt(diag( vcov(m2) ))

This gives one standard error for each of m, s, p, and z.
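A handy way to use pse is to assemble the point estimates and their quadratic 95% bounds into one table. This is just a sketch of one way to package the computation explained below:

# sketch: point estimates with quadratic 95% bounds in a single table
data.frame(
    estimate = coef(m2),
    lower = coef(m2) - 1.96*pse,
    upper = coef(m2) + 1.96*pse )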

The pse values are the same standard errors you get if you type summary(m2). To make these into 95% confidence intervals, use the z-score that corresponds to a 95% interval, 1.96, and add/subtract 1.96 times the standard error to each parameter estimate:

coef(m2) + pse*1.96 # upper bounds
coef(m2) - pse*1.96 # lower bounds

These intervals are very similar to the profile likelihood intervals we produced with confint(). One reason the quadratic estimation intervals (the ones that are estimated from the vcov matrix) are so similar to those from the likelihood profiles is that the profiles are shaped very much like quadratic functions, in this case. You can see this when you plot the profiles, in square-root space, using plot(profile(m2)):

[Figure: likelihood profiles for m, s, p, and z, each marked with 95%, 90%, 80%, and 50% confidence levels.]

If the lines were less straight, there would be more disagreement between the quadratic estimation intervals and the direct profile intervals.

(3) Now to plot the chance of survival, using the estimates. There are many ways to do this, so you may have used a different approach. That's fine. My tactic here is to use a basic bar chart, because we actually have discrete predictor categories. There are four groups of tadpoles, and each group has a single estimated chance of survival for every tadpole in it. The groups are: (1) small/no-pred, (2) small/pred, (3) big/no-pred, (4) big/pred. The code below computes these four probabilities and makes a simple bar chart of them.

p1 <- 1/(1+exp( coef(m2)[1] )) # small/no-pred
p2 <- 1/(1+exp( coef(m2)[1] + coef(m2)[2] )) # large/no-pred
p3 <- 1/(1+exp( coef(m2)[1] + coef(m2)[3] )) # small/pred
p4 <- 1/(1+exp( coef(m2)[1] + coef(m2)[2] + coef(m2)[3] + coef(m2)[4] )) # large/pred
barplot( c(p1,p2,p3,p4),
    names.arg=c("small/no-pred","large/no-pred","small/pred","large/pred"),
    ylab="probability of survival" )

All this is copacetic with our previous interpretation of the parameters.

Exercise 2: Making up data

(1) The histograms from 100 runs of estimating parameters from simulated data are:

[Figure: histograms of the simulated estimates, co[,1] and co[,2], with frequency on the vertical axis.]

The left histogram is the p parameter. The right one is the k parameter. The means are 6.04 and -7.73, respectively. The original (true) values are p = 5 and k = -2. So there appears to be some bias in the estimates: p tends to be overestimated, and k tends to be underestimated. (This bias doesn't go away as you increase the number of simulations.) We can get a single run of simulated data to compare against, using:

d <- gen.dat( 20, 5, -2 )

Estimate its parameters and compute confidence intervals:

m <- mle2( d$y ~ dbinom( prob=1/(1 + exp(p + k*d$x)), size=1 ),
    start=list(p=0,k=0) )
confint(m)

Aside from the noted bias again, the histograms do provide a measure of the range of values that are plausible. The simulated distribution is much more skewed than the single-sample estimates suggest, though. This is due to the bias in estimation. So to answer the question, I would say they do not contain exactly the same information. Instead, we learn from the simulation exercise that the estimation is biased.
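As an aside, the gen.dat() function used above came from this week's source file and is not reproduced in these solutions. A version consistent with the gen.dat2() defined in part (4) below would look like this sketch:

# a reconstruction of gen.dat(), patterned on gen.dat2() below;
# the original definition lives in the course source file
gen.dat <- function(n,p,k) {
    x <- rnorm( n, 4, 2 ) # simulated predictor
    pr <- 1/(1+exp( p + k*x )) # probability of a 1
    y <- rbinom( prob=pr, size=1, n=n ) # simulated 0/1 outcomes
    data.frame(x=x,y=y)
}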

Now look what happens when we make the true values p = 0 and k = 0. The simulated distributions and the confidence intervals from a single run become very similar. You can directly compare the implied confidence intervals of the simulated runs by finding the 2.5% and 97.5% cutoffs in the histograms. R makes this easy, with the quantile function:

d <- gen.dat( 20, 0, 0 )
m <- mle2( d$y ~ dbinom( prob=1/(1 + exp(p + k*d$x)), size=1 ),
    start=list(p=0,k=0) )
confint(m)

for ( i in 1:100 ) {
    d <- gen.dat( 20, 0, 0 )
    m <- mle2( d$y ~ dbinom( prob=1/(1 + exp(p + k*d$x)), size=1 ),
        start=list(p=0,k=0) )
    if ( i==1 ) {
        co <- coef(m)
    } else {
        co <- rbind(co,coef(m))
    }
}
quantile( co[,1], probs=c(0.025,0.975) )
quantile( co[,2], probs=c(0.025,0.975) )

(2) When we reduce the sample size, everything gets wider: the histograms and the confidence intervals. Less data means less confidence.

(3) Larger values of the parameters appear to induce more bias and make estimation harder. In particular, if the mean values tend to push the chance of seeing a 1 towards either 1 or 0, then it will be very hard to estimate the parameters, because the system is pushed against either the floor or the ceiling.

(4) To add another predictor variable:

gen.dat2 <- function(n,p,k,k2) {
    x <- rnorm( n, 4, 2 )
    x2 <- rnorm( n, 1, 2 )
    pr <- 1/(1+exp( p + k*x + k2*x2 ))
    y <- rbinom( prob=pr, size=1, n=n )
    data.frame(x=x,y=y,x2=x2)
}

for ( i in 1:100 ) {
    d <- gen.dat2( 50, -1, 3, 1 )
    m <- mle2( d$y ~ dbinom( prob=1/(1 + exp(p + k*d$x + k2*d$x2)), size=1 ),
        start=list(p=0,k=0,k2=0) )
    if ( i==1 ) {
        co <- coef(m)
    } else {
        co <- rbind(co,coef(m))
    }
}
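Once a loop like this finishes, co is a matrix with one row per simulation and one column per parameter. One way to summarize it, as a sketch in base R:

# summarize the simulated estimates collected in co
colMeans(co) # average estimate of each parameter
apply( co, 2, quantile, probs=c(0.025,0.975) ) # implied 95% ranges
hist( co[,2] ) # sampling distribution of the k estimates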

Exercise 3: Making up data again

A Poisson random variable is a count of events that occur in a finite interval of time. The minimum is zero and the maximum is infinity. You can see what a Poisson distribution looks like with:

hist( rpois(1000,lambda=2) )

The parameter lambda is the mean number of events. As you increase it, the distribution looks increasingly normal, but the values are still always integers (because counts are never fractions). All you have to do to convert the code from #2 into Poisson simulations is to replace the probability functions rbinom and dbinom with rpois and dpois. The trick, though, is to use exp() to ensure that the value of the mean number of events is always greater than zero.

gen.dat.pois <- function(n,p,k) {
    x <- rnorm( n, 4, 2 )
    y <- rpois( lambda=exp( p + k*x ), n=n )
    data.frame(x=x,y=y)
}

for ( i in 1:100 ) {
    d <- gen.dat.pois( 20, 1, 3 )
    m <- mle2( d$y ~ dpois( lambda=exp( p + k*d$x ) ), start=list(p=0,k=0) )
    if ( i==1 ) {
        co <- coef(m)
    } else {
        co <- rbind(co,coef(m))
    }
}
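To see why the exp() link matters here, remember that the Poisson mean must be positive: dpois() is undefined for negative lambda, so an unconstrained linear model for lambda can feed the optimizer invalid values. A tiny sketch:

# a negative lambda is invalid for the Poisson, so without the exp()
# link the optimizer could wander into undefined territory
dpois( 2, lambda=-1 ) # NaN, with a warning
dpois( 2, lambda=exp(-1) ) # fine: exp() keeps lambda positive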
