Data Analysis, Statistics, Machine Learning


Data Analysis, Statistics, Machine Learning
Leland Wilkinson, Adjunct Professor, UIC Computer Science; Chief Scientist, H2O.ai
leland.wilkinson@gmail.com
Copyright 2016 Leland Wilkinson

Most statistical prediction models take one of two forms:
y = \sum_j \beta_j x_j + \varepsilon   (additive function)
y = f(x_j, \varepsilon)   (nonlinear function)
The distinction is important. The first form is called an additive model; the second is called a nonlinear model. Additive models can be curvilinear (if the terms are nonlinear). Nonlinear models cannot be transformed to linear.
Examples of linear or linearizable models: y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon and y = \alpha e^{\beta x + \varepsilon}.
Examples of nonlinear models: y = \beta_1 x_1 / (\beta_2 x_2) + \varepsilon and y = \log(\beta_1 x_1 \varepsilon).

Regression predicts a set of values on a variable y from values on one or more x variables. This is done by fitting a mathematical function that, for any value(s) of the x variable(s), yields the most probable value of y.
The simple linear model is y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, with estimates \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i.
The badness, or loss, of this prediction in a sample of values is represented by the discrepancies between the y values and their corresponding predicted values:
loss = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
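A minimal numpy sketch of fitting the simple linear model and computing this loss (the small data set is the same one used in the matrix worked example later in these slides):

import numpy as np

# Fit y = b0 + b1*x by least squares and compute the squared-error loss.
x = np.array([2., 1., 2., 3., 4., 5., 6.])
y = np.array([3., 2., 2., 4., 4., 6., 5.])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

loss = np.sum((y - yhat) ** 2)
print(b0, b1, loss)   # 1.25, 0.75, and the residual sum of squares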

Ordinary Least Squares (OLS)
Legendre (1805): "Of all the principles that can be proposed for this purpose, I think there is none more general, more exact, or easier to apply, than that which we have used in this work; it consists of making the sum of the squares of the errors a minimum. By this method, a kind of equilibrium is established among the errors which, since it prevents the extremes from dominating, is appropriate for revealing the state of the system which most nearly approaches the truth." (translation by Stigler, 1986)

Regression: Francis Galton (1887)
He noticed that the blue lines were the same length as the red ones. That is, the line best predicting Y from X regressed away from the major axis.

Regression: Francis Galton (1887)
His data weren't as clean and linear as he imagined, but that didn't matter.
[Scatterplot: Height of Child in Inches vs. Height of Mid-Parent in Inches, both 60-75; Wachsmuth, Wilkinson & Dallal, 2003]

Regression: Francis Galton (1887)
That's because he aggregated over different sources.
[Scatterplots: Height of Father and Height of Mother vs. Height of Son and Height of Daughter, all 50-80 inches; Wachsmuth, Wilkinson & Dallal, 2003]

Estimating via Ordinary Least Squares

Estimating via Ordinary Least Squares
For the intercept (b_0) and slope (b_1) we could use calculus, the way we did when we used maximum likelihood to estimate the mean and standard deviation. In this case we want to minimize the sum of squared residuals, SSE.
Given y_i = b_0 + b_1 x_i + e_i, we sum the squared e_i to get
SSE = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2
Compute the partial derivatives with respect to b_0 and b_1, set these derivatives to zero (where the minimum SSE exists), and solve the resulting simultaneous equations. This is what Legendre originally did. But there is an easier way.

Estimating via OLS (we'll use matrices)
Y = XB + E
X'Y = X'XB + X'E
(X'X)^{-1} X'Y = (X'X)^{-1}(X'X)B   (X'E = 0, because the residuals are orthogonal to the columns of X)
(X'X)^{-1} X'Y = B
E = Y - XB
Worked example:
Y = (3, 2, 2, 4, 4, 6, 5)'
X = [1 2; 1 1; 1 2; 1 3; 1 4; 1 5; 1 6]
B = (1.25, 0.75)'
E = (0.25, 0.00, -0.75, 0.50, -0.25, 1.00, -0.75)'
[Scatterplot of y (2-6) against x (1-6) with the fitted line]
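A minimal numpy sketch of the matrix computation above, reproducing the worked example on this slide (using numpy.linalg.solve instead of an explicit inverse is a standard numerical choice):

import numpy as np

# B = (X'X)^{-1} X'Y and E = Y - XB for the slide's small data set.
Y = np.array([3., 2., 2., 4., 4., 6., 5.])
X = np.column_stack([np.ones(7), [2., 1., 2., 3., 4., 5., 6.]])

B = np.linalg.solve(X.T @ X, X.T @ Y)
E = Y - X @ B

print(B)   # approximately [1.25, 0.75]
print(E)   # approximately [0.25, 0.00, -0.75, 0.50, -0.25, 1.00, -0.75]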

Estimating the regression model parameters
How do we know we minimized the error sum of squares?
X'Y = ‖X‖ ‖Y‖ \cos\theta, so \cos\theta = X'Y / (‖X‖ ‖Y‖), the Pearson correlation coefficient (if the data are centered).
The length of E is \sqrt{E'E}.
‖Y‖, ‖X‖, and \theta are fixed, and the shortest distance from the point Y to the line through X is the perpendicular, E. QED
[Diagram: vectors Y, XB, and E, with angle \theta between Y and X]

Estimation (assuming we sampled from a population)
B = (\hat{\beta}_0, \hat{\beta}_1)' is a least-squares estimator:
unbiased (assuming the residuals are homogeneous)
smallest variance among unbiased estimators
the Best Linear Unbiased Estimator (BLUE)
If the residuals are normally distributed, B is also a maximum likelihood estimator, and we can do classical statistical tests on the parameter estimates.

Estimating via OLS: two predictors (same formulas)
Y = XB + E
X'Y = X'XB + X'E
(X'X)^{-1} X'Y = B
E = Y - XB
Worked example:
Y = (3, 2, 2, 4, 4, 6, 5)'
X = [1 2 3; 1 1 4; 1 2 2; 1 3 1; 1 4 2; 1 5 4; 1 6 3]
B = (0.897, 0.746, 0.135)'
E = (0.206, -0.183, -0.659, 0.730, -0.151, 0.833, -0.778)'

The two-predictor vector space
Y = XB + E
[Figure: vector-space picture of Y = XB + E; Wikipedia]

The two-predictor vector space
It is possible for y not to be correlated much with either X_1 or X_2, yet be highly correlated with the linear combination of X_1 and X_2. So don't throw out predictors by looking at their correlations with the dependent variable.
[Diagram: vectors X_1, X_2, Y, XB, and E, with angle \theta]

The two-predictor vector space
Z is not significantly related to X. Z is not significantly related to Y. But the multiple correlation of Z with X and Y is almost 1!
[Scatterplots: Z (about -0.03 to 0.03) against X and against Y (each about -3 to 3)]
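A small simulation sketch of this phenomenon. The construction Z = X - Y, with Y nearly a copy of X, is my own illustrative assumption, not the example behind the slide's plots:

import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=n)
Y = X + rng.normal(scale=0.02, size=n)    # Y is nearly a copy of X
Z = X - Y                                  # Z depends only on the tiny difference

# Marginal correlations of Z with X and with Y are near zero...
print(np.corrcoef(Z, X)[0, 1], np.corrcoef(Z, Y)[0, 1])

# ...but regressing Z on X and Y together reproduces Z almost exactly.
A = np.column_stack([np.ones(n), X, Y])
b = np.linalg.lstsq(A, Z, rcond=None)[0]
Zhat = A @ b
R2 = 1 - np.sum((Z - Zhat) ** 2) / np.sum((Z - Z.mean()) ** 2)
print(R2)   # essentially 1.0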

Standardized Regression Coefficients (Beta Weights)
Standardize the variables, then compute the regression. This removes the scales of the variables from consideration of the size of the coefficients. Social scientists love this stuff.
"Why then are correlation coefficients so attractive? Only bad reasons seem to come to mind. Worst of all, probably, is the absence of any need to think about units for either variable. Given two perfectly meaningless variables, one is reminded of their meaninglessness when a regression coefficient is given, since one wonders how to interpret its value." Tukey (1969)

Sums of Squares
SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2   regression sum of squares (explained)
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2   error sum of squares (unexplained)
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2   total sum of squares

Goodness of fit
Pearson correlation:
\rho_{X,Y} = COV(X, Y) / \sqrt{VAR(X) \, VAR(Y)}, estimated by \hat{\rho}_{X,Y} = r_{X,Y} = X'Y / (‖X‖ ‖Y‖) (if X and Y are centered)
Multiple correlation (square root of the coefficient of determination):
R^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 / \sum_{i=1}^{n} (y_i - \bar{y})^2   (regression sum of squares / total sum of squares)
Equivalently, with R_{xx} the correlation matrix among the predictors and R_{yx} the vector of their correlations with Y:
R^2 = R_{yx} R_{xx}^{-1} R_{xy}
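A short numpy check that the two expressions for R^2 above agree, reusing the two-predictor worked example from the earlier slide (this is a sketch, not code from the slides):

import numpy as np

Y = np.array([3., 2., 2., 4., 4., 6., 5.])
X = np.column_stack([[2., 1., 2., 3., 4., 5., 6.], [3., 4., 2., 1., 2., 4., 3.]])

# R^2 from sums of squares
A = np.column_stack([np.ones(len(Y)), X])
B = np.linalg.lstsq(A, Y, rcond=None)[0]
Yhat = A @ B
R2_ss = np.sum((Yhat - Y.mean()) ** 2) / np.sum((Y - Y.mean()) ** 2)

# R^2 from correlations: R_yx R_xx^{-1} R_xy
C = np.corrcoef(np.column_stack([Y, X]), rowvar=False)
Ryx = C[0, 1:]
Rxx = C[1:, 1:]
R2_corr = Ryx @ np.linalg.solve(Rxx, Ryx)

print(R2_ss, R2_corr)   # the two values agree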

Inference
F-statistic (test of significance for the overall prediction):
F_{p, n-p-1} = (R^2 / p) / ((1 - R^2) / (n - p - 1))
Confidence intervals on the regression coefficients:
c_j = diag((X'X)^{-1})_j
s = \sqrt{SSE / (n - p - 1)}
CI = \hat{\beta}_j \pm t^{\alpha/2}_{n-p-1} \, s \sqrt{c_j}
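A hedged numpy/scipy sketch of these two calculations, applied to the two-predictor worked example (n is small here, so the intervals are wide; the 95% level is my choice):

import numpy as np
from scipy import stats

Y = np.array([3., 2., 2., 4., 4., 6., 5.])
X = np.column_stack([np.ones(7), [2., 1., 2., 3., 4., 5., 6.], [3., 4., 2., 1., 2., 4., 3.]])
n, p = len(Y), X.shape[1] - 1          # p predictors plus an intercept

B = np.linalg.solve(X.T @ X, X.T @ Y)
E = Y - X @ B
SSE = E @ E
R2 = 1 - SSE / np.sum((Y - Y.mean()) ** 2)

# Overall F test
F = (R2 / p) / ((1 - R2) / (n - p - 1))
p_value = 1 - stats.f.cdf(F, p, n - p - 1)

# Coefficient confidence intervals
s = np.sqrt(SSE / (n - p - 1))
c = np.diag(np.linalg.inv(X.T @ X))
t_crit = stats.t.ppf(0.975, n - p - 1)
ci = np.column_stack([B - t_crit * s * np.sqrt(c), B + t_crit * s * np.sqrt(c)])

print(F, p_value)
print(ci)   # one 95% interval per coefficient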

An interesting application of Ordinary Least Squares
We have:
Z (n x p): n fixed points in p dimensions (cell towers, an MDS or other configuration, ...)
d (length n): a new point y's estimated distance to each point in Z
Solve for the vector b (length p) specifying the location of the new point.
[Diagram: fixed points z_1, ..., z_7 surrounding the new point y]

An interesting application of Ordinary Least Squares
We have Z (n x p) and d, and we solve for b by regressing differences of squared distances on differences of coordinates:
s = \sum_{i=2}^{n} \sum_{j=1}^{i-1} \sum_{k=1}^{p} (Z_{ik} - Z_{jk})^2   (a scaling constant)
y_m = d_i^2 - d_j^2 + s,   j = 1, ..., i-1,   i = 2, ..., n,   m = (i-1)(i-2)/2 + j
X_{mk} = -2 (Z_{ik} - Z_{jk}),   j = 1, ..., i-1,   i = 2, ..., n,   m = (i-1)(i-2)/2 + j
b = (X'X)^{-1} X'y
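A sketch of this kind of least-squares localization. It follows the pairwise-differencing idea above, but uses the per-pair term ‖z_j‖^2 - ‖z_i‖^2 where the slide writes a single scaling constant s; that substitution (the textbook multilateration form) is my assumption, not the slide's exact recipe:

import numpy as np

rng = np.random.default_rng(1)
Z = rng.uniform(-10, 10, size=(7, 2))       # 7 fixed points in 2 dimensions
true_b = np.array([1.5, -2.0])              # location we try to recover
d = np.linalg.norm(Z - true_b, axis=1)      # (noise-free) distances to each fixed point

rows, rhs = [], []
n = len(Z)
for i in range(1, n):
    for j in range(i):
        rows.append(-2.0 * (Z[i] - Z[j]))
        rhs.append(d[i] ** 2 - d[j] ** 2 + Z[j] @ Z[j] - Z[i] @ Z[i])
X = np.array(rows)
y = np.array(rhs)

b = np.linalg.lstsq(X, y, rcond=None)[0]
print(b)   # recovers true_b exactly here, approximately when the distances are noisy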

OLS Assumptions
The true model is linear in the parameters.
The X variables are not random and are measured without error.
The X variables are not collinear.
The residuals are uncorrelated/independent.
The residuals have constant variance (homoscedasticity).
If t and F test statistics are computed:
The residuals must be normally distributed.
The data are a random sample from a population.

Evaluating OLS assumptions
The true model is linear in the parameters.
Use LOESS to put a smooth through the residual plot.

Transformations: dealing with nonlinearity
Wilkinson, Blank, & Gruber (1996)

Transformations: dealing with nonlinearity
Tukey-Mosteller bulging rule: re-express x as x^p and/or y as y^p, choosing the direction from the bulge of the data:
Ascend the y ladder (p > 1)
Descend the y ladder (p < 1)
Ascend the x ladder (p > 1)
Descend the x ladder (p < 1)

Evaluating OLS assumptions
The X variables are not random and are measured without error.
There is no simple test or graphic for this; know the source of your measurements. If there is measurement error, you can use errors-in-variables methods. Or you can just forgeddaboutit, which is what most of the world does. Who cares if your coefficient estimates are biased? They are usually biased downward, so no harm done. Just don't show them to a social scientist. Social scientists love latent variable models. They are Platonists.

Evaluating OLS assumptions
The X variables are not collinear.
Values of the Variance Inflation Factor (VIF) above 10 are worrisome:
VIF_i = 1 / (1 - R_i^2)
[Example panels with R^2 = .995 and R^2 = .97]
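A hedged numpy sketch of the VIF computation (the helper vif() is my own, not something from the slides; R_i^2 comes from regressing each predictor on the others):

import numpy as np

def vif(X):
    # VIF_i = 1 / (1 - R_i^2) for each column of X (predictors only, no intercept column)
    X = np.asarray(X, dtype=float)
    out = []
    for i in range(X.shape[1]):
        xi = X[:, i]
        others = np.delete(X, i, axis=1)
        A = np.column_stack([np.ones(len(xi)), others])
        fit = A @ np.linalg.lstsq(A, xi, rcond=None)[0]
        r2 = 1 - np.sum((xi - fit) ** 2) / np.sum((xi - xi.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# The second predictor is nearly a linear function of the first, so its VIF explodes.
rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = 2 * x1 + rng.normal(scale=0.05, size=200)
x3 = rng.normal(size=200)
print(vif(np.column_stack([x1, x2, x3])))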

Evaluating OLS assumptions
The residuals are uncorrelated/independent.
An ACF plot can help spot violations, but there are other types of dependencies. Economists spend all their time worrying about this, and for good reason: serial dependence is more toxic than outliers. Don't even THINK of using ordinary linear regression (a trend line) on time series data.

Evaluating OLS assumptions
The residuals have constant variance (homoscedasticity).
Try transformations (usually logging works for a power model), or use one of the heteroscedasticity corrections (MacKinnon-White, etc.).

Transformations: dealing with heteroscedasticity
log-log transformation

Evaluating OLS assumptions
If t and F test statistics are computed:
The residuals must be normally distributed.
The data are a random sample from a population.
Forget about tests for normality; they are worthless.

Evaluating OLS assumptions
If t and F test statistics are computed, the residuals must be normally distributed.

Evaluating OLS assumptions
Leverage: the diagonal of the hat matrix (it puts the hat on the parameters):
H = X (X'X)^{-1} X'
Cook's D: measures the decrement in prediction from removing an observation; a type of leverage measure that is transformable to an approximate F.
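A minimal numpy sketch of both diagnostics for an OLS fit. The Cook's D formula used here, e_i^2 h_i / (k s^2 (1 - h_i)^2), is the standard one; it is not spelled out on the slide:

import numpy as np

Y = np.array([3., 2., 2., 4., 4., 6., 5.])
X = np.column_stack([np.ones(7), [2., 1., 2., 3., 4., 5., 6.]])
n, k = X.shape

H = X @ np.linalg.solve(X.T @ X, X.T)     # hat matrix: Yhat = H @ Y
h = np.diag(H)                            # leverages
E = Y - H @ Y                             # residuals
s2 = (E @ E) / (n - k)                    # residual variance estimate

D = (E ** 2 / (k * s2)) * (h / (1 - h) ** 2)   # Cook's D per observation
print(h)
print(D)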

Evaluating OLS assumptions
Leverage

Model Selection: Partial Residual Plots
Each plot consists of two sets of residuals plotted against each other. One set, on the vertical axis, consists of the residuals from regressing y on all the X (predictor) variables except for the predictor on the horizontal axis. The other set, on the horizontal axis, consists of the residuals from regressing X_i on all the other X variables. The result is a plot which shows you how each X variable is related to the Y variable when all the other X variables are taken into account.
Partial residual plots have several useful features (see the sketch below):
The slope of the line in the plot is the partial regression coefficient corresponding to the predictor in the plot.
A line with a steep slope and nice-looking residuals is a sign that a predictor belongs in the model.
The residuals from this line are the same as the residuals from regressing Y on all the predictors.
The plot helps you to judge whether the conditional relationship is linear or nonlinear.
Extreme values on the horizontal axis help you to identify high-leverage points.
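A numpy sketch of the two residual sets described above (the helper partial_plot_coords and the simulated data are mine; the plotting itself is omitted):

import numpy as np

def partial_plot_coords(X, y, j):
    # Residuals of y and of X[:, j], each adjusted for all the other predictors.
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    ry = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    rx = X[:, j] - A @ np.linalg.lstsq(A, X[:, j], rcond=None)[0]
    return rx, ry   # the slope of ry on rx is the partial regression coefficient for X_j

rng = np.random.default_rng(8)
X = rng.normal(size=(100, 3))
y = X @ np.array([2., -1., 0.5]) + rng.normal(size=100)
rx, ry = partial_plot_coords(X, y, j=0)
print(np.sum(rx * ry) / np.sum(rx ** 2))   # close to the full-model coefficient 2.0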

Model Selection: Partial Residual Plots
Example fit: time = 10.679 + 6.697 distance + 0.008 climb

Model Selection
Forward Selection Regression
Backward Elimination Regression
Stepwise Regression
All Possible Subsets Regression

Model Selection
Myers, J.L., Well, A.D., & Lorch Jr., R.F. (2013). Research Design and Statistical Analysis (3rd ed.). Routledge.

Model Selection: Akaike Information Criterion (AIC)
Longley data: predicting TOTAL (RSQ = .995)

Model Selection (m submodels)
AIC = 2(p + 2) - 2 \log(L)
\Delta AIC_k = AIC_k - AIC_{min}   (relative AIC)
L_k = \exp(-\Delta AIC_k / 2)   (relative likelihood)
w_k = L_k / \sum_{i=1}^{m} L_i   (Akaike weights)
[Table columns: RSQ, Increment, AIC, Weight]
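A small sketch of these formulas. The aic() helper assumes a Gaussian OLS model, where -2 log(L) reduces to n log(SSE/n) up to an additive constant; that reduction, and the example AIC values, are my assumptions rather than numbers from the slide:

import numpy as np

def aic(sse, n, p):
    # AIC = 2(p + 2) - 2 log(L), using the Gaussian-likelihood reduction described above
    return 2 * (p + 2) + n * np.log(sse / n)

def akaike_weights(aics):
    aics = np.asarray(aics, dtype=float)
    delta = aics - aics.min()           # relative AIC
    rel_like = np.exp(-delta / 2)       # relative likelihood
    return rel_like / rel_like.sum()    # weights sum to 1 across the m submodels

print(akaike_weights([100.0, 102.5, 110.0]))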

Model Selection (m submodels)

Other approaches to multicollinearity: Regularization
Ridge regression (Hoerl, 1962)
Minimize
\sum_{i=1}^{n} (y_i - x_i'\beta)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
so that
\hat{\beta} = (X'X + \lambda I)^{-1} X'y
We inflate the diagonal to reduce the influence of the off-diagonal covariances. Ridge regression shrinks the estimates toward zero, introducing bias, but this reduces the variance of the estimates.
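A sketch of the ridge estimator above. Standardizing the predictors and leaving the intercept unpenalized are common conventions assumed here rather than stated on the slide:

import numpy as np

def ridge(X, y, lam):
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize the predictors
    yc = y - y.mean()                            # center the response (intercept unpenalized)
    beta = np.linalg.solve(Xs.T @ Xs + lam * np.eye(Xs.shape[1]), Xs.T @ yc)
    return beta

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + rng.normal(scale=0.01, size=100)    # a nearly collinear pair
y = X @ np.array([1., 1., 0., 0., 0.]) + rng.normal(size=100)

print(ridge(X, y, lam=0.0))    # ordinary least squares: unstable, wildly varying coefficients
print(ridge(X, y, lam=10.0))   # ridge: shrunken, much more stable estimates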

Other approaches to multicollinearity: Regularization
LASSO (Least Absolute Shrinkage and Selection Operator)
Minimize
\sum_{i=1}^{n} (y_i - x_i'\beta)^2 + \lambda \sum_{j=1}^{p} |\beta_j|
The solution requires iteration. The lasso allows some coefficients to go to zero while others are nonzero. Efron and Tibshirani use Least Angle Regression with Shrinkage (LARS), which is similar to stepwise regression.
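Because the lasso has no closed form, one simple iterative choice is proximal gradient descent (ISTA) with a soft-threshold step. This sketch uses that method rather than the LARS algorithm mentioned on the slide, and rescales the criterion by 1/2 for convenience:

import numpy as np

def lasso_ista(X, y, lam, n_iter=5000):
    # minimizes 0.5 * ||y - X b||^2 + lam * ||b||_1 (the slide's criterion up to rescaling lambda)
    beta = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2          # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft threshold
    return beta

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))
y = X @ np.array([3., 0., 0., -2., 0., 0., 0., 0., 0., 0.]) + rng.normal(size=200)
print(lasso_ista(X, y, lam=50.0))   # most coefficients are driven exactly to zero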

Logistic Regression
OLS estimates
[Plot: PHD (0/1) against GRE (400-900), with the OLS fit]

Logistic Regression
LR estimates
[Plot: PHD (0/1) against GRE (400-900), with the logistic regression fit]

Logistic Regression: model and estimation
p(x; \beta) = e^{x\beta} / (1 + e^{x\beta}) = 1 / (1 + e^{-x\beta})
logit(p) = \log(p(x) / (1 - p(x))) = x\beta   (log-odds)
L(\beta; x_1, ..., x_n) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}   (Binomial likelihood)
l(\beta; x_1, ..., x_n) = \sum_{i=1}^{n} [ y_i x_i\beta - \log(1 + e^{x_i\beta}) ]   (log-likelihood)
where x = (x_0, ..., x_p) and \beta = (\beta_0, ..., \beta_p).
Must use iterative optimization to find the maximum. A different model is used for more than two categories.
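A teaching-style sketch of that iterative optimization, maximizing the logistic log-likelihood by Newton-Raphson (equivalently, iteratively reweighted least squares). The GRE/PHD-style data are simulated, not the data behind the plots above:

import numpy as np

def logistic_fit(X, y, n_iter=25):
    X = np.column_stack([np.ones(len(y)), X])   # add the intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))     # fitted probabilities
        W = p * (1 - p)                          # IRLS weights
        grad = X.T @ (y - p)                     # score vector
        info = X.T @ (X * W[:, None])            # information matrix
        beta = beta + np.linalg.solve(info, grad)
    return beta

rng = np.random.default_rng(5)
gre = rng.uniform(400, 900, size=300)
phd = rng.binomial(1, 1 / (1 + np.exp(-(gre - 650) / 50)))
print(logistic_fit(gre.reshape(-1, 1), phd))   # intercept and slope on the logit scale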

Poisson Regression (log-linear model): model and estimation
E[Y | x] = e^{x\beta}   (the Poisson distribution has only one parameter; y is integer valued)
p(y; x, \beta) = e^{y x\beta} e^{-e^{x\beta}} / y!
L(\beta; y, x_1, ..., x_n) = \prod_{i=1}^{n} e^{y_i x_i\beta} e^{-e^{x_i\beta}} / y_i!   (likelihood)
l(\beta; y, x_1, ..., x_n) = \sum_{i=1}^{n} [ y_i x_i\beta - e^{x_i\beta} - \log(y_i!) ]   (log-likelihood; the \log(y_i!) term does not involve \beta, so it can be dropped when maximizing)
Must use iterative optimization to find the maximum.
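A matching sketch for Poisson regression, again using Newton-Raphson on the log-likelihood above (with the log(y_i!) term dropped); the simulated data and true coefficients are my own choices:

import numpy as np

def poisson_fit(X, y, n_iter=50):
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                   # fitted means e^{x beta}
        grad = X.T @ (y - mu)                   # score vector
        info = X.T @ (X * mu[:, None])          # information matrix
        beta = beta + np.linalg.solve(info, grad)
    return beta

rng = np.random.default_rng(6)
x = rng.normal(size=400)
y = rng.poisson(np.exp(0.5 + 0.8 * x))
print(poisson_fit(x.reshape(-1, 1), y))   # roughly recovers (0.5, 0.8)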

Poisson Regression
[Plots: the OLS fit vs. the Poisson fit]

Generalized Linear Models (GLM)
Nelder & Wedderburn (1972)
Not to be confused with the General Linear Model (GLM), i.e., OLS with or without dummy variables, or with Generalized Least Squares (GLS) for dealing with heteroscedasticity.
Estimation is done through Iteratively Reweighted Least Squares (IRWLS).
x\beta = g(E[Y])   (link function)
E[Y] = \mu = g^{-1}(x\beta)   (modeling the mean of Y through the inverse of the link function)
Distribution | Link function name | Link function
Normal | Identity | x\beta = \mu
Binomial | Logit | x\beta = \log(\mu / (1 - \mu))
Poisson | Log | x\beta = \log(\mu)

Nonlinear Regression
Functions of the form y = f(x, \varepsilon).
The left form is linearizable by transformation; the right is intrinsically nonlinear:
y = e^{\beta_0 + \beta_1 x + \varepsilon}   vs.   y = e^{\beta_0 + \beta_1 x} + \varepsilon

Nonlinear Regression
E[Y] = e^{a + b/x + c \log(x)}
Minimize SSE (Newton, Quasi-Newton, Metropolis, ...)
Magic, right?
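A sketch of fitting this mean function by minimizing SSE. SciPy's least_squares (a trust-region solver) stands in for the Newton/quasi-Newton optimizers named above, and the simulated data and true parameter values are my own choices:

import numpy as np
from scipy.optimize import least_squares

def predict(theta, x):
    a, b, c = theta
    return np.exp(a + b / x + c * np.log(x))

rng = np.random.default_rng(7)
x = rng.uniform(1.0, 10.0, size=200)
y = predict([0.5, -1.0, 0.8], x) + rng.normal(scale=0.2, size=200)

fit = least_squares(lambda th: y - predict(th, x), x0=np.zeros(3))
print(fit.x)                   # estimated (a, b, c)
print(np.sum(fit.fun ** 2))    # sum of squared errors at the solution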

Nonlinear Regression: things to worry about
Don't waste your time with R^2. There are all sorts of definitions, all are nonsense, and the bad ones are ridiculously large. Look at the sum of squared errors instead.
Don't hunt around for nonlinear equations that work. That's a fishing expedition. The equation you choose should be driven by theory and the domain it applies to.
Half the time your iterative fitting method will croak. That's a sign that there's a local minimum, or your equation is wrong, or your starting values are wildly off-target. Don't waste your time looking for another computer program; nonlinear programs are notoriously finicky. Your data are probably crap or your model is ridiculous.
Look at the fit before you examine any statistics. Most of the time you have one predictor and one dependent variable, so look at the fitted equation. Ignore the fit statistics and tests of significance until it looks good.
EVERYTHING fits. Keep in mind that almost any bad model will look pretty good. If it doesn't make theoretical sense, don't trust it.

References
McCullagh, P., & Nelder, J. (1989). Generalized Linear Models (2nd ed.). Boca Raton: Chapman and Hall/CRC.
Tukey, J.W. (1969). Analyzing data: Sanctification or detective work? American Psychologist, 24, 83-91.