Hazards, Densities, Repeated Events for Predictive Marketing. Bruce Lund



2 A Proposal for Predicting Customer Behavior
A company wants to predict whether its customers will buy a product or obtain a service during specific future time periods.
One or more purchases or services by the customer in the same time period constitute a single event.
The goal is to predict repeating (recurring) events: the same event at different time periods.
Specifically: create models that give the probability density, across time, that the customer's j-th event occurs at future time period t.
Two methodologies will be presented. Both have performed well on simulated data and test data.
I think there is potential value for CRM (customer relationship management).

3 Survival Data Mining
Survival Data Mining is a name that SAS has used for applications of time-to-event models where:
- The underlying database is large and consists of transactional data from consumer marketing or credit risk
- The emphasis is on prediction rather than an explanatory study
This talk falls within the general framework of Survival Data Mining.
SAS has an Enterprise Miner application (see the Survival Node).
SAS training courses:
- Survival Data Mining Using SAS Enterprise Miner Software
- Survival Data Mining: A Programming Approach
- But not Survival Analysis Using the Proportional Hazards Model
In this talk PROC LOGISTIC is used to fit the models.
Not mentioned today: PROC PHREG, PROC LIFEREG

4 We Need Notation
$E_{j,t}$ indicates that the j-th event occurred at time t.
Let's explain using $E_{2,3}$ (j = 2, t = 3) and the table below.

Time period   1          2          3          E_{2,3}
Cust 001      event      no event   event      YES
Cust 002      event      event      event      NO
Cust 003      no event   event      event      YES
Cust 004      event      event      no event   NO
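The notation can be checked mechanically. The following is a small Python sketch (Python is used only for illustration; the talk itself works in SAS) that encodes each customer's history as 0/1 flags and tests whether $E_{j,t}$ occurred. The function name `jth_event_at` and the dictionary layout are assumptions made for this example.

```python
def jth_event_at(history, j, t):
    """Return True if the j-th event falls exactly in period t (1-based).

    history holds one 0/1 flag per period: 1 = event, 0 = no event.
    """
    if history[t - 1] != 1:
        return False                 # no event at all in period t
    return sum(history[:t]) == j     # the event in period t is number j

# The four customers from the table (periods 1 to 3).
customers = {
    "001": [1, 0, 1],
    "002": [1, 1, 1],
    "003": [0, 1, 1],
    "004": [1, 1, 0],
}

for cust_id, history in customers.items():
    print(cust_id, "YES" if jth_event_at(history, 2, 3) else "NO")
```

Customer 002 fails because its second event already happened in period 2; customer 004 fails because nothing happened in period 3.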

5 Graphic
GOAL: To present 2 different modeling methods for finding the probability densities shown in the graphic.
$f_{j,t}$ is the probability density for $E_{j,t}$ for t = 1 to T.
In words: $f_{j,t}$ is the probability that the customer has the j-th event at time = t.
$\sum_{t=1}^{T} f_{j,t} + f_{j,\infty} = 1$, where $f_{j,\infty}$ is the probability that event j occurs later than T or never occurs.
[Figure: bars for $f_{j,1}, f_{j,2}, \ldots, f_{j,T}$ against time periods 1, 2, ..., T, with j fixed; vertical axis: Prob]

6 Customer Data in the Sample Looks Like This

Time period              1                  2                  ...   T
Cust 001 (X1 ... Xk)     Event: Yes or No   Event: Yes or No   ...   Event: Yes or No

- Tracking of customers starts at time 1 and ends at time T
- In each period it is observed whether the customer has an event
- Customer events are predicted by covariates X1 ... Xk and a parametrization or transformation of t (time)
- Some customers may be censored (drop out)
- After censoring, a customer is no longer observed (no recording of events)
- But in CRM it is often unknown whom to censor, and when, while developing a sample for modeling
- So censoring may not be considered when developing the sample

7 Hazards, Probability Densities, Survival Function
Time is measured in discrete units such as day, month, or year.
Let $h_{j,t}$ be the hazard for the time of the j-th event. Mathematically:
$h_{j,t} = P(E_{j,t} \mid \text{not } E_{j,t'} \text{ for all } t' < t)$
The probability density of the j-th event is $f_{j,t} = P(E_{j,t})$.
Hazards and probabilities are connected by this formula:
$f_{j,t} = h_{j,t} \prod_{t'=0}^{t-1} (1 - h_{j,t'})$ with $h_{j,0} = 0$
The survival function $S_{j,t}$ is the probability that the j-th event has not occurred by time t:
$S_{j,t} = 1 - \sum_{t'=1}^{t} f_{j,t'} = \prod_{t'=0}^{t} (1 - h_{j,t'})$

8 Bring $h_{j,t}$, $f_{j,t}$ and $S_{j,t}$ together through an example
Example: tossing a coin with probability of heads = 1/4, where successive tosses are independent.
The event of interest is the second heads, j = 2.
$f_{2,t}$ = probability of the 2nd heads at time t = 1, 2, 3
$h_{2,t}$ = hazard for the 2nd heads at time t = 1, 2, 3
These are easy:
$f_{2,1} = 0$
$f_{2,2} = \frac14 \cdot \frac14 = \frac{1}{16}$
$f_{2,3} = \frac14 \cdot \frac34 \cdot \frac14 + \frac34 \cdot \frac14 \cdot \frac14 = \frac{3}{32}$

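9 Bring $h_{j,t}$, $f_{j,t}$ and $S_{j,t}$ together through an example (continued)
Densities: $f_{2,1} = 0$, $f_{2,2} = \frac{1}{16}$, $f_{2,3} = \frac{3}{32}$
Now the hazards:
$h_{2,1} = 0$
$h_{2,2} = \frac{1}{16}$
It is a little harder to see that $h_{2,3} = \frac{1}{10}$. It follows from $h_{2,3} = f_{2,3} / S_{2,2} = \frac{3}{32} / \frac{15}{16}$.
We can check that the survival function for j = 2, t = 3 satisfies both formulas:
$S_{2,3} = \frac{27}{32} = 1 - (0 + \frac{1}{16} + \frac{3}{32}) = (1 - 0)(1 - \frac{1}{16})(1 - \frac{1}{10})$
where $S_{j,t} = 1 - \sum_{t'=1}^{t} f_{j,t'} = \prod_{t'=0}^{t} (1 - h_{j,t'})$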
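The coin example can be verified by brute force. This Python sketch (illustrative only; the names `density_jth_heads`, `f`, `h`, and `S` are chosen for this example) enumerates all toss sequences with exact fractions, then derives the hazards from $h_t = f_t / S_{t-1}$ with $S_0 = 1$.

```python
from fractions import Fraction
from itertools import product

P_HEADS = Fraction(1, 4)

def density_jth_heads(j, t, p=P_HEADS):
    """P(the j-th heads occurs exactly on toss t), by enumerating sequences."""
    total = Fraction(0)
    for seq in product([0, 1], repeat=t):
        # The j-th heads is on toss t: heads on toss t and j heads overall.
        if seq[-1] == 1 and sum(seq) == j:
            prob = Fraction(1)
            for toss in seq:
                prob *= p if toss else 1 - p
            total += prob
    return total

# Densities f_{2,t} for t = 1, 2, 3.
f = [density_jth_heads(2, t) for t in (1, 2, 3)]

# Hazards and survival: h_t = f_t / S_{t-1}, S_t = S_{t-1} - f_t, S_0 = 1.
h, S = [], [Fraction(1)]
for f_t in f:
    h.append(f_t / S[-1])
    S.append(S[-1] - f_t)

print("f:", f)
print("h:", h)
print("S_{2,3}:", S[3])
```

Using `Fraction` keeps the arithmetic exact, so the results can be compared with the slide values without rounding.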

10 Two Methods for Studying the j-th Event will be Presented
Discrete Time Hazard Model (DTHM)
- The discrete time hazard model methodology is a well-known technique for modeling the hazard of the 1st event, $h_{1,t}$, as a function of covariates X and time t
- In this talk the DTHM is simply applied to modeling the hazard of the j-th event. No real change
Multinomial Logistic Model (MLM)
- Unconventional, but it has worked in simulations and on test data
- Several theoretical issues, but not a problem in practice

11 SAS EM: Survival Node and DTHM
SAS EM includes a SURVIVAL Node which performs DTHM.
Click on Applications in the TOOLBAR and bring in the SURVIVAL Node.
» This node has many powerful features.

12 Maximum Number of Events
Logically, the total number of events for a customer must be ≤ T (at most one event per period).
The maximum to be allowed by the modeler will be called J.
If events are rare, then J could be pre-set < T.
If there are a few customers with > J events, then either:
- Remove them, or
- Truncate to J and interpret as "J or more"

13 DTHM for Modeling the Hazard of Time of j-th Event
The DTHM is implemented using binary logistic regression where the input data set to PROC LOGISTIC has a specialized structure: the Customer-Period structure.
» What is the Customer-Period structure of the data? and
» Why do binary logistic regression and the Customer-Period structure give the hazard model?
We will see (along with some hand waving) that this is a consequence of the formulation of the likelihood function for a time-to-event model. Please indulge me: 2 slides to follow.

14 Let L be Likelihood Function for Time to Event Model
Consider a customer. Suppose nothing happens (regarding the j-th event) until time t. But then at t: either the j-th event occurs OR the customer is censored.
If the customer has the j-th event at t, then $L = f_{j,t}$ (the density is a factor in L).
If the customer is censored, then $L = S_{j,t}$ (the survival function is a factor in L).
Now the contribution to L for this customer is:
$L = f_{j,t}^{\delta_t} \, S_{j,t}^{1 - \delta_t}$
where $\delta_t = 1$ if the j-th event occurred, else $\delta_t = 0$ if censored.
Using the formulas for f and S, this expression for L is rewritten in terms of hazards:
$L = \{ h_{j,t} \prod_{t'=0}^{t-1} (1 - h_{j,t'}) \}^{\delta_t} \, \{ \prod_{t'=0}^{t} (1 - h_{j,t'}) \}^{1 - \delta_t}$

15 Likelihood Function L is Likelihood for Binary Logistic Regression (involves hand waving)
We can transform L (via some algebra) to:
$\prod_{t'=0}^{t} h_{j,t'}^{Y_{t'}} (1 - h_{j,t'})^{1 - Y_{t'}}$
where $Y_{t'} = 0$ unless $t' = t$ and the customer has the j-th event at t, in which case $Y_{t'} = 1$.
Notice that the exponent $Y_{t'}$ has been redefined from $\delta_t$. This is proof by notation!
This is the likelihood function for a binary logistic regression with probability $h_{j,t'}$.
These 2 slides follow P. Allison (1982), Discrete-Time Methods for the Analysis of Event Histories.
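The algebra can be spot-checked numerically. The Python sketch below (illustrative; the function names and the example hazards are assumptions, and periods start at t = 1 since the $t' = 0$ factor equals 1) computes one customer's likelihood contribution both ways: as $f^{\delta} S^{1-\delta}$ and as a product of Bernoulli terms.

```python
from math import prod

def lik_density_form(h, t, event):
    """f_{j,t} if the event occurs at t, else S_{j,t} (censored at t).

    h[k] is the hazard for period k + 1.
    """
    surv_before = prod(1 - x for x in h[:t - 1])   # S_{j,t-1}
    if event:
        return h[t - 1] * surv_before               # density f_{j,t}
    return surv_before * (1 - h[t - 1])             # survival S_{j,t}

def lik_bernoulli_form(h, t, event):
    """Product over periods 1..t of h^Y (1-h)^(1-Y); Y = 1 only at t if event."""
    out = 1.0
    for k in range(1, t + 1):
        y = 1 if (event and k == t) else 0
        out *= h[k - 1] if y else 1 - h[k - 1]
    return out

hazards = [0.1, 0.2, 0.3]   # hypothetical hazards for periods 1, 2, 3
for t, event in [(3, True), (2, False), (3, False)]:
    print(t, event, lik_density_form(hazards, t, event),
          lik_bernoulli_form(hazards, t, event))
```

Both forms agree for every case, which is why a binary logistic regression on one row per period at risk maximizes the time-to-event likelihood.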

16 Now We See How to Structure the Data for Logistic Regression
Each factor in L corresponds to an observation in the data set.
1. If the j-th event is at t = 3, then these factors are in L: $(1 - h_{j,1}) (1 - h_{j,2}) \, h_{j,3}$
2. If censored at t = 2, then these factors are in L: $(1 - h_{j,1}) (1 - h_{j,2})$
3. If no event through t = 3, then these factors are in L: $(1 - h_{j,1}) (1 - h_{j,2}) (1 - h_{j,3})$

ID   time t   covariates   target
1    1        a            0
1    2        a            0
1    3        a            1
2    1        b            0
2    2        b            0
3    1        cc           0
3    2        cc           0
3    3        cc           0

How does $h_{j,t}$ depend on time t and the X's? Next slide.
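As a rough illustration of the Customer-Period structure, the Python sketch below expands one summary record per customer into one row per period at risk. The function name `customer_period_rows` and the tuple layout are assumptions for this example; in the talk the equivalent step is done in a SAS DATA step.

```python
def customer_period_rows(cust_id, x, last_period, event):
    """Expand one customer into rows (id, t, covariates, target).

    last_period: the period of the event, or the last period observed.
    event: True if the j-th event occurred in last_period.
    """
    rows = []
    for t in range(1, last_period + 1):
        # Target is 1 only in the period where the event occurs.
        target = 1 if (event and t == last_period) else 0
        rows.append((cust_id, t, x, target))
    return rows

data = (
    customer_period_rows(1, "a", 3, True)      # event at t = 3
    + customer_period_rows(2, "b", 2, False)   # censored at t = 2
    + customer_period_rows(3, "cc", 3, False)  # no event through t = 3
)
for row in data:
    print(row)
```

Each printed row contributes one factor, $h$ or $1 - h$, to the likelihood.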

17 How does $h_{j,t}$ depend on time t and the X's?
We will select the logistic function (among several choices):
$h_{j,t} = \exp(x\beta) / (1 + \exp(x\beta))$ where $x\beta = \alpha(t) + \beta X$
X are the covariates, and β are coefficients fit by the model.
We need to specify how α(t) depends on t. Choices for α(t) include:
- $\alpha(t) = \alpha_0 + \alpha_1 t$ (simple linear)
- $\alpha(t) = \alpha_0 + \alpha_j (t = j) + \cdots + \alpha_{T-1} (t = T-1)$ (dummies for time t ≥ j)
- others (cubic splines)
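To make the two simple choices of α(t) concrete, here is a minimal Python sketch. All coefficient values are made up for illustration, and the helper names `hazard`, `alpha_linear`, and `alpha_dummies` are assumptions; a fitted model would supply the real coefficients.

```python
import math

def hazard(alpha_t, beta, x):
    """Logistic hazard: exp(xbeta) / (1 + exp(xbeta)), xbeta = alpha(t) + beta . x"""
    xbeta = alpha_t + sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-xbeta))   # algebraically the same expression

def alpha_linear(t, a0, a1):
    """alpha(t) = a0 + a1 * t"""
    return a0 + a1 * t

def alpha_dummies(t, a0, coefs):
    """alpha(t) = a0 + coefficient of the dummy for period t (0 for the reference)."""
    return a0 + coefs.get(t, 0.0)

beta = [0.4]                       # hypothetical coefficient for one covariate
dummy_coefs = {1: 0.9, 2: 0.3}     # hypothetical; period 3 is the reference
for t in (1, 2, 3):
    print(t,
          round(hazard(alpha_linear(t, -1.0, -0.2), beta, [1.0]), 4),
          round(hazard(alpha_dummies(t, -2.0, dummy_coefs), beta, [1.0]), 4))
```

The dummy form lets each period have its own baseline level, while the linear form forces the log-odds to move by the same amount from one period to the next.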

18 Compute Hazards for 2nd event at time t
PROC LOGISTIC, when applied to the Customer-Period data set, computes $h_{2,t}$:

   Proc Logistic Data=example desc;
      Class t;                          /* = a choice of α(t) */
      Model Y2 = X t;
      Output out=predict pred=hazard;
      Where t > 1;                      /* risk set starts at t=2 for the second event */
   run;

[Table: Customer-Period data set with columns ID, t, X, Event, Y; more IDs follow]

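19 Densities $p_{2,t}$ for 2nd event at time t
After Proc Logistic (and a DATA step), ID = 2 has hazards and densities:
[Table: one row for ID = 2 with columns h21, h22, h23, h24, p21, p22, p23, p24]
If t = 1, then $p_{j,t} = h_{j,t}$
If t > 1, then $p_{j,t} = h_{j,t} \prod_{t'=1}^{t-1} (1 - h_{j,t'})$
Why use $p_{j,t}$ instead of $f_{j,t}$? Reserve $f_{j,t}$ for the theoretical ideal.
[Table: Customer-Period rows with columns ID, t, X, Y; one period is marked "No observation"]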
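The scoring formula is a running product. A minimal Python sketch follows (the function name `densities_from_hazards` and the example hazards are assumptions for illustration):

```python
def densities_from_hazards(h):
    """Turn hazards h[0..T-1] for periods 1..T into densities p_{j,t}.

    Returns (p, surv) where surv is the probability of no event through T.
    """
    p, surv = [], 1.0
    for h_t in h:
        p.append(h_t * surv)   # event now, given survival so far
        surv *= 1 - h_t        # survive this period too
    return p, surv

# Hypothetical hazards for the 2nd event; the hazard at t = 1 is 0
# because the second event cannot happen in the first period.
p, surv = densities_from_hazards([0.0, 0.5, 0.5])
print(p, surv)
```

By construction the densities and the final survival probability add up to 1, so the densities alone can never exceed 1.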

20 Model scores for $p_{2,4}$
[Table: columns ID, h21, h22, h23, h24, p21, p22, p23, p24]
Fit the model.
Next, use the formulas to score $p_{2,4}$.
Now all customers have hazards and probabilities for all t.

21 Types of Covariates for $h_{j,t}$

Time period                    1        2        ...   T
Time: T - 1 dummies            tdum1, tdum2, ..., tdumT-1 (one set of values per period)
OR t (or a transform of t)     1        2        ...   T
X fixed at t = 0               x        x        ...   x
X * time interaction, e.g.     x*1      x*2      ...   x*T
Z time varying                 z1       z2       ...   zT

For prediction, time-varying Z must either be lagged by T or be separately forecast.

22 If CLASS t, then $p_{j,t}$ ~ correct on average
If CLASS t and no censoring, then:
$\sum p_{j,t} / N \approx N_{j,t} / N$ (sum over the customers in the sample)
Essentially an equality. I do not have a mathematical proof.
N = number of customers in the sample
$N_{j,t}$ = number with $E_{j,t}$
$N_{j,t} / N$ = fraction of the sample with the j-th event at t
"Correct on average" is less true if t, or some transform of t, replaces CLASS t.

23 If CLASS t, then $p_{j,t}$ ~ correct on average (with censoring)
Also holds with censoring: $\sum p_{j,t} / N_t \approx N_{j,t} / N_t$
N is replaced by $N_t$ (= number of customers still in the sample at time t)
$N_{j,t}$ = number with $E_{j,t}$
$N_{j,t} / N_t$ = fraction with the j-th event at t
For $p_{j,t}$ to give good estimates for individual customers, we need good $X_k$'s and good modeling techniques.
How to fit a model is a subject for another talk.
Assuming a good model, $p_{j,t}$ is a candidate to take the role of $f_{j,t}$.

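24 Baseline Hazard Model for 1st Event, T = 6, using DTHM
Hazards without covariates.
[Table: Maximum Likelihood Estimates for the Intercept and the time dummies tdum1 to tdum5, with tdum6 as the reference level (estimate 0); columns include DF, estimate, Wald statistic, and p-value]
[Figure: Baseline Hazard for 1st Event, by period; vertical axis 0% to 30%]
$h_{1,1} = \exp(\cdot) / (1 + \exp(\cdot)) = 24.6\%$, where the argument is the intercept plus the coefficient of tdum1.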

25 Baseline Probability Model for 1st Event, T = 6
The baseline probability density function, $p_{1,t}$ for t = 1 to 6, is computed from the baseline hazards:
If t = 1, then $p_{j,t} = h_{j,t}$
If t > 1, then $p_{j,t} = h_{j,t} \prod_{t'=1}^{t-1} (1 - h_{j,t'})$
It equals the fraction of customers in the sample with the 1st event at t.
[Figure: Baseline Prob Density for First Event, by period; vertical axis 0% to 30%; bar labels 24.6% and 19.7% are shown]

26 Probabilities for a Customer for Events j = 1 to 6
[Figure: Prob for j-th Event, by period; one curve each for the 1st through 6th event; vertical axis 0% to 12%]

27 Profiles for levels of X3 among Scored Customers
- Compute probabilities for each t for customers with X3 = 1 (and X3 = 2, X3 = 3)
- Average these probabilities
- Take the cumulative sum of this average
- Plot the cumulative sum vs. time
X3 = 3 customers are more likely to have events.
[Figure, left panel: Cum Prob of 1st Event, profile for X3 (X3 = 1, 2, 3), by period]
[Figure, right panel: Cum Prob of 3rd Event, profiles for X3, by period]

28 We can compute: Expected Number of Events at Time t
$\text{Cum\_p}_{j,t} = \sum_{t'=1}^{t} p_{j,t'}$ = cumulative probability of the j-th event (not of j events) by time t
$\text{Expected}(t) = \sum_{j=1}^{J'} \text{Cum\_p}_{j,t}$ where $J' = \min(J, t)$
Expected(t) is the expected number of events by time t.
[Table: for j = 1 and j = 2, columns t, p_{1,t}, Cum_p_{1,t}, p_{2,t}, Cum_p_{2,t}, Expected(t)]
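A short Python sketch of this calculation, using the densities of the coin example (probability of heads 1/4) for j = 1 and j = 2 so the numbers can be checked by hand. The function name `expected_events` and the list-of-lists layout are assumptions for this illustration.

```python
from fractions import Fraction as F
from itertools import accumulate

def expected_events(p):
    """p[j-1][t-1] = p_{j,t}. Returns [Expected(1), ..., Expected(T)]."""
    cum = [list(accumulate(row)) for row in p]      # Cum_p_{j,t}
    n_periods = len(p[0])
    out = []
    for t in range(1, n_periods + 1):
        j_max = min(len(p), t)                      # J' = min(J, t)
        out.append(sum(cum[j][t - 1] for j in range(j_max)))
    return out

# Coin example, J = 2, T = 3: first-heads and second-heads densities.
p1 = [F(1, 4), F(3, 16), F(9, 64)]
p2 = [F(0), F(1, 16), F(3, 32)]
print(expected_events([p1, p2]))
```

With J = 2 the value at t = 3 is 47/64, slightly below the true expected count of 3/4 because the third event is truncated away.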

29 Expected Number of Events at Time t
$\text{Cum\_p}_{j,t} = \sum_{t'=1}^{t} p_{j,t'}$ = probability of the j-th event (not of j events) by t
$\text{Expected}(t) = \sum_{j=1}^{J'} \text{Cum\_p}_{j,t}$ where $J' = \min(J, t)$
Customers are ranked by Expected into 5 ranks (separately for t = 3 and t = 6).
By rank, compare average Expected with average actual.
[Table (simulation): columns ranks, Expected(3), Avg Events(3), Expected(6), Avg Events(6); includes an "all" row]

30 A Second Approach to Estimate $f_{j,t}$
It is based on a Multinomial Logistic Model (MLM) with an unordered target.
MLM can be fit with PROC LOGISTIC.
For MLM there is a separate model for each t = 1 to T.
» For the hazard model there was a model for each j = 1 to J
Defining the target for MLM is somewhat confusing. Next slide.

31 Targets and MLM
For t = 3: the target is called ML_3. To define ML_3:
» Let E(t) = 10 if an event occurred at t, else E(t) = 0
» $\text{ML\_3} = \sum_{t=1}^{3} E(t) + 5 \cdot (E(3) = 0)$
[Table: two example customers with columns ID, t=1, t=2, t=3, ML_3, and the formula evaluated for each]

32 Targets and MLM
[Table: two example customers with columns ID, t=1, t=2, t=3, ML_3, and the formula $\sum E(t) + 5 \cdot (E(3) = 0)$ evaluated for each]
ML_3 = 10 occurs when $E_{1,3}$ occurs (no event, no event, event)
ML_3 = 20 occurs when $E_{2,3}$ occurs
ML_3 = 30 occurs when $E_{3,3}$ occurs
The other values of ML_3 (5, 15, 25) are not directly used.
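The target coding can be written as a one-line rule. The Python sketch below assumes the reading "10 per event in periods 1 to t, plus 5 if there is no event in period t", which reproduces the six levels 5, 10, 15, 20, 25, 30 for t = 3. The function name `ml_target` is hypothetical.

```python
def ml_target(history, t):
    """ML_t = sum over periods 1..t of E(s) + 5 * (E(t) = 0), with E(s) = 10 or 0.

    history holds one 0/1 event flag per period.
    """
    events_so_far = sum(history[:t])
    no_event_now = history[t - 1] == 0
    return 10 * events_so_far + (5 if no_event_now else 0)

for hist in ([0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0], [1, 1, 0]):
    print(hist, ml_target(hist, 3))
```

Levels that are multiples of 10 mean "the event in period t was event number ML_t / 10"; the levels ending in 5 record how many events preceded a period with no event.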

33 Targets and MLM
[Table: data set MLM with columns ID, X, ML_3]

   PROC LOGISTIC DATA = MLM;
      MODEL ML_3 (ref="5") = X / LINK = GLOGIT;
      SCORE DATA = MLM OUT = SCORED;
   run;

SCORED has probabilities for each level (6 in all) of ML_3, including:
P(10) = q13 ($E_{1,3}$ occurs)
P(20) = q23 ($E_{2,3}$ occurs)
P(30) = q33 ($E_{3,3}$ occurs)

34 But we want $q_{j,t}$ as t varies for fixed j
E.g. all T MLM models are needed for j = 2 and $q_{2,t}$ as t = 1 to T.
ML_1 gives: $q_{1,1} = \text{Prob}(E_{1,1})$; $q_{2,1} = \text{Prob}(E_{2,1}) = 0$ (*)
ML_2 gives: $q_{1,2} = \text{Prob}(E_{1,2})$; $q_{2,2} = \text{Prob}(E_{2,2})$
ML_3 gives: $q_{1,3} = \text{Prob}(E_{1,3})$; $q_{2,3} = \text{Prob}(E_{2,3})$; $q_{3,3} = \text{Prob}(E_{3,3})$
ML_4 gives: $q_{1,4} = \text{Prob}(E_{1,4})$; $q_{2,4} = \text{Prob}(E_{2,4})$; $q_{3,4} = \text{Prob}(E_{3,4})$; $q_{4,4} = \text{Prob}(E_{4,4})$
ML_t gives: $q_{1,t} = \text{Prob}(E_{1,t})$; $q_{2,t} = \text{Prob}(E_{2,t})$; $q_{3,t} = \text{Prob}(E_{3,t})$; etc.
(*) By definition: $q_{j,1} = 0$ for j > 1

35 $q_{j,t}$ is correct, on average
N is the number of customers in the sample.
$N_{j,t}$ is the number of customers experiencing $E_{j,t}$.
$q_{j,t}$ is correct on average: $\sum q_{j,t} / N = N_{j,t} / N$ (sum over the customers in the sample)
This is a standard property of multinomial logistic regression. It is true for all j and t.
For $q_{j,t}$ to give good estimates for individual customers, find good $X_k$'s.
$q_{j,t}$ can be a good candidate to take the role of $f_{j,t}$.

36 $p_{j,t}$ vs. $q_{j,t}$ as estimators of $f_{j,t}$
- Unsettling feature of MLM: it can happen that $\sum_{t=1}^{T} q_{j,t} > 1$. In practice, not a serious problem (I think). But must check. (*)
- But always $\sum_{t=1}^{T} p_{j,t} \le 1$; there is a proof (**)
- The Hazard Model uses J models while MLM requires T models
- CLASS t is needed in the Hazard Model for ~ correct on average. MLM has a model for each t and is always correct on average
- For MLM all predictors interact with time (via separate models)
- Both can predict for T future periods
» Time-varying covariates are lagged or separately forecast
(*) The data structure and strong covariates for each t will restrain (not mathematically prevent) this occurrence
(**) Thomas, G. (1957). Probability of Sums of Series, American Mathematical Monthly, 64,

37 Comments on MLM
If needed, compute hazards from $q_{j,t}$:
$h_{j,t} = q_{j,t} / S_{j,t-1} = q_{j,t} / (1 - \sum_{i=1}^{t-1} q_{j,i})$
If J is the maximum number of events:
» There are 2*J levels for the target ML_t (when t ≥ J)
» Use MLM for smaller J?
» Need more applications / simulations to give good guidance
I have had successful tests of MLM for J = 8.
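The conversion from MLM probabilities to hazards is a running division. Below is a minimal Python sketch with made-up values of $q_{j,t}$; the function name `hazards_from_q` and the choice to return NaN when the running survival reaches zero (or goes negative because the q's sum past 1) are assumptions of this example.

```python
def hazards_from_q(q):
    """h_{j,t} = q_{j,t} / (1 - sum_{i<t} q_{j,i}) for t = 1..T."""
    h, cum = [], 0.0
    for q_t in q:
        surv_prev = 1.0 - cum                 # S_{j,t-1}
        # Guard: if the q's already sum to 1 or more, the hazard is undefined.
        h.append(q_t / surv_prev if surv_prev > 0 else float("nan"))
        cum += q_t
    return h

print(hazards_from_q([0.25, 0.25, 0.25]))     # hypothetical q_{j,1..3}
```

A NaN in the output is a direct signal that the cumulative density for that customer has reached or exceeded 1, which is the situation the talk says must be checked for.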

38 Collapsing Levels for MLM?
Consider the target variable ML_3.
[Table: example customers with columns ID, t=1, t=2, t=3, ML_3]

   PROC LOGISTIC DATA = MLM;
      MODEL ML_3 (ref="5") = X / LINK = GLOGIT;
   run;

Could we collapse 15 and 25 into, say, 99 and reduce the complexity of the model?
» Bad for model fit. 15 and 25 have different meanings.

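39 Same Target Value with Different Histories?
[Table: example customers with columns ID, Target 1, Target 2; Target 1 assigns the single level 20, Target 2 splits it into levels 18 to 22 according to event history]
Does predictive accuracy suffer if Target 1 is used for $q_{2,6}$?
With Target 2 instead, $q_{2,6} = P(18) + \cdots + P(22)$.
If we conclude that we must use 18 to 22 (and similar history coding for other target levels), then MLM becomes far less attractive.
Note: on the training data set, Average(P(20)) equals Average(P(18) + ... + P(22)).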

40 Multinomial and Histories
Two models were fit:
- Model 1: using 20 (and other levels)
- Model 2: replacing 20 with 18 to 22
On validation, $q_{2,6}$ from Model 1 was ranked.
Within each rank, the absolute differences between Model 1 and Model 2 for $q_{2,6}$ were computed.
[Table (simulation): one row for all customers and one per decile rank; columns are the average Model 1 probability, the average absolute difference (Model 1 - Model 2), and the percentage difference computed as the ratio of the two]

41 The Choice: DTHM or MLM
In simulations and test data (using the same X's):
» Probabilities are very similar: $p_{j,t} \approx q_{j,t}$
I recommend trying both methods and comparing.

42 How are Models Validated?
The predictive accuracy of $p_{j,t}$ and $q_{j,t}$ is measured against actuals: $e_{j,t} = 1$ if the customer had the j-th event at t, otherwise 0.
This is done on a validation sample.

43 How are Models Validated?
Lift Tables / Charts
- Fit the model on Training
- Score the Validation data set
- Via programming, compute $p_{1,1} = \text{Prob}(E_{1,1})$
- Rank customers using $p_{1,1}$ into 5 ranks (rank 1 for highest)
- $e_r$ = event rate for the r-th rank; $p_r$ = average probability for the r-th rank
[Table: columns rank, p_r (mean by rank), e_r]
Profiles of Probabilities for fixed covariate values
[Figure: profile for X = x; cumulative average $p_{1,t}$ vs. cumulative average $e_{1,t}$]

44 How are Models Validated?
Lift Tables: many!!
- If T = 6 and J = 4, then there are 18 lift tables for the combinations of t and j (6 + 5 + 4 + 3)
Profiles of Probabilities for fixed covariate values: many!!
- 3 covariates with 4 levels give 12 profiles for each j, so J*12 in total
Need a simple summary metric.

45 Absolute Error
Gives T numbers to measure model performance.
For each customer and each t: find the absolute error between expected and actual events.
Suppose T = 3 and J = 2. Here is a customer:
[Table: columns t, p_{1,t}, Cum_p_{1,t}, p_{2,t}, Cum_p_{2,t}, Expected(t), Cum Actual(t), Abs. Error]
Only T = 3 numbers are produced.
[Table: all customers; Mean Abs. Error for t = 1, 2, T = 3]
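A minimal Python sketch of this metric follows. It assumes Expected(t) has already been computed per customer and that actuals are 0/1 event flags per period; the function names and the two example customers are hypothetical.

```python
def abs_errors(expected, actual_events):
    """Per-period |Expected(t) - cumulative actual events by t| for one customer."""
    out, cum_actual = [], 0
    for expected_t, event in zip(expected, actual_events):
        cum_actual += event
        out.append(abs(expected_t - cum_actual))
    return out

def mean_abs_error(customers):
    """Average the per-period errors over customers: returns T numbers."""
    per_customer = [abs_errors(exp, act) for exp, act in customers]
    n = len(per_customer)
    return [sum(errs[t] for errs in per_customer) / n
            for t in range(len(per_customer[0]))]

# Two hypothetical customers with T = 3: (Expected(t) list, event flags).
sample = [
    ([0.25, 0.5, 0.75], [0, 1, 0]),
    ([0.5, 1.0, 1.5], [1, 0, 1]),
]
print(mean_abs_error(sample))
```

Because the result is one number per period, a model comparison only needs T values per model, however many (j, t) combinations were scored.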

46 Compare Many Models using Mean Absolute Error
Idea #1: Rank models by Mean Absolute Error at T. Will the errors be greatest at the extreme T?
Idea #2: Sum the errors across 1 to T and rank.
[Table: Mean Absolute Error by time for Model #1 and the other models, with a SUM Error column]

47 Challenges and Sum Up
DTHM:
- Requires an unusual data structure: Customer-Period
MLM:
- Complicated programming to create targets for MLM
- The case of 20 vs. 18 to 22: is this a problem? Several case studies say no
- Limitation on the size of J: J = 8 was successful in one test
Model fitting challenges:
- How to efficiently fit models for all J (DTHM) or all T (MLM)? Not discussed today.
- One approach is to create predictors and fit one model (J = 2 for DTHM, t = T for MLM)
- Write macros to fit models across all J and all T
What is the best way to compare many candidate models, say 20 models?
- Rank by absolute error
Based on simulations and test data: DTHM and MLM produced very similar results.

48 Contact Information
Bruce Lund
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

49 Lift Charts

50 Lift Charts for Probability: 1st Event for t = 1
- Fit the model on Training
- Score the Validation data set
- Via programming, compute $p_{1,1} = \text{Prob}(E_{1,1})$
- Rank customers using $p_{1,1}$ into 5 ranks (rank 1 for highest)
- $e_r$ = event rate for the r-th rank; $p_r$ = average probability for the r-th rank
[Table: columns rank, p_r (mean by rank), e_r]

51 Lift Chart Metrics
Three metrics are presented ($e_r$ = event rate for the r-th rank, $p_r$ = average probability for the r-th rank):
Separation = $(e_1 - e_5) + (e_2 - e_4)$
Monotonic Deviation = $(e_2 - e_1)(e_2 > e_1) + (e_3 - e_2)(e_3 > e_2) + (e_4 - e_3)(e_4 > e_3) + (e_5 - e_4)(e_5 > e_4)$
Absolute Error = $|e_1 - p_1| + \cdots + |e_5 - p_5|$
[Table: columns rank, p_r (mean by rank), e_r, ABS(p_r - e_r); in the example shown, Monotonic Deviation = 0]
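The three metrics are simple functions of the five rank-level rates. The Python sketch below follows the formulas as read from the slide; the stray digits in the transcript are taken to be table residue rather than multipliers, which is an assumption, and the example rates are invented.

```python
def separation(e):
    """(e1 - e5) + (e2 - e4) for five ranks, rank 1 = highest scores."""
    return (e[0] - e[4]) + (e[1] - e[3])

def monotonic_deviation(e):
    """Sum of increases e_{r+1} - e_r wherever a lower rank has a higher rate."""
    return sum(max(e[r + 1] - e[r], 0.0) for r in range(len(e) - 1))

def absolute_error(p, e):
    """Sum over ranks of |e_r - p_r|."""
    return sum(abs(e_r - p_r) for p_r, e_r in zip(p, e))

p = [0.48, 0.31, 0.22, 0.12, 0.06]   # hypothetical average probabilities by rank
e = [0.50, 0.30, 0.20, 0.10, 0.05]   # hypothetical event rates by rank
print(separation(e), monotonic_deviation(e), absolute_error(p, e))
```

A well-ordered lift table has Monotonic Deviation equal to 0; larger Separation and smaller Absolute Error are better.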

52 Multi v. DTHM: Prob for 1st Event, t = 1
[Figure, top panel: Multi: Prob 1st event at t=1; q_r and e_r by quintile rank; Mono Dev. = 0]
[Figure, bottom panel: DTHM: Prob 1st event at t=1; p_r and e_r by quintile rank; Mono Dev. = 0]
Multinomial is very slightly superior to Hazard:
- Lower Abs. Error
- Greater Separation

53 Lift charts for DTHM, all j and t
[Figure: DTHM: Prob 1st event at t=1; p_r and e_r by quintile rank]
- 6 lift charts for the 1st event (j = 1, t = 1 to 6)
- 5 lift charts for the 2nd event (j = 2, t = 2 to 6)
- Etc.
- 1 lift chart for the 6th event (j = 6, t = 6)
In total, 21 lift charts. Many! Is there a summary metric?

54 Other Slides

55 DTHM: One Model instead of J Models?
It is possible to fit a single discrete time hazard model for repeated events, but the target is the occurrence of an event, not of the j-th event.
- An observation is created for each time period for a customer. The target is 0 or 1 depending on whether there is an event.
- The spells are covariates in these models (the j-th spell identifies the periods before and including the j-th event).
For prediction: try to compute the hazard of the j-th event.
- Compute $h_{j,t}$ for times t by setting the covariate spell = j in the model equation
- I have been unsuccessful in getting good $h_{j,t}$ by this approach
See Allison for a discussion of modeling repeated events with discrete time logistic models. The problem of dependence among the event times is discussed.
Allison, P. D. (2010). Survival Analysis Using SAS: A Practical Guide, Chapter 8

56 Alternative (but not useful) method for DTHM
A different formulation of the hazard model for the 2nd event (j = 2):
Only observe after the time of experiencing $E_{1,t}$, until either $E_{2,t}$ occurs or the end of the observation period is reached.
[Table: one example customer with rows Time, event, and $h^*_{2,t}$; the $h^*$ row reads n/a, n/a, $h^*_{2,2}$, $h^*_{2,3}$, n/a]
This is the risk set for the 2nd event (= those customers who have experienced the 1st event).
$h^*_{2,t}$ can be fit (first, the X's must be updated to the start time for risk).
But the $h^*_{2,t}$ do not lead to formulas for $p_{2,t}$ (= probability density for $E_{2,t}$).

57 Ideas for Validation Metrics
This discussion applies to DTHM. It also applies to MLM provided MLM uses the same predictors across t.
Compare mean density to event rates for customer groups across time:
- Suppose a predictor X1 has only a few distinct values. For a fixed value of X1 and for fixed j, the average probabilities $p_{j,t}$ ($q_{j,t}$) are computed and compared with the averages of $e_{j,t}$ across t = 1 to T.
- The hoped-for outcome is that the average of $p_{j,t}$ ($q_{j,t}$) closely agrees with the average of $e_{j,t}$ across t = 1 to T, for each value of X1 and for each j.
- The comparison of $p_{j,t}$ ($q_{j,t}$) with the average of $e_{j,t}$ can be applied to covariate patterns (subsets of customers having identical predictor values), for the patterns with sufficient sample size to meaningfully compute the average of $e_{j,t}$.
- Finally, the idea might be extended by creating approximate covariate patterns, where a pattern is defined by fixing the values of discrete predictors and fixing the mid-points of ranks of the continuous predictors.

58 Formula that Connects Hazards and Probabilities
Let f(t) be the probability density across values of time t, where t = 1, 2, ...
The cumulative distribution for f(t) is $F(t) = \sum_{i=1}^{t} f(i)$ and the survival function is $S(t) = 1 - F(t)$.
The hazard function h(t) is given by $h(t) = f(t) / S(t-1)$, with the convention that $S(0) = 1$.
Derivation:
(A) $1 - h(t) = 1 - f(t)/S(t-1) = [S(t-1) - f(t)] / S(t-1)$
Note: $S(t-1) - f(t) = S(t)$
(B) Multiply (A) by S(t-1) to give: $S(t-1)\,(1 - h(t)) = S(t)$
Iterate formula (B) for t-1, t-2, ..., 1 to give: $\prod_{i=1}^{t} (1 - h(i)) = S(t)$
$F(t) = 1 - S(t) = 1 - \prod_{i=1}^{t} (1 - h(i))$
Therefore, $f(t) = F(t) - F(t-1) = h(t) \prod_{i=1}^{t-1} (1 - h(i))$

59 $\sum_{t=1}^{T} p_{j,t} < 1$
G. Thomas (1957) shows, for any j and any customer, that the sum of the densities satisfies $\sum_{t=1}^{T} p_{j,t} < 1$.
From this point on, j is fixed and does not enter into the logic.
The proof recasts the problem in terms of probabilities. Assume that $h_{j,t}$ gives the probability of heads for the (independent) toss of the t-th unfair coin from a sequence of coins.
The product $\prod_{t'=1}^{t} (1 - h_{j,t'})$ is the probability that no heads occur in the first t tosses.
The product $p_{j,t} = h_{j,t} \prod_{t'=1}^{t-1} (1 - h_{j,t'})$ is the probability that the first head occurs at the t-th toss.
$\sum_{t'=1}^{t} p_{j,t'}$ is the probability of at least one head in the first t tosses.
Therefore, $\sum_{t'=1}^{t} p_{j,t'} + \prod_{t'=1}^{t} (1 - h_{j,t'}) = 1$ for all t.
Since every logistic hazard satisfies $h_{j,t'} < 1$, the product $\prod_{t'=1}^{t} (1 - h_{j,t'})$ is > 0, and it follows that $\sum_{t'=1}^{t} p_{j,t'} < 1$ for any t.
Thomas, G. (1957). Probability of Sums of Series, American Mathematical Monthly, 64,

60 Example where $\sum_{t=1}^{T} q_{j,t} > 1$, #1

   Data example;
      input Cust_ID t X1 Y;
      datalines;
   ;
   Proc Sort data = example; by Cust_ID t;
   run;
   Data ML;
      set example; by Cust_ID t;
      array Yt{*} Y1 - Y2;
      array X1t{*} X1_1 - X1_2;
      retain Y1 - Y2 X1_1 - X1_2;
      Yt{t} = Y;
      X1t{t} = X1;
      if last.cust_id then do;
         ML_1 = 0; ML_2 = 0;
         do i = 1 to 2;
            ML_2 = ML_2 + 10*Yt{i}*(i < 3) + (i = 2)*(Yt{2} = 0)*(ML_2 > 0)*5;
            ML_1 = ML_1 + 10*Yt{i}*(i < 2);
         end;
         output;
      end;
   run;

61 Example where $\sum_{t=1}^{T} q_{j,t} > 1$, #2

   Proc Logistic data = ML;
      model ML_1(ref = "0") = / link = glogit;
      score data = ML out = scored_ml_1(rename=(p_10 = q11));
   run;
   Proc Logistic data = ML;
      model ML_2(ref = "0") = X1_2 / link = glogit;
      score data = ML out = scored_ml_2(rename=(p_10 = q12));
   run;
   Data q11_q12;
      merge scored_ml_1 scored_ml_2; by Cust_ID;
      sum_q11_q12 = q11 + q12;
   run;
   Proc Print data = q11_q12;
      Var Cust_ID q11 q12 sum_q11_q12;
   run;

This example is highly contrived. Cust_IDs 3 and 4 have cumulative density > 1.
[Table: printed output with columns Obs, Cust_ID, q11, q12, sum_q11_q12]

62 Unlikely that $\sum_{t=1}^{T} q_{j,t} > 1$, provided the model is good
Consider j = 1 and Obs = 1.
[Table: event history of Obs = 1 for times 1 and 2, followed by its values of ML_1 and ML_2]
Let X be a covariate with X = 5 for Obs = 1. Further assume X = 5 is strongly predictive of an event across all times. Then:
- $q_{1,1}$ will be large
- Also $q_{2,2}$ will be large, but then, necessarily, $q_{1,2}$ will be small
This is a loose argument that $\sum_{t=1}^{2} q_{1,t} < 1$, but it conveys the idea that the data restrain $\sum q_{j,t}$, assuming good models.
If $\sum_{t=1}^{T} q_{j,t} > 1$, then it is likely that the MLM models have poor fit for one or more values of t.

63 Sum $p_{j,t}$ Across t vs. Sum $p_{j,t}$ Across j
The summation $(p_{1,1} + p_{1,2} + \cdots + p_{1,t_0})$ gives the probability of the customer having the first event by period $t_0$.
In contrast, consider the summation (where $p_{j,t_0} = 0$ if $j > t_0$):
$\text{SUM\_J}(t_0) = (p_{1,t_0} + p_{2,t_0} + \cdots + p_{J,t_0})$
This gives the probability of a customer having an event in period $t_0$.
Therefore, SUM_J($t_0$) is an in-market model for period $t_0$.
But this is a complicated way to obtain this simple model.
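The claim can be checked on the coin example from earlier in the talk, where the probability of an event in any period is known to be 1/4. The Python sketch below uses the closed-form density of the j-th heads on toss t for independent tosses (the negative binomial form); the function names `f_coin` and `sum_j` are chosen for this illustration.

```python
from fractions import Fraction
from math import comb

def f_coin(j, t, p):
    """P(j-th heads on toss t) = C(t-1, j-1) * p^j * (1-p)^(t-j); 0 if j > t."""
    if j > t:
        return Fraction(0)
    return comb(t - 1, j - 1) * p**j * (1 - p) ** (t - j)

def sum_j(t0, max_j, p):
    """SUM_J(t0): sum over j = 1..J of the density of the j-th event at t0."""
    return sum(f_coin(j, t0, p) for j in range(1, max_j + 1))

p_heads = Fraction(1, 4)
for t0 in (1, 2, 3, 4):
    print(t0, sum_j(t0, 4, p_heads))   # 1/4 in every period
```

Summing across j recovers the plain per-period event probability, which is why a direct in-market model for period $t_0$ is the simpler route to the same number.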

64 Independence of Irrelevant Alternatives (IIA)
For multinomial logistic regression the log-odds of alternatives j and k for the i-th customer is given by
$\log(P_{ij} / P_{ik}) = (\beta_j - \beta_k)\, x_i$ (IIA)
In IIA the log-odds of alternatives j and k for the i-th customer involve the coefficients for alternatives j and k, as well as the customer's predictor values $x_i$, but involve no other alternatives. This restrictive condition may not be appropriate for some models.
Tests of the suitability of multinomial logistic regression, including violations of IIA, are performed by the Hausman Specification Test and the Small and Hsiao Likelihood Ratio Test. SAS implementations of these tests are discussed in the SAS documentation of PROC MDC. Findings reported by Cheng and Long (2006) show these tests to be unreliable for large-scale applications. A related short discussion of testing the suitability of multinomial logistic regression is given by Paul Allison (2012).
Alternatives to the multinomial logistic model for LTV include more general discrete choice models such as the heteroscedastic extreme value model and the nested-logit model. Neither of these models is subject to the IIA condition. These and other discrete choice models can be fitted by PROC MDC.
Allison, P. (2012). How Relevant Is The Independence Of Irrelevant Alternatives?, Oct 12, 2012, Statistical Horizons. Available At:
Cheng, S. and Long, J. S. (2006). Testing for IIA in the Multinomial Logit Model, Sociological Methods & Research: 35:
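The IIA property can be seen numerically. In the Python sketch below, `mnl_probs` is a hypothetical helper that turns per-alternative coefficient vectors into multinomial logit probabilities; all coefficient and predictor values are invented.

```python
import math

def mnl_probs(betas, x):
    """Multinomial logit probabilities, one per alternative.

    betas: one coefficient list per alternative; x: predictor values.
    """
    scores = [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    top = max(scores)                                # subtract max for stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

x = [1.0, 2.0]
beta_a, beta_j, beta_k = [0.0, 0.0], [0.5, -1.0], [1.5, 0.2]

full = mnl_probs([beta_a, beta_j, beta_k], x)        # three alternatives
reduced = mnl_probs([beta_j, beta_k], x)             # alternative a removed

print(math.log(full[1] / full[2]))                   # log-odds of j vs. k
print(math.log(reduced[0] / reduced[1]))             # unchanged without a
```

Both printed values equal $(\beta_j - \beta_k) x$, so adding or removing the third alternative leaves the j-versus-k odds untouched. That is exactly the restriction IIA imposes.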

65 Sample Size Discussion
General: If counts of observations of event J in the database are low, then modeling J is (probably) not important. Go back to J-1.
Hazard: if modeling the j-th event:
- Rough rule is 10 or 15 observations of j per predictor coefficient. If 10 coefficients, then 100 or 150 observations of j
- If needed, non-events can be randomly sampled to reduce the size of the data set
- At least 30 observations for each time. If T = 10 and j = 3, then require 30 for each of t = 3, ..., 10 (8 × 30 = 240 observations of the j-th event, j = 3)
Multinomial: if modeling 2*j levels:
- If there are 2*j levels of the target and K predictors, then each target level uses K coefficients. For target level g (versus $g_0$, a reference level): $\log(\text{odds}_g) = \sum_{k=1}^{K} b_{g,k}\, x_k$
- Rough rule for sample size is (10 or 15 observations) × K × (2*j - 1)
- If needed, high-count levels of the target can be randomly sampled to reduce the size of the data set
Extreme fix: If the count of observations of g is small, one could fit a binary logistic model of $g_0$ (a reference level) versus g for each g beyond $g_0$. For g with small counts, only a few parameters would be used in that model. The 2*j - 1 regressions can be recombined into a multinomial model, as in the paper of Begg and Gray.
Begg, C. B. and Gray, R. (1984). Calculation of polychotomous logistic regression parameters using individualized regressions, Biometrika 71, 1, pp

66 The Start Date?
[Table: Cust 01 and Cust 02, each with rows for calendar time (week10 to week14), time from start, and events]
- Start a customer one week after the customer had an event? But some customers do not have a prior event
- Start from the customer-date (first association with the customer)? It might be a long time ago, and events are different now
- The start date is an important decision; there are no general rules
- In CRM models: using the same start date is natural and effective

67 Total Error for 1st Event: a Heuristic
[Table: customers A and B with columns ID, p11 to p16, e11 to e16]
A: Total Error for IDs with no 1st event:
- Assumes the 1st event occurs at t = 7
- Weight = (7 - t) for t = 1 to 6
- Error = Weight(t) * Prob(t)
B: Total Error for IDs with a 1st event:
- The 1st event occurs at $t_0$
- Weight = abs($t_0$ - t) for t = 1 to 6
- Error = Weight(t) * Prob(t)
[Tables: a "Not a Bad Error" example and a "Bad Error" example, each with columns t, Wgt, Prob, Error and a TOTAL ERROR line]
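The heuristic can be sketched in a few lines of Python. The function name `total_error`, its `first_period` argument (so that the same code covers the 5th-event case, where t runs from 5 to 6), and the flat example probabilities are assumptions for this illustration.

```python
def total_error(probs, event_time=None, first_period=1):
    """Sum over t of |t0 - t| * Prob(t).

    probs: densities for periods first_period..T.
    event_time: period t0 of the actual event, or None if no event occurred,
                in which case the event is assumed to occur at T + 1.
    """
    last_period = first_period + len(probs) - 1
    t0 = event_time if event_time is not None else last_period + 1
    periods = range(first_period, last_period + 1)
    return sum(abs(t0 - t) * p for t, p in zip(periods, probs))

flat = [0.1] * 6                          # hypothetical p_{1,1..6}
print(total_error(flat))                  # no event: weights 6, 5, 4, 3, 2, 1
print(total_error(flat, event_time=3))    # event at t0 = 3: weights 2, 1, 0, 1, 2, 3
```

Probability mass placed far from the actual event time is penalized more heavily, so a density concentrated near $t_0$ scores a small total error.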

68 Total Error for 5th Event
[Table: customers A and B with columns ID, p55, p56, e55, e56]
A: Total Error for IDs with no 5th event:
- Assumes the 5th event occurs at t = 7
- Weight = (7 - t) for t = 5 to 6
- Error = Weight(t) * Prob(t)
B: Total Error for IDs with a 5th event:
- The 5th event occurs at $t_0$
- Weight = abs($t_0$ - t) for t = 5 to 6
- Error = Weight(t) * Prob(t)
[Tables: one worked example for A and one for B, each with columns t, Wgt, Prob, Error and a TOTAL ERROR line]

69 Total Error: DTHM v. Multinomial
[Table: average total error by j for the Hazard Model and for the Multinomial model, shown separately for customers with no j-th event and for customers whose j-th event occurred, with the differences between the two models]
- Multinomial has smaller total error for "no events": j = 1 to 4
- MLM for j = 1 appears to be better than DTHM
- In this example, Multinomial is slightly better than DTHM

70 References
There are hundreds of good references on the topic of discrete time hazard models (DTHM). I've listed two references that introduce the DTHM in a readable and thoughtful style.
Allison, P. D. (2010). Survival Analysis Using SAS: A Practical Guide, Second Edition, Cary, NC, SAS Institute. See chapter 7
Singer, J. D. and Willett, J. B. (2003). Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence, New York, Oxford University Press. See chapters 9-12
Training course: for SAS users I recommend Survival Data Mining: A Programming Approach
The second method in this talk for finding the densities for repeated events is based on a multinomial logistic model. I've not seen papers or books that use the multinomial logistic model for this purpose. I've listed Paul Allison's book on logistic regression as a readable and thoughtful introduction to logistic regression, including the multinomial model.
Allison, P. (2012). Logistic Regression Using SAS, Cary, NC, SAS Institute.
The ideas in this talk are drawn from two papers I've presented at SAS Global Forums:
Lund, B. (2016). Probability Density for Repeated Events, Proceedings of the SAS Global Forum 2016 Conference, Paper
Lund, B. (2015). Multinomial Logistic Model for Long-Term Value, Proceedings of the SAS Global Forum 2015 Conference, Paper

A Survival Analysis of GMO vs Non-GMO Corn Hybrid Persistence Using Simulated Time Dependent Covariates in SAS

A Survival Analysis of GMO vs Non-GMO Corn Hybrid Persistence Using Simulated Time Dependent Covariates in SAS Western Kentucky University From the SelectedWorks of Matt Bogard 2012 A Survival Analysis of GMO vs Non-GMO Corn Hybrid Persistence Using Simulated Time Dependent Covariates in SAS Matt Bogard, Western

More information

The Function Selection Procedure

The Function Selection Procedure ABSTRACT Paper 2390-2018 The Function Selection Procedure Bruce Lund, Magnify Analytic Solutions, a Division of Marketing Associates, LLC The function selection procedure (FSP) finds a very good transformation

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

Weight of Evidence Coding and Binning of Predictors in Logistic Regression

Weight of Evidence Coding and Binning of Predictors in Logistic Regression ABSTRACT MWSUG 2016 Paper AA15 Weight of Evidence Coding and Binning of Predictors in Logistic Regression Bruce Lund, Independent Consultant, Novi, MI Weight of evidence (WOE) coding of a nominal or discrete

More information

Mixed Models for Longitudinal Ordinal and Nominal Outcomes

Mixed Models for Longitudinal Ordinal and Nominal Outcomes Mixed Models for Longitudinal Ordinal and Nominal Outcomes Don Hedeker Department of Public Health Sciences Biological Sciences Division University of Chicago hedeker@uchicago.edu Hedeker, D. (2008). Multilevel

More information

Lecture 7 Time-dependent Covariates in Cox Regression

Lecture 7 Time-dependent Covariates in Cox Regression Lecture 7 Time-dependent Covariates in Cox Regression So far, we ve been considering the following Cox PH model: λ(t Z) = λ 0 (t) exp(β Z) = λ 0 (t) exp( β j Z j ) where β j is the parameter for the the

More information

Models for Ordinal Response Data

Models for Ordinal Response Data Models for Ordinal Response Data Robin High Department of Biostatistics Center for Public Health University of Nebraska Medical Center Omaha, Nebraska Recommendations Analyze numerical data with a statistical

More information

DISPLAYING THE POISSON REGRESSION ANALYSIS

DISPLAYING THE POISSON REGRESSION ANALYSIS Chapter 17 Poisson Regression Chapter Table of Contents DISPLAYING THE POISSON REGRESSION ANALYSIS...264 ModelInformation...269 SummaryofFit...269 AnalysisofDeviance...269 TypeIII(Wald)Tests...269 MODIFYING

More information

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The

More information

Extensions of Cox Model for Non-Proportional Hazards Purpose

Extensions of Cox Model for Non-Proportional Hazards Purpose PhUSE 2013 Paper SP07 Extensions of Cox Model for Non-Proportional Hazards Purpose Jadwiga Borucka, PAREXEL, Warsaw, Poland ABSTRACT Cox proportional hazard model is one of the most common methods used

More information

5. Parametric Regression Model

5. Parametric Regression Model 5. Parametric Regression Model The Accelerated Failure Time (AFT) Model Denote by S (t) and S 2 (t) the survival functions of two populations. The AFT model says that there is a constant c > 0 such that

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities

More information

Chapter 10 Logistic Regression

Chapter 10 Logistic Regression Chapter 10 Logistic Regression Data Mining for Business Intelligence Shmueli, Patel & Bruce Galit Shmueli and Peter Bruce 2010 Logistic Regression Extends idea of linear regression to situation where outcome

More information

Time-Invariant Predictors in Longitudinal Models

Time-Invariant Predictors in Longitudinal Models Time-Invariant Predictors in Longitudinal Models Topics: What happens to missing predictors Effects of time-invariant predictors Fixed vs. systematically varying vs. random effects Model building strategies

More information

Introduction to mtm: An R Package for Marginalized Transition Models

Introduction to mtm: An R Package for Marginalized Transition Models Introduction to mtm: An R Package for Marginalized Transition Models Bryan A. Comstock and Patrick J. Heagerty Department of Biostatistics University of Washington 1 Introduction Marginalized transition

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

Generalized Models: Part 1

Generalized Models: Part 1 Generalized Models: Part 1 Topics: Introduction to generalized models Introduction to maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical outcomes

More information

ECON 594: Lecture #6

ECON 594: Lecture #6 ECON 594: Lecture #6 Thomas Lemieux Vancouver School of Economics, UBC May 2018 1 Limited dependent variables: introduction Up to now, we have been implicitly assuming that the dependent variable, y, was

More information

Consider Table 1 (Note connection to start-stop process).

Consider Table 1 (Note connection to start-stop process). Discrete-Time Data and Models Discretized duration data are still duration data! Consider Table 1 (Note connection to start-stop process). Table 1: Example of Discrete-Time Event History Data Case Event

More information

Limited Dependent Variable Models II

Limited Dependent Variable Models II Limited Dependent Variable Models II Fall 2008 Environmental Econometrics (GR03) LDV Fall 2008 1 / 15 Models with Multiple Choices The binary response model was dealing with a decision problem with two

More information

Longitudinal Modeling with Logistic Regression

Longitudinal Modeling with Logistic Regression Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

More information

Application of Ghosh, Grizzle and Sen s Nonparametric Methods in. Longitudinal Studies Using SAS PROC GLM

Application of Ghosh, Grizzle and Sen s Nonparametric Methods in. Longitudinal Studies Using SAS PROC GLM Application of Ghosh, Grizzle and Sen s Nonparametric Methods in Longitudinal Studies Using SAS PROC GLM Chan Zeng and Gary O. Zerbe Department of Preventive Medicine and Biometrics University of Colorado

More information

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The

More information

Practice of SAS Logistic Regression on Binary Pharmacodynamic Data Problems and Solutions. Alan J Xiao, Cognigen Corporation, Buffalo NY

Practice of SAS Logistic Regression on Binary Pharmacodynamic Data Problems and Solutions. Alan J Xiao, Cognigen Corporation, Buffalo NY Practice of SAS Logistic Regression on Binary Pharmacodynamic Data Problems and Solutions Alan J Xiao, Cognigen Corporation, Buffalo NY ABSTRACT Logistic regression has been widely applied to population

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Generalization to Multi-Class and Continuous Responses. STA Data Mining I

Generalization to Multi-Class and Continuous Responses. STA Data Mining I Generalization to Multi-Class and Continuous Responses STA 5703 - Data Mining I 1. Categorical Responses (a) Splitting Criterion Outline Goodness-of-split Criterion Chi-square Tests and Twoing Rule (b)

More information

Lecture 2. Judging the Performance of Classifiers. Nitin R. Patel

Lecture 2. Judging the Performance of Classifiers. Nitin R. Patel Lecture 2 Judging the Performance of Classifiers Nitin R. Patel 1 In this note we will examine the question of how to udge the usefulness of a classifier and how to compare different classifiers. Not only

More information

Time-Invariant Predictors in Longitudinal Models

Time-Invariant Predictors in Longitudinal Models Time-Invariant Predictors in Longitudinal Models Today s Topics: What happens to missing predictors Effects of time-invariant predictors Fixed vs. systematically varying vs. random effects Model building

More information

STAT 7030: Categorical Data Analysis

STAT 7030: Categorical Data Analysis STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012

More information

COMPLEMENTARY LOG-LOG MODEL

COMPLEMENTARY LOG-LOG MODEL COMPLEMENTARY LOG-LOG MODEL Under the assumption of binary response, there are two alternatives to logit model: probit model and complementary-log-log model. They all follow the same form π ( x) =Φ ( α

More information

Leverage Sparse Information in Predictive Modeling

Leverage Sparse Information in Predictive Modeling Leverage Sparse Information in Predictive Modeling Liang Xie Countrywide Home Loans, Countrywide Bank, FSB August 29, 2008 Abstract This paper examines an innovative method to leverage information from

More information

ABSTRACT INTRODUCTION SUMMARY OF ANALYSIS. Paper

ABSTRACT INTRODUCTION SUMMARY OF ANALYSIS. Paper Paper 1891-2014 Using SAS Enterprise Miner to predict the Injury Risk involved in Car Accidents Prateek Khare, Oklahoma State University; Vandana Reddy, Oklahoma State University; Goutam Chakraborty, Oklahoma

More information

Outline. The binary choice model. The multinomial choice model. Extensions of the basic choice model

Outline. The binary choice model. The multinomial choice model. Extensions of the basic choice model Outline The binary choice model Illustration Specification of the binary choice model Interpreting the results of binary choice models ME output The multinomial choice model Illustration Specification

More information

Beyond GLM and likelihood

Beyond GLM and likelihood Stat 6620: Applied Linear Models Department of Statistics Western Michigan University Statistics curriculum Core knowledge (modeling and estimation) Math stat 1 (probability, distributions, convergence

More information

Ph.D. course: Regression models. Regression models. Explanatory variables. Example 1.1: Body mass index and vitamin D status

Ph.D. course: Regression models. Regression models. Explanatory variables. Example 1.1: Body mass index and vitamin D status Ph.D. course: Regression models Introduction PKA & LTS Sect. 1.1, 1.2, 1.4 25 April 2013 www.biostat.ku.dk/~pka/regrmodels13 Per Kragh Andersen Regression models The distribution of one outcome variable

More information

Survival Analysis Math 434 Fall 2011

Survival Analysis Math 434 Fall 2011 Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup

More information

ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION

ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION Ernest S. Shtatland, Ken Kleinman, Emily M. Cain Harvard Medical School, Harvard Pilgrim Health Care, Boston, MA ABSTRACT In logistic regression,

More information

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs STAT 5500/6500 Conditional Logistic Regression for Matched Pairs The data for the tutorial came from support.sas.com, The LOGISTIC Procedure: Conditional Logistic Regression for Matched Pairs Data :: SAS/STAT(R)

More information

Cox s proportional hazards/regression model - model assessment

Cox s proportional hazards/regression model - model assessment Cox s proportional hazards/regression model - model assessment Rasmus Waagepetersen September 27, 2017 Topics: Plots based on estimated cumulative hazards Cox-Snell residuals: overall check of fit Martingale

More information

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES Cox s regression analysis Time dependent explanatory variables Henrik Ravn Bandim Health Project, Statens Serum Institut 4 November 2011 1 / 53

More information

Improve Forecasts: Use Defect Signals

Improve Forecasts: Use Defect Signals Improve Forecasts: Use Defect Signals Paul Below paul.below@qsm.com Quantitative Software Management, Inc. Introduction Large development and integration project testing phases can extend over many months

More information

Data Analyses in Multivariate Regression Chii-Dean Joey Lin, SDSU, San Diego, CA

Data Analyses in Multivariate Regression Chii-Dean Joey Lin, SDSU, San Diego, CA Data Analyses in Multivariate Regression Chii-Dean Joey Lin, SDSU, San Diego, CA ABSTRACT Regression analysis is one of the most used statistical methodologies. It can be used to describe or predict causal

More information

MAS3301 / MAS8311 Biostatistics Part II: Survival

MAS3301 / MAS8311 Biostatistics Part II: Survival MAS3301 / MAS8311 Biostatistics Part II: Survival M. Farrow School of Mathematics and Statistics Newcastle University Semester 2, 2009-10 1 13 The Cox proportional hazards model 13.1 Introduction In the

More information

Package threg. August 10, 2015

Package threg. August 10, 2015 Package threg August 10, 2015 Title Threshold Regression Version 1.0.3 Date 2015-08-10 Author Tao Xiao Maintainer Tao Xiao Depends R (>= 2.10), survival, Formula Fit a threshold regression

More information

ECONOMETRICS II TERM PAPER. Multinomial Logit Models

ECONOMETRICS II TERM PAPER. Multinomial Logit Models ECONOMETRICS II TERM PAPER Multinomial Logit Models Instructor : Dr. Subrata Sarkar 19.04.2013 Submitted by Group 7 members: Akshita Jain Ramyani Mukhopadhyay Sridevi Tolety Trishita Bhattacharjee 1 Acknowledgement:

More information

Goals. PSCI6000 Maximum Likelihood Estimation Multiple Response Model 1. Multinomial Dependent Variable. Random Utility Model

Goals. PSCI6000 Maximum Likelihood Estimation Multiple Response Model 1. Multinomial Dependent Variable. Random Utility Model Goals PSCI6000 Maximum Likelihood Estimation Multiple Response Model 1 Tetsuya Matsubayashi University of North Texas November 2, 2010 Random utility model Multinomial logit model Conditional logit model

More information

Regression in R. Seth Margolis GradQuant May 31,

Regression in R. Seth Margolis GradQuant May 31, Regression in R Seth Margolis GradQuant May 31, 2018 1 GPA What is Regression Good For? Assessing relationships between variables This probably covers most of what you do 4 3.8 3.6 3.4 Person Intelligence

More information

Analyzing and Interpreting Continuous Data Using JMP

Analyzing and Interpreting Continuous Data Using JMP Analyzing and Interpreting Continuous Data Using JMP A Step-by-Step Guide José G. Ramírez, Ph.D. Brenda S. Ramírez, M.S. Corrections to first printing. The correct bibliographic citation for this manual

More information

Time-Invariant Predictors in Longitudinal Models

Time-Invariant Predictors in Longitudinal Models Time-Invariant Predictors in Longitudinal Models Today s Class (or 3): Summary of steps in building unconditional models for time What happens to missing predictors Effects of time-invariant predictors

More information

Modeling Effect Modification and Higher-Order Interactions: Novel Approach for Repeated Measures Design using the LSMESTIMATE Statement in SAS 9.

Modeling Effect Modification and Higher-Order Interactions: Novel Approach for Repeated Measures Design using the LSMESTIMATE Statement in SAS 9. Paper 400-015 Modeling Effect Modification and Higher-Order Interactions: Novel Approach for Repeated Measures Design using the LSMESTIMATE Statement in SAS 9.4 Pronabesh DasMahapatra, MD, MPH, PatientsLikeMe

More information

Regression, part II. I. What does it all mean? A) Notice that so far all we ve done is math.

Regression, part II. I. What does it all mean? A) Notice that so far all we ve done is math. Regression, part II I. What does it all mean? A) Notice that so far all we ve done is math. 1) One can calculate the Least Squares Regression Line for anything, regardless of any assumptions. 2) But, if

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Simple logistic regression

Simple logistic regression Simple logistic regression Biometry 755 Spring 2009 Simple logistic regression p. 1/47 Model assumptions 1. The observed data are independent realizations of a binary response variable Y that follows a

More information

GMM Logistic Regression with Time-Dependent Covariates and Feedback Processes in SAS TM

GMM Logistic Regression with Time-Dependent Covariates and Feedback Processes in SAS TM Paper 1025-2017 GMM Logistic Regression with Time-Dependent Covariates and Feedback Processes in SAS TM Kyle M. Irimata, Arizona State University; Jeffrey R. Wilson, Arizona State University ABSTRACT The

More information

Data Mining 2018 Logistic Regression Text Classification

Data Mining 2018 Logistic Regression Text Classification Data Mining 2018 Logistic Regression Text Classification Ad Feelders Universiteit Utrecht Ad Feelders ( Universiteit Utrecht ) Data Mining 1 / 50 Two types of approaches to classification In (probabilistic)

More information

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011) Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

More information

PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY

PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY Paper SD174 PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY ABSTRACT Keywords: Logistic. INTRODUCTION This paper covers some gotchas in SAS R PROC LOGISTIC.

More information

Predictive Modeling Using Logistic Regression Step-by-Step Instructions

Predictive Modeling Using Logistic Regression Step-by-Step Instructions Predictive Modeling Using Logistic Regression Step-by-Step Instructions This document is accompanied by the following Excel Template IntegrityM Predictive Modeling Using Logistic Regression in Excel Template.xlsx

More information

Effect of Weather on Uber Ridership

Effect of Weather on Uber Ridership SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product

More information

Logistic Regression Models for Multinomial and Ordinal Outcomes

Logistic Regression Models for Multinomial and Ordinal Outcomes CHAPTER 8 Logistic Regression Models for Multinomial and Ordinal Outcomes 8.1 THE MULTINOMIAL LOGISTIC REGRESSION MODEL 8.1.1 Introduction to the Model and Estimation of Model Parameters In the previous

More information

Selection and Transformation of Continuous Predictors for Logistic Regression

Selection and Transformation of Continuous Predictors for Logistic Regression Paper AA-09-2014 Selection and Transformation of Continuous Predictors for Logistic Regression ABSTRACT Bruce Lund, Magnify Analytic Solutions A Division of Marketing Associates, Detroit, MI This paper

More information

Ph.D. course: Regression models. Introduction. 19 April 2012

Ph.D. course: Regression models. Introduction. 19 April 2012 Ph.D. course: Regression models Introduction PKA & LTS Sect. 1.1, 1.2, 1.4 19 April 2012 www.biostat.ku.dk/~pka/regrmodels12 Per Kragh Andersen 1 Regression models The distribution of one outcome variable

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Machine Learning Problem set 1 Solutions Thursday, September 19 What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY (formerly the Examinations of the Institute of Statisticians) GRADUATE DIPLOMA, 2007

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY (formerly the Examinations of the Institute of Statisticians) GRADUATE DIPLOMA, 2007 EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY (formerly the Examinations of the Institute of Statisticians) GRADUATE DIPLOMA, 2007 Applied Statistics I Time Allowed: Three Hours Candidates should answer

More information

INSTITUTE AND FACULTY OF ACTUARIES. Curriculum 2019 SPECIMEN SOLUTIONS

INSTITUTE AND FACULTY OF ACTUARIES. Curriculum 2019 SPECIMEN SOLUTIONS INSTITUTE AND FACULTY OF ACTUARIES Curriculum 09 SPECIMEN SOLUTIONS Subject CSA Risk Modelling and Survival Analysis Institute and Faculty of Actuaries Sample path A continuous time, discrete state process

More information

Analysing longitudinal data when the visit times are informative

Analysing longitudinal data when the visit times are informative Analysing longitudinal data when the visit times are informative Eleanor Pullenayegum, PhD Scientist, Hospital for Sick Children Associate Professor, University of Toronto eleanor.pullenayegum@sickkids.ca

More information

7. Assumes that there is little or no multicollinearity (however, SPSS will not assess this in the [binary] Logistic Regression procedure).

7. Assumes that there is little or no multicollinearity (however, SPSS will not assess this in the [binary] Logistic Regression procedure). 1 Neuendorf Logistic Regression The Model: Y Assumptions: 1. Metric (interval/ratio) data for 2+ IVs, and dichotomous (binomial; 2-value), categorical/nominal data for a single DV... bear in mind that

More information

Analyzing the effect of Weather on Uber Ridership

Analyzing the effect of Weather on Uber Ridership ABSTRACT MWSUG 2016 Paper AA22 Analyzing the effect of Weather on Uber Ridership Snigdha Gutha, Oklahoma State University Anusha Mamillapalli, Oklahoma State University Uber has changed the face of taxi

More information

Semiparametric Regression

Semiparametric Regression Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under

More information

Generalized Linear Models for Count, Skewed, and If and How Much Outcomes

Generalized Linear Models for Count, Skewed, and If and How Much Outcomes Generalized Linear Models for Count, Skewed, and If and How Much Outcomes Today s Class: Review of 3 parts of a generalized model Models for discrete count or continuous skewed outcomes Models for two-part

More information

ACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H.

ACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H. ACE 564 Spring 2006 Lecture 8 Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information by Professor Scott H. Irwin Readings: Griffiths, Hill and Judge. "Collinear Economic Variables,

More information

Appendix A Summary of Tasks. Appendix Table of Contents

Appendix A Summary of Tasks. Appendix Table of Contents Appendix A Summary of Tasks Appendix Table of Contents Reporting Tasks...357 ListData...357 Tables...358 Graphical Tasks...358 BarChart...358 PieChart...359 Histogram...359 BoxPlot...360 Probability Plot...360

More information

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

Introduction to Reliability Theory (part 2)

Introduction to Reliability Theory (part 2) Introduction to Reliability Theory (part 2) Frank Coolen UTOPIAE Training School II, Durham University 3 July 2018 (UTOPIAE) Introduction to Reliability Theory 1 / 21 Outline Statistical issues Software

More information

Binary Dependent Variables

Binary Dependent Variables Binary Dependent Variables In some cases the outcome of interest rather than one of the right hand side variables - is discrete rather than continuous Binary Dependent Variables In some cases the outcome

More information

Models for Binary Outcomes

Models for Binary Outcomes Models for Binary Outcomes Introduction The simple or binary response (for example, success or failure) analysis models the relationship between a binary response variable and one or more explanatory variables.

More information

Investigating Models with Two or Three Categories

Investigating Models with Two or Three Categories Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might

More information

Week 7: Binary Outcomes (Scott Long Chapter 3 Part 2)

Week 7: Binary Outcomes (Scott Long Chapter 3 Part 2) Week 7: (Scott Long Chapter 3 Part 2) Tsun-Feng Chiang* *School of Economics, Henan University, Kaifeng, China April 29, 2014 1 / 38 ML Estimation for Probit and Logit ML Estimation for Probit and Logit

More information

Ph.D. course: Regression models

Ph.D. course: Regression models Ph.D. course: Regression models Non-linear effect of a quantitative covariate PKA & LTS Sect. 4.2.1, 4.2.2 8 May 2017 www.biostat.ku.dk/~pka/regrmodels17 Per Kragh Andersen 1 Linear effects We have studied

More information

Lecture 1. Behavioral Models Multinomial Logit: Power and limitations. Cinzia Cirillo

Lecture 1. Behavioral Models Multinomial Logit: Power and limitations. Cinzia Cirillo Lecture 1 Behavioral Models Multinomial Logit: Power and limitations Cinzia Cirillo 1 Overview 1. Choice Probabilities 2. Power and Limitations of Logit 1. Taste variation 2. Substitution patterns 3. Repeated

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Canonical Edps/Soc 584 and Psych 594 Applied Multivariate Statistics Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Canonical Slide

More information

On a connection between the Bradley-Terry model and the Cox proportional hazards model

On a connection between the Bradley-Terry model and the Cox proportional hazards model On a connection between the Bradley-Terry model and the Cox proportional hazards model Yuhua Su and Mai Zhou Department of Statistics University of Kentucky Lexington, KY 40506-0027, U.S.A. SUMMARY This

More information

Introduction to Generalized Models

Introduction to Generalized Models Introduction to Generalized Models Today s topics: The big picture of generalized models Review of maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical

More information

Survival Analysis with Time- Dependent Covariates: A Practical Example. October 28, 2016 SAS Health Users Group Maria Eberg

Survival Analysis with Time- Dependent Covariates: A Practical Example. October 28, 2016 SAS Health Users Group Maria Eberg Survival Analysis with Time- Dependent Covariates: A Practical Example October 28, 2016 SAS Health Users Group Maria Eberg Outline Why use time-dependent covariates? Things to consider in definition of

More information
