MKTG 555: Marketing Models
Structural Models: Overview
Arvind Rangaswamy
(Some parts are adapted from a presentation by Prof. Pranav Jindal)
March 27, 2017
Overview
- Differences between structural models and reduced-form models
- Formulating models for dynamic discrete choice situations
- Application to the diffusion of new products
Two Approaches to Modeling
Reduced-Form Model
- The focus is on prediction
- Fit different functional forms to explain the data (e.g., use the Bass model)
- Regress sales as a function of past sales and other variables
Structural Model
- Model how consumers behave (i.e., model the underlying data-generating process)
- Understand what causes some consumers to purchase before others
- Use theory to guide model development
Modeling Demand for DVD Players
[Figure: Sales of DVD Players (USA), 1996 to 2009. Source: Digital Entertainment Group, Year-end Report, 2009]
Regression Model
$$s_t = \alpha_0 + \alpha_1 s_{t-1} + \alpha_2 p_t + \varepsilon_t$$
where $s_t$ is sales and $p_t$ is price at time $t$.
The model has a structure that establishes the dependence of $s_t$ on $p_t$. So, is this a structural model?
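A minimal sketch of how such a reduced-form regression is typically estimated, assuming hypothetical coefficient values and a simulated price path (none of these numbers come from the DVD data): ordinary least squares on $(1, s_{t-1}, p_t)$ recovers $\alpha_0, \alpha_1, \alpha_2$ when the data are generated by this same equation. The fit describes the observed relationship; it says nothing about why consumers respond to price.

```python
import numpy as np

def simulate_sales(alpha, T, sigma=1.0, seed=0):
    """Simulate s_t = a0 + a1*s_{t-1} + a2*p_t + eps_t (hypothetical parameters and prices)."""
    a0, a1, a2 = alpha
    rng = np.random.default_rng(seed)
    p = rng.uniform(100.0, 300.0, size=T)   # hypothetical price path
    s = np.empty(T)
    s[0] = 25.0                              # arbitrary starting sales level
    for t in range(1, T):
        s[t] = a0 + a1 * s[t - 1] + a2 * p[t] + rng.normal(0.0, sigma)
    return s, p

def fit_reduced_form(s, p):
    """OLS of s_t on a constant, s_{t-1}, and p_t; returns (a0, a1, a2)."""
    y = s[1:]
    X = np.column_stack([np.ones(len(y)), s[:-1], p[1:]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

s, p = simulate_sales(alpha=(20.0, 0.6, -0.05), T=5000)
coef = fit_reduced_form(s, p)
print(coef)   # should be close to the simulated (20.0, 0.6, -0.05)
```

Because the simulated prices are drawn independently of the error, OLS is consistent here; with real data, prices set in response to demand shocks would make $p_t$ endogenous.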
Bass Model
Assuming sales are the same as adoptions:
$$s_t = (m - cs_{t-1})(p + q\, cs_{t-1})$$
where $m$ is the market potential and $cs_{t-1}$ is cumulative sales up to $t-1$.
The structure here is based on theoretical notions of when someone in the target population will adopt a product. Is this a structural model?
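The discrete Bass recursion above can be iterated directly to produce a sales path. This sketch assumes hypothetical values for $m$, $p$, and $q$; in the slide's parametrization, $q$ multiplies cumulative sales directly, so it equals the usual imitation coefficient divided by $m$.

```python
import numpy as np

def bass_sales(m, p, q, T):
    """Per-period sales from s_t = (m - cs_{t-1}) * (p + q * cs_{t-1}), as on the slide."""
    sales = np.zeros(T)
    cum = 0.0                              # cs_{t-1}: cumulative sales before period t
    for t in range(T):
        sales[t] = (m - cum) * (p + q * cum)
        cum += sales[t]
    return sales

# Hypothetical values: m = 100, p = 0.03, q = 0.0038 (i.e., 0.38 / m)
sales = bass_sales(m=100.0, p=0.03, q=0.0038, T=30)
print(sales.argmax(), sales.sum())         # peak period and cumulative sales after 30 periods
```

With imitation dominating innovation, sales rise before they fall, which is the hump-shaped pattern the Bass model is usually fitted to.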
What Is a Structural Model?
- Relies on theories of consumer or firm behavior to derive an econometric model specification that can be tested against data.
- The theories specify the optimizing behavior of agents (e.g., utility maximization by consumers, profit maximization by firms, welfare maximization by policy makers).
- Accounts for the fact that the data we observe are not experimental.
- In short, we model the behaviors that generate the data.
What Is the Value of Structural Models?
- Reduced-form models allow out-of-sample predictions if the out-of-sample situation (e.g., the future) is not different from the in-sample situation.
- But what if you wish to predict effects when the situation changes (e.g., one firm exits the market, or there is a change in government policy)?
- The Lucas critique: when the situation changes, we cannot make predictions based solely on historical data. Rather, we need to model how the agents' behaviors would change as the situation changes.
- However, even when the situation changes, we can still assume that agents continue to optimize; that is, the underlying optimization behavior does not change.
Other Benefits of Structural Models
- Model assumptions are specified more clearly (particularly assumptions about the nature of relationships between variables, independence among random variables, etc.).
- Model parameters typically allow an economic interpretation (because they are based on theory).
- Can be used to test the predictive performance of alternative theories.
- Can be used to articulate how various mechanisms (e.g., incentives) influence outcomes.
- Allow simulating outcomes when experiments or reduced-form models are less accurate.
Costs of Structural Models
- Typically impose strong parametric assumptions to achieve model identification with the available data
- Misspecification costs could be high
- Identification may be less transparent (especially in dynamic models)
- May not fit the data as well as reduced-form models (but even then may predict better)
Steps in Developing a Structural Model
1. Characterize the situation as completely as feasible:
   - Behavior of consumers
   - Objectives of firms
   - Nature of interactions between various agents
2. Derive the econometric specification:
   - The source and structure of errors
3. Search for parameter values that result in the best match between model predictions and observed behaviors.
4. Conduct simulations:
   - What-if analyses
Two Types of Structural Models in Marketing
- Static structural demand models
  - Example: Hartmann 2010 (static structural model with social interactions)
- Dynamic structural demand models
  - A simple structural dynamic model (an extension of the Bass model; Barrot et al.)
  - Dubé et al. 2014
Static Structural Demand Models
- Primarily based on utility-maximizing consumers
- Consumer decision problems involve only contemporaneous variables, i.e., decisions do not depend on expectations about the future
- Models can be based on both individual data and aggregated data (Berry 1994; BLP 1995)
  - Aggregated models are based on individual decisions
  - Allow heterogeneity in preferences
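To illustrate how aggregated (market-share) data connect to individual utility maximization, here is a minimal sketch of the plain logit case, assuming no preference heterogeneity, iid type I extreme value errors, hypothetical mean utilities, and an outside good with mean utility normalized to 0. The inversion $\delta_j = \ln s_j - \ln s_0$ is the simplest case of the share inversion in Berry (1994); BLP-style models with heterogeneous preferences need a numerical inversion, which this sketch does not attempt.

```python
import numpy as np

def logit_shares(delta):
    """Aggregate shares when each consumer picks the max of delta_j + iid type I EV error.

    delta: mean utilities of the J inside goods; the outside good's mean utility is 0.
    """
    expd = np.exp(delta)
    denom = 1.0 + expd.sum()
    return expd / denom, 1.0 / denom

def invert_logit_shares(shares, outside_share):
    """Recover mean utilities from observed shares: delta_j = ln(s_j) - ln(s_0)."""
    return np.log(shares) - np.log(outside_share)

delta = np.array([1.0, 0.5, -0.2])       # hypothetical mean utilities
shares, s0 = logit_shares(delta)
recovered = invert_logit_shares(shares, s0)
print(recovered)                         # recovers delta up to floating-point error
```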
Static Structural Model: Hartmann Paper
Hartmann 2010
Objectives
- Develop a framework to estimate a demand model in a context with social interactions
- Model group decisions (a consequence of social interactions) as outcomes of a coordination game (similar to the battle of the sexes game)
Data
- 3,151 golfers in 199 mutually exclusive groups
- Group sizes range from 2 to 6
- 193 groups analyzed (152 pairs, 34 threesomes, and 7 foursomes)
- An average of 10.5 purchases made together by the 152 pairs
Hartmann 2010
- Model accounts for the endogeneity of a partner's decision
- Rich heterogeneity specification, which alleviates potential bias in the interaction effects
- Equilibrium model allows the modeler to conduct counterfactuals to measure customer values
- Extension of pioneering work by Bresnahan and Reiss (1991)
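Model Setup
- The analysis focuses on group actions rather than on group formation
- $\Psi$ is the (known or observable) association matrix: $\psi_{ij} = 1$ if $i$ and $j$ are partners, 0 otherwise
- Groups are indexed by $g$ and individuals by $i$
$$u(y_{igt}, y_{-igt}, \varepsilon_{igt}, \gamma_{ig}, \theta) = \begin{cases} 0 + \varepsilon_{0,igt} & \text{if } y_{igt} = 0 \\ V_{1igt}(y_{-igt}, \gamma_{ig}, \theta) + \varepsilon_{1,igt} & \text{if } y_{igt} = 1 \end{cases}$$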
Model Setup
Assumption:
$$V_{1igt}(y_{jgt} = 1, y_{-i,-jgt}; \gamma_{ig}, \theta) > V_{1igt}(y_{jgt} = 0, y_{-i,-jgt}; \gamma_{ig}, \theta) \quad \text{if } \psi_{ij} = 1$$
$$V_{1igt}(y_{jgt} = 1, y_{-i,-jgt}; \gamma_{ig}, \theta) = V_{1igt}(y_{jgt} = 0, y_{-i,-jgt}; \gamma_{ig}, \theta) \quad \text{if } \psi_{ij} = 0$$
When group member $j$ purchases, it increases $i$'s utility of purchasing if $j$ is $i$'s partner, but it has no direct effect on $i$'s purchase decision if $j$ is not a partner but just a group member by association. When $\psi_{ij} = 0$, $j$ still indirectly affects $i$ through any other partners that $i$ and $j$ have in common.
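Multiple Equilibria
- More than one group decision $y_{gt}$ can satisfy the equilibrium conditions
- Multiple equilibria prevent us from (1) defining a likelihood function and (2) predicting outcomes when conducting counterfactual analysis
Four possible outcomes for a pair (A, B), where $V_{11A}$ ($V_{10A}$) denotes A's deterministic utility of purchasing when B does (does not) purchase:
- (1, 1): $V_{11A} + \varepsilon_{1A} > \varepsilon_{0A}$ and $V_{11B} + \varepsilon_{1B} > \varepsilon_{0B}$
- (1, 0): $V_{10A} + \varepsilon_{1A} > \varepsilon_{0A}$ and $V_{11B} + \varepsilon_{1B} < \varepsilon_{0B}$
- (0, 1): $V_{11A} + \varepsilon_{1A} < \varepsilon_{0A}$ and $V_{10B} + \varepsilon_{1B} > \varepsilon_{0B}$
- (0, 0): $V_{10A} + \varepsilon_{1A} < \varepsilon_{0A}$ and $V_{10B} + \varepsilon_{1B} < \varepsilon_{0B}$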
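The four outcome conditions can be checked mechanically for given shocks. This is a minimal sketch with a hypothetical helper and hypothetical utility values (not estimates from the paper): it enumerates the pure-strategy Nash equilibria of the pair game. With complementarity ($V_{11k} > V_{10k}$), the (1, 1) and (0, 0) conditions can hold at the same time, which is the multiplicity problem described above.

```python
from itertools import product

def pair_equilibria(V, eps):
    """Return all pure-strategy Nash equilibria (y_A, y_B) of the pair purchase game.

    V[k][y_other]: player k's deterministic utility of purchasing given the partner's
    choice (V[k][1] = V_11k, V[k][0] = V_10k); not purchasing yields 0 + eps0.
    eps[k] = (eps0, eps1) for player k (k = 0 for A, 1 for B).
    """
    def best_response(k, y_other):
        # 1 if purchasing beats not purchasing, given the partner's choice
        return int(V[k][y_other] + eps[k][1] > eps[k][0])

    equilibria = []
    for y_a, y_b in product((0, 1), repeat=2):
        if best_response(0, y_b) == y_a and best_response(1, y_a) == y_b:
            equilibria.append((y_a, y_b))
    return equilibria

V = [[-1.0, 1.0], [-1.0, 1.0]]                           # hypothetical: V_10 = -1, V_11 = 1
print(pair_equilibria(V, [(0.0, 0.0), (0.0, 0.0)]))      # both (0, 0) and (1, 1)
print(pair_equilibria(V, [(0.0, 2.0), (0.0, 0.0)]))      # a large shock for A leaves only (1, 1)
```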
Regions of Equilibria (Pair)
[Figure]
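Likelihood Function
- The error space is divided into multiple mutually exclusive regions, as opposed to the two regions of a binary choice
- For the two-player game:
$$\Pr(1,1) = \Pr(V_{11A} + \varepsilon_{1A} > \varepsilon_{0A})\,\Pr(V_{11B} + \varepsilon_{1B} > \varepsilon_{0B})$$
$$\Pr(1,0) = \Pr(V_{10A} + \varepsilon_{1A} > \varepsilon_{0A})\,\Pr(V_{11B} + \varepsilon_{1B} < \varepsilon_{0B})$$
$$\Pr(0,1) = \Pr(V_{11A} + \varepsilon_{1A} < \varepsilon_{0A})\,\Pr(V_{10B} + \varepsilon_{1B} > \varepsilon_{0B})$$
$$\Pr(0,0) = 1 - \Pr(1,1) - \Pr(1,0) - \Pr(0,1)$$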
Likelihood Function
Likelihood function for a group $g$ over $T_g$ periods:
$$L_g(y_{Ag}, y_{Bg}; \theta_g) = \prod_{t=1}^{T_g} \prod_{a=0}^{1} \prod_{b=0}^{1} \Pr(a, b; \theta_g)^{I(y_{Agt} = a)\, I(y_{Bgt} = b)}$$
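The pair likelihood can be computed once the outcome probabilities are available. This sketch assumes that $\varepsilon_1 - \varepsilon_0$ is standard logistic (as with iid type I extreme value shocks), so that $\Pr(V + \varepsilon_1 > \varepsilon_0) = 1/(1 + e^{-V})$; Hartmann's actual error distribution and estimates are not reproduced here, and the utility values are hypothetical. Because $\Pr(0,0)$ is the remainder, the region where both (1, 1) and (0, 0) are equilibria is counted under (1, 1).

```python
import numpy as np

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

def pair_outcome_probs(V11A, V10A, V11B, V10B):
    """Outcome probabilities for a pair following the slide's formulas.

    Assumes eps1 - eps0 is standard logistic and independent across players.
    """
    pA1, pA0 = logistic(V11A), logistic(V10A)   # Pr(A purchases | B does / does not)
    pB1, pB0 = logistic(V11B), logistic(V10B)
    probs = {
        (1, 1): pA1 * pB1,
        (1, 0): pA0 * (1.0 - pB1),
        (0, 1): (1.0 - pA1) * pB0,
    }
    probs[(0, 0)] = 1.0 - sum(probs.values())   # remainder, as on the slide
    return probs

def group_log_likelihood(y_a, y_b, probs):
    """log L_g: sum over periods of log Pr(y_Agt, y_Bgt)."""
    return float(sum(np.log(probs[(a, b)]) for a, b in zip(y_a, y_b)))

probs = pair_outcome_probs(1.0, -1.0, 1.0, -1.0)          # hypothetical utilities
ll = group_log_likelihood([1, 0, 1], [1, 0, 0], probs)   # three hypothetical periods
print(probs, ll)
```

The remainder formula yields a valid probability only under complementarity ($V_{11k} \ge V_{10k}$), which is what the model assumption for partners implies.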
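Latent Utility
Latent utility of purchasing for individual $i$ in group $g$ at time $t$:
$$V_{1igt}(y_{-igt}; \gamma_{ig}, \theta) = \gamma_{0ig} + \gamma_{1ig}\, I\Big(\sum_{j \ne i} \psi_{ij} y_{jgt} \ge 1\Big) + \alpha_{ig}\gamma_{1ig}\, I\Big(\sum_{j \ne i} \psi_{ij} y_{jgt} \ge 2\Big) + \beta_{ig} X_{igt}$$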
Targeted Pricing
- Focus just on pairs for simplicity
- Use specification estimates with heterogeneous intercepts and interaction effects
- The optimal uniform advertised price is $95.65, which increases profitability by 2.9%
- Targeted prices at the group level further increase profitability by 1%
- Targeted prices within a group increase profitability by 19% (potential backlash, since players in a group interact)
Alternative Pricing Strategies
- Offer two-for-one coupons
  - Optimal price per customer is higher by $9
- Target groups with strong complementarities
  - Offer free rounds to some while charging higher prices to others (with rotation)
  - Profitability increases by 7%
- Focus targeting on non-price items
Simple Dynamic Structural Model: Barrot et al. (Working Paper)
Some Perennial Questions About Technology Adoption
- Why does someone (an individual or a firm) adopt new technologies?
- What factors influence the timing of adoption of new technologies?
- What determines their rates of diffusion?
- What factors govern the long-run levels of adoption?
Why Do Firms Adopt New Technologies and Business Practices?
- Benefits > Costs
- Increased efficiency/effectiveness
- Network effects (direct and indirect)
- Option value
- Regulatory requirements
- Installation and training
When Do Firms Adopt New Technologies and Business Practices?
- Heterogeneity in costs and/or benefits
- Awareness/learning
- Competitive environment
- Economies of scale
- Investments in current technologies
- Internal bureaucracy
Overall Research Question
How do competition (close ties) and legitimation (distant ties) influence the likelihood of a focal organization adopting a new online channel (after accounting for various other factors that could explain the adoption decision and adoption timing)?
Predictions of Population Ecology Theory
[Figure: number of firms entering (or exiting) vs. number of firms in the industry, with curves labeled Legitimation, Competition, and Overall]
Description of Digital Product/Service
- The product is software that enabled auto dealers in Germany to list available used cars on a national Web-accessible database visited by potential car buyers.
- Adopting the software is tantamount to adopting a new channel (or a new way of doing business).
- The company charged a fixed monthly fee if a dealer listed more than 5 cars.
- The company was started in 1997 and was a first mover in this domain.
- The company had a monopoly for several years and continues to be the dominant player even today.
- The Web site now lists over 1,400,000 used cars at any given time (October 2013).
Adoption Decisions Across Space and Time
- Is there any pattern to this adoption process?
[Figure: adoption panels for the years 1997 through 2003]
Temporal and Geographic Patterns of Adoptions
[Figure: geographic location of German car dealers; cities with more than 100,000 inhabitants marked, including Essen, Düsseldorf, Frankfurt, Hamburg, Munich, and Berlin]
Illustration of Adoption Data in Geographic Cluster j
[Figure: dealers A and B on a timeline of periods t-1, t, t+1]
- We group the data into monthly time periods to accommodate time-varying covariates for which we have, at best, monthly data.
- When determining the competitive pressure (hazard) on dealer B in time period t, we take into account the fact that A, in the same competitive zone, has already adopted (i.e., times of adoption are not aggregated).
- On the other hand, A and B both experience the same legitimation pressure (hazard) during time period t, which is based on the number of adopters in other zones (i.e., outside zone j) at the end of time period t-1, as a proportion of the total number of potential adopters in those other zones.
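The two pressure measures can be computed from dealer-level adoption times. This is a minimal sketch under stated assumptions: the `Dealer` record, the helper names, and the data are hypothetical; adoption times are continuous (in months, with period t covering (t-1, t]); and competitive pressure is taken to be a simple count of earlier same-zone adopters, which may differ from the study's exact operationalization.

```python
import math
from dataclasses import dataclass

@dataclass
class Dealer:
    zone: str
    adopt_time: float   # continuous time in months; math.inf if never adopted

def competitive_pressure(dealers, i, tau):
    """Assumed measure: number of other dealers in i's zone that adopted strictly before tau.

    Exact adoption times are used, so a same-zone dealer who adopted earlier in the
    same monthly period still counts (no aggregation of adoption times).
    """
    zone = dealers[i].zone
    return sum(1 for k, d in enumerate(dealers)
               if k != i and d.zone == zone and d.adopt_time < tau)

def legitimation_pressure(dealers, zone, t):
    """Share of potential adopters outside `zone` that had adopted by the end of period t-1."""
    others = [d for d in dealers if d.zone != zone]
    if not others:
        return 0.0
    return sum(d.adopt_time <= t - 1 for d in others) / len(others)

# Hypothetical data: A (index 0) and B (index 1) both adopt in period 5 = (4, 5] in zone "j"
dealers = [Dealer("j", 4.2), Dealer("j", 4.7), Dealer("k", 2.5),
           Dealer("k", math.inf), Dealer("m", 3.9)]
print(competitive_pressure(dealers, 1, 4.7))    # A adopted before B
print(legitimation_pressure(dealers, "j", 5))   # 2 of 3 dealers outside zone j adopted by month 4
```

Both A and B receive the same legitimation value for period 5, while only B's competitive pressure reflects A's earlier adoption, mirroring the asymmetry described above.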
Some Threats to Valid Inference Regarding Contagion Effects
- Correlated unobservables (especially those that are time-varying)
- Endogeneity
  - Endogenous dealer formation
  - Endogeneity of the installed base
- Simultaneity (reflection)
- Strategic behaviors (e.g., pre-emption, forward-looking competitive behaviors)
Modeling Approach
- Binary choice model (for adoption) via utility maximization
- Hazard model to account for the time to adoption and censoring
- Likelihood maximization
- Incremental model building with various approaches for incorporating random effects (e.g., for unobserved dealer heterogeneity), fixed effects for geographic clusters (for unobserved cluster heterogeneity), and time-varying unobservables
Utility Model
$$U_{ijt}(Y_{ijt} = 1) = \beta x_{ijt} + \varepsilon_{1,ijt} = \beta_{0i} + \beta_1 X_{1it} + \beta_2 X_{2j} + \beta_3 X_{3ijt} + \beta_4 X_{4i(-j)(t-1)} + \varepsilon_{1,ijt}$$
- $i$: dealer; $j$: competitive zone; $t$: time period
- $X_{1it}$: dealer-specific covariates
- $X_{2j}$: competitive-zone-specific covariates
- $X_{3ijt}$: competitive peer effect
- $X_{4i(-j)(t-1)}$: legitimation peer effect
Modeling Approach (Binary Choice Model + Hazard Model)
[Figure]
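Utility Maximization
The probability that dealer $i$ will adopt the technology in time period $t$ is:
$$P_{ijt}(Y_{ijt} = 1 \mid X_{ijt}) = \Pr\big(U_{ijt}(Y_{ijt} = 1) - U_{ijt}(Y_{ijt} = 0) > 0\big)$$
that is, $\Pr(\beta X_{ijt} + \varepsilon_{1,ijt} - \varepsilon_{0,ijt} > 0)$.
If we assume that the errors are iid extreme value (i.e., $f(\varepsilon_{\cdot,ijt}) = e^{-\varepsilon} e^{-e^{-\varepsilon}}$), we get the familiar binary logit model:
$$P_{ijt}(Y_{ijt} = 1 \mid X_{ijt}) = \frac{e^{\beta X_{ijt}}}{1 + e^{\beta X_{ijt}}}$$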
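A quick simulation check of the logit result, assuming hypothetical values of the index $\beta X_{ijt}$: drawing $\varepsilon_0$ and $\varepsilon_1$ from the standard Gumbel (type I extreme value) distribution, whose density is $e^{-\varepsilon} e^{-e^{-\varepsilon}}$, the share of draws with $\beta X + \varepsilon_1 - \varepsilon_0 > 0$ should match the closed-form logit probability up to simulation noise.

```python
import numpy as np

def simulated_adoption_rate(beta_x, n=400_000, seed=0):
    """Share of simulated dealers with beta_x + eps1 - eps0 > 0, eps iid standard Gumbel."""
    rng = np.random.default_rng(seed)
    eps1 = rng.gumbel(size=n)
    eps0 = rng.gumbel(size=n)
    return float(np.mean(beta_x + eps1 - eps0 > 0))

def logit_probability(beta_x):
    """Closed-form binary logit: exp(bx) / (1 + exp(bx))."""
    return 1.0 / (1.0 + np.exp(-beta_x))

for bx in (-1.0, 0.0, 0.8):   # hypothetical index values
    print(bx, simulated_adoption_rate(bx), logit_probability(bx))
```

The check works because the difference of two independent standard Gumbel draws is standard logistic.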
Hazard Model
The logit ignores the conditional nature of adoption decisions: if a dealer adopts in time period $t$, it also means the dealer did not adopt in time periods $0, 1, 2, \ldots, t-1$. To account for this, we specify that the hazard (the conditional probability of adoption at time $t$, given no adoption until $t-1$) has the following proportional odds structure:
$$\frac{h(t \mid X_{ijt})}{1 - h(t \mid X_{ijt})} = \frac{h_0(t)}{1 - h_0(t)}\, e^{\beta X_{ijt}}$$
where $h_0(t)$ is the base hazard of adoption in time period $t$ (i.e., the hazard when $X_{ijt} = 0$). Denote
$$\zeta_t = \ln\frac{h_0(t)}{1 - h_0(t)}, \quad t = 1, 2, \ldots, T$$
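Hazard Model
We get:
$$P_{ijt}(Y_{ijt} = 1 \mid X_{ijt}) = h(t \mid X_{ijt}) = \frac{e^{\zeta_t + \beta X_{ijt}}}{1 + e^{\zeta_t + \beta X_{ijt}}}$$
Note: $\partial P_{ijt}(\cdot)/\partial \zeta_t > 0$, i.e., a higher base hazard raises the adoption probability.
Instead of the logit link, we can also use the complementary log-log link by specifying a proportional hazards model rather than a proportional odds model. In this case (using continuous-time notation $\tau$), we get:
$$h(\tau \mid X_{ijt}) = h_0(\tau)\, e^{\beta X_{ijt}}$$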
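The proportional odds hazard, and the probability of first adopting in each period that it implies, can be computed directly. This sketch assumes hypothetical base hazards and a hypothetical index value $\beta X = 0.5$ held constant over time; the unconditional probability of adopting in period $t$ is $h_t \prod_{s<t}(1 - h_s)$, which is exactly the "did not adopt earlier" conditioning the plain logit ignores.

```python
import numpy as np

def proportional_odds_hazard(h0, beta_x):
    """Hazards whose odds equal the base odds h0 / (1 - h0) times exp(beta_x)."""
    zeta = np.log(h0 / (1.0 - h0))                  # zeta_t
    return 1.0 / (1.0 + np.exp(-(zeta + beta_x)))   # logistic(zeta_t + beta_x)

def adoption_period_pmf(h):
    """Pr(first adoption in period t) = h_t * prod_{s<t} (1 - h_s)."""
    survival = np.concatenate(([1.0], np.cumprod(1.0 - h)[:-1]))   # Pr(still at risk at start of t)
    return h * survival

h0 = np.array([0.05, 0.08, 0.10, 0.12])   # hypothetical base hazards
h = proportional_odds_hazard(h0, beta_x=0.5)
pmf = adoption_period_pmf(h)
print(pmf, 1.0 - pmf.sum())               # the remainder is the share not adopting by period T (censored)
```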
Hazard Model
Then we get:
$$h(t \mid X_{ijt}) = P_{ijt}(Y_{ijt} = 1 \mid X_{ijt}) = 1 - e^{-e^{\beta X_{ijt} + \gamma_t}}$$
where $\gamma_t$ is a function of the integral of the base hazard over the duration of time period $t$ (specifically, $\gamma_t = \ln \int_{t-1}^{t} h_0(\tau)\, d\tau$).
Given observations $Y_{ijt}$ and $X_{ijt}$, we can estimate the parameters ($\beta$, and $\zeta_t$ or $\gamma_t$) by maximizing the sample likelihood, where $R_t$ is the risk set (i.e., the dealers who have not yet adopted by the beginning of time period $t$):
$$L = \prod_{t=1}^{T} \prod_{ij \in R_t} \big(P_{ijt}[Y_{ijt} = 1 \mid X_{ijt}]\big)^{Y_{ijt}} \big(1 - P_{ijt}[Y_{ijt} = 1 \mid X_{ijt}]\big)^{1 - Y_{ijt}}$$
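With the logit (proportional odds) link, the likelihood above is a binary logit over dealer-period rows in the risk set, with one dummy per period for $\zeta_t$. The sketch below assumes simulated data with a single hypothetical time-invariant covariate; the helper names are illustrative, and Newton-Raphson is one convenient optimizer, not necessarily the one used in the study.

```python
import numpy as np

def simulate_risk_set(zeta, beta, n, seed=0):
    """Dealer-period rows for dealers still at risk; hazard = logistic(zeta_t + beta * x_i).

    Hypothetical data: one time-invariant covariate x_i ~ N(0, 1) per dealer.
    Columns 0..T-1 are period dummies (for zeta_t); column T holds x_i (for beta).
    """
    rng = np.random.default_rng(seed)
    T = len(zeta)
    x = rng.normal(size=n)
    rows, ys = [], []
    for i in range(n):
        for t in range(T):
            h = 1.0 / (1.0 + np.exp(-(zeta[t] + beta * x[i])))
            adopted = rng.random() < h
            row = np.zeros(T + 1)
            row[t], row[T] = 1.0, x[i]
            rows.append(row)
            ys.append(1.0 if adopted else 0.0)
            if adopted:
                break   # adopters leave the risk set; non-adopters after T are censored
    return np.array(rows), np.array(ys)

def log_likelihood(theta, X, y):
    """log L = sum over risk-set rows of Y*log P + (1 - Y)*log(1 - P), P = logistic(X theta)."""
    eta = X @ theta
    return float(np.sum(y * eta - np.log1p(np.exp(eta))))

def fit_hazard_logit(X, y, iters=30):
    """Maximize log L by Newton-Raphson; returns (zeta_1, ..., zeta_T, beta)."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # negative Hessian of log L
        theta = theta + np.linalg.solve(hess, grad)
    return theta

zeta_true = np.linspace(-3.0, -1.5, 8)                # hypothetical base log-odds
X, y = simulate_risk_set(zeta_true, beta=0.7, n=3000)
theta_hat = fit_hazard_logit(X, y)
print(theta_hat[-1])                                  # estimate of beta, expected near 0.7
```

Switching to the complementary log-log link would replace the logistic function with $1 - e^{-e^{\eta}}$ and change the gradient accordingly.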
A Discrete Probit Model (to also account for time-varying unobservables)
Consider:
$$F(y_{it} \mid z_i, y_{i,t-1}, \ldots, y_{i0}, c_i, T_i) = F(y_{it} \mid z_{it}, y_{i,t-1}, c_i), \quad t = 1, 2, \ldots, T_i$$
- $y_{it}$: choice at time $t$ (1 = adopt, 0 = don't adopt)
- $z_{it}$: value of the covariate for unit $i$ at time $t$
- $c_i$: unobserved heterogeneity
- $T_i$: time to adoption. It can be correlated with $(z_i, c_i)$, but not with the shocks to $y_{it}$.
Note also that in the context of new product adoption, the initial choice ($y_{i0}$) is independent of the unobserved heterogeneity ($c_i$), because the start of the stochastic process is the same for all units.
A Discrete Probit Model (to also account for time-varying unobservables)
The joint density (conditional times marginal) for an observation at time $t$ for unit $i$ is given by:
$$f_t(y_{it}, y_{i,t-1} \mid z_{it}, c_i) = g_t(y_{it} \mid z_{it}, y_{i,t-1}, c_i)\, g_t(y_{i,t-1} \mid z_{it}, c_i)$$
Chaining these conditionals forward from the initial condition $y_{i0}$ (via the exogeneity assumption), the complete joint distribution is:
$$f(y_i \mid z_i, y_{i0}, c_i) = \prod_{t=1}^{T_i} g_t(y_{it} \mid z_{it}, y_{i,t-1}, c_i)$$
Integrating out the unobserved heterogeneity:
$$f(y_i \mid z_i, y_{i0}) = \int_{c_i} \prod_{t=1}^{T_i} g_t(y_{it} \mid z_{it}, y_{i,t-1}, c_i)\, h(c_i \mid z_i, y_{i0})\, dc_i \qquad \text{(A)}$$
A Discrete Probit Model (to also account for time-varying unobservables)
To operationalize this likelihood, we use normal distributions (i.e., a probit model). Specifically, let
$$y_{it} = 1\left[\rho y_{i,t-1} + \alpha_0 + z_{it}\beta + c_i - \varepsilon_{it} > 0\right], \quad \varepsilon_{it} \mid y_{i,t-1}, z_i, c_i \sim \text{iid } N(0, 1)$$
(note the negative sign for the error). Then
$$\Pr(y_{it} = 1 \mid y_{i,t-1}, z_i, c_i) = \Phi(\rho y_{i,t-1} + \alpha_0 + z_{it}\beta + c_i)$$
We assume the following conditional density for the unobserved heterogeneity:
$$h(c_i \mid z_i, y_{i0}; \xi_0, \xi, \xi_1) \sim N(\xi_0 + z_i\xi + \xi_1 y_{i0},\ \sigma_c^2)$$
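A Discrete Probit Model (to also account for time-varying unobservables)
Operationalizing equation (A), the contribution to the likelihood for unit $i$ is given by:
$$L_i = \int_{c_i} \prod_{t=1}^{T_i} \Phi\big[(\rho y_{i,t-1} + \alpha_0 + z_{it}\beta + c_i)(2y_{it} - 1)\big]\, h(c_i \mid z_i, y_{i0})\, dc_i$$
Substituting for the heterogeneity $c_i$ and its distribution $h(\cdot)$, we get:
$$L_i = \int_{a_i} \prod_{t=1}^{T_i} \Phi\big[(\rho y_{i,t-1} + \alpha_0 + z_{it}\beta + \xi_0 + z_i\xi + \xi_1 y_{i0} + a_i)(2y_{it} - 1)\big]\, \frac{1}{\sigma_c}\phi\!\left(\frac{a_i}{\sigma_c}\right) da_i$$
where $a_i = c_i - \xi_0 - z_i\xi - \xi_1 y_{i0}$.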
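The one-dimensional integral over $a_i$ is a natural fit for Gauss-Hermite quadrature: substituting $a = \sqrt{2}\,\sigma_c x$ turns $\int f(a)\,\frac{1}{\sigma_c}\phi(a/\sigma_c)\,da$ into $\frac{1}{\sqrt{\pi}}\sum_k w_k f(\sqrt{2}\,\sigma_c x_k)$. This sketch assumes hypothetical parameter values and a single scalar covariate, and it summarizes $z_i$ in the heterogeneity equation by an assumed scalar (e.g., the time average of $z_{it}$); quadrature is one common way to evaluate $L_i$, not necessarily the study's method.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def std_normal_cdf(v):
    """Phi(v) computed from the error function."""
    return 0.5 * (1.0 + _erf(np.asarray(v, dtype=float) / math.sqrt(2.0)))

def unit_likelihood(y, y0, z, z_summary, rho, alpha0, beta, xi0, xi, xi1, sigma_c, n_nodes=30):
    """L_i = integral over a ~ N(0, sigma_c^2) of prod_t Phi[(index_t + a)(2 y_t - 1)].

    index_t = rho*y_{t-1} + alpha0 + beta*z_t + xi0 + xi*z_summary + xi1*y0, where
    z_summary is an assumed scalar summary of z_i (e.g., its time average).
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    y_lag = np.concatenate(([float(y0)], y[:-1]))
    index = rho * y_lag + alpha0 + beta * z + xi0 + xi * z_summary + xi1 * y0
    sign = 2.0 * y - 1.0
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    a = math.sqrt(2.0) * sigma_c * nodes   # change of variables for N(0, sigma_c^2)
    probs = std_normal_cdf(sign[None, :] * (index[None, :] + a[:, None]))   # (nodes, T)
    return float(np.dot(weights, probs.prod(axis=1)) / math.sqrt(math.pi))

# Hypothetical parameters and one unit that adopts in period 3 (y_i0 = 0)
params = dict(rho=0.3, alpha0=-1.0, beta=0.5, xi0=0.1, xi=0.2, xi1=0.0, sigma_c=0.8)
y_obs, z_obs = [0, 0, 1], [0.2, -0.1, 0.4]
L = unit_likelihood(y_obs, 0, z_obs, float(np.mean(z_obs)), **params)
print(L)
```

The sample log-likelihood is then the sum of $\ln L_i$ across units.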
Dynamic Structural Model: Dubé et al. 2014
Dynamic Structural Demand Models
- Primarily based on utility-maximizing consumers, but the consumers care about the past and/or the future.
- That is, previous actions (or parameters of interest) and expectations about the future may affect current decisions.
- For example, in purchasing a DVD player, consumers take into account not only the current price, but also expectations about future prices, the number of DVD movies available, etc.
Two Contexts Where Dynamic Choice Models Are Needed
1. State dependence (past actions affect current utility and actions)
   - Learning, variety seeking, switching costs, etc.
   - In reduced-form models, these could be modeled as lagged choices affecting current decisions (based on an assumed functional form)
   - Reference prices: consumer decisions based on some benchmark price
2. Forward-looking decisions
   - Based on expectations about future prices, uncertainty about product quality, etc.
   - In reduced-form models, this could be modeled as a function of observed prices to explain current choices
Dynamic Stochastic Choice Models (Structure)
- The consumer chooses among $K+1$ choice alternatives (brands), $a_t \in A = \{0, 1, 2, \ldots, K\}$, in each period $t = 0, 1, 2, 3, \ldots$
- The state vector $s \in S$ has the structure $s = (x, \varepsilon)$, where $x \in X$ and $\varepsilon \in \mathbb{R}^{K+1}$
- The transition probability is given by $p(s_{t+1} \mid s_t, a_t)$
Dynamic Choice Models (Assumptions)
- Additive separability: the utility from choosing action $a_t = j$ in state $s_t = (x_t, \varepsilon_t)$ has the structure $U(s_t, a_t) = u_j(x_t) + \varepsilon_{jt}$.
- Conditional independence: given $x_t$ and $a_t$, the current realization of $\varepsilon_t$ does not influence the future realization of the state $x_{t+1}$. Hence the transition probability of $x$ can be written as $f(x_{t+1} \mid x_t, a_t)$.
- IID shocks: $\varepsilon_{jt}$ is iid across actions and time periods.
Given conditional independence and the iid assumption, the transition probability can be written as:
$$p(s_{t+1} \mid s_t, a_t) = f(x_{t+1} \mid x_t, a_t)\, g(\varepsilon_{t+1})$$
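Dynamic Choice Models (Decision Rules and Rewards)
In each period, the consumer chooses an action according to the decision rule $d: X \times \mathbb{R}^{K+1} \to A$, $a_t = d(x_t, \varepsilon_t)$. The expected present discounted value of utilities under decision rule $d$ is:
$$v_d(x_0, \varepsilon_0) = E\left[\sum_{t=0}^{\infty} \beta^t\, U\big(x_t, \varepsilon_t, d(x_t, \varepsilon_t)\big) \,\Big|\, x_0, \varepsilon_0\right]$$
where $\beta$ is a discount factor, $0 < \beta < 1$.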
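Dynamic Choice Models (Optimal Behavior)
Given that $A$ is finite, there is an optimal decision rule $d^*(x_t, \varepsilon_t)$. We can express the consumer's optimization problem recursively using Bellman's equation. The associated value function satisfies:
$$v(x, \varepsilon) = \max_{j \in A}\left\{u_j(x) + \varepsilon_j + \beta E\big[v(x', \varepsilon') \mid x, a = j\big]\right\} = \max_{j \in A}\left\{u_j(x) + \varepsilon_j + \beta \int v(x', \varepsilon')\, f(x' \mid x, j)\, g(\varepsilon')\, d(x', \varepsilon')\right\}$$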
Dynamic Choice Models (Expected Value Function)
Define the expected value function as:
$$w(x) = \int v(x, \varepsilon)\, g(\varepsilon)\, d\varepsilon$$
$w(x)$ is the value the consumer expects to get before the unobserved states $\varepsilon$ are realized. Using the definition of $w$, we can rewrite the value function as:
$$v(x, \varepsilon) = \max_{j \in A}\left\{u_j(x) + \varepsilon_j + \beta \int w(x')\, f(x' \mid x, a = j)\, dx'\right\}$$
Dynamic Choice Models (Expected Value Function)
Taking expectations on both sides with respect to $\varepsilon$, we obtain the integrated Bellman equation:
$$w(x) = \int \max_{j \in A}\left\{u_j(x) + \varepsilon_j + \beta \int w(x')\, f(x' \mid x, a = j)\, dx'\right\} g(\varepsilon)\, d\varepsilon$$
The right-hand side of this equation defines a contraction mapping in $w$ (with modulus $\beta$). Thus there is a unique solution, which equals the expected value function.
Dynamic Choice Models (Choice-Specific Value Functions)
Using the definition of the expected value function, we define the choice-specific value functions as:
$$v_j(x) = u_j(x) + \beta \int w(x')\, f(x' \mid x, a = j)\, dx'$$
Conditional on solving for $w(x)$, the optimal decision rule satisfies $d(x, \varepsilon) = k$ if and only if $v_k(x) + \varepsilon_k \ge v_j(x) + \varepsilon_j$ for all $j \in A$, $j \ne k$.
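The contraction property suggests a simple solution method: iterate the integrated Bellman equation until $w$ stops changing, then form the choice-specific values and choice probabilities. This sketch assumes a discrete state space, hypothetical utilities and transition matrices, and iid type I extreme value shocks; under that distributional assumption the integral over $\varepsilon$ has the closed form $\ln\sum_j e^{v_j(x)}$ plus Euler's constant, and choice probabilities are logit in $v_j(x)$. Other shock distributions would require numerical integration.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329   # mean of a standard type I extreme value draw

def choice_values(u, F, beta, w):
    """v_j(x) = u_j(x) + beta * sum_x' w(x') f(x' | x, a=j); returns a (J, n) array."""
    return u + beta * (F @ w)

def solve_expected_value(u, F, beta, tol=1e-12, max_iter=100_000):
    """Successive approximation on the integrated Bellman equation (a contraction).

    u: (J, n) flow utilities u_j(x); F: (J, n, n) with F[j, x, x'] = f(x' | x, a=j).
    Assumes iid type I extreme value shocks (log-sum-exp closed form).
    """
    w = np.zeros(u.shape[1])
    for _ in range(max_iter):
        v = choice_values(u, F, beta, w)
        vmax = v.max(axis=0)
        w_new = vmax + np.log(np.exp(v - vmax).sum(axis=0)) + EULER_GAMMA
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    raise RuntimeError("integrated Bellman iteration did not converge")

def choice_probabilities(v):
    """Pr(d(x, eps) = k | x): logit (softmax over actions) in the choice-specific values."""
    e = np.exp(v - v.max(axis=0))
    return e / e.sum(axis=0)

# Hypothetical example: 2 alternatives (e.g., two brands) and 5 discrete states
rng = np.random.default_rng(0)
n_states = 5
u = np.vstack([np.zeros(n_states), 2.0 - 0.5 * np.arange(n_states)])
F = rng.random((2, n_states, n_states))
F /= F.sum(axis=2, keepdims=True)          # each row f(. | x, a=j) sums to 1
w = solve_expected_value(u, F, beta=0.9)
probs = choice_probabilities(choice_values(u, F, 0.9, w))
print(probs)
```

Because the mapping has modulus $\beta$, the number of iterations needed grows as $\beta$ approaches 1, which is one reason dynamic models are costly to estimate.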