Bayesian estimation of complex networks and dynamic choice in the music industry


1 Bayesian estimation of complex networks and dynamic choice in the music industry
Stefano Nasini, Víctor Martínez-de-Albéniz
Dept. of Production, Technology and Operations Management, IESE Business School, University of Navarra, Barcelona, Spain

2 Outline
- Multidimensional Gaussian reduction
- The exponential family of distributions
- Numerical results
- Goodness of fit

3 Artist goods: the music broadcasting industry
Artist goods have life cycles that resemble clothing fashion trends, with a time window in which their popularity increases shortly after their premiere and then decreases. This is due to network externalities in individual preferences and opinions.

4 Artist goods: the music broadcasting industry
A data set of songs played on TV channels and radio stations.
[Table: numbers of broadcasting companies, artists, and songs for Germany and the UK; both panels cover 163 weeks of data.]

5 Artist goods: the music broadcasting industry
A song's popularity increases after its premiere and then decreases.
[Figure, four panels: (a) B. Mars, "Just the Way You Are" in Germany; (b) B. Mars, "Locked Out of Heaven" in Germany; (c) B. Mars, "Just the Way You Are" in the UK; (d) B. Mars, "Locked Out of Heaven" in the UK.]

6 Artist goods: the music broadcasting industry
Correlated choices from different broadcasting companies.
[Two tables: Spearman's correlations among the dynamic plays of "Locked Out of Heaven" and of "Just the Way You Are" across BBC 1 Xtra, Capital FM, Kiss 100 FM, Metro Radio, Radio City, and Smooth Radio London.]

7 Artist goods: the music broadcasting industry
Our goal is a joint model that allows:
- Predicting the common life cycle of song diffusion within the music broadcasting industry.
- Detecting the structure of imitation and spillover between radio stations and TV channels, based on the observed correlations.
- Deciding which broadcasting company should launch a song in order to maximize its future number of plays.

8 The music broadcasting industry as a two-mode network
Notation:
- $\mathcal{R}$ := set of individuals (primary layer);
- $\mathcal{S}$ := set of items (secondary layer);
- $\mathcal{T}$ := set of time periods;
- $x_{st} = [x_{s1t}, x_{s2t}, \ldots, x_{s|\mathcal{R}|t}]^T \in \chi$ is the $|\mathcal{R}|$-dimensional connection profile of the $s$-th item at time $t$;
- $E \subseteq \mathcal{R} \times \mathcal{R}$ := set of connections between broadcasting companies.

9 The music broadcasting industry as a two-mode network
Spillover measurements to internalize cross-sectional dependency in the panel:

(i) $G^{(i)}_{hk}(x_{st}; x_{s,t-1}, \ldots, x_{s,t-\tau}) = \frac{1}{|E|\,\tau} \sum_{\ell=0}^{\tau-1} d_\ell \left( (x_{sht})^{u_h} (x_{sk(t-\ell)})^{u_k} \right)^{p}$;

(ii) $G^{(ii)}_{hk}(x_{st}; x_{s,t-1}, \ldots, x_{s,t-\tau}) = \frac{1}{|E|\,\tau} \sum_{\ell=0}^{\tau-1} d_\ell \left\| (x_{sht})^{u_h} - (x_{sk(t-\ell)})^{u_k} \right\|_2^{p}$.
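To make the two measurements concrete, here is a minimal numerical sketch in Python. The play counts, the geometric lag weights, and the default exponents are made up for illustration, and the $1/|E|$ normalization is left to the caller since it only matters when aggregating over pairs:

```python
import numpy as np

def spillover_i(x, h, k, tau, d, p=0.5, u=None):
    # product-type measurement G^(i)_hk, evaluated at the last period of the panel
    R, T = x.shape
    u = np.ones(R) if u is None else u
    t = T - 1
    return sum(d[l] * (x[h, t] ** u[h] * x[k, t - l] ** u[k]) ** p
               for l in range(tau)) / tau

def spillover_ii(x, h, k, tau, d, p=1.0, u=None):
    # distance-type measurement G^(ii)_hk: penalizes disagreement between h and k
    R, T = x.shape
    u = np.ones(R) if u is None else u
    t = T - 1
    return sum(d[l] * abs(x[h, t] ** u[h] - x[k, t - l] ** u[k]) ** p
               for l in range(tau)) / tau

x = np.array([[2., 5, 9, 12, 8, 4],    # toy panel for one song:
              [1., 4, 10, 11, 9, 5],   # rows = companies, columns = weeks
              [0., 1, 2, 2, 1, 1]])
d = 0.5 ** np.arange(3)                # assumed geometrically decaying lag weights
print(spillover_i(x, 0, 1, tau=3, d=d))   # companies 0 and 1 co-move: large value
print(spillover_ii(x, 0, 2, tau=3, d=d))  # companies 0 and 2 diverge: large distance
```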

10 The exponential family of distributions

$$P(x_{st} \mid x_{s,t-1}, \ldots, x_{s,t-\tau}) \;\propto\; h(x_{st}) \exp\left( \alpha_{st} S_{st} + \sum_{r \in \mathcal{R}} \beta_r R_r + \sum_{(h,k) \in E} \gamma_{hk} G_{hk} \right)$$

- $S_{st}$ accounts for the size effect of each item in the secondary layer, for $s \in \mathcal{S}$;
- $R_r$ accounts for the size effect of each individual in the primary layer, for $r \in \mathcal{R}$;
- $G_{hk}$ internalizes the one-mode projection onto the primary layer, for $(h,k) \in E$.

Underlying measure: either $h(x_{st}) = \prod_{r \in \mathcal{R}} \frac{1}{x_{srt}!}$ or $h(x_{st}) = (2\pi)^{-\frac{(\tau+1)|\mathcal{R}|}{2}}$.

11 The exponential family of distributions
The spillover measurement $G_{hk}$ plays an important role. The full conditional of a single component is

$$P(x_{srt} \mid x_{sr't'} \text{ such that } r' \neq r,\, t' < t) \;\propto\; \frac{1}{x_{srt}!} \exp\left( \begin{bmatrix} \alpha_{st} + \beta_r \\ \eta \end{bmatrix}^T \begin{bmatrix} x_{srt} \\ C(x_{srt}) \end{bmatrix} \right),$$

where, for measurement (i),

$$\eta = \frac{1}{|E|\,\tau} \sum_{k \in \mathcal{R}} \sum_{l=1}^{\tau} \gamma_{rk}\, (x_{sk(t-l)})^{p} \quad \text{and} \quad C(x_{srt}) = (x_{srt})^{p},$$

and, for measurement (ii),

$$\eta = \frac{1}{|E|\,\tau} \begin{bmatrix} \gamma_{r1} \\ \vdots \\ \gamma_{rn} \end{bmatrix} \quad \text{and} \quad C(x_{srt}) = \begin{bmatrix} \sum_{l=1}^{\tau} d_l \left\| (x_{srt})^{u_r} - (x_{s1(t-l)})^{u_1} \right\|_2^{p} \\ \vdots \\ \sum_{l=1}^{\tau} d_l \left\| (x_{srt})^{u_r} - (x_{sn(t-l)})^{u_n} \right\|_2^{p} \end{bmatrix}.$$

12 The exponential family of distributions
Bivariate examples of the two spillover measurements, for $\alpha = 1$ and $\gamma = 1$:
measurement (i): $\frac{1}{x!\,y!} \exp\big(\alpha(x+y) + \gamma\,(xy)^{1/2}\big)$; measurement (ii): $\frac{1}{x!\,y!} \exp\big(\alpha(x+y) + \gamma\,|x-y|\big)$.
[Figure: contour plots of the two densities.]

13 Multidimensional Gaussian reduction
Under special conditions:

$$P(x_{st} \mid x_{s,t-1}, \ldots, x_{s,t-\tau}) \;\propto\; h(x_{st}) \exp\left( \alpha_{st} S_{st} + \sum_{r \in \mathcal{R}} \beta_r R_r + \sum_{(h,k) \in E} \gamma_{hk} G_{hk} \right)$$

- $G_{hk}(x_{st}; x_{s,t-1}, \ldots, x_{s,t-\tau}) = \sum_{l=0}^{\tau} d_l \, x_{sht}\, x_{sk(t-l)}$;
- $h(x_{st}) = (2\pi)^{-\frac{(\tau+1)|\mathcal{R}|}{2}}$;

the joint distribution reduces to

$$\begin{bmatrix} X_{st} \\ \vdots \\ X_{s,t-\tau} \end{bmatrix} \sim \mathcal{N}(\mu, \Sigma), \quad \text{where} \quad \mu = \Sigma \begin{bmatrix} \alpha_{st}\, e + \beta \\ \vdots \\ \alpha_{s,t-\tau}\, e + \beta \end{bmatrix} \quad \text{and} \quad \Sigma = -\frac{1}{2} \begin{bmatrix} d_0 \Gamma & \cdots & d_\tau \Gamma \\ \vdots & \ddots & \vdots \\ d_\tau \Gamma & \cdots & d_0 \Gamma \end{bmatrix}^{-1}.$$
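A minimal sketch of this reduction, with toy dimensions and a $\Gamma$ deliberately chosen near $-I$ so that the block-Toeplitz matrix is negative definite and $\Sigma$ is a valid covariance (all numerical values are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
R, tau = 4, 2                        # companies and lags (toy sizes)

# gamma_hk coefficients; kept close to -I so Sigma below is positive definite
Gamma = -np.eye(R) + 0.05 * rng.standard_normal((R, R))
Gamma = (Gamma + Gamma.T) / 2
d = np.array([1.0, 0.3, 0.1])        # lag weights d_0, ..., d_tau

# block-Toeplitz matrix [[d_0 G, d_1 G, d_2 G], [d_1 G, d_0 G, d_1 G], ...]
A = np.block([[d[abs(i - j)] * Gamma for j in range(tau + 1)]
              for i in range(tau + 1)])
Sigma = -0.5 * np.linalg.inv(A)      # Sigma = -(1/2) A^{-1}
assert np.all(np.linalg.eigvalsh(Sigma) > 0)

alpha = np.array([0.5, 0.4, 0.3])    # alpha_{s,t}, alpha_{s,t-1}, alpha_{s,t-2}
beta = 0.1 * rng.standard_normal(R)
mu = Sigma @ np.concatenate([a * np.ones(R) + beta for a in alpha])

draw = rng.multivariate_normal(mu, Sigma)   # one draw of (X_st, ..., X_{s,t-tau})
print(draw.round(2))
```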

14 The exponential family of distributions
Why is our model an extension of the ERGM?

Exponential family: whenever the density of a random variable may be written $f(x) \propto h(x) \exp\{\theta^T C(x)\}$, the family of all such random variables (over all possible $\theta$) is called an exponential family.

Exponential Random Graph Model (ERGM): $P_\theta(X = x) = \frac{\exp\{\theta^T C(x)\}}{Z(\theta)}$, where
- $X$ is a random network on $n$ nodes (a matrix of 0's and 1's);
- $\theta$ is a vector of parameters;
- $C(x)$ is a known vector of graph statistics on $x$.

15 Why it is difficult to find the MLE
The log-likelihood function:
- The model: $P(X = x^{(0)} \mid \theta) = \frac{\exp\{\theta^T C(x^{(0)})\}}{Z(\theta)}$, where $x^{(0)}$ is the observed data set.
- The log-likelihood function is

$$\ell(\theta) = \theta^T C(x^{(0)}) - \log Z(\theta) = \theta^T C(x^{(0)}) - \log \sum_{\text{all possible } x} \exp\{\theta^T C(x)\}.$$

- Even in the simplest case of undirected graphs without self-edges, the number of graphs in the sum is very large ($2^{n(n-1)/2}$ for $n$ nodes).
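The blow-up is easy to see numerically. Here is a brute-force evaluation of $\log Z(\theta)$ for a toy ERGM with edge and triangle statistics (an illustrative choice of statistics, not the ones used in this work):

```python
import itertools
import numpy as np

def log_Z(theta, n):
    """Brute-force log Z(theta) for an ERGM with C(x) = (#edges, #triangles).
    The sum runs over all 2**(n*(n-1)/2) undirected graphs, so this is
    feasible only for tiny n."""
    pairs = list(itertools.combinations(range(n), 2))
    logs = []
    for bits in itertools.product([0, 1], repeat=len(pairs)):
        adj = np.zeros((n, n), dtype=int)
        for (i, j), b in zip(pairs, bits):
            adj[i, j] = adj[j, i] = b
        edges = sum(bits)
        triangles = np.trace(np.linalg.matrix_power(adj, 3)) // 6
        logs.append(theta[0] * edges + theta[1] * triangles)
    m = max(logs)                                   # log-sum-exp for stability
    return m + np.log(sum(np.exp(v - m) for v in logs))

print(log_Z(np.array([-1.0, 0.5]), n=5))   # 2^10 = 1024 graphs: instant
# n = 12 would already mean 2^66 terms -- the sum is hopeless at scale
```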

16 Maximum pseudo-likelihood
Let $x_w$ be a single component of $x$ and $x_{-w}$ the vector of all the remaining components.

The pseudo-likelihood function: approximate the marginal $P(x_w \mid \theta)$ by the conditional $P(x_w \mid x_{-w}; \theta)$, so that

$$\ell(\theta) = \prod_w P(x_w \mid x_{-w}; \theta).$$

Result: the maximum pseudo-likelihood (MPL) estimate. Unfortunately, little is known about the quality of MPL estimates.

17 Pseudo-likelihood for ERGM
Notation: for a network $x$ and a pair $(i,j)$ of nodes,

$$\ell(\theta) = \prod_{(i,j)} P(x_{ij} \mid x_{-ij}; \theta) = \prod_{(i,j)} \frac{\exp\{\theta^T C(x^{(0)})\}}{\exp\{\theta^T C(x_{ij}=1, x_{-ij})\} + \exp\{\theta^T C(x_{ij}=0, x_{-ij})\}} = \frac{\exp\{n(n-1)\,\theta^T C(x^{(0)})\}}{\prod_{(i,j)} \left( \exp\{\theta^T C(x_{ij}=1, x_{-ij})\} + \exp\{\theta^T C(x_{ij}=0, x_{-ij})\} \right)}$$
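Each factor above says that $P(x_{ij}=1 \mid x_{-ij})$ is logistic in the change statistics $C(x_{ij}=1, x_{-ij}) - C(x_{ij}=0, x_{-ij})$, so the MPL estimate can be obtained with an off-the-shelf logistic regression. A sketch with illustrative edge/triangle statistics:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def stats(a):
    # illustrative graph statistics: (#edges, #triangles)
    return np.array([a.sum() / 2, np.trace(np.linalg.matrix_power(a, 3)) / 6])

def mple(adj):
    """MPL estimate via logistic regression on change statistics:
    logit P(x_ij = 1 | x_-ij) = theta^T [C(x_ij=1, x_-ij) - C(x_ij=0, x_-ij)]."""
    n = adj.shape[0]
    X, y = [], []
    for i in range(n):
        for j in range(i + 1, n):
            a1, a0 = adj.copy(), adj.copy()
            a1[i, j] = a1[j, i] = 1
            a0[i, j] = a0[j, i] = 0
            X.append(stats(a1) - stats(a0))   # change statistics of dyad (i,j)
            y.append(adj[i, j])
    # no intercept: the change statistics carry the whole logit
    model = LogisticRegression(penalty=None, fit_intercept=False)  # 'none' on old sklearn
    return model.fit(np.array(X), np.array(y)).coef_.ravel()

rng = np.random.default_rng(1)
A = np.triu((rng.random((10, 10)) < 0.4).astype(int), 1)
A = A + A.T                            # toy symmetric adjacency matrix
print(mple(A))                         # MPL estimates of (theta_edges, theta_triangles)
```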

18 Pseudo-likelihood for our model

$$\ell(\theta) = \prod_{(r,t)} P(x_{srt} \mid x_{sr't'} \text{ such that } r' \neq r,\, t' < t) \;\propto\; \prod_{(r,t)} \frac{1}{x_{srt}!} \exp\left( \begin{bmatrix} \alpha_{st} + \beta_r \\ \eta \end{bmatrix}^T \begin{bmatrix} x_{srt} \\ C(x_{srt}) \end{bmatrix} \right).$$

What is the normalizing constant of the full conditional?

$$Z(\alpha_{st}, \beta_r, \eta) = \sum_{x_{srt} \geq 0} \frac{1}{x_{srt}!} \exp\left( \begin{bmatrix} \alpha_{st} + \beta_r \\ \eta \end{bmatrix}^T \begin{bmatrix} x_{srt} \\ C(x_{srt}) \end{bmatrix} \right)$$

Even the pseudo-likelihood is hard to evaluate for our model.
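Although no closed form is available, each full conditional is one-dimensional, so its normalizing constant can at least be approximated by truncating the series. A minimal sketch, assuming the product-type case $C(x) = x^p$ and made-up parameter values:

```python
from math import exp, lgamma

def log_q(x, ab, eta, p=0.5):
    # unnormalized log full-conditional, with C(x) = x**p (measurement (i))
    return ab * x + eta * x ** p - lgamma(x + 1)

def Z_truncated(ab, eta, p=0.5, xmax=500):
    """Truncated series for Z(alpha_st + beta_r, eta): the 1/x! base measure
    makes the terms vanish quickly, so a modest cutoff suffices for
    moderate ab = alpha_st + beta_r."""
    return sum(exp(log_q(x, ab, eta, p)) for x in range(xmax + 1))

print(Z_truncated(ab=1.0, eta=0.4))   # the series converges long before x = 500
```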


20 Bayesian posterior
Let $\theta = [\alpha_{1t}, \ldots, \alpha_{|\mathcal{S}|t}, \beta_1, \ldots, \beta_{|\mathcal{R}|}, \gamma_{11}, \ldots, \gamma_{|\mathcal{R}||\mathcal{R}|}]^T$ be the vector of natural parameters, $\pi(\theta)$ a prior distribution, and $x^{(0)}$ the observed data set. By applying Bayes' rule we have:

$$P(\theta \mid x^{(0)}) = \frac{P(x^{(0)} \mid \theta)\, \pi(\theta)}{\int_\theta P(x^{(0)} \mid \theta)\, \pi(\theta)\, d\theta} = \frac{P(x_1, \ldots, x_\tau; \theta) \prod_{t=\tau+1}^{w} P(x_t \mid x_{t-1}, \ldots, x_{t-\tau}; \theta)\, \pi(\theta)}{\int_\theta P(x_1, \ldots, x_\tau; \theta) \prod_{t=\tau+1}^{w} P(x_t \mid x_{t-1}, \ldots, x_{t-\tau}; \theta)\, \pi(\theta)\, d\theta} = \frac{P(x_1, \ldots, x_\tau; \theta)\, \pi(\theta) \prod_{t=\tau+1}^{w} \prod_{s=1}^{m} \frac{q_{s,t,\theta}(x_{st})}{Z(\theta)}}{\int_\theta P(x_1, \ldots, x_\tau; \theta)\, \pi(\theta) \prod_{t=\tau+1}^{w} \prod_{s=1}^{m} \frac{q_{s,t,\theta}(x_{st})}{Z(\theta)}\, d\theta}$$

21 Metropolis-Hastings
Since both $P(x^{(0)} \mid \theta)$ and $P(\theta \mid x^{(0)})$ can only be specified up to proportionality, almost all standard MCMC algorithms for $\theta$ cannot be applied. Consider for instance the Metropolis-Hastings acceptance probability:

$$\pi_{\text{accept}}(\theta, \theta') = \min\left\{ 1,\; \frac{P(x^{(0)} \mid \theta')\, \pi(\theta')}{P(x^{(0)} \mid \theta)\, \pi(\theta)} \cdot \frac{Q(\theta \mid \theta')}{Q(\theta' \mid \theta)} \right\} = \min\left\{ 1,\; \frac{Z(\theta)}{Z(\theta')} \cdot \frac{P(x_1, \ldots, x_\tau; \theta') \prod_{t=\tau+1}^{w} \prod_{s=1}^{m} q_{s,t,\theta'}(x_{st})\, \pi(\theta')\, Q(\theta \mid \theta')}{P(x_1, \ldots, x_\tau; \theta) \prod_{t=\tau+1}^{w} \prod_{s=1}^{m} q_{s,t,\theta}(x_{st})\, \pi(\theta)\, Q(\theta' \mid \theta)} \right\},$$

where $Q(\theta' \mid \theta)$ is the proposal distribution. The intractable ratio $Z(\theta)/Z(\theta')$ does not cancel.

22 Specialized MCMC for doubly intractable distributions
Murray et al. proposed an MCMC approach which overcomes this drawback to a large extent, based on simulating the joint distribution of the parameter and sample spaces conditioned on the observed data set $x^{(0)}$, that is to say $P(x, \theta \mid x^{(0)})$.

Algorithm 1: Exchange algorithm of Murray et al.
1: Initialize $\theta$
2: repeat
3:   Draw $\theta'$ from an arbitrary proposal distribution
4:   Draw $x'$ from $P(\cdot \mid \theta')$
5:   Accept $\theta'$ with probability $\min\left\{ 1,\; \frac{P(x' \mid \theta)\, P(x^{(0)} \mid \theta')\, \pi(\theta')}{P(x^{(0)} \mid \theta)\, P(x' \mid \theta')\, \pi(\theta)} \right\}$
6:   Update $\theta$
7: until convergence
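A minimal sketch of the exchange algorithm on a toy one-parameter exponential family where exact simulation from $P(\cdot \mid \theta')$ is available (a Poisson written in natural-parameter form; all names and numerical values are illustrative, not the model of this talk). The point is that $Z(\theta)$ and $Z(\theta')$ cancel in the acceptance ratio, so the intractable constant is never evaluated:

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(42)

# toy exponential family: q(x | theta) = exp(theta * x) / x!, i.e. Poisson with
# rate exp(theta); its constant Z(theta) = exp(exp(theta)) is never computed
def log_q(xs, theta):
    return theta * np.sum(xs) - sum(lgamma(x + 1) for x in xs)

def log_prior(theta):
    return -0.5 * theta ** 2              # N(0, 1) prior, up to a constant

def exchange(x_obs, n_iter=5000, step=0.3):
    theta, chain = 0.0, []
    for _ in range(n_iter):
        theta_p = theta + step * rng.standard_normal()    # symmetric proposal Q
        x_aux = rng.poisson(np.exp(theta_p), len(x_obs))  # exact draw from P(.|theta')
        # the intractable Z(theta) and Z(theta') cancel in this ratio:
        log_a = (log_q(x_aux, theta) + log_q(x_obs, theta_p) + log_prior(theta_p)
                 - log_q(x_obs, theta) - log_q(x_aux, theta_p) - log_prior(theta))
        if np.log(rng.random()) < log_a:
            theta = theta_p
        chain.append(theta)
    return np.array(chain)

x_obs = rng.poisson(np.exp(1.2), 50)      # synthetic data generated with theta = 1.2
print(exchange(x_obs)[1000:].mean())      # posterior mean should sit near 1.2
```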

23 Goodness of fit: graphical illustration
Total number of plays along time by the top-30 songs.
[Figure, two panels: (a) full model; (b) null model ($\gamma = 0$).]

24 Goodness of fit: graphical illustration
Total number of plays along time by the top-30 songs.
[Figure, two panels: (a) total plays along time; (b) market share.]

25 Reducing the dimensionality of the parameter space
Model specification based on structural properties of the music industry. The parameter space is the whole $(|\mathcal{T}||\mathcal{S}| + |\mathcal{R}| + |E|)$-dimensional Euclidean space, while the sample space has dimension $|\mathcal{T}|\,|\mathcal{S}|\,|\mathcal{R}|$. We use two strategies to reduce the dimensionality of the parameter space:
A. Define communities of broadcasting companies, so as to consider only within-group spillover effects $\gamma$;
B. Define a functional form for the effect of the song life cycle $\alpha$.

26 Reducing the dimensionality of the parameter space
A. Reducing the $|E|$ effects $\gamma$:
- pairwise spillover effects $\gamma_{kh}$ between individual companies $h$ and $k$ with the same radio format;
- a common spillover effect $\gamma_{kh}$ between different radio formats, if $h$ and $k$ have different formats.
B. Reducing the $|\mathcal{T}||\mathcal{S}|$ effects $\alpha$: the broadcasting pattern of songs exhibits a time window in which their popularity quickly increases shortly after their premiere and then decreases.

27 Groups of broadcasting companies
We introduce only the effects $\gamma$ associated with TV channels and radio stations of the same format.
[Diagram: within-format versus between-format links among the groups Contemporary and Easy Listening, Top 40 and Urban, Rock music, and TV channels.]

28 The estimated spillover effects
[Two tables of posterior interval estimates for the spillover effects $\gamma$: one between formats (Contemporary, Rock, News, Sport, Top-40, World-Music, and TV channels), and one between individual UK stations (BBC 1 Xtra, Capital FM, Kiss 100 FM, Metro Radio, Radio City, and Smooth Radio London).]

29 Songs' dynamics
Define a functional form for the effect of song dynamics. The attractiveness trajectory of the $s$-th song can be specified by letting $t_0$ be the week in which the song is launched and using a gamma kernel to shape its time dynamics:

$$\alpha_{st} = \delta_s^0 + \delta_s^1 (t - t_0) + \delta_s^2 \log(t - t_0) \quad \text{if } t > t_0,$$

with the song not broadcast before the launch week $t_0$.
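A small sketch of this life-cycle shape (the coefficient values are made up; with $\delta^1 < 0 < \delta^2$, the log-attractiveness rises sharply after the premiere, peaks $-\delta^2/\delta^1$ weeks later, and then decays):

```python
import numpy as np

def alpha_st(t, t0, d0, d1, d2):
    """Gamma-kernel attractiveness: d0 + d1*(t - t0) + d2*log(t - t0) for t > t0.
    Before the launch week the song is not broadcast (encoded as -inf)."""
    dt = np.asarray(t, dtype=float) - t0
    out = np.full_like(dt, -np.inf)
    m = dt > 0
    out[m] = d0 + d1 * dt[m] + d2 * np.log(dt[m])
    return out

weeks = np.arange(0, 60)
a = alpha_st(weeks, t0=0, d0=0.0, d1=-0.15, d2=1.8)
print(weeks[np.argmax(a)])   # peak at week -d2/d1 = 12 after the premiere
```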

30 Songs' life cycle
[Figure: common life cycle of the top-30 songs.]

31 Propagation of the broadcasting decision after the premiere week $t_0$

$$\max \; \frac{1}{T} \, \mathbb{E}\left[ \sum_{t'=1}^{T} \sum_{s \in \mathcal{S}} x_{s,\cdot,t+t'} \;\middle|\; x_{srt} = z_r \ \text{for all } r \in \mathcal{R},\, s \in \mathcal{S} \right]$$

subject to
$\sum_{r \in \mathcal{R}} y_r = 1$;
$z_r \leq \min\{M y_r, \phi\}$ for all $r \in \mathcal{R}$;
$y_r \in \{0,1\},\; z_r \geq 0,\; \phi \geq 0$ for all $r \in \mathcal{R}$.

[Table: eigenvector centrality and expected plays in the following periods under $\phi = 10$ and $\phi = 100$, for the groups Contemporary, Rock, News, Sport, Top-40, World Music, and TV channels.]
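Since $\sum_r y_r = 1$, the launch decision can be sketched by enumeration: seed $\phi$ plays at each candidate company in turn and propagate the expected plays forward. In the sketch below, the influence matrix W and the linear recursion are illustrative stand-ins for the model's conditional expectation, not the exact propagation used in the talk:

```python
import numpy as np

def best_launch(W, base, phi, T=10):
    """Enumerate the launch decision: sum_r y_r = 1 means one company is seeded,
    and z_r <= min(M*y_r, phi) binds at phi for the chosen one. W is a
    hypothetical one-step expected-influence matrix distilled from the
    estimated spillovers gamma."""
    R = W.shape[0]
    scores = []
    for r in range(R):
        x = np.zeros(R)
        x[r] = phi                    # seed phi plays at company r
        total = 0.0
        for _ in range(T):
            x = base + W @ x          # expected plays one week ahead
            total += x.sum()
        scores.append(total)
    return int(np.argmax(scores)), max(scores)

rng = np.random.default_rng(3)
W = 0.1 * rng.random((7, 7))          # toy influence weights among 7 groups
print(best_launch(W, base=np.ones(7), phi=10))
```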

32 Discussion
What are the real achievements of this work?
- We considered a large multidimensional panel of songs broadcast weekly on radio stations and TV channels, and detected a pattern of cross-sectional dependencies based on pairwise imitation.
- A probabilistic model has been proposed to internalize, within a unique framework, both the songs' life cycles and the complex correlation structure.
- A specialized MCMC method has been implemented to estimate the model parameters.
- The out-of-sample goodness of fit has been analyzed, assessing the model's adequacy for the observed data set.

33 Thank you for your attention
Acknowledgements: The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7) / ERC Grant Agreement n°.
