Conjoint Modeling of Temporal Dependencies in Event Streams. Ankur Parikh Asela Gunawardana Chris Meek

APPLICATION 1: MAKING ADVERTISING MORE RELEVANT

Display Advertising: ads are targeted to page content.

Behavioral Advertising: the user acts, expressing an information need.

Behavioral Advertising: future ads are targeted to that need.

Behavioral Advertising: user actions (e.g. queries issued, pages viewed) and ads are categorized into a hierarchy, e.g. Health & Wellness (Men's Health, Mental Health, Nutrition) and Internet (Hosting, Web Design). Users who engage in actions belonging to a category are later shown ads belonging to that category.

Improving Behavioral Advertising. Goal: show users the information they need, when they need it. This requires better modeling of the dynamics of users' needs: forecast needs before they are expressed (the optimal time to provide information may be before the need is expressed), and forecast the end of existing needs (don't advertise after the need has passed).

Intervene Before the Need Is Expressed

Stop Ads After They Are No Longer Relevant

Modeling Problem: model the times and categories of a user's future actions given the times and categories of their past actions. What will the user be interested in, and when?

APPLICATION 2: UNDERSTANDING AND FORECASTING FAILURES IN DATACENTERS

Datacenter (diagram, shown as an animated sequence of slides): a Controller oversees Machines 1-3, which host services such as XYZ Sales App, XYZ DB, ABC DB, XYZ Sales DB, and IJK DB. The controller observes a stream of events from the machines (e.g. OSPatch: Complete, DiskWrite: Fail, Insert: Fail, ReadMail: Success) and issues actions in response (e.g. Reboot).

Datacenter Management System: the controller chooses actions in response to events according to a policy, hand-authored by datacenter managers. This is difficult because of complex interactions between hardware, OS, services, and policies. We would like to diagnose failures (what past events predict failures?), understand policies (where can the policy be improved?), and eventually automate policy selection.

Modeling Problem: model the dependence of events on the types and times of past events. Why did failures happen? In what situations are particular actions taken?

MODELING EVENT STREAMS AS MARKED POINT PROCESSES

Event Streams as Marked Point Processes: we wish to model the dynamics of random processes rather than dependencies between random variables. Model the stream of actions from a user, or the stream of events from a machine, as a marked point process. Example (timeline): events labeled Health, Health, Sports, Health, Sports at times Mar-02 13:34, Mar-05 09:21, Mar-09 14:01, Mar-26 23:34. In general, a realization is a sequence of labeled, timestamped events (t_1, l_1), (t_2, l_2), …, (t_5, l_5) along a timeline t.

Modeling Marked Point Processes. Forward-in-time likelihood: p(t_1, l_1, …, t_n, l_n) = ∏_{i=1}^n p(t_i, l_i | h_i), where the history is h_i = (t_1, l_1, …, t_{i−1}, l_{i−1}). The process is completely characterized by its conditional intensity functions λ_l(t | h) for the labels l [Daley & Vere-Jones 2003]: λ_l(t | h) is the expected rate of events labeled l at time t given history h. Conditional intensity representation: p(t_i, l_i | h_i) = λ_{l_i}(t_i | h_i) · exp(−Σ_l ∫_{t_{i−1}}^{t_i} λ_l(τ | h_i) dτ).
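For intensities that do not depend on the history, the likelihood above reduces to a simple closed form that can be checked numerically. The sketch below is an illustration under that constant-intensity assumption, not the talk's implementation:

```python
import math

def log_likelihood(events, intensities):
    """Log-likelihood of a marked point process with constant intensities.

    events: list of (time, label) pairs in increasing time order.
    intensities: dict mapping each label l to its constant rate lambda_l.
    """
    total_rate = sum(intensities.values())
    ll, t_prev = 0.0, 0.0
    for t, l in events:
        # log of one factor: log lambda_{l_i}(t_i|h_i)
        # minus sum_l integral of lambda_l over (t_{i-1}, t_i]
        ll += math.log(intensities[l]) - total_rate * (t - t_prev)
        t_prev = t
    return ll
```

Each factor multiplies the rate of the observed label by the probability that no label fires during the waiting interval, matching the exponential term in the representation above.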

PCIM: the Piecewise-Constant Conditional Intensity Model [Gunawardana et al 2011] assumes λ_l(t | h) is piecewise constant in t and represents λ_l(t | h) for each label l by a decision tree. The parameters have a conjugate prior given the tree; trees are learned greedily using a structural prior and Bayesian model selection, and forecasting is done by importance sampling.

Tree for λ_RebootFail(t | h) (diagram): internal nodes test the history, e.g. whether a RebootInit occurred in [t − 0:30, t), whether a RebootFail occurred in [t − 0:30, t), and whether a VersionCheckFail occurred in [t − 1:00, t − 0:30); the leaves give the resulting rates λ_RebootFail = 0, 0, 0.1, and 2.5.
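Such a tree can be read as a piecewise-constant intensity function of the history. The sketch below encodes the tree above with times in hours (so 0:30 = 0.5); the branch ordering is an assumption, since the slide only lists the tests and leaf rates:

```python
def recent(history, label, t, lo, hi=0.0):
    """True if an event with `label` occurred in the window [t - lo, t - hi)."""
    return any(l == label and t - lo <= s < t - hi for s, l in history)

def reboot_fail_intensity(t, history):
    """Piecewise-constant intensity lambda_RebootFail(t | h).

    history: list of (time, label) pairs. The branch ordering below is
    hypothetical; only the tests and leaf values come from the slide.
    """
    if not recent(history, "RebootInit", t, 0.5):   # no RebootInit in last 0:30
        return 0.0
    if recent(history, "RebootFail", t, 0.5):       # already failed in last 0:30
        return 0.0
    if recent(history, "VersionCheckFail", t, 1.0, 0.5):
        return 2.5
    return 0.1
```

Between events the history is fixed, so the returned rate is constant until the next test window boundary or event, which is what makes the model piecewise constant in t.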

Graphical Representations of PCIMs: the tree for RebootFail depends on past RebootInit, RebootFail, and VersionCheckFail events, but not on VersionCheckSucc or RebootSucc. This is (temporal) conditional independence of intensities, not of probabilities. Temporal conditional independencies can be represented as a directed graph [Didelez 2008]; the graph may be cyclic.

PCIMs vs. CTBNs. PCIMs are marked point processes: events have piecewise-constant conditional intensities given their parents; estimation is simple, but forecasting is complex. CTBNs are Markov marked point processes: events (state transitions) of a variable x have constant conditional intensities given its parents, λ_{x: i→j}(t | h) = q_{i→j | par_x(h)}. Modeling non-Markov processes requires latent state; estimation is more complex, but forecasting simplifies.
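A minimal sketch of the CTBN-style constant intensities described above, for a binary variable with one binary parent. The rate table and names are hypothetical, chosen only to illustrate why forecasting simplifies: given the parent state, the holding time is exponential and the next state is drawn proportionally to the rates.

```python
import random

# Hypothetical rate table: q[(parent_state, i, j)] is the constant intensity
# of a transition from state i to state j given the parent's state.
q = {
    ("parent_on",  0, 1): 2.0,
    ("parent_on",  1, 0): 0.5,
    ("parent_off", 0, 1): 0.1,
    ("parent_off", 1, 0): 0.5,
}

def next_transition(parent_state, i, rng=random):
    """Sample (holding time, next state) for a variable currently in state i.

    With constant intensities the holding time is exponential with the total
    exit rate, and the destination is chosen proportionally to its rate.
    """
    rates = {j: r for (p, s, j), r in q.items() if p == parent_state and s == i}
    total = sum(rates.values())
    dt = rng.expovariate(total)
    j = rng.choices(list(rates), weights=list(rates.values()))[0]
    return dt, j
```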

Large Label Sets: when the set of possible labels l is large, a separate tree is needed for each l, and it is hard to learn a good λ_l(t | h) for rare l. Idea: make λ_l(t | h) similar for similar l and different for dissimilar l. Use structured labels with attributes for learning similarity: RebootFail → Reboot:Fail, MentalHealth → Health&Wellness:MentalHealth.
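One hypothetical way to encode such structured labels is as colon-separated attribute parts, so that similarity between labels is visible to the learner:

```python
def attributes(label):
    """Split a structured label into its attribute parts.

    A hypothetical encoding of the slides' Reboot:Fail and
    Health&Wellness:MentalHealth style labels.
    """
    return label.split(":")

def shared_attributes(a, b):
    """Attributes two labels have in common, compared position by position."""
    return [x for x, y in zip(attributes(a), attributes(b)) if x == y]
```

Under this encoding, Reboot:Fail and VersionCheck:Fail share the Fail attribute, which is exactly the kind of overlap a conjoint tree can query instead of treating every label independently.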

Conjoint PCIMs represent λ_l(t | h) for all l by a single decision tree that queries label attributes to distinguish dissimilar l. As with PCIMs, the parameters have a conjugate prior given the tree structure, the tree is learned using a structural prior and Bayesian model selection, and forecasting is done by importance sampling.

Conjoint tree (diagram, shown as an annotated sequence of slides): one node tests whether any *:Fail event occurred in [t − 0:30, t), a test that depends on h and t but not on l. Another tests whether an event with label l occurred in [t − 0:30, t), which depends on l, h, and t. Further nodes test the label itself (l = *:Fail, l = Reboot:*), depending on l but not on h and t. Leaves give rates such as λ_l = 2.0 and λ_l = 0.01. Because the tests are on attributes, the model defines λ_Rare:Fail(t | h) even if Rare:Fail was unseen in training.
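Forward sampling gives a feel for how forecasts are generated from piecewise-constant intensities. The talk forecasts with importance sampling; the simpler thinning sampler below is an illustrative sketch, and all names in it are assumptions:

```python
import random

def sample_next_event(t0, horizon, intensity, labels, lam_max, rng=random):
    """Sample the next event of a piecewise-constant intensity process by thinning.

    intensity(l, t): returns lambda_l(t | h) for the current (fixed) history.
    lam_max: an upper bound on the total intensity over [t0, horizon).
    Returns (time, label), or None if no event occurs before the horizon.
    """
    t = t0
    while t < horizon:
        t += rng.expovariate(lam_max)           # candidate arrival under the bound
        if t >= horizon:
            break
        rates = {l: intensity(l, t) for l in labels}
        total = sum(rates.values())
        if rng.random() < total / lam_max:      # accept with probability total/lam_max
            l = rng.choices(list(rates), weights=list(rates.values()))[0]
            return t, l
    return None
```

Repeating this from many sampled histories and counting how often a target label appears in a target window yields a Monte Carlo forecast of the kind the experiments below evaluate.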

EXPERIMENTAL RESULTS

Timing Is Important. Application: predict whether a user will issue queries in the category Health & Wellness in the next week.

Cross-Label Dependencies Matter. Application: predict whether a user who has never done so will issue queries in the category Health & Wellness in the next week.

C-PCIMs Generalize From Less Data. Application: model when users will issue web queries, and in what categories.

C-PCIMs Can Forecast Rare Events. Application: predict whether a user will issue queries in the category Mental Health in the next day.

(C-)PCIMs Allow Diagnosis (diagram): a learned tree for λ_SendTech tests whether a FailPostReimage occurred in [t − 0:10, t), whether a Reboot occurred in [t − 0:10, t), and whether a CheckDriveFail occurred in [t − 0:20, t − 0:10); the leaves give λ_SendTech = 0, 0.15, 0, and 0.75.

C-PCIMs Allow Richer Representations. Application: model which machines will issue which messages, and when.

Thank You

Modeling User Query Behavior for Advertising. Data: logs of user query behavior over time. Train: ~100k queries from ~6k users over 2 months. Test: ~150k queries from ~6k users over the next month; test users were selected to have at least 2 weeks of data each, with no user overlap between train and test. Queries are mapped to 476 hierarchical categories, giving 476 labels; the attributes are nodes in the hierarchy. Task: predict whether a user will issue queries in a target category in a target time period (to decide whether to show the user corresponding ads in that period).

Modeling Datacenter Machine Behavior. Data: event logs from a datacenter. Train: ~130k events from the first two weeks of a month. Test: ~170k events from the rest of the month. There are 221 possible messages and 71 machines, giving ~15k labels; each machine belongs to one of 4 types, and the attributes are message, machine, and machine type. Tasks: model dependencies and diagnose failures.


More information

Bayesian Classification. Bayesian Classification: Why?

Bayesian Classification. Bayesian Classification: Why? Bayesian Classification http://css.engineering.uiowa.edu/~comp/ Bayesian Classification: Why? Probabilistic learning: Computation of explicit probabilities for hypothesis, among the most practical approaches

More information

Your World is not Red or Green. Good Practice in Data Display and Dashboard Design

Your World is not Red or Green. Good Practice in Data Display and Dashboard Design Your World is not Red or Green Good Practice in Data Display and Dashboard Design References Tufte, E. R. (2). The visual display of quantitative information (2nd Ed.). Cheshire, CT: Graphics Press. Few,

More information

ECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4

ECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4 ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian

More information

Machine Learning, Fall 2009: Midterm

Machine Learning, Fall 2009: Midterm 10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all

More information

Final. Introduction to Artificial Intelligence. CS 188 Spring You have approximately 2 hours and 50 minutes.

Final. Introduction to Artificial Intelligence. CS 188 Spring You have approximately 2 hours and 50 minutes. CS 188 Spring 2014 Introduction to Artificial Intelligence Final You have approximately 2 hours and 50 minutes. The exam is closed book, closed notes except your two-page crib sheet. Mark your answers

More information

Lecture VII: Classification I. Dr. Ouiem Bchir

Lecture VII: Classification I. Dr. Ouiem Bchir Lecture VII: Classification I Dr. Ouiem Bchir 1 Classification: Definition Given a collection of records (training set ) Each record contains a set of attributes, one of the attributes is the class. Find

More information

Introduction to ML. Two examples of Learners: Naïve Bayesian Classifiers Decision Trees

Introduction to ML. Two examples of Learners: Naïve Bayesian Classifiers Decision Trees Introduction to ML Two examples of Learners: Naïve Bayesian Classifiers Decision Trees Why Bayesian learning? Probabilistic learning: Calculate explicit probabilities for hypothesis, among the most practical

More information

Bayesian Contextual Multi-armed Bandits

Bayesian Contextual Multi-armed Bandits Bayesian Contextual Multi-armed Bandits Xiaoting Zhao Joint Work with Peter I. Frazier School of Operations Research and Information Engineering Cornell University October 22, 2012 1 / 33 Outline 1 Motivating

More information

10708 Graphical Models: Homework 2

10708 Graphical Models: Homework 2 10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves

More information

Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box

Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton Motivation Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses

More information

The exam is closed book, closed calculator, and closed notes except your one-page crib sheet.

The exam is closed book, closed calculator, and closed notes except your one-page crib sheet. CS 188 Fall 2018 Introduction to Artificial Intelligence Practice Final You have approximately 2 hours 50 minutes. The exam is closed book, closed calculator, and closed notes except your one-page crib

More information

Bayesian Inference. Definitions from Probability: Naive Bayes Classifiers: Advantages and Disadvantages of Naive Bayes Classifiers:

Bayesian Inference. Definitions from Probability: Naive Bayes Classifiers: Advantages and Disadvantages of Naive Bayes Classifiers: Bayesian Inference The purpose of this document is to review belief networks and naive Bayes classifiers. Definitions from Probability: Belief networks: Naive Bayes Classifiers: Advantages and Disadvantages

More information

Advanced Techniques for Mining Structured Data: Process Mining

Advanced Techniques for Mining Structured Data: Process Mining Advanced Techniques for Mining Structured Data: Process Mining Frequent Pattern Discovery /Event Forecasting Dr A. Appice Scuola di Dottorato in Informatica e Matematica XXXII Problem definition 1. Given

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

P R O G N O S T I C S

P R O G N O S T I C S P R O G N O S T I C S THE KEY TO PREDICTIVE MAINTENANCE @senseyeio Me BEng Digital Systems Engineer Background in aerospace & defence and large scale wireless sensing Software Verification & Validation

More information

Review: Bayesian learning and inference

Review: Bayesian learning and inference Review: Bayesian learning and inference Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E Inference problem:

More information

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each

More information

Predicting New Search-Query Cluster Volume

Predicting New Search-Query Cluster Volume Predicting New Search-Query Cluster Volume Jacob Sisk, Cory Barr December 14, 2007 1 Problem Statement Search engines allow people to find information important to them, and search engine companies derive

More information

green green green/green green green yellow green/yellow green yellow green yellow/green green yellow yellow yellow/yellow yellow

green green green/green green green yellow green/yellow green yellow green yellow/green green yellow yellow yellow/yellow yellow CHAPTER PROBLEM Did Mendel s results from plant hybridization experiments contradict his theory? Gregor Mendel conducted original experiments to study the genetic traits of pea plants. In 1865 he wrote

More information

Data Mining Part 4. Prediction

Data Mining Part 4. Prediction Data Mining Part 4. Prediction 4.3. Fall 2009 Instructor: Dr. Masoud Yaghini Outline Introduction Bayes Theorem Naïve References Introduction Bayesian classifiers A statistical classifiers Introduction

More information

Bayesian Identity Clustering

Bayesian Identity Clustering Bayesian Identity Clustering Simon JD Prince Department of Computer Science University College London James Elder Centre for Vision Research York University http://pvlcsuclacuk sprince@csuclacuk The problem

More information

Brief Introduction of Machine Learning Techniques for Content Analysis

Brief Introduction of Machine Learning Techniques for Content Analysis 1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

Tópicos Especiais em Modelagem e Análise - Aprendizado por Máquina CPS863

Tópicos Especiais em Modelagem e Análise - Aprendizado por Máquina CPS863 Tópicos Especiais em Modelagem e Análise - Aprendizado por Máquina CPS863 Daniel, Edmundo, Rosa Terceiro trimestre de 2012 UFRJ - COPPE Programa de Engenharia de Sistemas e Computação Bayesian Networks

More information

cs/ee/ids 143 Communication Networks

cs/ee/ids 143 Communication Networks cs/ee/ids 143 Communication Networks Chapter 5 Routing Text: Walrand & Parakh, 2010 Steven Low CMS, EE, Caltech Warning These notes are not self-contained, probably not understandable, unless you also

More information

Machine Learning (CS 419/519): M. Allen, 14 Sept. 18 made, in hopes that it will allow us to predict future decisions

Machine Learning (CS 419/519): M. Allen, 14 Sept. 18 made, in hopes that it will allow us to predict future decisions Review: Decisions Based on Attributes } raining set: cases where patrons have decided to wait or not, along with the associated attributes for each case Class #05: Mutual Information & Decision rees Machine

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

Linear Dynamical Systems

Linear Dynamical Systems Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations

More information

CS246 Final Exam, Winter 2011

CS246 Final Exam, Winter 2011 CS246 Final Exam, Winter 2011 1. Your name and student ID. Name:... Student ID:... 2. I agree to comply with Stanford Honor Code. Signature:... 3. There should be 17 numbered pages in this exam (including

More information

Bayesian Learning. Artificial Intelligence Programming. 15-0: Learning vs. Deduction

Bayesian Learning. Artificial Intelligence Programming. 15-0: Learning vs. Deduction 15-0: Learning vs. Deduction Artificial Intelligence Programming Bayesian Learning Chris Brooks Department of Computer Science University of San Francisco So far, we ve seen two types of reasoning: Deductive

More information

Distributed ML for DOSNs: giving power back to users

Distributed ML for DOSNs: giving power back to users Distributed ML for DOSNs: giving power back to users Amira Soliman KTH isocial Marie Curie Initial Training Networks Part1 Agenda DOSNs and Machine Learning DIVa: Decentralized Identity Validation for

More information

Probabilistic Graphical Models. Guest Lecture by Narges Razavian Machine Learning Class April

Probabilistic Graphical Models. Guest Lecture by Narges Razavian Machine Learning Class April Probabilistic Graphical Models Guest Lecture by Narges Razavian Machine Learning Class April 14 2017 Today What is probabilistic graphical model and why it is useful? Bayesian Networks Basic Inference

More information

Generative and Discriminative Approaches to Graphical Models CMSC Topics in AI

Generative and Discriminative Approaches to Graphical Models CMSC Topics in AI Generative and Discriminative Approaches to Graphical Models CMSC 35900 Topics in AI Lecture 2 Yasemin Altun January 26, 2007 Review of Inference on Graphical Models Elimination algorithm finds single

More information

Machine Learning & Data Mining

Machine Learning & Data Mining Group M L D Machine Learning M & Data Mining Chapter 7 Decision Trees Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University Top 10 Algorithm in DM #1: C4.5 #2: K-Means #3: SVM

More information

Decision trees. Special Course in Computer and Information Science II. Adam Gyenge Helsinki University of Technology

Decision trees. Special Course in Computer and Information Science II. Adam Gyenge Helsinki University of Technology Decision trees Special Course in Computer and Information Science II Adam Gyenge Helsinki University of Technology 6.2.2008 Introduction Outline: Definition of decision trees ID3 Pruning methods Bibliography:

More information

Lecture Notes in Machine Learning Chapter 4: Version space learning

Lecture Notes in Machine Learning Chapter 4: Version space learning Lecture Notes in Machine Learning Chapter 4: Version space learning Zdravko Markov February 17, 2004 Let us consider an example. We shall use an attribute-value language for both the examples and the hypotheses

More information

Relationship between Least Squares Approximation and Maximum Likelihood Hypotheses

Relationship between Least Squares Approximation and Maximum Likelihood Hypotheses Relationship between Least Squares Approximation and Maximum Likelihood Hypotheses Steven Bergner, Chris Demwell Lecture notes for Cmpt 882 Machine Learning February 19, 2004 Abstract In these notes, a

More information

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Problem Set 3 Issued: Thursday, September 25, 2014 Due: Thursday,

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 4 Learning Bayesian Networks CS/CNS/EE 155 Andreas Krause Announcements Another TA: Hongchao Zhou Please fill out the questionnaire about recitations Homework 1 out.

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

Announcements. CS 188: Artificial Intelligence Fall VPI Example. VPI Properties. Reasoning over Time. Markov Models. Lecture 19: HMMs 11/4/2008

Announcements. CS 188: Artificial Intelligence Fall VPI Example. VPI Properties. Reasoning over Time. Markov Models. Lecture 19: HMMs 11/4/2008 CS 88: Artificial Intelligence Fall 28 Lecture 9: HMMs /4/28 Announcements Midterm solutions up, submit regrade requests within a week Midterm course evaluation up on web, please fill out! Dan Klein UC

More information

CS 484 Data Mining. Classification 7. Some slides are from Professor Padhraic Smyth at UC Irvine

CS 484 Data Mining. Classification 7. Some slides are from Professor Padhraic Smyth at UC Irvine CS 484 Data Mining Classification 7 Some slides are from Professor Padhraic Smyth at UC Irvine Bayesian Belief networks Conditional independence assumption of Naïve Bayes classifier is too strong. Allows

More information

The State Explosion Problem

The State Explosion Problem The State Explosion Problem Martin Kot August 16, 2003 1 Introduction One from main approaches to checking correctness of a concurrent system are state space methods. They are suitable for automatic analysis

More information

Structure Learning in Bayesian Networks (mostly Chow-Liu)

Structure Learning in Bayesian Networks (mostly Chow-Liu) Structure Learning in Bayesian Networks (mostly Chow-Liu) Sue Ann Hong /5/007 Chow-Liu Give directions to edges in MST Chow-Liu: how-to e.g. () & () I(X, X ) = x =0 val = 0, I(X i, X j ) = x i,x j ˆP (xi,

More information

CS 4100 // artificial intelligence. Recap/midterm review!

CS 4100 // artificial intelligence. Recap/midterm review! CS 4100 // artificial intelligence instructor: byron wallace Recap/midterm review! Attribution: many of these slides are modified versions of those distributed with the UC Berkeley CS188 materials Thanks

More information

CS Lecture 4. Markov Random Fields

CS Lecture 4. Markov Random Fields CS 6347 Lecture 4 Markov Random Fields Recap Announcements First homework is available on elearning Reminder: Office hours Tuesday from 10am-11am Last Time Bayesian networks Today Markov random fields

More information

Machine Learning Techniques for Computer Vision

Machine Learning Techniques for Computer Vision Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM

More information

Final Exam, Machine Learning, Spring 2009

Final Exam, Machine Learning, Spring 2009 Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

Models of collective inference

Models of collective inference Models of collective inference Laurent Massoulié (Microsoft Research-Inria Joint Centre) Mesrob I. Ohannessian (University of California, San Diego) Alexandre Proutière (KTH Royal Institute of Technology)

More information

Summary of the Bayes Net Formalism. David Danks Institute for Human & Machine Cognition

Summary of the Bayes Net Formalism. David Danks Institute for Human & Machine Cognition Summary of the Bayes Net Formalism David Danks Institute for Human & Machine Cognition Bayesian Networks Two components: 1. Directed Acyclic Graph (DAG) G: There is a node for every variable D: Some nodes

More information