Bayesian Networks in Educational Assessment Tutorial


1 Bayesian Networks in Educational Assessment Tutorial. Session V: Refining Bayes Nets with Data. Russell Almond, Bob Mislevy, David Williamson, and Duanli Yan. Unpublished work, ETS.

2 Agenda
Session 1: Evidence Centered Design (David Williamson)
Session 2: Bayesian Networks (Russell Almond)
Session 3: Bayes Net Tools & Applications (Duanli Yan)
Session 4: ACED: ECD in Action (Russell Almond & Duanli Yan)
Session 5: Refining Bayes Nets with Data (Russell Almond)

3 Outline: Variables and Parameters; The Hyper-Dirichlet Model; The EM Algorithm; Reduced Parameter Models; Evaluating Model Fit; Model Search and Causality.

4 Variables and Parameters. Bayesian statistics does not distinguish between variables and parameters, only between known and unknown quantities. Here we define: Variable: a person-specific quantity. Parameter: a quantity that is constant across people. Visualize these as a two-layer network.

5 First Layer. [Figure: nodes Skill1, Skill2, Task1-Obs, Task2-Obs, Task3-Obs.] A simple model with two skills and three observables.

6 Distributions and Variables. [Figure: the same network of Skill1, Skill2, and Task1-Obs through Task3-Obs.] Variables take person-specific values; distributions provide the probabilities for the variables.

7 Different People, Same Distributions. [Figure: the Skill1/Skill2 and Task1-Obs through Task3-Obs network repeated for Student 1, Student 2, and Student 3.]


9 Second Layer. Distributions have parameters. Parameters are the same across all people; they drop down into the first layer to do person-specific computations (e.g., scoring). Probability distributions of parameters are called Laws.

10 Second Layer (continued). [Figure: the two-layer network, with the parameter layer above the person-specific variables.]

11 Hyper-Markov Properties. Spiegelhalter and Lauritzen (1990) make two assumptions of convenience. Global meta-independence: parameters from different distributions are independent (e.g., p_1 and p_2 are independent). Local meta-independence: parameters from the same distribution are independent (e.g., l_{1,2}, l_{1,-2}, l_{-1,2}, and l_{-1,-2} are independent).

12 Hyper-Dirichlet Law. Bayes net distributions are conditional multinomial distributions, and the Dirichlet law is the natural conjugate of the multinomial distribution (just as the beta is for the binomial). Its parameters can be thought of as counts of pseudo-observations in each category. The slide's example table (numeric entries lost in transcription) has a row of prior pseudo-counts with weight 6 over Categories 1-3, a row of observed counts totaling 30, and a posterior row with weight 36 obtained by adding the two. Each row of each table is an independent Dirichlet (global and local independence).

13 An Example in Pictures. [Figure: prior, likelihood, and posterior densities.] Prior: hypothetical experiment with 3 examinees having the skill and 3 without. Likelihood: actual observation of 7 with the skill and 3 without. Posterior: combined information, 10 with the skill and 6 without. A few lines of code reproducing these counts follow.
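A minimal sketch of this update in Python, using only the counts from the slide: adding the observed counts to the prior pseudo-counts gives the posterior pseudo-counts, and the same recipe applies row by row to a Dirichlet with more than two categories.

    # Beta/Dirichlet updating by pseudo-counts (numbers from the slide above).
    prior = {"has_skill": 3, "lacks_skill": 3}      # hypothetical experiment: 3 with, 3 without
    observed = {"has_skill": 7, "lacks_skill": 3}   # actual observation: 7 with, 3 without

    posterior = {k: prior[k] + observed[k] for k in prior}
    total = sum(posterior.values())
    print(posterior)                                      # {'has_skill': 10, 'lacks_skill': 6}
    print({k: v / total for k, v in posterior.items()})   # posterior mean, e.g. 10/16 = 0.625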

14 Hyper-Dirichlet Law. Advantages: natural conjugate; can be elicited in terms of effective data; very flexible; Netica can fit it via the EM algorithm. Disadvantages: many parameters (exponential in the number of parents); it may be hard to find data for all parent configurations (e.g., Skill 1 very high, Skill 2 very low).

15 Fully Observed Case. If all variables in the Bayes net are observed, learning is easy: the hyper-Dirichlet law is the natural conjugate for the conditional probability tables, so adding the observed cross-tabulation to the prior pseudo-counts gives the posterior. A small sketch of this update follows.
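A minimal sketch of the fully observed update, assuming a small made-up data set in which a binary skill and a binary observable are both observed for every examinee; the observed cross-tab is added, row by row, to the prior pseudo-counts for the CPT of the observable given the skill.

    import numpy as np

    # Hypothetical fully observed data: skill state and observable outcome per examinee.
    skill = np.array([1, 1, 0, 1, 0, 0, 1, 1, 0, 1])   # 1 = has skill, 0 = lacks skill
    obs   = np.array([1, 1, 0, 1, 0, 1, 0, 1, 0, 1])   # 1 = correct,   0 = incorrect

    # Prior pseudo-counts for P(obs | skill): one Dirichlet (here beta) per row.
    prior = np.ones((2, 2))                # uniform prior, one pseudo-observation per cell

    # Observed cross-tab: rows index the parent (skill), columns the child (obs).
    crosstab = np.zeros((2, 2))
    for s, o in zip(skill, obs):
        crosstab[s, o] += 1

    posterior = prior + crosstab           # add the cross-tab to the prior
    cpt = posterior / posterior.sum(axis=1, keepdims=True)   # posterior mean CPT
    print(cpt)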

16 Netica example: fully observed.

17 Partially Observed Case. When some variables are unobserved, compute the conditional distributions of the unobserved variables given the observed ones; these distributions are the basis of the E-step in EM (and also of the sampling step in MCMC).

18 Netica example: partially observed.

19 Netica example: partially observed (continued).

20 Four Phase Algorithm. For each cycle:
1. Select new proficiency parameters.
2. Select new evidence/link model parameters.
3. Impute values for the proficiency variables.
4. Impute values for unobserved evidence/link model variables (e.g., missing observations, context effects).
The basic Bayes net operations can be exploited for Phases 3 and 4.

21 EM Algorithm. Variables (E-step): impute expected values; usually the expected counts in the tables corresponding to the CPTs of the Bayes net (the sufficient statistics). Parameters (M-step): maximize the posterior (or likelihood) given the imputed counts. A toy sketch of one EM cycle follows.
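The cycle can be made concrete with a toy example. The Python sketch below runs EM for a hypothetical model with one latent binary skill and two binary observables, using randomly generated responses; the starting values and the add-one smoothing (standing in for a hyper-Dirichlet prior) are illustrative choices, not part of the original tutorial.

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical responses to two items; the binary skill itself is never observed.
    X = rng.integers(0, 2, size=(200, 2))

    p_skill = 0.5                           # P(skill = 1), initial guess
    cpt = np.full((2, 2, 2), 0.5)           # cpt[j, s, x] = P(X_j = x | skill = s)
    cpt[:, 1, 1] = 0.7; cpt[:, 1, 0] = 0.3  # break symmetry in the starting values
    cpt[:, 0, 1] = 0.4; cpt[:, 0, 0] = 0.6

    for _ in range(50):
        # E-step: posterior probability of skill = 1 for each person (expected counts).
        lik1 = p_skill * np.prod([cpt[j, 1, X[:, j]] for j in range(2)], axis=0)
        lik0 = (1 - p_skill) * np.prod([cpt[j, 0, X[:, j]] for j in range(2)], axis=0)
        w = lik1 / (lik1 + lik0)
        # M-step: re-estimate the parameters from the expected counts
        # (add-one smoothing plays the role of the hyper-Dirichlet prior).
        p_skill = w.mean()
        for j in range(2):
            for s, ws in ((1, w), (0, 1 - w)):
                n1 = (ws * X[:, j]).sum() + 1
                n0 = (ws * (1 - X[:, j])).sum() + 1
                cpt[j, s, 1], cpt[j, s, 0] = n1 / (n1 + n0), n0 / (n1 + n0)

    print(p_skill, cpt)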

22 MCMC Algorithm. For both the parameter and variable phases, sample from the posterior distribution given all other parameters/variables. The Bayes net sampling algorithm can be used for the variable phase: pick a node in the junction tree; sample values for its variables from the posterior for that node; propagate the sampled values to the neighbors and sample the remaining variables; repeat until all variables are sampled. For hyper-Dirichlet laws a Gibbs sampler can be used; reduced-parameter models may require a Metropolis algorithm. A toy Gibbs sketch for the same kind of model follows.
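A minimal Gibbs sketch for the same toy model used in the EM example (one latent binary skill, two binary items, simulated responses). Because each person has a single proficiency node, no junction-tree propagation is needed; the variable phase samples each person's skill from its posterior, and the parameter phase draws from the beta (hyper-Dirichlet) posteriors. All names and priors are illustrative.

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.integers(0, 2, size=(200, 2))              # responses to two items; skill is latent

    p_skill, p_correct = 0.5, np.full((2, 2), 0.5)     # p_correct[s, j] = P(X_j = 1 | skill = s)
    draws = []
    for it in range(2000):
        # Variable phase: sample each person's skill from its posterior given the parameters.
        lik1 = p_skill * np.prod(np.where(X == 1, p_correct[1], 1 - p_correct[1]), axis=1)
        lik0 = (1 - p_skill) * np.prod(np.where(X == 1, p_correct[0], 1 - p_correct[0]), axis=1)
        skill = rng.random(len(X)) < lik1 / (lik1 + lik0)
        # Parameter phase: Gibbs draws from the beta posteriors (uniform Beta(1,1) priors).
        p_skill = rng.beta(1 + skill.sum(), 1 + (~skill).sum())
        for s in (0, 1):
            for j in (0, 1):
                n1 = X[skill == bool(s), j].sum()
                n0 = (skill == bool(s)).sum() - n1
                p_correct[s, j] = rng.beta(1 + n1, 1 + n0)
        draws.append(p_skill)

    print(np.mean(draws[1000:]))                       # posterior mean of P(skill) after burn-in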

23 Identifiability. Technically not a problem, since the prior identifies the model; but if the prior equals the posterior, we want to know. State label swapping: the meanings of the High and Low states of a proficiency variable can be exchanged, which appears as swapped rows in the CPTs; usually a more constrained model is needed to get rid of the problem. In the upcoming DiBello-Samejima model, the location and scale of the latent variables must be fixed, as in IRT: either fix the difficulty/discrimination of certain categories, or use a scale anchor (a set of parameters whose average difficulty/discrimination is constrained).

24 Reduced Parameter Models. Noisy-And and Noisy-Or models: NIDA, DINA, and the Fusion model (Junker & Sijtsma). DiBello-Samejima models: based on an effective theta and the graded response model; compensatory, conjunctive, disjunctive, and inhibitor relationships. For both model types the number of parameters grows linearly with the number of parents.

25 Noisy-And. All input skills are needed to solve the problem. Bypass parameter for Skill j: q_j (the probability of getting the step right even without the skill). Overall slip probability: q_0. The probability of a correct outcome is then Pr(X = correct | S_1, ..., S_J) = (1 - q_0) times the product of q_j over the skills the examinee lacks. This is the form of the NIDA/DINA cognitive diagnosis models.

26 Noisy Min (Max). If skills have more than two levels: use a cut point to make the skill binary (e.g., reading skill must be greater than X), or use a Noisy-Min model, in which the probability of success is determined by the weakest skill. Noisy-And/Min models are common in educational measurement; Noisy-Or/Max models are common in medical diagnosis. The number of parameters is linear in the number of parents/states. A sketch of both models follows.
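The Python sketch below illustrates both ideas under the parameterization described on the two slides above: an overall slip probability q0 plus a per-skill bypass probability for the Noisy-And, and a success probability driven by the weakest skill level for the Noisy-Min. The specific numbers and the per-level success probabilities are made up for illustration.

    def noisy_and(skills, q_bypass, q0):
        """P(correct) when every skill is needed: start from 1 - q0 (no slip) and
        multiply in the bypass probability q_j for each skill the examinee lacks."""
        p = 1.0 - q0
        for has_skill, qj in zip(skills, q_bypass):
            if not has_skill:
                p *= qj
        return p

    def noisy_min(skill_levels, p_by_level, q0):
        """Noisy-Min for graded skills: success probability is driven by the weakest
        (minimum) skill level, then degraded by the overall slip probability q0."""
        return (1.0 - q0) * p_by_level[min(skill_levels)]

    # Illustration: skill 1 mastered, skill 2 not; bypass probabilities 0.2 and 0.3; slip 0.1.
    print(noisy_and([True, False], [0.2, 0.3], 0.1))    # (1 - 0.1) * 0.3 = 0.27
    # Graded skills at levels 2 and 1; success probability indexed by the weakest level.
    print(noisy_min([2, 1], [0.05, 0.40, 0.90], 0.1))   # (1 - 0.1) * 0.40 = 0.36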

27 DiBello-Samejima Models. Useful when there are multiple ordered values for both the parent(s) and an observable variable. Single-parent version: map each level of the parent state to an effective theta on the IRT (N(0,1)) scale, then plug it into the Samejima graded response model to get the probability of each outcome. Uses the standard IRT parameters, difficulty and discrimination.

28 The Effective Theta Method (1): Samejima's Model. [Figure: category response curves for X = 1, 2, 3 as functions of theta, with a_j = 1, b_j1 = -1, b_j2 = +1.] Samejima's (1969) psychometric model for graded responses: Pr(X >= k | theta) = logistic(a_j (theta - b_{j,k-1})) for k = 2, ..., K, with Pr(X >= 1 | theta) = 1; the category probabilities Pr(X = k | theta) are the differences between adjacent curves.

29 The Effective Theta Method (2): Conditional Probabilities for Three Thetas. [Figure: the same category response curves, with the three effective thetas marked on the theta axis.] The resulting table gives the probability of each outcome X = 1 (Poor), X = 2 (Okay), X = 3 (Good) for each parent state (rows theta = Low, Med, High); the numeric entries were lost in transcription. A sketch of the computation follows.
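A minimal sketch of how such a table is computed, assuming the parameters shown on the previous slide (a_j = 1, b_j1 = -1, b_j2 = +1) and illustrative effective thetas of -1, 0, and +1 for the Low, Med, and High parent states; the actual values used in the tutorial are not recoverable from the transcription.

    import math

    def graded_response_probs(theta, a, bs):
        """Samejima graded response: P(X >= k) = logistic(a * (theta - b_{k-1}));
        category probabilities are differences of adjacent curves."""
        logistic = lambda z: 1.0 / (1.0 + math.exp(-z))
        p_ge = [1.0] + [logistic(a * (theta - b)) for b in bs] + [0.0]
        return [p_ge[k] - p_ge[k + 1] for k in range(len(bs) + 1)]

    a_j, b_j = 1.0, [-1.0, 1.0]                                # parameters from the previous slide
    effective_theta = {"Low": -1.0, "Med": 0.0, "High": 1.0}   # illustrative mapping
    for state, theta in effective_theta.items():
        probs = graded_response_probs(theta, a_j, b_j)
        print(state, [round(p, 3) for p in probs])             # P(X=1 Poor), P(X=2 Okay), P(X=3 Good)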

30 Various Structure Functions. For multiple parents, assign each parent j an effective theta, theta_{j,k}, at each level k, and combine them using a structure function. Possible structure functions: Compensatory = weighted average; Conjunctive = min; Disjunctive = max; Inhibitor, e.g., with a threshold level k* on parent 1: the effective theta is theta_0 (some low value) if k_1 < k*, and s(theta_{1,k_1}, ..., theta_{J,k_J}) if k_1 >= k*, where s is one of the structure functions above. A small sketch of these functions follows.
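A minimal sketch of the structure functions, which combine the parents' effective thetas into a single effective theta that is then fed into the graded response model above. The weights, the threshold k*, and the low value theta_0 are illustrative assumptions.

    def compensatory(thetas, weights=None):
        weights = weights or [1.0 / len(thetas)] * len(thetas)
        return sum(w * t for w, t in zip(weights, thetas))    # weighted average

    def conjunctive(thetas):
        return min(thetas)                                    # weakest skill dominates

    def disjunctive(thetas):
        return max(thetas)                                    # strongest skill dominates

    def inhibitor(level_1, thetas, k_star=1, theta_0=-2.0, structure=conjunctive):
        """If parent 1 is below the threshold level k*, the other skills cannot help;
        otherwise apply one of the structure functions to the parents' thetas."""
        return theta_0 if level_1 < k_star else structure(thetas)

    thetas = [-1.0, 0.5]                       # effective thetas for two parents
    print(compensatory(thetas), conjunctive(thetas), disjunctive(thetas))
    print(inhibitor(0, thetas), inhibitor(2, thetas))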

31 Q-Matrix and Bayes Nets. Many tasks have a single observable (item); this is efficient and useful for disentangling failures. The Q-matrix is a matrix view of these Bayes nets: nonzero entries correspond to skill-to-task edges. It is used by many diagnostic testing applications (Rule Space, Tatsuoka; the Fusion model; the General Diagnostic Model, von Davier; NIDA/DINA) and gives an overview of the assessment. The evidence model (EM) fragment for an observable is identified by selecting the parents of the observable and a parametric form for its distribution.

32 Q-Matrix Example. [Table: rows are evidence models (EM8Word, EM2ConnectInfo, EM8Word, EM4SpecInfo, EM3ConnectSynth, EM8Word, EM4SpecInfo); columns are the proficiency variables S1-S4; the 0/1 entries were lost in transcription.] Column for each proficiency variable: is the proficiency relevant for the observable indicated by the row? 1 = yes, 0 = no. Row for each observable: which proficiencies are relevant?

33 Augmented Q-Matrix. [Table: the same evidence-model rows, with columns EvidenceModel, CPTType, Difficulty, and S1-S4; every row's CPTType is Compensatory; the numeric entries were lost in transcription.] Change the 0-1 coding to 0-3 to indicate the strength of the relationship; add a column for the distribution type; add a column for difficulty. A small sketch of this data structure follows.
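A minimal sketch of an augmented Q-matrix as a data frame. The evidence-model names come from the slide, but the difficulty values and the 0-3 strength codes are made-up placeholders, since the original entries did not survive transcription.

    import pandas as pd

    # Augmented Q-matrix: rows are evidence models; columns give the CPT type, a
    # difficulty, and a 0-3 strength code for each proficiency S1-S4 (placeholders).
    q_matrix = pd.DataFrame(
        [["EM8Word",        "Compensatory",  0.0, 2, 0, 0, 1],
         ["EM2ConnectInfo", "Compensatory",  0.5, 0, 3, 0, 0],
         ["EM4SpecInfo",    "Compensatory", -0.5, 0, 0, 2, 1]],
        columns=["EvidenceModel", "CPTType", "Difficulty", "S1", "S2", "S3", "S4"],
    )

    # The parents of each observable are the proficiencies with nonzero entries.
    parents = {row.EvidenceModel: [s for s in ["S1", "S2", "S3", "S4"] if getattr(row, s) > 0]
               for row in q_matrix.itertuples()}
    print(parents)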

34 Eliciting Priors. 1. Elicit the structure (i.e., what are the parents of each node). 2. Elicit the distributional form (e.g., conjunctive, compensatory, inhibitor). 3. Elicit the strength of the relationship. 4. Elicit a measure of certainty (e.g., effective sample size, variance). Linguistic priors are often used for Steps 3 and 4 (e.g., map "Hard" and "Easy" onto normal distributions with analyst-selected means and variances). A small sketch of such a mapping follows.
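A minimal sketch of a linguistic prior table; the particular means and variances are analyst choices and are only illustrative.

    # Linguistic priors: map verbal labels onto normal distributions for a difficulty
    # parameter (mean, variance); the values here are illustrative, not from the tutorial.
    linguistic_prior = {
        "Very Easy": (-1.5, 0.25),
        "Easy":      (-0.5, 0.25),
        "Hard":      ( 0.5, 0.25),
        "Very Hard": ( 1.5, 0.25),
    }
    mean, var = linguistic_prior["Hard"]
    print(mean, var)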

35 Targets of Model Criticism Indices (Cowell et al., 1999). Parent-child relationship: adequacy of the conditional probability distribution given observed parents (Box, 1980, 1983); note that parent data are not usually observed. Unconditional node distribution: getting marginal distributions for nodes is usually pretty easy. Conditional node distribution: leave-one-out prediction; captures the relationships among nodes. Two-observable table: tests for local dependence. Global monitor: overall adequacy of the model with respect to the observed data.

36 Common Model Criticism Indices. Compare predictions to subsequent observations. A surprise index is an empirical measure of how unexpected an observation is; these indices have a weather-forecasting pedigree (Murphy & Winkler, 1984). They are typically designed as penalty indices: a penalty is incurred when a low probability of occurrence was assigned to an event which subsequently occurs. Common indices: Logarithmic score; Weaver's surprise index; Quadratic (Brier) score; Good's logarithmic surprise index; Ranked probability score.

37 Logarithmic Score (Spiegelhalter et al., 1993). S_log = -log p, evaluated for each node as the negative log probability of the event that actually occurred, where p is the prior (predictive) probability of the observed state. It is greater than or equal to zero: zero if a probability of 100% had been assigned to the observed outcome, and larger the less expected the observed value was.

38 Weaver's Surprise Index (Weaver, 1948). S.I._i = E[p] / p_i = (p_1^2 + p_2^2 + ... + p_n^2) / p_i, where n is the number of possible outcomes and p_i is the probability assigned to the outcome that occurred. The index distinguishes between rare and surprising events: rare means small probability; surprising means small relative probability. Values become indicative of surprising observations as they move away from unity; Weaver suggests that a value of 3-5 is not large, values of 10 begin to be surprising, and values above 1,000 are definitely surprising. Both indices are sketched in code below.
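A minimal sketch of the logarithmic score from the previous slide and Weaver's surprise index, using a made-up predictive distribution; the formulas follow the two slides above.

    import math

    def log_score(p_observed):
        """S_log = -log p of the state that actually occurred (0 when p = 1)."""
        return -math.log(p_observed)

    def weaver_surprise(probs, observed_index):
        """Weaver's S.I. = E[p] / p_i = sum of p_j**2 divided by the observed state's prob."""
        return sum(p * p for p in probs) / probs[observed_index]

    probs = [0.7, 0.2, 0.1]          # predictive distribution over three states (illustrative)
    print(log_score(probs[2]))       # observing the 0.1 state: -log(0.1) is about 2.30
    print(weaver_surprise(probs, 2)) # (0.49 + 0.04 + 0.01) / 0.1 = 5.4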

39 Williamson Prediction Error Technique. For each person i and each observable j: predict X_ij from X_i,-j (the person's other observables), and score the prediction S_ij using one of the scoring rules previously described. Sum over items to gauge person fit; sum over people to gauge item fit; sum over items and people to gauge model fit.

40 Reference Distribution. The distribution of these scores under the null hypothesis is unknown, so simulate data from the model and calculate S_ij for each simulee/observable pair, then take a bootstrap sample from the S_ij (sampling simulees) to get a reference distribution. A small sketch of the simulation and bootstrap follows.
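A minimal sketch of the bootstrap step. The matrix S here is a random placeholder standing in for prediction-error scores computed from data simulated under the fitted model; in practice it would come from the Bayes net, and the "observed" fit values would come from the real data. All names and numbers are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    # Stand-in for scores computed from data simulated under the fitted model:
    # S[i, j] is the prediction-error score for simulee i on observable j.
    S = rng.exponential(scale=1.0, size=(500, 12))

    # Item-fit statistic: mean score per observable. Bootstrapping over simulees
    # (resampling rows) gives its reference distribution under the model.
    n_boot = 2000
    boot = np.empty((n_boot, S.shape[1]))
    for b in range(n_boot):
        rows = rng.integers(0, S.shape[0], size=S.shape[0])
        boot[b] = S[rows].mean(axis=0)

    observed_item_fit = np.full(S.shape[1], 1.2)       # illustrative observed values
    p_values = (boot >= observed_item_fit).mean(axis=0)
    print(p_values)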

41 Posterior Predictive Model Checks (Guttman, 1967; Rubin, 1984; Sinharay, 2004). Method: let theta be the parameters in the model, y the data, and y_rep replicated data generated using the same parameters ("shadow data"). Pick a statistic D(y, theta). Compare D(y, theta) and D(y_rep, theta); often look at the posterior predictive p-value Pr(D(y_rep, theta) > D(y, theta)). Sometimes the statistic does not depend on theta: D(y) and D(y_rep).

42 PPMC in BUGS. First, create shadow data by copying the data line in the model:

    Y[i] ~ dxxx(omega)
    Yrep[i] ~ dxxx(omega)

Next, have BUGS calculate D(y, theta) and D(y_rep, theta) and an indicator of which is larger (step() is the usual BUGS idiom for the comparison):

    stat <- D(Y, omega)
    statrep <- D(Yrep, omega)
    pstat <- step(statrep - stat)

The mean of pstat over the MCMC sample is the posterior predictive p-value. The same p-value can also be computed outside BUGS from saved draws, as sketched below.
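A minimal sketch of that final step with made-up draws of the two statistics; in practice stat and statrep would be the saved values of D(y, theta) and D(y_rep, theta) from the MCMC output.

    import numpy as np

    rng = np.random.default_rng(2)
    stat    = rng.normal(10.0, 1.0, size=4000)   # D(y, theta) over MCMC draws (illustrative)
    statrep = rng.normal(10.5, 1.2, size=4000)   # D(y_rep, theta) over the same draws

    pp_p_value = (statrep > stat).mean()         # mean of the indicator = PP p-value
    print(pp_p_value)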

43 Expected Value vs. Actual. [Figure: number-correct residual plotted against the posterior mean of the expected number-correct score.] Sinharay and Almond (2007), based on data from Tatsuoka.

44 Observable Characteristic Plots. Data from the Tatsuoka mixed-number subtraction test; Sinharay, Almond and Yan (2004). The x-axis groups are equivalence classes of proficiency profiles, with group membership estimated through MCMC (one cycle). Horizontal lines indicate success probabilities for people who do and do not have the necessary skills. The glyph at the center of each line shows whether or not the group is expected to succeed. Bars give credible intervals for the group success rate.

45 Learning Models. Make modifications to the model to improve model fit. Model search can seek the single maximum-score model, or (via MCMC model search) a best set of models. Heckerman (1995; reprinted in Jordan, 1998) and Buntine (1996) provide good tutorials; Cowell et al. (1999) also has several chapters on this topic; Neapolitan (2004) devotes much of the book to it.

46 Limitations of Learning (1). Certain models are mathematically identical and cannot be distinguished by a fit score; we can only distinguish models which differ in their independence conditions. For example, A -> B -> C, A <- B <- C, and A <- B -> C are the same (except for the order of the parameters), while A -> B <- C has different independence conditions.

47 Limitations of Learning (2). Latent variables add other possible models: latent variables can be hidden causes, and we cannot distinguish the models when the latent variables are not observed. [Figure: four graphs relating A, C, and a latent H: no effect, intermediate step (A -> H -> C), common cause (A <- H -> C), and contributing factor.] All four models have identical scores.

48 Causality and Learning. Many authors (especially Pearl) use structure learning to learn causality. We can distinguish patterns where the arrows point inwards (colliders such as A -> B <- C). The technical definition of causality is at odds with the lay definition, and it is always relative to the observed variables.

49 Causality Example. Which variables are included in the model search affects the conclusions, and there are many unmodeled intermediate steps in both pictures; be cautious with the use of the word "causal" in a technical sense. [Figure: two models; Model A contains Gender, Race, Proficiency, and Item1-Item3, while Model B adds Parent's Education.]
