Parameter Estimation: Cracking Incomplete Data
- Lydia Parrish
Transcription
1 Parameter Estimation: Cracking Incomplete Data. Khaled S. Refaat. Collaborators: Arthur Choi and Adnan Darwiche
2 Agenda: Learning Graphical Models; Complete vs. Incomplete Data; Exploiting Data for Decomposition; EDML vs. EM
3 Learning Graphical Models
4 Learning Graphical Model Parameters [Figure: network over V, X, Y, Z]
    V      X      Y   Z
    False  True   ?   False
    True   False  ?   True
    True   True   ?   False
Our goal is to find parameter estimates that maximize the likelihood.
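One standard way to write the objective on this slide is sketched below. The notation is an assumption (the slide's own formula is not preserved in the transcript): θ stands for all network parameters, d_i for the observed part of example i, and h_i for a completion of its missing values.

```latex
\theta^\star = \arg\max_{\theta} L(\theta \mid \mathcal{D}),
\qquad
L(\theta \mid \mathcal{D}) = \prod_{i=1}^{N} P_\theta(d_i),
\qquad
P_\theta(d_i) = \sum_{h_i} P_\theta(d_i, h_i)
```

When an example has no missing values the inner sum has a single term. With missing values, the sum over completions is what keeps the log-likelihood from splitting into independent per-parameter terms.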
5 Complete vs. Incomplete Data
6-9 Complete Data: closed-form or a convex optimization problem
    V      X      Y      Z
    False  True   True   False
    True   False  False  True
    True   True   True   False
Incomplete Data: hard non-convex optimization problem
    V      X      Y   Z
    False  True   ?   False
    ?      False  ?   True
    True   True   ?   False
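For the complete-data case, the closed-form estimate is a relative frequency. The sketch below illustrates this on the complete dataset from the slide. The function name `mle_cpt` and the edge X -> Z are hypothetical, since the transcript does not preserve the network structure.

```python
from collections import Counter


def mle_cpt(rows, child, parents):
    """Estimate P(child | parents) by relative frequency from complete rows."""
    joint_counts = Counter()
    parent_counts = Counter()
    for row in rows:
        parent_values = tuple(row[p] for p in parents)
        joint_counts[(parent_values, row[child])] += 1
        parent_counts[parent_values] += 1
    # Each entry is count(parents, child) / count(parents).
    return {
        key: count / parent_counts[key[0]]
        for key, count in joint_counts.items()
    }


# Complete dataset from the slide.
rows = [
    {"V": False, "X": True, "Y": True, "Z": False},
    {"V": True, "X": False, "Y": False, "Z": True},
    {"V": True, "X": True, "Y": True, "Z": False},
]

p_x = mle_cpt(rows, "X", [])              # treats X as a root node
p_z_given_x = mle_cpt(rows, "Z", ["X"])   # assumes a hypothetical edge X -> Z
```

Keys are `(parent_values, child_value)` pairs, so `p_x[((), True)]` is the estimate of P(X = True).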
10-12 Incomplete Data
    V      X      Y   Z
    False  True   ?   False
    ?      False  ?   True
    True   True   ?   False
Fully-observed variables have a value in every example (here X and Z). Hidden variables are missing in every example (here Y).
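The distinction between fully-observed and hidden variables can be read directly off the dataset. A minimal sketch, assuming missing values are encoded as `None`; the function name and the third label, "partially observed", are illustrative and not terms from the slides.

```python
def classify_variables(variables, rows):
    """Label each variable by how often its value is missing (None)."""
    labels = {}
    for index, name in enumerate(variables):
        missing = sum(1 for row in rows if row[index] is None)
        if missing == 0:
            labels[name] = "fully observed"
        elif missing == len(rows):
            labels[name] = "hidden"
        else:
            labels[name] = "partially observed"
    return labels


# Incomplete dataset from the slide; None stands for "?".
variables = ["V", "X", "Y", "Z"]
rows = [
    (False, True, None, False),
    (None, False, None, True),
    (True, True, None, False),
]
labels = classify_variables(variables, rows)
```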
13 Exploiting Data for Decomposition
14 Optimization [Diagram: Optimization Algorithm and Inference Engine] The optimization algorithm (e.g., EM, EDML, or a gradient method) calls the inference engine with every unique data example at each iteration.
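The interaction between optimizer and inference engine can be sketched with a small EM loop. This is an illustration under stated assumptions, not the authors' implementation: the two-node network H -> S, the parameter layout, and all function names are hypothetical, and "inference" is done by brute-force enumeration of completions. The dataset is grouped so that inference runs once per unique example per iteration.

```python
import math
from collections import Counter
from itertools import product


def joint(theta, h, s):
    """P(H=h, S=s) for the hypothetical two-node network H -> S."""
    p_h = theta["H"] if h else 1.0 - theta["H"]
    p_s_true = theta["S|H"][h]
    return p_h * (p_s_true if s else 1.0 - p_s_true)


def completions(example):
    """All full assignments consistent with an example (None = missing)."""
    h, s = example
    h_values = [h] if h is not None else [True, False]
    s_values = [s] if s is not None else [True, False]
    return list(product(h_values, s_values))


def log_likelihood(theta, data):
    """Log-likelihood of the observed values, summing out missing ones."""
    return sum(
        math.log(sum(joint(theta, h, s) for h, s in completions(example)))
        for example in data
    )


def em(data, theta, iterations=20):
    unique = Counter(data)  # unique examples with their multiplicities
    for _ in range(iterations):
        count_h = {True: 0.0, False: 0.0}       # expected count of H = h
        count_s_true = {True: 0.0, False: 0.0}  # expected count of S = True with H = h
        for example, multiplicity in unique.items():
            # E-step: posterior over completions, computed once per unique example.
            full = completions(example)
            weights = [joint(theta, h, s) for h, s in full]
            total = sum(weights)
            for (h, s), weight in zip(full, weights):
                expected = multiplicity * weight / total
                count_h[h] += expected
                if s:
                    count_s_true[h] += expected
        # M-step: normalize the expected counts.
        theta = {
            "H": count_h[True] / (count_h[True] + count_h[False]),
            "S|H": {
                h: count_s_true[h] / count_h[h] if count_h[h] > 0 else theta["S|H"][h]
                for h in (True, False)
            },
        }
    return theta


data = [(None, True)] * 3 + [(True, True)] * 2 + [(False, False)] * 2 + [(None, False)]
start = {"H": 0.6, "S|H": {True: 0.7, False: 0.4}}
learned = em(data, start)
```

Here eight examples collapse to four unique ones, so each iteration makes four inference calls instead of eight.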
15 Inference Decomposition Techniques [Figure: network over V, X, Y, Z before and after pruning] Prune edges outgoing from observed nodes before computing probabilities.
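The graph side of this pruning step can be sketched as follows. The chain A -> B -> C -> D and the function names are hypothetical, because the transcript does not preserve the edges of the V, X, Y, Z network. The sketch covers only the structural change and leaves out the adjustment of each child's table to the observed parent value.

```python
def prune_outgoing_edges(edges, observed):
    """Drop every edge whose source node is observed."""
    return [(parent, child) for parent, child in edges if parent not in observed]


def connected_components(nodes, edges):
    """Connected components of the graph, ignoring edge direction."""
    neighbors = {node: set() for node in nodes}
    for parent, child in edges:
        neighbors[parent].add(child)
        neighbors[child].add(parent)
    components, visited = [], set()
    for node in nodes:
        if node in visited:
            continue
        component, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbors[current] - component)
        visited |= component
        components.append(sorted(component))
    return components


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]  # hypothetical chain A -> B -> C -> D
pruned = prune_outgoing_edges(edges, observed={"B"})
components = connected_components(nodes, pruned)
```

With B observed, the edge B -> C is removed and the chain falls apart into two pieces that can be handled separately.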
16 Main Idea (NIPS 14) [Diagram: three Optimization Algorithm / Inference Engine pairs] We decompose the optimization problem itself to get decomposed convergence and data compression.
17 Learning from Incomplete Data [Figure: network over V, X, Y, Z]
    V      X      Y   Z
    False  True   ?   False
    True   False  ?   True
    True   True   ?   False
18 Decomposing the Optimization Problem [Figure: network over V, X, Y, Z split into sub-networks] Get three components.
19 The components of a network partition its parameters into groups.
20 [Figure: sub-networks over V, X, Y, Z with boundary variables marked]
21 [Figure: sub-networks over V, X, Y, Z]
22 Learned Parameters [Figure: sub-networks over V, X, Y, Z]
23-24 Theorem (NIPS 14) Any stationary points for the sub-problems combine to create a stationary point for the original problem. Every stationary point for the original problem induces stationary points for the sub-problems.
25-27 Experimental Setting. EM: uses an inference engine that decomposes inference. D-EM: decomposes the optimization problem itself, solves each sub-problem using EM, and combines the solutions.
28 The Computational Benefit of Decomposition. Figure: Speed-up of D-EM over EM on three chain networks (180, 380, and 500 variables; left) and four tree networks (63, 127, 255, and 511 variables; right). Axes: Observed % vs. speed-up.
29 Table: Speed-up of D-EM over EM on UAI networks.
30 Reasons for Speed-up
31-37 Decomposed Convergence. Figure: Number of iterations required by each sub-network, sorted in descending order. Axes: sub-network vs. # iterations.
38 Data Compression
39-41 Data Compression [Figure: dataset columns A B C D E F G H I J K L, then columns A and B with counts]
    A      B      Count
    True   True   1000
    True   False  5000
    False  True   3000
    False  False  8000
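The count table above suggests how compression works: once a sub-problem involves only a few variables, many examples become identical and can be stored as one row with a count. A minimal sketch, with a made-up five-row dataset and a hypothetical function name:

```python
from collections import Counter


def compress(rows, variables):
    """Count how often each joint value of `variables` occurs in `rows`."""
    return Counter(tuple(row[name] for name in variables) for row in rows)


# Made-up dataset over three variables; only A and B matter to the sub-problem.
rows = [
    {"A": True, "B": True, "C": False},
    {"A": True, "B": True, "C": True},
    {"A": True, "B": False, "C": True},
    {"A": False, "B": False, "C": False},
    {"A": True, "B": True, "C": False},
]
counts = compress(rows, ["A", "B"])
```

Five rows reduce to three unique (A, B) combinations. With two binary variables the table can never exceed four rows, whatever the dataset size.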
42 Data Compression. Figure: Speed-up of D-EM over EM as a function of dataset size (log-scale). Axes: dataset size vs. speed-up.
43 EDML vs. EM
44 Soft Evidence
45 Hard Evidence [Diagram: node X] X ∈ {S1, S2, S3}
46 Hard Evidence [Diagram: node X] X = S1
47 Soft Evidence [Diagram: node X] X = S1 with some probability
48 Soft Evidence [Diagram: nodes X and η] η = true
49 Soft Evidence [Diagram: nodes X and η] η = true
    η     X   p(η | X)
    true  S1  λ1
    true  S2  λ2
    true  S3  λ3
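The effect of observing η = true can be sketched with Bayes' rule: the belief in each state of X is reweighted by p(η = true | X). The prior and strength values below are made up, and the function name is hypothetical.

```python
def posterior_given_soft_evidence(prior, strength):
    """P(X | eta = true), where strength[x] = p(eta = true | X = x)."""
    unnormalized = {x: prior[x] * strength[x] for x in prior}
    total = sum(unnormalized.values())
    return {x: value / total for x, value in unnormalized.items()}


prior = {"S1": 0.5, "S2": 0.3, "S3": 0.2}      # made-up prior over X
strength = {"S1": 0.9, "S2": 0.3, "S3": 0.3}   # made-up p(eta = true | X)
posterior = posterior_given_soft_evidence(prior, strength)
```

Only the ratios between the strengths matter: scaling all three by the same positive constant leaves the posterior unchanged.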
50 Edge Deletion
51 Edge Deletion (cont.)
52 Assert Soft Evidence (Choi et al., 2006)
53 Problem Definition. Original Bayesian Network: [Diagram: network over H, S, E]
    H     S     E
    ?     true  ?
    true  ?     ?
    ?     ?     true
54 Meta Network Creation [Diagram: one copy of the network per example: H1, S1, E1 (Example 1); H2, S2, E2 (Example 2); H3, S3, E3 (Example 3)]
55-56 Meta Network Creation (cont.) [Diagram: meta network with parameter nodes for H, for S given H, and for E given H, attached to the copies H1, S1, E1 through H3, S3, E3] Prior knowledge on parameters
57 Assert Data as Evidence [Diagram: meta network]
58 [Data table from slide 53 shown with the meta network]
59 EDML (Delete Edges) [Diagram: meta network]
60-62 EDML (Learning from Soft Evidence) [Diagram: parameter node for H with H1, H2, H3, carrying soft evidence from Example 1, Example 2, and Example 3] Maximizing the posterior probability is a convex optimization problem (UAI 11, UAI 12).
64 EDML Fixed Points (UAI 12) Theorem: EDML fixed points are precisely the EM fixed points.
65 Convergence (UAI 11) Theorem: When only leaves have missing values, EDML converges in one iteration, whereas EM may not. [Diagram: nodes H2, S2, E2]
66-67 Experiment: EM vs. EDML (iterations)
    Category    % EDML better  % EM better  EDML speed-up %  EM speed-up %
    Hiding 10%  93.82%         6.18%        84.59%           87.13%
    Hiding 25%  90.95%         9.05%        83.83%           75.70%
    Hiding 35%  82.24%         17.76%       86.26%           75.09%
    Hiding 50%  77.61%         22.39%       87.80%           80.21%
    Hiding 70%  75.65%         24.35%       84.48%           74.21%
    Average     83.05%         16.95%       85.41%           76.96%
68 Andes (Hiding 25% of the nodes)
69 EDML Generalization (NIPS 13) We generalized EDML as a parallel coordinate descent algorithm. This helps derive new EDML algorithms for other graphical models.
70 EDML for Learning MRFs from Complete Data (NIPS 13)
71-73 Conclusion. Learning from incomplete data can be difficult. Good news: patterns of incompleteness may be exploited. EDML becomes more exact as the data becomes more complete.
74 Thanks!
Learning With Bayesian Networks Markus Kalisch ETH Zürich Inference in BNs - Review P(Burglary JohnCalls=TRUE, MaryCalls=TRUE) Exact Inference: P(b j,m) = c Sum e Sum a P(b)P(e)P(a b,e)p(j a)p(m a) Deal
More informationThe exam is closed book, closed calculator, and closed notes except your one-page crib sheet.
CS 188 Fall 2018 Introduction to Artificial Intelligence Practice Final You have approximately 2 hours 50 minutes. The exam is closed book, closed calculator, and closed notes except your one-page crib
More informationSean Escola. Center for Theoretical Neuroscience
Employing hidden Markov models of neural spike-trains toward the improved estimation of linear receptive fields and the decoding of multiple firing regimes Sean Escola Center for Theoretical Neuroscience
More informationVCMC: Variational Consensus Monte Carlo
VCMC: Variational Consensus Monte Carlo Maxim Rabinovich, Elaine Angelino, Michael I. Jordan Berkeley Vision and Learning Center September 22, 2015 probabilistic models! sky fog bridge water grass object
More informationDynamic Approaches: The Hidden Markov Model
Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message
More informationAdaptive Crowdsourcing via EM with Prior
Adaptive Crowdsourcing via EM with Prior Peter Maginnis and Tanmay Gupta May, 205 In this work, we make two primary contributions: derivation of the EM update for the shifted and rescaled beta prior and
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationProbabilistic Graphical Models. Guest Lecture by Narges Razavian Machine Learning Class April
Probabilistic Graphical Models Guest Lecture by Narges Razavian Machine Learning Class April 14 2017 Today What is probabilistic graphical model and why it is useful? Bayesian Networks Basic Inference
More information