CSE 573: Statistical Learning

Logistics

CSE 573. Team meetings. Midterm: open book, open notes. Studying: see the AIMA exercises.

Daniel S. Weld

573 Topics

Problem Spaces, Search, Knowledge Representation & Inference, Planning, Supervised Learning (Logic-Based and Probabilistic), Reinforcement Learning, Agency.

Topics

Parameter Estimation: Maximum Likelihood (ML), Maximum A Posteriori (MAP), Bayesian; the continuous case. Learning parameters for a Bayesian network: Naive Bayes, maximum likelihood estimates, priors. Learning the structure of Bayesian networks.

Coin Flip

Three coins, each with a different chance of landing heads:
P(H | C1) = 0.1, P(H | C2) = 0.5, P(H | C3) = 0.9

Which coin will I use?
P(C1) = 1/3, P(C2) = 1/3, P(C3) = 1/3

Prior: probability of a hypothesis before we make any observations.
Uniform prior: all hypotheses are equally likely before we make any observations.
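To make the setup concrete, here is a minimal Python sketch of the three-coin model. The names (P_HEADS, PRIOR, posterior) are illustrative, not from the slides:

    # Three hypotheses: coins C1, C2, C3 with different chances of heads.
    P_HEADS = {"C1": 0.1, "C2": 0.5, "C3": 0.9}      # likelihood model P(H | coin)
    PRIOR = {"C1": 1 / 3, "C2": 1 / 3, "C3": 1 / 3}  # uniform prior P(coin)

    def posterior(flips, prior):
        """P(coin | flips): prior times likelihood, renormalized (Bayes rule)."""
        post = {}
        for coin, p in prior.items():
            likelihood = 1.0
            for flip in flips:  # flips is a string such as "HT"
                likelihood *= P_HEADS[coin] if flip == "H" else 1 - P_HEADS[coin]
            post[coin] = p * likelihood
        z = sum(post.values())  # P(flips), the normalizing constant
        return {coin: v / z for coin, v in post.items()}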

Experiment 1: Heads

P(C1 | H) = ?  P(C2 | H) = ?  P(C3 | H) = ?

By Bayes rule, starting from the uniform prior:
P(C1 | H) = 0.066, P(C2 | H) = 0.333, P(C3 | H) = 0.600

Posterior: probability of a hypothesis given data.
(Recap: P(H | C1) = 0.1, P(H | C2) = 0.5, P(H | C3) = 0.9; P(C1) = P(C2) = P(C3) = 1/3.)

Terminology

Prior: probability of a hypothesis before we see any data.
Uniform prior: a prior that makes all hypotheses equally likely.
Posterior: probability of a hypothesis after we saw some data.
Likelihood: probability of data given hypothesis.

Experiment 2: Tails

P(C1 | HT) = ?  P(C2 | HT) = ?  P(C3 | HT) = ?

P(C1 | HT) = 0.21, P(C2 | HT) = 0.58, P(C3 | HT) = 0.21
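The sketch above reproduces both sets of numbers:

    print(posterior("H", PRIOR))    # ≈ {'C1': 0.067, 'C2': 0.333, 'C3': 0.600}
    print(posterior("HT", PRIOR))   # ≈ {'C1': 0.209, 'C2': 0.581, 'C3': 0.209}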

Your Estimate?

What is the probability of heads after two experiments? The most likely coin is C2, so the best estimate for P(H) is P(H | C2) = 0.5.

Maximum Likelihood Estimate: the best hypothesis that fits the observed data, assuming a uniform prior.

Using Prior Knowledge

We can encode background knowledge in the prior:
P(C1) = 0.05, P(C2) = 0.25, P(C3) = 0.70
(the likelihoods are unchanged: P(H | C1) = 0.1, P(H | C2) = 0.5, P(H | C3) = 0.9)

Experiment 1: Heads

P(C1 | H) = ?  P(C2 | H) = ?  P(C3 | H) = ?
P(C1 | H) = 0.006, P(C2 | H) = 0.165, P(C3 | H) = 0.829

Compare with the ML (uniform-prior) posterior after Experiment 1:
P(C1 | H) = 0.066, P(C2 | H) = 0.333, P(C3 | H) = 0.600
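The informed prior drops straight into the same illustrative posterior function:

    INFORMED = {"C1": 0.05, "C2": 0.25, "C3": 0.70}  # encodes background knowledge
    print(posterior("H", INFORMED))    # ≈ {'C1': 0.007, 'C2': 0.164, 'C3': 0.829}
    print(posterior("HT", INFORMED))   # ≈ {'C1': 0.035, 'C2': 0.481, 'C3': 0.485}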

Experiment 2: Tails

P(C1 | HT) = ?  P(C2 | HT) = ?  P(C3 | HT) = ?
P(C1 | HT) = 0.035, P(C2 | HT) = 0.481, P(C3 | HT) = 0.485

Your Estimate?

What is the probability of heads after two experiments? The most likely coin is now C3, so the best estimate for P(H) is P(H | C3) = 0.9.

Maximum A Posteriori (MAP) Estimate: the best hypothesis that fits the observed data, assuming a non-uniform prior.

Did We Do The Right Thing?

P(C1 | HT) = 0.035, P(C2 | HT) = 0.481, P(C3 | HT) = 0.485
(Recap: P(H | C1) = 0.1, P(H | C2) = 0.5, P(H | C3) = 0.9; P(C1) = 0.05, P(C2) = 0.25, P(C3) = 0.70.)

Did We Do The Right Thing?

P(C1 | HT) = 0.035, P(C2 | HT) = 0.481, P(C3 | HT) = 0.485
C2 and C3 are almost equally likely!

A Better Estimate

Recall: P(H | HT) = Σ_i P(H | Ci) P(Ci | HT) = 0.68

Bayesian Estimate

Bayesian Estimate: minimizes prediction error, given data and (generally) assuming a non-uniform prior.
P(H | HT) = 0.68, using P(C1 | HT) = 0.035, P(C2 | HT) = 0.481, P(C3 | HT) = 0.485.

Comparison

After more experiments (HTH followed by 8 more heads):
ML (Maximum Likelihood): P(H) = 0.5 after two experiments; ≈ 0.9 after the additional heads.
MAP (Maximum A Posteriori): P(H) = 0.9 after two experiments; ≈ 0.9 after the additional heads.
Bayesian: P(H) = 0.68 after two experiments; ≈ 0.9 after the additional heads.

Comparison

ML (Maximum Likelihood): easy to compute.
MAP (Maximum A Posteriori): still easy to compute; incorporates prior knowledge.
Bayesian: minimizes prediction error, so it is great when data is scarce; potentially much harder to compute.

Summary For Now

                               Prior     Hypothesis used
Maximum Likelihood Estimate    Uniform   The most likely
Maximum A Posteriori Estimate  Any       The most likely
Bayesian Estimate              Any       Weighted combination
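Continuing the illustrative sketch, the three estimators differ only in which prior they use and whether they commit to one coin or average over all of them:

    def point_estimate(flips, prior):
        """ML/MAP: P(H) under the single most probable coin for this prior."""
        post = posterior(flips, prior)
        return P_HEADS[max(post, key=post.get)]

    def bayes_estimate(flips, prior):
        """Bayesian: P(H | coin) averaged under the posterior over coins."""
        post = posterior(flips, prior)
        return sum(P_HEADS[coin] * post[coin] for coin in post)

    print(point_estimate("HT", PRIOR))      # 0.5: ML, since the prior is uniform
    print(point_estimate("HT", INFORMED))   # 0.9: MAP, C3 barely wins (.485 vs .481)
    print(bayes_estimate("HT", INFORMED))   # ≈ 0.68: the Bayesian estimate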

Continuous Case

In the previous example, we chose from a discrete set of three coins. In general, we have to pick from a continuous distribution of biased coins.

[Figures: densities over the coin bias in [0, 1]: the prior (uniform, and with background knowledge), then the posteriors after Experiment 1 (heads) and Experiment 2 (tails).]

Continuous Case

[Figure: posterior after the two experiments, marking the ML estimate, the MAP estimate, and the Bayesian estimate, with a uniform prior and with background knowledge.]

After More Experiments...

[Figure: the posterior and the ML, MAP, and Bayesian estimates after more experiments, with a uniform prior and with background knowledge.]
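The slides do not name the continuous distribution (the plots are lost), but a standard concrete choice is a Beta prior over the coin's bias, which is conjugate to coin flips; this sketch is under that assumption:

    # Beta(a, b) prior over theta = P(heads); after h heads and t tails the
    # posterior is Beta(a + h, b + t). Beta(1, 1) is the uniform prior.
    def estimates(a, b, h, t):
        a2, b2 = a + h, b + t            # posterior hyperparameters
        ml = h / (h + t)                 # ignores the prior entirely
        map_ = (a2 - 1) / (a2 + b2 - 2)  # posterior mode (needs a2, b2 > 1)
        bayes = a2 / (a2 + b2)           # posterior mean
        return ml, map_, bayes

    print(estimates(1, 1, 1, 1))   # uniform prior, data HT -> (0.5, 0.5, 0.5)
    print(estimates(5, 2, 1, 1))   # a heads-favoring prior pulls MAP and Bayes up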

Topics

Parameter Estimation: Maximum Likelihood (ML), Maximum A Posteriori (MAP), Bayesian; the continuous case. Learning parameters for a Bayesian network: Naive Bayes, maximum likelihood estimates, priors. Learning the structure of Bayesian networks.

Review: Conditional Probability

P(A | B) is the probability of A given B. Assumes that B is the only information known.
Defined by: P(A | B) = P(A ∧ B) / P(B)

Conditional Independence

A and B are not independent, since P(A | B) < P(A). [Figure: Venn diagram of events A and B within the space of possible worlds.]

But: A and B are made independent by C: P(A | C) = P(A | B, C). [Figure: the same diagram, conditioning on C.]

Bayes Rule

P(H | E) = P(E | H) P(H) / P(E)

Simple proof from the definition of conditional probability:
1. P(H | E) = P(H ∧ E) / P(E)         (def. cond. prob.)
2. P(E | H) = P(H ∧ E) / P(H)         (def. cond. prob.)
3. P(H ∧ E) = P(E | H) P(H)           (multiply by P(H) in line 2)
4. P(H | E) = P(E | H) P(H) / P(E)    (substitute 3 in 1)
QED
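A quick numeric check of the rule on a small made-up joint distribution (the numbers are illustrative, not from the slides):

    # Joint over H and E; Bayes rule must hold for any such table.
    joint = {("h", "e"): 0.2, ("h", "~e"): 0.3, ("~h", "e"): 0.1, ("~h", "~e"): 0.4}
    p_h = joint[("h", "e")] + joint[("h", "~e")]   # P(H) = 0.5
    p_e = joint[("h", "e")] + joint[("~h", "e")]   # P(E) = 0.3
    p_h_given_e = joint[("h", "e")] / p_e          # def. cond. prob.
    p_e_given_h = joint[("h", "e")] / p_h
    assert abs(p_h_given_e - p_e_given_h * p_h / p_e) < 1e-12  # Bayes rule holds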

An Example Bayes Net

[Figure: Earthquake and Burglary are parents of Alarm; Earthquake also feeds Radio; Alarm is the parent of the two NbrCalls nodes.]

Pr(B=t) = 0.05, Pr(B=f) = 0.95

Pr(A | E, B), with Pr(¬A | E, B) in parentheses:
e, b:   0.9   (0.1)
e, ¬b:  0.2   (0.8)
¬e, b:  0.85  (0.15)
¬e, ¬b: 0.01  (0.99)

Given Parents, X is Independent of Non-Descendants

Given Markov Blanket, X is Independent of All Other Nodes

MB(X) = Par(X) ∪ Children(X) ∪ Par(Children(X))

Parameter Estimation and Bayesian Networks

E B R A J M
T F T T F T
F F F F F T
F T F T T T
F F F T T T
F T F F F F
...

We have: the Bayes net structure and observations.
We need: the Bayes net parameters.

P(B) = ?  Prior + data: now compute either a MAP or a Bayesian estimate.
P(A | E, B) = ?  P(A | E, ¬B) = ?  P(A | ¬E, B) = ?  P(A | ¬E, ¬B) = ?
(A small estimation sketch follows below.)
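A sketch of estimating one CPT entry from complete data; the rows mirror the (E, B, A) columns of the table above, and the pseudo-count argument is my stand-in for a prior (Laplace smoothing, a simple MAP-style estimate):

    # Each row is an (E, B, A) triple pulled from the observation table above.
    rows = [
        (True, False, True), (False, False, False), (False, True, True),
        (False, False, True), (False, True, False),
    ]

    def cpt_entry(e, b, rows, pseudo=0.0):
        """Estimate P(A=t | E=e, B=b); pseudo > 0 blends in a uniform prior."""
        matches = [a for (ee, bb, a) in rows if ee == e and bb == b]
        return (sum(matches) + pseudo) / (len(matches) + 2 * pseudo)

    print(cpt_entry(False, True, rows))              # ML estimate: 1/2 = 0.5
    print(cpt_entry(True, True, rows, pseudo=1.0))   # no matching rows: prior gives 0.5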

Parameter Estimation and Bayesian Networks

[Figure: a prior density over each CPT parameter.] Prior + data: now compute either a MAP or a Bayesian estimate for every entry, e.g. P(A | E, B), P(A | E, ¬B), P(A | ¬E, B), P(A | ¬E, ¬B).

Topics

Parameter Estimation: Maximum Likelihood (ML), Maximum A Posteriori (MAP), Bayesian; the continuous case. Learning parameters for a Bayesian network: Naive Bayes, maximum likelihood estimates, priors. Learning the structure of Bayesian networks.

Recap

Given a BN structure (with discrete or continuous variables), we can learn the parameters of the conditional probability tables. What if we don't know the structure? [Figure: two candidate structures, an Earthquake/Burglary net over Alarm and NbrCalls, and a Spam net with Nigeria, Sex, and Nude as children.]

Learning the Structure of Bayesian Networks

Search through the space of possible network structures! (For now, assume we observe all variables.) For each structure, learn the parameters. Pick the one that fits the observed data best.

Caveat: won't we end up fully connected? When scoring, add a penalty for model complexity.

Problem: there are exponentially many networks, and we need to learn parameters for each! Exhaustive search is out of the question. So what now?

Learning the Structure of Bayesian Networks

Local search! Start with some network structure. Try to make a change (add, delete, or reverse an edge). See if the new network is any better. (A sketch of this loop appears at the end of the section.)

Initial network structure? What should be the initial state: a uniform prior over random networks, or a network that reflects expert knowledge?

Learning BN Structure: The Big Picture

We described how to do MAP (and ML) learning of a Bayes net, including its structure. How would Bayesian learning of BNs differ? Find all possible networks, calculate their posteriors, and, when doing inference, return the weighted combination of predictions from all networks!
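Here is a hill-climbing sketch of that local search; score and neighbors are placeholder callables (a real score would be a penalized fit, such as log-likelihood minus a complexity term, with parameters re-learned for each candidate):

    def local_search(initial_dag, score, neighbors, max_steps=1000):
        """Greedy structure search: apply single-edge changes while they help."""
        current, best = initial_dag, score(initial_dag)
        for _ in range(max_steps):
            improved = False
            for candidate in neighbors(current):  # add / delete / reverse one edge
                s = score(candidate)              # fit parameters, penalize complexity
                if s > best:
                    current, best, improved = candidate, s, True
                    break                         # greedy: take the first improvement
            if not improved:                      # local optimum reached
                return current, best
        return current, best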