Computers playing Jeopardy!

Computers Playing Jeopardy! CSE 392, Fall 2011, Stony Brook University http://www.cs.stonybrook.edu/~cse392

Last class: grammars and parsing in Prolog. Example rules: S → NP VP, VP → Verb NP, Verb → thrills; parse tree for "A roller coaster thrills every teenager". (600.465 - Intro to NLP - J. Eisner)

Today: NLP ambiguity. Example: "books" can be a NOUN or a VERB: "You do not need books for this class." vs. "She books her trips early." Another example: "Thank you for not smoking or playing iPods without earphones." can be misread as "Thank you for not smoking (...) without earphones." These cases can be detected as special uses of the same word. Caveat: if we write too many rules, we may write unnatural grammars; special rules become general rules, and it puts too large a burden on the person writing the grammar.

Ambiguity in Parsing. Grammar: S → NP VP, NP → Det N, NP → NP PP, VP → V NP, VP → VP PP, PP → P NP, NP → Papa, N → caviar, N → spoon, V → spoon, V → ate, P → with, Det → the, Det → a. The sentence "Papa ate the caviar with a spoon" has two parses: one attaches the PP "with a spoon" to the VP (ate ... with a spoon), the other attaches it to the NP (the caviar with a spoon).
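To make the ambiguity concrete, here is a minimal CKY sketch in Python (my own illustration, not course code; the toy grammar is transcribed from the slide and happens to already be in Chomsky normal form, and all function and variable names are invented). It counts the parses of every span and finds exactly two parses of the whole sentence.

    from collections import defaultdict

    # Binary rules A -> B C and lexical rules A -> word, as on the slide.
    binary = [("S", "NP", "VP"), ("NP", "Det", "N"), ("NP", "NP", "PP"),
              ("VP", "V", "NP"), ("VP", "VP", "PP"), ("PP", "P", "NP")]
    lexical = [("NP", "Papa"), ("N", "caviar"), ("N", "spoon"), ("V", "spoon"),
               ("V", "ate"), ("P", "with"), ("Det", "the"), ("Det", "a")]

    def cky_parse_counts(words):
        """Return a chart mapping (start, end, symbol) -> number of parses of that span."""
        n = len(words)
        chart = defaultdict(int)
        for i, w in enumerate(words):             # length-1 spans: look up the words
            for lhs, word in lexical:
                if word == w:
                    chart[(i, i + 1, lhs)] += 1
        for span in range(2, n + 1):              # longer spans, shortest first
            for start in range(n - span + 1):
                end = start + span
                for mid in range(start + 1, end): # try every split point
                    for lhs, b, c in binary:
                        left, right = chart[(start, mid, b)], chart[(mid, end, c)]
                        if left and right:
                            chart[(start, end, lhs)] += left * right
        return chart

    words = "Papa ate the caviar with a spoon".split()
    chart = cky_parse_counts(words)
    print(chart[(0, len(words), "S")])   # 2: PP attached to the VP or to the NP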

Scores: test sentences and a grammar go into a PARSER, which produces test trees; a SCORER compares the test trees against the correct trees to compute accuracy. Recent parsers are quite accurate, good enough to help NLP tasks!

Speech processing ambiguity. Speech processing is a very hard problem (gender, accent, background noise). Solution: n-grams, i.e., letter or word frequencies. 1-grams (e.g., THE, COURSE) are useful in solving cryptograms. 2-grams condition on the previous letter/word: h is rare in English (4%; 4 points in Scrabble), but h is common after t (20%)! 3-grams condition on the previous two letters/words: h is really common after "(space) t".

N-grams. What are n-grams good for? Useful for search engines, indexers, etc., and for text-to-speech. How to build an n-gram model? A histogram of letters? A histogram of bigrams? (A rough sketch follows.)
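As a sketch of the histogram idea (my own Python illustration; the tiny "corpus" is just the earphones sentence from an earlier slide, so its frequencies will not match the real English percentages quoted above):

    from collections import Counter

    text = "thank you for not smoking or playing ipods without earphones"

    unigrams = Counter(text)                  # 1-grams: single letters (spaces included)
    bigrams = Counter(zip(text, text[1:]))    # 2-grams: adjacent letter pairs

    # Relative frequency of 'h' overall, and of 'h' given the previous letter is 't'
    p_h = unigrams['h'] / sum(unigrams.values())
    after_t = sum(count for (first, second), count in bigrams.items() if first == 't')
    p_h_after_t = bigrams[('t', 'h')] / after_t
    print(f"p(h) = {p_h:.2f}   p(h | previous letter is t) = {p_h_after_t:.2f}")
    # 0.05 and 0.33 on this tiny string: 'h' is much more likely right after 't'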

Probabilities and statistics: descriptive (mean scores), confirmatory (statistically significant?), predictive (what will follow?). Probability notation p(X | Y): p(Paul Revere wins | weather's clear) = 0.9 means Revere has won 90% of races with clear weather. p(win | clear) = p(win, clear) / p(clear). The comma is syntactic sugar for logical conjunction of predicates; the conditioning predicate selects the races where the weather is clear.
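A small illustration of reading p(win | clear) as a ratio of relative frequencies (the race records below are entirely made up, just to show the arithmetic):

    # Hypothetical race records: did Paul Revere win, and was the weather clear?
    races = [("win", "clear"), ("lose", "clear"), ("win", "clear"),
             ("lose", "rainy"), ("win", "clear"), ("lose", "clear"),
             ("win", "rainy"), ("lose", "clear"), ("win", "clear"), ("lose", "rainy")]

    p_clear = sum(1 for _, w in races if w == "clear") / len(races)
    p_win_and_clear = sum(1 for r, w in races if r == "win" and w == "clear") / len(races)
    p_win_given_clear = p_win_and_clear / p_clear
    print(p_win_given_clear)   # 4 wins out of 7 clear-weather races, about 0.57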

Properties of p (axioms): p(∅) = 0; p(all outcomes) = 1; p(X) ≤ p(Y) for any X ⊆ Y; p(X ∪ Y) = p(X) + p(Y) provided X ∩ Y = ∅. Example: p(win & clear) + p(win & ¬clear) = p(win).

Properties and conjunction. What happens as we add conjuncts to the left of the bar? p(Paul Revere wins, Valentine places, Epitaph shows | weather's clear): the probability can only decrease. What happens as we add conjuncts to the right of the bar? p(Paul Revere wins | weather's clear, ground is dry, jockey getting over sprain): the probability can increase or decrease. Simplifying the right side (backing off) gives a reasonable estimate: drop some of the conditions, e.g., approximate p(Paul Revere wins | weather's clear, ground is dry, jockey getting over sprain) by p(Paul Revere wins | weather's clear).

Factoring the left side: the chain rule. p(Revere, Valentine, Epitaph | weather's clear) = p(Revere | Valentine, Epitaph, weather's clear) * p(Valentine | Epitaph, weather's clear) * p(Epitaph | weather's clear). True because the numerators cancel against the denominators; makes perfect sense when read from bottom to top, e.g., 1/3 * 1/5 * 1/4. (600.465 - Intro to NLP - J. Eisner)
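A tiny numeric check of that cancellation, with made-up counts chosen to match the 1/3 * 1/5 * 1/4 reading of the slide's tree:

    n_clear = 60   # races with clear weather
    n_e = 20       # ... in which Epitaph also shows        (1/3 of 60)
    n_ve = 4       # ... in which Valentine also places     (1/5 of 20)
    n_rve = 1      # ... in which Revere also wins          (1/4 of 4)

    lhs = n_rve / n_clear                                     # p(R, V, E | clear)
    rhs = (n_rve / n_ve) * (n_ve / n_e) * (n_e / n_clear)     # the chain-rule factors
    print(lhs, rhs)   # both 1/60: the intermediate counts cancel out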

Factoring the left side: the chain rule (continued). For p(Revere | Valentine, Epitaph, weather's clear), conditional independence lets us use backed-off data from all four of these cases to estimate their shared probabilities. If this probability is unchanged by backoff, we say Revere was CONDITIONALLY INDEPENDENT of Valentine and Epitaph (conditioned on the weather's being clear): in each of the four Valentine/Epitaph branches of the tree, Revere wins with probability 1/4 and loses with probability 3/4, so Valentine and Epitaph are irrelevant once the weather is clear. (600.465 - Intro to NLP - J. Eisner)

Bayes' Theorem: p(A | B) = p(B | A) * p(A) / p(B). Easy to check by removing the syntactic sugar. Use 1: converts p(B | A) to p(A | B). Use 2: updates p(A) to p(A | B). (600.465 - Intro to NLP - J. Eisner)
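A quick numeric check of both uses (the numbers are made up for illustration): given a prior p(A) and the likelihoods p(B | A) and p(B | not A), Bayes' theorem converts p(B | A) into p(A | B), which is exactly the update of the prior.

    p_a = 0.01                 # prior p(A)
    p_b_given_a = 0.90         # p(B | A)
    p_b_given_not_a = 0.05     # p(B | not A)

    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)   # total probability
    p_a_given_b = p_b_given_a * p_a / p_b                   # Bayes' theorem
    print(round(p_a_given_b, 3))   # ~0.154: the prior 0.01 is updated to p(A | B)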

Probabilistic Algorithms. Example: the Viterbi algorithm computes the probability of a sequence of observed events and the most likely sequence of hidden states (the Viterbi path) that results in the observed events. http://www.cs.stonybrook.edu/~pfodor/old_page/viterbi/viterbi.p
forward_viterbi(+Observations, +States, +Start_probabilities, +Transition_probabilities, +Emission_probabilities, -Prob, -Viterbi_path, -Viterbi_prob)
forward_viterbi(['walk', 'shop', 'clean'], ['Rainy', 'Sunny'], [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]], Prob, Viterbi_path, Viterbi_prob)
will return: Prob = 0.03361, Viterbi_path = ['Sunny', 'Rainy', 'Rainy', 'Rainy'], Viterbi_prob = 0.0094

Viterbi Algorithm: a dynamic programming algorithm. Input: a first-order hidden Markov model (HMM) with states Y, initial probabilities π_i of being in state i, transition probabilities a_{i,j} of transitioning from state i to state j, and observations x_0, ..., x_T. Output: the state sequence y_0, ..., y_T most likely to have produced the observations. V_{t,k} is the probability of the most probable state sequence responsible for the first t + 1 observations that ends in state k:
V_{0,k} = P(x_0 | k) * π_k
V_{t,k} = P(x_t | k) * max_{y ∈ Y} (a_{y,k} * V_{t-1,y})
(Wikipedia)

Viterbi Algorithm (example). Alice and Bob live far apart from each other. Bob does three activities: walks in the park, shops, and cleans his apartment. Alice has no definite information about the weather where Bob lives; she tries to guess what the weather is based on what Bob does. The weather operates as a discrete Markov chain with two states, "Rainy" and "Sunny", hidden from Alice. start_probability = {'Rainy': 0.6, 'Sunny': 0.4} (Wikipedia)

Viterbi Algorithm (example, continued). The transition_probability represents the change of the weather:
transition_probability = {'Rainy': {'Rainy': 0.7, 'Sunny': 0.3}, 'Sunny': {'Rainy': 0.4, 'Sunny': 0.6}}
The emission_probability represents how likely Bob is to perform a certain activity on each day:
emission_probability = {'Rainy': {'walk': 0.1, 'shop': 0.4, 'clean': 0.5}, 'Sunny': {'walk': 0.6, 'shop': 0.3, 'clean': 0.1}}
(Wikipedia)

Viterbi Algorithm (example, continued). Alice talks to Bob and discovers the history of his activities: on the first day he went for a walk, on the second day he went shopping, and on the third day he cleaned his apartment, i.e., ['walk', 'shop', 'clean']. What is the most likely sequence of rainy/sunny days that would explain these observations? (Wikipedia)
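Below is a minimal Python sketch of the Viterbi recurrence applied to this example (my own illustration, not the Prolog forward_viterbi predicate linked earlier, so its path convention and reported probability may differ from that predicate's output). The probability tables are copied from the slides; the function name is invented.

    states = ['Rainy', 'Sunny']
    start_p = {'Rainy': 0.6, 'Sunny': 0.4}
    trans_p = {'Rainy': {'Rainy': 0.7, 'Sunny': 0.3},
               'Sunny': {'Rainy': 0.4, 'Sunny': 0.6}}
    emit_p = {'Rainy': {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
              'Sunny': {'walk': 0.6, 'shop': 0.3, 'clean': 0.1}}

    def viterbi(observations, states, start_p, trans_p, emit_p):
        """Return (probability of the best path, most likely hidden state sequence)."""
        # V[t][k]: probability of the best state sequence for the first t+1
        # observations that ends in state k; back[t][k] remembers its predecessor.
        V = [{k: start_p[k] * emit_p[k][observations[0]] for k in states}]
        back = [{}]
        for t in range(1, len(observations)):
            V.append({})
            back.append({})
            for k in states:
                prob, prev = max((V[t-1][y] * trans_p[y][k] * emit_p[k][observations[t]], y)
                                 for y in states)
                V[t][k], back[t][k] = prob, prev
        # Pick the best final state and trace the back-pointers.
        prob, last = max((V[-1][k], k) for k in states)
        path = [last]
        for t in range(len(observations) - 1, 0, -1):
            path.insert(0, back[t][path[0]])
        return prob, path

    prob, path = viterbi(['walk', 'shop', 'clean'], states, start_p, trans_p, emit_p)
    print(path, prob)   # ['Sunny', 'Rainy', 'Rainy'] with probability 0.01344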
