CS440/ECE448 Fall 2016 Final Review

Be able to define the following terms and answer basic questions about them:

Probability
  o Random variables
  o Axioms of probability
  o Joint, marginal, and conditional probability distributions
  o Independence and conditional independence
  o Bayes rule

Bayesian inference
  o Likelihood, prior, posterior
  o Maximum likelihood (ML) and maximum a posteriori (MAP) inference
  o Naïve Bayes
  o Parameter learning

Bayesian networks
  o Structure and parameters
  o Conditional independence assumptions
  o Calculating joint and conditional probabilities
  o Inference (you should be able to do the math)
  o Complexity of inference (worst case, and special classes of networks for which efficient inference is possible)
  o Parameter learning: complete data (know the math), incomplete data (know the name of the algorithm), structure learning (know why it's hard)
  o Hidden Markov models (definition, types of inference problems)

Markov decision processes
  o Markov assumption, transition model, policy
  o Bellman equation
  o Value iteration, policy iteration (a small value iteration sketch follows this list)

Reinforcement learning
  o Model-based vs. model-free approaches
  o Exploration vs. exploitation
  o TD Q-learning

Machine learning
  o Training, testing, generalization, overfitting
  o Supervised vs. unsupervised vs. semi-supervised vs. active learning
  o Nearest neighbor classifiers
  o Perceptrons (incl. the perceptron learning algorithm)
  o Support vector machines (incl. kernel support vector machines)
  o Neural networks (incl. training procedure)
  o Deep convolutional neural networks
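The Bellman equation and value iteration bullets are easiest to review with a concrete computation. Below is a minimal value iteration sketch in Python, assuming a made-up two-state MDP; the states, actions, transition probabilities, rewards, and discount factor are purely illustrative and are not taken from any course example.

    # Minimal value iteration sketch on a made-up two-state MDP (illustrative values only).
    # transitions maps (state, action) -> list of (probability, next_state, reward).
    transitions = {
        ("s0", "stay"): [(1.0, "s0", 0.0)],
        ("s0", "go"):   [(0.8, "s1", 5.0), (0.2, "s0", 0.0)],
        ("s1", "stay"): [(1.0, "s1", 1.0)],
        ("s1", "go"):   [(1.0, "s0", 0.0)],
    }
    states = ["s0", "s1"]
    actions = ["stay", "go"]
    gamma = 0.9  # discount factor

    V = {s: 0.0 for s in states}
    for _ in range(100):
        # Bellman update: V(s) = max_a sum_s' P(s'|s,a) * (R(s,a,s') + gamma * V(s'))
        V = {
            s: max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[(s, a)])
                for a in actions
            )
            for s in states
        }
    print(V)

Each sweep applies the Bellman update to every state simultaneously; with gamma < 1 the estimates converge, which is the kind of behavior question 12(d) below asks you to reason about for an undiscounted MDP.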

Sample exam questions

1. Use the axioms of probability to prove that P(¬A) = 1 - P(A).

2. A couple has two children, and one of them is a boy. What is the probability of the other one being a boy?

3. Consider the following joint probability distribution:

   P(A = true, B = true) = 0.12
   P(A = true, B = false) = 0.18
   P(A = false, B = true) = 0.28
   P(A = false, B = false) = 0.42

   What are the marginal distributions of A and B? Are A and B independent, and why?

4. A friend who works in a big city owns two cars, one small and one large. Three-quarters of the time he drives the small car to work, and one-quarter of the time he drives the large car. If he takes the small car, he usually has little trouble parking, and so is at work on time with probability 0.9. If he takes the large car, he is at work on time with probability 0.6. Given that he was on time on a particular morning, what is the probability that he drove the small car?

5. We have a bag of three biased coins, a, b, and c, with probabilities of coming up heads of 20%, 60%, and 80%, respectively. One coin is drawn randomly from the bag (with equal likelihood of drawing each of the three coins), and then the coin is flipped three times to generate the outcomes X_1, X_2, and X_3.

   a. Draw the Bayesian network corresponding to this setup and define the necessary conditional probability tables (CPTs).
   b. Calculate which coin was most likely to have been drawn from the bag if the observed flips come out heads twice and tails once.

6. Consider the data points in the table below, representing a set of seven patients with up to three different symptoms. We want to use the Naive Bayes assumption to diagnose whether a person has the flu based on the symptoms.

   Sore Throat   Stomachache   Fever   Flu
   No            No            No      No
   No            No            Yes     Yes
   No            Yes           No      No
   Yes           No            No      No
   Yes           No            Yes     Yes
   Yes           Yes           No      Yes
   Yes           Yes           Yes     No

   a. Show the structure of the network and the conditional probability tables.
   b. If a person has a stomachache and fever, but no sore throat, what is the probability of him or her having the flu (according to your learned naive Bayes classifier)?

7. Consider a Naïve Bayes classifier with 100 feature dimensions. The label Y is binary with P(Y=0) = P(Y=1) = 0.5. All features are binary and have the same conditional probabilities: P(X_i = 1 | Y = 0) = a and P(X_i = 1 | Y = 1) = b for i = 1, ..., 100. Given an item X with alternating feature values (X_1 = 1, X_2 = 0, X_3 = 1, ..., X_100 = 0), compute P(Y = 1 | X).

8. Consider the Bayesian network with the following structure and conditional probability tables (all variables are binary):

   a. Is this a polytree?
   b. Are D and E independent? Are they conditionally independent given B?
   c. If you did not know the Bayesian network, how many numbers would you need to represent the full joint probability table?
   d. If the variables were ternary instead of binary, how many values would you need to represent the full joint probability table and the conditional probability tables, respectively?
   e. Write down the expression for the joint probability of all the variables in the network.
   f. Find P(A = 0, B = 1, C = 1, D = 0).
   g. Find P(B | A = 1, D = 0).

9. Two astronomers in different parts of the world make measurements M_1 and M_2 of the number of stars N in some small region of the sky, using their telescopes. Normally, there is a small probability e of error by up to one star in each direction (and if there is such an error, it is equally likely to be +1 or -1). Each telescope can also (with a much smaller probability f) be badly out of focus (events F_1 and F_2), in which case the scientist will undercount by three or more stars (or, if N is less than 3, fail to detect any stars at all).

   a. Draw a network for this problem and show the conditional probability tables.
   b. Write out the conditional distributions for P(M_1 | N) for the case where N ∈ {1, 2, 3} and M_1 ∈ {0, 1, 2, 3, 4}. Each entry in the conditional distribution table should be expressed as a function of the parameters e and/or f.

10. The CS440 staff have a key to the homework bin. It is the master key that unlocks the bins of many classes, so we take special care to protect it. Every day Kyo Kim goes to the gym, and on the days he has the key, 60% of the time he forgets it next to the bench press. When that happens, one of the other three TAs, each equally likely, always finds it, since they work out right after. Daniel Calzada likes to hang out at Einstein Bagels, and 50% of the time he is there with the key, he forgets it at the shop. Luckily, Hyo Jin always shows up there and finds the key whenever Daniel forgets it. Hyo Jin has a hole in her pocket and ends up losing the key 80% of the time somewhere on Goodwin Street. However, Manav Kedia takes the same path to Siebel and always finds it. Manav Kedia has a 10% chance of losing the key somewhere in the AI classroom, but then Hyo Jin picks it up. The TAs lose the key at most once per day, around noon (after losing it they become extra careful for the rest of the day), and they always find it the same day in the early afternoon.

   a. Draw the Markov chain capturing the location of the key and fill in the transition probability table on the right. In this table, the entry in row KK and column KK corresponds to P(X_{t+1} = KK | X_t = KK), the entry in row KK and column HJ corresponds to P(X_{t+1} = KK | X_t = HJ), and so forth.
   b. On Monday early morning, Prof. Lazebnik handed the key to Daniel Calzada. (The initial state distribution assigns probability 1 to X_0 = DC and probability 0 to all other states.) The homework is due Monday at midnight, so the TAs need the key to open the bin. What is the probability for each TA to have the key at that time (see the propagation sketch after this question)? Let X_0, X_Mon, and X_Tue be random variables corresponding to who has the key when Prof. Lazebnik hands it out, who has the key on Monday evening, and who has the key on Tuesday evening, respectively. Fill in the probabilities in the table below.

      TA   P(X_0)   P(X_Mon)   P(X_Tue)
      KK   0
      DC   1
      HJ   0
      MK   0
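For part (b) of question 10, the mechanical step is propagating the state distribution through the transition matrix, one day at a time. The sketch below (Python, assuming NumPy is available) shows only that propagation step: the matrix P is filled with uniform placeholder values and is not the matrix you are asked to derive from the story, and its rows index the current state, which is the transpose of the row/column convention stated in part (a).

    import numpy as np

    # Propagating a Markov chain state distribution (question 10(b) mechanics only).
    # P is a uniform placeholder, NOT the transition matrix derived from the story.
    # Convention here: P[i, j] = P(X_{t+1} = states[j] | X_t = states[i]).
    states = ["KK", "DC", "HJ", "MK"]
    P = np.full((4, 4), 0.25)

    x0 = np.array([0.0, 1.0, 0.0, 0.0])  # the key starts with DC
    x_mon = x0 @ P                       # distribution after one day (Monday evening)
    x_tue = x_mon @ P                    # distribution after two days (Tuesday evening)
    print(dict(zip(states, x_mon.round(3))))
    print(dict(zip(states, x_tue.round(3))))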

11. Discuss the pros and cons of exploration and exploitation in reinforcement learning.

12. In micro-blackjack, you repeatedly draw a card (with replacement) that is equally likely to be a 2, 3, or 4. If the total score of the cards you have drawn is less than 6, you can either Draw or Stop; otherwise, you must Stop. When you Stop, your utility is equal to your total score (up to 5), or zero if your total is 6 or higher. When you Draw, you receive no utility. There is no discount (γ = 1).

   a. What are the states and the actions for this MDP?
   b. What are the transition function and the reward function for this MDP?
   c. Give the optimal policy for this MDP.
   d. What is the smallest number of rounds of value iteration (number of consecutive games) after which the estimated utility of each state in this MDP converges to its true utility? (If value iteration will never converge exactly, state so.)

13. In k-means clustering, a dataset is partitioned into k clusters, where the algorithm tries to group similar data entries together. Is this a type of supervised, unsupervised, semi-supervised, or active learning? Why?

14. We want to implement a classifier that takes two input values, where each value is either 0, 1, or 2, and outputs 1 if at least one of the two inputs has value 2; otherwise it outputs 0. Can this function be learned by a perceptron? If so, construct a perceptron that does it; if not, explain why not.

15. You are a Hollywood producer. You have a script in your hand and you want to make a movie. Before starting, however, you want to predict whether the film you want to make will rake in huge profits or utterly fail at the box office. You hire two critics, A and B, to read the script and rate it on a scale of 1 to 5 (assume only integer scores). Each critic reads it independently and announces their verdict. Of course, the critics might be biased and/or not perfect, so you may not be able to simply average their scores. Instead, you decide to use a perceptron to classify your data. The features and labels of your perceptron are defined as follows:

   FEATURES: There are three features: a constant bias and the two reviewer scores. Thus f_0 = 1 (a constant bias), f_1 = the score given by reviewer A, and f_2 = the score given by reviewer B.
   LABELS: The label is Y = +1 if the movie returns a profit, and Y = -1 otherwise.

   a. Suppose that you are given the five training examples shown in Table 1. The initial weights are w_0 = 1, w_1 = 0, w_2 = 0. Suppose you train using the examples in Table 1 with a learning rate of α = 1. The perceptron is trained sequentially: each row in the table is classified, and then the perceptron weights are either updated or left unchanged, depending on the classification result. After this process has been performed for one row of the table, the updated weights are used to classify the next row of the table, and so on. After learning has gone through the table once, what are the weights? (A code sketch of this sequential pass appears after part (c).)

      Table 1
      Movie Name     A   B   Profit
      Pellet Power   1   1   No
      Ghosts!        3   2   Yes
      Pac is bac     4   5   No
      Not a Pizza    3   4   Yes
      Endless Maze   2   3   Yes

   b. Instead of Table 1, suppose you want to learn a perceptron that will always output Y = +1 when the total of the two reviewer scores is more than 8, and Y = -1 otherwise. Is this possible? If so, what are the weights w_0, w_1, and w_2 that make this possible?

   c. Instead of either Table 1 or part (b), suppose you want to learn a perceptron that will always output Y = +1 when the two reviewers agree (when their scores are exactly the same), and Y = -1 otherwise. Is this possible? If so, what are the weights w_0, w_1, and w_2 that make this possible?
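As referenced in part (a), here is a minimal sketch of one sequential perceptron pass over Table 1, assuming the common conventions that the prediction is +1 when w · f ≥ 0 and that the update on a misclassified example is w ← w + α·y·f. Confirm these conventions against the lecture notes before relying on any numbers it prints, since the treatment of a score of exactly zero affects the result.

    # Sketch of one sequential perceptron pass for question 15(a).
    # Assumed conventions (check against lecture): predict +1 if w . f >= 0, else -1;
    # on a mistake, update w <- w + alpha * y * f.
    examples = [            # ((f0 bias, f1 = reviewer A score, f2 = reviewer B score), label)
        ((1, 1, 1), -1),    # Pellet Power: no profit
        ((1, 3, 2), +1),    # Ghosts!: profit
        ((1, 4, 5), -1),    # Pac is bac: no profit
        ((1, 3, 4), +1),    # Not a Pizza: profit
        ((1, 2, 3), +1),    # Endless Maze: profit
    ]
    w = [1.0, 0.0, 0.0]     # initial weights w_0, w_1, w_2
    alpha = 1.0             # learning rate

    for f, y in examples:
        score = sum(wi * fi for wi, fi in zip(w, f))
        prediction = 1 if score >= 0 else -1
        if prediction != y:                                   # update only on a mistake
            w = [wi + alpha * y * fi for wi, fi in zip(w, f)]
    print(w)                # weights after one pass through Table 1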