Introduction to Artificial Intelligence CAP 4601 Summer 2013 Midterm Exam


1. Terminology (7 Points). Given the following task environments, enter their properties/characteristics. The properties/characteristics of the Checkers task environment are provided as an example.

Task Environment: Checkers
Fully Observable or Partially Observable: Fully Observable
Single Agent or Multiagent: Multiagent
Deterministic or Stochastic: Deterministic
Episodic or Sequential: Sequential
Static or Dynamic: Static
Discrete or Continuous: Discrete
Known or Unknown: Known

Task Environment: Robotic Car (Machine Driving)
Fully Observable or Partially Observable: Partially Observable
Single Agent or Multiagent: Multiagent
Deterministic or Stochastic: Stochastic
Episodic or Sequential: Sequential
Static or Dynamic: Dynamic
Discrete or Continuous: Continuous
Known or Unknown: Known

2. Linear Regression (1 Points). Given the following points: { (0, 1), (2, 1), (2, -7), (4, -7) } We have the following plot of these points.

Calculate the Sample Mean for the x's (a number): 2

x̄ = (1/4) Σ xᵢ = (0 + 2 + 2 + 4) / 4 = 8 / 4 = 2

Calculate the Sample Mean for the y's (a number): -3

ȳ = (1/4) Σ yᵢ = (1 + 1 + (-7) + (-7)) / 4 = -12 / 4 = -3

Calculate the Biased Sample Variance for the x's (a number): 2

var_x,biased = (1/4) Σ (xᵢ - x̄)² = ((0-2)² + (2-2)² + (2-2)² + (4-2)²) / 4 = (4 + 0 + 0 + 4) / 4 = 8 / 4 = 2

Calculate the Biased Sample Variance for the y's (a number): 16

var_y,biased = (1/4) Σ (yᵢ - ȳ)² = ((1+3)² + (1+3)² + (-7+3)² + (-7+3)²) / 4 = (16 + 16 + 16 + 16) / 4 = 64 / 4 = 16

Calculate the Biased Sample Covariance between the x's and the y's (a number): -4

cov_xy,biased = (1/4) Σ (xᵢ - x̄)(yᵢ - ȳ) = ((0-2)(1+3) + (2-2)(1+3) + (2-2)(-7+3) + (4-2)(-7+3)) / 4 = (-8 + 0 + 0 - 8) / 4 = -16 / 4 = -4

Calculate the w0 from y = w0 + w1*x (i.e. calculate the Intercept, a number): 1

w1 = cov_xy,biased / var_x,biased = -4 / 2 = -2
w0 = ȳ - w1·x̄ = -3 - (-2)(2) = -3 + 4 = 1

Calculate the w1 from y = w0 + w1*x (i.e. calculate the Slope, a number): -2

w1 = cov_xy,biased / var_x,biased = -4 / 2 = -2
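The same arithmetic can be checked with a short Python sketch, assuming the reconstructed data points above:

```python
# Minimal sketch: biased sample statistics and the least-squares line
# for the four points of Question 2.
points = [(0, 1), (2, 1), (2, -7), (4, -7)]
n = len(points)

xs = [p[0] for p in points]
ys = [p[1] for p in points]

x_bar = sum(xs) / n                                   # sample mean of x -> 2.0
y_bar = sum(ys) / n                                   # sample mean of y -> -3.0
var_x = sum((x - x_bar) ** 2 for x in xs) / n         # biased variance of x -> 2.0
var_y = sum((y - y_bar) ** 2 for y in ys) / n         # biased variance of y -> 16.0
cov_xy = sum((x - x_bar) * (y - y_bar)
             for x, y in points) / n                  # biased covariance -> -4.0

w1 = cov_xy / var_x                                   # slope -> -2.0
w0 = y_bar - w1 * x_bar                               # intercept -> 1.0
print(x_bar, y_bar, var_x, var_y, cov_xy, w1, w0)
```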

3. Classification (1 Points). Given the following two classes each with three points, find the mean of each class and then find the equation of the plane that separates those classes halfway between their respective means. Ensure that the separating plane produces positive values for Class 2 Points (Blue) and negative values for Class 1 Points (Red). In other words, find the separating plane that could be used for classification with the maximum margin between the classes and where the separating plane z is z > 0 for Class 2 Points (Blue) and z < 0 for Class 1 Points (Red).

Class 1 Points (Red): { (-2, 2), (0, 2), (2, 2) }
Class 2 Points (Blue): { (-2, 0), (0, 0), (2, 0) }

The following is the plot of these two classes. We want the following classification:

(x̄, ȳ)_Class1 = ( (-2 + 0 + 2)/3 , (2 + 2 + 2)/3 ) = ( 0/3 , 6/3 ) = (0, 2)

(x̄, ȳ)_Class2 = ( (-2 + 0 + 2)/3 , (0 + 0 + 0)/3 ) = ( 0/3 , 0/3 ) = (0, 0)

(x, y)_midpoint = ( (x̄, ȳ)_Class1 + (x̄, ȳ)_Class2 ) / 2 = ( (0 + 0)/2 , (2 + 0)/2 ) = (0, 1)

Calculate the x value of the Midpoint between the Sample Mean of the Class 1 Points and the Sample Mean of the Class 2 Points (i.e. the value of x in the midpoint (x, y)): 0

Calculate the y value of the Midpoint between the Sample Mean of the Class 1 Points and the Sample Mean of the Class 2 Points (i.e. the value of y in the midpoint (x, y)): 1

Since we want the separating plane to produce positive values for Class 2 Points (Blue), we want the normalized normal vector to point toward Class 2 Points (Blue). Since we want to find the separating plane used for classification with the maximum margin between the classes, we want to start the normal vector from the midpoint we just calculated; hence, the angle we need to use for this normalized normal vector is -90°. Therefore, we have:

n_normalized = (cos θ, sin θ) = (cos(-90°), sin(-90°)) = (0, -1)

Calculate the x value of the Normalized Normal Vector that points towards the Class 2 Points (Blue) (i.e. the value of the x in the normalized normal vector n_normalized): 0

Calculate the y value of the Normalized Normal Vector that points towards the Class 2 Points (Blue) (i.e. the value of the y in the normalized normal vector n_normalized): -1

Since we have the following:

f_classification(x, y) = (cos θ, sin θ) · ( (x, y) - (x, y)_midpoint ) = (0, -1) · ( (x, y) - (0, 1) ) = 0·(x - 0) + (-1)·(y - 1) = 1 + 0·x + (-1)·y

Calculate the w0 (a number) in the equation of the separating plane z = w0 + w1*x + w2*y that creates the separating line in the level set z = 0: 1

Calculate the w1 (a number) in the equation of the separating plane z = w0 + w1*x + w2*y that creates the separating line in the level set z = 0: 0

Calculate the w2 (a number) in the equation of the separating plane z = w0 + w1*x + w2*y that creates the separating line in the level set z = 0: -1
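A brief Python sketch of the same midpoint/normal-vector construction, assuming the reconstructed class points above (NumPy is used only for the vector arithmetic):

```python
# Minimal sketch: separating line halfway between the two class means
# (the midpoint construction from Question 3).
import numpy as np

class1_red  = np.array([(-2, 2), (0, 2), (2, 2)], dtype=float)   # want z < 0
class2_blue = np.array([(-2, 0), (0, 0), (2, 0)], dtype=float)   # want z > 0

mean1 = class1_red.mean(axis=0)     # (0, 2)
mean2 = class2_blue.mean(axis=0)    # (0, 0)
midpoint = (mean1 + mean2) / 2      # (0, 1)

# Unit normal pointing from the Class 1 mean toward the Class 2 (Blue) mean.
normal = (mean2 - mean1) / np.linalg.norm(mean2 - mean1)   # (0, -1)

# z = w0 + w1*x + w2*y = normal . ((x, y) - midpoint)
w1, w2 = normal
w0 = -normal @ midpoint
print(w0, w1, w2)                   # 1.0, 0.0, -1.0

# Sanity check: blue points give z > 0, red points give z < 0.
z = lambda p: w0 + w1 * p[0] + w2 * p[1]
print([z(p) for p in class2_blue], [z(p) for p in class1_red])
```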

4. Naive Bayes Network ( Points). Using a small subset of data from an experiment at the Ft. Benning Battle Lab, construct a Naïve Bayes Network to predict a subject's first action (move or fire) in close range engagements under various circumstances. "Urban" means the subject was in a city, vice being in a forest. "Day" means the encounter happened in daylight, vice at night using night-vision equipment. "Within5m" means that the target was first observed at a range less than 5 meters, vice up to 50 meters. "MoveFirst" means that the subject's first action on detecting the threat was to move to cover, vice firing immediately.

HINT: Draw the graph of this Naïve Bayes Network, remembering that a Naïve Bayes Network assumes conditional independence.

QUESTION: Using the data table above, do the following (Do not write any decimal approximations):

1) Write the exact fraction for P( MoveFirst = true ). 3/10

2) Write the exact fraction for P( MoveFirst = false ). 7/10
3) Write the exact fraction for P( Urban = false | MoveFirst = false ). 4/7
4) Write the exact fraction for P( Urban = false | MoveFirst = true ). 1/3
5) Write the exact fraction for P( Urban = true | MoveFirst = false ). 3/7

6) Write the exact fraction for P( Urban = true | MoveFirst = true ). 2/3
7) Write the exact fraction for P( Day = false | MoveFirst = false ). 3/7
8) Write the exact fraction for P( Day = false | MoveFirst = true ). 2/3
9) Write the exact fraction for P( Day = true | MoveFirst = false ). 4/7

10) Write the exact fraction for P( Day = true | MoveFirst = true ). 1/3
11) Write the exact fraction for P( Within5m = false | MoveFirst = false ). 3/7
12) Write the exact fraction for P( Within5m = false | MoveFirst = true ). 2/3
13) Write the exact fraction for P( Within5m = true | MoveFirst = false ). 4/7

14) Write the exact fraction for P( Within5m = true | MoveFirst = true ). 1/3

15) Write the equation to predict the probability that a subject's first action is to shoot when he encounters a threat within 5 meters in a city at night. In other words, write the equation that could calculate: P( MoveFirst = false | Urban = true, Day = false, Within5m = true ) in terms of only the probabilities above. Just write the probabilities (using the rules of Probability that you learned about in Chapter 13). Do NOT write the exact fractions or the decimal approximation.

P( MoveFirst = false | Urban = true, Day = false, Within5m = true ) = ( P( Urban = true | MoveFirst = false ) * P( Day = false | MoveFirst = false ) * P( Within5m = true | MoveFirst = false ) * P( MoveFirst = false ) ) / ( P( Urban = true | MoveFirst = false ) * P( Day = false | MoveFirst = false ) * P( Within5m = true | MoveFirst = false ) * P( MoveFirst = false ) + P( Urban = true | MoveFirst = true ) * P( Day = false | MoveFirst = true ) * P( Within5m = true | MoveFirst = true ) * P( MoveFirst = true ) )

Or, written out so that we can actually have a hope of being able to read it (abbreviating M = MoveFirst, U = Urban, D = Day, W = Within5m):

P( M=false | U=true, D=false, W=true ) =
    [ P(U=true | M=false) P(D=false | M=false) P(W=true | M=false) P(M=false) ]
    / [ P(U=true | M=false) P(D=false | M=false) P(W=true | M=false) P(M=false)
        + P(U=true | M=true) P(D=false | M=true) P(W=true | M=true) P(M=true) ]
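For reference, a small Python sketch that plugs the conditional fractions listed above into this Bayes-rule expression; the values are the reconstructed answers from items 1-14, kept exact with fractions.Fraction:

```python
# Minimal sketch: evaluating the Naive Bayes posterior of Question 4, item 15,
# from the exact conditional fractions given in the answers above.
from fractions import Fraction as F

p_move_true  = F(3, 10)
p_move_false = F(7, 10)

# Likelihood of the observed evidence (Urban=true, Day=false, Within5m=true)
# under each value of MoveFirst, using the naive conditional-independence assumption.
lik_given_false = F(3, 7) * F(3, 7) * F(4, 7)   # P(U=t|M=f) P(D=f|M=f) P(W=t|M=f)
lik_given_true  = F(2, 3) * F(2, 3) * F(1, 3)   # P(U=t|M=t) P(D=f|M=t) P(W=t|M=t)

numerator   = lik_given_false * p_move_false
denominator = numerator + lik_given_true * p_move_true

posterior = numerator / denominator             # P(MoveFirst=false | evidence)
print(posterior)                                # exact fraction, no decimal approximation
```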

5. k-nearest Neighbor ( Poits). If k 1, the our closest eighbors would be: ad majority rule would classify our query poit as. k, the our closest eighbors would be:, If If 3 k, the our closest eighbors would be:,, k, the our closest eighbors would be:,,, k, the our closest eighbors would be:,,,, k, the our closest eighbors would be:,,,,, k, the our closest eighbors would be:,,,,,, If If 5 If 6 If 7 7 ad majority rule would classify our query poit as. ad majority rule would classify our query poit as. ad majority rule would classify our query poit as. ad majority rule would classify our query poit as. ad our query poit would be arbitrarily classified. ad majority rule would classify our query poit as. 6. Liear Regressio vs. Logistic Regressio (6 Poits). Describe the differece betwee Liear Regressio ad Logistic Regressio. Detail what is beig miimized i both methods (i.e. what is beig regressed). Liear Regressio is a Supervised Learig method used for Regressio, ot Classificatio. Liear Regressio miimizes the distace of the y value of each poit with the y value of the lie of regressio. Logistic Regressio is a Supervised Learig method used for Classificatio, ot Regressio. Logistic Regressio miimizes the distace betwee the class of a poit (expressed as either a 0 or a 1) ad the value of that poit o the Logistic Fuctio.

7. Parametric vs. Nonparametric Methods (8 Points). For the methods below, indicate whether the method is parametric or nonparametric using a "P" for parametric and an "N" for nonparametric:

Linear Regression: P
k-Nearest Neighbor: N
Bayesian Network: P
Multivariate Linear Regression: P
Logistic Regression: P
Perceptron: P
Multilayer Feed-Forward Neural Network: P
Locally Weighted Regression: N
Support Vector Machines: P
Kernel Density Estimation: N

8. Parametric vs. Nonparametric ( Points). Describe the difference between parametric methods and nonparametric methods.

Parametric methods summarize data into a fixed number of parameters whose count does not increase as the number of data points increases. Nonparametric methods do not summarize data into a fixed number of parameters. The number of parameters (if they're used at all) increases as the number of data points increases.

9. The Curse of Dimensionality (6 Points). Describe the Curse of Dimensionality (esp. as described by Pedro Domingos in "A Few Useful Things to Know about Machine Learning").

The curse of dimensionality, coined by Bellman in 1961, refers to the fact that many algorithms that work fine in low dimensions become intractable when the input is high-dimensional. But in machine learning it refers to much more. Generalizing correctly becomes exponentially harder as the dimensionality (number of features) of the examples grows, because a fixed-size training set covers a dwindling fraction of the input space. Moreover, the similarity-based reasoning that machine learning algorithms depend on (explicitly or implicitly) breaks down in high dimensions.
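A small, purely illustrative Python sketch (not part of the exam) of the "dwindling coverage" point: with a fixed-size sample of uniformly random points, the typical distance from a query to its nearest neighbor grows as the dimension increases, so similarity-based reasoning carries less and less information:

```python
# Illustrative sketch: nearest-neighbor distances in the unit hypercube grow
# with dimension for a fixed sample size, one symptom of the curse of dimensionality.
import random

def mean_nearest_neighbor_distance(dim, n_points=100, n_queries=50, seed=0):
    rng = random.Random(seed)
    data = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    total = 0.0
    for _ in range(n_queries):
        q = [rng.random() for _ in range(dim)]
        total += min(sum((a - b) ** 2 for a, b in zip(q, p)) ** 0.5 for p in data)
    return total / n_queries

for dim in (1, 2, 10, 100):
    print(dim, round(mean_nearest_neighbor_distance(dim), 3))
```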

10. Supervised vs. Unsupervised Learning ( Points). Describe the difference between Supervised and Unsupervised Learning.

Supervised Learning requires data that are labeled with their class during learning. Only class membership (classification) is determined by the supervised learning method. Unsupervised Learning does not require data that are labeled with their class during learning. Both the classes and class membership are determined by the unsupervised learning method.

11. A* Search (0 Points).

1) The node where h=15: 1
2) The node where h=11: 0
3) The node where h=8: 4
4) The node where h=7: 3
5) The node where h=6: 2
6) The node where h=10: 5
7) The node where h=__: 0
8) The node where h=3: 0
9) The node where h=9: 0

10) The node where h=5: 0
11) The node where h=20: 0
12) The node where h=0 GOAL: 6
13) Is this heuristic admissible? If yes, why? If no, why?

No, the heuristic is not admissible because h=20 is an overestimate, since single steps only cost 10 and the node where h=20 is only one step away from the goal.

Here's the exhaustive answer: First, we augment the graph above with edge weights. Next, we add the top node to the frontier, and explored is empty:

frontier = { (f=15, g=0, h=15) }
explored = { }.

Entering the loop, we see that frontier is not empty, so we pop frontier to node:

node = (f=15, g=0, h=15)

Then, we test if node is the goal. Since it isn't, we add node to explored:

explored = { (f=15, g=0, h=15) }.

For each child of the node, we test if it is in explored and frontier. Since they aren't, we add each child to the frontier, giving priority to the smallest f:

frontier = { (f=16, g=10, h=6), (f=17, g=10, h=7), (f=18, g=10, h=8), (f=20, g=10, h=10), (f=21, g=10, h=11) }

On the next iteration, we again see that frontier is not empty, so we pop frontier to node:

node = (f=16, g=10, h=6)

Since this node isn't the goal, we add the node to explored:

explored = { (f=15, g=0, h=15), (f=16, g=10, h=6) }

and then we look at the children of that node. For each child, we check to see if the child is in the frontier or in explored. Since each child isn't, we add these children to frontier:

frontier = { (f=17, g=10, h=7), (f=18, g=10, h=8), (f=20, g=10, h=10), (f=21, g=10, h=11), (f=25, g=20, h=5), (f=40, g=20, h=20) }

On the next iteration, we see that frontier is not empty, so we pop frontier to node:

node = (f=17, g=10, h=7)

Since node isn't the goal, we add node to explored:

explored = { (f=15, g=0, h=15), (f=16, g=10, h=6), (f=17, g=10, h=7) }

Since node doesn't have any children, we go on to the next iteration.

On this next iteration, we see that frontier is still not empty, so we pop frontier to node:

node = (f=18, g=10, h=8)

Since node isn't the goal, we add node to explored:

explored = { (f=15, g=0, h=15), (f=16, g=10, h=6), (f=17, g=10, h=7), (f=18, g=10, h=8) }

For each child of node, we check to see if the child is in the frontier or in explored. Since each child isn't, we add these children to frontier:

frontier = { (f=20, g=10, h=10), (f=21, g=10, h=11), (f=23, g=20, h=3), (f=25, g=20, h=5), (f=29, g=20, h=9), (f=40, g=20, h=20) }

On the next iteration, we see that frontier is not empty, so we pop frontier to node:

node = (f=20, g=10, h=10)

Since node isn't the goal, we add node to explored:

explored = { (f=15, g=0, h=15), (f=16, g=10, h=6), (f=17, g=10, h=7), (f=18, g=10, h=8), (f=20, g=10, h=10) }

and then we look at the children of that node. For each child, we check to see if the child is in the frontier or in explored. Since each child isn't, we add these children to frontier:

frontier = { (f=20, g=20, h=0), (f=21, g=10, h=11), (f=23, g=20, h=3), (f=25, g=20, h=5), (f=29, g=20, h=9), (f=40, g=20, h=20) }

On the next iteration, we see that frontier is not empty, so we pop frontier to node:

node = (f=20, g=20, h=0)

Since node is the goal, we have our solution and we're done, and we put zeros in the nodes that will never be expanded:
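A compact Python sketch of the graph-search A* loop used in this walkthrough: the frontier is a priority queue ordered by f = g + h, explored is a set, and children already in the frontier or in explored are skipped. The graph below is reconstructed from the frontier contents above under the assumption that every edge costs 10; the node names are invented, and the node whose h value is not legible in the transcription is omitted:

```python
# Minimal A* sketch matching the walkthrough above (graph reconstructed from
# the frontier contents; node names are made up, every edge costs 10).
import heapq

h = {"S": 15, "A": 6, "B": 7, "C": 8, "D": 10, "E": 11,
     "F": 5, "G": 20, "H": 3, "I": 9, "GOAL": 0}
edges = {"S": ["A", "B", "C", "D", "E"], "A": ["F", "G"],
         "B": [], "C": ["H", "I"], "D": ["GOAL"], "E": []}
COST = 10  # assumed uniform step cost

def a_star(start, goal):
    frontier = [(h[start], 0, start)]          # entries are (f, g, node)
    in_frontier, explored, order = {start}, set(), []
    while frontier:
        f, g, node = heapq.heappop(frontier)   # pop the smallest f
        in_frontier.discard(node)
        order.append((node, f, g, h[node]))    # record the expansion order
        if node == goal:
            return order
        explored.add(node)
        for child in edges.get(node, []):
            # Skip children already seen, as in the walkthrough.
            if child not in explored and child not in in_frontier:
                heapq.heappush(frontier, (g + COST + h[child], g + COST, child))
                in_frontier.add(child)
    return order

for i, (node, f, g, hv) in enumerate(a_star("S", "GOAL"), start=1):
    print(i, node, "f =", f, "g =", g, "h =", hv)
```

Running it reproduces the expansion order in the answer key: the h=15 node first, then h=6, h=7, h=8, h=10, and the goal sixth; every other node is never expanded.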

Therefore, we have the following:

a. The node where h=15: 1
b. The node where h=11: 0
c. The node where h=8: 4
d. The node where h=7: 3
e. The node where h=6: 2
f. The node where h=10: 5
g. The node where h=__: 0
h. The node where h=3: 0
i. The node where h=9: 0
j. The node where h=5: 0
k. The node where h=20: 0
l. The node where h=0 GOAL: 6
m. Is this heuristic admissible? If yes, why? If no, why?

Highlighting the following: We see that our heuristic has a higher value than the actual edge cost to the goal, because edge cost = 10 < 20 = heuristic; hence, our heuristic overestimates the cost to reach the goal. Therefore, since our heuristic overestimates the cost to reach the goal, this heuristic is not admissible. In other words: No, our heuristic is not admissible because it overestimates the cost to reach the goal.