CS 730/730W/830: Intro AI
handout: slides
Wheeler Ruml (UNH)
Lecture 20
Summary

- supervised learning: induction, generalization
- classification vs. regression
- numeric, ordinal, discrete
- confidence estimates
- k-nn: distance function (any attributes), any labels
- neural network: numeric attributes, numeric or binary labels
Using the k-nearest Neighbors

- majority vote; k = 1 gives Voronoi cells
- $d(a,b) = \sqrt{\sum_i (a_i - b_i)^2}$
- normalize dimensions (divide each by $\sqrt{\frac{1}{N} \sum_i (x_i - \bar{x})^2}$)
- weight by distance?
- +: robust to noise, choose k by easy cross-validation
- -: memory (kd-trees help), irrelevant features, sparse data in high d
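As a concrete illustration (not from the course materials), here is a minimal Python sketch of k-NN classification with the Euclidean distance and per-dimension normalization described above; the names (zscore_params, knn_classify) and the toy data are hypothetical.

```python
import math
from collections import Counter

def zscore_params(points):
    """Per-dimension mean and standard deviation, for normalizing distances."""
    dims = list(zip(*points))
    means = [sum(d) / len(d) for d in dims]
    stds = [math.sqrt(sum((v - m) ** 2 for v in d) / len(d)) or 1.0
            for d, m in zip(dims, means)]
    return means, stds

def zscore(p, means, stds):
    """Scale one point so every dimension contributes comparably to distance."""
    return tuple((v - m) / s for v, m, s in zip(p, means, stds))

def knn_classify(train, query, k):
    """Majority vote among the k nearest training examples (Euclidean distance)."""
    dist = lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    nearest = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# toy usage: normalize train and query with the same statistics
train = [((170.0, 30.0), 'a'), ((160.0, 55.0), 'b'), ((172.0, 32.0), 'a')]
means, stds = zscore_params([p for p, _ in train])
scaled = [(zscore(p, means, stds), lab) for p, lab in train]
print(knn_classify(scaled, zscore((168.0, 33.0), means, stds), k=1))  # -> 'a'
```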
Neural Networks

- numeric attributes, numeric or binary labels
- units
- plain single layer: regression
- single layer with thresholds: Perceptron
- +: general (can be non-linear)
- -: hard to train, hard to interpret
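To make the threshold-unit idea concrete, a minimal sketch of a single perceptron trained with the perceptron rule, assuming 0/1 labels, a step activation, and a leading constant 1 in each input vector for the bias; all names and the toy data are illustrative.

```python
def perceptron_predict(theta, x):
    """Threshold unit: output 1 if the weighted input sum is non-negative, else 0."""
    return 1 if sum(t * xi for t, xi in zip(theta, x)) >= 0 else 0

def perceptron_train(data, n_weights, alpha=0.1, epochs=100):
    """Perceptron rule: nudge the weights toward each misclassified example.
    Assumes every input vector starts with a constant 1 for the bias weight."""
    theta = [0.0] * n_weights
    for _ in range(epochs):
        for x, y in data:                          # y is 0 or 1
            err = y - perceptron_predict(theta, x)
            theta = [t + alpha * err * xi for t, xi in zip(theta, x)]
    return theta

# toy usage: learn OR of two inputs (leading 1 is the bias input)
data = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]
print(perceptron_train(data, 3))
```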
Regression

$\hat{y} = \theta_0 f_0(x) + \theta_1 f_1(x) + \theta_2 f_2(x) + \ldots$

for example:

$\hat{y} = \theta_0 + \theta_1 x$
$\hat{y} = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$
$\hat{y} = \theta_0 + \theta_1 \sin x$

How to learn for large data, or on-line?
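For contrast with the on-line approach that follows, a sketch of batch least-squares fitting of the cubic example via numpy; it must hold the whole design matrix in memory at once, which is exactly what the question above is getting at. The function name and data are made up for this example.

```python
import numpy as np

def fit_basis(xs, ys, basis):
    """Batch least squares: choose theta minimizing the summed squared error
    over the design matrix F[n][i] = f_i(x_n)."""
    F = np.array([[f(x) for f in basis] for x in xs])
    theta, *_ = np.linalg.lstsq(F, np.array(ys), rcond=None)
    return theta

# cubic basis from the slide: f_0 = 1, f_1 = x, f_2 = x^2, f_3 = x^3
basis = [lambda x: 1.0, lambda x: x, lambda x: x**2, lambda x: x**3]
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.2, 9.1, 28.4, 65.0]           # roughly y = x^3 + 1
print(fit_basis(xs, ys, basis))            # -> approximately [1, 0, 0, 1]
```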
On-line Regression

$\hat{y} = \theta \cdot f(x)$

given a sample $(x, y)$, we want an update that decreases $E = \frac{(\hat{y} - y)^2}{2}$:

$\theta_i \leftarrow \theta_i - \alpha \frac{\partial E}{\partial \theta_i}$

$\frac{\partial E}{\partial \theta_i} = \frac{\partial}{\partial \theta_i} \frac{(\hat{y} - y)^2}{2} = (\hat{y} - y) \frac{\partial}{\partial \theta_i} (\hat{y} - y) = (\hat{y} - y) \frac{\partial}{\partial \theta_i} \left( \theta \cdot f(x) - y \right) = (\hat{y} - y) f_i(x)$

so the update is:

$\theta_i \leftarrow \theta_i - \alpha (\hat{y} - y) f_i(x)$
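The derived update, written as one hypothetical Python function (the name sgd_step and the calling convention are assumptions); repeated calls over a stream of samples give the on-line learner.

```python
def sgd_step(theta, features, x, y, alpha):
    """One stochastic gradient step on E = (y_hat - y)^2 / 2 for a single sample."""
    fx = [f(x) for f in features]                    # evaluate each basis f_i at x
    y_hat = sum(t * fi for t, fi in zip(theta, fx))  # y_hat = theta . f(x)
    return [t - alpha * (y_hat - y) * fi for t, fi in zip(theta, fx)]
```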
The LMS Procedure

$\hat{y} = \theta \cdot x$

$\theta \leftarrow \theta - \alpha (\hat{y} - y) x$

for $x = \langle 1, x_1, x_2 \rangle$ (a constant $x_0 = 1$ for the intercept), the updates are:

$\theta_0 \leftarrow \theta_0 - \alpha (\hat{y} - y)$
$\theta_1 \leftarrow \theta_1 - \alpha (\hat{y} - y) x_1$
$\theta_2 \leftarrow \theta_2 - \alpha (\hat{y} - y) x_2$

- $\alpha = 1/N$? or $100/(100+N)$? or 0.1?
- known as: the delta rule, the LMS weight update, the Adaline rule, the Widrow-Hoff rule, the Perceptron rule
- LMS converges if the data are linear; the perceptron converges in finite time if the data are linearly separable
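A sketch of the full LMS loop under these conventions, with $x_0 = 1$ prepended to every input and a fixed $\alpha = 0.1$ (one of the slide's three options); the name lms and the toy stream are illustrative.

```python
def lms(stream, n, alpha=0.1):
    """LMS / Widrow-Hoff: on-line linear regression with x_0 = 1 for the intercept.
    stream yields ((x_1, ..., x_n), y) pairs; alpha is held fixed for simplicity."""
    theta = [0.0] * (n + 1)
    for x, y in stream:
        x = (1.0,) + tuple(x)                      # prepend the constant bias input
        y_hat = sum(t * xi for t, xi in zip(theta, x))
        theta = [t - alpha * (y_hat - y) * xi for t, xi in zip(theta, x)]
    return theta

# toy usage: recover y = 2 + 3 * x_1 from noise-free samples
samples = [((x,), 2 + 3 * x) for x in [0.0, 0.5, 1.0, 1.5, 2.0]] * 200
print(lms(samples, n=1))  # -> approximately [2.0, 3.0]
```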
Break

- asst 4
- exam 2: planning, MDPs, RL
- projects!
The Three-layer Architecture

- hidden layer
- non-linear!
- training: backwards error propagation
- recurrence
Backwards Error Propagation

k inputs, j hidden units, i outputs; $g'(in_i)$ is the derivative of the activation function with respect to its input

output layer:

$\Delta_i = g'(in_i)(\hat{y} - y)$
$W_{j,i} \leftarrow W_{j,i} - \alpha\, a_j \Delta_i$

hidden layer:

$\Delta_j = g'(in_j) \sum_i W_{j,i} \Delta_i$
$W_{k,j} \leftarrow W_{k,j} - \alpha\, a_k \Delta_j$

- only locally optimal, dependence on structure
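A minimal sketch of one backprop update for a single-hidden-layer network, following the slide's indexing (k inputs, j hidden units, i outputs); the sigmoid activation, the absence of bias weights, and all names are assumptions for this example.

```python
import math

def backprop_step(W_kj, W_ji, x, y, alpha=0.5):
    """One backprop update for a k-input, j-hidden, i-output sigmoid network.
    W_kj[k][j] connects input k to hidden unit j; W_ji[j][i] connects hidden
    unit j to output i. Weights are updated in place; returns the prediction."""
    g = lambda z: 1.0 / (1.0 + math.exp(-z))       # sigmoid; g'(z) = g(z)(1 - g(z))
    # forward pass: a_j = g(in_j), y_hat_i = g(in_i)
    in_j = [sum(W_kj[k][j] * x[k] for k in range(len(x)))
            for j in range(len(W_kj[0]))]
    a_j = [g(z) for z in in_j]
    in_i = [sum(W_ji[j][i] * a_j[j] for j in range(len(a_j)))
            for i in range(len(W_ji[0]))]
    y_hat = [g(z) for z in in_i]
    # output layer: Delta_i = g'(in_i) * (y_hat_i - y_i)
    d_i = [y_hat[i] * (1.0 - y_hat[i]) * (y_hat[i] - y[i]) for i in range(len(y))]
    # hidden layer: Delta_j = g'(in_j) * sum_i W_ji[j][i] * Delta_i
    d_j = [a_j[j] * (1.0 - a_j[j]) * sum(W_ji[j][i] * d_i[i] for i in range(len(d_i)))
           for j in range(len(a_j))]
    # updates: W_ji <- W_ji - alpha a_j Delta_i; W_kj <- W_kj - alpha a_k Delta_j
    for j in range(len(a_j)):
        for i in range(len(d_i)):
            W_ji[j][i] -= alpha * a_j[j] * d_i[i]
    for k in range(len(x)):
        for j in range(len(d_j)):
            W_kj[k][j] -= alpha * x[k] * d_j[j]
    return y_hat
```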
EOLQs

- What question didn't you get to ask today?
- What's still confusing?
- What would you like to hear more about?

Please write down your most pressing question about AI and put it in the box on your way out. Thanks!