CS 4700: Artificial Intelligence


1 CS 4700: Foundations of Artificial Intelligence Fall 2017 Instructor: Prof. Haym Hirsh Lecture 18

2 Prelim Grade Distribution

3 Homework 3: Out Today

4 Extra Credit Opportunity: 4:15pm Today, Gates G01 Relaxing Bottlenecks for Fast Machine Learning Christopher De Sa, Stanford University As machine learning applications become larger and more widely used, there is an increasing need for efficient systems solutions. The performance of essentially all machine learning applications is limited by bottlenecks with effects that cut across traditional layers in the software stack. Because of this, addressing these bottlenecks effectively requires a broad combination of work in theory, algorithms, systems, and hardware. To do this in a principled way, I propose a general approach called mindful relaxation. The approach starts by finding a way to eliminate a bottleneck by changing the algorithm's semantics. It proceeds by identifying structural conditions that let us prove guarantees that the altered algorithm will still work. Finally, it applies this structural knowledge to implement improvements to the performance and accuracy of entire systems. In this talk, I will describe the mindful relaxation approach, and demonstrate how it can be applied to a specific bottleneck (parallel overheads), problem (inference), and algorithm (asynchronous Gibbs sampling). I will demonstrate the effectiveness of this approach on a range of problems including CNNs, and finish with a discussion of my future work on methods for fast machine learning.

5 Today: First-Order Logic (R&N Ch. 8-9); Machine Learning (R&N Ch. 18). Tuesday, April 5: Machine Learning (R&N Ch. 18)

6 Resolution. Conversion to CNF maintains satisfiability: all steps guarantee equivalence except for Skolemization, which only maintains satisfiability. Resolution is sound: if α ⊢ β then α ⊨ β. Resolution is refutation complete: if α ⊨ β then α ∧ ¬β ⊢ {} (Gödel's completeness theorem). (No generalization that encompasses arithmetic is complete: Gödel's incompleteness theorem.)
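
For example, to see refutation completeness on a tiny case: from α = P ∧ (P → Q), the entailed β = Q is proved by refuting α ∧ ¬Q. In clause form: {P}, {¬P, Q}, {¬Q}. Resolving {P} with {¬P, Q} gives {Q}; resolving {Q} with {¬Q} gives the empty clause {}, so α ∧ ¬β is unsatisfiable and α ⊨ Q.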

7 Machine Learning

8 Learning

9 Learn (dictionary.com): 1. to acquire knowledge of or skill in by study, instruction, or experience. 2. to become informed of or acquainted with; ascertain: to learn the truth. 3. to memorize: He learned the poem so he could recite it at the dinner. 4. to gain (a habit, mannerism, etc.) by experience, exposure to example, or the like; acquire: She learned patience from her father. 5. (of a device or machine, especially a computer) to perform an analogue of human learning with artificial intelligence. 6. Nonstandard. to instruct in; teach.

10 Machine Learning An agent is learning if it improves its performance on future tasks after making observations about the world.

11 Supervised Learning. Given a training set of N example input-output pairs (x_1,y_1), (x_2,y_2), …, (x_N,y_N), where each y_i was generated by an unknown function y = f(x), discover a function h that approximates the true function f.

12 Supervised Learning. Given a training set of N example input-output pairs (x_1,y_1), (x_2,y_2), …, (x_N,y_N), where each y_i was generated by an unknown function y = f(x), discover a function h that approximates the true function f. Example: regression, where the range of f is the real numbers.

13 Supervised Learning. Given a training set of m example input-output pairs (x_1,y_1), (x_2,y_2), …, (x_m,y_m), where each y_i was generated by an unknown function y = f(x), discover a function h that approximates the true function f. Classification learning: the range of f is a finite set of values.

14 (Same as slide 13.)
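
To make the setup concrete, here is a minimal sketch (illustrative names and data, not from the lecture) of a classification training set generated by an unknown target f, and a candidate hypothesis h_guess scored by how often it matches f on the training examples:

def f(x):
    # The "unknown" target function (known here only so we can generate labels).
    return 1 if x[0] + x[1] >= 0 else 0

# Training set of input-output pairs (x_i, y_i) with y_i = f(x_i).
examples = [(x, f(x)) for x in [(-2.0, 1.0), (0.5, 0.5), (1.0, -3.0), (2.0, 2.0)]]

def h_guess(x):
    # A candidate hypothesis: classify by the first feature alone.
    return 1 if x[0] >= 0 else 0

# Fraction of training examples on which h_guess agrees with f.
print(sum(h_guess(x) == y for x, y in examples) / len(examples))  # 0.75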

15-19 [Figure slides: scatter plots of two-feature training examples, with the two classes labeled +/−, 1/0, and 1/−1.]

20 x_2 = 1.7x_1 − 4.9

21 x_2 = 1.7x_1 − 4.9, equivalently x_2 − 1.7x_1 = −4.9

22 x_2 = 1.7x_1 − 4.9, equivalently x_2 − 1.7x_1 = −4.9, 2x_2 − 3.4x_1 = −9.8, 10x_2 − 17x_1 = −49

23 Points above the line: x_2 ≥ 1.7x_1 − 4.9, equivalently x_2 − 1.7x_1 ≥ −4.9, 2x_2 − 3.4x_1 ≥ −9.8, 10x_2 − 17x_1 ≥ −49
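
A quick numerical check (my own illustration, not from the slides) that the four inequalities pick out the same half-plane:

def above(x1, x2):
    forms = [
        x2 >= 1.7 * x1 - 4.9,
        x2 - 1.7 * x1 >= -4.9,
        2 * x2 - 3.4 * x1 >= -9.8,
        10 * x2 - 17 * x1 >= -49,
    ]
    assert all(f == forms[0] for f in forms)  # all four forms agree
    return forms[0]

print(above(0.0, 0.0))   # True: (0, 0) is above the line
print(above(10.0, 0.0))  # False: (10, 0) is below it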

24 f(x_1,x_2) = 1 if x_2 ≥ 1.7x_1 − 4.9; 0 otherwise. [Figure: the regions above and below the line, labeled 1 and 0.]

25 Formula for a line: w_1 x_1 + w_2 x_2 = b

26 Formula for a line: w_1 x_1 + w_2 x_2 = b. Points above the line: w_1 x_1 + w_2 x_2 ≥ b

27 f(x_1,x_2) = 1 if w_1 x_1 + w_2 x_2 ≥ b; 0 otherwise. [Figure: the regions on either side of the line, labeled 1 and 0.]

28 Generalizing to n dimensions, the formula for a line ("hyperplane"): w_1 x_1 + w_2 x_2 + … + w_n x_n = b, i.e. Σ_{i=1}^{n} w_i x_i = b

29 Generalizing to n dimensions, the formula for a line ("hyperplane"): w_1 x_1 + w_2 x_2 + … + w_n x_n = b, i.e. Σ_{i=1}^{n} w_i x_i = b, i.e. w · x = b

30 Generalizing to n dimensions, the formula for a line ("hyperplane"): w_1 x_1 + w_2 x_2 + … + w_n x_n = b, i.e. Σ_{i=1}^{n} w_i x_i = b, i.e. w · x = b. Points above the line: w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ b, i.e. Σ_{i=1}^{n} w_i x_i ≥ b, i.e. w · x ≥ b

31 Linear discriminant function: f(x_1,x_2,…,x_n) = 1 if Σ_{i=1}^{n} w_i x_i ≥ b; 0 otherwise
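
In code, the linear discriminant function is a one-line threshold test. This sketch (names mine) uses the weights w = (−1.7, 1) and b = −4.9, which encode the earlier line x_2 = 1.7x_1 − 4.9:

def linear_discriminant(w, x, b):
    # f(x) = 1 if sum_i w_i * x_i >= b, else 0.
    total = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if total >= b else 0

print(linear_discriminant((-1.7, 1.0), (0.0, 0.0), -4.9))   # 1: above the line
print(linear_discriminant((-1.7, 1.0), (10.0, 0.0), -4.9))  # 0: below the line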

32 Linear discriminant function: f(x_1,x_2,…,x_n) = 1 if Σ_{i=1}^{n} w_i x_i ≥ b; 0 otherwise. Goal of classification learning. Given: ((x_{1,1},x_{1,2},…,x_{1,n}),y_1), ((x_{2,1},x_{2,2},…,x_{2,n}),y_2), …, ((x_{m,1},x_{m,2},…,x_{m,n}),y_m), where the input tuples are the vectors x_1, x_2, …, x_m. Find: (w_1,…,w_n) and b.

33 Notational trick. w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ b is equivalent to: w_1 x_1 + w_2 x_2 + … + w_n x_n − b ≥ 0

34 Notational trick. w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ b is equivalent to: w_1 x_1 + w_2 x_2 + … + w_n x_n − b ≥ 0, i.e. −b + w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ 0

35 Notational trick. w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ b is equivalent to: w_1 x_1 + w_2 x_2 + … + w_n x_n − b ≥ 0, i.e. −b + w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ 0. If x_0 = 1: −b·x_0 + w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ 0

36 Notational trick. w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ b is equivalent to: w_1 x_1 + w_2 x_2 + … + w_n x_n − b ≥ 0, i.e. −b + w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ 0. If x_0 = 1: −b·x_0 + w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ 0, i.e. w_0 x_0 + w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ 0 (taking w_0 = −b)

37 Notational trick. w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ b is equivalent to: w_1 x_1 + w_2 x_2 + … + w_n x_n − b ≥ 0, i.e. −b + w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ 0. If x_0 = 1: −b·x_0 + w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ 0, i.e. w_0 x_0 + w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ 0, i.e. Σ_{i=0}^{n} w_i x_i ≥ 0
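
The trick is easy to mechanize. A minimal sketch (helper names are mine): prepend x_0 = 1 to each input and w_0 = −b to the weights, and the threshold test against b becomes a test against 0:

def fold_bias(w, b):
    # (w_0, w_1, ..., w_n) with w_0 = -b.
    return (-b,) + tuple(w)

def fold_input(x):
    # (x_0, x_1, ..., x_n) with x_0 = 1.
    return (1.0,) + tuple(x)

w, b, x = (-1.7, 1.0), -4.9, (0.0, 0.0)
lhs = sum(wi * xi for wi, xi in zip(fold_bias(w, b), fold_input(x)))
print(lhs >= 0)  # True: same verdict as the original test w . x >= b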

38 Linear discriminant function: f(x_0,x_1,x_2,…,x_n) = 1 if Σ_{i=0}^{n} w_i x_i ≥ 0; 0 otherwise. Goal of classification learning. Given: ((1,x_{1,1},x_{1,2},…,x_{1,n}),y_1), ((1,x_{2,1},x_{2,2},…,x_{2,n}),y_2), …, ((1,x_{m,1},x_{m,2},…,x_{m,n}),y_m), where the input tuples are the vectors x_1, x_2, …, x_m. Find: (w_0,…,w_n).

39 (Same as slide 38.)

40 Linear discriminant function: f(x_0,x_1,x_2,…,x_n), written f_w(x) to show its dependence on the weights, serves as the hypothesis h_w(x): h_w(x) = 1 if Σ_{i=0}^{n} w_i x_i ≥ 0; 0 otherwise. Goal of classification learning. Given: ((1,x_{1,1},…,x_{1,n}),y_1), ((1,x_{2,1},…,x_{2,n}),y_2), …, ((1,x_{m,1},…,x_{m,n}),y_m). Find: (w_0,…,w_n).

41 [Figure slide.]

42 Perceptrons

43 Neuron

44 Perceptrons

45 Perceptron Learning Rule
Current hypothesis: h_w(x)
w_0 = w_1 = w_2 = … = w_n = 0 [alternatively: set to random values]
Repeat
  For i = 1 to m [for each example]
    For j = 0 to n [for each weight, including w_0]
      w_j ← w_j + α · x_{i,j} · (y_i − h_w(x_i))
Until h_w(x) gets all data correct [reorder data after each iteration]
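
A runnable sketch of this rule (variable names are mine; the slide's α is alpha, and the update runs over j = 0..n so the folded-in threshold w_0 is learned too):

def h(w, x):
    # Linear discriminant: 1 if sum_i w_i * x_i >= 0, else 0.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

def train_perceptron(examples, alpha=0.3, max_epochs=1000):
    # examples: list of (x, y) pairs with x = (1, x_1, ..., x_n), i.e. x_0 = 1.
    n = len(examples[0][0])
    w = [0.0] * n                   # w_0 = w_1 = ... = w_n = 0
    for _ in range(max_epochs):
        errors = 0
        for x, y in examples:
            delta = y - h(w, x)     # 0 if correct, +1 or -1 if wrong
            if delta != 0:
                errors += 1
                for j in range(n):  # every weight, including w_0
                    w[j] += alpha * x[j] * delta
        if errors == 0:             # all data classified correctly
            return w
    return w                        # give up: data may not be linearly separable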

46 Perceptron Learning Rule: w_j ← w_j + α · x_{i,j} · (y_i − h_w(x_i)). If h_w(x_i) is correct, all w_j are unchanged: y_i = h_w(x_i), so (y_i − h_w(x_i)) = 0. If h_w(x_i) is too big, w_j decreases; if h_w(x_i) is too small, w_j increases (for features with x_{i,j} > 0). α is the learning rate (sometimes called η).
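
As a worked step (computed here, not shown on the slide): with α = 0.3, w = (0, 0, 0), and the example x = (1, 0, 0) labeled y = 0, the hypothesis outputs h_w(x) = 1 because Σ w_i x_i = 0 ≥ 0; the output is too big, so w_j ← w_j + 0.3 · x_j · (0 − 1), which changes only w_0 and gives w = (−0.3, 0, 0).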

47 Perceptron Learning Rule: Example. w_j ← w_j + α · x_{i,j} · (y_i − h_w(x_i))

48 Perceptron Learning Rule: Example. w_j ← w_j + α · x_{i,j} · (y_i − h_w(x_i)). Table of x_1, x_2, f(x_1,x_2): (0,0)→0, (0,1)→0, (1,0)→0, (1,1)→1.

49 Perceptron Learning Rule: Example. w_j ← w_j + α · x_{i,j} · (y_i − h_w(x_i)). AND gate: x_1, x_2, f(x_1,x_2): (0,0)→0, (0,1)→0, (1,0)→0, (1,1)→1.

50 Perceptron Learning Rule: Example. w_j ← w_j + α · x_{i,j} · (y_i − h_w(x_i)). α = 0.3, w_0 = w_1 = w_2 = 0. Training data (x_1, x_2, f(x_1,x_2)): (0,0)→0, (0,1)→0, (1,0)→0, (1,1)→1.

51 (Same as slide 50.)
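
Running the sketch from slide 45 on this setup (my own run; the exact learned weights depend on example order, but any returned w classifies all four rows correctly):

# AND-gate data with the x_0 = 1 component prepended to each input.
and_data = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
w = train_perceptron(and_data, alpha=0.3)
print(w)                               # e.g. [-0.9, 0.6, 0.3] with this order
print([h(w, x) for x, _ in and_data])  # [0, 0, 0, 1]: matches the AND gate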
