CS 4700: Artificial Intelligence

CS 4700: Foundations of Artificial Intelligence Fall 2017 Instructor: Prof. Haym Hirsh Lecture 18

Prelim Grade Distribution

Homework 3: Out Today

Extra Credit Opportunity: 4:15pm Today, Gates G01. "Relaxing Bottlenecks for Fast Machine Learning," Christopher De Sa, Stanford University. As machine learning applications become larger and more widely used, there is an increasing need for efficient systems solutions. The performance of essentially all machine learning applications is limited by bottlenecks with effects that cut across traditional layers in the software stack. Because of this, addressing these bottlenecks effectively requires a broad combination of work in theory, algorithms, systems, and hardware. To do this in a principled way, I propose a general approach called mindful relaxation. The approach starts by finding a way to eliminate a bottleneck by changing the algorithm's semantics. It proceeds by identifying structural conditions that let us prove guarantees that the altered algorithm will still work. Finally, it applies this structural knowledge to implement improvements to the performance and accuracy of entire systems. In this talk, I will describe the mindful relaxation approach, and demonstrate how it can be applied to a specific bottleneck (parallel overheads), problem (inference), and algorithm (asynchronous Gibbs sampling). I will demonstrate the effectiveness of this approach on a range of problems including CNNs, and finish with a discussion of my future work on methods for fast machine learning.

Today: First-Order Logic (R&N Ch 8-9), Machine Learning (R&N Ch 18). Tuesday, April 5: Machine Learning (R&N Ch 18).

Resolution. Conversion to CNF maintains satisfiability: all steps guarantee equivalence except for Skolemization, which only maintains satisfiability. Resolution is sound: if α ⊢ β then α ⊨ β. Resolution is refutation complete: if α ⊨ β then α ∧ ¬β ⊢ {}. Gödel's completeness theorem. (No generalization that encompasses arithmetic is complete: Gödel's incompleteness theorem.)
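
As a small added illustration of refutation (not from the slides): to show that KB = {P, P ⇒ Q} entails Q, negate the query and convert KB ∧ ¬Q to clauses {P}, {¬P ∨ Q}, {¬Q}. Resolving {P} with {¬P ∨ Q} yields {Q}, and resolving {Q} with {¬Q} yields the empty clause {}, so KB ∧ ¬Q is unsatisfiable and therefore KB ⊨ Q.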

Machine Learning

Learning

Learn (dictionary.com): 1. to acquire knowledge of or skill in by study, instruction, or experience. 2. to become informed of or acquainted with; ascertain: to learn the truth. 3. to memorize: He learned the poem so he could recite it at the dinner. 4. to gain (a habit, mannerism, etc.) by experience, exposure to example, or the like; acquire: She learned patience from her father. 5. (of a device or machine, especially a computer) to perform an analogue of human learning with artificial intelligence. 6. Nonstandard: to instruct in; teach.

Machine Learning An agent is learning if it improves its performance on future tasks after making observations about the world.

Supervised Learning: Given a training set of N example input-output pairs (x_1, y_1), (x_2, y_2), …, (x_N, y_N), where each y_i was generated by an unknown function y = f(x), discover a function h that approximates the true function f.

Supervised Learning: Given a training set of N example input-output pairs (x_1, y_1), (x_2, y_2), …, (x_N, y_N), where each y_i was generated by an unknown function y = f(x), discover a function h that approximates the true function f. Example: regression, where f is real-valued.

Supervised Learning: Given a training set of m example input-output pairs (x_1, y_1), (x_2, y_2), …, (x_m, y_m), where each y_i was generated by an unknown function y = f(x), discover a function h that approximates the true function f. Classification learning: f takes values in a finite set.
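
To make the setup concrete, here is a tiny Python sketch (an added illustration with made-up data, not from the lecture): a training set of (x, y) pairs and one candidate hypothesis h, scored by how often it agrees with the labels.

```python
# Made-up classification training set: each x is a pair of numbers, each y is 0 or 1.
training_set = [((0.0, 0.0), 1), ((1.0, 2.0), 1), ((3.0, -1.0), 0), ((4.0, 0.5), 0)]

# One candidate hypothesis h; learning is the search for an h that fits the data.
def h(x):
    return 1 if x[0] < 2.0 else 0

accuracy = sum(h(x) == y for x, y in training_set) / len(training_set)
print(accuracy)  # 1.0 -- this h agrees with the labels on every training example
```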

[Figure: linearly separable training examples plotted in two dimensions, with the two classes labeled +/−, then 1/0, then 1/−1.]

x_2 = 1.7x_1 − 4.9

x_2 = 1.7x_1 − 4.9, i.e., x_2 − 1.7x_1 = −4.9

x_2 = 1.7x_1 − 4.9, i.e., x_2 − 1.7x_1 = −4.9, i.e., 2x_2 − 3.4x_1 = −9.8, i.e., 10x_2 − 17x_1 = −49

Points above the line: x_2 ≥ 1.7x_1 − 4.9, equivalently x_2 − 1.7x_1 ≥ −4.9, 2x_2 − 3.4x_1 ≥ −9.8, 10x_2 − 17x_1 ≥ −49

f(x_1, x_2) = 1 if x_2 − 1.7x_1 ≥ −4.9, 0 otherwise
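
As an added Python sketch (not from the slides), the same rule for this particular line:

```python
def f(x1, x2):
    """Label 1 for points on or above the line x2 = 1.7*x1 - 4.9, label 0 otherwise."""
    return 1 if x2 - 1.7 * x1 >= -4.9 else 0

print(f(0.0, 0.0))   # 1: 0 - 1.7*0 = 0, and 0 >= -4.9, so (0, 0) is above the line
print(f(10.0, 0.0))  # 0: 0 - 1.7*10 = -17 < -4.9, so (10, 0) is below the line
```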

Formula for a line: w_1 x_1 + w_2 x_2 = b

Formula for a line: w_1 x_1 + w_2 x_2 = b. Points above the line: w_1 x_1 + w_2 x_2 ≥ b

f(x_1, x_2) = 1 if w_1 x_1 + w_2 x_2 ≥ b, 0 otherwise

Generalizing to n dimensions. Formula for a line ("hyperplane"): w_1 x_1 + w_2 x_2 + … + w_n x_n = b, i.e., Σ_{i=1}^n w_i x_i = b

Generalizing to n dimensions. Formula for a line ("hyperplane"): w_1 x_1 + w_2 x_2 + … + w_n x_n = b, i.e., Σ_{i=1}^n w_i x_i = b, i.e., w · x = b

Generalizing to n dimensions. Formula for a line ("hyperplane"): w_1 x_1 + w_2 x_2 + … + w_n x_n = b, i.e., Σ_{i=1}^n w_i x_i = b, i.e., w · x = b. Points above the line: w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ b, i.e., Σ_{i=1}^n w_i x_i ≥ b, i.e., w · x ≥ b
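
In code, the vector form is just a dot product. A short NumPy sketch (added illustration; the weights below encode the earlier example line x_2 − 1.7x_1 = −4.9):

```python
import numpy as np

w = np.array([-1.7, 1.0])  # weights: -1.7*x1 + 1.0*x2
b = -4.9                   # threshold
x = np.array([0.0, 0.0])   # a query point

# "Above the hyperplane" means the weighted sum reaches the threshold b.
print(np.dot(w, x) >= b)   # True: (0, 0) is above the line x2 = 1.7*x1 - 4.9
```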

Linear discriminant function: f(x_1, x_2, …, x_n) = 1 if Σ_{i=1}^n w_i x_i ≥ b, 0 otherwise

Linear discriminant function: f(x_1, x_2, …, x_n) = 1 if Σ_{i=1}^n w_i x_i ≥ b, 0 otherwise. Goal of classification learning: Given ((x_{1,1}, x_{1,2}, …, x_{1,n}), y_1), ((x_{2,1}, x_{2,2}, …, x_{2,n}), y_2), …, ((x_{m,1}, x_{m,2}, …, x_{m,n}), y_m), i.e., the labeled inputs x_1, x_2, …, x_m, find (w_1, …, w_n) and b.

Notational trick: w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ b. Equivalent to: w_1 x_1 + w_2 x_2 + … + w_n x_n − b ≥ 0

Notational trick: w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ b. Equivalent to: w_1 x_1 + w_2 x_2 + … + w_n x_n − b ≥ 0, i.e., −b + w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ 0

Notational trick: w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ b. Equivalent to: w_1 x_1 + w_2 x_2 + … + w_n x_n − b ≥ 0, i.e., −b + w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ 0. If x_0 = 1: −b·x_0 + w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ 0

Notational trick: w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ b. Equivalent to: w_1 x_1 + w_2 x_2 + … + w_n x_n − b ≥ 0, i.e., −b + w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ 0. If x_0 = 1: −b·x_0 + w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ 0, i.e., w_0 x_0 + w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ 0 (taking w_0 = −b)

Notational trick: w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ b. Equivalent to: w_1 x_1 + w_2 x_2 + … + w_n x_n − b ≥ 0, i.e., −b + w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ 0. If x_0 = 1: −b·x_0 + w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ 0, i.e., w_0 x_0 + w_1 x_1 + w_2 x_2 + … + w_n x_n ≥ 0, i.e., Σ_{i=0}^n w_i x_i ≥ 0
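
A short sketch of this trick in code (added illustration with arbitrary numbers): prepend a constant feature x_0 = 1 and fold the threshold into a new weight w_0 = −b, so the test becomes a comparison against 0.

```python
import numpy as np

w = np.array([-1.7, 1.0])           # original weights
b = -4.9                            # original threshold
x = np.array([0.0, 0.0])            # a query point

w_aug = np.concatenate(([-b], w))   # (w0, w1, ..., wn) with w0 = -b
x_aug = np.concatenate(([1.0], x))  # (1, x1, ..., xn)

# The two tests are equivalent.
print(np.dot(w, x) >= b)            # original form:  w . x >= b
print(np.dot(w_aug, x_aug) >= 0)    # augmented form: w'. x' >= 0
```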

Linear discriminant function: f(x_0, x_1, x_2, …, x_n) = 1 if Σ_{i=0}^n w_i x_i ≥ 0, 0 otherwise. Goal of classification learning: Given ((1, x_{1,1}, x_{1,2}, …, x_{1,n}), y_1), ((1, x_{2,1}, x_{2,2}, …, x_{2,n}), y_2), …, ((1, x_{m,1}, x_{m,2}, …, x_{m,n}), y_m), i.e., the labeled inputs x_1, x_2, …, x_m each prepended with x_0 = 1, find (w_0, …, w_n).

Linear discriminant function, also written f_w(x) or h_w(x): f(x_0, x_1, x_2, …, x_n) = 1 if Σ_{i=0}^n w_i x_i ≥ 0, 0 otherwise. Goal of classification learning: Given ((1, x_{1,1}, x_{1,2}, …, x_{1,n}), y_1), ((1, x_{2,1}, x_{2,2}, …, x_{2,n}), y_2), …, ((1, x_{m,1}, x_{m,2}, …, x_{m,n}), y_m), i.e., the labeled inputs x_1, x_2, …, x_m each prepended with x_0 = 1, find (w_0, …, w_n).

Perceptrons

Neuron https://appliedgo.net/perceptron/

Perceptrons https://blog.dbrgn.ch/2013/3/26/perceptrons-in-python/

Perceptron Learning Rule
Current hypothesis: h_w(x), with w_0 = w_1 = w_2 = … = w_n = 0 [alternatively: set to random values]
Repeat
  For i = 1 to m [for each example]
    For j = 0 to n [for each feature, including the constant x_0]
      w_j ← w_j + α x_{i,j} (y_i − h_w(x_i))
Until h_w(x) gets all data correct [reorder data after each iteration]
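
A runnable Python sketch of this rule (an added illustration, not the course's reference code; the names `predict` and `train_perceptron` are mine). It assumes each input vector already starts with the constant feature x_0 = 1.

```python
import random

def predict(w, x):
    """Linear discriminant: 1 if sum_j w_j * x_j >= 0, else 0 (x includes x0 = 1)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else 0

def train_perceptron(examples, alpha=0.3, max_epochs=1000):
    """examples: list of (x, y) pairs; returns a learned weight vector."""
    examples = list(examples)             # work on a copy so we can reshuffle freely
    n = len(examples[0][0])
    w = [0.0] * n                         # w_0 = w_1 = ... = w_n = 0
    for _ in range(max_epochs):           # epoch cap: converges only if data is separable
        if all(predict(w, x) == y for x, y in examples):
            break                         # h_w gets all data correct
        random.shuffle(examples)          # reorder data after each iteration
        for x, y in examples:
            error = y - predict(w, x)     # (y_i - h_w(x_i))
            for j in range(n):            # every feature, including x_0
                w[j] += alpha * x[j] * error
    return w
```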

Perceptron Learning Rule: w_j ← w_j + α x_{i,j} (y_i − h_w(x_i)). If h_w(x_i) is correct, all w_j are unchanged: y_i = h_w(x_i), so (y_i − h_w(x_i)) = 0. If h_w(x_i) is too big, w_j decreases; if h_w(x_i) is too small, w_j increases (for positive x_{i,j}). α is the learning rate (sometimes called η).

Perceptron Learning Rule: Example. w_j ← w_j + α x_{i,j} (y_i − h_w(x_i))

Perceptron Learning Rule: Example. w_j ← w_j + α x_{i,j} (y_i − h_w(x_i))
x_1  x_2  f(x_1, x_2)
0    0    0
0    1    0
1    0    0
1    1    1

Perceptron Learning Rule: Example (AND gate). w_j ← w_j + α x_{i,j} (y_i − h_w(x_i))
x_1  x_2  f(x_1, x_2)
0    0    0
0    1    0
1    0    0
1    1    1

Perceptron Learning Rule: Example. w_j ← w_j + α x_{i,j} (y_i − h_w(x_i)), with α = 0.3 and w_0 = w_1 = w_2 = 0.
Training Data:
x_1  x_2  f(x_1, x_2)
0    0    0
0    1    0
1    0    0
1    1    1
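
An added usage sketch of this AND-gate run, assuming the `train_perceptron`/`predict` helpers sketched earlier (each input is prepended with the constant x_0 = 1):

```python
and_gate = [([1, 0, 0], 0),   # x0=1, x1=0, x2=0 -> 0
            ([1, 0, 1], 0),   # x0=1, x1=0, x2=1 -> 0
            ([1, 1, 0], 0),   # x0=1, x1=1, x2=0 -> 0
            ([1, 1, 1], 1)]   # x0=1, x1=1, x2=1 -> 1

w = train_perceptron(and_gate, alpha=0.3)
print(w)                                             # one separating weight vector
print(all(predict(w, x) == y for x, y in and_gate))  # True: the weights implement AND
```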

Perceptron Learning Rule: Example. w_j ← w_j + α x_{i,j} (y_i − h_w(x_i)), with α = 0.3 and w_0 = w_1 = w_2 = 0.
Training Data:
x_1   x_2   f(x_1, x_2)
-1    0.5   1
-1    -1    1
1     0.5   0
0.5   1     0