Introduction to Machine Learning (Pattern recognition and model fitting) for Master students
1 Introduction to Machine Learning (Pattern recognition and model fitting) for Master students. Spring 2007, ÖU/RAP. Thorsteinn Rögnvaldsson
2 Contents: machine learning algorithms (mostly artificial neural networks, ANN); problems attacked with learning systems (classification and regression); issues in learning (bias, over-learning, generalization); seminars; a practical project.
3 What you should learn: how to approach a machine learning classification or regression problem; basic knowledge of the common linear machine learning algorithms; basic knowledge of some nonlinear machine learning algorithms; practical use of a few machine learning algorithms with MATLAB.
4 Form. Projects (individual or in a group) with a written report & oral presentation (40%). Theory (lecture notes, books, whatever you choose). Lectures (3 hrs/occasion, approx. 7 occasions). Seminars, where you read up on the material and present it (in a pedagogic way) to your fellow master students: you're given a paper/chapter to read and then present to the others (you're more than welcome to complement it with other material). Evaluation (20%). Material is mailed out to the students.
5 Why machine learning? Some tasks can easily be described by example but are difficult to write down rules for. There may be new information in the data (i.e. the expert might not know all the information available in the data). On-line tuning (knowledge increases, and it is too difficult to update the system by hand). Machine learning is very close to statistics.
6 Typical tasks for ML: build systems whose purpose is to classify observations (good/bad, healthy/sick, red/green/blue, A/B/C, ...) or to estimate some value for observations (how good/bad is it? how healthy is the patient? how many points will I gain? how likely is it that I win if I do this or that? what risk do I take if I do this or that? what is a reasonable price for this house? etc.). The latter is called regression in statistics.
7 Some machine learning methods: artificial neural networks (ANN), models inspired by the structure of the neural system; support vector machines (SVM), models designed from statistical learning theory; decision trees, similar to expert systems (they produce rules); Bayesian networks, for reasoning under uncertainty.
8 Game playing. Chess: search and an evaluation function; there are roughly 10^120 possible game paths in chess. Image from 2001: A Space Odyssey. Backgammon: pattern recognition.
9 IBM Deep Blue. Deep Blue relies on computational power: search and evaluation. Deep Blue evaluates some 200 million positions per second. The latest Deep Blue is a 32-node IBM RS/6000 SP with P2SC processors. Each node of the SP employs a single microchannel card containing 8 dedicated VLSI chess processors, for a total of 256 processors working in tandem. Deep Blue can calculate billions of moves in three minutes. Deep Blue is brute force. Humans (probably) play chess differently...
10 TD-Gammon. The best backgammon programs use temporal difference (TD) algorithms to train a back-propagation neural network by self-play, and the top programs are world-class in playing strength. At the 1998 American Association for Artificial Intelligence meeting, NeuroGammon won 99 of 100 games against a human grand master (the World Champion at the time). TD-Gammon is an example of machine learning: it plays against itself and adapts its rules after each game depending on wins/losses.
11–12 Steps in an ML/AI problem: measure the environment, x; evaluate the environment, y = f(x); take a decision and act, α[y].
13 Introduction to classification
14 Classification: order into one out of several classes ("1 of K"). The classifier maps the input space X (dimension D) to the output (category) space C (K categories): x = (x_1, ..., x_D)^T ∈ X is assigned a category c ∈ C = {c_1, ..., c_K}.
15 Example 1: robot color vision (Halmstad Univ. mechatronics competition 1999). Classify the Lego pieces into red, blue, and yellow; classify white balls, black sideboard, and green carpet.
16 What the camera sees (RGB space): (figure) clusters of yellow, red, and green pixels plotted in RGB space.
17 Mapping RGB (3D) to rgb (2D): r = R/(R+G+B), g = G/(R+G+B), b = B/(R+G+B). Since r + g + b = 1 by construction, only two of the normalized coordinates are independent, which is why the 3D RGB input becomes effectively 2D.
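To make the mapping concrete, here is a minimal Python sketch (the pixel values are hypothetical; the course labs use MATLAB, but the arithmetic is identical):

```python
def rgb_normalize(R, G, B):
    """Map absolute RGB intensities to normalized rgb chromaticity."""
    s = R + G + B
    return R / s, G / s, B / s

# Two hypothetical pixels: the same hue at full and at half brightness
print(rgb_normalize(200, 100, 50))  # -> (0.571..., 0.286..., 0.143...)
print(rgb_normalize(100, 50, 25))   # same rgb values: intensity cancels out
```

The division removes overall brightness, so a dimly and a brightly lit piece of the same color map to (almost) the same point in rgb space.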
18 Lego in normalized rgb space. The input is 2D: x = (x_1, x_2)^T ∈ X. The output is 6D: {red, blue, yellow, green, black, white}, c ∈ C, K = 6.
19 All together... The classifier's task is to find optimal borders between the different categories. What is a yellow Lego piece? What is a blue Lego piece? Given rgb values, how likely is it that the robot is seeing e.g. a red Lego piece?
20 Example 2: CMU
21 ALVINN: an ANN-guided vehicle. Input: the camera image, x ∈ X (one component per pixel). Output: a steering signal, c ∈ C.
22 Classification means taking a decision: if I believe x ∈ c_k, then I will do α_i. Examples: I see something that looks yellow; I decide that it is a yellow Lego brick; if I see a yellow Lego brick, then I will lift it up and carry it to my home. If I see a white ball, then I will try to score a goal. The road looks like it is turning left; I decide it is turning left; if the road turns left, then I will turn the steering wheel left. The patient is bleeding heavily; I decide that the patient needs treatment. Statistical decision theory: sometimes the decision is wrong, and decision theory is about making the best possible decision.
23 Notation. p(x): probability density for x. p(c_k): a priori probability for category c_k. p(x|c_k): probability density for all x ∈ c_k. p(c_k|x): a posteriori probability for category c_k. p(c_k, x) = p(x, c_k): joint probability for x and c_k. α_i: action i. λ(α_i|c_k) = λ_ik: cost for making decision α_i if x ∈ c_k.
24 Illustration from health care. Two categories: c_1 = healthy, c_2 = ill. p(c_i): the probability that the person is healthy/ill before the doctor meets him/her (how many of the people going to see a doctor are actually ill?). x = (x_1, x_2, ...): the results (the observation) from the doctor's examination (the doctor may have done many tests).
25 Illustration from health care (continued). p(x): the probability of observing x. p(x, c_i): the probability of observing a person from category c_i with the test results x; p(x, c_i) = p(x|c_i) p(c_i) = p(c_i|x) p(x). p(x|c_i): the probability of observing x when we know the person is from category c_i.
26 Bayes rule. Since p(c_k, x) = p(x, c_k), we get p(c_k|x) = p(x|c_k) p(c_k) / p(x), where p(x) = Σ_{k=1}^{K} p(x|c_k) p(c_k).
27 Bayes theorem example Joe is a randomly chosen member of a large population in which 3% are heroin users. Joe tests positive for heroin in a drug test that correctly identifies users 95% of the time and correctly identifies nonusers 90% of the time. Is Joe a heroin addict? Example from
28 Bayes theorem example. Joe is a randomly chosen member of a large population in which 3% are heroin users. Joe tests positive for heroin in a drug test that correctly identifies users 95% of the time and correctly identifies nonusers 90% of the time. Is Joe a heroin addict? P(H|pos) = P(pos|H) P(H) / P(pos). P(H) = 3% = 0.03, P(¬H) = 1 − P(H) = 0.97. P(pos|H) = 95% = 0.95, P(pos|¬H) = 10% = 0.10. P(pos) = P(pos|H) P(H) + P(pos|¬H) P(¬H) = 0.0285 + 0.097 = 0.1255. P(H|pos) = 0.0285 / 0.1255 ≈ 0.227 ≈ 23%. Example from
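As a sanity check on the arithmetic, a few lines of Python (the function and argument names are mine, not from the course material):

```python
def posterior_given_positive(prior, sensitivity, false_pos_rate):
    """P(H | pos) via Bayes rule; P(pos) by the law of total probability."""
    p_pos = sensitivity * prior + false_pos_rate * (1 - prior)
    return sensitivity * prior / p_pos

# Joe: 3% prior, 95% true-positive rate, 10% false-positive rate
print(posterior_given_positive(0.03, 0.95, 0.10))  # -> 0.2270..., i.e. ~23%
```

Even after a positive test, Joe is more likely a non-user than a user, because the low 3% prior dominates the evidence.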
29 Bayes theorem: the Monty Hall game show. In a TV game show, a contestant selects one of three doors; behind one of the doors there is a prize, and behind the other two there are no prizes. After the contestant selects a door, the game-show host opens one of the remaining doors and reveals that there is no prize behind it. The host then asks the contestant whether he/she wants to SWITCH to the other unopened door or STICK to the original choice. What should the contestant do? See Let's Make a Deal (A Joint Venture).
30 The Monty Hall game show. Notation: the prize is behind door i ∈ {1, 2, 3}; open_i is the event that the host opens door i. Let's Make a Deal (A Joint Venture).
31–32 The Monty Hall game show. The prize is behind door i ∈ {1, 2, 3}; the contestant selects door 1; the host opens door 2. P(i) = 1/3 is the a priori probability that the prize is behind door i. P(open_2|1) = 1/2: the probability that the host opens door 2 if the prize is behind door 1 (the contestant has chosen door 1); P(open_2|2) = 0; P(open_2|3) = 1. By the law of total probability, P(open_2) = Σ_{i=1}^{3} P(open_2|i) P(i) = 1/2. Bayes rule then gives P(1|open_2) = P(open_2|1) P(1) / P(open_2) = 1/3 and P(3|open_2) = P(open_2|3) P(3) / P(open_2) = 2/3: switching doubles the chance of winning. Let's Make a Deal (A Joint Venture).
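A short simulation sketch of the game, assuming the standard rules (the host always opens a prize-free door that the contestant did not pick):

```python
import random

def monty_hall(trials=100_000):
    """Estimate the win rates of the STICK and SWITCH strategies."""
    stick_wins = switch_wins = 0
    for _ in range(trials):
        prize = random.randint(1, 3)
        choice = 1  # by symmetry the contestant may always pick door 1
        # Host opens a door that hides no prize and is not the chosen one
        opened = random.choice([d for d in (2, 3) if d != prize])
        switched = next(d for d in (1, 2, 3) if d not in (choice, opened))
        stick_wins += (choice == prize)
        switch_wins += (switched == prize)
    return stick_wins / trials, switch_wins / trials

print(monty_hall())  # roughly (0.333, 0.667): switching doubles the odds
```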
33 Bayes theorem: the Monty Hall game show. In a TV game show, a contestant selects one of three doors; behind one of the doors there is a prize, and behind the other two there are no prizes. After the contestant selects a door, the game-show host opens one of the remaining doors and reveals that there is no prize behind it. The host then asks the contestant whether he/she wants to SWITCH to the other unopened door or STICK to the original choice. What should the contestant do? The host is actually asking the contestant whether he/she wants to SWITCH the choice to both other doors or STICK to the original choice. Phrased this way, it is obvious what the optimal thing to do is.
34 Decision theory: expected conditional risk. R(α_i|x) = Σ_{k=1}^{K} λ(α_i|c_k) p(c_k|x). The Bayes optimal decision: choose the action α_i that minimizes R(α_i|x), i.e. the action with the least severe consequences (averaged over all possible outcomes). This requires estimating the a posteriori probability p(c_k|x).
35 Decision theory: expected conditional utility. U(α_i|x) = Σ_{k=1}^{K} u(α_i|c_k) p(c_k|x). The Bayes optimal decision: choose the action α_i that maximizes U(α_i|x), i.e. the action with the best consequences (averaged over all possible outcomes). This requires estimating the a posteriori probability p(c_k|x).
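To make the decision rule concrete, a small sketch with an invented two-category, three-action loss matrix (the categories, actions, and costs are illustrative assumptions, not from the slides):

```python
import numpy as np

# lam[i, k] = cost lambda(alpha_i | c_k) of action alpha_i when the true
# category is c_k. Categories: (healthy, ill); actions: (send home, treat,
# run more tests). All numbers are made up for illustration.
lam = np.array([[0.0, 100.0],   # send home: disastrous if the patient is ill
                [10.0, 1.0],    # treat: somewhat costly if healthy
                [3.0, 3.0]])    # more tests: moderate cost either way

def bayes_optimal_action(posterior):
    """Choose the action minimizing the conditional risk R(alpha_i | x)."""
    risk = lam @ posterior       # R(alpha_i|x) = sum_k lam[i,k] * p(c_k|x)
    return int(np.argmin(risk)), risk

action, risk = bayes_optimal_action(np.array([0.9, 0.1]))
print(action, risk)  # -> 2, [10.0, 9.1, 3.0]: run more tests
```

Expected utility works the same way with argmax in place of argmin; the two formulations coincide when u = −λ.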
36 Classification approaches: 1. Model discrimination functions & discrimination boundaries. 2. Model probability densities p(x|c_k) & use Bayes rule. 3. Model the a posteriori probabilities p(c_k|x) directly. Examples on the following slides...
37 Example: the thermostat. We want to classify the temperature in a room into three categories {cold, fine, hot} (hot means that we want air conditioning, cold means we want heating, fine means we're happy). Discrimination boundary approach: set thresholds, e.g. above 21 is hot, below 19 is cold, and in between is fine. Don't bother with computing probabilities... but this is bad if you want to use decision theory.
38 Example: equipment health (diagnostics & predictive maintenance). Discrimination boundary approach: set thresholds and define ok and not-ok regions; this does not scale well to many variables. Probability density approach: use a large sample of ok and not-ok equipment and measure relevant variables x; estimate p(x|ok), p(x|not-ok), p(ok) and p(not-ok); then use Bayes theorem. A posteriori approach: use a large sample of ok and not-ok equipment and measure relevant variables x; estimate p(ok|x) and p(not-ok|x) directly.
39 Parametric & non-parametric methods. Parametric: assume a parametric form; few degrees of freedom lead to large model bias (e.g. assuming that everything is linear). Non-parametric: assume no parametric form; many degrees of freedom lead to large model variance (i.e. everything can be any nonlinear function). The optimum is often somewhere in between.
40 Linear Gaussian classifier: parametric. Assume p(x|c_k) is Gaussian with class-specific means µ_k and a common covariance matrix Σ: p(x|c_k) = 1 / ((2π)^{D/2} det(Σ)^{1/2}) · exp(−(1/2) (x − µ_k)^T Σ^{−1} (x − µ_k)).
41 Linear Gaussian classifier: parametric. Assume p(x|c_k) is Gaussian with different means µ_k and a common covariance matrix Σ. Estimate the means and covariance matrices for the categories from the data: µ̂_k = (1/N_k) Σ_{n∈c_k} x(n); Σ̂_k = (1/(N_k − 1)) Σ_{n∈c_k} (x(n) − µ̂_k)(x(n) − µ̂_k)^T; pooled over classes, Σ̂ = (1/N) Σ_{k=1}^{K} N_k Σ̂_k.
42 Linear Gaussian class boundary (figure): green and red training samples with the resulting linear decision boundary. Training error 0.06%, test error 0.10%.
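A compact sketch of the estimation formulas above applied to synthetic two-class data (the toy data, seed, and equal priors are assumptions; this is not the slides' data set):

```python
import numpy as np

def fit_linear_gaussian(X, y):
    """Per-class means and a pooled (common) covariance, as on the slide."""
    classes = np.unique(y)
    means = {k: X[y == k].mean(axis=0) for k in classes}
    N_k = {k: np.sum(y == k) for k in classes}
    Sigma_k = {k: np.cov(X[y == k].T) for k in classes}  # 1/(N_k - 1) inside
    Sigma = sum(N_k[k] * Sigma_k[k] for k in classes) / len(y)
    return means, Sigma

def score(x, mean, Sigma, prior):
    """log p(c_k) + log p(x|c_k), dropping class-independent constants."""
    d = x - mean
    return np.log(prior) - 0.5 * d @ np.linalg.solve(Sigma, d)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, (50, 2)),
               rng.normal([3, 3], 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
means, Sigma = fit_linear_gaussian(X, y)
x_new = np.array([2.5, 2.5])
print(max((0, 1), key=lambda k: score(x_new, means[k], Sigma, 0.5)))  # -> 1
```

With a common Σ the quadratic terms cancel between classes, which is why the resulting decision boundary is linear.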
43 The simple perceptron. With the {−1, +1} representation: y(x) = sgn(w^T x), i.e. +1 if w^T x > 0 and −1 if w^T x < 0, where w are the parameters and w^T x = w_0 + w_1 x_1 + w_2 x_2 + ... Traditionally (early 1960s) trained with perceptron learning.
44 Perceptron learning. Desired output: f(n) = +1 if x(n) belongs to class A, −1 if x(n) belongs to class B. Repeat until no errors are made anymore: 1. Pick a random example [x(n), f(n)]. 2. If the classification is correct, do nothing. 3. If the classification is wrong, update the parameters as w_i ← w_i + η f(n) x_i(n), where η, the learning rate, is a small positive number.
45 Example: perceptron learning. The AND function: the function we want the perceptron to learn. The slide shows its truth table (inputs x_1, x_2 and target f) and the four input points plotted in the x_1–x_2 plane.
46–48 Example: perceptron learning. Initial values: η = 0.3 and an initial weight vector w. The decision boundary is w^T x = 0, i.e. w_0 + w_1 x_1 + w_2 x_2 = 0, drawn as the line x_2 = 0.5 − x_1; the weight vector w points in the direction (1, 1), perpendicular to the boundary.
49 Example: perceptron learning. This example is correctly classified; no action.
50–51 Example: perceptron learning. This example is incorrectly classified; learning action: each weight is updated by w_i ← w_i + η f(n) x_i(n), e.g. a weight of 1 becomes 1 − 0.3 = 0.7 for this negative example.
52 Example: perceptron learning. This example is correctly classified; no action.
53–54 Example: perceptron learning. This example is incorrectly classified; learning action: the update rule is applied again (a component with x_i = 0 leaves the corresponding weight w_i unchanged).
55 Example: perceptron learning. Final solution: the learned weight vector w defines a decision line that separates (1, 1) from the other three inputs, so the perceptron has learned the AND function.
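The whole walkthrough fits in a short runnable sketch: perceptron learning on the AND function, assuming 0/1 inputs, ±1 targets, and the slides' η = 0.3 (zero initial weights are my choice, not the slides'):

```python
import numpy as np

def perceptron_train(X, f, eta=0.3, max_epochs=100):
    """Perceptron learning: w_i <- w_i + eta * f(n) * x_i(n) on mistakes."""
    Xb = np.hstack([np.ones((len(X), 1)), X])  # fold the bias in: x_0 = 1
    w = np.zeros(Xb.shape[1])                  # assumed starting weights
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in zip(Xb, f):
            if np.sign(w @ x) != target:       # misclassified (sign 0 counts as wrong)
                w += eta * target * x
                mistakes += 1
        if mistakes == 0:                      # no errors left: stop
            break
    return w

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
f = np.array([-1, -1, -1, +1])                 # the AND function
w = perceptron_train(X, f)
print(w)                                       # one separating weight vector
```

Since AND is linearly separable, the convergence guarantee on the next slide applies and training stops with zero errors.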
56 Perceptron learning is guaranteed to find a solution in finite time, if a solution exists. However, the perceptron can only represent linear decision boundaries.
57 Perceptron final decision boundary (figure), after 100 epochs (1 epoch = 1 full presentation of the entire data set). Training error 0.07%, test error 0.09%.
58 Seminars for next week: decision theory; the simple perceptron; probability density estimation (2 students).