How can a Machine Learn: Passive, Active, and Teaching
1 How can a Machine Learn: Passive, Active, and Teaching
Xiaojin Zhu (jerryzhu@cs.wisc.edu), Department of Computer Sciences, University of Wisconsin-Madison. NLP&CC 2013
2 Item: $x \in \mathbb{R}^d$
3 Class Label: $y \in \{-1, 1\}$
4 Learning
You see a training set $(x_1, y_1), \ldots, (x_n, y_n)$. You must learn a good function $f : \mathbb{R}^d \to \{-1, 1\}$, which must predict the label $y$ on a test item $x$ that may not be in the training set. You need some assumptions.
5 The World: $p(x, y)$
6 Independent and Identically Distributed
$(x_1, y_1), \ldots, (x_n, y_n), (x, y) \overset{\text{iid}}{\sim} p(x, y)$
7 Example: Noiseless 1D Threshold Classifier
$p(x) = \text{uniform}[0, 1]$, $\quad p(y = 1 \mid x) = \begin{cases} 0, & x < \theta \\ 1, & x \geq \theta \end{cases}$
8 Example: Noisy 1D Threshold Classifier
$p(x) = \text{uniform}[0, 1]$, $\quad p(y = 1 \mid x) = \begin{cases} \epsilon, & x < \theta \\ 1 - \epsilon, & x \geq \theta \end{cases}$
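These two toy worlds are easy to simulate. Below is a minimal sketch in Python (the function name sample_world and the defaults theta=0.4, eps=0.1 are our own choices for illustration, not from the slides); eps = 0 recovers the noiseless world of the previous slide.

```python
import numpy as np

def sample_world(n, theta=0.4, eps=0.0, rng=None):
    """Draw n iid (x, y) pairs from the (possibly noisy) 1D threshold world."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=n)
    p_pos = np.where(x >= theta, 1.0 - eps, eps)      # p(y = 1 | x)
    y = np.where(rng.uniform(size=n) < p_pos, 1, -1)
    return x, y

x, y = sample_world(10, eps=0.1)   # noisy variant; eps=0 is the noiseless world
```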
9 Generalization Error
$R(f) = \mathbb{E}_{(x,y) \sim p(x,y)}\, \mathbb{1}\!\left(f(x) \neq y\right)$
Approximated by the test-set error $\frac{1}{m} \sum_{i=n+1}^{n+m} \mathbb{1}\!\left(f(x_i) \neq y_i\right)$ on a test set $(x_{n+1}, y_{n+1}), \ldots, (x_{n+m}, y_{n+m}) \overset{\text{iid}}{\sim} p(x, y)$
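Continuing the sketch (reusing sample_world and np from above), the test-set approximation of $R(f)$ is a single mean over fresh iid draws; the closed form in the comment is specific to the noisy threshold world.

```python
def test_error(f, m=100_000, theta=0.4, eps=0.1):
    """Monte Carlo approximation of R(f) on a fresh iid test set."""
    x_test, y_test = sample_world(m, theta=theta, eps=eps,
                                  rng=np.random.default_rng(42))
    return np.mean(f(x_test) != y_test)

f_hat = lambda x: np.where(x >= 0.55, 1, -1)   # a hypothetical learned threshold
# For this world, R(f_hat) = eps + (1 - 2*eps) * |0.55 - theta| = 0.22
print(test_error(f_hat))
```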
10 Zero Generalization Error is a Dream
Speed limit #1: Bayes error
$R(\text{Bayes}) = \mathbb{E}_{x \sim p(x)}\left(\frac{1}{2} - \left|p(y = 1 \mid x) - \frac{1}{2}\right|\right)$
Bayes classifier: $\text{sign}\left(p(y = 1 \mid x) - \frac{1}{2}\right)$
All learners are no better than the Bayes classifier.
11 Hypothesis Space
$f \in \mathcal{F} \subseteq \{g : \mathbb{R}^d \to \{-1, 1\} \text{ measurable}\}$
12 Approximation Error
$\mathcal{F}$ may include the Bayes classifier... or not.
e.g. $\mathcal{F} = \left\{ g(x) = \text{sign}(x - \theta) : \theta \in [0, 1] \right\}$
e.g. $\mathcal{F} = \left\{ g(x) = \text{sign}(\sin(\alpha x)) : \alpha > 0 \right\}$
Speed limit #2: approximation error $\inf_{g \in \mathcal{F}} R(g) - R(\text{Bayes})$
13 Estimation Error
Let $f^* = \arg\inf_{g \in \mathcal{F}} R(g)$. Can we at least learn $f^*$? No. You see a training set $(x_1, y_1), \ldots, (x_n, y_n)$, not $p(x, y)$; you learn $\hat{f}_n$.
Speed limit #3: estimation error $R(\hat{f}_n) - R(f^*)$
14 Estimation Error
As the training set size $n$ increases, the estimation error goes down. But how quickly?
15 Paradigm 1: Passive Learning
$(x_1, y_1), \ldots, (x_n, y_n) \overset{\text{iid}}{\sim} p(x, y)$
1D example: $O\!\left(\frac{1}{n}\right)$
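As a sketch of the $O(1/n)$ rate (reusing sample_world; the midpoint rule in passive_learn is our own choice of a consistent learner): in the noiseless world, the error of the learned threshold scales like the spacing between adjacent training points, i.e. roughly $1/n$.

```python
def passive_learn(n, theta=0.4, rng=None):
    """Fit a threshold from n iid noiseless examples: return the midpoint of
    the rightmost negative item and the leftmost positive item."""
    x, y = sample_world(n, theta=theta, eps=0.0, rng=rng)
    lo = x[y == -1].max(initial=0.0)
    hi = x[y == 1].min(initial=1.0)
    return (lo + hi) / 2            # any threshold in (lo, hi) is consistent

for n in (10, 100, 1000):
    errs = [abs(passive_learn(n, rng=np.random.default_rng(s)) - 0.4)
            for s in range(200)]
    print(n, np.mean(errs))         # mean |theta_hat - theta| shrinks like 1/n
```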
16 Paradigm 2: Active Learning
In iteration $t$:
1. the learner picks a query $x_t$
2. the world (oracle) answers with a label $y_t \sim p(y \mid x_t)$
Pick $x_t$ to maximally reduce the hypothesis space.
1D example: $O\!\left(\frac{1}{2^n}\right)$
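For the noiseless 1D threshold, the query that maximally reduces the hypothesis space is the midpoint of the current version-space interval, i.e. binary search. A minimal sketch (our own illustration):

```python
def active_learn(n_queries, theta=0.4):
    """Binary search: each answer halves the interval of consistent thresholds."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        q = (lo + hi) / 2                  # query the midpoint
        y = 1 if q >= theta else -1        # noiseless oracle answer
        if y == 1:
            hi = q                         # theta <= q
        else:
            lo = q                         # theta > q
    return (lo + hi) / 2

print(abs(active_learn(20) - 0.4))   # error ~ 1/2^n, vs ~ 1/n for passive
```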
17 Paradigm 3: Teaching
A teacher designs the training set.
1D example: $x_1 = \theta - \epsilon/2$, $y_1 = -1$; $\quad x_2 = \theta + \epsilon/2$, $y_2 = 1$.
$n = 2$ suffices to drive the estimation error to $\epsilon$ (teaching dimension [Goldman & Kearns 95]).
18 Teaching as an Optimization Problem
$\min_D \; \text{loss}(\hat{f}_D, \theta^*) + \text{effort}(D)$
19 Teaching Bayesian Learners
If we choose
$\text{loss}(\hat{f}_D, \theta^*) = \text{KL}\left(\delta_{\theta^*} \,\|\, p(\theta \mid D)\right), \qquad \text{effort}(D) = cn$
the teaching problem becomes
$\min_{n, x_1, \ldots, x_n} \; -\log p(\theta^* \mid x_1, \ldots, x_n) + cn$
20 Example 1: Teaching a 1D Threshold Classifier
[Figure: passive learning "waits" ($O(1/n)$), active learning "explores" ($O(1/2^n)$), teaching "guides".]
$p_0(\theta) = 1$ (uniform prior over $[0, 1]$)
$p(y = 1 \mid x, \theta) = 1$ if $x \geq \theta$ and $0$ otherwise
$\text{effort}(D) = c|D|$
For any $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$: $p(\theta \mid D) = \text{uniform}\left[\max_{i: y_i = -1}(x_i), \; \min_{i: y_i = 1}(x_i)\right]$
The optimal teaching problem becomes
$\min_{n, x_1, y_1, \ldots, x_n, y_n} \; \log\left(\min_{i: y_i = 1}(x_i) - \max_{i: y_i = -1}(x_i)\right) + cn$
One solution: $D = \{(\theta^* - \epsilon/2, -1), \; (\theta^* + \epsilon/2, 1)\}$ as $\epsilon \to 0$, with $TI = \log(\epsilon) + 2c$.
21 Example 2: Learner with Poor Perception
Same as Example 1, but the learner cannot distinguish similar items. Encode this in effort():
$\text{effort}(D) = \frac{c}{\min_{x_i, x_j \in D} |x_i - x_j|}$
With $D = \{(\theta^* - \epsilon/2, -1), (\theta^* + \epsilon/2, 1)\}$: $TI = \log(\epsilon) + c/\epsilon$, with minimum at $\epsilon = c$, so
$D = \{(\theta^* - c/2, -1), \; (\theta^* + c/2, 1)\}$
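The closed form $\epsilon^* = c$ follows from setting $\frac{d}{d\epsilon}\left(\log\epsilon + c/\epsilon\right) = \frac{1}{\epsilon} - \frac{c}{\epsilon^2} = 0$. A quick numerical check (reusing np; the value c = 0.05 is arbitrary):

```python
from scipy.optimize import minimize_scalar

c = 0.05
ti = lambda eps: np.log(eps) + c / eps          # teaching objective of Example 2
res = minimize_scalar(ti, bounds=(1e-6, 1.0), method="bounded")
print(res.x)                                    # ~0.05, matching eps* = c
```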
22 Example 3: Teaching to Pick a Model Out of Two
$\Theta = \left\{\theta_A = N(-\frac{1}{4}, \frac{1}{2}), \; \theta_B = N(\frac{1}{4}, \frac{1}{2})\right\}$, $p_0(\theta_A) = p_0(\theta_B) = \frac{1}{2}$, $\theta^* = \theta_A$. Let $D = \{x_1, \ldots, x_n\}$.
$\text{loss}(D) = \log\left(1 + \exp\left(\sum_{i=1}^n x_i\right)\right)$, minimized by $x_i \to -\infty$. But suppose box constraints $x_i \in [-d, d]$:
$\min_{n, x_1, \ldots, x_n} \; \log\left(1 + \exp\left(\sum_{i=1}^n x_i\right)\right) + cn + \sum_{i=1}^n I(|x_i| \leq d)$
Solution: all $x_i = -d$, $\; n = \max\left(0, \left[\frac{1}{d}\log\left(\frac{d}{c} - 1\right)\right]\right)$.
Note $n = 0$ for certain combinations of $c, d$ (e.g., when $c \geq d$): the effort of teaching outweighs the benefit. The teacher may choose not to teach at all and maintain the status quo (the prior $p_0$) of the learner!
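With all items at the box corner $x_i = -d$, the objective reduces to a one-variable function of $n$, and brute force over integers recovers the closed-form $n$. A small check under assumed values $c = 0.1$, $d = 1$:

```python
def example3_objective(n, c=0.1, d=1.0):
    """Teaching objective with all n items placed at the corner x_i = -d."""
    return np.log1p(np.exp(-n * d)) + c * n

c, d = 0.1, 1.0
n_closed = max(0, round(np.log(d / c - 1) / d))      # slide's closed form
n_brute = min(range(50), key=example3_objective)     # exhaustive check
print(n_closed, n_brute)                             # both give 2
```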
23 Teaching Dimension is a Special Case
Given a concept class $C$, define $P(y = 1 \mid x, \theta_c) = [c(x) = +]$ and $P(x)$ uniform. The world has $\theta^* = \theta_{c^*}$. The learner has $\Theta = \{\theta_c \mid c \in C\}$, $p_0(\theta) = \frac{1}{|C|}$.
$P(\theta_c \mid D) = \frac{1}{|\{c' \in C \text{ consistent with } D\}|}$ if $c$ is consistent with $D$, and $0$ otherwise.
Teaching dimension [Goldman & Kearns 95] $TD(c^*)$ is the minimum cardinality of a $D$ that uniquely identifies the target concept. It is recovered by
$\min_D \; -\log P(\theta_{c^*} \mid D) + \gamma |D|$
for a sufficiently small per-item cost $\gamma \leq \frac{1}{|C|}$. The solution $D$ is a minimum teaching set for $c^*$, and $|D| = TD(c^*)$.
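For a finite concept class, the teaching dimension can be found by brute force: search labeled subsets in order of size until one is consistent with the target concept alone. A toy sketch (the prefix concept class on four items is our own example):

```python
from itertools import combinations

# Toy concept class on items {0,1,2,3}: concept c_k labels x positive iff x < k.
items = [0, 1, 2, 3]
concepts = {f"c{k}": (lambda x, k=k: x < k) for k in range(5)}

def teaching_dim(target):
    """Smallest labeled set consistent with the target concept alone."""
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            labels = [(x, concepts[target](x)) for x in subset]
            consistent = [name for name, c in concepts.items()
                          if all(c(x) == y for x, y in labels)]
            if consistent == [target]:
                return size, labels
    return None

print(teaching_dim("c2"))   # (2, [(1, True), (2, False)])
```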
24 Teaching Bayesian Learners in the Exponential Family
So far, we solved the examples by inspection. Exponential family:
$p(x \mid \theta) = h(x) \exp\left(\theta^\top T(x) - A(\theta)\right)$
$T(x) \in \mathbb{R}^D$: sufficient statistics of $x$; $\theta \in \mathbb{R}^D$: natural parameter; $A(\theta)$: log partition function; $h(x)$: base measure.
For $D = \{x_1, \ldots, x_n\}$ the likelihood is
$p(D \mid \theta) = \left(\prod_{i=1}^n h(x_i)\right) \exp\left(\theta^\top s - nA(\theta)\right)$
with aggregate sufficient statistics $s \equiv \sum_{i=1}^n T(x_i)$.
Two-step algorithm: finding the aggregate sufficient statistics + unpacking.
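For a concrete instance (our own helper, reusing np): the 1D Gaussian has $T(x) = (x, x^2)$, matching the unpacking example two slides below.

```python
def aggregate_stats(xs):
    """Aggregate sufficient statistics s = sum_i T(x_i) for a 1D Gaussian,
    where T(x) = (x, x^2)."""
    xs = np.asarray(xs, dtype=float)
    return np.array([xs.sum(), (xs ** 2).sum()])

print(aggregate_stats([0.0, 1.0, 2.0]))   # [3. 5.] -- the (n=3, s=(3,5)) case
```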
25 Step 1: Aggregate Sufficient Statistics from Conjugacy
The conjugate prior has natural parameters $(\lambda_1, \lambda_2) \in \mathbb{R}^D \times \mathbb{R}$:
$p(\theta \mid \lambda_1, \lambda_2) = h_0(\theta) \exp\left(\lambda_1^\top \theta - \lambda_2 A(\theta) - A_0(\lambda_1, \lambda_2)\right)$
The posterior is
$p(\theta \mid D, \lambda_1, \lambda_2) = h_0(\theta) \exp\left((\lambda_1 + s)^\top \theta - (\lambda_2 + n) A(\theta) - A_0(\lambda_1 + s, \lambda_2 + n)\right)$
$D$ enters the posterior only via $s$ and $n$. The optimal teaching problem:
$\min_{n, s} \; -\theta^{*\top}(\lambda_1 + s) + A(\theta^*)(\lambda_2 + n) + A_0(\lambda_1 + s, \lambda_2 + n) + \text{effort}(n, s)$
Convex relaxation: $n \in \mathbb{R}$ and $s \in \mathbb{R}^D$ (assuming $\text{effort}(n, s)$ is convex).
26 Step 2: Unpacking
Cannot teach with the aggregate sufficient statistics themselves: round $n \leftarrow \max(0, [n])$, then find $n$ teaching examples whose aggregate sufficient statistics is $s$.
Exponential distribution, $T(x) = x$: set $x_1 = \ldots = x_n = s/n$.
Poisson distribution, $T(x) = x$ (integers): round $x_1, \ldots, x_n$.
Gaussian distribution, $T(x) = (x, x^2)$: harder. $n = 3$, $s = (3, 5)$: $\{x_1 = 0, x_2 = 1, x_3 = 2\}$ or $\{x_1 = \frac{1}{2}, x_2 = \ldots, x_3 = \ldots\}$.
An approximate unpacking algorithm:
1. initialize $x_i \overset{\text{iid}}{\sim} p(x \mid \theta^*)$, $i = 1 \ldots n$
2. solve $\min_{x_1, \ldots, x_n} \left\| s - \sum_{i=1}^n T(x_i) \right\|^2$ (nonconvex)
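A sketch of the approximate unpacking algorithm for the Gaussian case, reusing aggregate_stats from above (the standard-normal initialization stands in for $p(x \mid \theta^*)$):

```python
from scipy.optimize import minimize

def unpack(s_target, n, rng=None):
    """Find n points whose aggregate sufficient statistics match s_target."""
    if rng is None:
        rng = np.random.default_rng(2)
    x0 = rng.normal(size=n)                       # step 1: iid initialization
    obj = lambda x: np.sum((s_target - aggregate_stats(x)) ** 2)
    return minimize(obj, x0).x                    # step 2: nonconvex least squares

xs = unpack(np.array([3.0, 5.0]), n=3)
print(np.round(xs, 3), np.round(aggregate_stats(xs), 3))   # stats ~ (3, 5)
```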
27 Example 4: Teaching the Mean of a Univariate Gaussian
The world is $N(x; \mu^*, \sigma^2)$, with $\sigma^2$ known to the learner. $T(x) = x$. The learner's prior in standard form: $\mu \sim N(\mu_0, \sigma_0^2)$.
Optimal aggregate sufficient statistics: $s = \frac{\sigma^2}{\sigma_0^2}(\mu^* - \mu_0) + n\mu^*$
Note $\frac{s}{n} \neq \mu^*$: the teacher compensates for the learner's initial belief $\mu_0$. Here $n$ is the solution to
$-\frac{1}{2\left(n + \sigma^2/\sigma_0^2\right)} + \text{effort}'(n) = 0$
e.g. when $\text{effort}(n) = cn$: $\; n = \frac{1}{2c} - \frac{\sigma^2}{\sigma_0^2}$
Better not to teach if the learner initially had a "narrow mind": $\sigma_0^2 < 2c\sigma^2$. Unpacking $s$ is trivial, e.g. $x_1 = \ldots = x_n = s/n$.
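Everything here is in closed form, so the whole teaching pipeline is a few lines. In the check below, the learner's conjugate posterior mean after seeing $D$ lands on $\mu^*$, illustrating the compensation for $\mu_0$ (the numbers are our own):

```python
def teach_gaussian_mean(mu_star, mu0, sigma2, sigma02, c):
    """Closed-form teaching set for Example 4 with effort(n) = c*n."""
    n = max(0, round(1 / (2 * c) - sigma2 / sigma02))
    if n == 0:
        return []                    # narrow-minded learner: do not teach
    s = sigma2 / sigma02 * (mu_star - mu0) + n * mu_star
    return [s / n] * n               # trivial unpacking

D = teach_gaussian_mean(mu_star=0.0, mu0=1.0, sigma2=1.0, sigma02=4.0, c=0.05)
n, s = len(D), sum(D)
post_mean = (1.0/4.0 + s/1.0) / (1/4.0 + n/1.0)   # conjugate posterior mean
print(n, post_mean)                                # 10, ~0.0 (= mu_star)
```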
28 Example 5: Teaching a Multinomial Distribution
The world: multinomial $\pi^* = (\pi_1^*, \ldots, \pi_K^*)$. The learner: Dirichlet prior
$p(\pi \mid \beta) = \frac{\Gamma\left(\sum_k \beta_k\right)}{\prod_k \Gamma(\beta_k)} \prod_{k=1}^K \pi_k^{\beta_k - 1}$
Step 1: find aggregate sufficient statistics $s = (s_1, \ldots, s_K)$:
$\min_s \; -\log\Gamma\left(\sum_{k=1}^K (\beta_k + s_k)\right) + \sum_{k=1}^K \log\Gamma(\beta_k + s_k) - \sum_{k=1}^K (\beta_k + s_k - 1)\log\pi_k^* + \text{effort}(s)$
Relax to $s \in \mathbb{R}_{\geq 0}^K$.
Step 2: unpack $s_k \leftarrow [s_k]$ for $k = 1 \ldots K$.
29 Examples of Example 5
World: $\pi^* = \left(\frac{1}{10}, \frac{3}{10}, \frac{6}{10}\right)$. The learner's wrong Dirichlet prior: $\beta = (6, 3, 1)$.
If effortless, $\text{effort}(s) = 0$: $s = (317, 965, 1933)$ (found with fmincon). The MLE from $D$ is $(0.099, 0.300, 0.601)$, very close to $\pi^*$: "brute-force teaching", using big data to overwhelm the learner's prior.
If costly, $\text{effort}(s) = 0.3 \sum_{k=1}^K s_k$: the optimum is $s = (0, 2, 8)$. Not $s = (1, 3, 6)$, which ignores the wrong prior ($TI = 4.51$); not $s = (317, 965, 1933)$, whose effort term is too large.
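Step 1 for this example is a small smooth optimization; an analog of the slide's fmincon run in Python might look like this (scipy in place of MATLAB's fmincon; effort_rate = 0.3 is the costly case):

```python
from scipy.optimize import minimize
from scipy.special import gammaln

pi_star = np.array([0.1, 0.3, 0.6])
beta = np.array([6.0, 3.0, 1.0])          # the learner's wrong Dirichlet prior

def ti(s, effort_rate=0.3):
    """Negative log Dirichlet posterior density at pi_star, plus linear effort."""
    a = beta + s
    neg_log_post = (-gammaln(a.sum()) + gammaln(a).sum()
                    - ((a - 1.0) * np.log(pi_star)).sum())
    return neg_log_post + effort_rate * s.sum()

res = minimize(ti, x0=np.ones(3), bounds=[(0.0, None)] * 3)
print(np.round(res.x))   # relaxed optimum; should round to about (0, 2, 8)
```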
30 Example 6: Teaching a Multivariate Gaussian
The world: $\mu^* \in \mathbb{R}^D$ and $\Sigma^* \in \mathbb{R}^{D \times D}$. The learner: likelihood $N(x \mid \mu, \Sigma)$ with a Normal-Inverse-Wishart (NIW) prior.
Given $x_1, \ldots, x_n \in \mathbb{R}^D$, the aggregate sufficient statistics are $s = \sum_{i=1}^n x_i$ and $S = \sum_{i=1}^n x_i x_i^\top$.
Step 1: optimal aggregate sufficient statistics via an SDP (minimizing the negative log NIW posterior density at $(\mu^*, \Sigma^*)$, with posterior parameters $\kappa_n, \nu_n, \mu_n, \Lambda_n$):
$\min_{n, s, S} \; \frac{\nu_n D}{2}\log 2 + \sum_{i=1}^D \log\Gamma\!\left(\frac{\nu_n + 1 - i}{2}\right) - \frac{\nu_n}{2}\log|\Lambda_n| - \frac{D}{2}\log\kappa_n + \frac{\nu_n}{2}\log|\Sigma^*| + \frac{1}{2}\operatorname{tr}\!\left(\Sigma^{*-1}\Lambda_n\right) + \frac{\kappa_n}{2}(\mu^* - \mu_n)^\top \Sigma^{*-1}(\mu^* - \mu_n) + \text{effort}(n, s, S)$
s.t. $S \succeq 0$; $\; S_{ii} \geq s_i^2 / n, \; \forall i$.
Step 2: unpack $s, S$ by initializing $x_1, \ldots, x_n \overset{\text{iid}}{\sim} N(\mu^*, \Sigma^*)$ and solving $\min_{x_1, \ldots, x_n} \left\| \operatorname{vec}(s, S) - \sum_{i=1}^n \operatorname{vec}(T(x_i)) \right\|^2$.
31 Examples of Example 6
The target Gaussian is $\mu^* = (0, 0, 0)$ and $\Sigma^* = I$. The learner's NIW prior: $\mu_0 = (1, 1, 1)$, $\kappa_0 = 1$, $\nu_0 = \ldots$, $\Lambda_0 = 10^{-5} I$. Expensive effort: $\text{effort}(n, s, S) = n$.
The optimal $D$ has $n = 4$, unpacked into a tetrahedron. For comparison, four points drawn iid from $N(\mu^*, \Sigma^*)$ have $\text{mean}(TI) = 9.06 \pm 3.34$ and $\min(TI) = 1.99$ over 100,000 trials.
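The slide reports that the optimal set unpacks into a tetrahedron. As a sketch of why that shape is natural (our own illustration, ignoring the small prior-compensation terms): the four vertices of a regular tetrahedron give aggregate sufficient statistics that exactly match a zero mean and an identity-shaped scatter.

```python
# Four points in R^3 whose aggregate sufficient statistics are
# s = sum x_i = 0 and S = sum x_i x_i^T = 4 I.
V = np.array([[ 1.0,  1.0,  1.0],
              [ 1.0, -1.0, -1.0],
              [-1.0,  1.0, -1.0],
              [-1.0, -1.0,  1.0]])

print(V.sum(axis=0))    # s = (0, 0, 0), matching mu* = 0
print(V.T @ V)          # S = 4 * I, matching Sigma* = I up to the factor n = 4
```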