The Blessing and the Curse


1 The Blessing and the Curse of the Multiplicative Updates
Manfred K. Warmuth, University of California, Santa Cruz
CMPS 272, Feb 31, 2012
Thanks to David Ilstrup and Anindya Sen for helping with the slides
Manfred K. Warmuth (UCSC) The Blessing and the Curse 1 / 51

2 Multiplicative updates
Definition: the algorithm maintains a weight vector; weights are multiplied by non-negative factors; motivated by relative entropies as regularizers.
Big questions: Why are multiplicative updates so prevalent in nature? Are additive updates used? Are matrix versions of multiplicative updates used?

3 Machine learning vs nature
Typical multiplicative update in ML: a set of experts predict something on a trial-by-trial basis; s_i is the belief in the i-th expert at trial t.
Update [V,LW]: s_i := s_i e^(-η loss_i) / Z, where η is the learning rate and Z the normalizer.
The Bayes update is a special case.
Nature: weights are concentrations / shares in a population.
Next: an implementation of Bayes in the test tube.

4 Multiplicative updates
Blessing: speed. Curse: loss of variety.
Will give three machine learning methods for preventing the curse and discuss related methods used in nature.

5 In-vitro selection algorithm for finding an RNA strand that binds to a protein
Make a tube of random RNA strands and a protein sheet.
for t = 1 to 20 do
1 Dip the sheet with protein into the tube
2 Pull out and wash off the RNA that stuck
3 Multiply the washed-off RNA back to the original amount (normalization)

6 Modifications
Start with modifications of a good candidate RNA strand.
Put a similar protein into solution; now attaching must be more specific: selection for a better variant.
Can't be done with a computer because of the sheer number of RNA strands per liter of soup.

7 Basic scheme
Start with a unit amount of random RNA.
Loop:
1 Functional separation into good RNA and bad RNA
2 Amplify the good RNA to unit amount

8 Duplicating DNA with PCR
Invented by Kary Mullis, 1985.
Short primers hybridize at the ends; polymerase runs along the DNA strand and complements the bases.
Ideally all DNA strands are multiplied by a factor of 2.

9 In-vitro selection algorithm for proteins
DNA strands are chemically similar; RNA strands are more varied and can form folded structures. (Conjectured that the first enzymes were made of RNA.)
DNA (4 letters) -> RNA (4 letters) -> [Ribosome] -> Protein (20 letters)
Proteins are the materials of life: for any material property, a protein can be found.
How to select for a protein with a certain functionality?

10 Tag proteins with their DNA

11 In-vitro selection for protein
Initialize a tube with tagged random protein strands (the tag of a protein is its RNA coding).
Loop:
1 Functional separation
2 Retrieve the tags from the good protein
3 Reproduce tagged protein from the retrieved tags with the Ribosome
Almost doable...

12 The caveat
Not many interesting/specific functional separations have been found.
Need high throughput for the functional separation.

13 Mathematical description of in-vitro selection
Name the unknown RNA strands 1, 2, ..., n, where n is the number of strands in 1 liter.
s_i >= 0 is the share of RNA i; the contents of the tube are represented as the share vector s = (s_1, s_2, ..., s_n).
When the tube holds a unit amount, s_1 + s_2 + ... + s_n = 1.
W_i in [0, 1] is the fraction of one unit of RNA i that is good; W_i is the fitness of RNA strand i for attaching to the protein.
The protein is represented as the fitness vector W = (W_1, W_2, ..., W_n).
The vectors s, W are unknown: blind computation!
Strong assumption: the fitness W_i is independent of the share vector s.

14 Update
Good RNA in tube s: s_1 W_1 + s_2 W_2 + ... + s_n W_n = s · W
Bad RNA: s_1 (1 - W_1) + s_2 (1 - W_2) + ... + s_n (1 - W_n) = 1 - s · W
Amplification: the good share of RNA i is s_i W_i, multiplied by a factor F; if the amplification is precise, all good RNA is multiplied by the same factor F.
The final tube at the end of the loop is F s · W. Since the final tube holds a unit amount of RNA, F s · W = 1 and F = 1 / (s · W).
Update in each loop: s_i := s_i W_i / (s · W)
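The share update above is easy to simulate; here is a minimal sketch of one selection round (the share and fitness values are made-up illustrations, not data from the talk):

```python
def selection_update(s, W):
    """One round of in-vitro selection: s_i := s_i * W_i / (s . W)."""
    sW = sum(si * wi for si, wi in zip(s, W))  # good fraction s . W
    return [si * wi / sW for si, wi in zip(s, W)]

s = [0.2, 0.3, 0.5]   # shares in the tube (sum to 1)
W = [0.9, 0.5, 0.1]   # fitness: fraction of each strand that sticks
s = selection_update(s, W)
# the shares stay normalized, and high-fitness strands gain share
```

Note that the division by s · W is exactly the normalization step of the loop: the tube is amplified back to a unit amount.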

15 Implementing Bayes rule in RNA
s_i is the prior P(i)
W_i in [0, 1] is the probability P(Good | i)
s · W is the probability P(Good)
s_i := s_i W_i / (s · W) is Bayes rule:
P(i | Good) [posterior] = P(i) [prior] P(Good | i) / P(Good)

16 Iterating Bayes Rule with the same data
[Plot: shares s_1, s_2, s_3 over trials t, with likelihoods W_1 = .9, W_2 = .75, W_3 = .8]
Initial s = (.01, .39, .6), W = (.9, .75, .8)

17 max_i W_i eventually wins
[Plot: the largest share is always increasing, the smallest share always decreasing]
s_{t,i} = s_{0,i} W_i^t / normalization; t is the speed parameter
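The closed form s_{t,i} proportional to s_{0,i} W_i^t makes the curse concrete: the strand with the largest W_i takes over, however small its initial share. A sketch using the numbers from the plot on the previous slide:

```python
def shares_at(t, s0, W):
    """Closed form of t rounds of selection: s_{t,i} ~ s_{0,i} * W_i^t."""
    raw = [si * wi ** t for si, wi in zip(s0, W)]
    Z = sum(raw)  # normalization
    return [r / Z for r in raw]

s0 = [0.01, 0.39, 0.60]   # initial shares (slide 16)
W  = [0.90, 0.75, 0.80]   # fixed likelihoods
late = shares_at(100, s0, W)
# strand 1 has the largest W_i: despite its tiny initial share it
# dominates, and the other shares are driven toward 0 (loss of variety)
```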

18 Negative range of t reveals sigmoids
[Plot: the smallest share follows a reverse sigmoid; the sigmoid of the largest share takes over]

19 The best are too good
Multiplicative update: s_i := s_i · (fitness factor)_i, with a non-negative factor.
Blessing: the best get amplified exponentially fast.
Curse: the best wipe out the others (loss of variety).

20 A key problem
3 strands of RNA: A 15%, B 60%, C 25%.
Want to amplify the mixture while keeping the percentages unchanged.
Iterative amplification will favor one!
Assumption: W_i independent of the shares.

21 Solution
Make one long strand with the right percentages of A, B, C: A B B B C A B B ...
Amplify the same long strand many times, then chop it into its constituents.
The long strand functions as a chromosome: free-floating genes in the nucleus would compete (loss of variety).

22 Coupling preserves diversity
Un-coupled: A, B --PCR--> A, A : A and B compete.
Coupled: A+B --PCR--> A+B : A and B should cooperate.

23 Questions:
What updates to s represent an iteration of an in-vitro selection algorithm?
What updates are possible with blind computation? Must maintain non-negative weights.
Assumptions: s independent of W; s · W measurable.
So far: s_i := s_i W_i / normalization.
More tempered update: s_i := γ s_i (1 - W_i) [BAD] + s_i W_i [GOOD], with 0 <= γ < 1,
i.e. s_i := s_i (γ (1 - W_i) + W_i), a non-negative factor.
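The tempered update keeps a γ-damped fraction of the bad RNA instead of discarding it, which slows the winner-take-all dynamics. A sketch (the share and fitness values are illustrative):

```python
def tempered_update(s, W, gamma):
    """s_i ~ s_i * (gamma * (1 - W_i) + W_i): keep the good part, damp the
    bad part by gamma. gamma = 0 recovers the pure selection update;
    gamma = 1 leaves s unchanged."""
    raw = [si * (gamma * (1 - wi) + wi) for si, wi in zip(s, W)]
    Z = sum(raw)
    return [r / Z for r in raw]

s, W = [0.2, 0.3, 0.5], [0.9, 0.5, 0.1]
hard = tempered_update(s, W, 0.0)   # same as s_i W_i / (s . W)
soft = tempered_update(s, W, 0.8)   # weaker push toward the fittest strand
```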

24 More challenging goal
Find a set of RNA strands that binds to a number of different proteins P_1, ..., P_5 with fitness vectors W_1, ..., W_5.
For all t: in trial t, the functional separation is based on fitness vector W_t.
So far all W_t were the same...

25 In-vitro selection algorithm
Proteins P_1, ..., P_5 induce fitness vectors W_t; goal tube u = (0.5, 0.5, 0, ..., 0).
The two strands cooperatively solve the problem.
Related to a disjunction of two variables.

26 Key problem 2
Goal: starting with a 1-liter tube in which all strands appear with frequency 10^-20, use PCR and ..., to arrive at the tube (0.5, 0.5, 0, ..., 0).
Problem: overtrain with P_1 and P_2: s_1 -> 1, s_2 -> 0. Overtrain with P_4 and P_5: s_1 -> 0, s_2 -> 1.
Want blind computation! W_t and the initial s are unknown.
Need some kind of feedback in each trial.

27 Related machine learning problem
Tube: share vector s in the probability simplex over N.
Proteins: example vectors W_t in {0, 1}^N.
Assumption: there is a u = (0, 0, 1/k, 0, 0, 1/k, 0, 0, 1/k, 0, 0, 0) with k non-zero components of value 1/k such that for all t: u · W_t >= 1/(2k).
Goal: find s such that for all t: s · W_t >= 1/(2k).

28 Normalized Winnow Algorithm (N. Littlestone)
Do passes over the examples W_t:
if s_{t-1} · W_t >= 1/(2k) then s_t := s_{t-1} (conservative update)
else s_{t,i} := s_{t-1,i} if W_{t,i} = 0, and s_{t,i} := α s_{t-1,i} if W_{t,i} = 1, where α > 1, and re-normalize.
O(k log(N/k)) updates are needed when the labels are consistent with a k-literal disjunction.
Logarithmic in N; optimal to within constant factors.
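The algorithm above can be sketched directly; this is a minimal interpretation of the pseudocode (the pass count, α = 2, and the toy examples for the disjunction x_0 or x_1 are illustrative assumptions):

```python
def normalized_winnow(examples, k, alpha=2.0):
    """Sketch of the Normalized Winnow passes described above.
    examples: 0/1 vectors W_t, each assumed to satisfy the hidden
    k-literal disjunction (so u . W_t >= 1/(2k) for the hidden u)."""
    N = len(examples[0])
    s = [1.0 / N] * N                      # uniform start on the simplex
    for _ in range(50):                    # passes over the data
        updated = False
        for W in examples:
            if sum(si * wi for si, wi in zip(s, W)) >= 1 / (2 * k):
                continue                   # conservative: no update
            s = [si * (alpha if wi else 1.0) for si, wi in zip(s, W)]
            Z = sum(s)
            s = [si / Z for si in s]       # re-normalize
            updated = True
        if not updated:
            break                          # every example clears the threshold
    return s

# target disjunction x_0 or x_1 (k = 2) over N = 8 variables:
examples = [[1,0,0,0,0,0,0,0], [0,1,0,0,1,0,0,0], [1,1,0,0,0,0,0,0]]
s = normalized_winnow(examples, k=2)
# afterwards s . W_t >= 1/(2k) holds for every example
```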

29 Back to in-vitro selection
Fix: if the good side is large enough, then don't update:
if s_{t-1} · W_t >= δ then s_t := s_{t-1} (conservative update)
else s_{t,i} := s_{t-1,i} (γ_t (1 - W_{t,i}) + W_{t,i}), where 0 <= γ_t < 1 (a non-negative factor), and re-normalize.
Can simulate Normalized Winnow.
Concerns: How exactly can we measure s · W? Some RNA might be inactive!

30 Problems with in-vitro selection
Story so far: for learning multiple goals, prevent overtraining with the conservative update.
Alternate trick: cap the weights. Start with nature, then how we used the same trick in machine learning.

31 Alternate trick: cap weights
Super-predator algorithm: preserves variety.
(c) Lion copyrights belong to Andy Warhol

32 Alternate trick: cap weights
s~_i = s_i e^(-η Loss_i) / Z
s = capped and rescaled version of s~: cap the large weights and re-scale the rest.

33 Why capping?
Sets of size m are encoded as probability vectors (0, 1/m, 0, 0, 1/m, ..., 0, 1/m), called m-corners; there are (n choose m) of them.
The convex hull of the m-corners = the capped probability simplex; for n = 3, m = 2 the corners are (1/2, 0, 1/2), (0, 1/2, 1/2), (1/2, 1/2, 0).
The combined update converges to the best corner of the capped simplex, i.e. the best set of size m.
Capping lets us learn multiple goals. How to cap in the in-vitro selection setting?
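One simple way to realize "cap and re-scale the rest" is to clip any weight above 1/m and redistribute the remaining mass proportionally among the uncapped weights, repeating until nothing exceeds the cap. This clip-and-rescale loop is a sketch of the idea, not necessarily the exact procedure from the talk (it assumes m <= n, so a valid capped vector exists):

```python
def cap_and_rescale(s, m):
    """Cap weights at 1/m while keeping the total equal to 1.
    Repeats because rescaling can push new weights over the cap."""
    cap = 1.0 / m
    s = list(s)
    capped = [False] * len(s)
    while True:
        over = [i for i, si in enumerate(s) if not capped[i] and si > cap]
        if not over:
            return s
        for i in over:
            s[i], capped[i] = cap, True
        free = 1.0 - cap * sum(capped)     # mass left for the uncapped weights
        total = sum(si for i, si in enumerate(s) if not capped[i])
        for i in range(len(s)):
            if not capped[i]:
                s[i] *= free / total       # proportional redistribution

capped = cap_and_rescale([0.6, 0.25, 0.1, 0.05], m=2)
# no weight exceeds 1/2 and the weights still sum to 1
```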

34 What if the environment changes over time?
The initial tube contains different strands. After a while: loss of variety; some selfish strand manages to dominate without doing the wanted function.
Fixes: mix in a little bit of the initial soup for enrichment, or do sloppy PCR that introduces mutations.

35 Disk spindown problem
Experts: a set of n fixed timeouts W_1, W_2, ..., W_n.
The master algorithm maintains a set of weights s_1, s_2, ..., s_n.
Multiplicative update: s_{t+1,i} = s_{t,i} e^(-η · (energy usage of timeout i)) / Z
Problem: a burst favors the long timeout; the coefficients of the small timeouts are wiped out.

36 Disk spindown problem
Fix: mix in a little bit of the uniform vector:
s~ = multiplicative update; s = (1 - α) s~ + α (1/N, 1/N, ..., 1/N), with α small.
Nature uses mutation for the same purpose.
Better fix: keep track of the past average share vector r:
s~ = multiplicative update; s = (1 - α) s~ + α r
This facilitates a switch to a previously useful vector: long-term memory.
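Both fixes are one extra mixing line after the multiplicative update, and the mixing guarantees every weight stays at least α/N away from zero. A sketch (the learning rate, mixing rate, and loss values are illustrative):

```python
import math

def mult_update(s, losses, eta=0.5):
    """s_i := s_i * exp(-eta * loss_i) / Z."""
    raw = [si * math.exp(-eta * li) for si, li in zip(s, losses)]
    Z = sum(raw)
    return [r / Z for r in raw]

def mix(s, v, alpha=0.05):
    """s := (1 - alpha) s + alpha v, with v the uniform vector
    or the past average share vector r."""
    return [(1 - alpha) * si + alpha * vi for si, vi in zip(s, v)]

N = 4
s = [1.0 / N] * N
uniform = [1.0 / N] * N
for _ in range(200):                  # long burst favoring expert 0
    s = mix(mult_update(s, [0.0, 1.0, 1.0, 1.0]), uniform)
# expert 0 dominates, but every weight stays bounded below by alpha / N,
# so a later switch to another expert can be picked up quickly
```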

37 Question: Long-term memory: how is it realized in nature? Sex? Junk DNA?
Summary:
1 Conservative update - learn multiple goals
2 Upper bounding weights (capping)
3 Lower bounding weights - robust against change

38 Motivations of the updates?
Motivate the additive and multiplicative updates, using linear regression as the example problem.
Deep underlying question: what are the updates used by nature, and why?

39 Online linear regression
For t = 1, 2, ...:
Get instance x_t in R^n
Predict y^_t = s_t · x_t
Get label y_t in R
Incur square loss (y_t - y^_t)^2
Update s_t -> s_{t+1}

40 Two main update families - linear regression
Additive: s_{t+1} = s_t - η (s_t · x_t - y_t) x_t, where (s_t · x_t - y_t) x_t is the gradient of the square loss.
Motivated by the squared Euclidean distance; weights can go negative. Gradient Descent (GD) - SKIING.
Multiplicative: s_{t+1,i} = s_{t,i} e^(-η (s_t · x_t - y_t) x_{t,i}) / Z_t
Motivated by the relative entropy; the updated weight vector stays on the probability simplex. Exponentiated Gradient (EG) - LIFE! [KW97]
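The two families side by side for a single trial; a minimal sketch (the starting weights, instance, and learning rate are made up to show the contrast):

```python
import math

def gd_step(s, x, y, eta):
    """Additive: s := s - eta * (s.x - y) x. Weights may go negative."""
    g = sum(si * xi for si, xi in zip(s, x)) - y
    return [si - eta * g * xi for si, xi in zip(s, x)]

def eg_step(s, x, y, eta):
    """Multiplicative: s_i := s_i * exp(-eta * (s.x - y) x_i) / Z.
    The weights stay positive and on the probability simplex."""
    g = sum(si * xi for si, xi in zip(s, x)) - y
    raw = [si * math.exp(-eta * g * xi) for si, xi in zip(s, x)]
    Z = sum(raw)
    return [r / Z for r in raw]

s0 = [0.25, 0.25, 0.25, 0.25]
x, y = [1.0, 0.0, 2.0, 0.0], 0.1
gd = gd_step(s0, x, y, eta=0.2)   # one coordinate is driven below zero here
eg = eg_step(s0, x, y, eta=0.2)   # all coordinates remain positive, sum to 1
```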

41 Additive updates
Goal: minimize a tradeoff between closeness to the last share vector and the loss on the last example:
s_{t+1} = argmin_s U(s), where U(s) = ||s - s_t||_2^2 [divergence] + η (s · x_t - y_t)^2 [loss]
η > 0 is the learning rate / speed.

42 Additive updates
Therefore, dU(s)/ds_i at s = s_{t+1}: 2 (s_{t+1,i} - s_{t,i}) + 2η (s · x_t - y_t) x_{t,i} = 0
implicit: s_{t+1} = s_t - η (s_{t+1} · x_t - y_t) x_t
explicit: s_{t+1} = s_t - η (s_t · x_t - y_t) x_t

43 Multiplicative updates
s_{t+1} = argmin over {s : Σ_i s_i = 1} of U(s), where U(s) = Σ_i s_i ln(s_i / s_{t,i}) [relative entropy] + η (s · x_t - y_t)^2
Define the Lagrangian L(s) = Σ_i s_i ln(s_i / s_{t,i}) + η (s · x_t - y_t)^2 + λ (Σ_i s_i - 1), where λ is the Lagrange coefficient.

44 Multiplicative updates
dL(s)/ds_i = ln(s_i / s_{t,i}) + 1 + η (s · x_t - y_t) x_{t,i} + λ = 0
ln(s_{t+1,i} / s_{t,i}) = -η (s_{t+1} · x_t - y_t) x_{t,i} - λ - 1
s_{t+1,i} = s_{t,i} e^(-η (s_{t+1} · x_t - y_t) x_{t,i}) e^(-λ-1)
Enforce the normalization constraint by setting e^(-λ-1) to 1/Z_t:
implicit: s_{t+1,i} = s_{t,i} e^(-η (s_{t+1} · x_t - y_t) x_{t,i}) / Z_t
explicit: s_{t+1,i} = s_{t,i} e^(-η (s_t · x_t - y_t) x_{t,i}) / Z_t

45 Connection to Biology
One species for each of the n dimensions.
The fitness rate of species i in the time interval (t, t+1] is clamped to w_i = -η (s_{t+1} · x_t - y_t) x_{t,i}, with fitness W_i = e^(w_i).
The algorithm can be seen as running a population.
Can't go to continuous time, since the data points arrive at discrete times 1, 2, 3, ...

46 GD via differential equations
Here f'(x_t) denotes the derivative of f(x_t) with respect to t. The flow is s_t' = -η ∇L(s_t). Two ways to discretize:
Forward Euler: (s_{t+h} - s_t)/h = -η ∇L(s_t), so s_{t+h} = s_t - η h ∇L(s_t)
explicit (h = 1): s_{t+1} = s_t - η ∇L(s_t)
Backward Euler: (s_{t+h} - s_t)/h = -η ∇L(s_{t+h}), so s_{t+h} = s_t - η h ∇L(s_{t+h})
implicit (h = 1): s_{t+1} = s_t - η ∇L(s_{t+1})
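For the square loss L(s) = (s · x_t - y_t)^2 the implicit (backward-Euler) update can be solved in closed form: dotting s_{t+1} = s_t - η (s_{t+1} · x_t - y_t) x_t with x_t gives s_{t+1} · x_t = (s_t · x_t + η y_t ||x_t||^2) / (1 + η ||x_t||^2), after which s_{t+1} follows directly. A sketch of both discretizations (the example numbers are made up; the factor 2 of the gradient is absorbed into η, as in the slides):

```python
def explicit_gd(s, x, y, eta):
    """Forward Euler: gradient taken at the old point s_t."""
    g = sum(si * xi for si, xi in zip(s, x)) - y
    return [si - eta * g * xi for si, xi in zip(s, x)]

def implicit_gd(s, x, y, eta):
    """Backward Euler: s_{t+1} = s_t - eta * (s_{t+1}.x - y) x, solved
    exactly by first solving for the new prediction a = s_{t+1}.x."""
    xx = sum(xi * xi for xi in x)
    a = (sum(si * xi for si, xi in zip(s, x)) + eta * y * xx) / (1 + eta * xx)
    return [si - eta * (a - y) * xi for si, xi in zip(s, x)]

s, x, y, eta = [0.5, -0.2], [1.0, 2.0], 1.0, 0.4
s_exp = explicit_gd(s, x, y, eta)
s_imp = implicit_gd(s, x, y, eta)
# s_imp satisfies the implicit fixed-point equation exactly
```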

47 EGU via differential equation
The flow is (log s_t)' = -η ∇L(s_t). Two ways to discretize:
Forward Euler: (log s_{t+h,i} - log s_{t,i})/h = -η ∇L(s_t)_i, so s_{t+h,i} = s_{t,i} e^(-η h ∇L(s_t)_i)
explicit (h = 1): s_{t+1,i} = s_{t,i} e^(-η ∇L(s_t)_i)
Backward Euler: (log s_{t+h,i} - log s_{t,i})/h = -η ∇L(s_{t+h})_i, so s_{t+h,i} = s_{t,i} e^(-η h ∇L(s_{t+h})_i)
implicit (h = 1): s_{t+1,i} = s_{t,i} e^(-η ∇L(s_{t+1})_i)
The derivation of the normalized update is more involved.

48 EG via differential equation
Handle the normalization by going one dimension lower:
(ln( s_{t,i} / (1 - Σ_{j=1}^{n-1} s_{t,j}) ))' = -η (∇L(s_t)_i - ∇L(s_t)_n)
Backward Euler: ln( s_{t+1,i} / (1 - Σ_{j=1}^{n-1} s_{t+1,j}) ) - ln( s_{t,i} / (1 - Σ_{j=1}^{n-1} s_{t,j}) ) = -η (∇L(s_{t+1})_i - ∇L(s_{t+1})_n)
implicit: s_{t+1,i} = s_{t,i} e^(-η ∇L(s_{t+1})_i) / Σ_{j=1}^n s_{t,j} e^(-η ∇L(s_{t+1})_j)

49 Motivation of Bayes Rule
s_{t+1} = argmin over {s : Σ_i s_i = 1} of U(s), where U(s) = Σ_i s_i ln(s_i / s_{t,i}) [relative entropy] + Σ_i s_i (-ln P(y_t | y_{t-1}, ..., y_0, i))
Plugging in the solution s_{t+1,i} = s_{t,i} P(y_t | y_{t-1}, ..., y_0, i) / P(y_t | y_{t-1}, ..., y_0) gives
U(s_{t+1}) = -ln Σ_i s_{t,i} P(y_t | y_{t-1}, ..., y_0, i) = -ln P(y_t | y_{t-1}, ..., y_0)

50 Summary
Multiplicative updates converge quickly, but wipe out diversity.
Changing conditions require the reuse of previously learned knowledge/alternatives; diversity is a requirement for success.
Multiplicative updates occur in nature; diversity is preserved through mutations, linking, and the super-predator.
Machine learning preserves diversity through the conservative update, capping, lower bounding weights, weighted mixtures of the past, ...

51 Final question?
Does nature use the matrix version of the multiplicative updates?
The parameter is a density matrix, which maintains uncertainty over directions as a symmetric positive definite matrix:
S~ = exp(log S - η ∇L(S)); S := S~ / trace(S~)
Quantum relative entropy as the regularizer.
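The matrix update applies exp and log through the eigendecomposition of the symmetric matrix. A 2x2 pure-Python sketch; the eigendecomposition helper and the example matrices S, G are illustrative assumptions (the helper also assumes distinct eigenvalues):

```python
import math

def sym2_fun(M, f):
    """Apply a scalar function f to a symmetric 2x2 matrix via its
    eigendecomposition (assumes the two eigenvalues are distinct)."""
    (a, b), (_, c) = M
    mid = (a + c) / 2.0
    rad = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    out = [[0.0, 0.0], [0.0, 0.0]]
    for lam in (mid + rad, mid - rad):
        if abs(b) > 1e-12:
            v = [b, lam - a]
        else:                           # diagonal matrix: axis eigenvectors
            v = [1.0, 0.0] if abs(lam - a) < abs(lam - c) else [0.0, 1.0]
        n = math.hypot(v[0], v[1])
        v = [v[0] / n, v[1] / n]
        for i in range(2):
            for j in range(2):
                out[i][j] += f(lam) * v[i] * v[j]
    return out

def matrix_eg_step(S, G, eta):
    """One matrix multiplicative update: exp(log S - eta * G),
    re-normalized to unit trace (S must be positive definite)."""
    L = sym2_fun(S, math.log)
    A = [[L[i][j] - eta * G[i][j] for j in range(2)] for i in range(2)]
    E = sym2_fun(A, math.exp)
    tr = E[0][0] + E[1][1]
    return [[E[i][j] / tr for j in range(2)] for i in range(2)]

S = [[0.6, 0.1], [0.1, 0.4]]            # density matrix, trace 1
G = [[1.0, 0.0], [0.0, 0.0]]            # gradient penalizing direction e_1
S_new = matrix_eg_step(S, G, eta=0.5)
# S_new is symmetric, has unit trace, and shifts weight away from e_1
```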


Stochastic Gradient Descent. Ryan Tibshirani Convex Optimization Stochastic Gradient Descent Ryan Tibshirani Convex Optimization 10-725 Last time: proximal gradient descent Consider the problem min x g(x) + h(x) with g, h convex, g differentiable, and h simple in so

More information

Logistic Regression. INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber SLIDES ADAPTED FROM HINRICH SCHÜTZE

Logistic Regression. INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber SLIDES ADAPTED FROM HINRICH SCHÜTZE Logistic Regression INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber SLIDES ADAPTED FROM HINRICH SCHÜTZE INFO-2301: Quantitative Reasoning 2 Paul and Boyd-Graber Logistic Regression

More information

Introduction to Logistic Regression and Support Vector Machine

Introduction to Logistic Regression and Support Vector Machine Introduction to Logistic Regression and Support Vector Machine guest lecturer: Ming-Wei Chang CS 446 Fall, 2009 () / 25 Fall, 2009 / 25 Before we start () 2 / 25 Fall, 2009 2 / 25 Before we start Feel

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized

More information

Machine Learning for NLP

Machine Learning for NLP Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline

More information

Gene Switches Teacher Information

Gene Switches Teacher Information STO-143 Gene Switches Teacher Information Summary Kit contains How do bacteria turn on and turn off genes? Students model the action of the lac operon that regulates the expression of genes essential for

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric

More information

Relative Loss Bounds for Multidimensional Regression Problems

Relative Loss Bounds for Multidimensional Regression Problems Relative Loss Bounds for Multidimensional Regression Problems Jyrki Kivinen and Manfred Warmuth Presented by: Arindam Banerjee A Single Neuron For a training example (x, y), x R d, y [0, 1] learning solves

More information

Multilayer Perceptron

Multilayer Perceptron Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4

More information

Online Passive-Aggressive Algorithms

Online Passive-Aggressive Algorithms Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il

More information

1. In most cases, genes code for and it is that

1. In most cases, genes code for and it is that Name Chapter 10 Reading Guide From DNA to Protein: Gene Expression Concept 10.1 Genetics Shows That Genes Code for Proteins 1. In most cases, genes code for and it is that determine. 2. Describe what Garrod

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data January 17, 2006 Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Multi-Layer Perceptrons Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole

More information

1 Review of the Perceptron Algorithm

1 Review of the Perceptron Algorithm COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #15 Scribe: (James) Zhen XIANG April, 008 1 Review of the Perceptron Algorithm In the last few lectures, we talked about various kinds

More information

Part 1: Expectation Propagation

Part 1: Expectation Propagation Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.5. Spring 2010 Instructor: Dr. Masoud Yaghini Outline How the Brain Works Artificial Neural Networks Simple Computing Elements Feed-Forward Networks Perceptrons (Single-layer,

More information

Putting Bayes to sleep

Putting Bayes to sleep Putting Bayes to sleep Wouter M. Koolen Dmitry Adamskiy Manfred Warmuth Thursday 30 th August, 2012 Menu How a beautiful and intriguing piece of technology provides new insights in existing methods and

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

Adaptive Gradient Methods AdaGrad / Adam

Adaptive Gradient Methods AdaGrad / Adam Case Study 1: Estimating Click Probabilities Adaptive Gradient Methods AdaGrad / Adam Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade 1 The Problem with GD (and SGD)

More information

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. Converting online to batch. Online convex optimization.

More information

COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017

COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University UNDERDETERMINED LINEAR EQUATIONS We

More information

Neural Network Training

Neural Network Training Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification

More information

Statistical Computing (36-350)

Statistical Computing (36-350) Statistical Computing (36-350) Lecture 19: Optimization III: Constrained and Stochastic Optimization Cosma Shalizi 30 October 2013 Agenda Constraints and Penalties Constraints and penalties Stochastic

More information

Advanced computational methods X Selected Topics: SGD

Advanced computational methods X Selected Topics: SGD Advanced computational methods X071521-Selected Topics: SGD. In this lecture, we look at the stochastic gradient descent (SGD) method 1 An illustrating example The MNIST is a simple dataset of variety

More information

More Protein Synthesis and a Model for Protein Transcription Error Rates

More Protein Synthesis and a Model for Protein Transcription Error Rates More Protein Synthesis and a Model for Protein James K. Peterson Department of Biological Sciences and Department of Mathematical Sciences Clemson University October 3, 2013 Outline 1 Signal Patterns Example

More information

K-Lists. Anindya Sen and Corrie Scalisi. June 11, University of California, Santa Cruz. Anindya Sen and Corrie Scalisi (UCSC) K-Lists 1 / 25

K-Lists. Anindya Sen and Corrie Scalisi. June 11, University of California, Santa Cruz. Anindya Sen and Corrie Scalisi (UCSC) K-Lists 1 / 25 K-Lists Anindya Sen and Corrie Scalisi University of California, Santa Cruz June 11, 2007 Anindya Sen and Corrie Scalisi (UCSC) K-Lists 1 / 25 Outline 1 Introduction 2 Experts 3 Noise-Free Case Deterministic

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Expectation Maximization Mark Schmidt University of British Columbia Winter 2018 Last Time: Learning with MAR Values We discussed learning with missing at random values in data:

More information

Machine Learning Lecture 5

Machine Learning Lecture 5 Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

GCD3033:Cell Biology. Transcription

GCD3033:Cell Biology. Transcription Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

CS598 Machine Learning in Computational Biology (Lecture 5: Matrix - part 2) Professor Jian Peng Teaching Assistant: Rongda Zhu

CS598 Machine Learning in Computational Biology (Lecture 5: Matrix - part 2) Professor Jian Peng Teaching Assistant: Rongda Zhu CS598 Machine Learning in Computational Biology (Lecture 5: Matrix - part 2) Professor Jian Peng Teaching Assistant: Rongda Zhu Feature engineering is hard 1. Extract informative features from domain knowledge

More information

Machine Learning Lecture Notes

Machine Learning Lecture Notes Machine Learning Lecture Notes Predrag Radivojac January 25, 205 Basic Principles of Parameter Estimation In probabilistic modeling, we are typically presented with a set of observations and the objective

More information

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5 Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5 Slides adapted from Jordan Boyd-Graber, Tom Mitchell, Ziv Bar-Joseph Machine Learning: Chenhao Tan Boulder 1 of 27 Quiz question For

More information

Expectation propagation for signal detection in flat-fading channels

Expectation propagation for signal detection in flat-fading channels Expectation propagation for signal detection in flat-fading channels Yuan Qi MIT Media Lab Cambridge, MA, 02139 USA yuanqi@media.mit.edu Thomas Minka CMU Statistics Department Pittsburgh, PA 15213 USA

More information

Linear Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com

Linear Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com Linear Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these

More information

Midterm: CS 6375 Spring 2018

Midterm: CS 6375 Spring 2018 Midterm: CS 6375 Spring 2018 The exam is closed book (1 cheat sheet allowed). Answer the questions in the spaces provided on the question sheets. If you run out of room for an answer, use an additional

More information

Logistic Regression Trained with Different Loss Functions. Discussion

Logistic Regression Trained with Different Loss Functions. Discussion Logistic Regression Trained with Different Loss Functions Discussion CS640 Notations We restrict our discussions to the binary case. g(z) = g (z) = g(z) z h w (x) = g(wx) = + e z = g(z)( g(z)) + e wx =

More information

Deep Feedforward Networks

Deep Feedforward Networks Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3

More information

Learning with Large Number of Experts: Component Hedge Algorithm

Learning with Large Number of Experts: Component Hedge Algorithm Learning with Large Number of Experts: Component Hedge Algorithm Giulia DeSalvo and Vitaly Kuznetsov Courant Institute March 24th, 215 1 / 3 Learning with Large Number of Experts Regret of RWM is O( T

More information

Logistic Regression. Will Monroe CS 109. Lecture Notes #22 August 14, 2017

Logistic Regression. Will Monroe CS 109. Lecture Notes #22 August 14, 2017 1 Will Monroe CS 109 Logistic Regression Lecture Notes #22 August 14, 2017 Based on a chapter by Chris Piech Logistic regression is a classification algorithm1 that works by trying to learn a function

More information

Lecture 9: PGM Learning

Lecture 9: PGM Learning 13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and

More information

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.

More information

> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel

> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table

More information

Dynamic Approaches: The Hidden Markov Model

Dynamic Approaches: The Hidden Markov Model Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message

More information

Data-Intensive Computing with MapReduce

Data-Intensive Computing with MapReduce Data-Intensive Computing with MapReduce Session 8: Sequence Labeling Jimmy Lin University of Maryland Thursday, March 14, 2013 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574

More information