Relative Loss Bounds for Multidimensional Regression Problems

1 Relative Loss Bounds for Multidimensional Regression Problems
Jyrki Kivinen and Manfred Warmuth
Presented by: Arindam Banerjee

2-7 A Single Neuron
For a training example (x, y), x ∈ R^d, y ∈ [0, 1], learning solves min_w (y − φ(w^T x))^2
The transfer function φ: R → R is increasing
Popular choice: the logistic function φ(z) = 1/(1 + exp(−z))
Squared loss and logistic transfer do not match
This may lead to exponentially many local minima: each dimension can contribute a local minimum per example, so with n examples in d dimensions there can be ⌊n/d⌋^d local minima
Motivates matching the loss function to the transfer function
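The non-convexity is easy to observe numerically, even in one dimension. A minimal sketch (a hypothetical check, not from the slides): evaluate the squared loss with a logistic transfer along a 1-d weight axis and look for points of negative curvature.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sq_loss(w, x=1.0, y=0.0):
    # Squared loss with logistic transfer for a single example (x, y).
    return (y - sigmoid(w * x)) ** 2

ws = np.linspace(-6.0, 6.0, 121)
h = ws[1] - ws[0]
# Central second differences as a crude curvature probe.
curv = (sq_loss(ws[2:]) - 2 * sq_loss(ws[1:-1]) + sq_loss(ws[:-2])) / h**2
print("min curvature:", curv.min())   # negative, so the loss is not convex in w
```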

8-12 Generalized Linear Models
Employ matching loss and transfer functions: Σ_i L_ψ(y_i, φ(w^T x_i))
This is a convex function of w with a single minimum
For the logistic transfer function, the matching loss is L_ψ(y, ŷ) = KL(y‖ŷ) = y log(y/ŷ) + (1 − y) log((1 − y)/(1 − ŷ))
The empirical loss is convex in w: w* = argmin_w Σ_i KL(y_i ‖ 1/(1 + exp(−w^T x_i)))
We ignore the statistical connections for now
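In contrast to the squared-loss check above, a quick sketch (again hypothetical and 1-dimensional) shows that the matching loss for the logistic transfer has nonnegative curvature along the weight axis.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kl_matching_loss(w, x=1.0, y=0.25):
    # Matching loss for the logistic transfer: KL(y || sigmoid(w*x)).
    p = sigmoid(w * x)
    return y * np.log(y / p) + (1 - y) * np.log((1 - y) / (1 - p))

ws = np.linspace(-6.0, 6.0, 121)
h = ws[1] - ws[0]
curv = (kl_matching_loss(ws[2:]) - 2 * kl_matching_loss(ws[1:-1])
        + kl_matching_loss(ws[:-2])) / h**2
print("min curvature:", curv.min())   # >= 0 up to rounding: convex in w
```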

13-16 Bregman Divergences
Let f be a differentiable, strictly convex function with gradient ψ, i.e., ∇f(x) = ψ(x). The Bregman divergence L_ψ is defined as
L_ψ(x, y) = f(x) − f(y) − (x − y)^T ψ(y)
(Figure: the curve f(z) with its tangent h(z) drawn at y; L_ψ(x, y) is the gap between f(x) and the tangent, evaluated at x.)
f(z) = ‖z‖²/2 gives L_ψ(x, y) = ‖x − y‖²/2: the squared Euclidean distance is a Bregman divergence
f(z) = z log z gives L_ψ(x, y) = x log(x/y) − x + y: the relative entropy (KL divergence) is another Bregman divergence
f(z) = −log z gives L_ψ(x, y) = x/y − log(x/y) − 1: the Itakura-Saito distance is another Bregman divergence
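A generic Bregman divergence is easy to compute directly from the definition. A minimal sketch (helper names are hypothetical) that checks the three examples numerically, applying the scalar potentials coordinate-wise:

```python
import numpy as np

def bregman(f, grad_f, x, y):
    # L_psi(x, y) = f(x) - f(y) - (x - y)^T grad_f(y)
    return f(x) - f(y) - np.dot(x - y, grad_f(y))

x = np.array([0.2, 0.5, 0.3])
y = np.array([0.4, 0.4, 0.2])

# Squared Euclidean distance: f(z) = ||z||^2 / 2
d_euc = bregman(lambda z: 0.5 * np.dot(z, z), lambda z: z, x, y)
print(np.isclose(d_euc, 0.5 * np.sum((x - y) ** 2)))

# Relative entropy (generalized KL): f(z) = sum z log z
d_kl = bregman(lambda z: np.sum(z * np.log(z)), lambda z: np.log(z) + 1.0, x, y)
print(np.isclose(d_kl, np.sum(x * np.log(x / y) - x + y)))

# Itakura-Saito distance: f(z) = -sum log z
d_is = bregman(lambda z: -np.sum(np.log(z)), lambda z: -1.0 / z, x, y)
print(np.isclose(d_is, np.sum(x / y - np.log(x / y) - 1.0)))
```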

17 Properties of Bregman Divergences
L_ψ(x, y) ≥ 0, and equals 0 iff x = y
Not necessarily symmetric, i.e., L_ψ(x, y) ≠ L_ψ(y, x) in general
Not a metric, i.e., the triangle inequality does not hold
Strictly convex in the first argument
Not necessarily convex in the second argument
Three-point property: L_ψ(x, y) = L_ψ(x, z) + L_ψ(z, y) − (x − z)^T (ψ(y) − ψ(z))
Generalized Pythagorean theorem: L_ψ(x, y) ≥ L_ψ(x, z) + L_ψ(z, y) when z is the Bregman projection of y onto a convex set containing x (with equality for affine sets)
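A quick numerical sanity check of the three-point property, using the generalized KL divergence (the values are hypothetical):

```python
import numpy as np

def kl_div(x, y):
    # Generalized KL divergence, the Bregman divergence with psi(z) = log(z) + 1.
    return np.sum(x * np.log(x / y) - x + y)

psi = lambda z: np.log(z) + 1.0

x = np.array([0.2, 0.5, 0.3])
y = np.array([0.4, 0.4, 0.2])
z = np.array([0.3, 0.3, 0.4])

lhs = kl_div(x, y)
rhs = kl_div(x, z) + kl_div(z, y) - np.dot(x - z, psi(y) - psi(z))
print(np.isclose(lhs, rhs))   # True: the identity holds exactly
```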

18-21 Bregman Divergence and The Conjugate
Recall that f*(λ) = sup_x̃ (λ^T x̃ − f(x̃)) = λ^T x − f(x), where x attains the supremum
Likewise f(x) = sup_λ̃ (λ̃^T x − f*(λ̃)) = λ^T x − f*(λ), where λ attains the supremum
Hence, f(x) + f*(λ) = x^T λ for such dual pairs (x, λ)
Further, with ψ(x) = ∇f(x) and φ(λ) = ∇f*(λ): λ = ∇f(x) = ψ(x) and x = ∇f*(λ) = φ(λ)
As a result, L_ψ(x_1, x_2) = f(x_1) + f*(λ_2) − x_1^T λ_2 = L_φ(λ_2, λ_1)
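The duality L_ψ(x_1, x_2) = L_φ(λ_2, λ_1) can be verified numerically. A small sketch (a hypothetical choice of potential, not from the slides) with f(x) = Σ (x log x − x), so that ψ = ∇f = log and φ = ∇f* = exp:

```python
import numpy as np

f      = lambda x: np.sum(x * np.log(x) - x)   # psi = grad f = log
f_star = lambda lam: np.sum(np.exp(lam))       # phi = grad f* = exp

def bregman(F, gradF, a, b):
    return F(a) - F(b) - np.dot(a - b, gradF(b))

x1 = np.array([0.2, 0.5, 0.3])
x2 = np.array([0.4, 0.4, 0.2])
lam1, lam2 = np.log(x1), np.log(x2)            # dual parameters of x1, x2

lhs = bregman(f, np.log, x1, x2)               # L_psi(x1, x2)
rhs = bregman(f_star, np.exp, lam2, lam1)      # L_phi(lam2, lam1)
print(np.isclose(lhs, rhs))                    # True
```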

22-29 Generalized Linear Models, Take 2
For matching loss and transfer functions: L_ψ(y_i, φ(w^T x_i)) = L_φ(w^T x_i, ψ(y_i))
The loss is a convex function of w with a single minimum
The strategy for getting a matching loss (a numerical instance is sketched below):
Choose your favorite increasing transfer function φ
Let ψ = φ^{-1}
Let L_ψ be the Bregman divergence derived from ψ
Set up the problem as min_w Σ_i L_ψ(y_i, φ(w^T x_i))
Note that ψ = ∇f and φ = ∇f* are inverses of each other
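As a concrete instance of the recipe (a hypothetical check, using the integral form of the scalar Bregman divergence, L_ψ(y, ŷ) = ∫ from ŷ to y of (ψ(u) − ψ(ŷ)) du): choosing φ = sigmoid gives ψ = logit, and the resulting matching loss agrees with the KL divergence from slides 8-12.

```python
import numpy as np

logit = lambda p: np.log(p / (1.0 - p))   # psi = phi^{-1} for phi = sigmoid

def matching_loss(y, yhat, n=100001):
    # L_psi(y, yhat) = integral from yhat to y of (logit(u) - logit(yhat)) du
    u = np.linspace(yhat, y, n)
    g = logit(u) - logit(yhat)
    return np.sum((g[1:] + g[:-1]) / 2.0) * (u[1] - u[0])   # trapezoid rule

def kl(y, yhat):
    return y * np.log(y / yhat) + (1 - y) * np.log((1 - y) / (1 - yhat))

y, yhat = 0.8, 0.3
print(matching_loss(y, yhat))   # numerically matches ...
print(kl(y, yhat))              # ... the KL matching loss
```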

30 Online Learning: The Basic Setting
Prediction proceeds in trials t = 1, ..., T
At trial t, the algorithm
Gets an input x_t
Makes a prediction ŷ_t = φ(w_t^T x_t)
Receives the true output y_t
Updates the weight vector to w_{t+1}
With the identity transfer function, the loss is L_ψ(y_t, φ(w_t^T x_t)) = L_φ(w_t^T x_t, ψ(y_t)) = (y_t − w_t^T x_t)^2
The corresponding gradient descent update, with ŷ_t = w_t^T x_t, is w_{t+1} = w_t − η(ŷ_t − y_t) x_t
What happens in the general case?
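A minimal online loop for this basic setting (identity transfer, squared loss); the synthetic data below is hypothetical, just to make the sketch runnable.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, eta = 5, 200, 0.05
w_true = rng.normal(size=d)

w = np.zeros(d)                        # w_1
cumulative_loss = 0.0
for t in range(T):
    x_t = rng.normal(size=d)
    y_hat = w @ x_t                    # prediction with the identity transfer
    y_t = w_true @ x_t                 # true output, revealed after predicting
    cumulative_loss += (y_t - y_hat) ** 2
    w = w - eta * (y_hat - y_t) * x_t  # gradient descent update

print("cumulative loss:", cumulative_loss)
```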

31-35 The General Setting
General: multidimensional output, arbitrary matching loss
The loss on trial t is L_ψ(y_t, φ(W_t x_t)) = L_φ(W_t x_t, ψ(y_t)), where W_t is the weight matrix at trial t
Consider the parameter matrix Θ_t = φ(W_t), so that W_t = ψ(Θ_t)
We want to get W = W_{t+1} as a regularized update:
It should have low loss on the current example (x_t, y_t)
It should be close to the current weight matrix W_t
The cost function to minimize is U(W) = L_φ(W, W_t) + η L_ψ(y_t, φ(W x_t)) = L_φ(W, W_t) + η L_φ(W x_t, ψ(y_t))

36-42 The Gradient Update
Setting the gradient of U with respect to w^j to 0 gives φ(w^j_{t+1}) = φ(w^j_t) − η(φ_j(W_{t+1} x_t) − y^j_t) x_t
Recall that Θ_{t+1} = φ(W_{t+1}), with θ^j_{t+1} = φ(w^j_{t+1})
The update equation: θ^j_{t+1} = θ^j_t − η(φ_j(W_{t+1} x_t) − y^j_t) x_t ≈ θ^j_t − η(φ_j(W_t x_t) − y^j_t) x_t = θ^j_t − η(ŷ^j_t − y^j_t) x_t
The updated weight matrix is W_{t+1} = ψ(Θ_{t+1})
The weight matrix W_{t+1} thus gets updated through the Θ_{t+1} representation
The additive update is in the conjugate space
GD and EG are special cases
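A sketch of one step of this general additive update for a single output unit (hypothetical; it decouples the weight-space link pair from the prediction transfer, which the slides write with the same symbols): an identity link gives plain gradient descent, while a log/exp link gives an EG-style, multiplicative update on the weights (unnormalized here).

```python
import numpy as np

def ga_step(w, x, y, eta, transfer, link, link_inv):
    # Additive update in the conjugate (theta) space:
    #   theta = link(w);  theta <- theta - eta * (transfer(w.x) - y) * x;  w = link_inv(theta)
    y_hat = transfer(w @ x)
    theta = link(w) - eta * (y_hat - y) * x
    return link_inv(theta)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -0.2, 0.1])
y = 0.3
w = np.array([0.2, 0.3, 0.5])

# Gradient descent: identity link, so the update is additive on w itself.
w_gd = ga_step(w, x, y, 0.1, sigmoid, lambda v: v, lambda v: v)

# EG-style step: log/exp link, so the update is multiplicative on w.
w_eg = ga_step(w, x, y, 0.1, sigmoid, np.log, np.exp)

print("GD step:", w_gd)
print("EG-style step:", w_eg)
```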

43 Relative Loss Bounds
Cumulative loss of the General Additive (GA) algorithm: Loss_φ(GA, S) = Σ_{t=1}^T L_ψ(y_t, φ(W_t x_t))
Best cumulative loss looking back, for a fixed comparator W: Loss_φ(W, S) = Σ_{t=1}^T L_ψ(y_t, φ(W x_t))
Two types of relative loss bounds:
Loss_φ(GA, S) ≤ p Loss_φ(W, S) + q
Loss_φ(GA, S) ≤ Loss_φ(W, S) + q_1 √(Loss_φ(W, S)) + q_2

44-48 The Statistics Connection: Overview
Exponential family distribution: p_θ(z) = exp(z^T θ − f(θ)) p_0(z)
Exponential families correspond to Bregman divergences: p_θ(z) = exp(z^T θ − f(θ)) p_0(z) = exp(−L_ψ(y, µ)) f_0(y), where ψ = φ^{-1} = (∇f)^{-1}, y = φ(z), µ = φ(θ)
Consider models where θ = w^T x, so that µ = φ(w^T x)
Maximum likelihood estimation then boils down to max_θ Σ_i log p(z_i) = min_w Σ_i L_ψ(y_i, φ(w^T x_i))
This is the problem we discussed today
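A small numerical illustration of this correspondence for the Bernoulli family (a hypothetical check, not from the slides): the negative log-likelihood and the KL matching loss differ only by a term that does not depend on the mean parameter µ, so maximizing likelihood over w (through µ = φ(w^T x)) is the same as minimizing the matching loss.

```python
import numpy as np

def bernoulli_nll(y, mu):
    return -(y * np.log(mu) + (1 - y) * np.log(1 - mu))

def kl(y, mu):
    return y * np.log(y / mu) + (1 - y) * np.log((1 - y) / (1 - mu))

y = 0.7                                    # a soft label in [0, 1]
mus = np.linspace(0.05, 0.95, 10)
gap = bernoulli_nll(y, mus) - kl(y, mus)   # equals the entropy of y, independent of mu
print(np.allclose(gap, gap[0]))            # True: the objectives differ by a constant
```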
