CS Lecture 13. More Maximum Likelihood
|
|
- Bryan Evans
- 5 years ago
- Views:
Transcription
1 CS 6347 Lecture 13 More Maxiu Likelihood
2 Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2
3 Maxiu Likelihood Estiation Given saples x 1,, x M fro soe unknown distribution with paraeters θ The log-likelihood of the evidence is defined to be log l θ = log p(x θ) Goal: axiize the log-likelihood 3
4 MLE for MRFs Let s copute the MLE for MRFs that factor over the graph G as p x θ = 1 Z(θ) ς C ψ C x C θ The paraeters θ control the allowable potential functions Again, suppose we have saples x 1,, x M fro soe unknown MRF of this for log l θ = C log ψ C x C θ M log Z (θ) 4
5 MLE for MRFs Let s copute the MLE for MRFs that factor over the graph G as p x θ = 1 Z(θ) ς C ψ C x C θ The paraeters θ control the allowable potential functions Again, suppose we have saples x 1,, x M fro soe unknown MRF of this for log l θ = C log ψ C x C θ M log Z (θ) Z(θ) couples all of the potential functions together! Even coputing Z(θ) by itself was a challenging task 5
6 Conditional Rando Fields Learning MRFs is quite restrictive Most real probles are really conditional odels Exaple: iage segentation Represent a segentation proble as a MRF over a two diensional grid Each x i is an binary variable indicating whether or not the pixel is in the foreground or the background How do we incorporate pixel inforation? The potentials over the edge (i, j) of the MRF should depend on x i, x j as well as the pixel inforation at nodes i and j 6
7 Iage Segentation
8 Feature Vectors The pixel inforation is called a feature of the odel Features will consist of ore than just a scalar value (i.e., pixels, at the very least, are vectors of RGBA values) Vector of features y (e.g., one vector of features y i for each i V) We think of the joint probability distribution as a conditional distribution p(x y, θ) This akes MLE even harder Saples are pairs (x 1, y 1 ),, (x M, y M ) The feature vectors can be different for each saple: need to copute Z(θ, y ) in the log-likelihood! 8
9 Log-Linear Models MLE sees daunting for MRFs and CRFs Need a nice way to paraeterize the odel and to deal with features We often assue that the odels are log-linear in the paraeters Many of the odels that we have seen so far can easily be expressed as log-linear odels of the paraeters Feature vectors should also be incorporated in a log-linear way 9
10 Log-Linear Models The potential on the clique C should be a log-linear function of the paraeters where ψ C x C y, θ = exp θ, f C x C, y θ, f C x C, y = θ k f C x C, y k k Here, f is a feature ap that takes the input variables and returns a vector the sae size as θ 10
11 Log-Linear MRFs Over coplete representation: one paraeter for each clique C and choice of x C p x θ = 1 Z C exp(θ C (x C )) f C x C is a 0-1 vector that is indexed by C and x C whose only non-zero coponent corresponds to θ C (x C ) One paraeter per clique p x θ = 1 Z C exp( θ, f C (x C ) ) f C x C is a vector that is indexed ONLY by C whose only nonzero coponent corresponds to θ C 11
12 MLE for Log-Linear Models p x y, θ = 1 Z θ, y C exp θ, f C x C, y log l θ = C θ, f C x C, y log Z(θ, y ) = θ, C f C x C, y log Z(θ, y ) 12
13 MLE for Log-Linear Models p x y, θ = 1 Z θ, y C exp θ, f C x C, y log l θ = C θ, f C x C, y log Z(θ, y ) = θ, C f C x C, y log Z(θ, y ) Linear in θ Depends non-linearly on θ 13
14 Concavity of MLE We will show that log Z(θ, y) is a convex function of θ Fix a distribution q(x y) q x y D(q p = q x y log p x y, θ x = q x y log q(x y) q x y log p x y, θ x x = H(q) q x y log p x y, θ x = H(q) + log Z(θ, y) q x y θ, f C x C, y x C = H(q) + log Z(θ, y) C x C q C x C y θ, f C x C, y 14
15 Concavity of MLE log Z(θ, y) = ax q H(q) + C x C q C x C y θ, f C x C, y If a function g(x, y) is convex in x for each y, then ax g(x, y) is convex in x y Linear in θ As a result, log Z(θ, y) is a convex function of θ for a fixed value of y 15
16 MLE for Log-Linear Models p x y, θ = 1 Z θ, y C exp θ, f C x C, y log l θ = C θ, f C x C, y log Z(θ, y ) = θ, C f C x C, y log Z(θ, y ) Linear in θ Convex in θ 16
17 MLE for Log-Linear Models p x y, θ = 1 Z θ, y C exp θ, f C x C, y log l θ = C θ, f C x C, y log Z(θ, y ) = θ, C f C x C, y log Z(θ, y ) Concave in θ Could optiize it using gradient ascent! (need to copute θ log Z(θ, y)) 17
18 MLE via Gradient Ascent What is the gradient of the log-likelihood with respect to θ? θ log Z(θ, y ) =? (worked out on board) 18
19 MLE via Gradient Ascent What is the gradient of the log-likelihood with respect to θ? θ log Z(θ, y ) = C p C x C y, θ f C x C, y x C This is the expected value of the feature aps under the joint distribution 19
20 MLE via Gradient Ascent What is the gradient of the log-likelihood with respect to θ? θ log l(θ) = C f C x C, y x C p C x C y, θ f C x C, y To copute/approxiate this quantity, we only need to copute/approxiate the arginal distributions p C (x C y, θ) This requires perforing arginal inference on a different odel at each step of gradient ascent! 20
21 Moent Matching Let f x, y = σ C f C x C, y Setting the gradient with respect to θ equal to zero and solving gives f(x, y ) = p x y, θ f x, y x This condition is called oent atching and when the odel is an MRF instead of a CRF this reduces to 1 M f(x ) = p x θ f x x 21
22 Moent Matching As an exaple, consider a log-linear MRF p x = 1 Z C exp(θ C (x C )) That is, f C x C is a vector that is indexed by C and x C whose only non-zero coponent corresponds to θ C (x C ) The oent atching condition becoes 1 1 xc =x C = p C x C θ, for all C, x C M 22
23 Regularization in MLE Recall that we can also incorporate prior inforation about the paraeters into the MLE proble This involved solving an augented MLE p x θ p(θ) What types of priors should we choose for the paraeters? 23
24 Regularization in MLE Recall that we can also incorporate prior inforation about the paraeters into the MLE proble This involved solving an augented MLE p x θ p(θ) What types of priors should we choose for the paraeters? Gaussian prior: p θ exp( 1 2 (θ μ)t Σ 1 (θ μ) T ) Unifor over [0,1] 24
25 Regularization in MLE Recall that we can also incorporate prior inforation about the paraeters into the MLE proble This involved solving an augented MLE p x θ exp( 1 2 θt Dθ T ) What types of priors should we choose for the paraeters? Gaussian prior: p θ exp( 1 2 (θ μ)t Σ 1 (θ μ) T ) Unifor over [0,1] Gaussian prior with a diagonal covariance atrix all of whose entries are equal to λ 25
26 Regularization in MLE Using the previous Gaussian prior yields the following logoptiization proble log p x θ exp( 1 2 θt Dθ T ) = log p(x θ) λ 2 k θ k 2 = log p(x θ) λ 2 θ
27 Regularization in MLE Using the previous Gaussian prior yields the following logoptiization proble log p x θ exp( 1 2 θt Dθ T ) = log p(x θ) λ 2 k θ k 2 = log p(x θ) λ 2 θ 2 2 Known as l 2 regularization 27
28 Regularization l 1 l 2 28
29 Duality and MLE log Z(θ, y) = ax q H(q) + C x C q C x C y θ, f C x C, y log l θ = θ, f C x C, y C log Z(θ, y ) Plugging the first into the second gives: log l θ = θ, f C x C, y C ax q H(q ) + C q C x C y x C θ, f C x C, y 29
30 Duality and MLE ax θ log l θ = ax θ in q 1,,q M θ, C f C x C, y q C x C y f C x C, y x C H(q ) This is called a iniax or saddle-point proble Recall that we ended up with siilar looking optiization probles when we constructed the Lagrange dual function When can we switch the order of the ax and in? The function is linear in theta, so there is an advantage to swapping the order 30
31 Saddle Point Source: Wikipedia 31
32 Sion s Miniax Theore Let X be a copact convex subset of R n and Y be a convex subset of R Let f be a real-valued function on X Y such that then f(x, ) a continuous concave function over Y for each x X f(, y) a continuous convex function over X for each y Y sup y in x f(x, y) = in x sup y f x, y 32
33 Duality and MLE ax θ in q 1,,q M θ, C f C x C, y q C x C y f C x C, y x C H(q ) is equal to in ax q 1,,qM θ θ, C f C x C, y q C x C y f C x C, y x C H(q ) Solve for θ? 33
34 Maxiu Entropy ax q 1,,q M H(q ) such that the oent atching condition is satisfied f(x, y ) = q x y f x, y and q 1,, q are discrete probability distributions Instead of axiizing the log-likelihood, we could axiize the entropy over all approxiating distributions that satisfy the oent atching condition x 34
35 MLE in Practice We can copute the partition function in linear tie over trees using belief propagation We can use this to learn the paraeters of tree-structured odels What if the graph isn t a tree? Use variable eliination to copute the partition function (exact but slow) Use iportance sapling to approxiate the partition function (can also be quite slow; aybe only use a few saples?) Use loopy belief propagation to approxiate the partition function (can be bad if loopy BP doesn t converge quickly) 35
36 MLE in Practice Practical wisdo: If you are trying to perfor soe prediction task (i.e., MAP inference to do prediction), then it is better to learn the wrong odel Learning and prediction should use the sae approxiations What people actually do: Use a few iterations of loopy BP or sapling to approxiate the arginals Approxiate arginals give approxiate gradients (recall that the gradient only depended on the arginals) Perfor approxiate gradient descent and hope it works 36
37 MLE in Practice Other options Replace the true entropy with the Bethe entropy and solve the approxiate dual proble Use fancier optiization techniques to solve the proble faster e.g., the ethod of conditional gradients 37
Using EM To Estimate A Probablity Density With A Mixture Of Gaussians
Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course Notes for EE7C (Spring 018: Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee7c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee7c@berkeley.edu October 15,
More informationMachine Learning Basics: Estimators, Bias and Variance
Machine Learning Basics: Estiators, Bias and Variance Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Basics
More informationTraining an RBM: Contrastive Divergence. Sargur N. Srihari
Training an RBM: Contrastive Divergence Sargur N. srihari@cedar.buffalo.edu Topics in Partition Function Definition of Partition Function 1. The log-likelihood gradient 2. Stochastic axiu likelihood and
More informationProbabilistic Machine Learning
Probabilistic Machine Learning by Prof. Seungchul Lee isystes Design Lab http://isystes.unist.ac.kr/ UNIST Table of Contents I.. Probabilistic Linear Regression I... Maxiu Likelihood Solution II... Maxiu-a-Posteriori
More information1 Bounding the Margin
COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #12 Scribe: Jian Min Si March 14, 2013 1 Bounding the Margin We are continuing the proof of a bound on the generalization error of AdaBoost
More informationKernel Methods and Support Vector Machines
Intelligent Systes: Reasoning and Recognition Jaes L. Crowley ENSIAG 2 / osig 1 Second Seester 2012/2013 Lesson 20 2 ay 2013 Kernel ethods and Support Vector achines Contents Kernel Functions...2 Quadratic
More informationEstimating Parameters for a Gaussian pdf
Pattern Recognition and achine Learning Jaes L. Crowley ENSIAG 3 IS First Seester 00/0 Lesson 5 7 Noveber 00 Contents Estiating Paraeters for a Gaussian pdf Notation... The Pattern Recognition Proble...3
More informationSupport Vector Machines MIT Course Notes Cynthia Rudin
Support Vector Machines MIT 5.097 Course Notes Cynthia Rudin Credit: Ng, Hastie, Tibshirani, Friedan Thanks: Şeyda Ertekin Let s start with soe intuition about argins. The argin of an exaple x i = distance
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course Notes for EE227C (Spring 2018): Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee227c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee227c@berkeley.edu October
More informationA Simple Regression Problem
A Siple Regression Proble R. M. Castro March 23, 2 In this brief note a siple regression proble will be introduced, illustrating clearly the bias-variance tradeoff. Let Y i f(x i ) + W i, i,..., n, where
More informationSupport Vector Machines. Machine Learning Series Jerry Jeychandra Blohm Lab
Support Vector Machines Machine Learning Series Jerry Jeychandra Bloh Lab Outline Main goal: To understand how support vector achines (SVMs) perfor optial classification for labelled data sets, also a
More informationIntelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines
Intelligent Systes: Reasoning and Recognition Jaes L. Crowley osig 1 Winter Seester 2018 Lesson 6 27 February 2018 Outline Perceptrons and Support Vector achines Notation...2 Linear odels...3 Lines, Planes
More informationE0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis
E0 370 tatistical Learning Theory Lecture 6 (Aug 30, 20) Margin Analysis Lecturer: hivani Agarwal cribe: Narasihan R Introduction In the last few lectures we have seen how to obtain high confidence bounds
More information14 : Theory of Variational Inference: Inner and Outer Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2014 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Yu-Hsin Kuo, Amos Ng 1 Introduction Last lecture
More informationPAC-Bayes Analysis Of Maximum Entropy Learning
PAC-Bayes Analysis Of Maxiu Entropy Learning John Shawe-Taylor and David R. Hardoon Centre for Coputational Statistics and Machine Learning Departent of Coputer Science University College London, UK, WC1E
More informationFeature Extraction Techniques
Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that
More informationDetection and Estimation Theory
ESE 54 Detection and Estiation Theory Joseph A. O Sullivan Sauel C. Sachs Professor Electronic Systes and Signals Research Laboratory Electrical and Systes Engineering Washington University 11 Urbauer
More information14 : Theory of Variational Inference: Inner and Outer Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Maria Ryskina, Yen-Chia Hsu 1 Introduction
More informationBayes Decision Rule and Naïve Bayes Classifier
Bayes Decision Rule and Naïve Bayes Classifier Le Song Machine Learning I CSE 6740, Fall 2013 Gaussian Mixture odel A density odel p(x) ay be ulti-odal: odel it as a ixture of uni-odal distributions (e.g.
More informationBoosting with log-loss
Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the
More informationProbability Distributions
Probability Distributions In Chapter, we ephasized the central role played by probability theory in the solution of pattern recognition probles. We turn now to an exploration of soe particular exaples
More informationComputational and Statistical Learning Theory
Coputational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 2: PAC Learning and VC Theory I Fro Adversarial Online to Statistical Three reasons to ove fro worst-case deterinistic
More informationFoundations of Machine Learning Boosting. Mehryar Mohri Courant Institute and Google Research
Foundations of Machine Learning Boosting Mehryar Mohri Courant Institute and Google Research ohri@cis.nyu.edu Weak Learning Definition: concept class C is weakly PAC-learnable if there exists a (weak)
More informationLecture 12: Ensemble Methods. Introduction. Weighted Majority. Mixture of Experts/Committee. Σ k α k =1. Isabelle Guyon
Lecture 2: Enseble Methods Isabelle Guyon guyoni@inf.ethz.ch Introduction Book Chapter 7 Weighted Majority Mixture of Experts/Coittee Assue K experts f, f 2, f K (base learners) x f (x) Each expert akes
More informationThis model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.
CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when
More informationLearning MN Parameters with Approximation. Sargur Srihari
Learning MN Parameters with Approximation Sargur srihari@cedar.buffalo.edu 1 Topics Iterative exact learning of MN parameters Difficulty with exact methods Approximate methods Approximate Inference Belief
More informationLecture October 23. Scribes: Ruixin Qiang and Alana Shine
CSCI699: Topics in Learning and Gae Theory Lecture October 23 Lecturer: Ilias Scribes: Ruixin Qiang and Alana Shine Today s topic is auction with saples. 1 Introduction to auctions Definition 1. In a single
More information13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices
CS71 Randoness & Coputation Spring 018 Instructor: Alistair Sinclair Lecture 13: February 7 Disclaier: These notes have not been subjected to the usual scrutiny accorded to foral publications. They ay
More informationCompression and Predictive Distributions for Large Alphabet i.i.d and Markov models
2014 IEEE International Syposiu on Inforation Theory Copression and Predictive Distributions for Large Alphabet i.i.d and Markov odels Xiao Yang Departent of Statistics Yale University New Haven, CT, 06511
More informationSupplementary to Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data
Suppleentary to Learning Discriinative Bayesian Networks fro High-diensional Continuous Neuroiaging Data Luping Zhou, Lei Wang, Lingqiao Liu, Philip Ogunbona, and Dinggang Shen Proposition. Given a sparse
More informationIntroduction to Machine Learning. Recitation 11
Introduction to Machine Learning Lecturer: Regev Schweiger Recitation Fall Seester Scribe: Regev Schweiger. Kernel Ridge Regression We now take on the task of kernel-izing ridge regression. Let x,...,
More informationCS Lecture 19. Exponential Families & Expectation Propagation
CS 6347 Lecture 19 Exponential Families & Expectation Propagation Discrete State Spaces We have been focusing on the case of MRFs over discrete state spaces Probability distributions over discrete spaces
More informationOn Hyper-Parameter Estimation in Empirical Bayes: A Revisit of the MacKay Algorithm
On Hyper-Paraeter Estiation in Epirical Bayes: A Revisit of the MacKay Algorith Chune Li 1, Yongyi Mao, Richong Zhang 1 and Jinpeng Huai 1 1 School of Coputer Science and Engineering, Beihang University,
More informationSupport Vector Machines. Goals for the lecture
Support Vector Machines Mark Craven and David Page Coputer Sciences 760 Spring 2018 www.biostat.wisc.edu/~craven/cs760/ Soe of the slides in these lectures have been adapted/borrowed fro aterials developed
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Variational Inference II: Mean Field Method and Variational Principle Junming Yin Lecture 15, March 7, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3
More informationSharp Time Data Tradeoffs for Linear Inverse Problems
Sharp Tie Data Tradeoffs for Linear Inverse Probles Saet Oyak Benjain Recht Mahdi Soltanolkotabi January 016 Abstract In this paper we characterize sharp tie-data tradeoffs for optiization probles used
More informatione-companion ONLY AVAILABLE IN ELECTRONIC FORM
OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer
More informationBlock designs and statistics
Bloc designs and statistics Notes for Math 447 May 3, 2011 The ain paraeters of a bloc design are nuber of varieties v, bloc size, nuber of blocs b. A design is built on a set of v eleents. Each eleent
More informationSupport Vector Machines. Maximizing the Margin
Support Vector Machines Support vector achines (SVMs) learn a hypothesis: h(x) = b + Σ i= y i α i k(x, x i ) (x, y ),..., (x, y ) are the training exs., y i {, } b is the bias weight. α,..., α are the
More informationPattern Recognition and Machine Learning. Artificial Neural networks
Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016/2017 Lessons 9 11 Jan 2017 Outline Artificial Neural networks Notation...2 Convolutional Neural Networks...3
More informationExperimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis
City University of New York (CUNY) CUNY Acadeic Works International Conference on Hydroinforatics 8-1-2014 Experiental Design For Model Discriination And Precise Paraeter Estiation In WDS Analysis Giovanna
More informationCh 12: Variations on Backpropagation
Ch 2: Variations on Backpropagation The basic backpropagation algorith is too slow for ost practical applications. It ay take days or weeks of coputer tie. We deonstrate why the backpropagation algorith
More informationNonmonotonic Networks. a. IRST, I Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I Povo (Trento) Italy
Storage Capacity and Dynaics of Nononotonic Networks Bruno Crespi a and Ignazio Lazzizzera b a. IRST, I-38050 Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I-38050 Povo (Trento) Italy INFN Gruppo
More informationLecture 9: PGM Learning
13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should
More informationStochastic Subgradient Methods
Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods
More informationProbabilistic Graphical Models
2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector
More informationConsistent Multiclass Algorithms for Complex Performance Measures. Supplementary Material
Consistent Multiclass Algoriths for Coplex Perforance Measures Suppleentary Material Notations. Let λ be the base easure over n given by the unifor rando variable (say U over n. Hence, for all easurable
More information13 : Variational Inference: Loopy Belief Propagation and Mean Field
10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction
More informationGeometrical intuition behind the dual problem
Based on: Geoetrical intuition behind the dual proble KP Bennett, EJ Bredensteiner, Duality and Geoetry in SVM Classifiers, Proceedings of the International Conference on Machine Learning, 2000 1 Geoetrical
More informationA Theoretical Analysis of a Warm Start Technique
A Theoretical Analysis of a War Start Technique Martin A. Zinkevich Yahoo! Labs 701 First Avenue Sunnyvale, CA Abstract Batch gradient descent looks at every data point for every step, which is wasteful
More informationBounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks
Bounds on the Miniax Rate for Estiating a Prior over a VC Class fro Independent Learning Tasks Liu Yang Steve Hanneke Jaie Carbonell Deceber 01 CMU-ML-1-11 School of Coputer Science Carnegie Mellon University
More informationGraphical Models for Collaborative Filtering
Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,
More informationChaotic Coupled Map Lattices
Chaotic Coupled Map Lattices Author: Dustin Keys Advisors: Dr. Robert Indik, Dr. Kevin Lin 1 Introduction When a syste of chaotic aps is coupled in a way that allows the to share inforation about each
More informationComputational and Statistical Learning Theory
Coputational and Statistical Learning Theory Proble sets 5 and 6 Due: Noveber th Please send your solutions to learning-subissions@ttic.edu Notations/Definitions Recall the definition of saple based Radeacher
More informationLower Bounds for Quantized Matrix Completion
Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &
More informationQuantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search
Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths
More informationLearnability of Gaussians with flexible variances
Learnability of Gaussians with flexible variances Ding-Xuan Zhou City University of Hong Kong E-ail: azhou@cityu.edu.hk Supported in part by Research Grants Council of Hong Kong Start October 20, 2007
More informationIntroduction to Discrete Optimization
Prof. Friedrich Eisenbrand Martin Nieeier Due Date: March 9 9 Discussions: March 9 Introduction to Discrete Optiization Spring 9 s Exercise Consider a school district with I neighborhoods J schools and
More informationFixed-to-Variable Length Distribution Matching
Fixed-to-Variable Length Distribution Matching Rana Ali Ajad and Georg Böcherer Institute for Counications Engineering Technische Universität München, Gerany Eail: raa2463@gail.co,georg.boecherer@tu.de
More informationRANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION
RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION GUANGHUI LAN AND YI ZHOU Abstract. In this paper, we consider a class of finite-su convex optiization probles defined over a distributed
More informationProbabilistic Graphical Models. Theory of Variational Inference: Inner and Outer Approximation. Lecture 15, March 4, 2013
School of Computer Science Probabilistic Graphical Models Theory of Variational Inference: Inner and Outer Approximation Junming Yin Lecture 15, March 4, 2013 Reading: W & J Book Chapters 1 Roadmap Two
More informationAn l 1 Regularized Method for Numerical Differentiation Using Empirical Eigenfunctions
Journal of Matheatical Research with Applications Jul., 207, Vol. 37, No. 4, pp. 496 504 DOI:0.3770/j.issn:2095-265.207.04.0 Http://jre.dlut.edu.cn An l Regularized Method for Nuerical Differentiation
More informationMachine Learning: Fisher s Linear Discriminant. Lecture 05
Machine Learning: Fisher s Linear Discriinant Lecture 05 Razvan C. Bunescu chool of Electrical Engineering and Coputer cience bunescu@ohio.edu Lecture 05 upervised Learning ask learn an (unkon) function
More informationWill Monroe August 9, with materials by Mehran Sahami and Chris Piech. image: Arito. Parameter learning
Will Monroe August 9, 07 with aterials by Mehran Sahai and Chris Piech iage: Arito Paraeter learning Announceent: Proble Set #6 Goes out tonight. Due the last day of class, Wednesday, August 6 (before
More informationA Note on the Applied Use of MDL Approximations
A Note on the Applied Use of MDL Approxiations Daniel J. Navarro Departent of Psychology Ohio State University Abstract An applied proble is discussed in which two nested psychological odels of retention
More information1 Generalization bounds based on Rademacher complexity
COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #0 Scribe: Suqi Liu March 07, 08 Last tie we started proving this very general result about how quickly the epirical average converges
More informationGEE ESTIMATORS IN MIXTURE MODEL WITH VARYING CONCENTRATIONS
ACTA UIVERSITATIS LODZIESIS FOLIA OECOOMICA 3(3142015 http://dx.doi.org/10.18778/0208-6018.314.03 Olesii Doronin *, Rostislav Maiboroda ** GEE ESTIMATORS I MIXTURE MODEL WITH VARYIG COCETRATIOS Abstract.
More informationIdentical Maximum Likelihood State Estimation Based on Incremental Finite Mixture Model in PHD Filter
Identical Maxiu Lielihood State Estiation Based on Increental Finite Mixture Model in PHD Filter Gang Wu Eail: xjtuwugang@gail.co Jing Liu Eail: elelj20080730@ail.xjtu.edu.cn Chongzhao Han Eail: czhan@ail.xjtu.edu.cn
More informationLecture 21. Interior Point Methods Setup and Algorithm
Lecture 21 Interior Point Methods In 1984, Kararkar introduced a new weakly polynoial tie algorith for solving LPs [Kar84a], [Kar84b]. His algorith was theoretically faster than the ellipsoid ethod and
More informationTEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES
TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES S. E. Ahed, R. J. Tokins and A. I. Volodin Departent of Matheatics and Statistics University of Regina Regina,
More informationResearch in Area of Longevity of Sylphon Scraies
IOP Conference Series: Earth and Environental Science PAPER OPEN ACCESS Research in Area of Longevity of Sylphon Scraies To cite this article: Natalia Y Golovina and Svetlana Y Krivosheeva 2018 IOP Conf.
More informationIn this chapter, we consider several graph-theoretic and probabilistic models
THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions
More informationSupplementary Information for Design of Bending Multi-Layer Electroactive Polymer Actuators
Suppleentary Inforation for Design of Bending Multi-Layer Electroactive Polyer Actuators Bavani Balakrisnan, Alek Nacev, and Elisabeth Sela University of Maryland, College Park, Maryland 074 1 Analytical
More informationOn the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation
journal of coplexity 6, 459473 (2000) doi:0.006jco.2000.0544, available online at http:www.idealibrary.co on On the Counication Coplexity of Lipschitzian Optiization for the Coordinated Model of Coputation
More informationTail Estimation of the Spectral Density under Fixed-Domain Asymptotics
Tail Estiation of the Spectral Density under Fixed-Doain Asyptotics Wei-Ying Wu, Chae Young Li and Yiin Xiao Wei-Ying Wu, Departent of Statistics & Probability Michigan State University, East Lansing,
More informationOptimal Jamming Over Additive Noise: Vector Source-Channel Case
Fifty-first Annual Allerton Conference Allerton House, UIUC, Illinois, USA October 2-3, 2013 Optial Jaing Over Additive Noise: Vector Source-Channel Case Erah Akyol and Kenneth Rose Abstract This paper
More informationMixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate
Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means
More informationASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical
IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul
More informationLecture 9 November 23, 2015
CSC244: Discrepancy Theory in Coputer Science Fall 25 Aleksandar Nikolov Lecture 9 Noveber 23, 25 Scribe: Nick Spooner Properties of γ 2 Recall that γ 2 (A) is defined for A R n as follows: γ 2 (A) = in{r(u)
More informationAn Improved Particle Filter with Applications in Ballistic Target Tracking
Sensors & ransducers Vol. 72 Issue 6 June 204 pp. 96-20 Sensors & ransducers 204 by IFSA Publishing S. L. http://www.sensorsportal.co An Iproved Particle Filter with Applications in Ballistic arget racing
More informationLogistic Regression. by Prof. Seungchul Lee isystems Design Lab UNIST. Table of Contents
Logistic Regression by Prof. Seungchul Lee isystes Design Lab http://isystes.unist.ac.kr/ UNIST Table of Contents I.. Linear Classification: Logistic Regression I... Using all Distances II..2. Probabilistic
More information3.3 Variational Characterization of Singular Values
3.3. Variational Characterization of Singular Values 61 3.3 Variational Characterization of Singular Values Since the singular values are square roots of the eigenvalues of the Heritian atrices A A and
More informationMarkov Networks.
Markov Networks www.biostat.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts Markov network syntax Markov network semantics Potential functions Partition function
More informationConstructing Locally Best Invariant Tests of the Linear Regression Model Using the Density Function of a Maximal Invariant
Aerican Journal of Matheatics and Statistics 03, 3(): 45-5 DOI: 0.593/j.ajs.03030.07 Constructing Locally Best Invariant Tests of the Linear Regression Model Using the Density Function of a Maxial Invariant
More informationBayesian Learning. Chapter 6: Bayesian Learning. Bayes Theorem. Roles for Bayesian Methods. CS 536: Machine Learning Littman (Wu, TA)
Bayesian Learning Chapter 6: Bayesian Learning CS 536: Machine Learning Littan (Wu, TA) [Read Ch. 6, except 6.3] [Suggested exercises: 6.1, 6.2, 6.6] Bayes Theore MAP, ML hypotheses MAP learners Miniu
More informationPrincipal Components Analysis
Principal Coponents Analysis Cheng Li, Bingyu Wang Noveber 3, 204 What s PCA Principal coponent analysis (PCA) is a statistical procedure that uses an orthogonal transforation to convert a set of observations
More informationHamming Compressed Sensing
Haing Copressed Sensing Tianyi Zhou, and Dacheng Tao, Meber, IEEE Abstract arxiv:.73v2 [cs.it] Oct 2 Copressed sensing CS and -bit CS cannot directly recover quantized signals and require tie consuing
More informationarxiv: v2 [cs.lg] 30 Mar 2017
Batch Renoralization: Towards Reducing Minibatch Dependence in Batch-Noralized Models Sergey Ioffe Google Inc., sioffe@google.co arxiv:1702.03275v2 [cs.lg] 30 Mar 2017 Abstract Batch Noralization is quite
More informationIN modern society that various systems have become more
Developent of Reliability Function in -Coponent Standby Redundant Syste with Priority Based on Maxiu Entropy Principle Ryosuke Hirata, Ikuo Arizono, Ryosuke Toohiro, Satoshi Oigawa, and Yasuhiko Takeoto
More informationDistributed Subgradient Methods for Multi-agent Optimization
1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions
More informationSHORT TIME FOURIER TRANSFORM PROBABILITY DISTRIBUTION FOR TIME-FREQUENCY SEGMENTATION
SHORT TIME FOURIER TRANSFORM PROBABILITY DISTRIBUTION FOR TIME-FREQUENCY SEGMENTATION Fabien Millioz, Julien Huillery, Nadine Martin To cite this version: Fabien Millioz, Julien Huillery, Nadine Martin.
More informationLeast Squares Fitting of Data
Least Squares Fitting of Data David Eberly, Geoetric Tools, Redond WA 98052 https://www.geoetrictools.co/ This work is licensed under the Creative Coons Attribution 4.0 International License. To view a
More informationPULSE-TRAIN BASED TIME-DELAY ESTIMATION IMPROVES RESILIENCY TO NOISE
PULSE-TRAIN BASED TIME-DELAY ESTIMATION IMPROVES RESILIENCY TO NOISE 1 Nicola Neretti, 1 Nathan Intrator and 1,2 Leon N Cooper 1 Institute for Brain and Neural Systes, Brown University, Providence RI 02912.
More informationCSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13
CSE55: Randoied Algoriths and obabilistic Analysis May 6, Lecture Lecturer: Anna Karlin Scribe: Noah Siegel, Jonathan Shi Rando walks and Markov chains This lecture discusses Markov chains, which capture
More informationNon-Parametric Non-Line-of-Sight Identification 1
Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,
More informationDERIVING PROPER UNIFORM PRIORS FOR REGRESSION COEFFICIENTS
DERIVING PROPER UNIFORM PRIORS FOR REGRESSION COEFFICIENTS N. van Erp and P. van Gelder Structural Hydraulic and Probabilistic Design, TU Delft Delft, The Netherlands Abstract. In probles of odel coparison
More informationSupport Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization
Recent Researches in Coputer Science Support Vector Machine Classification of Uncertain and Ibalanced data using Robust Optiization RAGHAV PAT, THEODORE B. TRAFALIS, KASH BARKER School of Industrial Engineering
More informationSupervised Baysian SAR image Classification Using The Full Polarimetric Data
Supervised Baysian SAR iage Classification Using The Full Polarietric Data (1) () Ziad BELHADJ (1) SUPCOM, Route de Raoued 3.5 083 El Ghazala - TUNSA () ENT, BP. 37, 100 Tunis Belvedere, TUNSA Abstract
More information