Bayesian Paradigm. Maximum A Posteriori Estimation


 Aldous Smith
1 Bayesian Paradigm Maximum A Posteriori Estimation
2 Simple acquisition model noise + degradation
3 Constrained minimization
4 Equivalent formulations: constrained minimization, the Lagrangian (unconstrained minimization), and maximum a posteriori (MAP)
5 Discrete representation: the image is a random field; each pixel value is a realization of some random variable
6 Random variables. Discrete: takes a countable number of distinct values, e.g. the number of children in a family. Continuous: described by a probability density function (distribution). Moments: expectation (average), variance
7 Random variables. Expectation: E[x] = ∫ x p(x) dx. Variance: Var[x] = E[(x − E[x])²]
8 Image as a random field
9 Normal distribution: p(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))
10 Multivariate Normal Distribution
11 Multivariate Normal Distribution
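As a quick numerical sanity check (a sketch, not from the slides; the parameter values are invented for illustration), one can sample a multivariate normal and compare the empirical moments against μ and Σ:

```python
import numpy as np

# Sketch (parameter values invented for illustration): draw samples from a
# multivariate normal N(mu, Sigma) and check the empirical mean and
# covariance against the parameters.
rng = np.random.default_rng(7)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
samples = rng.multivariate_normal(mu, Sigma, size=200_000)

emp_mean = samples.mean(axis=0)   # should be close to mu
emp_cov = np.cov(samples.T)       # should be close to Sigma
print(emp_mean, emp_cov)
```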
12 Gamma distribution: p(x; a, b) = (bᵃ / Γ(a)) x^{a−1} e^{−bx}, x > 0 (shape–rate form)
13 Bayesian paradigm. Bayes' rule: p(u|z) = p(z|u) p(u) / p(z). Here p(u|z) is the a posteriori distribution of the unknown, p(z|u) is the likelihood given by our problem, and p(u) is the a priori distribution encoding our prior knowledge. Maximum a posteriori (MAP): max_u p(u|z). Maximum likelihood (MLE): max_u p(z|u)
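A minimal numeric sketch of the MAP-vs-MLE distinction (toy numbers, not from the slides): estimating a Gaussian mean with a Gaussian prior, where the conjugate closed form makes the effect of the prior explicit.

```python
import numpy as np

# Sketch (toy numbers): estimate the mean u of observations z_i ~ N(u, sigma2)
# under a Gaussian prior u ~ N(mu0, tau2). MLE maximizes p(z|u);
# MAP maximizes p(u|z), proportional to p(z|u) p(u).
rng = np.random.default_rng(1)
sigma2 = 1.0                 # known noise variance
mu0, tau2 = 0.0, 0.1         # prior mean and variance (a tight prior)
z = rng.normal(loc=3.0, scale=np.sqrt(sigma2), size=5)

u_mle = z.mean()             # argmax_u p(z|u): the sample mean
n = z.size
# Conjugate closed form: the posterior mean shrinks the MLE toward the prior.
u_map = (n / sigma2 * u_mle + mu0 / tau2) / (n / sigma2 + 1.0 / tau2)
print(u_mle, u_map)
```

With few samples and a tight prior, the MAP estimate sits between the prior mean and the MLE; as n grows, the likelihood dominates and the two coincide.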
14 Likelihood
15 Image Prior Intensity histogram Gradient histogram
16 Image Prior Intensities Gradients (log)
17 Image Prior Gradient histogram
18 Image prior: Tikhonov regularization, ∫ |∇u|² dx, vs. TV regularization, ∫ |∇u| dx
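The difference between the two priors can be seen on a toy image (a sketch; the discretization by forward differences and the test images are illustrative choices): TV charges a step edge the same as a gradual ramp of equal total rise, while Tikhonov penalizes the step far more, which is why quadratic regularization blurs edges and TV preserves them.

```python
import numpy as np

# Sketch: Tikhonov (quadratic) vs. TV (l1) penalties on image gradients,
# discretized with simple forward differences.
def gradients(u):
    gx = np.diff(u, axis=1)   # horizontal forward differences
    gy = np.diff(u, axis=0)   # vertical forward differences
    return gx, gy

def tikhonov(u):
    gx, gy = gradients(u)
    return (gx**2).sum() + (gy**2).sum()        # sum of |grad u|^2

def total_variation(u):
    gx, gy = gradients(u)
    return np.abs(gx).sum() + np.abs(gy).sum()  # anisotropic TV: sum of |grad u|

step = np.zeros((8, 8)); step[:, 4:] = 1.0           # sharp edge
ramp = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))     # same total rise, spread out

print(total_variation(step), total_variation(ramp))  # equal: TV is edge-neutral
print(tikhonov(step), tikhonov(ramp))                # step costs much more
```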
19 Image Prior Nonconvex regularization
20 MAP
21 Kullback-Leibler divergence. Measures the dissimilarity between two distributions: D(q || p) = ∫ q(x) log( q(x) / p(x) ) dx ≥ 0
22 Convex function f(x). Jensen's inequality: f(E[x]) ≤ E[f(x)] for convex f
23 The Kullback-Leibler divergence is strictly convex
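The defining properties are easy to verify numerically (a sketch; the two discrete distributions are invented for illustration): D(q||p) is nonnegative, zero only when the distributions coincide, and asymmetric in its arguments.

```python
import numpy as np

# Sketch: discrete Kullback-Leibler divergence
#   D(q || p) = sum_i q_i log(q_i / p_i),
# nonnegative by Jensen's inequality, zero iff q = p, and asymmetric.
def kl(q, p):
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0                       # the convention 0 * log 0 = 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

q = [0.9, 0.05, 0.05]                  # a peaked distribution
p = [1/3, 1/3, 1/3]                    # the uniform distribution
print(kl(q, p), kl(p, q), kl(q, q))
```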
24 Variational Bayes. Approximate the posterior with a simpler, factorized form q and minimize the KL divergence between q and the true posterior
25 Variational Bayes
26 Variational Bayes. Minimize with respect to one factor while holding all other factors fixed
27 Variational Bayes. The optimal factor satisfies log q_j*(θ_j) = E_{i≠j}[log p(z, θ)] + const
28 Factorized approximation
29 A Gaussian mixture (GM) approximated by a single Gaussian; the result depends on the initialization
30 Denoising with unknown noise level. Noise model: z = u + n, n ~ N(0, σ²I). Bayes: p(u, σ² | z) ∝ p(z | u, σ²) p(u) p(σ²), where σ² is unknown
31 MAP denoising. Minimize the negative log posterior alternately: with respect to the image u, then with respect to the noise parameter σ²
32 A Posteriori surface
33 MAP denoising
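A minimal MAP-denoising sketch, simpler than the unknown-σ case on the slides: the noise level is known and the prior is Tikhonov, so the negative log posterior is quadratic and the MAP estimate is the solution of a linear system. The 1-D signal and all parameter values are illustrative assumptions.

```python
import numpy as np

# Sketch: MAP denoising with a KNOWN noise level and a Tikhonov prior.
# The negative log posterior
#   E(u) = 1/(2 sigma2) ||z - u||^2 + (lam/2) ||L u||^2
# is quadratic, so its minimizer solves (I/sigma2 + lam L^T L) u = z/sigma2.
rng = np.random.default_rng(2)
n, sigma2, lam = 64, 0.04, 20.0
clean = np.sin(np.linspace(0.0, 2.0 * np.pi, n))          # ground-truth signal
z = clean + rng.normal(scale=np.sqrt(sigma2), size=n)     # noisy observation

L = np.diff(np.eye(n), axis=0)                 # 1-D forward-difference operator
A = np.eye(n) / sigma2 + lam * L.T @ L
u_map = np.linalg.solve(A, z / sigma2)

err_noisy = np.linalg.norm(z - clean)
err_map = np.linalg.norm(u_map - clean)
print(err_noisy, err_map)                      # the MAP estimate is closer
```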
34 Marginalized denoising. The estimate is the expectation under the posterior with the noise parameter marginalized out
35 Marginalized denoising
36 VB denoising. Find the factorized approximation of the posterior, q(u, σ⁻²) = q(u) q(σ⁻²); update of the image factor q(u)
37 VB denoising. Update of the noise-precision factor q(σ⁻²)
38 VB denoising
39 Comparison
40 Sparse & Redundant Representations: Iterated-Shrinkage Algorithms and wavelet thresholding
41 We will talk about the minimization of the functional f(a) = ½ ||x − Da||₂² + λ ρ(a). With the substitution a = D⁻¹x, assuming D is invertible, this is our original denoising functional; when D is not invertible, the Moore-Penrose pseudoinverse takes its place
42 A closer look at the functional. D is a rectangular, overcomplete dictionary => x = Da is an underdetermined system; we seek a sparse representation a; ρ(a) is a measure of sparsity
43 Measures of sparsity: l_p norms (0 < p ≤ 1); the l_0 "norm", which counts nonzero elements; and many other sparsity measures, e.g. smoothed l_1
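The point of these measures shows up on a simple pair of vectors (a sketch; the vectors are invented for illustration): a 1-sparse vector and a dense vector with the same l_2 energy are indistinguishable to l_2, but l_0 and l_1 clearly prefer the sparse one.

```python
import numpy as np

# Sketch: sparsity measures on a 1-sparse vector and a dense vector with
# the SAME l2 energy -- l2 cannot tell them apart, l0 and l_p (p <= 1) can.
def lp(a, p):
    return float(np.sum(np.abs(a) ** p)) ** (1.0 / p)

def l0(a, tol=1e-12):
    return int(np.sum(np.abs(a) > tol))   # counts nonzero elements

sparse = np.zeros(16); sparse[3] = 4.0    # ||a||_2 = 4, one nonzero
dense = np.ones(16)                       # ||a||_2 = 4, all nonzero

print(l0(sparse), l0(dense))              # 1 vs 16
print(lp(sparse, 2), lp(dense, 2))        # 4.0 vs 4.0: l2 is blind to sparsity
print(lp(sparse, 1), lp(dense, 1))        # 4.0 vs 16.0: l1 favors the sparse one
```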
44-48 Figures: unit balls of the l_2, l_1, l_0.9, l_0.5 and l_0.3 norms. As p decreases below 1, the ball develops spikes along the coordinate axes, which is why small-p balls promote sparse solutions
49 Equivalent formulation Method of Lagrange multipliers
50 l_2 norm: the solution via the pseudoinverse, a = D⁺x = Dᵀ(DDᵀ)⁻¹x, is simple but not sparse
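This is easy to demonstrate (a sketch with an invented random dictionary): even when an exactly 1-sparse representation exists, the minimum-l_2-norm solution spreads energy across many coefficients.

```python
import numpy as np

# Sketch: for an underdetermined system x = D a (D rectangular, overcomplete),
# the minimum-l2-norm solution a = D^+ x is simple to compute but not sparse,
# even when an exactly sparse representation exists.
rng = np.random.default_rng(3)
m, p = 8, 20                                 # 8 equations, 20 unknowns
D = rng.normal(size=(m, p))
a_sparse = np.zeros(p); a_sparse[5] = 1.0    # a 1-sparse representation of x
x = D @ a_sparse

a_l2 = np.linalg.pinv(D) @ x                 # minimum-l2-norm solution
print(np.sum(np.abs(a_l2) > 1e-8))           # many nonzero entries, not 1
```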
51 l 1 norm
52 Why seek a sparse representation? Assume x is a 1-sparse signal in the dictionary; then the single active atom is the perfect solution. Overcomplete dictionaries => sparsity
53 Classical solvers. Steepest descent: convergence depends on the condition number κ of the Hessian; an ill-conditioned Hessian => poor convergence
54 The unitary case (DDᵀ = I). Since l_2 is unitarily invariant, ||x − Da||₂² = ||Dᵀx − a||₂², so the functional separates into m independent 1-D optimization problems
55 The 1-D task. We need to solve the 1-D problem a_opt = argmin_a ½(b − a)² + λρ(a), i.e. a mapping from b to a_opt realized by a look-up table (LUT). Such a LUT can be built for ANY sparsity measure ρ(a), including nonconvex and nonsmooth ones (e.g., the l_0 norm)
56 Shrinkage for ρ(a)= a p
57 Soft & hard thresholding. Soft thresholding: S_λ(b) = sign(b) · max(|b| − λ, 0). Hard thresholding: H_λ(b) = b if |b| > λ, else 0
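Both rules are one-liners (standard forms; exactly how λ enters the threshold depends on how it enters the functional): soft thresholding is the shrinkage rule for the l_1 penalty, hard thresholding for l_0.

```python
import numpy as np

# The two classical shrinkage rules: soft thresholding (l1 penalty) and
# hard thresholding (l0 penalty), applied elementwise.
def soft(b, lam):
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

def hard(b, lam):
    return np.where(np.abs(b) > lam, b, 0.0)

b = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(soft(b, 1.0))   # [-1.  0.  0.  0.  1.]
print(hard(b, 1.0))   # [-2.  0.  0.  0.  2.]
```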
58 The unitary case: summary. Minimizing f(a) is done by: x → multiply by Dᴴ → b → LUT S_{ρ,λ}(b) → a → multiply by D → x̂. DONE! The obtained solution is the GLOBAL minimizer of f(a), even if f(a) is nonconvex
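The whole unitary-case recipe fits in a few lines (a sketch: the dictionary here is a random orthogonal matrix from a QR factorization, ρ = l_1, and λ is an illustrative value).

```python
import numpy as np

# Sketch of the unitary-case recipe: with an orthogonal D (D D^T = I),
# minimizing f(a) = 1/2 ||x - D a||^2 + lam ||a||_1 reduces to
# b = D^T x, a = S_lam(b), x_hat = D a.
def soft(b, lam):
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

rng = np.random.default_rng(4)
D, _ = np.linalg.qr(rng.normal(size=(16, 16)))   # random orthogonal dictionary
x = rng.normal(size=16)
lam = 0.3

b = D.T @ x          # multiply by D^T (D^H in the complex case)
a = soft(b, lam)     # the LUT step, here for rho = l1
x_hat = D @ a        # multiply by D

def f(a_):
    return 0.5 * np.sum((x - D @ a_) ** 2) + lam * np.sum(np.abs(a_))
print(f(a))
```

Because the problem separates into independent 1-D problems, f(a) is the global minimum; no perturbation of a improves it.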
59 The minimization of f(a) leads to two seemingly contradictory observations: 1. The problem is quite hard; classic optimization methods find it hard. 2. The problem is trivial for the case of unitary D. How can we enjoy this simplicity in the general case?
60 Iterated-Shrinkage algorithms. Common to all is a set of operations in every iteration: (i) multiplication by D, (ii) multiplication by Dᵀ, and (iii) a scalar shrinkage on the solution, S_{ρ,λ}(a)
61 The proximal-point method. Aim: minimize f(a). Suppose it is found to be too hard. Define a surrogate function g(a, a_0) = f(a) + dist(a, a_0). Then the following algorithm necessarily converges to a local minimum of f(a) [Rockafellar, '76]: a_0 → minimize g(a, a_0) → a_1 → minimize g(a, a_1) → a_2 → ... → a_k → minimize g(a, a_k) → a_{k+1}. Comments: (i) the minimization of g(a, a_0) should be easier; (ii) it looks like it will slow down convergence. Really?
62 The proposed surrogate function. Original function: f(a) = ½ ||x − Da||₂² + λ ρ(a). Distance: dist(a, a_0) = (c/2) ||a − a_0||₂² − ½ ||Da − Da_0||₂², with c chosen above the largest eigenvalue of DᵀD so the distance is nonnegative. The beauty is that the troublesome term ||Da||₂² vanishes in g(a, a_0) = f(a) + dist(a, a_0); what remains that involves D is not a function of a. Minimization of g(a, a_0) is therefore done in closed form by shrinkage, applied to the vector b_k = a_k + (1/c) Dᵀ(x − D a_k), and this generates the solution a_{k+1} of the next iteration. We again have m independent 1-D optimization problems
63 Iterated-Shrinkage algorithm. While the unitary-case solution is given by x → multiply by Dᵀ → b → LUT S_{ρ,λ}(b) → â → multiply by D → x̂, the general case, by SSF, requires iterating: b_k = a_k + (1/c) Dᵀ(x − D a_k) → LUT S_{ρ,λ/c}(b_k) → a_{k+1}; finally x̂ = D â
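One SSF/iterated-shrinkage loop is only a few lines (a sketch in the ISTA style, with an invented random dictionary, ρ = l_1, and illustrative sizes; c is taken just above the largest eigenvalue of DᵀD).

```python
import numpy as np

# Sketch of the SSF iterated-shrinkage loop for
#   min_a 1/2 ||x - D a||^2 + lam ||a||_1
# with a general, non-unitary D.
def soft(b, lam):
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

rng = np.random.default_rng(5)
m, p, lam = 12, 30, 0.05
D = rng.normal(size=(m, p)) / np.sqrt(m)
a_true = np.zeros(p); a_true[[2, 7, 19]] = [1.0, -0.8, 0.6]
x = D @ a_true

c = 1.01 * np.linalg.eigvalsh(D.T @ D).max()   # step bound, c > ||D^T D||
a = np.zeros(p)
for _ in range(500):
    b_k = a + (D.T @ (x - D @ a)) / c   # gradient step on the data term
    a = soft(b_k, lam / c)              # shrinkage with the rescaled threshold

obj = 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))
print(obj)
```

Each iteration costs one multiplication by D, one by Dᵀ, and one elementwise shrinkage, exactly the three operations listed on slide 60.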
64 Linear programming. Basis pursuit: min ||a||₁ s.t. x = Da, which can be cast as an LP; interior-point algorithms solve LP problems
65 Compressed sensing. We compress the signal x of size m by exploiting its origin, using p << m random projections y = Px (P of size p × m). The core idea: y holds all the information about the original signal x, even though p << m. Reconstruction is done by MAP, seeking a sparse a in y = PDa (in the noisy case, minimizing ||y − PDa||₂² + λρ(a)), and setting x̂ = Dâ
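The full sense-then-reconstruct pipeline can be sketched with the same iterated-shrinkage loop (all choices here are illustrative assumptions: a random orthogonal dictionary, a 3-sparse signal, 24 random projections of a length-64 signal, and a small l_1 weight).

```python
import numpy as np

# Sketch of the compressed-sensing pipeline: sense y = P x with p << m
# random projections, then reconstruct a sparse a with y ≈ P D a via
# iterated shrinkage, and finally x_hat = D a.
def soft(b, lam):
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

rng = np.random.default_rng(6)
m, p_meas, lam = 64, 24, 1e-3
D, _ = np.linalg.qr(rng.normal(size=(m, m)))      # orthogonal dictionary
a_true = np.zeros(m); a_true[[3, 17, 40]] = [1.2, -0.9, 0.7]
x = D @ a_true                                    # signal, 3-sparse in D
P = rng.normal(size=(p_meas, m)) / np.sqrt(p_meas)
y = P @ x                                         # p << m measurements

M = P @ D                                         # effective sensing matrix
c = 1.01 * np.linalg.eigvalsh(M.T @ M).max()
a = np.zeros(m)
for _ in range(2000):
    a = soft(a + (M.T @ (y - M @ a)) / c, lam / c)

x_hat = D @ a
rel_err = np.linalg.norm(x_hat - x) / np.linalg.norm(x)
print(rel_err)
```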