Bayesian Models for Regularization in Optimization
1 Bayesian Models for Regularization in Optimization
Aleksandr Aravkin (UBC), Bradley Bell (UW), Alessandro Chiuso (Padova), Michael Friedlander (UBC), Gianluigi Pillonetto (Padova), Jim Burke (UW)
MOPTA, Lehigh University, August 17, 2011
2 Outline
- The Optimization Problem
- Applications
- PLQ Functions
- Log-Concave PLQ Densities
- Interior Point Methods for PLQ Optimization
- Example: Robust Kalman Smoothing
- PLQ Objectives with PLQ Constraints
3-12 The Optimization Problem

$$\min_{x \in X} \rho(F(x))$$

Example: $\rho$ an error function and $F(x) = Ax - y$.

- $\rho$ is typically convex (convex composite optimization).
- Examples of $\rho$ (up to rescaling): $\|\cdot\|_2^2$, $\|\cdot\|_1$, $\rho_H$ (Huber), $\rho_V$ (Vapnik).
- Or $\rho$ is a combination of these, as well as the convex indicators of level sets of such functions ($\rho(y) \le \tau$).
13 Graphs of $\rho$

Quadratic: $V(x) = \tfrac{1}{2}x^2$. Absolute value: $V(x) = |x|$.

Huber:
$$\rho_H(x) = \begin{cases} -Kx - \tfrac{1}{2}K^2, & x < -K \\ \tfrac{1}{2}x^2, & -K \le x \le K \\ Kx - \tfrac{1}{2}K^2, & K < x \end{cases}$$

Vapnik:
$$\rho_V(x) = \begin{cases} -x - \epsilon, & x < -\epsilon \\ 0, & -\epsilon \le x \le \epsilon \\ x - \epsilon, & \epsilon \le x \end{cases}$$
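For concreteness, here is a small sketch of these four penalties (an illustration of mine, not from the talk; the vectorized NumPy forms are assumptions):

```python
import numpy as np

def quadratic(x):
    return 0.5 * x**2

def absolute(x):
    return np.abs(x)

def huber(x, K=1.0):
    # quadratic near zero, linear in the tails
    return np.where(np.abs(x) <= K, 0.5 * x**2, K * np.abs(x) - 0.5 * K**2)

def vapnik(x, eps=0.5):
    # zero inside the eps-insensitive band, linear outside
    return np.maximum(np.abs(x) - eps, 0.0)

x = np.linspace(-3, 3, 7)
print(huber(x, K=1.0))
print(vapnik(x, eps=0.5))
```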
14-19 Applications

- Robust Kalman Filtering (RFPK-UW-NIH, APL-UW-NOAA): tracking drug concentrations, underwater vehicles
- Global Health: Burden of Disease Models (IHME-UW-GF)
- Robust Bundle Adjustment Algorithms (NASA-Ames)
- Sparsity Optimization
- Machine Learning (Reproducing Kernel Hilbert Spaces): control, sensor distribution networks
- Geophysical Inverse Problems (SLIM-UBC-NSERC)
20-21 Piecewise Linear Quadratic (PLQ) Penalties (Rockafellar-Wets 86)

$$\theta_{U,M}(w) := \sup_{u \in U} \left\{ \langle u, w \rangle - \tfrac{1}{2} \langle u, Mu \rangle \right\}, \qquad M \in S^k_+, \quad U \subseteq R^k \text{ polyhedral convex.}$$

Examples:
$$\tfrac{1}{2}\|w\|_2^2 = \sup_{u \in R^n} \left[ \langle u, w \rangle - \tfrac{1}{2}\langle u, u \rangle \right], \qquad \|w\|_1 = \sup_{|u_i| \le 1} \langle u, w \rangle.$$
22 Huber $\rho_H$ as a PLQ function

$$\rho_H(w) = \sup_{u \in [-K,K]} \left\{ \langle w, u \rangle - \tfrac{1}{2}\langle u, u \rangle \right\},$$

which recovers the piecewise form above: $-Kx - \tfrac{1}{2}K^2$ for $x < -K$, $\tfrac{1}{2}x^2$ for $-K \le x \le K$, and $Kx - \tfrac{1}{2}K^2$ for $K < x$.
23-24 Vapnik $\rho_V$ as a PLQ function

Modest extension:
$$\rho_{U,M,b,B}(y) := \theta_{U,M}(b + By) = \sup_{u \in U} \left\{ \langle u, b + By \rangle - \tfrac{1}{2}\langle u, Mu \rangle \right\}, \qquad B \in R^{s \times k} \text{ injective}, \quad b \in R^s.$$

For Vapnik ($\rho_V(x) = 0$ on $[-\epsilon, \epsilon]$ and $|x| - \epsilon$ outside):
$$\rho_V(y) = \sup_{u \in U} \langle b + By, u \rangle, \qquad U = [0,1]^k \times [0,1]^k, \quad B = \begin{bmatrix} I \\ -I \end{bmatrix}, \quad b = \begin{pmatrix} -\epsilon \mathbf{1} \\ -\epsilon \mathbf{1} \end{pmatrix}.$$
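As a quick sanity check (my own sketch, not from the slides), the sup defining $\theta_{U,M}$ can be approximated by brute force over a grid of dual variables and compared with the closed-form Huber penalty:

```python
import numpy as np

def huber(w, K=1.0):
    return np.where(np.abs(w) <= K, 0.5 * w**2, K * np.abs(w) - 0.5 * K**2)

def theta_box(w, K=1.0, n_grid=20001):
    """sup_{u in [-K, K]} { u*w - u^2/2 }, approximated on a grid."""
    u = np.linspace(-K, K, n_grid)
    vals = np.outer(np.atleast_1d(w), u) - 0.5 * u**2
    return vals.max(axis=1)

w = np.linspace(-3, 3, 13)
# agreement up to grid resolution
print(np.max(np.abs(theta_box(w, K=1.0) - huber(w, K=1.0))))
```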
25-32 Optimization Model Class

$$\rho(F(x))$$

- $\rho$ is the optimization model.
- $F$ is the data for the optimization model.
- How is the model $\rho$ chosen to reflect our knowledge about the problem data $F$ and the nature of the solution $x$?

Consider linear regression as a prototypical example:
$$\rho(F(x)) = \tfrac{1}{2}\|Ax - y\|_2^2 + \hat{\rho}(x),$$
where $\hat{\rho}$ plays the role of a Bayesian prior.

Maximum Likelihood Estimation: $\rho(F(x))$ is a negative log-likelihood of the joint density.
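To make the Bayesian reading concrete, here is a hedged sketch (my example, not the talk's): with Gaussian noise and a Gaussian prior on $x$, minimizing the negative log posterior is exactly ridge regression. The value of `lam` below is an assumed ratio of prior to noise precision.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
x_true = rng.standard_normal(5)
y = A @ x_true + 0.1 * rng.standard_normal(50)

lam = 0.5  # assumed prior precision relative to noise precision
# MAP estimate: argmin 0.5*||Ax - y||^2 + 0.5*lam*||x||^2
x_map = np.linalg.solve(A.T @ A + lam * np.eye(5), A.T @ y)
print(x_map)
```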
33-34 Log-Concave PLQ Densities

Define probability densities
$$p(y) \propto \exp\left(-\rho_{U,M,b,B}(y)\right)$$
on $\mathrm{aff}(\mathrm{dom}(\rho)) = \left[B^T\left(U^\infty \cap \mathrm{Null}(M)\right)\right]^\perp$.

Here $U^\infty$ is the horizon cone (or recession cone) of the convex set $U$ (the set of directions in which $U$ is unbounded).

When are these true densities?
35-37 PLQ Densities

THEOREM (PLQ Integrability): Suppose $\rho(y)$ is coercive, and let $n_{\mathrm{aff}}$ denote the dimension of $\mathrm{aff}(\mathrm{dom}(\rho))$. Then the function $f(y) = \exp(-\rho(y))$ is integrable on $\mathrm{aff}(\mathrm{dom}(\rho))$ with respect to the $n_{\mathrm{aff}}$-dimensional Lebesgue measure.

THEOREM (Coercivity of $\rho$): $\rho$ is coercive if and only if $\left[B^T \mathrm{cone}(U)\right]^\circ = \{0\}$, or equivalently if $B^T \mathrm{cone}(U) = R^n$.

In particular, $\|\cdot\|_2^2$, $\|\cdot\|_1$, $\rho_H$, $\rho_V$ all generate true probability densities.
38 PLQ Densities

DEFINITION: Let $\rho$ be any coercive piecewise linear-quadratic function on $R^n$ of the form $\rho_{U,M,b,B}(y) = \theta_{U,M}(b + By)$. Define $p(y)$ to be the density
$$p(y) = \begin{cases} c_1^{-1} \exp\left(-c_2 \rho(y)\right), & y \in \mathrm{dom}(\rho) \\ 0, & \text{else,} \end{cases}$$
where $c_2$ is a positive constant and
$$c_1 = \int_{y \in \mathrm{dom}(\rho)} \exp\left(-c_2 \rho(y)\right) dy.$$
The integral above is with respect to the Lebesgue measure of dimension $\dim(\mathrm{aff}(\mathrm{dom}(\rho)))$.
39-42 Constructing PLQ Densities

Let $y = (y_1, \ldots, y_n)^T$ be a vector of independent PLQ random variables with mean 0 and variance 1. Each $y_i$ has parameters $b_i, B_i, U_i, M_i$. Set
$$U = U_1 \times U_2 \times \cdots \times U_n, \quad M = \mathrm{diag}(M_1, \ldots, M_n), \quad B = \mathrm{diag}(B_1, \ldots, B_n), \quad b = \mathrm{vec}[b_1, \ldots, b_n].$$

The random vector $z = A^{1/2} y + \mu$ then has mean $\mu$ and variance $A$. If $C$ is the normalizing constant for $y$, then $C \det(A)^{1/2}$ is the normalizing constant for $z$.
43-44 PLQ Normalizing Constants

Suppose $\rho(y)$ is a scalar PLQ penalty symmetric about 0. Then
$$p(y) = \frac{1}{c_1} \exp\left(-\rho(c_2 y)\right)$$
is a PLQ density (with mean 0 and variance 1) when
$$c_2 = \sqrt{\frac{\int u^2 \exp(-\rho(u))\, du}{\int \exp(-\rho(u))\, du}}, \qquad c_1 = \frac{1}{c_2} \int \exp(-\rho(u))\, du.$$

We need to compute $\int u^2 \exp(-\rho(u))\, du$ and $\int \exp(-\rho(u))\, du$.
45 Huber Normalizing Constants

$$\int \exp\left(-\rho_H(y)\right) dy = \frac{2\exp(-K^2/2)}{K} + \sqrt{2\pi}\,\left(2\Phi(K) - 1\right)$$
$$\int y^2 \exp\left(-\rho_H(y)\right) dy = \frac{4\exp(-K^2/2)\,(1 + K^2)}{K^3} + \sqrt{2\pi}\,\left(2\Phi(K) - 1\right),$$
where $\Phi$ is the standard normal CDF.
46 Vapnik Normalizing Constants

$$\int \exp\left(-\rho_V(y)\right) dy = 2(\epsilon + 1)$$
$$\int y^2 \exp\left(-\rho_V(y)\right) dy = \tfrac{2}{3}\epsilon^3 + 2\left(2 + 2\epsilon + \epsilon^2\right)$$
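These constants, and the resulting $c_1, c_2$, are easy to check numerically. A sketch of mine using standard SciPy routines (`scipy.integrate.quad`, `scipy.stats.norm`):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

K, eps = 1.0, 0.5
huber = lambda y: 0.5 * y**2 if abs(y) <= K else K * abs(y) - 0.5 * K**2
vapnik = lambda y: max(abs(y) - eps, 0.0)

def moments(rho):
    """Zeroth and second moments of exp(-rho) on the real line."""
    m0, _ = quad(lambda y: np.exp(-rho(y)), -np.inf, np.inf)
    m2, _ = quad(lambda y: y**2 * np.exp(-rho(y)), -np.inf, np.inf)
    return m0, m2

m0, m2 = moments(huber)
print(m0, 2 * np.exp(-K**2 / 2) / K + np.sqrt(2 * np.pi) * (2 * norm.cdf(K) - 1))
print(m2, 4 * np.exp(-K**2 / 2) * (1 + K**2) / K**3
          + np.sqrt(2 * np.pi) * (2 * norm.cdf(K) - 1))

m0, m2 = moments(vapnik)
print(m0, 2 * (eps + 1))
print(m2, 2 / 3 * eps**3 + 2 * (2 + 2 * eps + eps**2))

# normalizing constants for the standardized density p(y) = exp(-rho(c2*y))/c1
c2 = np.sqrt(m2 / m0)
c1 = m0 / c2
```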
47-49 PLQ Optimization

$$\min_y\ \rho_{U,M,b,B}(y) := \sup_{u \in U} \left\{ \langle u, b + By \rangle - \tfrac{1}{2} u^T M u \right\}, \qquad U = \{u : C^T u \le c\}.$$

For example, $\rho_{U,M,b,B}(y) = \theta_{U_1,M_1}(Ay - r) + \theta_{U_2,M_2}(y)$ (a misfit plus a regularizer) fits this form.

KKT conditions:
$$\begin{aligned} 0 &= B^T u \\ 0 &= b + By - Mu - Cq \\ 0 &= C^T u + s - c \\ 0 &= q_i s_i \ \ \forall i, \qquad q, s \ge 0. \end{aligned}$$
50-51 Interior Point Methods (IPM)

Relax the complementarity condition:
$$\begin{aligned} 0 &= B^T u \\ 0 &= b + By - Mu - Cq \\ 0 &= C^T u + s - c \\ \tau &= q_i s_i \ \ \forall i, \qquad q, s \ge 0. \end{aligned}$$

THEOREM: This KKT system can be solved using an IPM if and only if $\mathrm{Null}(M) \cap \mathrm{Null}(C^T) = \{0\}$. In particular, this is implied by the condition $\mathrm{dom}(\theta_{U,M}) = R^m$.
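Below is a minimal sketch of a primal-dual interior point iteration for this relaxed KKT system. It is my own toy implementation, not the authors' solver: it assembles the full Newton system densely, uses fraction-to-boundary damping, and shrinks $\tau$ geometrically. The Huber-regression usage at the end is an assumed example.

```python
import numpy as np

def plq_ipm(B, b, M, C, c, iters=60):
    """Toy primal-dual IPM for min_y sup_{C^T u <= c} <u, b + B y> - 0.5 u^T M u."""
    k, n = B.shape          # u lives in R^k, y in R^n
    m = C.shape[1]          # number of inequalities defining U
    y, u = np.zeros(n), np.zeros(k)
    q, s = np.ones(m), np.ones(m)
    for _ in range(iters):
        tau = 0.1 * (q @ s) / m          # shrink the complementarity target
        r = np.concatenate([B.T @ u,
                            b + B @ y - M @ u - C @ q,
                            C.T @ u + s - c,
                            q * s - tau])
        Z = np.zeros
        J = np.block([  # Jacobian in variable order (y, u, q, s)
            [Z((n, n)), B.T,       Z((n, m)),  Z((n, m))],
            [B,         -M,        -C,         Z((k, m))],
            [Z((m, n)), C.T,       Z((m, m)),  np.eye(m)],
            [Z((m, n)), Z((m, k)), np.diag(s), np.diag(q)],
        ])
        dy, du, dq, ds = np.split(np.linalg.solve(J, -r), [n, n + k, n + k + m])
        alpha = 1.0                      # fraction-to-boundary: keep q, s > 0
        for v, dv in ((q, dq), (s, ds)):
            if (dv < 0).any():
                alpha = min(alpha, 0.99 * np.min(-v[dv < 0] / dv[dv < 0]))
        y += alpha * dy; u += alpha * du; q += alpha * dq; s += alpha * ds
    return y

# Hypothetical usage: Huber regression min_x sum_i rho_H((A x - y_obs)_i),
# i.e. B = A, b = -y_obs, M = I, and U = {u : |u_i| <= K}.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 2))
y_obs = A @ np.array([1.0, -2.0]) + 0.1 * rng.standard_normal(30)
y_obs[::10] += 5.0                               # inject outliers
K, meas = 1.0, 30
C = np.hstack([np.eye(meas), -np.eye(meas)])     # C^T u <= c encodes |u_i| <= K
print(plq_ipm(B=A, b=-y_obs, M=np.eye(meas), C=C, c=K * np.ones(2 * meas)))
```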
52 Example: Robust Kalman Smoothing

$$x_k = g_k(x_{k-1}) + w_k, \qquad z_k = h_k(x_k) + v_k,$$

where
- $g_k : R^n \to R^n$ is a known process function,
- $h_k : R^n \to R^{m(k)}$ is a known measurement function,
- $w_k$ is unknown Gaussian process noise, $N(0, Q_k)$,
- $v_k$ is unknown $\ell_1$-Laplace measurement noise, $L_1(0, R_k)$.
53-54 Robust Kalman Smoothing

An unknown linear deterministic process:
$$X(0) = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad \dot{X}(t) = \begin{pmatrix} -X_2(t) \\ X_1(t) \end{pmatrix}, \quad \text{i.e. } X(t) = \begin{pmatrix} \cos(t) \\ \sin(t) \end{pmatrix}.$$

For $k = 0, \ldots, N$, let $t_k = k\,\Delta t$ and $x_k = X(t_k)$ ($N = 100$), with discretized model
$$g_k(x_{k-1}) = \begin{bmatrix} 1 & 0 \\ \Delta t & 1 \end{bmatrix} x_{k-1} + w_k, \qquad w_k \sim N(0, Q_k), \qquad Q_k = \begin{bmatrix} \Delta t & \Delta t^2/2 \\ \Delta t^2/2 & \Delta t^3/3 \end{bmatrix},$$
$$z_k = X_2(t_k) + v_k.$$

We assume the observations $z_k$ contain outliers.
55 Robust Kalman Smoothing

$$v_k \sim (1 - p)\,N(0, 0.25) + p\,N(0, \phi)$$

[Figure: function units vs. time. Simulation: measurements (+), outliers (o) (absolute residuals more than three standard deviations), true function (thick line), $\ell_1$-Laplace estimate (thin line), Gaussian estimate (dashed line), Gaussian outlier-removal estimate (dotted line).]
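A hedged sketch of the data generation in this experiment (my own reconstruction; the step size $\Delta t$ and the values of $p$ and $\phi$ are assumptions, and the smoothers themselves are not shown):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
dt = 2 * np.pi / N                 # assumed step size (one period; not stated on the slide)
p, phi = 0.1, 100.0                # assumed contamination rate and outlier variance

t = dt * np.arange(1, N + 1)
truth = np.sin(t)                  # z_k measures X_2(t_k) = sin(t_k)

# measurement noise: Gaussian mixture with occasional large outliers
is_outlier = rng.random(N) < p
v = np.where(is_outlier,
             rng.normal(0.0, np.sqrt(phi), N),
             rng.normal(0.0, 0.5, N))          # sqrt(0.25) = 0.5
z = truth + v

# the smoother's (mis)specified process model from the slide
G = np.array([[1.0, 0.0], [dt, 1.0]])
Q = np.array([[dt, dt**2 / 2], [dt**2 / 2, dt**3 / 3]])
H = np.array([[0.0, 1.0]])         # observe the second state component
```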
56 Robust Kalman Smoothing

Mean squared error:
$$\mathrm{MSE} = \frac{1}{N} \sum_{k=1}^{N} \left( [x_{1,k} - \hat{x}_{1,k}]^2 + [x_{2,k} - \hat{x}_{2,k}]^2 \right)$$

Table: median MSE and 95% confidence intervals for the different estimation methods, over 1000 realizations of each $v_k \sim (1-p)N(0, 0.25) + pN(0, \phi)$.

  p    phi   GKF              RKF             IGS             ILS
  -    -     .34 (.24, .47)   .42 (.15, 1.1)  .04 (.02, .1)   .04 (.01, .1)
  -    -     -   (.26, .60)   .48 (.15, 1.1)  .06 (.02, .12)  .04 (.02, .10)
  -    -     -   (.32, 1.1)   .56 (.18, 1.5)  .09 (.04, .29)  .05 (.02, .12)
  -    -     -   (.42, 2.3)   .58 (.19, 1.7)  .17 (.05, .55)  .05 (.02, .13)
  -    -     -   (1.7, 17.9)  .55 (.18, 2.0)  1.3 (.30, 5.0)  .05 (.02, .14)
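The MSE on the slide, as a small helper function (mine, for completeness):

```python
import numpy as np

def mse(x, x_hat):
    """Mean squared error over both state components, as on the slide.
    x, x_hat: arrays of shape (N, 2)."""
    return np.mean(np.sum((x - x_hat) ** 2, axis=1))
```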
57-59 PLQ Objectives with PLQ Constraints

$$P(\psi, \phi, \tau): \quad \underset{x \in X}{\text{minimize}}\ \psi(x) \quad \text{subject to}\ \phi(x) \le \tau.$$

$$v_1(\tau) = \min\,\{\psi(x) \mid x \in X,\ \phi(x) \le \tau\}, \qquad v_2(\beta) = \min\,\{\phi(x) \mid x \in X,\ \psi(x) \le \beta\}.$$

$v_1$ and $v_2$ are both convex functions.
60-63 An Inverse Function Theorem for Optimal-Value Functions

Suppose that there is an interval $(\tau_l, \tau_u) \subseteq R \cup \{\pm\infty\}$ with $(\tau_l, \tau_u) \cap R \neq \emptyset$ such that
$$\tau \in (\tau_l, \tau_u) \implies \mathrm{argmin}\, P(\psi, \phi, \tau) \subseteq \{x \in X \mid \phi(x) = \tau\}.$$
Then, for every $\tau \in (\tau_l, \tau_u)$:

(a) $v_2(v_1(\tau)) = \tau$, and

(b) $\mathrm{argmin}\, P(\psi, \phi, \tau) \subseteq \mathrm{argmin}\, P(\phi, \psi, v_1(\tau)) \subseteq \{x \in X \mid \psi(x) = v_1(\tau)\}$.

Moreover, $v_1(v_2(\beta)) = \beta$ for all $\beta \in (\beta_l, \beta_u)$, where
$$\beta_l = \inf\,\{v_1(\tau) \mid \tau \in (\tau_l, \tau_u)\} \quad \text{and} \quad \beta_u = \sup\,\{v_1(\tau) \mid \tau \in (\tau_l, \tau_u)\},$$
whenever $(\beta_l, \beta_u) \subseteq \{v_1(\tau) \mid \tau \in (\tau_l, \tau_u)\}$.
64-65 Optimization by Zero Finding

The inverse function theorem gives conditions under which $v_1(v_2(\beta)) = \beta$. Therefore, if we find a solution $\bar{\tau}$ of $v_1(\tau) = \beta$, then $\bar{\tau} = v_2(\beta)$ and
$$\mathrm{argmin}\, P(\psi, \phi, \bar{\tau}) \subseteq \mathrm{argmin}\, P(\phi, \psi, v_1(\bar{\tau})) = \mathrm{argmin}\, P(\phi, \psi, \beta).$$

The equation $v_1(\tau) = \beta$ can be solved via an inexact secant method, with the iterates converging at a superlinear rate.
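A minimal sketch of the zero-finding loop (an exact-evaluation secant method on a toy value function of my choosing; the talk's method allows inexact evaluations of $v_1$, which this sketch does not model):

```python
def secant_root(v1, beta, tau0, tau1, tol=1e-10, iters=50):
    """Find tau with v1(tau) = beta by the secant method."""
    f0, f1 = v1(tau0) - beta, v1(tau1) - beta
    for _ in range(iters):
        if abs(f1) < tol or f1 == f0:
            break
        tau0, tau1 = tau1, tau1 - f1 * (tau1 - tau0) / (f1 - f0)
        f0, f1 = f1, v1(tau1) - beta
    return tau1

# toy value function: v1(tau) = min_x { 0.5*(x - 3)^2 : |x| <= tau },
# which equals 0.5*(3 - tau)^2 for 0 <= tau <= 3
v1 = lambda tau: 0.5 * max(0.0, 3.0 - tau) ** 2
print(secant_root(v1, beta=0.5, tau0=0.0, tau1=1.0))   # -> 2.0
```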