Bayesian Models for Regularization in Optimization


1 Bayesian Models for Regularization in Optimization
Aleksandr Aravkin, UBC; Bradley Bell, UW; Alessandro Chiuso, Padova; Michael Friedlander, UBC; Gianluigi Pillonetto, Padova; Jim Burke, UW
MOPTA, Lehigh University, August 17, 2011

2 Outline
The Optimization Problem
Applications
PLQ Functions
Log-Concave PLQ Densities
Interior Point Methods for PLQ Optimization
Example: Robust Kalman Smoothing
PLQ Objectives with PLQ Constraints

3-12 The Optimization Problem

min_{x ∈ X} ρ(F(x))

Example: ρ an error function and F(x) = Ax − y.
ρ is typically convex.
Convex composite optimization.
Examples of ρ (up to rescaling): ℓ₂, ℓ₁, ρ_H (Huber), ρ_V (Vapnik).
Or ρ is a combination of these, as well as the convex indicators of level sets of such functions (ρ(y) ≤ τ).

13 Graphs of ρ
(Figure: graphs of the four penalties.)
Quadratic: V(x) = (1/2)x²
Absolute value: V(x) = |x|
Huber: V(x) = −Kx − (1/2)K² for x < −K;  V(x) = (1/2)x² for −K ≤ x ≤ K;  V(x) = Kx − (1/2)K² for K < x
Vapnik: V(x) = −x − ε for x < −ε;  V(x) = 0 for −ε ≤ x ≤ ε;  V(x) = x − ε for ε ≤ x
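
For reference, a small NumPy sketch (not from the talk) of these four penalties, with K the Huber threshold and eps the Vapnik insensitivity width:

```python
import numpy as np

def quad(x):
    """Least-squares penalty V(x) = x^2 / 2 (elementwise)."""
    return 0.5 * x**2

def abs_val(x):
    """One-norm penalty V(x) = |x| (elementwise)."""
    return np.abs(x)

def huber(x, K=1.0):
    """Huber penalty: quadratic on [-K, K], linear with slope K outside."""
    a = np.abs(x)
    return np.where(a <= K, 0.5 * a**2, K * a - 0.5 * K**2)

def vapnik(x, eps=0.5):
    """Vapnik (epsilon-insensitive) penalty: zero on [-eps, eps], linear outside."""
    return np.maximum(np.abs(x) - eps, 0.0)
```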

14-19 Applications
Robust Kalman Filtering (RFPK-UW-NIH, APL-UW-NOAA): tracking drug concentrations, underwater vehicles
Global Health: Burden of Disease Models (IHME-UW-GF)
Robust Bundle Adjustment Algorithms (NASA-Ames)
Sparsity Optimization
Machine Learning (Reproducing Kernel Hilbert Spaces); control of sensor distribution networks
Geophysical Inverse Problems (SLIM-UBC-NSERC)

20-21 Piecewise Linear Quadratic (PLQ) Penalties (Rockafellar-Wets '86)

θ_{U,M}(w) := sup_{u ∈ U} { ⟨u, w⟩ − (1/2)⟨u, Mu⟩ },   M ∈ S₊^k, U ⊆ R^k polyhedral convex.

Examples:
(1/2)‖w‖₂² = sup_{u ∈ R^n} [ ⟨u, w⟩ − (1/2)⟨u, u⟩ ]
‖w‖₁ = sup_{|u_i| ≤ 1} ⟨u, w⟩

22 Huber ρ_H as a PLQ function

ρ_H(w) = sup_{u ∈ [−K,K]} { ⟨w, u⟩ − (1/2)⟨u, u⟩ },

i.e., the piecewise form ρ_H(x) = −Kx − (1/2)K² for x < −K;  (1/2)x² for −K ≤ x ≤ K;  Kx − (1/2)K² for K < x.
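
A quick numerical check (my own sketch, not from the talk) that these sup representations reproduce the penalties from the last two slides: in each case the maximizer has a closed form (u = w for the quadratic, u = sign(w) for ℓ₁, u = clip(w, −K, K) for Huber).

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)
K = 0.7

# theta_{U,M}(w) = sup_{u in U} <u, w> - 0.5 <u, Mu>, evaluated at the maximizer

# 1/2 ||w||_2^2 : U = R^n, M = I, maximizer u = w
u = w
assert np.isclose(u @ w - 0.5 * u @ u, 0.5 * w @ w)

# ||w||_1 : U = {u : |u_i| <= 1}, M = 0, maximizer u = sign(w)
u = np.sign(w)
assert np.isclose(u @ w, np.abs(w).sum())

# Huber (elementwise): U = [-K, K], M = I, maximizer u = clip(w, -K, K)
u = np.clip(w, -K, K)
plq_val = u * w - 0.5 * u**2
huber_val = np.where(np.abs(w) <= K, 0.5 * w**2, K * np.abs(w) - 0.5 * K**2)
assert np.allclose(plq_val, huber_val)
print("sup representations check out")
```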

23-24 Vapnik ρ_V as a PLQ function

Modest extension: ρ_{U,M,b,B}(y) := θ_{U,M}(b + By) = sup_{u ∈ U} { ⟨u, b + By⟩ − (1/2)⟨u, Mu⟩ },   B ∈ R^{s×k} injective, b ∈ R^s.

Vapnik: ρ_V(x) = −x − ε for x < −ε;  0 for −ε ≤ x ≤ ε;  x − ε for ε ≤ x, and

ρ_V(y) = sup_{u ∈ U} ⟨b + By, u⟩,   with U = [0,1]^k × [0,1]^k,  B = [I; −I],  b = (−ε1; −ε1).
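
A similar check for the Vapnik representation (again my own sketch): since M = 0, the sup is linear in u over a box, so it is attained coordinatewise at 0 or 1, recovering the ε-insensitive loss.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=6)
eps = 0.3

# b + By with B = [I; -I], b = (-eps*1; -eps*1)
w = np.concatenate([y - eps, -y - eps])

# sup over u in [0,1]^{2k} of <w, u>: take u_i = 1 where w_i > 0, else u_i = 0
plq_val = np.maximum(w, 0.0).sum()
vapnik_val = np.maximum(np.abs(y) - eps, 0.0).sum()
assert np.isclose(plq_val, vapnik_val)
print("Vapnik PLQ representation checks out")
```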

25-32 Optimization Model Class ρ(F(x))

ρ is the optimization model.
F is the data for the optimization model.
How is the model ρ chosen to reflect our knowledge about the problem data F and the nature of the solution x?

Consider linear regression as a prototypical example:

ρ(F(x)) = (1/2)‖Ax − y‖₂² + ρ̂(x),   ρ̂ a Bayesian prior.

Maximum Likelihood Estimation: ρ(F(x)) is a negative log-likelihood of the joint density.
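
As a minimal concrete instance (synthetic data, not from the talk), a Gaussian prior ρ̂(x) = (λ/2)‖x‖₂² makes the MAP problem a ridge regression, solved by the normal equations:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(50, 8))
x_true = rng.normal(size=8)
y = A @ x_true + 0.1 * rng.normal(size=50)
lam = 1.0

# MAP with Gaussian likelihood and Gaussian prior:
# minimize 0.5*||Ax - y||^2 + 0.5*lam*||x||^2  =>  (A'A + lam*I) x = A'y
x_map = np.linalg.solve(A.T @ A + lam * np.eye(8), A.T @ y)
print(x_map)
```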

33-34 Log-Concave PLQ Densities

Define probability densities p(y) ∝ exp(−ρ_{U,M,B,b}(y)) on aff(dom(ρ)) = B^T(U^∞ ∩ Null(M))^⊥.

U^∞ is the horizon cone (or recession cone) of the convex set U (the set of directions in which U is unbounded).

When are these true densities?

35-37 PLQ Densities

THEOREM (PLQ Integrability): Suppose ρ(y) is coercive, and let n_aff denote the dimension of aff(dom(ρ)). Then the function f(y) = exp(−ρ(y)) is integrable on aff(dom(ρ)) with respect to the n_aff-dimensional Lebesgue measure.

THEOREM (Coercivity of ρ): ρ is coercive if and only if [B^T cone(U)]° = {0}, or equivalently B^T cone(U) = R^n.

In particular, ℓ₂, ℓ₁, ρ_H, and ρ_V all generate true probability densities.

38 PLQ Densities

DEFINITION: Let ρ be any coercive piecewise linear-quadratic function on R^n of the form ρ_{U,M,B,b}(y) = θ_{U,M}(b + By). Define p(y) to be the density

p(y) = c₁⁻¹ exp(−c₂ ρ(y)) for y ∈ dom(ρ),   p(y) = 0 otherwise,

where c₂ is a positive constant and c₁ = ∫_{y ∈ dom(ρ)} exp(−c₂ ρ(y)) dy. The integral is taken with respect to the Lebesgue measure of dimension dim(aff(dom(ρ))).

39-42 Constructing PLQ Densities

Let y = (y₁, ..., yₙ)^T be a vector of independent PLQ random variables with mean 0 and variance 1. Each yᵢ has parameters bᵢ, Bᵢ, Uᵢ, Mᵢ. Set

U = U₁ × U₂ × ⋯ × Uₙ,   M = diag(M₁, M₂, ..., Mₙ),   B = diag(B₁, B₂, ..., Bₙ),   b = vec[b₁, b₂, ..., bₙ].

The random vector z = A^{1/2} y + µ has mean µ and variance A. If C is the normalizing constant for y, then C det(A)^{1/2} is the normalizing constant for z.

43-44 PLQ Normalizing Constants

Suppose ρ(y) is a scalar PLQ penalty symmetric about 0. Then

p(y) = (1/c₁) exp(−ρ(c₂ y))

is a PLQ density when

c₂ = ( ∫ u² exp(−ρ(u)) du / ∫ exp(−ρ(u)) du )^{1/2},   c₁ = (1/c₂) ∫ exp(−ρ(u)) du.

We need to compute ∫ u² exp(−ρ(u)) du and ∫ exp(−ρ(u)) du.
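
A small sketch (not from the slides) of computing c₁ and c₂ for a scalar symmetric penalty by numerical quadrature, using the two integrals above:

```python
import numpy as np
from scipy.integrate import quad

def plq_normalizers(rho, lo=-50.0, hi=50.0):
    """Return (c1, c2) so that p(y) = exp(-rho(c2*y)) / c1 integrates to 1 with variance 1."""
    m0, _ = quad(lambda u: np.exp(-rho(u)), lo, hi)          # integral of exp(-rho(u))
    m2, _ = quad(lambda u: u**2 * np.exp(-rho(u)), lo, hi)   # integral of u^2 exp(-rho(u))
    c2 = np.sqrt(m2 / m0)
    c1 = m0 / c2
    return c1, c2

# Example: the Gaussian case rho(u) = u^2/2 gives c2 = 1, c1 = sqrt(2*pi)
c1, c2 = plq_normalizers(lambda u: 0.5 * u**2)
print(c1, np.sqrt(2 * np.pi), c2)
```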

45 Huber Normalizing Constants

∫ exp(−ρ_H(y)) dy = 2 exp(−K²/2) (1/K) + √(2π) (2Φ(K) − 1)

∫ y² exp(−ρ_H(y)) dy = 4 exp(−K²/2) (1 + K²)/K³ + √(2π) (2Φ(K) − 1),

where Φ is the standard normal CDF.

46 Vapnik Normalizing Constants

∫ exp(−ρ_V(y)) dy = 2(ε + 1)

∫ y² exp(−ρ_V(y)) dy = (2/3)ε³ + 2(ε² + 2ε + 2)
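
These closed forms can be checked against numerical quadrature; the sketch below (mine, not from the talk) does so for one choice of K and ε, using the Vapnik second moment in the form (2/3)ε³ + 2(ε² + 2ε + 2).

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

K, eps = 1.5, 0.4
huber = lambda y: np.where(np.abs(y) <= K, 0.5 * y**2, K * np.abs(y) - 0.5 * K**2)
vapnik = lambda y: np.maximum(np.abs(y) - eps, 0.0)

# Huber: zeroth and second moments of exp(-rho_H)
h0 = quad(lambda y: np.exp(-huber(y)), -50, 50)[0]
h2 = quad(lambda y: y**2 * np.exp(-huber(y)), -50, 50)[0]
assert np.isclose(h0, 2*np.exp(-K**2/2)/K + np.sqrt(2*np.pi)*(2*norm.cdf(K) - 1))
assert np.isclose(h2, 4*np.exp(-K**2/2)*(1 + K**2)/K**3 + np.sqrt(2*np.pi)*(2*norm.cdf(K) - 1))

# Vapnik: zeroth and second moments of exp(-rho_V)
v0 = quad(lambda y: np.exp(-vapnik(y)), -50, 50)[0]
v2 = quad(lambda y: y**2 * np.exp(-vapnik(y)), -50, 50)[0]
assert np.isclose(v0, 2*(eps + 1))
assert np.isclose(v2, (2/3)*eps**3 + 2*(eps**2 + 2*eps + 2))
print("closed-form normalizing constants verified")
```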

47-49 PLQ Optimization

min_y ρ_{U,M,b,B}(y) := sup_{u ∈ U} { ⟨u, b + By⟩ − (1/2) u^T M u },   with U = {u : C^T u ≤ c}.

For example, ρ_{U,M,b,B}(y) = θ_{U₁,M₁}(Ay − r) + θ_{U₂,M₂}(y) (misfit plus regularizer).

KKT Conditions:
0 = B^T u
0 = b + By − Mu − Cq
0 = C^T u + s − c
0 = qᵢ sᵢ for all i,   q, s ≥ 0.

50-51 Interior Point Methods (IPM)

Relax complementarity:
0 = B^T u
0 = b + By − Mu − Cq
0 = C^T u + s − c
τ = qᵢ sᵢ for all i,   q, s ≥ 0.

THEOREM: This KKT system can be solved using an IPM if and only if Null(M) ∩ Null(C^T) = {0}. In particular, this is implied by the condition dom(θ_{U,M}) = R^m.
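
The theorem suggests a plain damped-Newton interior point scheme on the relaxed system. Below is a minimal dense sketch of that idea (my own illustration, not the authors' solver), specialized to Huber regression ρ_H(Ay − r), i.e. M = I, B = A, b = −r, U = [−K, K]^m; the centering rule τ = 0.1 μ and the fraction-to-boundary factor 0.95 are arbitrary illustrative choices. A quick sanity check is to compare its output against a direct minimization of the explicit piecewise Huber form of the objective.

```python
import numpy as np

def huber_plq_ipm(A, r, K=1.0, iters=50, tol=1e-8):
    """Basic primal-dual Newton IPM on the relaxed PLQ KKT system for min_y rho_H(A y - r)."""
    m, n = A.shape
    B, b, M = A, -r, np.eye(m)
    C = np.hstack([np.eye(m), -np.eye(m)])      # C^T u <= c encodes -K <= u_i <= K
    c = K * np.ones(2 * m)

    u, y = np.zeros(m), np.zeros(n)
    q, s = np.ones(2 * m), np.ones(2 * m)       # multipliers and slacks, kept positive

    for _ in range(iters):
        mu = q @ s / (2 * m)
        tau = 0.1 * mu                           # centering parameter
        F = np.concatenate([
            B.T @ u,                             # 0 = B^T u
            b + B @ y - M @ u - C @ q,           # 0 = b + B y - M u - C q
            C.T @ u + s - c,                     # 0 = C^T u + s - c
            q * s - tau,                         # q_i s_i = tau (relaxed complementarity)
        ])
        if np.linalg.norm(F, np.inf) < tol:
            break
        J = np.block([
            [B.T, np.zeros((n, n)), np.zeros((n, 2 * m)), np.zeros((n, 2 * m))],
            [-M, B, -C, np.zeros((m, 2 * m))],
            [C.T, np.zeros((2 * m, n)), np.zeros((2 * m, 2 * m)), np.eye(2 * m)],
            [np.zeros((2 * m, m)), np.zeros((2 * m, n)), np.diag(s), np.diag(q)],
        ])
        d = np.linalg.solve(J, -F)
        du, dy = d[:m], d[m:m + n]
        dq, ds = d[m + n:m + n + 2 * m], d[m + n + 2 * m:]
        # fraction-to-boundary step so that q, s stay strictly positive
        alpha = 1.0
        for v, dv in ((q, dq), (s, ds)):
            neg = dv < 0
            if neg.any():
                alpha = min(alpha, 0.95 * np.min(-v[neg] / dv[neg]))
        u, y, q, s = u + alpha * du, y + alpha * dy, q + alpha * dq, s + alpha * ds
    return y

# small synthetic instance with a few injected outliers
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
r = A @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
r[::7] += 5.0
print(huber_plq_ipm(A, r, K=1.0))
```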

52 Example: Robust Kalman Smoothing

x_k = g_k(x_{k−1}) + w_k
z_k = h_k(x_k) + v_k,

where
g_k : R^n → R^n is a known process function,
h_k : R^n → R^{m(k)} is a known measurement function,
w_k is unknown Gaussian process noise, N(0, Q_k),
v_k is unknown ℓ₁-Laplace measurement noise, L₁(0, R_k).

53-54 Robust Kalman Smoothing

An unknown linear deterministic process:

X(0) = (1, 0)^T,   Ẋ(t) = (−X₂(t), X₁(t))^T,   i.e., X(t) = (cos(t), sin(t))^T.

For k = 0, ..., N, let t_k = kΔt and x_k = X(t_k) (N = 100), with

g_k(x_{k−1}) = [1 0; Δt 1] x_{k−1} + w_k,   w_k ~ N(0, Q_k),   Q_k = [Δt Δt²/2; Δt²/2 Δt³/3],

z_k = X₂(t_k) + v_k (measurement variance R_k).

We assume the observations z_k have outliers.
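
A sketch of this simulation setup in NumPy (the step size dt and the mixture parameters p, phi below are illustrative placeholders, not values from the talk):

```python
import numpy as np

rng = np.random.default_rng(3)
N, dt = 100, 0.1                 # N = 100 as on the slide; dt chosen for illustration
p, phi = 0.1, 100.0              # outlier probability and outlier variance (illustrative)

G = np.array([[1.0, 0.0], [dt, 1.0]])                      # process model g_k
Q = np.array([[dt, dt**2 / 2], [dt**2 / 2, dt**3 / 3]])     # process covariance Q_k
L = np.linalg.cholesky(Q)

t = dt * np.arange(N + 1)
X_true = np.column_stack([np.cos(t), np.sin(t)])            # X(t) = (cos t, sin t)

x = np.zeros((N + 1, 2))
z = np.zeros(N + 1)
x[0] = [1.0, 0.0]
for k in range(1, N + 1):
    x[k] = G @ x[k - 1] + L @ rng.normal(size=2)
    # mixture measurement noise: nominal N(0, 0.25) with probability 1-p, outlier N(0, phi)
    var = phi if rng.random() < p else 0.25
    z[k] = X_true[k, 1] + np.sqrt(var) * rng.normal()
```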

55 Robust Kalman Smoothing

v_k ~ (1 − p) N(0, 0.25) + p N(0, φ)

(Figure: simulation over time; measurements (+), outliers (o) (absolute residuals more than three standard deviations), true function (thick line), ℓ₁-Laplace estimate (thin line), Gaussian estimate (dashed line), Gaussian outlier-removal estimate (dotted line).)

56 Robust Kalman Smoothing

Mean Squared Error:   MSE = (1/N) Σ_{k=1}^{N} ( [x_{1,k} − x̂_{1,k}]² + [x_{2,k} − x̂_{2,k}]² )

Table: Median MSE and 95% confidence intervals for the different estimation methods
(1000 realizations of each setting; v_k ~ (1 − p) N(0, 0.25) + p N(0, φ); rows correspond to different (p, φ) settings):

      GKF               RKF               IGS               ILS
  .34 (.24, .47)    .42 (.15, 1.1)    .04 (.02, .1)     .04 (.01, .1)
      (.26, .60)    .48 (.15, 1.1)    .06 (.02, .12)    .04 (.02, .10)
      (.32, 1.1)    .56 (.18, 1.5)    .09 (.04, .29)    .05 (.02, .12)
      (.42, 2.3)    .58 (.19, 1.7)    .17 (.05, .55)    .05 (.02, .13)
     (1.7, 17.9)    .55 (.18, 2.0)    1.3 (.30, 5.0)    .05 (.02, .14)
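
Computing the MSE from a smoother's output is a one-liner; a small sketch with x_true and x_hat stored as (N, 2) arrays:

```python
import numpy as np

def mse(x_true, x_hat):
    """Mean squared error over both state components, as defined on the slide."""
    return np.mean(np.sum((x_true - x_hat) ** 2, axis=1))
```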

57-59 PLQ Objectives with PLQ Constraints

P(ψ, φ, τ):   minimize_{x ∈ X} ψ(x)   subject to   φ(x) ≤ τ.

v₁(τ) = min { ψ(x) : x ∈ X, φ(x) ≤ τ }
v₂(β) = min { φ(x) : x ∈ X, ψ(x) ≤ β }

v₁ and v₂ are both convex functions.

60-63 An Inverse Function Theorem for Optimal-Value Functions

Suppose that there is an interval (τ_l, τ_u) ⊆ R ∪ {±∞} with (τ_l, τ_u) ∩ R ≠ ∅ such that, for every τ ∈ (τ_l, τ_u),

argmin P(ψ, φ, τ) ⊆ { x ∈ X : φ(x) = τ }.

Then, for every τ ∈ (τ_l, τ_u),
(a) v₂(v₁(τ)) = τ, and
(b) argmin P(ψ, φ, τ) ⊆ argmin P(φ, ψ, v₁(τ)) ⊆ { x ∈ X : ψ(x) = v₁(τ) }.

Moreover, v₁(v₂(β)) = β for all β ∈ (β_l, β_u), where

β_l = inf { v₁(τ) : τ ∈ (τ_l, τ_u) }   and   β_u = sup { v₁(τ) : τ ∈ (τ_l, τ_u) },

whenever (β_l, β_u) ⊆ { v₁(τ) : τ ∈ (τ_l, τ_u) }.

64-65 Optimization by Zero Finding

The inverse function theorem gives conditions under which v₁(v₂(β)) = β. Therefore, if we find a solution τ̄ of v₁(τ) = β, then τ̄ = v₂(β) and

argmin P(ψ, φ, τ̄) ⊆ argmin P(φ, ψ, v₁(τ̄)) = argmin P(φ, ψ, β).

The equation v₁(τ) = β can be solved via an inexact secant method, with the iterates converging at a superlinear rate.
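
A minimal sketch of the zero-finding idea (mine, not the authors' implementation): v₁(τ) is evaluated by solving the constrained subproblem (here ψ is a small least-squares objective and φ(x) = ‖x‖₂², so scipy's SLSQP can handle it), and a plain secant iteration is run on v₁(τ) − β = 0. The problem instance and starting points are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
A = rng.normal(size=(30, 10))
y = A @ rng.normal(size=10) + 0.1 * rng.normal(size=30)

psi = lambda x: 0.5 * np.sum((A @ x - y) ** 2)     # objective of P(psi, phi, tau)
phi = lambda x: np.sum(x ** 2)                      # constraint function

def v1(tau):
    """Value function v1(tau) = min psi(x) subject to phi(x) <= tau."""
    res = minimize(psi, np.zeros(10), method="SLSQP",
                   constraints=[{"type": "ineq", "fun": lambda x: tau - phi(x)}])
    return res.fun

def solve_by_zero_finding(beta, tau0=0.1, tau1=1.0, iters=20, tol=1e-8):
    """Secant iteration on v1(tau) = beta; returns tau with v1(tau) approximately beta."""
    f0, f1 = v1(tau0) - beta, v1(tau1) - beta
    for _ in range(iters):
        if f1 == f0:
            break
        tau2 = tau1 - f1 * (tau1 - tau0) / (f1 - f0)
        tau0, f0, tau1 = tau1, f1, tau2
        f1 = v1(tau1) - beta
        if abs(f1) < tol:
            break
    return tau1

beta = 0.5 * v1(0.0)                # target objective level (illustrative)
tau_star = solve_by_zero_finding(beta)
print(tau_star, v1(tau_star), beta)
```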
