APPLIED MACHINE LEARNING. Gaussian Mixture Regression


1 APPLIED MACHINE LEARNING: Gaussian Mixture Regression

2 Brief summary of last week's lecture

3 Locally Weighted Regression. The estimate is determined through the local influence of each group of datapoints: $\hat{y}(x) = \sum_{i=1}^{M} w_i(x)\, y_i \big/ \sum_{j=1}^{M} w_j(x)$, where the weights $w_i(x)$ are a function of the distance $d(x, x_i)$: $w_i(x) = K(d(x, x_i))$, with $K(d) = e^{-d^2}$. $x$: query point; $\hat{y}$: estimate. This generates a smooth function $y(x)$.

4 Locally Weighted Regression. The estimate is determined through the local influence of each group of datapoints: $\hat{y}(x) = \sum_{i=1}^{M} w_i(x)\, y_i \big/ \sum_{j=1}^{M} w_j(x)$, with $w_i(x) = K(d(x, x_i))$. Model-free regression! There is no longer an explicit model of the form $y = w^T x$; the regression is computed at each query point and depends on the training points.

5 Locally Weighted Regression. The estimate is determined through the local influence of each group of datapoints: $\hat{y}(x) = \sum_{i=1}^{M} w_i(x)\, y_i \big/ \sum_{j=1}^{M} w_j(x)$, with $w_i(x) = K(d(x, x_i))$. Which training points? Which kernel?
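To make the weighted-average estimate on these slides concrete, here is a minimal sketch of locally weighted (kernel) regression in Python with NumPy. The function name lwr_predict, the bandwidth h and the toy sine data are illustrative choices, not part of the lecture.

```python
import numpy as np

def lwr_predict(x_query, X, Y, h=0.5):
    """y_hat(x) = sum_i w_i(x) y_i / sum_j w_j(x), with Gaussian weights
    w_i(x) = exp(-d(x, x_i)^2), distances scaled by a bandwidth h."""
    d2 = np.sum((X - x_query) ** 2, axis=1) / h**2   # squared distances to all datapoints
    w = np.exp(-d2)                                  # kernel K(d) = e^{-d^2}
    return np.sum(w * Y) / np.sum(w)                 # weighted average of the y_i

# toy 1D example: noisy sine
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(200, 1))
Y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
print(lwr_predict(np.array([np.pi / 2]), X, Y))      # close to sin(pi/2) = 1
```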

6 Data-driven Regression. Good prediction depends on the choice of datapoints. (Figure: green: true function; blue: estimated function.)

7 Data-driven Regression. Good prediction depends on the choice of datapoints. The more datapoints, the better the fit, but computational costs increase dramatically with the number of datapoints.

8 Data-driven Regression. Several methods in ML perform non-linear regression; they differ in the objective function and in the number of parameters. Support Vector Regression (SVR) picks a subset of datapoints (the support vectors).

9 Data-driven Regression. Several methods in ML perform non-linear regression; they differ in the objective function and in the number of parameters. Support Vector Regression (SVR) picks a subset of datapoints (the support vectors). (Figure: $y = f(x)$; for illustrative purposes the negative Gauss functions are plotted next to the support vectors, but they are distributed along the negative $y$ axis.)

10 Data-driven Regression. Several methods in ML perform non-linear regression; they differ in the objective function and in the number of parameters. Support Vector Regression (SVR) picks a subset of datapoints (the support vectors): $y = f(x) = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*)\, k(x, x_i) + b$. The Lagrange multipliers define the importance of each Gaussian function. The analytical solution is found after solving a convex optimization problem. The prediction converges to $b$ when the effect of the support vectors vanishes.
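A quick way to see this behaviour is scikit-learn's epsilon-SVR with an RBF kernel: away from the data the kernel terms vanish and the prediction reverts to the intercept b. This is only a sketch; the dataset and the hyperparameter values (C, epsilon, gamma) are arbitrary illustrations.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

# epsilon-SVR with a Gaussian (RBF) kernel: y = sum_i (alpha_i - alpha_i*) k(x, x_i) + b
svr = SVR(kernel="rbf", C=10.0, epsilon=0.05, gamma=1.0).fit(X, y)

print("number of support vectors  :", len(svr.support_))
print("prediction inside the data :", svr.predict([[0.0]]))
print("prediction far from the data:", svr.predict([[50.0]]))  # kernel terms vanish here ...
print("intercept b                :", svr.intercept_)          # ... so the prediction converges to b
```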

11 Data-driven Regression. Several methods in ML perform non-linear regression; they differ in the objective function and in the number of parameters. Support Vector Regression (SVR) picks a subset of datapoints (the support vectors). Gaussian Mixture Regression (GMR) generates a new set of datapoints (the centers of Gaussian functions).

12 Gaussian Mixture Regression

13 Gaussian Mixture Regression (GMR). 1) Estimate the joint density $p(x, y)$ across pairs of datapoints using a GMM: $p(x, y) = \sum_{k=1}^{K} \pi^k\, p(x, y;\, \mu^k, \Sigma^k)$, with $p(x, y;\, \mu^k, \Sigma^k) = \mathcal{N}(\mu^k, \Sigma^k)$; $\mu^k, \Sigma^k$: mean and covariance matrix of Gaussian $k$. (Figure: 2D projection of a Gauss function; the ellipse contour corresponds to about 2 standard deviations.)

14 Gaussian Mixture Regression (GMR). 1) Estimate the joint density $p(x, y)$ across pairs of datapoints using a GMM: $p(x, y) = \sum_{k=1}^{K} \pi^k\, p(x, y;\, \mu^k, \Sigma^k)$, with $p(x, y;\, \mu^k, \Sigma^k) = \mathcal{N}(\mu^k, \Sigma^k)$; $\mu^k, \Sigma^k$: mean and covariance matrix of Gaussian $k$. The parameters are learned through Expectation-Maximization (EM), an iterative procedure that starts from a random initialization.
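As a sketch of step 1, the joint density p(x, y) can be fitted with scikit-learn's GaussianMixture, which runs EM from a random initialization. The number of components K = 5 and the toy data are assumptions made here purely for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=400)
y = np.sin(x) + 0.1 * rng.standard_normal(400)
data = np.column_stack([x, y])            # joint datapoints (x_i, y_i)

# Step 1: estimate p(x, y) with a mixture of K Gaussians, trained by EM
K = 5
gmm = GaussianMixture(n_components=K, covariance_type="full",
                      init_params="random", n_init=5, random_state=0).fit(data)

print("mixing coefficients pi^k:", gmm.weights_)              # sum to 1
print("means mu^k:\n", gmm.means_)                            # one (x, y) mean per Gaussian
print("covariances Sigma^k shape:", gmm.covariances_.shape)   # (K, 2, 2) block matrices
```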

15 Gaussian Mixture Regression (GMR). 1) Estimate the joint density $p(x, y)$ across pairs of datapoints using a GMM: $p(x, y) = \sum_{k=1}^{K} \pi^k\, p(x, y;\, \mu^k, \Sigma^k)$, with $p(x, y;\, \mu^k, \Sigma^k) = \mathcal{N}(\mu^k, \Sigma^k)$; $\mu^k, \Sigma^k$: mean and covariance matrix of Gaussian $k$. The mixing coefficients $\pi^k$, with $\sum_{k=1}^{K} \pi^k = 1$, give the relative importance of each Gaussian; EM estimates them as $\pi^k = \frac{1}{M} \sum_{i=1}^{M} p(k \mid x^i, y^i)$.

16 Gaussian Mixture Regression (GMR). 1) Estimate the joint density $p(x, y)$ across pairs of datapoints using a GMM. 2) Compute the regressive signal by taking $p(y \mid x)$: $p(y \mid x) = \sum_{k=1}^{K} w^k(x)\, p(y \mid x;\, \mu^k, \Sigma^k)$, with $w^k(x) = \dfrac{\pi^k\, p(x;\, \mu_x^k, \Sigma_{xx}^k)}{\sum_{j=1}^{K} \pi^j\, p(x;\, \mu_x^j, \Sigma_{xx}^j)}$. Each term is a Gauss function whose variance changes depending on the query point.
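Here is a minimal sketch of the conditional mixture weights w^k(x) for a 1D input, computed from the parameters of a joint GMM laid out as in the previous sketch (index 0 = x, index 1 = y). The helper name gmr_weights is mine, not from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmr_weights(x, priors, means, covs):
    """w^k(x) = pi^k N(x; mu_x^k, Sigma_xx^k) / sum_j pi^j N(x; mu_x^j, Sigma_xx^j)."""
    # marginal likelihood of the query point under the x-block of each Gaussian
    lik = np.array([multivariate_normal.pdf(x, mean=m[0], cov=c[0, 0])
                    for m, c in zip(means, covs)])
    w = priors * lik
    return w / np.sum(w)                  # normalize over the K Gaussians

# e.g. with the GMM fitted in the previous sketch:
# w = gmr_weights(0.5, gmm.weights_, gmm.means_, gmm.covariances_)
```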

17 Gaussian Mixture Regression (GMR). 1) Estimate the joint density $p(x, y)$ across pairs of datapoints using a GMM. 2) Compute the regressive signal by taking $p(y \mid x)$: $p(y \mid x) = \sum_{k=1}^{K} w^k(x)\, p(y \mid x;\, \mu^k, \Sigma^k)$, with $w^k(x) = \dfrac{\pi^k\, p(x;\, \mu_x^k, \Sigma_{xx}^k)}{\sum_{j=1}^{K} \pi^j\, p(x;\, \mu_x^j, \Sigma_{xx}^j)}$. The influence of each marginal is modulated by $w^k(x)$, which depends on the query point.

18 Gaussian Mixture Regression (GMR). 3) The regressive signal is then obtained by computing $E\{p(y \mid x)\}$: $E\{p(y \mid x)\} = \sum_{k=1}^{K} w^k(x)\, \tilde{\mu}^k(x)$, with $\tilde{\mu}^k(x) = \mu_y^k + \Sigma_{yx}^k (\Sigma_{xx}^k)^{-1} (x - \mu_x^k)$: a linear combination of $K$ local regressive models. The covariance matrix of each Gauss function can be decomposed into blocks, $\Sigma^k = \begin{pmatrix} \Sigma_{xx}^k & \Sigma_{xy}^k \\ \Sigma_{yx}^k & \Sigma_{yy}^k \end{pmatrix}$, with $\Sigma_{xx}^k$ and $\Sigma_{yy}^k$ the covariance matrices on $x$ and $y$, and $\Sigma_{xy}^k$ the cross-covariance matrix.
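Continuing the sketches above (and reusing gmr_weights), the conditional mean combines the K local linear models. A 1D input and output are assumed for readability, and gmr_mean is an illustrative name.

```python
import numpy as np

def gmr_mean(x, priors, means, covs):
    """E{p(y|x)} = sum_k w^k(x) * ( mu_y^k + Sigma_yx^k (Sigma_xx^k)^-1 (x - mu_x^k) )."""
    w = gmr_weights(x, priors, means, covs)               # conditional weights (previous sketch)
    mu_tilde = np.array([m[1] + c[1, 0] / c[0, 0] * (x - m[0])
                         for m, c in zip(means, covs)])   # local linear regressive models
    return np.sum(w * mu_tilde)

# e.g. query the joint GMM fitted earlier at a few points:
# for xq in (-2.0, 0.0, 2.0):
#     print(xq, gmr_mean(xq, gmm.weights_, gmm.means_, gmm.covariances_))
```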

19 Gaussian Mixture Regression (GMR). Computing the variance $\mathrm{var}\{p(y \mid x)\}$ provides information on the uncertainty of the prediction computed from the conditional distribution: $\mathrm{var}\{p(y \mid x)\} = \sum_{k=1}^{K} w^k(x) \left( \big(\tilde{\mu}^k(x)\big)^2 + \tilde{\Sigma}^k \right) - \big( E\{p(y \mid x)\} \big)^2$, with $\tilde{\Sigma}^k = \Sigma_{yy}^k - \Sigma_{yx}^k (\Sigma_{xx}^k)^{-1} \Sigma_{xy}^k$. The variance of the model is a weighted combination of the variances of the local models around the weighted mean. Careful: this is not the uncertainty of the model. Use the likelihood to compute the uncertainty of the model!
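And the corresponding conditional variance, following the same weighted-combination formula (again reusing gmr_weights; the 1D layout and the name gmr_variance are assumptions of these sketches, not the lecture's code).

```python
import numpy as np

def gmr_variance(x, priors, means, covs):
    """var{p(y|x)} = sum_k w^k(x) ( mu_tilde^k(x)^2 + sigma_tilde^k ) - E{p(y|x)}^2."""
    w = gmr_weights(x, priors, means, covs)
    mu_tilde = np.array([m[1] + c[1, 0] / c[0, 0] * (x - m[0])
                         for m, c in zip(means, covs)])
    # sigma_tilde^k = Sigma_yy^k - Sigma_yx^k (Sigma_xx^k)^-1 Sigma_xy^k
    var_tilde = np.array([c[1, 1] - c[1, 0] / c[0, 0] * c[0, 1] for c in covs])
    mean = np.sum(w * mu_tilde)
    return np.sum(w * (mu_tilde ** 2 + var_tilde)) - mean ** 2
```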

20 Gaussian Mixture Regression (GMR). Computing the variance $\mathrm{var}\{p(y \mid x)\}$ provides information on the uncertainty of the prediction computed from the conditional distribution. (Figure: $E\{p(y \mid x)\}$ and $\mathrm{var}\{p(y \mid x)\}$; the color shading gives the likelihood of the model, i.e. its uncertainty.)

21 $\mathrm{var}\{p(y \mid x)\}$: observe the modulation of the variance, from a small variance in the first Gauss function to a large variance in the second Gauss function.

22 GMR: Sensitivity to the Choice of K and to the Initialization

23 GMR: Sensitivity to the Choice of K and to the Initialization. Fit with 4 Gaussians, uniform initialization.

24 GMR: Sensitivity to the Choice of K and to the Initialization. Fit with 4 Gaussians, random initialization.

25 GMR: Sensitivity to the Choice of K and to the Initialization. Fit with 10 Gaussians, random initialization.

26 Gaussian Mixture Regression: Summary. Parametrize the density $p(x, y)$ and then estimate solely the parameters. The density is constructed from a mixture of $K$ Gaussians: $p(x, y) = \sum_{k=1}^{K} \pi^k\, p(x, y;\, \mu^k, \Sigma^k)$, with $p(x, y;\, \mu^k, \Sigma^k) = \mathcal{N}(\mu^k, \Sigma^k)$; $\mu^k, \Sigma^k$: mean and covariance matrix of Gaussian $k$. Such a generative model provides more information than models that directly compute $p(y \mid x)$: it allows learning to predict a multi-dimensional output $y$, and it allows querying $x$ given $y$, i.e. computing $p(x \mid y)$.
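Because the model is the full joint density, the reverse query p(x|y) is simply the same conditioning with the two blocks swapped. A small sketch under the same 1D assumptions as above (gmr_reverse_mean is an illustrative name):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmr_reverse_mean(y, priors, means, covs):
    """E{p(x|y)}: GMR conditioning with the roles of x (index 0) and y (index 1) swapped."""
    lik = np.array([multivariate_normal.pdf(y, mean=m[1], cov=c[1, 1])
                    for m, c in zip(means, covs)])        # N(y; mu_y^k, Sigma_yy^k)
    w = priors * lik
    w = w / np.sum(w)
    mu_tilde = np.array([m[0] + c[0, 1] / c[1, 1] * (y - m[1])
                         for m, c in zip(means, covs)])
    return np.sum(w * mu_tilde)
```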

27 Comparison Across Methods: generalization (prediction away from the datapoints). SVR: $y = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*)\, k(x, x_i) + b$; SVR predicts $y = b$ away from the datapoints.

28 Comparison Across Methods: generalization (prediction away from the datapoints). GMR: $y = \sum_{k=1}^{K} w^k(x)\, \tilde{\mu}^k(x)$, with $w^k(x) = \dfrac{\pi^k\, p(x;\, \mu_x^k, \Sigma_{xx}^k)}{\sum_{j=1}^{K} \pi^j\, p(x;\, \mu_x^j, \Sigma_{xx}^j)}$; GMR predicts the trend away from the data.

29 Comparison Across Methods: generalization (prediction away from the datapoints). GMR predicts the trend away from the data, but the prediction depends on the model choice and on the initialization (which influence the solution found during the GMM training phase).

30 Comparison Across Methods: generalization (prediction away from the datapoints). The prediction away from the datapoints is affected by all the regressive models and may become meaningless! Use the likelihood of the model to determine whether it is safe or not to use the prediction.

31 The variance in SVR represents the epsilon-tube, the uncertainty around the predicted value of $y$; it does not represent the uncertainty of the model either. The variance of $p(y \mid x)$ in GMR represents the modelled uncertainty on the value of $y$; it is not a measure of the uncertainty of the model.

32 SVR, GMR: Similarities. SVR and GMR are based on the same regressive model. SVR solution: $y = f(x) + \epsilon$. GMR solution: $y = E\{p(y \mid x)\} + \epsilon$. Assuming white noise, $\epsilon \sim \mathcal{N}(0, \sigma^2)$, the expectation of $y$ given $x$ is the regressive signal: $E\{y \mid x\} = f(x)$ for SVR and $E\{y \mid x\} = E\{p(y \mid x)\}$ for GMR.

33 SVR, GMR: Similarities. SVR and GMR are based on the same regressive model, and both compute a weighted combination of local predictors. SVR solution: $y = \sum_{i=1}^{M} (\alpha_i - \alpha_i^*)\, k(x, x_i) + b$. GMR solution: $y = \sum_{k=1}^{K} w^k(x)\, \tilde{\mu}^k(x)$. Both separate the input space into regions modeled by Gaussian distributions (true only when using Gaussian/RBF kernels for SVR). The model is computed locally (locally weighted regression)!

34 SVR, GMR: Differences. GMR allows predicting multi-dimensional outputs, while SVR can predict only a uni-dimensional output $y$. SVR solution: $y = f(x)$, where $y$ is uni-dimensional and $x$ can be multi-dimensional. GMR solution: $y = E\{p(y \mid x)\}$; GMR starts by computing $p(x, y)$ and can compute $p(y \mid x)$ where $x$ and $y$ can have arbitrary dimensions.

35 SVR, GMR: Differences. SVR, GMR and GPR are based on the same regressive model, but they do not optimize the same objective function and hence find different solutions. SVR: minimizes the reconstruction error through convex optimization; it is ensured to find the optimal estimate, but the solution is not unique; it usually finds a number of models <= the number of datapoints (the support vectors). GMR: learns $p(x, y)$ through maximum likelihood and finds a local optimum; it computes a generative model $p(x, y)$ from which it derives $p(y \mid x)$; it starts with a low number of models << the number of datapoints.

36 Hyperparameters of SVR, GMR. SVR and GMR depend on hyperparameters that need to be determined beforehand. SVR: choice of the error margin $\epsilon$ and of the penalty factor $C$; choice of the kernel and of the associated kernel parameters. GMR: choice of the number of Gaussians; choice of the initialization (affects convergence to a local optimum). The hyperparameters can be optimized separately; e.g. the number of Gaussians in GMR can be estimated using BIC, and the kernel parameters of SVR can be optimized through grid search.
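As a sketch of the two selection procedures mentioned here, scikit-learn exposes a bic() score for GaussianMixture and GridSearchCV for SVR. The toy data, the candidate range for K and the SVR grid values are illustrative assumptions, not prescriptions from the lecture.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=400)
y = np.sin(x) + 0.1 * rng.standard_normal(400)
data = np.column_stack([x, y])

# GMR side: fit GMMs with an increasing number of Gaussians and keep the lowest BIC
bics = {k: GaussianMixture(n_components=k, covariance_type="full",
                           n_init=3, random_state=0).fit(data).bic(data)
        for k in range(1, 11)}
print("selected number of Gaussians:", min(bics, key=bics.get))

# SVR side: grid search over the penalty factor C and the kernel parameters
grid = GridSearchCV(SVR(kernel="rbf"),
                    {"C": [1, 10, 100], "epsilon": [0.01, 0.1], "gamma": [0.1, 1, 10]},
                    cv=5).fit(x.reshape(-1, 1), y)
print("best SVR hyperparameters:", grid.best_params_)
```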

37 Conclusion. There is no easy way to determine which regression technique fits your problem best.
Training: SVR uses convex optimization (SMO solver); the number of parameters grows as O(M*N). GMR uses EM, an iterative technique that needs several runs; the number of parameters grows as O(K*N^2).
Testing: SVR cost grows with O(number of SV); there are few SVs, a small fraction of the original data. GMR cost grows with O(K).
M: number of datapoints; N: dimension of the data; K: number of Gauss functions in the GMM model.
