Advances in Extreme Learning Machines
|
|
- Lynn Cobb
- 5 years ago
- Views:
Transcription
1 Advances in Extreme Learning Machines Mark van Heeswijk April 17, 2015
2 Outline Context Extreme Learning Machines Part I: Ensemble Models of ELM Part II: Variable Selection and ELM Part III: Trade-offs in ELM Summary Advances in Extreme Learning Machines 2/23
3 Outline Context Extreme Learning Machines Part I: Ensemble Models of ELM Part II: Variable Selection and ELM Part III: Trade-offs in ELM Summary Advances in Extreme Learning Machines 3/23
4 Trends in Machine Learning increasing dimensionality of data sets increasing size of data sets 300 hours of video are uploaded to YouTube every minute 1.8 billion pictures uploaded every day to various sites (mid-2014) Advances in Extreme Learning Machines 4/23
5 Trends in Machine Learning [ Advances in Extreme Learning Machines 4/23
6 Trends in Machine Learning increasing dimensionality of data sets increasing size of data sets 300 hours of video are uploaded to YouTube every minute 1.8 billion pictures uploaded every day to various sites (mid-2014) machine learning methods needed, that can be used to analyze this data and extract useful knowledge and insights from this wealth of information Advances in Extreme Learning Machines 4/23
7 Contributions Combinations of Multiple Extreme Learning Machines (ELM) I: Adaptive Ensemble Models of ELMs for Time Series Prediction II: GPU-accelerated and parallelized ELM ensembles for large-scale regression Variable Selection and ELM III: Feature selection for nonlinear models with extreme learning machines IV: Fast Feature Selection in a GPU Cluster Using the Delta Test V: Binary/Ternary Extreme Learning Machines Trade-offs in Extreme Learning Machines VI: Compressive ELM: Improved Models Through Exploiting Time-Accuracy Trade-offs Advances in Extreme Learning Machines 5/23
8 Outline Context Extreme Learning Machines Part I: Ensemble Models of ELM Part II: Variable Selection and ELM Part III: Trade-offs in ELM Summary Advances in Extreme Learning Machines 6/23
9 Extreme Learning Machines Input layer Hidden layer Output layer input x j 1 input x j 2 input x j 3 output y j input x j 4 Advances in Extreme Learning Machines 7/23
10 Extreme Learning Machines Input layer Hidden layer Output layer input x j 1 input x j 2 input x j 3 random weights output y j input x j 4 Advances in Extreme Learning Machines 7/23
11 Extreme Learning Machines Input layer Hidden layer Output layer input x j 1 input x j 2 input x j 3 random weights output y j input x j 4 least-squares solution Advances in Extreme Learning Machines 7/23
12 Outline Context Extreme Learning Machines Part I: Ensemble Models of ELM Part II: Variable Selection and ELM Part III: Trade-offs in ELM Summary Advances in Extreme Learning Machines 8/23
13 Adaptive Ensemble of ELMs (Publ I) random inputs random #neurons trained on sliding/growing window adaptive linear combination based on performance of each ELM Advances in Extreme Learning Machines 9/23
14 Adaptive Ensemble of ELMs (Publ I) random inputs random #neurons trained on sliding/growing window adaptive linear combination based on performance of each ELM Advances in Extreme Learning Machines 9/23
15 Adaptive Ensemble of ELMs (Publ I) random inputs random #neurons trained on sliding/growing window adaptive linear combination based on performance of each ELM Advances in Extreme Learning Machines 9/23
16 Parallelized/GPU-Accelerated Ensemble of ELMs (Publ II) random inputs optimized #neurons (loo-cv on GPU) training performed on GPU training of ELMs parallelized over multiple CPU/GPU fixed linear combination based on loo-cv Advances in Extreme Learning Machines 10/23
17 Parallelized/GPU-Accelerated Ensemble of ELMs (Publ II) random inputs optimized #neurons (loo-cv on GPU) training performed on GPU training of ELMs parallelized over multiple CPU/GPU fixed linear combination based on loo-cv Advances in Extreme Learning Machines 10/23
18 Parallelized/GPU-Accelerated Ensemble of ELMs (Publ II) random inputs optimized #neurons (loo-cv on GPU) training performed on GPU training of ELMs parallelized over multiple CPU/GPU fixed linear combination based on loo-cv Advances in Extreme Learning Machines 10/23
19 Highlighted ELM Ensemble Results w i time w i time Adaptation of ensemble weight during task Advances in Extreme Learning Machines 11/23
20 Highlighted ELM Ensemble Results NMSE ensemble Number of models in ensemble The more models in the ensemble, the more accurate the results Advances in Extreme Learning Machines 11/23
21 Highlighted ELM Ensemble Results 6000 Running time ensemble (s) Number of workers Scalability through parallelization, limited precision, use of GPUs Advances in Extreme Learning Machines 11/23
22 Outline Context Extreme Learning Machines Part I: Ensemble Models of ELM Part II: Variable Selection and ELM Part III: Trade-offs in ELM Summary Advances in Extreme Learning Machines 12/23
23 ELM-based Feature Selection (Publ III) scaling layer random #neurons for each restart of algorithm input x j 1 s 1 input x j 2 input x j 3 s 2 s 3 output y j input x j 4 s 4 starting from a random scaling: gradually reduced until empty, based on training and loo error Advances in Extreme Learning Machines 13/23
24 ELM-based Feature Selection (Publ III) scaling layer random #neurons for each restart of algorithm input x j 1 s 1 input x j 2 input x j 3 s 2 s 3 output y j input x j 4 s 4 starting from a random scaling: gradually reduced until empty, based on training and loo error Advances in Extreme Learning Machines 13/23
25 ELM-based Feature Selection (Publ III) scaling layer random #neurons for each restart of algorithm input x j 1 s 1 input x j 2 input x j 3 s 2 s 3 output y j input x j 4 s 4 starting from a random scaling: gradually reduced until empty, based on training and loo error Advances in Extreme Learning Machines 13/23
26 Highlighted ELM-FS Results Advances in Extreme Learning Machines 14/23
27 Fast Feature Selection with GPU-accelerated Delta Test (Publ IV) input x j 1 0/1 input x j 2 input x j 3 0/1 0/1 output y j input x j 4 0/1 variable selection optimized using an evolutionary algorithm with GPU-accelerated Delta-Test as fitness criterion Advances in Extreme Learning Machines 15/23
28 Fast Feature Selection with GPU-accelerated Delta Test (Publ IV) input x j 1 0/1 input x j 2 input x j 3 0/1 0/1 output y j input x j 4 0/1 variable selection optimized using an evolutionary algorithm with GPU-accelerated Delta-Test as fitness criterion Advances in Extreme Learning Machines 15/23
29 Fast Feature Selection with GPU-accelerated Delta Test (Publ IV) 1 2N N (y i y NN(i) ) 2 i=1 Advances in Extreme Learning Machines 16/23
30 Binary/Ternary Extreme Learning Machines (Publ V) input x j 1 input x j 2 input x j 3 binary/ ternary weights output y j input x j 4 scaling of weights and biases optimized using batchintrinsic plasticity pretraining least-square fit with fast L2 regularization by using SVD decomposition Advances in Extreme Learning Machines 17/23
31 Binary/Ternary Extreme Learning Machines (Publ V) input x j 1 input x j 2 input x j 3 binary/ ternary weights output y j input x j 4 scaling of weights and biases optimized using batchintrinsic plasticity pretraining least-square fit with fast L2 regularization by using SVD decomposition Advances in Extreme Learning Machines 17/23
32 Binary/Ternary Extreme Learning Machines (Publ V) input x j 1 input x j 2 input x j 3 binary/ ternary weights output y j input x j 4 scaling of weights and biases optimized using batchintrinsic plasticity pretraining least-square fit with fast L2 regularization by using SVD decomposition Advances in Extreme Learning Machines 17/23
33 Binary/Ternary Extreme Learning Machines (Publ V) input x j 1 input x j 2 input x j 3 binary/ ternary weights output y j input x j 4 scaling of weights and biases optimized using batchintrinsic plasticity pretraining least-squares solution with fast L2 regularization by using SVD decomposition Advances in Extreme Learning Machines 17/23
34 Ternary ELM Results 1 var 2 vars 3 vars RMSEtest BIP(rand)-TR-ELM BIP(rand)-TR-2-ELM BIP(rand)-TR-3-ELM ,000 numhidden Advances in Extreme Learning Machines 18/23
35 Outline Context Extreme Learning Machines Part I: Ensemble Models of ELM Part II: Variable Selection and ELM Part III: Trade-offs in ELM Summary Advances in Extreme Learning Machines 19/23
36 Compressive ELM (Publ VI) input x j 1 input x j 2 input x j 3 binary/ ternary weights output y j input x j 4 scaling of weights and biases optimized using batchintrinsic plasticity pretraining least-squares solution with fast L2 regularization by using SVD decomposition Advances in Extreme Learning Machines 20/23
37 Compressive ELM (Publ VI) fast low-distortion embedding into lower-dimensional space input x j 1 input x j 2 input x j 3 binary/ ternary weights output y j input x j 4 scaling of weights and biases optimized using batchintrinsic plasticity pretraining least-squares solution with fast L2 regularization by using SVD decomposition Advances in Extreme Learning Machines 20/23
38 Highlighted Results Compressive ELM training time TR-3-ELM CS-TR-3-ELM(k=50) CS-TR-3-ELM(k=100) CS-TR-3-ELM(k=200) CS-TR-3-ELM(k=400) CS-TR-3-ELM(k=600) msetest TR-3-ELM CS-TR-3-ELM(k=50) CS-TR-3-ELM(k=100) CS-TR-3-ELM(k=200) CS-TR-3-ELM(k=400) CS-TR-3-ELM(k=600) ,000 #hidden neurons training time Advances in Extreme Learning Machines 21/23
39 Outline Context Extreme Learning Machines Part I: Ensemble Models of ELM Part II: Variable Selection and ELM Part III: Trade-offs in ELM Summary Advances in Extreme Learning Machines 22/23
40 Summary Overall: resulting collection of proposed methods provides an efficient, accurate and flexible framework for solving large-scale supervised learning problems. proposed methods: are not limited to the particular types of random-weight neural networks and contexts in which they have been tested can easily be incorporated in new contexts and models Advances in Extreme Learning Machines 23/23
41 Defence Slides: Related Models Advances in Extreme Learning Machines 24/23
42 ELM vs RVFL Input layer Hidden layer Output layer input x j 1 input x j 2 input x j 3 output y j input x j 4 Advances in Extreme Learning Machines 25/23
43 ELM vs RVFL Input layer Hidden layer Output layer input x j 1 input x j 2 input x j 3 output y j input x j 4 Advances in Extreme Learning Machines 25/23
44 ELM vs RVFL Input layer Hidden layer Output layer input x j 1 input x j 2 input x j 3 output y j input x j 4 Advances in Extreme Learning Machines 25/23
45 Defence Slides: General Advances in Extreme Learning Machines 26/23
46 Standard ELM Given a training set (x i, y i ), x i R d, y i R, an activation function f : R R and M the number of hidden nodes: 1: - Randomly assign input weights w i and biases b i, i [1, M]; 2: - Calculate the hidden layer output matrix H; 3: - Calculate output weights matrix β = H Y. where f (w 1 x 1 + b 1) f (w M x 1 + b M ) H = f (w 1 x N + b 1) f (w M x N + b M ) Advances in Extreme Learning Machines 27/23
47 ELM Theory vs Practice In theory, ELM is universal approximator In practice, limited number of samples; risk of overfitting Therefore: the functional approximation should use as limited number of neurons as possible the hidden layer should extract and retain as much useful information as possible from the input samples Advances in Extreme Learning Machines 28/23
48 ELM Theory vs Practice Weight considerations: weight range determines typical activation of the transfer function (remember w i, x = w i x cos θ,) therefore, normalize or tune the length of the weights vectors somehow Linear vs non-linear: since sigmoid neurons operate in nonlinear regime, add d linear neurons for the ELM to work better on (almost) linear problems Avoiding overfitting: use efficient L2 regularization Advances in Extreme Learning Machines 29/23
49 Batch Intrinsic Plasticity suppose (x 1,..., x N ) R N d, and output of neuron i is h i = f (a i w i x k + b i ), where f is an invertible transfer function for each neuron i from exponential distribution with mean µ exp, draw targets t = (t 1, t 2,..., t N ) and sort such that t 1 < t 2 <... < t N compute all presynaptic inputs s k = w i x k, and sort such that s 1 < s 2 <... < s N now, find a i and b i such that s s N 1 ( ai b i ) = f 1 (t 1 ). f 1 (t N ) Advances in Extreme Learning Machines 30/23
50 Fast leave-one-out cross-validation The leave-one-out (LOO) error can be computed using the PRESS statistics: N ( yi ŷ i ) 2 E loo = 1 N i=1 1 hat ii where hat ii is the i th value on the diagonal of the HAT-matrix, which can be quickly computed, given H : Ŷ = Hβ = HH Y = HAT Y Advances in Extreme Learning Machines 31/23
51 Fast leave-one-out cross-validation Using the SVD decomposition of H = UDV T, it is possible to obtain all needed information for computing the PRESS statistic without recomputing the pseudo-inverse for every λ: Ŷ = Hβ = H(H T H + λi) 1 H T Y = HV(D 2 + λi) 1 DU T Y = UDV T V(D 2 + λi) 1 DU T Y = UD(D 2 + λi) 1 DU T Y = HAT Y Advances in Extreme Learning Machines 32/23
52 Fast leave-one-out cross-validation where D(D 2 + λi) 1 D is a diagonal matrix with diagonal entry. Now: N ( ) yi ŷ 2 i d 2 ii d2 ii +λ as the i th MSE TR-PRESS = 1 N = 1 N = 1 N 1 hat ii N ( y i ŷ i 1 h i (H T H + λi) 1 h T i 2 N y i ŷ ( i ) d 2 1 u ii i dii 2+λ u T i i=1 i=1 i=1 ) 2 Advances in Extreme Learning Machines 33/23
53 Leave-one-out computation overhead 8 6 Time (s) Number of hidden neurons Figure : Comparison of running times of ELM training (solid lines) and ELM training + leave-one-out-computation (dotted lines), with (black lines) and without (gray lines) explicitly computing and reusing H Advances in Extreme Learning Machines 34/23
54 Defence Slides: Ensembles Advances in Extreme Learning Machines 35/23
55 Error reduction by ensembling ŷ i = y + ɛ i, (1) [{ 1 E ens = E m E[ { ŷ i y } 2 ] = E[ɛ 2 i ]. (2) E avg = 1 m E[ɛ 2 i ]. m (3) m i=1 i=1 } 2 ] [{ 1 (ŷ y) = E m m } 2 ɛ i ]. (4) i=1 E ens = 1 m E avg = 1 m m 2 E[ɛ 2 i ], (5) i=1 Advances in Extreme Learning Machines 36/23
56 Running times parallelized and GPU-Accelerated ELM Ensemble (SantaFe, ESTSP) N t(mldiv dp ) t(gesv dp ) t(mldiv sp) t(gesv sp) t(culagesv sp) SantaFe s s s s s s s s s s s s s s s s s s s s s s s s s ESTSP s s s s s s s s s s s s s s s s s s s s s s s s s Advances in Extreme Learning Machines 37/23
57 Running times parallelized and GPU-Accelerated ELM Ensemble (SantaFe, ESTSP) Running time ensemble (s) Number of workers Running time ensemble (s) Number of workers Advances in Extreme Learning Machines 38/23
58 ELM Ensemble Accuracy (SantaFe, ESTSP) NMSE ensemble NMSE ensemble Number of models in ensemble Number of models in ensemble Advances in Extreme Learning Machines 39/23
59 Defence Slides: Binary / Ternary ELM Advances in Extreme Learning Machines 40/23
60 Better Weights random layer weights and biases drawn from e.g. uniform / normal distribution with certain range / variance typical transfer function f ( w i, x + b i ) from w i, x = w i x cos θ, it can be seen that the typical activation of f depends on: expected length of w i expected length of x angles θ between the weights and the samples Advances in Extreme Learning Machines 41/23
61 Better Weights: Orthogonality? Idea 1: improve the diversity of the weights by taking weights that are mutually orthogonal (e.g. M d-dimensional basis vectors, randomly rotated in the d-dimensional space) however, does not give significantly better accuracy apparently, for the tested cases, random weight scheme of ELM already covers the possible weight space pretty well Advances in Extreme Learning Machines 42/23
62 Better Weights: Sparsity! Idea 2: improve the diversity of the weights by having each of them work in a different subspace (e.g. each weight vector has different subset of variables as input) spoiler: significantly improves accuracy, at no extra computational cost experiments suggest this is due to the weight scheme enabling implicit variable selection Advances in Extreme Learning Machines 43/23
63 Binary Weight Scheme 1 var 2 vars 3 vars etc. until enough neurons: add w {0, 1} d with 1 var (# = 2 1 ( d 1) ) add w {0, 1} d with 2 vars (# = 2 2 ( d 2) ) add w {0, 1} d with 3 vars (# = 2 3 ( d 3) )... For each subspace, weights are added in random order to avoid bias toward particular variables Advances in Extreme Learning Machines 44/23
64 Ternary Weight Scheme 1 var 2 vars 3 vars until enough neurons: add w { 1, 0, 1} d with 1 var (3 1 ( d 1) ) add w { 1, 0, 1} d with 2 vars (3 2 ( d 2) ) add w { 1, 0, 1} d with 3 vars (3 3 ( d 3) ) For each subspace, weights are added in random order to avoid bias toward particular variables Advances in Extreme Learning Machines 45/23
65 Experimental Settings Data Abbreviation number of variables # training # test Abalone Ab CaliforniaHousing Ca CensusHouse8L Ce DeltaElevators De ComputerActivity Co BIP(CV)-TR-ELM vs BIP(CV)-TR-2-ELM vs BIP(CV)-TR-3-ELM Experiment 1: relative performance Experiment 2: robustness against irrelevant vars Experiment 3: implicit variable selection (all results are averaged over 100 repetitions, each with randomly drawn training/test set) Advances in Extreme Learning Machines 46/23
66 Exp 1: numhidden vs. RMSE (Abalone) RMSEtest BIP(rand)-TR-ELM BIP(rand)-TR-2-ELM BIP(rand)-TR-3-ELM averages over 100 runs gaussian < binary ternary < gaussian better RMSE with much less neurons ,000 numhidden Advances in Extreme Learning Machines 47/23
67 Exp 1: numhidden vs. RMSE (CpuActivity) RMSEtest BIP(rand)-TR-ELM BIP(rand)-TR-2-ELM BIP(rand)-TR-3-ELM averages over 100 runs binary < gaussian ternary < gaussian better RMSE with much less neurons ,000 numhidden Advances in Extreme Learning Machines 48/23
68 Exp 2: Robustness against irrelevant variables (Abalone) RMSEtest BIP(rand)-TR-ELM BIP(rand)-TR-2-ELM BIP(rand)-TR-3-ELM number of added noise variables 1000 neurons binary weight scheme gives similar RMSE ternary weight scheme makes ELM more robust against irrelevant vars Advances in Extreme Learning Machines 49/23
69 Exp 2: Robustness against irrelevant variables (CpuActivity) RMSEtest BIP(rand)-TR-ELM BIP(rand)-TR-2-ELM BIP(rand)-TR-3-ELM 1000 neurons binary and ternary weight scheme makes ELM more robust against irrelevant vars number of added noise variables Advances in Extreme Learning Machines 50/23
70 Exp 2: Robustness against irrelevant variables Ab Co gaussian binary ternary gaussian binary ternary RMSE with original variables RMSE with 30 added irr. vars RMSE loss Table : Average RMSE loss of ELMs with 1000 hidden neurons, trained on the original data, and the data with 30 added irrelevant variables Advances in Extreme Learning Machines 51/23
71 Exp 3: Implicit Variable Selection (CpuAct) relevance of each input variable quantified as M i=1 β i w i gaussian variable relevance D1 D2 D3 D4 D5 R1 R2 R3 R4 R5 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 variables Advances in Extreme Learning Machines 52/23
72 Exp 3: Implicit Variable Selection (CpuAct) relevance of each input variable quantified as M i=1 β i w i binary view variable relevance D1 D2 D3 D4 D5 R1 R2 R3 R4 R5 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 variables Advances in Extreme Learning Machines 53/23
73 Exp 3: Implicit Variable Selection (CpuAct) relevance of each input variable quantified as M i=1 β i w i ternary variable relevance D1 D2 D3 D4 D5 R1 R2 R3 R4 R5 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 variables Advances in Extreme Learning Machines 54/23
74 Binary/Ternary ELM Conclusions We propose simple change to weight scheme and introduce robust ELM variants: BIP(rand)-TR-ELM BIP(rand)-TR-2-ELM BIP(rand)-TR-3-ELM Our experiments suggest that 1. ternary weight scheme generally better than gaussian weights 2. ternary weight scheme robust against irrelevant variables 3. binary/ternary weight scheme allows ELM to perform implicit variable selection The added robustness and increased accuracy comes for free! Advances in Extreme Learning Machines 55/23
75 Defence Slides: Trade-offs / Compressive ELM Advances in Extreme Learning Machines 56/23
76 Time-accuracy Trade-offs for Several ELMs ELM OP-ELM: Optimally Pruned ELM with neurons ranked by relevance, and then pruned to optimize the leave-one-out error TR-ELM: Tikhonov-regularized ELM, with efficient optimization of regularization parameter λ, using the SVD approach to computing H TROP-ELM: Tikhonov regularized OP-ELM BIP(0.2), BIP(rand), BIP(CV): ELMs pretrained using Batch Intrinsic Plasticity mechanism, adapting the hidden layer weights and biases, such that they retain as much information as possible BIP parameter is either fixed, randomized, or cross-validated over 20 possible values Advances in Extreme Learning Machines 57/23
77 ELM Time-accuracy Trade-offs (Abalone UCI) training time OP-3-ELM TROP-3-ELM TR-3-ELM BIP(CV)-TR-3-ELM BIP(0.2)-TR-3-ELM BIP(rand)-TR-3-ELM msetest OP-3-ELM TROP-3-ELM TR-3-ELM BIP(CV)-TR-3-ELM BIP(0.2)-TR-3-ELM BIP(rand)-TR-3-ELM ,000 #hidden neurons ,000 #hidden neurons Advances in Extreme Learning Machines 58/23
78 ELM Time-accuracy Trade-offs (Abalone UCI) OP-3-ELM TROP-3-ELM TR-3-ELM BIP(CV)-TR-3-ELM BIP(0.2)-TR-3-ELM BIP(rand)-TR-3-ELM OP-3-ELM TROP-3-ELM TR-3-ELM BIP(CV)-TR-3-ELM BIP(0.2)-TR-3-ELM BIP(rand)-TR-3-ELM msetest msetest training time testing time Advances in Extreme Learning Machines 59/23
79 ELM Time-accuracy Trade-offs (Abalone UCI) Depending on the user s criteria, these results suggest: training time most important: BIP(rand)-TR-3-ELM (almost optimal performance, while keeping training time low) if test error is most important: BIP(CV)-TR-3-ELM (slightly better accuracy, but training time is 20 times as high) if testing time is most important: BIP(rand)-TR-3-ELM (surprisingly) (OP-ELM and TROP-ELM tend to be faster in test, but suffer from slight overfitting) Since TR-3-ELM offers attractive trade-offs between speed and accuracy, this model will be central in the rest of the paper. Advances in Extreme Learning Machines 60/23
80 Two approaches for improving models Time-accuracy trade-offs suggest two possible strategies to obtain models that are preferable over other models: reducing test error, using a better algorithm ( in terms of training time-accuracy plot: pushing the curve down ) reducing computational time, while retaining as much accuracy as possible ( in terms of training time-accuracy plot: pushing the curve to the left ) Compressive ELM focuses on reducing computational time by performing the training in a reduced space, and then projecting back the solution back to the original space. Advances in Extreme Learning Machines 61/23
81 Compressive ELM Given m n matrix A, compute k-term approximate SVD A UDV T [Halko2009]: Form the n (k + p) random matrix Ω. (where p is small) Form the m (k + p) sampling matrix Y = AΩ. (sketch it by applying Ω) Form the m (k + p) orthonormal matrix Q (such that range(q) = range(y )) Compute B = Q A. Form the SVD of B so that B = ÛDV T Compute the matrix U = QÛ Advances in Extreme Learning Machines 62/23
82 Faster Sketching? Bottleneck in Algorithm is the time it takes to sketch the matrix. Rather than using Gaussian random matrices for sketching A, use random matrices that are sparse or structured in some way and allow for faster multiplication: ( ) ( ) ( ) P W D k n n n Fast Johnson Lindenstrauss Transform (FJLT) introduced in [Ailon2006] for which P is a sparse matrix of random Gaussian variables, and H encodes the Discrete Hadamard Transform Subsampled Randomized Hadamard Transform (SRHT) for which P is a matrix selecting k random columns from H, and H encodes the Discrete Hadamard Transform (Experiments did not show substantial difference in terms of computational time. Dataset too small?) n n Advances in Extreme Learning Machines 63/23
83 Compressive ELM (CalHousing, FJLT) training time TR-3-ELM CS-TR-3-ELM(k=50) CS-TR-3-ELM(k=100) CS-TR-3-ELM(k=200) CS-TR-3-ELM(k=400) CS-TR-3-ELM(k=600) msetest TR-3-ELM CS-TR-3-ELM(k=50) CS-TR-3-ELM(k=100) CS-TR-3-ELM(k=200) CS-TR-3-ELM(k=400) CS-TR-3-ELM(k=600) ,000 #hidden neurons ,000 #hidden neurons Advances in Extreme Learning Machines 64/23
84 Compressive ELM (CalHousing, FJLT) TR-3-ELM CS-TR-3-ELM(k=50) CS-TR-3-ELM(k=100) CS-TR-3-ELM(k=200) CS-TR-3-ELM(k=400) CS-TR-3-ELM(k=600) TR-3-ELM CS-TR-3-ELM(k=50) CS-TR-3-ELM(k=100) CS-TR-3-ELM(k=200) CS-TR-3-ELM(k=400) CS-TR-3-ELM(k=600) msetest msetest training time testing time Advances in Extreme Learning Machines 65/23
85 Compressive ELM Conclusions Contributions Compressive ELM provides a flexible way to reduce training time by doing the optimization in a reduced space of k dimensions given k large enough, Compressive ELM achieves the best test error for each computational time (i.e. there are no models that achieve better test error and can be trained in the same or less time). Future work let theory/bounds on low-distortion embeddings inform the choice of k Advances in Extreme Learning Machines 66/23
Approximate Principal Components Analysis of Large Data Sets
Approximate Principal Components Analysis of Large Data Sets Daniel J. McDonald Department of Statistics Indiana University mypage.iu.edu/ dajmcdon April 27, 2016 Approximation-Regularization for Analysis
More informationData Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods.
TheThalesians Itiseasyforphilosopherstoberichiftheychoose Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods Ivan Zhdankin
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Ensembles Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationRegression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)
Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features
More informationRegression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)
Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationCS540 Machine learning Lecture 5
CS540 Machine learning Lecture 5 1 Last time Basis functions for linear regression Normal equations QR SVD - briefly 2 This time Geometry of least squares (again) SVD more slowly LMS Ridge regression 3
More informationAir Quality Forecasting using Neural Networks
Air Quality Forecasting using Neural Networks Chen Zhao, Mark van Heeswijk, and Juha Karhunen Aalto University School of Science, FI-00076, Finland {chen.zhao, mark.van.heeswijk, juha.karhunen}@aalto.fi
More informationDATA MINING AND MACHINE LEARNING. Lecture 4: Linear models for regression and classification Lecturer: Simone Scardapane
DATA MINING AND MACHINE LEARNING Lecture 4: Linear models for regression and classification Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Linear models for regression Regularized
More informationResampling techniques for statistical modeling
Resampling techniques for statistical modeling Gianluca Bontempi Département d Informatique Boulevard de Triomphe - CP 212 http://www.ulb.ac.be/di Resampling techniques p.1/33 Beyond the empirical error
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost
More informationRandomized Algorithms
Randomized Algorithms Saniv Kumar, Google Research, NY EECS-6898, Columbia University - Fall, 010 Saniv Kumar 9/13/010 EECS6898 Large Scale Machine Learning 1 Curse of Dimensionality Gaussian Mixture Models
More informationCS534 Machine Learning - Spring Final Exam
CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the
More informationIntroduction to Gaussian Process
Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Non-parametric Gaussian Process (GP) GP Regression
More informationDimensionality Reduction and Principle Components Analysis
Dimensionality Reduction and Principle Components Analysis 1 Outline What is dimensionality reduction? Principle Components Analysis (PCA) Example (Bishop, ch 12) PCA vs linear regression PCA as a mixture
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the
More informationIn the Name of God. Lectures 15&16: Radial Basis Function Networks
1 In the Name of God Lectures 15&16: Radial Basis Function Networks Some Historical Notes Learning is equivalent to finding a surface in a multidimensional space that provides a best fit to the training
More informationDecision Tree Learning Lecture 2
Machine Learning Coms-4771 Decision Tree Learning Lecture 2 January 28, 2008 Two Types of Supervised Learning Problems (recap) Feature (input) space X, label (output) space Y. Unknown distribution D over
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More informationLinear Models for Regression. Sargur Srihari
Linear Models for Regression Sargur srihari@cedar.buffalo.edu 1 Topics in Linear Regression What is regression? Polynomial Curve Fitting with Scalar input Linear Basis Function Models Maximum Likelihood
More informationSTA 414/2104: Lecture 8
STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks Delivered by Mark Ebden With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable
More informationMANY scientific computations, signal processing, data analysis and machine learning applications lead to large dimensional
Low rank approximation and decomposition of large matrices using error correcting codes Shashanka Ubaru, Arya Mazumdar, and Yousef Saad 1 arxiv:1512.09156v3 [cs.it] 15 Jun 2017 Abstract Low rank approximation
More informationDimension Reduction Methods
Dimension Reduction Methods And Bayesian Machine Learning Marek Petrik 2/28 Previously in Machine Learning How to choose the right features if we have (too) many options Methods: 1. Subset selection 2.
More informationCOMPRESSED AND PENALIZED LINEAR
COMPRESSED AND PENALIZED LINEAR REGRESSION Daniel J. McDonald Indiana University, Bloomington mypage.iu.edu/ dajmcdon 2 June 2017 1 OBLIGATORY DATA IS BIG SLIDE Modern statistical applications genomics,
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationLearning Multiple Tasks with a Sparse Matrix-Normal Penalty
Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Yi Zhang and Jeff Schneider NIPS 2010 Presented by Esther Salazar Duke University March 25, 2011 E. Salazar (Reading group) March 25, 2011 1
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationCS168: The Modern Algorithmic Toolbox Lecture #6: Regularization
CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization Tim Roughgarden & Gregory Valiant April 18, 2018 1 The Context and Intuition behind Regularization Given a dataset, and some class of models
More informationRandom Methods for Linear Algebra
Gittens gittens@acm.caltech.edu Applied and Computational Mathematics California Institue of Technology October 2, 2009 Outline The Johnson-Lindenstrauss Transform 1 The Johnson-Lindenstrauss Transform
More informationThe Application of Extreme Learning Machine based on Gaussian Kernel in Image Classification
he Application of Extreme Learning Machine based on Gaussian Kernel in Image Classification Weijie LI, Yi LIN Postgraduate student in College of Survey and Geo-Informatics, tongji university Email: 1633289@tongji.edu.cn
More informationMachine Learning - MT & 5. Basis Expansion, Regularization, Validation
Machine Learning - MT 2016 4 & 5. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford October 19 & 24, 2016 Outline Basis function expansion to capture non-linear relationships
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2
MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationSketched Ridge Regression:
Sketched Ridge Regression: Optimization and Statistical Perspectives Shusen Wang UC Berkeley Alex Gittens RPI Michael Mahoney UC Berkeley Overview Ridge Regression min w f w = 1 n Xw y + γ w Over-determined:
More informationGI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil
GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis Massimiliano Pontil 1 Today s plan SVD and principal component analysis (PCA) Connection
More informationThe exam is closed book, closed notes except your one-page cheat sheet.
CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right
More informationRegularization. CSCE 970 Lecture 3: Regularization. Stephen Scott and Vinod Variyam. Introduction. Outline
Other Measures 1 / 52 sscott@cse.unl.edu learning can generally be distilled to an optimization problem Choose a classifier (function, hypothesis) from a set of functions that minimizes an objective function
More informationSTA 414/2104: Lecture 8
STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA
More informationSparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent
More informationANLP Lecture 22 Lexical Semantics with Dense Vectors
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationStatistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks
Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks Jan Drchal Czech Technical University in Prague Faculty of Electrical Engineering Department of Computer Science Topics covered
More informationLinear model selection and regularization
Linear model selection and regularization Problems with linear regression with least square 1. Prediction Accuracy: linear regression has low bias but suffer from high variance, especially when n p. It
More informationData Mining und Maschinelles Lernen
Data Mining und Maschinelles Lernen Ensemble Methods Bias-Variance Trade-off Basic Idea of Ensembles Bagging Basic Algorithm Bagging with Costs Randomization Random Forests Boosting Stacking Error-Correcting
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationMultiple-step Time Series Forecasting with Sparse Gaussian Processes
Multiple-step Time Series Forecasting with Sparse Gaussian Processes Perry Groot ab Peter Lucas a Paul van den Bosch b a Radboud University, Model-Based Systems Development, Heyendaalseweg 135, 6525 AJ
More informationepochs epochs
Neural Network Experiments To illustrate practical techniques, I chose to use the glass dataset. This dataset has 214 examples and 6 classes. Here are 4 examples from the original dataset. The last values
More informationMaximum Likelihood, Logistic Regression, and Stochastic Gradient Training
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions
More information10-701/ Machine Learning, Fall
0-70/5-78 Machine Learning, Fall 2003 Homework 2 Solution If you have questions, please contact Jiayong Zhang .. (Error Function) The sum-of-squares error is the most common training
More informationUsing SVD to Recommend Movies
Michael Percy University of California, Santa Cruz Last update: December 12, 2009 Last update: December 12, 2009 1 / Outline 1 Introduction 2 Singular Value Decomposition 3 Experiments 4 Conclusion Last
More informationLearning Deep Architectures for AI. Part II - Vijay Chakilam
Learning Deep Architectures for AI - Yoshua Bengio Part II - Vijay Chakilam Limitations of Perceptron x1 W, b 0,1 1,1 y x2 weight plane output =1 output =0 There is no value for W and b such that the model
More informationPCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani
PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given
More informationRadial-Basis Function Networks
Radial-Basis Function etworks A function is radial () if its output depends on (is a nonincreasing function of) the distance of the input from a given stored vector. s represent local receptors, as illustrated
More informationIntroduction to Machine Learning
10-701 Introduction to Machine Learning PCA Slides based on 18-661 Fall 2018 PCA Raw data can be Complex, High-dimensional To understand a phenomenon we measure various related quantities If we knew what
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationLinear Models for Regression
Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationSemantics with Dense Vectors. Reference: D. Jurafsky and J. Martin, Speech and Language Processing
Semantics with Dense Vectors Reference: D. Jurafsky and J. Martin, Speech and Language Processing 1 Semantics with Dense Vectors We saw how to represent a word as a sparse vector with dimensions corresponding
More informationCollaborative Filtering. Radek Pelánek
Collaborative Filtering Radek Pelánek 2017 Notes on Lecture the most technical lecture of the course includes some scary looking math, but typically with intuitive interpretation use of standard machine
More informationSample Exam COMP 9444 NEURAL NETWORKS Solutions
FAMILY NAME OTHER NAMES STUDENT ID SIGNATURE Sample Exam COMP 9444 NEURAL NETWORKS Solutions (1) TIME ALLOWED 3 HOURS (2) TOTAL NUMBER OF QUESTIONS 12 (3) STUDENTS SHOULD ANSWER ALL QUESTIONS (4) QUESTIONS
More informationIntroduction to Neural Networks
CUONG TUAN NGUYEN SEIJI HOTTA MASAKI NAKAGAWA Tokyo University of Agriculture and Technology Copyright by Nguyen, Hotta and Nakagawa 1 Pattern classification Which category of an input? Example: Character
More informationData Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.5. Spring 2010 Instructor: Dr. Masoud Yaghini Outline How the Brain Works Artificial Neural Networks Simple Computing Elements Feed-Forward Networks Perceptrons (Single-layer,
More informationLinear Regression (continued)
Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression
More informationCompressed Sensing and Neural Networks
and Jan Vybíral (Charles University & Czech Technical University Prague, Czech Republic) NOMAD Summer Berlin, September 25-29, 2017 1 / 31 Outline Lasso & Introduction Notation Training the network Applications
More informationCSC 578 Neural Networks and Deep Learning
CSC 578 Neural Networks and Deep Learning Fall 2018/19 3. Improving Neural Networks (Some figures adapted from NNDL book) 1 Various Approaches to Improve Neural Networks 1. Cost functions Quadratic Cross
More informationLearning Gaussian Process Models from Uncertain Data
Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada
More informationApplications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices
Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Vahid Dehdari and Clayton V. Deutsch Geostatistical modeling involves many variables and many locations.
More informationMathematical Formulation of Our Example
Mathematical Formulation of Our Example We define two binary random variables: open and, where is light on or light off. Our question is: What is? Computer Vision 1 Combining Evidence Suppose our robot
More informationMachine Learning Techniques
Machine Learning Techniques ( 機器學習技法 ) Lecture 13: Deep Learning Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University ( 國立台灣大學資訊工程系
More informationECE521 Lectures 9 Fully Connected Neural Networks
ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance
More informationMIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,
MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, 23 2013 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run
More informationPCA and admixture models
PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1
More informationMidterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian
More informationUSING SINGULAR VALUE DECOMPOSITION (SVD) AS A SOLUTION FOR SEARCH RESULT CLUSTERING
POZNAN UNIVE RSIY OF E CHNOLOGY ACADE MIC JOURNALS No. 80 Electrical Engineering 2014 Hussam D. ABDULLA* Abdella S. ABDELRAHMAN* Vaclav SNASEL* USING SINGULAR VALUE DECOMPOSIION (SVD) AS A SOLUION FOR
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationUnderstanding Generalization Error: Bounds and Decompositions
CIS 520: Machine Learning Spring 2018: Lecture 11 Understanding Generalization Error: Bounds and Decompositions Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the
More informationIn this chapter, we provide an introduction to covariate shift adaptation toward machine learning in a non-stationary environment.
1 Introduction and Problem Formulation In this chapter, we provide an introduction to covariate shift adaptation toward machine learning in a non-stationary environment. 1.1 Machine Learning under Covariate
More informationIntroduction to Machine Learning and Cross-Validation
Introduction to Machine Learning and Cross-Validation Jonathan Hersh 1 February 27, 2019 J.Hersh (Chapman ) Intro & CV February 27, 2019 1 / 29 Plan 1 Introduction 2 Preliminary Terminology 3 Bias-Variance
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More informationLearning Recurrent Neural Networks with Hessian-Free Optimization: Supplementary Materials
Learning Recurrent Neural Networks with Hessian-Free Optimization: Supplementary Materials Contents 1 Pseudo-code for the damped Gauss-Newton vector product 2 2 Details of the pathological synthetic problems
More informationMark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer.
University of Cambridge Engineering Part IIB & EIST Part II Paper I0: Advanced Pattern Processing Handouts 4 & 5: Multi-Layer Perceptron: Introduction and Training x y (x) Inputs x 2 y (x) 2 Outputs x
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More informationMixtures of Gaussians with Sparse Regression Matrices. Constantinos Boulis, Jeffrey Bilmes
Mixtures of Gaussians with Sparse Regression Matrices Constantinos Boulis, Jeffrey Bilmes {boulis,bilmes}@ee.washington.edu Dept of EE, University of Washington Seattle WA, 98195-2500 UW Electrical Engineering
More informationHoldout and Cross-Validation Methods Overfitting Avoidance
Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest
More informationLarge-Scale Feature Learning with Spike-and-Slab Sparse Coding
Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Ian J. Goodfellow, Aaron Courville, Yoshua Bengio ICML 2012 Presented by Xin Yuan January 17, 2013 1 Outline Contributions Spike-and-Slab
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationMachine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring /
Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Combining Classifiers Empirical view Theoretical
More informationVC dimension, Model Selection and Performance Assessment for SVM and Other Machine Learning Algorithms
03/Feb/2010 VC dimension, Model Selection and Performance Assessment for SVM and Other Machine Learning Algorithms Presented by Andriy Temko Department of Electrical and Electronic Engineering Page 2 of
More informationMachine Learning. Regularization and Feature Selection. Fabio Vandin November 13, 2017
Machine Learning Regularization and Feature Selection Fabio Vandin November 13, 2017 1 Learning Model A: learning algorithm for a machine learning task S: m i.i.d. pairs z i = (x i, y i ), i = 1,..., m,
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationUsing the Delta Test for Variable Selection
Using the Delta Test for Variable Selection Emil Eirola 1, Elia Liitiäinen 1, Amaury Lendasse 1, Francesco Corona 1, and Michel Verleysen 2 1 Helsinki University of Technology Lab. of Computer and Information
More informationLecture 10. Neural networks and optimization. Machine Learning and Data Mining November Nando de Freitas UBC. Nonlinear Supervised Learning
Lecture 0 Neural networks and optimization Machine Learning and Data Mining November 2009 UBC Gradient Searching for a good solution can be interpreted as looking for a minimum of some error (loss) function
More information1 Cricket chirps: an example
Notes for 2016-09-26 1 Cricket chirps: an example Did you know that you can estimate the temperature by listening to the rate of chirps? The data set in Table 1 1. represents measurements of the number
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT68 Winter 8) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
More informationLinear Regression and Its Applications
Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start
More informationHypothesis Evaluation
Hypothesis Evaluation Machine Learning Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall 1395 1 / 31 Table of contents 1 Introduction
More informationLecture 6. Regression
Lecture 6. Regression Prof. Alan Yuille Summer 2014 Outline 1. Introduction to Regression 2. Binary Regression 3. Linear Regression; Polynomial Regression 4. Non-linear Regression; Multilayer Perceptron
More information10701/15781 Machine Learning, Spring 2007: Homework 2
070/578 Machine Learning, Spring 2007: Homework 2 Due: Wednesday, February 2, beginning of the class Instructions There are 4 questions on this assignment The second question involves coding Do not attach
More information