ABC-LogitBoost for Multi-Class Classification
Ping Li
Department of Statistical Science, Cornell University
(BTRY 6520)

What is Classification? An Example: USPS Handwritten Zipcode Recognition

[Slide shows handwritten zipcode digits written by three different people.]

The task: teach the machine to automatically recognize the 10 digits.

Multi-Class Classification

Given a training data set $\{y_i, X_i\}_{i=1}^{N}$, with $X_i \in \mathbb{R}^p$ and $y_i \in \{0, 1, 2, \dots, K-1\}$, the task is to learn a function to predict the class label $y_i$ from $X_i$.

$K = 2$: binary classification. $K > 2$: multi-class classification.

Many important practical problems can be cast as (multi-class) classification. For example: Li, Burges, and Wu, NIPS 2007, Learning to Rank Using Multiple Classification and Gradient Boosting.

Logistic Regression for Classification

First learn the class probabilities

$\hat{p}_k = \Pr\{y = k \mid X\}, \quad k = 0, 1, \dots, K-1, \quad \sum_{k=0}^{K-1} \hat{p}_k = 1$

(only $K-1$ degrees of freedom). Then assign the class label according to

$\hat{y} \mid X = \operatorname{argmax}_k \hat{p}_k$

Multinomial Logit Probability Model

$p_k = \frac{e^{F_k}}{\sum_{s=0}^{K-1} e^{F_s}}$

where $F_k = F_k(x)$ is the function to be learned from the data.

Classical logistic regression: $F(x) = \beta^T x$. The task is to learn the coefficients $\beta$.

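As a small illustration (not part of the original slides), here is a minimal NumPy sketch of the multinomial logit probabilities and the argmax class assignment; the function name and the toy scores are invented for the example.

```python
import numpy as np

def class_probabilities(F):
    """Multinomial logit: p_k = exp(F_k) / sum_s exp(F_s), computed row-wise.

    F is an (N, K) array of scores F_k(x_i); subtracting the row maximum
    keeps the exponentials numerically stable without changing p_k.
    """
    F = F - F.max(axis=1, keepdims=True)
    expF = np.exp(F)
    return expF / expF.sum(axis=1, keepdims=True)

# Toy example: 2 samples, K = 3 classes
F = np.array([[2.0, 0.5, -1.0],
              [0.0, 0.0, 3.0]])
p = class_probabilities(F)
print(p)                  # each row sums to 1
print(p.argmax(axis=1))   # predicted labels: argmax_k p_k
```
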
Flexible Additive Modeling

$F(x) = F^{(M)}(x) = \sum_{m=1}^{M} \rho_m\, h(x; a_m)$

where $h(x; a)$ is a pre-specified function (e.g., trees). The task is to learn the parameters $\rho_m$ and $a_m$.

Both LogitBoost (Friedman et al., 2000) and MART (Multiple Additive Regression Trees, Friedman 2001) adopted this model.

Learning Logistic Regression by Maximum Likelihood

Seek $F_{i,k}$ to maximize the multinomial likelihood. Suppose $y_i = k$:

$\mathrm{Lik} \propto p_{i,0}^{0} \cdots p_{i,k}^{1} \cdots p_{i,K-1}^{0} = p_{i,k}$

or, equivalently, maximize the log-likelihood $\log \mathrm{Lik} \propto \log p_{i,k}$, or, equivalently, minimize the negative log-likelihood loss

$L_i = -\log p_{i,k}, \quad (y_i = k).$

The Negative Log-Likelihood Loss

$L = \sum_{i=1}^{N} L_i = -\sum_{i=1}^{N} \left\{ \sum_{k=0}^{K-1} r_{i,k} \log p_{i,k} \right\}$

where $r_{i,k} = 1$ if $y_i = k$ and $r_{i,k} = 0$ otherwise.

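A short NumPy sketch (not from the slides) of this loss, evaluated directly from the scores; the function name is made up and the toy inputs are arbitrary.

```python
import numpy as np

def multiclass_log_loss(F, y):
    """Negative multinomial log-likelihood L = -sum_i log p_{i, y_i}.

    F: (N, K) array of scores, y: (N,) integer labels in {0, ..., K-1}.
    """
    F = F - F.max(axis=1, keepdims=True)                       # stability
    log_p = F - np.log(np.exp(F).sum(axis=1, keepdims=True))   # log p_{i,k}
    return -log_p[np.arange(len(y)), y].sum()

F = np.zeros((4, 3))               # all-zero scores => p_{i,k} = 1/3
y = np.array([0, 1, 2, 1])
print(multiclass_log_loss(F, y))   # equals 4 * log(3)
```
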
Two Basic Optimization Methods for Maximum Likelihood

1. Newton's method: uses the first and second derivatives of the loss function. This is the method in LogitBoost.
2. Gradient descent: uses only the first-order derivative of the loss function.

MART used a creative combination of gradient descent and Newton's method.

Derivatives Used in LogitBoost and MART

The loss function:

$L = \sum_{i=1}^{N} L_i = -\sum_{i=1}^{N} \left\{ \sum_{k=0}^{K-1} r_{i,k} \log p_{i,k} \right\}$

The first derivative:

$\frac{\partial L_i}{\partial F_{i,k}} = -(r_{i,k} - p_{i,k})$

The second derivative:

$\frac{\partial^2 L_i}{\partial F_{i,k}^2} = p_{i,k}(1 - p_{i,k}).$

The Original LogitBoost Algorithm

1: $F_{i,k} = 0$, $p_{i,k} = \frac{1}{K}$, $k = 0$ to $K-1$, $i = 1$ to $N$
2: For $m = 1$ to $M$ Do
3:   For $k = 0$ to $K-1$ Do
4:     $w_{i,k} = p_{i,k}(1 - p_{i,k})$, $z_{i,k} = \frac{r_{i,k} - p_{i,k}}{p_{i,k}(1 - p_{i,k})}$
5:     Fit the function $f_{i,k}$ by a weighted least-squares of $z_{i,k}$ to $x_i$ with weights $w_{i,k}$
6:     $F_{i,k} = F_{i,k} + \nu \frac{K-1}{K}\left(f_{i,k} - \frac{1}{K}\sum_{k=0}^{K-1} f_{i,k}\right)$
7:   End
8:   $p_{i,k} = \exp(F_{i,k}) / \sum_{s=0}^{K-1} \exp(F_{i,s})$, $k = 0$ to $K-1$, $i = 1$ to $N$
9: End

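Not from the slides: a rough Python sketch of one pass through steps 3-8 above, using scikit-learn's DecisionTreeRegressor as the weighted least-squares fitter. The function name, the small floor on the weights, and the default J and ν are illustrative choices, not the original implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def logitboost_round(F, X, r, nu=0.1, J=20):
    """One iteration of the LogitBoost loop above.
    F: (N, K) current scores, X: (N, p) features, r: (N, K) 0/1 indicators."""
    N, K = F.shape
    p = np.exp(F - F.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)

    f = np.empty_like(F)
    for k in range(K):
        w = p[:, k] * (1.0 - p[:, k])                    # weights (line 4)
        z = (r[:, k] - p[:, k]) / np.maximum(w, 1e-12)   # working response,
                                                         # floored to avoid 0/0
        tree = DecisionTreeRegressor(max_leaf_nodes=J)   # weighted LS fit (line 5)
        tree.fit(X, z, sample_weight=w)
        f[:, k] = tree.predict(X)

    # line 6: shrink and center so the class updates sum to zero
    return F + nu * (K - 1) / K * (f - f.mean(axis=1, keepdims=True))
```
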
The Original MART Algorithm

1: $F_{i,k} = 0$, $p_{i,k} = \frac{1}{K}$, $k = 0$ to $K-1$, $i = 1$ to $N$
2: For $m = 1$ to $M$ Do
3:   For $k = 0$ to $K-1$ Do
4:     $\{R_{j,k,m}\}_{j=1}^{J}$ = $J$-terminal-node regression tree from $\{r_{i,k} - p_{i,k},\ x_i\}_{i=1}^{N}$
5:     $\beta_{j,k,m} = \frac{K-1}{K}\, \frac{\sum_{x_i \in R_{j,k,m}} r_{i,k} - p_{i,k}}{\sum_{x_i \in R_{j,k,m}} (1 - p_{i,k})\,p_{i,k}}$
6:     $F_{i,k} = F_{i,k} + \nu \sum_{j=1}^{J} \beta_{j,k,m} 1_{x_i \in R_{j,k,m}}$
7:   End
8:   $p_{i,k} = \exp(F_{i,k}) / \sum_{s=0}^{K-1} \exp(F_{i,s})$, $k = 0$ to $K-1$, $i = 1$ to $N$
9: End

Comparing LogitBoost with MART

LogitBoost used first and second derivatives to construct the trees.
MART only used the first-order information to construct the trees.
Both used second-order information to update the values of the terminal nodes.
LogitBoost was believed to have numerical instability problems.

The Numerical Issue in LogitBoost

4: $w_{i,k} = p_{i,k}(1 - p_{i,k})$, $z_{i,k} = \frac{r_{i,k} - p_{i,k}}{p_{i,k}(1 - p_{i,k})}$
5: Fit the function $f_{i,k}$ by a weighted least-squares of $z_{i,k}$ to $x_i$ with weights $w_{i,k}$
6: $F_{i,k} = F_{i,k} + \nu \frac{K-1}{K}\left(f_{i,k} - \frac{1}{K}\sum_{k=0}^{K-1} f_{i,k}\right)$

The instability issue: when $p_{i,k}$ is close to 0 or 1, the working response $z_{i,k} = \frac{r_{i,k} - p_{i,k}}{p_{i,k}(1 - p_{i,k})}$ may approach infinity.

The Numerical Issue in LogitBoost

Friedman et al. (2000) used regression trees and suggested some crucial implementation protections:

In Line 4, compute $z_{i,k}$ as $\frac{1}{p_{i,k}}$ (if $r_{i,k} = 1$) or $\frac{-1}{1 - p_{i,k}}$ (if $r_{i,k} = 0$), and bound $|z_{i,k}|$ by $z_{\max} \in [2, 4]$.

Robust LogitBoost avoids this pointwise thresholding and is essentially free of numerical problems. It turns out the numerical issue does not really exist, especially when the trees are not too large.

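A small sketch (not from the slides) of the classical protection described above; the function name is invented and the z_max shown is just one choice in the suggested [2, 4] range.

```python
import numpy as np

def clipped_working_response(r, p, z_max=4.0):
    """Classical LogitBoost protection: z = 1/p if r = 1, -1/(1-p) if r = 0,
    then bound |z| by z_max (Friedman et al. suggest z_max in [2, 4])."""
    z = np.where(r == 1, 1.0 / p, -1.0 / (1.0 - p))
    return np.clip(z, -z_max, z_max)

p = np.array([0.999, 0.5, 1e-6])
r = np.array([1, 0, 1])
print(clipped_working_response(r, p))   # extreme values are capped at +/- 4
```
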
Tree-Splitting Using the Second-Order Information

Feature values: $x_i$, $i = 1$ to $N$; assume $x_1 \le x_2 \le \dots \le x_N$.
Weight values: $w_i$, $i = 1$ to $N$. Response values: $z_i$, $i = 1$ to $N$.

We seek the index $s$, $1 \le s < N$, to maximize the gain in weighted squared error (SE):

$\mathrm{Gain}(s) = \mathrm{SE}_T - (\mathrm{SE}_L + \mathrm{SE}_R) = \sum_{i=1}^{N} (z_i - \bar{z})^2 w_i - \left[ \sum_{i=1}^{s} (z_i - \bar{z}_L)^2 w_i + \sum_{i=s+1}^{N} (z_i - \bar{z}_R)^2 w_i \right]$

where $\bar{z} = \frac{\sum_{i=1}^{N} z_i w_i}{\sum_{i=1}^{N} w_i}$, $\bar{z}_L = \frac{\sum_{i=1}^{s} z_i w_i}{\sum_{i=1}^{s} w_i}$, $\bar{z}_R = \frac{\sum_{i=s+1}^{N} z_i w_i}{\sum_{i=s+1}^{N} w_i}$.

After simplification, we obtain

$\mathrm{Gain}(s) = \frac{\left[\sum_{i=1}^{s} z_i w_i\right]^2}{\sum_{i=1}^{s} w_i} + \frac{\left[\sum_{i=s+1}^{N} z_i w_i\right]^2}{\sum_{i=s+1}^{N} w_i} - \frac{\left[\sum_{i=1}^{N} z_i w_i\right]^2}{\sum_{i=1}^{N} w_i}$

$= \frac{\left[\sum_{i=1}^{s} (r_{i,k} - p_{i,k})\right]^2}{\sum_{i=1}^{s} p_{i,k}(1 - p_{i,k})} + \frac{\left[\sum_{i=s+1}^{N} (r_{i,k} - p_{i,k})\right]^2}{\sum_{i=s+1}^{N} p_{i,k}(1 - p_{i,k})} - \frac{\left[\sum_{i=1}^{N} (r_{i,k} - p_{i,k})\right]^2}{\sum_{i=1}^{N} p_{i,k}(1 - p_{i,k})}$

recalling $w_i = p_{i,k}(1 - p_{i,k})$ and $z_i = \frac{r_{i,k} - p_{i,k}}{p_{i,k}(1 - p_{i,k})}$. This procedure is numerically stable.

MART only used the first-order information to construct the trees:

$\mathrm{MARTGain}(s) = \frac{1}{s}\left[\sum_{i=1}^{s} (r_{i,k} - p_{i,k})\right]^2 + \frac{1}{N - s}\left[\sum_{i=s+1}^{N} (r_{i,k} - p_{i,k})\right]^2 - \frac{1}{N}\left[\sum_{i=1}^{N} (r_{i,k} - p_{i,k})\right]^2,$

which can also be derived by setting the weights $w_{i,k} = 1$ and the response $z_{i,k} = r_{i,k} - p_{i,k}$.

LogitBoost used more information and could be more accurate on many datasets.

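To make the two split criteria concrete, here is a short NumPy sketch (my own, not from the slides) that scans all split indices with cumulative sums; it assumes the samples are already sorted by the feature value, and the function name is invented.

```python
import numpy as np

def split_gains(r, p):
    """Gain(s) (second-order, Robust LogitBoost) and MARTGain(s) (first-order)
    for every split index s = 1, ..., N-1, for one class k.
    r, p: 1-D arrays of r_{i,k} and p_{i,k}, sorted by the feature value."""
    g = r - p                        # first-order terms r_{i,k} - p_{i,k}
    w = p * (1.0 - p)                # second-order weights p(1 - p)
    N = len(g)
    cg, cw = np.cumsum(g), np.cumsum(w)
    s = np.arange(1, N)              # left node contains samples 1..s

    gain = (cg[:-1] ** 2 / cw[:-1]
            + (cg[-1] - cg[:-1]) ** 2 / (cw[-1] - cw[:-1])
            - cg[-1] ** 2 / cw[-1])

    mart_gain = (cg[:-1] ** 2 / s
                 + (cg[-1] - cg[:-1]) ** 2 / (N - s)
                 - cg[-1] ** 2 / N)
    return gain, mart_gain
```

Note that the denominators in the second-order gain are sums of $p_{i,k}(1 - p_{i,k})$ over whole node segments, which is why no pointwise division by $p_{i,k}(1 - p_{i,k})$ ever appears.
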
Update Terminal Node Values

In LogitBoost:

$\frac{\sum_{\text{node}} z_{i,k} w_{i,k}}{\sum_{\text{node}} w_{i,k}} = \frac{\sum_{\text{node}} (r_{i,k} - p_{i,k})}{\sum_{\text{node}} p_{i,k}(1 - p_{i,k})},$

which is the same as in MART.

Robust LogitBoost

1: $F_{i,k} = 0$, $p_{i,k} = \frac{1}{K}$, $k = 0$ to $K-1$, $i = 1$ to $N$
2: For $m = 1$ to $M$ Do
3:   For $k = 0$ to $K-1$ Do
4:     $\{R_{j,k,m}\}_{j=1}^{J}$ = $J$-terminal-node regression tree from $\{r_{i,k} - p_{i,k},\ x_i\}_{i=1}^{N}$, with weights $p_{i,k}(1 - p_{i,k})$
5:     $\beta_{j,k,m} = \frac{K-1}{K}\, \frac{\sum_{x_i \in R_{j,k,m}} r_{i,k} - p_{i,k}}{\sum_{x_i \in R_{j,k,m}} (1 - p_{i,k})\,p_{i,k}}$
6:     $F_{i,k} = F_{i,k} + \nu \sum_{j=1}^{J} \beta_{j,k,m} 1_{x_i \in R_{j,k,m}}$
7:   End
8:   $p_{i,k} = \exp(F_{i,k}) / \sum_{s=0}^{K-1} \exp(F_{i,s})$, $k = 0$ to $K-1$, $i = 1$ to $N$
9: End

Experiments on Binary Classification

(Multi-class classification is even more interesting!)

Data:

IJCNN1: this dataset was used in a competition; LibSVM was the winner.

Forest100k and Forest521k: the two largest datasets from Bordes et al., JMLR 2005, Fast Kernel Classifiers with Online and Active Learning.

IJCNN1 Test Errors
[Figure: test misclassification error vs. boosting iterations, J = 20, comparing MART, LibSVM, and Robust LogitBoost.]

Forest100k Test Errors
[Figure: test misclassification error vs. boosting iterations, J = 20, ν = 0.1, comparing MART, SVM, and Robust LogitBoost.]

Forest521k Test Errors
[Figure: test misclassification error vs. boosting iterations, J = 20, comparing SVM, MART, and Robust LogitBoost.]

ABC-Boost for Multi-Class Classification

ABC = Adaptive Base Class
ABC-MART = ABC-Boost + MART
ABC-LogitBoost = ABC-Boost + (Robust) LogitBoost

The key to the success of ABC-Boost is the use of better derivatives.

Review: Components of Logistic Regression

The multinomial probability model:

$p_k = \frac{e^{F_k}}{\sum_{s=0}^{K-1} e^{F_s}}, \quad \sum_{k=0}^{K-1} p_k = 1$

where $F_k = F_k(x)$ is the function to be learned from the data.

The sum-to-zero constraint $\sum_{k=0}^{K-1} F_k(x) = 0$ is commonly used to obtain a unique solution (only $K-1$ degrees of freedom).

Why the sum-to-zero constraint?

$\frac{e^{F_{i,k} + C}}{\sum_{s=0}^{K-1} e^{F_{i,s} + C}} = \frac{e^{C}\, e^{F_{i,k}}}{e^{C} \sum_{s=0}^{K-1} e^{F_{i,s}}} = \frac{e^{F_{i,k}}}{\sum_{s=0}^{K-1} e^{F_{i,s}}} = p_{i,k}.$

For identifiability, one should impose a constraint. One popular choice is to assume $\sum_{k=0}^{K-1} F_{i,k} = \text{const}$, equivalent to $\sum_{k=0}^{K-1} F_{i,k} = 0$. This is the assumption used in many papers, including LogitBoost and MART.

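A quick numeric check of the invariance above (my own illustration, not from the slides); the particular scores and shifts are arbitrary.

```python
import numpy as np

F = np.array([1.0, -0.5, 2.0])        # scores F_{i,k} for one sample, K = 3
for C in (0.0, 5.0, -3.0):            # shift every score by a constant C
    e = np.exp(F + C)
    print(np.round(e / e.sum(), 6))   # identical probabilities each time
```
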
The negative log-likelihood loss:

$L = \sum_{i=1}^{N} L_i = -\sum_{i=1}^{N} \left\{ \sum_{k=0}^{K-1} r_{i,k} \log p_{i,k} \right\}$

where $r_{i,k} = 1$ if $y_i = k$ and $0$ otherwise, so $\sum_{k=0}^{K-1} r_{i,k} = 1$.

Derivatives used in LogitBoost and MART:

$\frac{\partial L_i}{\partial F_{i,k}} = -(r_{i,k} - p_{i,k}), \qquad \frac{\partial^2 L_i}{\partial F_{i,k}^2} = p_{i,k}(1 - p_{i,k}),$

which can be derived without imposing any constraints on $F_k$.

Singularity of the Hessian without the Sum-to-Zero Constraint

Without the sum-to-zero constraint $\sum_{k=0}^{K-1} F_{i,k} = 0$:

$\frac{\partial L_i}{\partial F_{i,k}} = -(r_{i,k} - p_{i,k}), \qquad \frac{\partial^2 L_i}{\partial F_{i,k}^2} = p_{i,k}(1 - p_{i,k}).$

For example, when $K = 3$, the full Hessian is

$\begin{pmatrix} \frac{\partial^2 L_i}{\partial F_{i,0}^2} & \frac{\partial^2 L_i}{\partial F_{i,0}\partial F_{i,1}} & \frac{\partial^2 L_i}{\partial F_{i,0}\partial F_{i,2}} \\ \frac{\partial^2 L_i}{\partial F_{i,1}\partial F_{i,0}} & \frac{\partial^2 L_i}{\partial F_{i,1}^2} & \frac{\partial^2 L_i}{\partial F_{i,1}\partial F_{i,2}} \\ \frac{\partial^2 L_i}{\partial F_{i,2}\partial F_{i,0}} & \frac{\partial^2 L_i}{\partial F_{i,2}\partial F_{i,1}} & \frac{\partial^2 L_i}{\partial F_{i,2}^2} \end{pmatrix} = \begin{pmatrix} p_0(1-p_0) & -p_0 p_1 & -p_0 p_2 \\ -p_1 p_0 & p_1(1-p_1) & -p_1 p_2 \\ -p_2 p_0 & -p_2 p_1 & p_2(1-p_2) \end{pmatrix},$

whose determinant is 0.

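A tiny NumPy demonstration of this singularity (not from the slides); the probability vector is arbitrary.

```python
import numpy as np

p = np.array([0.2, 0.3, 0.5])     # any probability vector, K = 3
H = np.diag(p) - np.outer(p, p)   # p_k(1-p_k) on the diagonal, -p_k p_l off it
print(np.linalg.det(H))           # ~0: the full Hessian is singular
print(H @ np.ones(3))             # rows sum to 0, hence the rank deficiency
```
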
Diagonal Approximation

Friedman et al. (2000) and Friedman (2001) used a diagonal approximation of the Hessian:

$\frac{K-1}{K} \begin{pmatrix} \frac{\partial^2 L_i}{\partial F_{i,0}^2} & & \\ & \frac{\partial^2 L_i}{\partial F_{i,1}^2} & \\ & & \frac{\partial^2 L_i}{\partial F_{i,2}^2} \end{pmatrix} = \frac{K-1}{K} \begin{pmatrix} p_0(1-p_0) & & \\ & p_1(1-p_1) & \\ & & p_2(1-p_2) \end{pmatrix}$

Derivatives Under the Sum-to-Zero Constraint

The loss function: $L_i = -\sum_{k=0}^{K-1} r_{i,k} \log p_{i,k}$

The probability model and sum-to-zero constraint:

$p_{i,k} = \frac{e^{F_{i,k}}}{\sum_{s=0}^{K-1} e^{F_{i,s}}}, \qquad \sum_{k=0}^{K-1} F_{i,k} = 0$

Without loss of generality, assume $k = 0$ is the base class:

$F_{i,0} = -\sum_{k=1}^{K-1} F_{i,k}$

New derivatives (with class 0 as the base class):

$\frac{\partial L_i}{\partial F_{i,k}} = (r_{i,0} - p_{i,0}) - (r_{i,k} - p_{i,k}),$

$\frac{\partial^2 L_i}{\partial F_{i,k}^2} = p_{i,0}(1 - p_{i,0}) + p_{i,k}(1 - p_{i,k}) + 2 p_{i,0} p_{i,k}.$

MART and LogitBoost used:

$\frac{\partial L_i}{\partial F_{i,k}} = -(r_{i,k} - p_{i,k}), \qquad \frac{\partial^2 L_i}{\partial F_{i,k}^2} = p_{i,k}(1 - p_{i,k}).$

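A minimal sketch (not from the slides) of the new per-sample derivatives for an arbitrary base class b; the function name is invented.

```python
import numpy as np

def abc_derivatives(r, p, k, b):
    """First and second derivatives of L_i w.r.t. F_{i,k} when class b is
    the base class (k != b); r and p are (N, K) arrays.
    The boosting tree is fit to minus the gradient."""
    grad = (r[:, b] - p[:, b]) - (r[:, k] - p[:, k])
    hess = (p[:, b] * (1 - p[:, b]) + p[:, k] * (1 - p[:, k])
            + 2.0 * p[:, b] * p[:, k])
    return grad, hess
```
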
Assume class 0 is the base class. The determinant of the (reduced, $2 \times 2$) Hessian is

$\begin{vmatrix} \frac{\partial^2 L_i}{\partial F_{i,1}^2} & \frac{\partial^2 L_i}{\partial F_{i,1}\partial F_{i,2}} \\ \frac{\partial^2 L_i}{\partial F_{i,2}\partial F_{i,1}} & \frac{\partial^2 L_i}{\partial F_{i,2}^2} \end{vmatrix} = \begin{vmatrix} p_0(1-p_0) + p_1(1-p_1) + 2p_0 p_1 & p_0(1-p_0) + p_0 p_1 + p_0 p_2 - p_1 p_2 \\ p_0(1-p_0) + p_0 p_1 + p_0 p_2 - p_1 p_2 & p_0(1-p_0) + p_2(1-p_2) + 2p_0 p_2 \end{vmatrix}$

$= p_0 p_1 + p_0 p_2 + p_1 p_2 - p_0 p_1^2 - p_0 p_2^2 - p_1 p_2^2 - p_2 p_1^2 - p_1 p_0^2 - p_2 p_0^2 + 6 p_0 p_1 p_2,$

which is independent of the choice of the base class. However, the diagonal approximation appears to be a must when using trees. Thus the choice of base class matters.

Adaptive Base Class Boost (ABC-Boost)

Two key ideas of ABC-Boost:

1. Formulate the multi-class boosting algorithm by considering a base class. The base class does not have to be trained for; it is inferred from the sum-to-zero constraint $\sum_{k=0}^{K-1} F_{i,k} = 0$.
2. At each boosting step, adaptively choose the base class.

How to choose the base class?

We should choose the base class based on performance (training loss). How?

One idea: exhaustively search over all $K$ candidate base classes and choose the one that leads to the best performance (smallest training loss).
This is computationally expensive (but not too bad, unless $K$ is really large), and good performance can be achieved.
There are many other ideas.

ABC-MART = ABC-Boost + MART.
ABC-LogitBoost = ABC-Boost + Robust LogitBoost.

The Original MART Algorithm (recap: the same listing shown earlier, repeated on the slide for side-by-side comparison with ABC-MART below).

ABC-MART

1: $F_{i,k} = 0$, $k = 0$ to $K-1$, $i = 1$ to $N$
2: For $m = 1$ to $M$ Do
     For $b = 0$ to $K-1$ Do
3:   For $k = 0$ to $K-1$ (and $k \ne b$) Do
4:     $\{R_{j,k,m}\}_{j=1}^{J}$ = $J$-terminal-node tree from $\{(r_{i,k} - p_{i,k}) - (r_{i,b} - p_{i,b}),\ x_i\}_{i=1}^{N}$
5:     $\beta_{j,k,m} = \frac{\sum_{x_i \in R_{j,k,m}} (r_{i,k} - p_{i,k}) - (r_{i,b} - p_{i,b})}{\sum_{x_i \in R_{j,k,m}} (1 - p_{i,k})p_{i,k} + (1 - p_{i,b})p_{i,b} + 2 p_{i,k} p_{i,b}}$
6:     $F_{i,k} = F_{i,k} + \nu \sum_{j=1}^{J} \beta_{j,k,m} 1_{x_i \in R_{j,k,m}}$
7:   End
8:   $p_{i,k} = \exp(F_{i,k}) / \sum_{s=0}^{K-1} \exp(F_{i,s})$
9: End

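The following Python sketch (my own, not from the slides) combines the listing above with the exhaustive base-class search described earlier: one ABC-MART iteration tries every base class b and keeps the update with the smallest training loss. scikit-learn's DecisionTreeRegressor stands in for the J-terminal-node tree, and names such as abc_mart_iteration are invented.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def probs(F):
    e = np.exp(F - F.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train_loss(F, y):
    p = probs(F)
    return -np.log(p[np.arange(len(y)), y] + 1e-12).sum()

def abc_mart_iteration(F, X, y, r, nu=0.1, J=20):
    """One ABC-MART step with exhaustive base-class search.
    F, r: (N, K); X: (N, p); y: (N,) integer labels."""
    N, K = F.shape
    best_F, best_loss = None, np.inf
    for b in range(K):
        p = probs(F)
        F_b = F.copy()
        for k in range(K):
            if k == b:
                continue
            # tree response = minus the gradient: (r_k - p_k) - (r_b - p_b)
            resp = (r[:, k] - p[:, k]) - (r[:, b] - p[:, b])
            tree = DecisionTreeRegressor(max_leaf_nodes=J).fit(X, resp)
            leaf = tree.apply(X)                      # terminal node of each x_i
            hess = (p[:, k] * (1 - p[:, k]) + p[:, b] * (1 - p[:, b])
                    + 2 * p[:, k] * p[:, b])
            for j in np.unique(leaf):                 # one Newton step per node
                idx = leaf == j
                beta = resp[idx].sum() / max(hess[idx].sum(), 1e-12)
                F_b[idx, k] += nu * beta
        # base class inferred from the sum-to-zero constraint
        F_b[:, b] = -F_b[:, np.arange(K) != b].sum(axis=1)
        loss = train_loss(F_b, y)
        if loss < best_loss:
            best_F, best_loss = F_b, loss
    return best_F, best_loss
```
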
Datasets

UCI-Covertype: two datasets were generated, Covertype290k and Covertype145k.

UCI-Poker: 1 million test samples in the original split. Datasets: Poker25kT1, Poker25kT2, Poker525k, Poker275k, Poker150k, Poker100k.

MNIST: originally 60,000 training samples and 10,000 test samples. Mnist10k swaps the training and test sets.

Many variations of MNIST: the original MNIST is a well-known easy problem. The comparison study at lisa/twiki/bin/view.cgi/public/DeepVsShallowComparisonICML2007 created a variety of much more difficult datasets by adding various background (correlated) noise, background images, rotations, etc.

UCI-Letter: 20,000 total samples.

[Table: for each dataset (Covertype290k, Covertype145k, Poker525k, Poker275k, Poker150k, Poker100k, Poker25kT1, Poker25kT2, Mnist10k, M-Basic, M-Rotate, M-Image, M-Rand, M-RotImg, M-Noise1 through M-Noise6, Letter15k, Letter4k, Letter2k): the number of classes K, the number of training samples, the number of test samples, and the number of features.]

Summary of Test Misclassification Errors

[Table: test misclassification errors of mart, abc-mart, (robust) logitboost, abc-logitboost, and logistic regression on each dataset, together with the number of test samples per dataset.]

P-Values

P1: for testing whether abc-mart has significantly lower error rates than mart.
P2: for testing whether (robust) logitboost has significantly lower error rates than mart.
P3: for testing whether abc-logitboost has significantly lower error rates than abc-mart.
P4: for testing whether abc-logitboost has significantly lower error rates than (robust) logitboost.

P-values are computed using binomial distributions and normal approximations.

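A sketch (not from the slides) of one plausible way to turn two error counts on the same test set into such a P-value via the normal approximation to the binomial; the function name is invented and the exact test statistic used in the slides may differ from this two-proportion z-test.

```python
from math import erf, sqrt

def error_rate_pvalue(errors_a, errors_b, n_test):
    """One-sided P-value for 'method A has a lower error rate than method B',
    given error counts on the same test set of size n_test.
    Illustrative normal-approximation test, not necessarily the original one."""
    pa, pb = errors_a / n_test, errors_b / n_test
    pooled = (errors_a + errors_b) / (2 * n_test)
    se = sqrt(2 * pooled * (1 - pooled) / n_test)
    z = (pb - pa) / se
    return 1 - 0.5 * (1 + erf(z / sqrt(2)))   # P(Z > z)

print(error_rate_pvalue(errors_a=2100, errors_b=2240, n_test=100000))
```
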
Summary of Test P-Values

[Table: P1, P2, P3, and P4 for each dataset, Covertype290k through Letter2k.]

Comparisons with SVM and Deep Learning

Datasets: M-Noise1 to M-Noise6. Results for SVM, neural nets, and deep learning are taken from lisa/twiki/bin/view.cgi/public/DeepVsShallowComparisonICML2007.

[Figures: error rate (%) vs. degree of correlation on M-Noise1 to M-Noise6, comparing SAA-3, DBN-3, and SVM-RBF with the boosting methods.]

More Comparisons with SVM and Deep Learning

                 M-Basic   M-Rotate   M-Image   M-Rand    M-RotImg
SVM-RBF           3.05%     11.11%     22.61%    14.58%    55.18%
SVM-POLY          3.69%     15.42%     24.01%    16.62%    56.41%
NNET              4.69%     18.11%     27.41%    20.04%    62.16%
DBN-3              --       10.30%     16.31%     6.73%    47.39%
SAA-3              --       10.30%     23.00%    11.28%    51.93%
DBN-1              --       14.69%     16.15%     9.80%    52.21%
mart              4.12%     15.35%     11.64%    13.15%    49.82%
abc-mart          3.69%     13.27%      9.45%    10.60%    46.14%
logitboost        3.45%     13.63%      9.41%    10.04%    45.92%
abc-logitboost    3.20%     11.92%      8.54%     9.45%    44.69%

(Entries marked -- are missing.)

Training Loss vs. Boosting Iterations

[Figures: training loss vs. iterations on Covertype290k and Poker525k, J = 20.]

Test Errors vs. Boosting Iterations

[Figures: test errors vs. iterations on Mnist10k, Letter15k, Letter4k, Letter2k, Covertype290k, Covertype145k, Poker525k, Poker275k, Poker150k, and Poker100k (all with J = 20), and on Poker25kT1 and Poker25kT2 (J = 6).]

Detailed Experiment Results on Mnist10k

$J \in \{4, 6, 8, 10, 12, 14, 16, 18, 20, 24, 30, 40, 50\}$, $\nu \in \{0.04, 0.06, 0.08, 0.1\}$.

The goal is to show the improvements at all reasonable combinations of $J$ and $\nu$.

[Tables: Mnist10k test misclassification errors for mart and abc-mart, and for (robust) logitboost and abc-logitboost, at every combination of J and ν above, followed by the corresponding P1, P2, P3, and P4 values.]

[Figures: Mnist10k test errors vs. boosting iterations for J = 4 through J = 50 at several values of ν.]

Want to See More?

Reference: Ping Li, ABC-Boost: Adaptive Base Class Boost for Multi-class Classification. Department of Statistical Science, Cornell University, Ithaca, NY 14853 USA. pingli@cornell.edu