Advanced Machine Learning
1 Advanced Machine Learning: Domain Adaptation. Mehryar Mohri, Courant Institute & Google Research.
2 Non-Ideal World. [Figure: schematic contrasting the ideal sampling assumption with the real world, where domain and sampling drift over time.]
3 Outline. Domain adaptation. Multiple-source domain adaptation.
4 Domain Adaptation. Applications: sentiment analysis; language modeling and part-of-speech tagging; statistical parsing; speech recognition; computer vision. A solution is critical for these applications.
5 This Talk. Domain adaptation: discrepancy; theoretical guarantees; algorithm; enhancements.
6 Domain Adaptation Problem. Domains: source $(Q, f_Q)$ and target $(P, f_P)$. Input: a labeled sample $S$ drawn from the source and an unlabeled sample $T$ drawn from the target. Problem: find a hypothesis $h$ in $H$ with small expected loss with respect to the target domain, that is, $L_P(h, f_P) = \mathbb{E}_{x \sim P}\big[L(h(x), f_P(x))\big]$.
7 Sample Bias Correction Problem. Special case of domain adaptation with $f_Q = f_P$ and $\mathrm{supp}(Q) \subseteq \mathrm{supp}(P)$.
8 Related Work in Theory. Single-source adaptation: relation between adaptation and the $d_A$ distance (Devroye et al. (1996); Kifer et al. (2004); Ben-David et al. (2007)); a few negative examples of adaptation (Ben-David et al. (AISTATS 2010)); analysis and learning guarantees for importance weighting (Cortes, Mansour, and MM (NIPS 2010)).
9 Related Work in Theory. Multiple-source: same input distribution, but different labels (Crammer et al., 2005, 2006); theoretical analysis and method for multiple-source adaptation (Mansour, MM, Rostamizadeh, 2008).
10 Distribution Mismatch. [Figure: two mismatched distributions Q and P.] Which distance should we use to compare these distributions?
11 Simple Analysis. Proposition: assume that the loss $L$ is bounded by $M$. Then $|L_Q(h, f) - L_P(h, f)| \le M\, L_1(Q, P)$.
Proof: $|L_P(h, f) - L_Q(h, f)| = \big|\mathbb{E}_{x\sim P}[L(h(x), f(x))] - \mathbb{E}_{x\sim Q}[L(h(x), f(x))]\big| = \big|\sum_x (P(x) - Q(x))\, L(h(x), f(x))\big| \le M \sum_x |P(x) - Q(x)| = M\, L_1(Q, P)$.
But is this bound informative?
12 Example - Zero-One Loss. [Figure: hypothesis $h$ and target $f$ differ only on a region $a$.] Then $|L_Q(h, f) - L_P(h, f)| = |Q(a) - P(a)|$, which can be far smaller than the $L_1$ distance.
13 Discrepancy. Definition (Mansour, MM, Rostamizadeh, COLT 2009):
$\mathrm{disc}(P, Q) = \max_{h, h' \in H} \big|L_P(h, h') - L_Q(h, h')\big|$.
Symmetric, obeys the triangle inequality, but in general not a distance. Helps compare distributions for arbitrary losses, e.g., the hinge loss or $L_p$ losses; a generalization of the $d_A$ distance (Devroye et al. (1996); Kifer et al. (2004); Ben-David et al. (2007)).
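To make the definition concrete, here is a minimal Python sketch (my illustration, not code from the talk) that computes the empirical discrepancy for the 0/1 loss by brute force over a small finite hypothesis class; the 1D threshold class and Gaussian samples are illustrative assumptions. For the 0/1 loss, $L_D(h, h')$ is simply the disagreement mass of $h$ and $h'$ under $D$, so the double loop mirrors the definition directly.

```python
# Minimal sketch: empirical discrepancy disc(P_hat, Q_hat) for the 0/1 loss,
# by brute force over a finite hypothesis class (here: 1D threshold functions).
import numpy as np

def empirical_discrepancy(S_p, S_q, hypotheses):
    """max over pairs (h, h') of |L_{P_hat}(h, h') - L_{Q_hat}(h, h')|,
    where L_D(h, h') is the average 0/1 disagreement of h and h' on D."""
    best = 0.0
    for h in hypotheses:
        for h2 in hypotheses:
            loss_p = np.mean(h(S_p) != h2(S_p))   # L_{P_hat}(h, h')
            loss_q = np.mean(h(S_q) != h2(S_q))   # L_{Q_hat}(h, h')
            best = max(best, abs(loss_p - loss_q))
    return best

# Illustrative data: unlabeled target and source samples in 1D.
rng = np.random.default_rng(0)
S_p = rng.normal(1.0, 1.0, size=200)    # target sample ~ P
S_q = rng.normal(-1.0, 1.0, size=200)   # source sample ~ Q
thresholds = np.linspace(-4, 4, 41)
H = [(lambda x, t=t: x >= t) for t in thresholds]
print(empirical_discrepancy(S_p, S_q, H))
```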
14 Discrepancy - Properties. Theorem: for the $L_q$ loss bounded by $M$, for any $\delta > 0$, with probability at least $1 - \delta$,
$\mathrm{disc}(P, Q) \le \mathrm{disc}(\hat P, \hat Q) + 4q\big(\hat{\mathfrak R}_S(H) + \hat{\mathfrak R}_T(H)\big) + 3M\left(\sqrt{\tfrac{\log\frac{4}{\delta}}{2m}} + \sqrt{\tfrac{\log\frac{4}{\delta}}{2n}}\right)$.
15 Discrepancy = Distance (Cortes & MM (TCS 2013)). Theorem: let $K$ be a universal kernel (e.g., a Gaussian kernel) and $H = \{h \in H_K : \|h\|_K \le \Lambda\}$. Then, for the $L_2$ loss, the discrepancy is a distance over a compact set $X$.
Proof: the map $\Phi\colon h \mapsto \mathbb{E}_{x\sim P}[h^2(x)] - \mathbb{E}_{x\sim Q}[h^2(x)]$ is continuous on $C(X)$ for the sup norm. $\mathrm{disc}(P, Q) = 0$ implies $\Phi(h) = 0$ for all $h \in H$. Since $H$ is dense in $C(X)$, $\Phi = 0$ over $C(X)$. Thus $\mathbb{E}_P[f] - \mathbb{E}_Q[f] = 0$ for all $f \ge 0$ in $C(X)$, and this implies $P = Q$.
16 Theoretical Guarantees. Two types of questions: the difference between the average loss of a hypothesis $h$ on $Q$ versus on $P$? the difference of loss (measured on $P$) between the hypothesis $h$ obtained when training on $(Q, f_Q)$ versus the hypothesis $h'$ obtained when training on $(P, f_P)$?
17 Generalization Bound (Mansour, MM, Rostamizadeh (COLT 2009)). Notation: $L_Q(h_Q^*, f_Q) = \min_{h \in H} L_Q(h, f_Q)$ and $L_P(h_P^*, f_P) = \min_{h \in H} L_P(h, f_P)$.
Theorem: assume that $L$ obeys the triangle inequality. Then, for any $h \in H$,
$L_P(h, f_P) \le L_Q(h, h_Q^*) + \mathrm{disc}(P, Q) + L_P(h_P^*, f_P) + L_Q(h_Q^*, h_P^*)$.
18 Some Natural Cases. When $h^* = h_Q^* = h_P^*$:
$L_P(h, f_P) \le L_Q(h, h^*) + L_P(h^*, f_P) + \mathrm{disc}(P, Q)$.
When $f_P \in H$ (consistent case):
$L_P(h, f_P) \le L_Q(h, f_P) + \mathrm{disc}(Q, P)$.
The bounds of (Ben-David et al., NIPS 2006) and (Blitzer et al., NIPS 2007) are always worse in these cases.
19 Regularized ERM Algorithms. Objective function: $F_{\hat Q}(h) = \lambda \|h\|_K^2 + \hat R_Q(h)$, where $K$ is a PDS kernel, $\lambda > 0$ is a trade-off parameter, and $\hat R_Q(h)$ is the empirical error of $h$. This is a broad family of algorithms including SVM, SVR, kernel ridge regression, etc.
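To make the family concrete, here is a minimal sketch of one member, kernel ridge regression (my illustration; the Gaussian kernel and the $1/m$ normalization of the empirical error are assumptions):

```python
# Minimal sketch: kernel ridge regression, a member of the regularized ERM
# family F(h) = lambda * ||h||_K^2 + R_hat(h), with the squared loss.
import numpy as np

def gaussian_kernel(X, Z, gamma=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam=0.1, gamma=1.0):
    # Representer theorem: h = sum_i a_i K(x_i, .). For the objective
    # lam*||h||_K^2 + (1/m) * sum_i (h(x_i) - y_i)^2, the closed form is
    # a = (K + lam * m * I)^{-1} y.
    m = len(X)
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * m * np.eye(m), y)

def krr_predict(X_train, alpha, X_test, gamma=1.0):
    return gaussian_kernel(X_test, X_train, gamma) @ alpha
```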
20 Guarantees for Reg. ERM (Cortes & MM (TCS 2013)). Theorem: let $K$ be a PDS kernel with $K(x, x) \le R^2$ and $L$ a loss function such that $L(\cdot, y)$ is $\mu$-Lipschitz. Let $h'$ be the minimizer of $F_{\hat P}$ and $h$ that of $F_{\hat Q}$. Then, for all $(x, y) \in X \times Y$,
$|L(h'(x), y) - L(h(x), y)| \le \mu R \sqrt{\frac{\mathrm{disc}(\hat P, \hat Q) + \mu\,\eta_H(f_P, f_Q)}{\lambda}}$,
where $\eta_H(f_P, f_Q) = \inf_{h_0 \in H}\Big(\max_{x \in \mathrm{supp}(\hat P)} |f_P(x) - h_0(x)| + \max_{x \in \mathrm{supp}(\hat Q)} |f_Q(x) - h_0(x)|\Big)$.
21 Proof. By the property of the minimizers, there exist subgradients such that $2\lambda h' = -\delta \hat R_P(h')$ and $2\lambda h = -\delta \hat R_Q(h)$. Thus,
$2\lambda \|h' - h\|^2 = \langle h' - h,\ \delta \hat R_Q(h) - \delta \hat R_P(h')\rangle = -\langle h' - h, \delta \hat R_P(h')\rangle + \langle h' - h, \delta \hat R_Q(h)\rangle \le \hat R_P(h) - \hat R_P(h') + \hat R_Q(h') - \hat R_Q(h) \le 2\,\mathrm{disc}(\hat P, \hat Q) + 2\mu\, \eta_H(f_P, f_Q)$.
22 Guarantees for Reg. ERM (Cortes & MM (TCS 2013)). Theorem: let $K$ be a PDS kernel with $K(x, x) \le R^2$ and let the $L_2$ loss be bounded by $M$. Then, for all $(x, y) \in X \times Y$,
$|L(h'(x), y) - L(h(x), y)| \le \frac{R\sqrt{M}}{\lambda}\big(\delta + \mathrm{disc}(\hat P, \hat Q)\big)$,
where $\delta = \min_{h \in H} \big\|\mathbb{E}_{x \sim \hat Q}[(h(x) - f_Q(x))\,\Phi_K(x)] - \mathbb{E}_{x \sim \hat P}[(h(x) - f_P(x))\,\Phi_K(x)]\big\|_K$.
For $f_P = f_Q = f$: $\delta = 0$ if $f \in H$; $\delta$ is small if $f$ is $\epsilon$-close to $H$ on the samples, which holds for a Gaussian kernel and continuous $f$.
23 Empirical Discrepancy. The discrepancy measure $\mathrm{disc}(\hat P, \hat Q)$ is the critical term in these bounds. A smaller empirical discrepancy guarantees closeness of the pointwise losses of $h'$ and $h$. But can we further reduce the discrepancy?
24 Algorithm - Idea. Search for a new empirical distribution $q^*$ with the same support:
$q^* = \mathrm{argmin}_{\mathrm{supp}(q) \subseteq \mathrm{supp}(\hat Q)}\ \mathrm{disc}(\hat P, q)$.
Solve the modified optimization problem:
$\min_h F_{q^*}(h) = \lambda \|h\|_K^2 + \sum_{i=1}^m q^*(x_i)\, L(h(x_i), y_i)$.
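A minimal sketch of the second stage, assuming the weights $q^*$ have already been computed (e.g., by the LP or SDP described below) and the squared loss is used, so the reweighted objective has a closed-form solution:

```python
# Minimal sketch: the modified objective
#   min_h lam*||h||_K^2 + sum_i q_star[i] * (h(x_i) - y_i)^2,
# solved in closed form for kernel ridge regression.
import numpy as np

def weighted_krr_fit(K, y, q_star, lam=0.1):
    """K: kernel matrix on the source points; q_star: nonnegative
    weights summing to 1 (the discrepancy-minimizing distribution)."""
    m = len(y)
    W = np.diag(q_star)
    # Stationarity of lam*a^T K a + (Ka - y)^T W (Ka - y) in a gives
    # (lam*I + W K) a = W y, assuming K is invertible (strictly PD).
    return np.linalg.solve(lam * np.eye(m) + W @ K, W @ y)
```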
25 Case of Halfspaces. [Figure: illustration of discrepancy minimization in the case of halfspaces.]
26 Min-Max Problem. Reformulation:
$q^* = \mathrm{argmin}_{\hat q \in \mathcal Q}\ \max_{h, h' \in H} \big|L_{\hat P}(h', h) - L_{\hat q}(h', h)\big|$.
Game-theoretic interpretation; weak duality gives a lower bound:
$\max_{h, h' \in H}\ \min_{\hat q \in \mathcal Q} \big|L_{\hat P}(h', h) - L_{\hat q}(h', h)\big| \le \min_{\hat q \in \mathcal Q}\ \max_{h, h' \in H} \big|L_{\hat P}(h', h) - L_{\hat q}(h', h)\big|$.
27 Classification - 0/1 Loss. Problem:
$\min_{q}\ \max_{a \in H\Delta H} \big|q(a) - \hat P(a)\big|$ subject to $\forall x \in S_Q,\ q(x) \ge 0$ and $\sum_{x \in S_Q} q(x) = 1$.
28 Classification - 0/1 Loss. Linear program (LP):
$\min_{q, \delta}\ \delta$ subject to: $\forall a \in H\Delta H,\ q(a) - \hat P(a) \le \delta$ and $\hat P(a) - q(a) \le \delta$; $\forall x \in S_Q,\ q(x) \ge 0$; $\sum_{x \in S_Q} q(x) = 1$.
The number of constraints is bounded by the shattering coefficient $\Pi_{H\Delta H}(m_0 + n_0)$.
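A minimal sketch of this LP in one dimension, where hypotheses are thresholds and the regions $a \in H\Delta H$ are intervals between sample points; the use of scipy.optimize.linprog and the interval enumeration are illustrative assumptions:

```python
# Minimal sketch: the discrepancy-minimization LP for the 0/1 loss in 1D.
# Hypotheses are thresholds, so regions a in (H symmetric-diff H) are intervals.
import itertools
import numpy as np
from scipy.optimize import linprog

def dm_weights_1d(S_p, S_q):
    cuts = np.concatenate([[-np.inf], np.sort(np.concatenate([S_p, S_q])), [np.inf]])
    m = len(S_q)
    rows, rhs = [], []
    for lo, hi in itertools.combinations(cuts, 2):   # each interval a = [lo, hi)
        p_a = np.mean((S_p >= lo) & (S_p < hi))              # P_hat(a)
        ind = ((S_q >= lo) & (S_q < hi)).astype(float)       # source points in a
        rows += [np.append(ind, -1.0), np.append(-ind, -1.0)]
        rhs += [p_a, -p_a]                                   # |q(a) - P_hat(a)| <= delta
    # Variables: (q_1, ..., q_m, delta); minimize delta.
    c = np.append(np.zeros(m), 1.0)
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)         # sum_x q(x) = 1
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  A_eq=A_eq, b_eq=[1.0], bounds=[(0, None)] * (m + 1))
    return res.x[:m], res.x[m]   # weights q and achieved discrepancy delta
```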
29 Algorithm - 1D. [Figure: illustration of the discrepancy minimization algorithm in one dimension.]
30 Regression - L2 Loss. Problem:
$\min_{\hat q \in \mathcal Q}\ \max_{h, h' \in H} \big|\mathbb{E}_{\hat P}[(h'(x) - h(x))^2] - \mathbb{E}_{\hat q}[(h'(x) - h(x))^2]\big|$.
For linear hypotheses $\{x \mapsto w^\top x : \|w\| \le 1\}$:
$\min_{\hat q \in \mathcal Q}\ \max_{\|w\| \le 1,\, \|w'\| \le 1} \big|\mathbb{E}_{\hat P}[((w' - w)^\top x)^2] - \mathbb{E}_{\hat q}[((w' - w)^\top x)^2]\big|$
$= \min_{\hat q \in \mathcal Q}\ \max_{\|w\| \le 1,\, \|w'\| \le 1} \Big|\sum_{x \in S} (\hat P(x) - \hat q(x))\,[(w' - w)^\top x]^2\Big|$
$= \min_{\hat q \in \mathcal Q}\ \max_{\|u\| \le 2} \Big|\sum_{x \in S} (\hat P(x) - \hat q(x))\,[u^\top x]^2\Big|$
$= \min_{\hat q \in \mathcal Q}\ \max_{\|u\| \le 2} \Big|u^\top \Big(\sum_{x \in S} (\hat P(x) - \hat q(x))\, x x^\top\Big) u\Big|$.
31 Regression - L2 Loss. The problem is equivalent to
$\min_{z \ge 0,\, \|z\|_1 = 1}\ \max_{\|u\| = 1} \big|u^\top M(z)\, u\big|$,
with $M(z) = M_0 - \sum_{i=1}^{m_0} z_i M_i$, where $M_0 = \sum_{x \in S} \hat P(x)\, x x^\top$ and $M_i = s_i s_i^\top$ for the elements $s_i$ of $\mathrm{supp}(\hat Q)$.
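In other words, for a fixed weighting $z$, the empirical discrepancy is (up to the constant contributed by $\|u\| \le 2$) the spectral norm of $M(z)$, which a few lines of numpy can evaluate (illustrative sketch, assuming a uniform $\hat P$):

```python
# Minimal sketch: for the L2 loss with linear hypotheses, disc(P_hat, q_z)
# is, up to the constant from ||u|| <= 2, the largest absolute eigenvalue
# of M(z) = M0 - sum_i z_i s_i s_i^T.
import numpy as np

def disc_l2(z, X_p, X_q):
    M0 = X_p.T @ X_p / len(X_p)                   # sum_x P_hat(x) x x^T
    Mz = M0 - (X_q * z[:, None]).T @ X_q          # minus sum_i z_i s_i s_i^T
    return np.abs(np.linalg.eigvalsh(Mz)).max()   # spectral norm of M(z)
```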
32 Regression - L2 Loss. Semi-definite program (SDP), linear hypotheses:
$\min_{z, \delta}\ \delta$ subject to $\delta I - M(z) \succeq 0$, $\delta I + M(z) \succeq 0$, $\mathbf{1}^\top z = 1$, $z \ge 0$,
where the matrix $M(z)$ is defined by $M(z) = \sum_{x \in S} \hat P(x)\, x x^\top - \sum_{i=1}^{m_0} z_i\, s_i s_i^\top$.
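A minimal sketch of this SDP using cvxpy as a generic modeling layer (an assumption on my part; the talk's experiments used SeDuMi):

```python
# Minimal sketch: solve min_{z, delta} delta subject to
# delta*I - M(z) >= 0, delta*I + M(z) >= 0, sum(z) = 1, z >= 0.
import numpy as np
import cvxpy as cp

def dm_weights_l2(X_p, X_q):
    d, m0 = X_p.shape[1], X_q.shape[0]
    M0 = X_p.T @ X_p / len(X_p)                  # sum_x P_hat(x) x x^T (uniform P_hat)
    z = cp.Variable(m0, nonneg=True)
    delta = cp.Variable()
    Mz = M0 - sum(z[i] * np.outer(X_q[i], X_q[i]) for i in range(m0))
    Mz = (Mz + Mz.T) / 2                         # enforce exact symmetry for PSD constraints
    constraints = [delta * np.eye(d) - Mz >> 0,  # delta*I - M(z) PSD
                   delta * np.eye(d) + Mz >> 0,  # delta*I + M(z) PSD
                   cp.sum(z) == 1]
    cp.Problem(cp.Minimize(delta), constraints).solve()
    return z.value, delta.value
```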
33 Regression - L2 Loss. SDP: generalization to the RKHS of a kernel $K$:
$\min_{z, \delta}\ \delta$ subject to $\delta I - M(z) \succeq 0$, $\delta I + M(z) \succeq 0$, $\mathbf{1}^\top z = 1$, $z \ge 0$,
with $M(z) = M_0 - \sum_{i=1}^{m_0} z_i M_i$, $M_0 = K^{1/2}\,\mathrm{diag}(\hat P(s_1), \ldots, \hat P(s_{p_0}))\,K^{1/2}$, and $M_i = K^{1/2} I_i K^{1/2}$, where $I_i$ is the diagonal matrix with a single one at position $i$.
34 Discrepancy Min. Algorithm (Cortes & MM (TCS 2013)). Convex optimization: cast as a semi-definite programming (SDP) problem; efficient solution using smooth optimization. Algorithm and solution for arbitrary kernels. Outperforms other algorithms in experiments.
35 Experiments. Classification: Q and P Gaussians; H: halfspaces; f: interval [-1, +1]. [Figure: error as a function of the number of training points, with minimized discrepancy vs. original discrepancy.]
36 Experiments. Regression. [Figure: error as a function of the number of training points for distributions Q and P.] The SDP is solved in about 15 s using SeDuMi on a 3 GHz CPU with 2 GB of memory.
37 Experiments. [Figure: further experimental results.]
38 Enhancement. Shortcomings: the discrepancy depends on a maximizing pair of hypotheses; the DM algorithm can be too conservative. Ideas (Cortes, MM, and Muñoz (2014)): a finer, hypothesis-dependent quantity, the generalized discrepancy; reweighting depending on the hypothesis.
39 Choose $Q_h$ (Cortes, MM, and Muñoz (2014)). Ideally: choose $Q_h$ such that the objectives are uniformly close:
$\lambda \|h\|_K^2 + L_{Q_h}(h, f_Q) \approx \lambda \|h\|_K^2 + L_{\hat P}(h, f_P)$,
that is, $Q_h = \mathrm{argmin}_q \big|L_q(h, f_Q) - L_{\hat P}(h, f_P)\big|$. Using a convex surrogate $H''$:
$Q_h = \mathrm{argmin}_q\ \max_{h'' \in H''} \big|L_q(h, f_Q) - L_{\hat P}(h, h'')\big|$.
40 Optimization (Cortes, MM, and Muñoz (2014)).
$L_{Q_h}(h, f_Q) = \mathrm{argmin}_{l \in \{L_q(h, f_Q)\,:\, q \in F(S_X, \mathbb{R})\}}\ \max_{h'' \in H''} \big|l - L_{\hat P}(h, h'')\big| = \mathrm{argmin}_{l \in \mathbb{R}}\ \max_{h'' \in H''} \big|l - L_{\hat P}(h, h'')\big| = \frac12\Big(\max_{h'' \in H''} L_{\hat P}(h, h'') + \min_{h'' \in H''} L_{\hat P}(h, h'')\Big)$.
Convex optimization problem (loss jointly convex):
$\min_h\ \lambda \|h\|_K^2 + \frac12\Big(\max_{h'' \in H''} L_{\hat P}(h, h'') + \min_{h'' \in H''} L_{\hat P}(h, h'')\Big)$.
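The last equality holds because the scalar minimizing the maximum absolute deviation from a set of values is the midpoint of their range, as a two-line illustration with made-up losses shows:

```python
# The scalar l minimizing max_i |l - a_i| is the midpoint of the range of a.
a = [0.3, 1.1, 0.7, 2.0]           # illustrative losses L_{P_hat}(h, h'')
l_star = 0.5 * (max(a) + min(a))   # = argmin_l max_i |l - a_i|
```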
41 Convex Surrogate Hyp. Set (Cortes, MM, and Muñoz (2014)). Choice of $H''$ among balls $B(r) = \{h'' \in H : L_{\hat q}(h'', f_Q) \le r\}$. The generalization bound is proven to be more favorable than that of DM for some choices of the radius $r$. The radius $r$ is chosen via cross-validation using a small amount of labeled data from the target. Further improvement of empirical results.
42 Conclusion. Theory of adaptation based on the discrepancy: key term in the analysis of adaptation and drifting; discrepancy minimization algorithm (DM); compares favorably to other adaptation algorithms in experiments. Generalized discrepancy: extension to hypothesis-dependent reweighting; convex optimization problem; further empirical improvements.
43 Outline. Domain adaptation. Multiple-source domain adaptation.
44 Problem Formulation. Given distributions $D_1, \ldots, D_k$ and corresponding hypotheses $h_1, \ldots, h_k$ such that each hypothesis performs well in its own domain:
$L(D_i, h_i, f) = \mathbb{E}_{x \sim D_i}[L(h_i(x), f(x))] \le \epsilon$.
Notation: $f$ is the unknown target function. The loss $L$ is assumed non-negative, bounded, convex, and continuous.
45 Problem Formulation. The unknown target distribution is a mixture of the input distributions: $D_T(x) = \sum_{i=1}^k \lambda_i D_i(x)$. Task: choose a hypothesis that performs well on the target distribution. Convex combination rule:
$h_z(x) = \sum_{i=1}^k z_i\, h_i(x)$.
Distribution weighted combination:
$h_z(x) = \sum_{i=1}^k \frac{z_i D_i(x)}{\sum_{j=1}^k z_j D_j(x)}\, h_i(x)$.
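A minimal sketch of the two combination rules (my illustration; the densities $D_i$ and base predictors $h_i$ are assumed given as callables, and the $\eta$ smoothing anticipates the refinement on slide 53). Setting $z$ uniform in the distribution weighted rule recovers the simple algorithm analyzed later in the talk:

```python
# Minimal sketch: convex combination vs. distribution weighted combination
# of k base hypotheses h_i with source densities D_i (all given as callables).
import numpy as np

def convex_combination(x, z, hs):
    return sum(z_i * h(x) for z_i, h in zip(z, hs))

def distribution_weighted(x, z, hs, Ds, eta=0.0):
    k = len(hs)
    dens = np.array([D(x) for D in Ds])
    w = np.asarray(z) * dens + eta / k       # eta > 0 gives the smoothed variant
    w = w / (w.sum() if w.sum() > 0 else 1.0)
    return sum(w_i * h(x) for w_i, h in zip(w, hs))
```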
46 Known Target Distribution. For some distributions, any convex combination performs poorly. [Figure: two domains $D_0$ and $D_1$ concentrated on points a and b, with distribution weights for target $D_T$, base hypotheses $h_0$ and $h_1$, and target $f$.] The base hypotheses have no error within their own domains, yet any convex combination has error 1/2!
47 Main Results. Thus, although convex combinations seem natural, they can perform very poorly. We will show that distribution weighted combinations define the right combination rule: there exists a single robust distribution weighted hypothesis that does well for any target mixture,
$\forall f,\ \exists z:\ \forall \lambda,\ L(D_\lambda, h_z, f) \le \epsilon$.
48 Known Target Distribution. If the mixture is known, the distribution weighted rule always does well: choose $z = \lambda$,
$h_\lambda(x) = \sum_{i=1}^k \frac{\lambda_i D_i(x)}{\sum_{j=1}^k \lambda_j D_j(x)}\, h_i(x)$.
Proof:
$L(D_T, h_\lambda, f) = \sum_{x \in X} L(h_\lambda(x), f(x))\, D_T(x) \le \sum_{x \in X} \sum_{i=1}^k \frac{\lambda_i D_i(x)}{D_T(x)}\, L(h_i(x), f(x))\, D_T(x)$ (by convexity of the loss) $= \sum_{i=1}^k \lambda_i L(D_i, h_i, f) \le \epsilon \sum_{i=1}^k \lambda_i = \epsilon$.
49 Unknown Target Mixture. Zero-sum game: NATURE selects a target distribution $D_i$; LEARNER selects a $z$, i.e., a distribution weighted hypothesis $h_z$. Payoff: $L(D_i, h_z, f)$. Already shown: the game value is at most $\epsilon$. Minimax theorem (modulo discretization of $z$): there exists a mixture $\sum_j \mu_j h_{z_j}$ of distribution weighted hypotheses that does well for any mixture of the $D_i$.
50 Balancing Losses. Brouwer's Fixed Point theorem: for any compact, convex, non-empty set $A$ and any continuous function $f\colon A \to A$, there exists $x \in A$ such that $f(x) = x$.
Notation: $L_i^z := L(D_i, h_z, f)$. Define the mapping $\phi$ by
$[\phi(z)]_i = \frac{z_i L_i^z}{\sum_j z_j L_j^z}$.
By the fixed point theorem (modulo continuity), there exists $z$ such that for all $i$, $z_i = \frac{z_i L_i^z}{\sum_j z_j L_j^z}$, hence for all $i$, $L_i^z = \sum_j z_j L_j^z =: \gamma$.
51 Bounding Loss. For the fixed point $z$:
$L(D_z, h_z, f) = \sum_{x \in X} L(h_z(x), f(x))\, D_z(x) = \sum_{i=1}^k z_i\, L(D_i, h_z, f) = \sum_{i=1}^k z_i L_i^z = \gamma \sum_{i=1}^k z_i = \gamma$.
Also, by convexity,
$L(D_z, h_z, f) \le \sum_{x \in X} \sum_{i=1}^k \frac{z_i D_i(x)}{D_z(x)}\, L(h_i(x), f(x))\, D_z(x) = \sum_{i=1}^k z_i\, L(D_i, h_i, f) \le \epsilon$.
52 Bounding Loss. Thus $\gamma \le \epsilon$, and for any mixture $D_\lambda = \sum_i \lambda_i D_i$:
$L(D_\lambda, h_z, f) = \sum_{i=1}^k \lambda_i\, L(D_i, h_z, f) = \gamma \sum_{i=1}^k \lambda_i = \gamma \le \epsilon$.
[Figure: a single hypothesis $h_z$ performing well against all domains $D_1, \ldots, D_k$ and target $f$.]
53 Details. To deal with non-continuity, refine the hypotheses:
$h_z^\eta(x) = \sum_{i=1}^k \frac{z_i D_i(x) + \eta/k}{\sum_{j=1}^k z_j D_j(x) + \eta}\, h_i(x)$.
Theorem: for any target function $f$ and any $\delta > 0$,
$\exists \eta > 0,\ z:\ \forall \lambda,\ L(D_\lambda, h_z^\eta, f) \le \epsilon + \delta$.
If the loss obeys the triangle inequality, the guarantee holds uniformly for all admissible target functions:
$\forall \delta > 0,\ \exists z,\ \eta > 0:\ \forall \lambda,\ \forall f \in F,\ L(D_\lambda, h_z^\eta, f) \le 3\epsilon + \delta$.
54 A Simple Algorithm. A simple constructive choice: take $z$ uniform, giving $h_u$. Then
$L(D_\lambda, h_u, f) = \sum_x D_\lambda(x)\, L\Big(\sum_{i=1}^k \frac{D_i(x)}{\sum_{j=1}^k D_j(x)}\, h_i(x),\, f(x)\Big) \le \sum_x D_\lambda(x) \sum_{i=1}^k \frac{D_i(x)}{\sum_{j=1}^k D_j(x)}\, L(h_i(x), f(x)) \le \sum_x \sum_{i=1}^k D_i(x)\, L(h_i(x), f(x)) = \sum_{i=1}^k L(D_i, h_i, f) \le \sum_{i=1}^k \epsilon = k\epsilon$,
using convexity of the loss and $D_\lambda(x) \le \sum_j D_j(x)$.
55 Preliminary Empirical Results. Sentiment analysis: given a product review (a text string), predict a rating (between 1.0 and 5.0). Four domains: books, DVDs, electronics, and kitchen appliances. Base hypotheses are trained within each domain (support vector regression). We are not given the distributions, so we model each one using a bag-of-words model. We then test the distribution weighted combination rule on known target mixture domains.
56 Empirical Results. [Figure: MSE of the uniform mixture over the four domains, comparing in-domain and out-of-domain performance.]
57 Empirical Results. [Figure: MSE as a function of the mixture parameter α for the target mixture α·book + (1−α)·kitchen, comparing the distribution weighted and linear combination rules against the book-only and kitchen-only base hypotheses.]
58 Conclusion. Formulation of the multiple-source adaptation problem. Theoretical analysis for mixture distributions. Open questions: an efficient algorithm for finding the distribution weighted combination hypothesis? Beyond mixture distributions?
59 Rényi Divergences. Definition: for $\alpha \ge 0$,
$D_\alpha(P \| Q) = \frac{1}{\alpha - 1} \log \sum_x P(x)\left[\frac{P(x)}{Q(x)}\right]^{\alpha - 1}$.
$\alpha = 1$: coincides with the relative entropy.
$\alpha = 2$: logarithm of the expected probability ratio,
$D_2(P \| Q) = \log \mathbb{E}_{x \sim P}\left[\frac{P(x)}{Q(x)}\right]$.
$\alpha = +\infty$: logarithm of the maximum probability ratio,
$D_\infty(P \| Q) = \log \sup_x \frac{P(x)}{Q(x)}$.
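A minimal sketch computing $D_\alpha(P\|Q)$ for finite distributions with full support (my illustration):

```python
# Minimal sketch: Renyi divergence D_alpha(P || Q) for finite distributions.
# Assumes full support: q > 0 everywhere, and p > 0 for the alpha = 1 branch.
import numpy as np

def renyi_divergence(p, q, alpha):
    p, q = np.asarray(p, float), np.asarray(q, float)
    if alpha == 1.0:                     # limit case: relative entropy
        return float(np.sum(p * np.log(p / q)))
    if np.isinf(alpha):                  # limit case: log of max probability ratio
        return float(np.log(np.max(p / q)))
    return float(np.log(np.sum(p * (p / q) ** (alpha - 1))) / (alpha - 1))
```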
60 Extensions - Arbitrary Target (Mansour, MM, and Rostamizadeh, 2009). Theorem: for any $\delta > 0$ and $\alpha > 1$, there exist $\eta$ and $z$ such that for every target distribution $P$:
$L(P, h_z^\eta, f) \le \big[d_\alpha(P \| Q)\,(\epsilon + \delta)\big]^{\frac{\alpha - 1}{\alpha}}\, M^{\frac{1}{\alpha}}$,
where $Q = \sum_{i=1}^k \lambda_i D_i$, and closeness is measured in terms of the exponentiated Rényi divergence
$d_\alpha(P \| Q) = \left[\sum_x \frac{P^\alpha(x)}{Q^{\alpha - 1}(x)}\right]^{\frac{1}{\alpha - 1}}$.
61 Proof. By Hölder's inequality, for any hypothesis $h$,
$L(P, h, f) = \sum_x P(x)\, L(h(x), f(x)) = \sum_x \frac{P(x)}{Q^{\frac{\alpha - 1}{\alpha}}(x)}\, Q^{\frac{\alpha - 1}{\alpha}}(x)\, L(h(x), f(x))$
$\le \left[\sum_x \frac{P^\alpha(x)}{Q^{\alpha - 1}(x)}\right]^{\frac{1}{\alpha}} \left[\sum_x Q(x)\, L^{\frac{\alpha}{\alpha - 1}}(h(x), f(x))\right]^{\frac{\alpha - 1}{\alpha}}$
$= \big(d_\alpha(P \| Q)\big)^{\frac{\alpha - 1}{\alpha}} \Big[\mathbb{E}_{x \sim Q}\big[L^{\frac{\alpha}{\alpha - 1}}(h(x), f(x))\big]\Big]^{\frac{\alpha - 1}{\alpha}}$
$= \big(d_\alpha(P \| Q)\big)^{\frac{\alpha - 1}{\alpha}} \Big[\mathbb{E}_{x \sim Q}\big[L(h(x), f(x))\, L^{\frac{1}{\alpha - 1}}(h(x), f(x))\big]\Big]^{\frac{\alpha - 1}{\alpha}}$
$\le \big(d_\alpha(P \| Q)\big)^{\frac{\alpha - 1}{\alpha}} \Big[L(Q, h, f)\, M^{\frac{1}{\alpha - 1}}\Big]^{\frac{\alpha - 1}{\alpha}}$.
62 Other Extensions (Mansour, MM, and Rostamizadeh, 2009). Approximate (estimated) distributions: similar results, depending on the divergence between the true and estimated distributions. Different source target functions $f_i$: similar results when the target functions are close to $f$ on the target distribution.
63 References
S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. Analysis of representations for domain adaptation. In NIPS, 2006.
S. Ben-David, T. Lu, T. Luu, and D. Pal. Impossibility theorems for domain adaptation. Journal of Machine Learning Research, Proceedings Track, 9, 2010.
S. Bickel, M. Brückner, and T. Scheffer. Discriminative learning under covariate shift. Journal of Machine Learning Research, 10, 2009.
J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. Wortman. Learning bounds for domain adaptation. In NIPS, 2007.
Corinna Cortes, Yishay Mansour, and Mehryar Mohri. Learning bounds for importance weighting. In NIPS, 2010.
Corinna Cortes and Mehryar Mohri. Domain adaptation and sample bias correction theory and algorithm for regression. Theoretical Computer Science, 519, 2014.
64 References
Corinna Cortes, Mehryar Mohri, Michael Riley, and Afshin Rostamizadeh. Sample selection bias correction theory. In Proceedings of ALT, 2008.
Koby Crammer, Michael Kearns, and Jennifer Wortman. Learning from data of variable quality. In Proceedings of NIPS, 2005.
Koby Crammer, Michael Kearns, and Jennifer Wortman. Learning from multiple sources. In Proceedings of NIPS, 2006.
L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1996.
M. Dredze, J. Blitzer, P. P. Talukdar, K. Ganchev, J. Graça, and F. Pereira. Frustratingly hard domain adaptation for parsing. In CoNLL, 2007.
J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Schölkopf. Correcting sample selection bias by unlabeled data. In NIPS, volume 19, 2006.
65 References
D. Kifer, S. Ben-David, and J. Gehrke. Detecting change in data streams. In Proceedings of VLDB, 2004.
Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Domain adaptation: learning bounds and algorithms. In COLT, 2009.
Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Domain adaptation with multiple sources. In Advances in NIPS, 2008.
Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Multiple source adaptation and the Rényi divergence. In Proceedings of UAI, 2009.
Mehryar Mohri and Andres Muñoz Medina. New analysis and algorithm for learning with drifting distributions. In Proceedings of ALT, volume 7568, 2012.
66 References
H. Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90(2), 2000.
M. Sugiyama, S. Nakajima, H. Kashima, P. von Bünau, and M. Kawanabe. Direct importance estimation with model selection and its application to covariate shift adaptation. In NIPS, 2007.
M. Sugiyama, T. Suzuki, S. Nakajima, H. Kashima, P. von Bünau, and M. Kawanabe. Direct importance estimation for covariate shift adaptation. Annals of the Institute of Statistical Mathematics, 60, 2008.