Bayesian simultaneous regression and dimension reduction
Bayesian simultaneous regression and dimension reduction
MCMski II
Department of Statistical Science, Institute for Genome Sciences & Policy, Department of Computer Science, Duke University
January 10, 2008
Table of contents
1 Statistical principles
2 Simulated data; Digits
3 Pathways and gene sets; Progression in prostate cancer
4
Motivation and related work
Data generated by measuring thousands of variables often lies on or near a low-dimensional manifold, or exhibits strong dependencies between variables.
Manifold learning: LLE, ISOMAP, Laplacian Eigenmaps, Hessian Eigenmaps.
Simultaneous dimension reduction and regression: SIR, MAVE, SAVE.
Generative vs. predictive modelling
Given data $\{Z_i = (X_i, Y_i)\}_{i=1}^n$ with $Z_i \stackrel{iid}{\sim} \rho(X, Y)$, where $X \in \mathcal{X} \subseteq \mathbb{R}^p$, $Y \in \mathbb{R}$, and $p \gg n$.
Two options:
1 discriminative or regression: $Y \mid X$
2 generative: $X \mid Y$ (sometimes called inverse regression)
Regression
Given $X \in \mathcal{X} \subseteq \mathbb{R}^p$, $Y \in \mathbb{R}$, $p \gg n$, and $\rho(X, Y)$, we want $Y \mid X$. A natural idea:
$$f_r = \arg\min_f \mathbb{E}\,(Y - f(X))^2,$$
and $f_r(x) = \mathbb{E}[Y \mid X = x]$ provides a summary of $Y \mid X$.
Inverse regression
Given $X \in \mathcal{X} \subseteq \mathbb{R}^p$, $Y \in \mathbb{R}$, $p \gg n$, and $\rho(X, Y)$, we want $X \mid Y$. Here $\Omega = \operatorname{cov}(X \mid Y)$ provides a summary of $X \mid Y$:
1 $\Omega_{ii}$: relevance of variable $i$ with respect to the label
2 $\Omega_{ij}$: covariation of variables $i$ and $j$ with respect to the label
Statistical principles
Model simultaneously $f_r(x)$ and $\nabla f_r = \left(\frac{\partial f_r}{\partial x_1}, \ldots, \frac{\partial f_r}{\partial x_p}\right)^T$:
1 regression: $f_r(x)$
2 inverse regression: the gradient outer product (GOP)
$$\Gamma = \mathbb{E}\left[\nabla f_r \, (\nabla f_r)^T\right], \quad \text{i.e.} \quad \Gamma_{ij} = \left\langle \frac{\partial f_r}{\partial x_i}, \frac{\partial f_r}{\partial x_j} \right\rangle.$$
Linear case
We start with the linear case:
$$y = w \cdot x + \varepsilon, \qquad \varepsilon \stackrel{iid}{\sim} \mathrm{No}(0, \sigma^2),$$
with $\Sigma_X = \operatorname{cov}(X)$ and $\sigma_Y^2 = \operatorname{var}(Y)$. Then
$$\Gamma = \sigma_Y^2 \left(1 - \frac{\sigma^2}{\sigma_Y^2}\right)^2 \Sigma_X^{-1} \,\Omega\, \Sigma_X^{-1},$$
so $\Gamma$ and $\Omega$ are equivalent modulo rotation and scale.
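In the linear case the connection between gradients and predictive directions is transparent: $\nabla f(x) = w$ everywhere, so the GOP is the rank-one matrix $w w^T$. A minimal numpy sketch of this fact (with an illustrative coefficient vector, not one from the talk):

```python
import numpy as np

# For a linear model y = w.x + eps the gradient is constant,
# grad f(x) = w, so the GOP Gamma = E[grad f grad f^T] = w w^T is
# rank one and its single non-zero eigenvector spans the predictive
# direction w.  The vector w below is an illustrative choice.
w = np.array([2.0, -1.0, 0.0, 0.0, 0.5])

Gamma = np.outer(w, w)                    # GOP for the linear case
evals, evecs = np.linalg.eigh(Gamma)

b1 = evecs[:, -1]                         # top eigenvector
cos = abs(b1 @ w) / np.linalg.norm(w)     # alignment with w
print(int(np.sum(evals > 1e-10)), round(float(cos), 6))  # -> 1 1.0
```

The single positive eigenvalue is $\|w\|^2$; all other eigenvalues vanish, which is the rank-one structure the proposition on dimension reduction later exploits.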
Nonlinear case
For smooth $f(x)$ with
$$y = f(x) + \varepsilon, \qquad \varepsilon \stackrel{iid}{\sim} \mathrm{No}(0, \sigma^2),$$
the meaning of $\Omega = \operatorname{cov}(X \mid Y)$ is not so clear. Partition $\mathcal{X} = \bigcup_{i=1}^I \chi_i$ into sections and compute local quantities:
$\Omega_i = \operatorname{cov}(X_{\chi_i} \mid Y_{\chi_i})$
$\Sigma_i = \operatorname{cov}(X_{\chi_i})$
$\sigma_i^2 = \operatorname{var}(Y_{\chi_i})$
$m_i = \rho_X(\chi_i)$.
Then
$$\Gamma \approx \sum_{i=1}^I m_i \, \sigma_i^2 \, \Sigma_i^{-1} \Omega_i \Sigma_i^{-1}.$$
Gradient estimate for regression
Taylor expanding $f$ around the data suggests
$$\left(f(x_j) - f(x_i) - \nabla f(x_i) \cdot (x_j - x_i)\right)^2 \approx 0 \quad \text{for } x_i \approx x_j,$$
which motivates the empirical loss
$$L(f, \nabla f, \text{data}) = \sum_{ij} w_{ij} \left(y_j - f(x_i) - \nabla f(x_i) \cdot (x_j - x_i)\right)^2.$$
A similar idea applies to classification, with a link function.
Gradient estimate
Optimization problem:
$$(f_D, \nabla f_D) = \arg\min_{(f, \nabla f) \in \mathcal{H}_K^{p+1}} \left\{ L(f, \nabla f, \text{data}) + \lambda_1 \|f\|_K^2 + \lambda_2 \|\nabla f\|_K^2 \right\}$$
$\nabla f$ is the vector of gradient functions
$\lambda_1, \lambda_2$ are regularization parameters
$L(\cdot)$ is the empirical error under a convex loss function.
Representer form:
$$f_D(x) = \sum_{i=1}^n a_{i,D} K(x_i, x), \qquad \nabla f_D(x) = \sum_{i=1}^n c_{i,D} K(x_i, x),$$
with $a_D = (a_{1,D}, \ldots, a_{n,D})^T \in \mathbb{R}^n$ and $c_D = (c_{1,D}, \ldots, c_{n,D})^T \in \mathbb{R}^{np}$.
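Once the representer form is substituted, the squared-loss problem becomes a finite linear system in the coefficients. Below is a minimal numpy sketch under simplifying assumptions: an RBF kernel, $f(x_i)$ replaced by the observed $y_i$ so that only the gradient coefficients $C$ in $\nabla f_D(x) = \sum_i c_i K(x_i, x)$ are learned, and illustrative bandwidths and penalty (none of these values are from the talk):

```python
import numpy as np

# Minimal sketch of the gradient-learning step.  Assumptions: RBF kernel,
# squared loss, f(x_i) approximated by y_i, illustrative hyperparameters.
rng = np.random.default_rng(1)
n, p = 40, 4
w_true = np.array([1.0, 2.0, 0.0, 0.0])
X = rng.normal(size=(n, p))
y = X @ w_true                                 # noiseless linear response

def rbf(A, B, s2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s2))

K = rbf(X, X, 4.0)                             # kernel matrix on the data
W = rbf(X, X, 1.0)                             # locality weights w_ij

# Quadratic objective in c = vec(C):
#   sum_ij w_ij (y_j - y_i - (x_j - x_i)^T C k_i)^2 + lam c^T (K kron I_p) c
lam = 1e-4
A = lam * np.kron(K, np.eye(p))
b = np.zeros(n * p)
for i in range(n):
    for j in range(n):
        v = np.kron(K[:, i], X[j] - X[i])      # encodes (x_j - x_i)^T C k_i
        A += W[i, j] * np.outer(v, v)
        b += W[i, j] * (y[j] - y[i]) * v

C = np.linalg.solve(A, b).reshape(n, p).T      # p x n coefficient matrix
grads = C @ K                                  # column i = grad f_D(x_i)

# On linear data the estimated gradients line up with the true direction.
Gamma_hat = grads @ grads.T / n
b1 = np.linalg.eigh(Gamma_hat)[1][:, -1]
cos = abs(b1 @ w_true) / np.linalg.norm(w_true)
print(round(float(cos), 2))
```

The $O(n^2)$ pair loop and the $np \times np$ solve show why the method targets the $p \gg n$ regime: the system size scales with $np$, not $p^2$.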
Gradient outer product (GOP)
A central quantity in this talk will be the GOP.
Definition (GOP)
$$\hat\Gamma = \nabla f_D \, (\nabla f_D)^T = c_D^T K c_D \approx \mathbb{E}\left[\nabla f \, (\nabla f)^T\right].$$
Dimension reduction
Proposition. The eigenvectors corresponding to the $d$ non-zero eigenvalues of $\Gamma$ span the subspace relevant to prediction.
Gradients provide information on the predictive directions $b_i$, $i = 1, \ldots, d$.
Gaussian Markov distributions over graphs
Given a multivariate normal distribution with covariance matrix $\Sigma$, the matrix $P = \Sigma^{-1}$ is the conditional independence matrix: $P_{ij}$ captures the dependence of $i$ on $j$ given all other variables.
Note that by construction $\hat\Gamma$ is a covariance matrix of a Gaussian process, so $J = \hat\Gamma^{-1}$ is the inferred conditional independence matrix.
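The reading of zeros in the precision matrix as conditional independencies can be checked on a toy Gaussian. A small sketch, assuming a hypothetical 3-variable chain $X_1 \to X_2 \to X_3$ (so $X_1 \perp X_3 \mid X_2$):

```python
import numpy as np

# Sketch: conditional independence read off the precision matrix
# P = Sigma^{-1} for a toy Gaussian chain X1 -> X2 -> X3.
rng = np.random.default_rng(2)
n = 200_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
x3 = 0.5 * x2 + rng.normal(size=n)

Sigma = np.cov(np.stack([x1, x2, x3]))
P = np.linalg.inv(Sigma)

# Partial correlation of i and j given the rest: -P_ij / sqrt(P_ii P_jj).
# The (1,3) entry should be near zero; the (1,2) entry should not.
pc13 = -P[0, 2] / np.sqrt(P[0, 0] * P[2, 2])
pc12 = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])
print(round(float(pc13), 3), round(float(pc12), 3))
```

The same computation applied to $\hat\Gamma$ in place of $\Sigma$ is what turns the GOP into an inferred dependence graph over variables.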
Restriction to a manifold
Assume the data is concentrated on a manifold $\mathcal{M} \subset \mathbb{R}^p$ with $\dim \mathcal{M} = d$ and that there exists an isometric embedding $\varphi : \mathcal{M} \to \mathbb{R}^p$.
Theorem. Under mild regularity conditions on the distribution and corresponding density, with probability $1 - \delta$,
$$\left\| (d\varphi)^* \nabla f_D - \nabla_{\mathcal{M}} f_\rho \right\|_{\rho_X} \le C \log\left(\frac{1}{\delta}\right) n^{-1/d},$$
where $(d\varphi)^*$ is the dual of the map $d\varphi$.
Bayesian kernel model for regression
$$y_i = f(x_i) + \varepsilon, \qquad \varepsilon \stackrel{iid}{\sim} \mathrm{No}(0, \sigma^2),$$
$$f(x) = \int_{\mathcal{X}} K(x, u) \, Z(du),$$
where $Z(du) \in \mathcal{M}(\mathcal{X})$ is a signed measure on $\mathcal{X}$. A prior $\pi(Z)$ implies a posterior on $f$:
$$\pi(Z \mid \text{data}) \propto L(\text{data} \mid Z) \, \pi(Z).$$
Priors and integral operators
Integral operator $L_K : \Gamma \to \mathcal{G}$,
$$\mathcal{G} = \left\{ f \,\middle|\, f(x) := L_K[\gamma](x) = \int_{\mathcal{X}} K(x, u) \, d\gamma(u), \; \gamma \in \Gamma \right\},$$
with $\Gamma \subseteq \mathcal{B}(\mathcal{X})$. A prior on $\Gamma$ implies a prior on $\mathcal{G}$.
Equivalence with RKHS
For what $\Gamma$ is $\mathcal{H}_K = \overline{\operatorname{span}}(\mathcal{G})$? What is $L_K^{-1}(\mathcal{H}_K)$? This is hard to characterize. An appropriate choice for $\Gamma$ is the union of integrable functions and discrete measures.
Signed measures are (almost) just right
Nonsingular measures: $\mathcal{M} = L^1(\mathcal{X}) \cup \mathcal{M}_D$.
Proposition. $L_K(\mathcal{M})$ is dense in $\mathcal{H}_K$ with respect to the RKHS norm.
Proposition. $\mathcal{B}(\mathcal{X}) \subset L_K^{-1}(\mathcal{H}_K(\mathcal{X}))$.
The implication
Take-home message: we need priors on signed measures. This gives a function-theoretic foundation for random signed measures such as Gaussian, Dirichlet and Lévy process priors.
Bayesian kernel model
$$y_i = f(x_i) + \varepsilon, \qquad \varepsilon \stackrel{iid}{\sim} \mathrm{No}(0, \sigma^2),$$
$$f(x) = \int_{\mathcal{X}} K(x, u) \, Z(du),$$
where $Z(du) \in \mathcal{M}(\mathcal{X})$ is a signed measure on $\mathcal{X}$. This implies a posterior on $f$:
$$\pi(Z \mid \text{data}) \propto L(\text{data} \mid Z) \, \pi(Z).$$
Dirichlet process prior
$$f(x) = \int_{\mathcal{X}} K(x, u) \, Z(du) = \int_{\mathcal{X}} K(x, u) \, w(u) \, F(du),$$
where $F(du)$ is a distribution and $w(u)$ a coefficient function. Model $F$ using a Dirichlet process prior: $DP(\alpha, F_0)$.
Bayesian representer form
Given $X_n = (x_1, \ldots, x_n) \stackrel{iid}{\sim} F$,
$$F \mid X_n \sim DP(\alpha + n, F_n), \qquad F_n = \frac{\alpha F_0 + \sum_{i=1}^n \delta_{x_i}}{\alpha + n}.$$
$$\mathbb{E}[f \mid X_n] = a_n \int K(x, u) \, w(u) \, dF_0(u) + (1 - a_n) \, n^{-1} \sum_{i=1}^n w(x_i) K(x, x_i), \qquad a_n = \frac{\alpha}{\alpha + n}.$$
Bayesian representer form
Taking $\lim_{\alpha \to 0}$ to represent a non-informative prior:
Proposition (Bayesian representer form)
$$\hat f_n(x) = \sum_{i=1}^n w_i K(x, x_i), \qquad w_i = w(x_i)/n.$$
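The collapse onto the data points as $\alpha \to 0$ can be seen numerically. A sketch assuming an RBF kernel, a standard-normal base measure $F_0$, and an illustrative coefficient function $w(u) = \sin(u)$ (all hypothetical choices for demonstration):

```python
import numpy as np

# Sketch of the DP posterior mean for f: as alpha -> 0 the F0 term
# (weight a_n) vanishes and E[f | X_n] reduces to the representer sum
# over the observed points.
rng = np.random.default_rng(3)
n = 30
xs = rng.normal(size=n)                       # observed X_n

K = lambda x, u: np.exp(-(x - u) ** 2 / 2.0)  # illustrative RBF kernel
w = np.sin                                    # illustrative w(u)

def post_mean(x, alpha, m=50_000):
    a_n = alpha / (alpha + n)
    u = rng.normal(size=m)                    # Monte Carlo draws from F0
    f0_term = np.mean(K(x, u) * w(u))         # int K(x,u) w(u) dF0(u)
    data_term = np.mean(w(xs) * K(x, xs))     # n^-1 sum_i w(x_i) K(x, x_i)
    return a_n * f0_term + (1 - a_n) * data_term

x0 = 0.3
representer = np.mean(w(xs) * K(x0, xs))      # the alpha -> 0 limit
print(abs(post_mean(x0, alpha=1e-8) - representer) < 1e-6)  # prints True
```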
Bayesian kernel model for gradient estimates
By Taylor expansion,
$$y_i = f(x_j) + \nabla f(x_i) \cdot (x_i - x_j) + \varepsilon_{x_i, x_j}.$$
By the representer form,
$$y_i = \alpha_0 + K\alpha + (\iota x_i^T - X) C K_i + \varepsilon_i,$$
where $\iota = (1, \ldots, 1)^T$, $\alpha = (\alpha_1, \ldots, \alpha_n)^T \in \mathbb{R}^n$, $C = (c_1, \ldots, c_n) \in \mathbb{R}^{p \times n}$, $X$ is the $n \times p$ data matrix, and $K_i$ is the $i$th column of $K$.
Likelihood: error term and spatial statistics
Intuition: consider a spatial model (similarity matrix)
$$w_{ij} := \theta \exp\{-\phi \|x_i - x_j\|\},$$
where $\theta$ and $\phi$ are parameters of a spatial model. A natural modelling assumption is $\operatorname{var}(\varepsilon_{x_i, x_j}) \propto w_{ij}^{-1}$. Given this spatial structure,
$$\varepsilon_i \sim \mathrm{No}_n(0, W_i^{-1}), \qquad W_i = \operatorname{diag}(w_{x_i, x_1}, \ldots, w_{x_i, x_n}).$$
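The point of the spatial weights is that the first-order Taylor error for a pair $(x_i, x_j)$ gets precision $w_{ij}$: nearby pairs, where the expansion is accurate, are weighted heavily, while distant pairs are largely ignored. A small sketch with illustrative values of $\theta$ and $\phi$:

```python
import numpy as np

# Sketch of the spatial similarity weights and the per-point error
# precision matrix W_i.  theta and phi are illustrative values.
theta, phi = 1.0, 2.0
rng = np.random.default_rng(4)
X = rng.normal(size=(5, 3))

def weight(xi, xj):
    return theta * np.exp(-phi * np.linalg.norm(xi - xj))

i = 0
W_i = np.diag([weight(X[i], X[j]) for j in range(len(X))])

# The self-pair has zero distance, hence the largest precision theta;
# precision (and so influence on the likelihood) decays with distance.
print(np.round(np.diag(W_i), 3))
```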
Likelihood
Given the error model, the likelihood is
$$L(\text{data} \mid f, \nabla f) \propto \prod_{ij} w_{ij}^{1/2} \prod_i \exp\left\{ -\tfrac{1}{2} e_i^T W_i e_i \right\},$$
with the eigendecomposition $K = F \Lambda F^T$, $\Lambda := \operatorname{diag}(\lambda_1^2, \ldots, \lambda_n^2)$, the reparameterization $\beta$ satisfying $K\alpha = F\beta$, and
$$e_i = y_i - \alpha_0 - F\beta - (\iota x_i^T - X) C K_i.$$
Prior specification
$$\pi(\alpha_0, \theta) \propto 1/\theta,$$
$$\beta \sim \mathrm{No}(0, T), \qquad T := \operatorname{diag}(\tau_1, \ldots, \tau_n), \qquad \tau_i^{-1} \sim \mathrm{Ga}(a_\tau/2, b_\tau/2),$$
$$C_{kj} \sim (1 - \pi_k)\,\delta_0 + \pi_k\,\mathrm{No}(0, \phi_k^{-1}),$$
$$\phi_k \sim \mathrm{Ga}(\alpha_c/2, \beta_c/2), \qquad \pi_k \sim \mathrm{Beta}(\alpha_\pi, \beta_\pi), \qquad \phi \sim \mathrm{Ga}(a_\phi/2, b_\phi/2).$$
A standard Gibbs sampler simulates $p(\alpha, \alpha_0, C, \phi, \theta \mid \text{data})$.
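The spike-and-slab prior on $C_{kj}$ is what produces variable selection: each coordinate $k$ of $x$ has its own inclusion probability $\pi_k$, so entire rows of $C$ (whole variables) can be switched off. A sketch of a single prior draw, with illustrative hyperparameter values:

```python
import numpy as np

# Sketch of a draw from the spike-and-slab prior on the gradient
# coefficients C_kj.  Hyperparameters are illustrative; the Beta prior
# below favors sparse rows (prior inclusion probability around 0.1).
rng = np.random.default_rng(5)
p, n = 20, 30
alpha_pi, beta_pi = 1.0, 9.0
alpha_c, beta_c = 2.0, 2.0

pi = rng.beta(alpha_pi, beta_pi, size=p)             # pi_k ~ Beta
phi = rng.gamma(alpha_c / 2, 2 / beta_c, size=p)     # phi_k ~ Ga(a/2, b/2)

# Spike (exact zero) vs slab (Gaussian with precision phi_k), per entry.
include = rng.random(size=(p, n)) < pi[:, None]
slab = rng.normal(0.0, 1 / np.sqrt(phi)[:, None], size=(p, n))
C = np.where(include, slab, 0.0)

print(f"nonzero fraction: {np.mean(C != 0):.2f}")    # around E[pi_k] = 0.10
```

In the Gibbs sampler, the posterior inclusion probabilities of the rows of $C$ are what rank variables, here pathways or genes, by relevance.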
Linear example
[figures: estimated coefficients and RKHS norms across dimensions]
Nonlinear example
[figure: estimated features across the two relevant dimensions]
Digit classification
Input: MNIST handwritten digits database, $X_i \in \mathbb{R}^{784}$ (28 by 28 gray-scale pixel images).
Formulation:
Problem 1: 3 vs 8, with 50 3s and 50 8s
Problem 2: 5 vs 8, with 50 5s and 50 8s
Goal: learn features for the predictive models 3 vs 8 and 5 vs 8.
3, 5, 8 classification problem [figure: sample digits]
Top features: 3 vs 8 [figure]
Top features: 5 vs 8 [figure]
Genes don't do things
Diabetes Oxphos [figure]
Gender [figure]
Gene set database
The gene sets in the database are defined by:
1 Positional gene sets: cytogenetic bands, 3-megabase windows;
2 Motif gene sets: TRANSFAC motifs, representative motifs;
3 Curated gene sets: pathways, literature reviews, animal models, clinical phenotypes, expert curations, chemical or genetic perturbations.
Progression of prostate cancer
Gene expression from 22,283 genes; 71 people: 22 benign (b) prostate epithelium, 32 primary (p) prostate cancer, 17 metastatic (m) prostate cancer.
Progression: $\{b \to p \to m\}$.
523 pathway-defined gene sets.
1 Which pathways are involved in all or some stages of progression?
2 What are the pathway dependencies (inferring pathway networks)?
3 For each relevant pathway, infer the gene network for that pathway.
Pathways relevant in progression
[figure: pathway relevance across the transitions b to p and p to m; labelled pathways include TRANS, CCC, GHD, KREB, HORM, GLY]
Pathway dependencies: benign to primary
[figure: inferred pathway dependency networks A and B]
Refinement of gene sets
1 Not all genes in a gene set are relevant in the specific context studied.
2 Genes not included in the gene set may be relevant to the specific context studied.
Gene network for ERK pathway
[figure: inferred gene network including NGF, NGFB, ELK1, SOS1, GRB2, SHC1, EGFR, RAF1, MAP2K1, MAP2K2, MAPK1, MKNK1, MKNK2, MYC, STAT, TGFB, GNB1, GNAS, PPP2CA, DPM2, and others]
Relevant papers
Learning Coordinate Covariances via Gradients. S. Mukherjee, D-X. Zhou; Journal of Machine Learning Research, 7(Mar).
Estimation of Gradients and Coordinate Covariation in Classification. S. Mukherjee, Q. Wu; Journal of Machine Learning Research, 7(Nov).
Characterizing the Function Space for Bayesian Kernel Models. N. Pillai, Q. Wu, F. Liang, S. Mukherjee, R.L. Wolpert; Journal of Machine Learning Research, 8(Aug).
Non-parametric Bayesian kernel models. F. Liang, K. Mao, M. Liao, S. Mukherjee, M. West; Biometrika, in submission.
Learning Gradients: predictive models that infer geometry and dependence. Q. Wu, J. Guinney, M. Maggioni; Journal of Machine Learning Research, submitted.
Modeling Cancer Progression via Pathway Dependencies. E. Edelman, J. Guinney, J-T. Chi, P.G. Febbo, S. Mukherjee; PLoS Computational Biology, in press.
Bayesian simultaneous dimension reduction and regression. K. Mao, F. Liang, S. Mukherjee, Q. Wu; in preparation.
Acknowledgements
People that did the work:
Gradients: Q Wu, D-X Zhou, K Mao, J Guinney
Computational biology: E Edelman, J Guinney, P Febbo, J-T Chi
Bayesian modeling: N Pillai, K Mao, F Liang, M West, R Wolpert
Funding: IGSP Center for Systems Biology at Duke; NSF DMS.
More informationarxiv: v1 [stat.me] 5 Aug 2015
Scalable Bayesian Kernel Models with Variable Selection Lorin Crawford, Kris C. Wood, and Sayan Mukherjee arxiv:1508.01217v1 [stat.me] 5 Aug 2015 Summary Nonlinear kernels are used extensively in regression
More informationStatistical learning theory, Support vector machines, and Bioinformatics
1 Statistical learning theory, Support vector machines, and Bioinformatics Jean-Philippe.Vert@mines.org Ecole des Mines de Paris Computational Biology group ENS Paris, november 25, 2003. 2 Overview 1.
More informationLearning on Graphs and Manifolds. CMPSCI 689 Sridhar Mahadevan U.Mass Amherst
Learning on Graphs and Manifolds CMPSCI 689 Sridhar Mahadevan U.Mass Amherst Outline Manifold learning is a relatively new area of machine learning (2000-now). Main idea Model the underlying geometry of
More informationScale-Invariance of Support Vector Machines based on the Triangular Kernel. Abstract
Scale-Invariance of Support Vector Machines based on the Triangular Kernel François Fleuret Hichem Sahbi IMEDIA Research Group INRIA Domaine de Voluceau 78150 Le Chesnay, France Abstract This paper focuses
More informationGAUSSIAN PROCESS REGRESSION
GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The
More informationMarch 13, Paper: R.R. Coifman, S. Lafon, Diffusion maps ([Coifman06]) Seminar: Learning with Graphs, Prof. Hein, Saarland University
Kernels March 13, 2008 Paper: R.R. Coifman, S. Lafon, maps ([Coifman06]) Seminar: Learning with Graphs, Prof. Hein, Saarland University Kernels Figure: Example Application from [LafonWWW] meaningful geometric
More informationCertifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering
Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering Shuyang Ling Courant Institute of Mathematical Sciences, NYU Aug 13, 2018 Joint
More informationSpectral Regularization
Spectral Regularization Lorenzo Rosasco 9.520 Class 07 February 27, 2008 About this class Goal To discuss how a class of regularization methods originally designed for solving ill-posed inverse problems,
More informationManifold Regularization
9.520: Statistical Learning Theory and Applications arch 3rd, 200 anifold Regularization Lecturer: Lorenzo Rosasco Scribe: Hooyoung Chung Introduction In this lecture we introduce a class of learning algorithms,
More informationMultivariate Bayesian Linear Regression MLAI Lecture 11
Multivariate Bayesian Linear Regression MLAI Lecture 11 Neil D. Lawrence Department of Computer Science Sheffield University 21st October 2012 Outline Univariate Bayesian Linear Regression Multivariate
More informationNonparametric Bayesian Methods - Lecture I
Nonparametric Bayesian Methods - Lecture I Harry van Zanten Korteweg-de Vries Institute for Mathematics CRiSM Masterclass, April 4-6, 2016 Overview of the lectures I Intro to nonparametric Bayesian statistics
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationRegularization via Spectral Filtering
Regularization via Spectral Filtering Lorenzo Rosasco MIT, 9.520 Class 7 About this class Goal To discuss how a class of regularization methods originally designed for solving ill-posed inverse problems,
More informationAdvanced Machine Learning & Perception
Advanced Machine Learning & Perception Instructor: Tony Jebara Topic 1 Introduction, researchy course, latest papers Going beyond simple machine learning Perception, strange spaces, images, time, behavior
More informationIntroduction to Gaussian Processes
Introduction to Gaussian Processes Iain Murray murray@cs.toronto.edu CSC255, Introduction to Machine Learning, Fall 28 Dept. Computer Science, University of Toronto The problem Learn scalar function of
More informationResearch Statement on Statistics Jun Zhang
Research Statement on Statistics Jun Zhang (junzhang@galton.uchicago.edu) My interest on statistics generally includes machine learning and statistical genetics. My recent work focus on detection and interpretation
More informationAdvanced Introduction to Machine Learning
10-715 Advanced Introduction to Machine Learning Homework Due Oct 15, 10.30 am Rules Please follow these guidelines. Failure to do so, will result in loss of credit. 1. Homework is due on the due date
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationIntrinsic Structure Study on Whale Vocalizations
1 2015 DCLDE Conference Intrinsic Structure Study on Whale Vocalizations Yin Xian 1, Xiaobai Sun 2, Yuan Zhang 3, Wenjing Liao 3 Doug Nowacek 1,4, Loren Nolte 1, Robert Calderbank 1,2,3 1 Department of
More informationBayesian Interpretations of Regularization
Bayesian Interpretations of Regularization Charlie Frogner 9.50 Class 15 April 1, 009 The Plan Regularized least squares maps {(x i, y i )} n i=1 to a function that minimizes the regularized loss: f S
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationGenerative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis
Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis Stéphanie Allassonnière CIS, JHU July, 15th 28 Context : Computational Anatomy Context and motivations :
More informationMassachusetts Institute of Technology
Massachusetts Institute of Technology 6.867 Machine Learning, Fall 2006 Problem Set 5 Due Date: Thursday, Nov 30, 12:00 noon You may submit your solutions in class or in the box. 1. Wilhelm and Klaus are
More informationPermutation-invariant regularization of large covariance matrices. Liza Levina
Liza Levina Permutation-invariant covariance regularization 1/42 Permutation-invariant regularization of large covariance matrices Liza Levina Department of Statistics University of Michigan Joint work
More informationRegML 2018 Class 2 Tikhonov regularization and kernels
RegML 2018 Class 2 Tikhonov regularization and kernels Lorenzo Rosasco UNIGE-MIT-IIT June 17, 2018 Learning problem Problem For H {f f : X Y }, solve min E(f), f H dρ(x, y)l(f(x), y) given S n = (x i,
More informationOslo Class 2 Tikhonov regularization and kernels
RegML2017@SIMULA Oslo Class 2 Tikhonov regularization and kernels Lorenzo Rosasco UNIGE-MIT-IIT May 3, 2017 Learning problem Problem For H {f f : X Y }, solve min E(f), f H dρ(x, y)l(f(x), y) given S n
More informationGRAPH SIGNAL PROCESSING: A STATISTICAL VIEWPOINT
GRAPH SIGNAL PROCESSING: A STATISTICAL VIEWPOINT Cha Zhang Joint work with Dinei Florêncio and Philip A. Chou Microsoft Research Outline Gaussian Markov Random Field Graph construction Graph transform
More informationIntroduction to Gaussian Processes
Introduction to Gaussian Processes Neil D. Lawrence GPSS 10th June 2013 Book Rasmussen and Williams (2006) Outline The Gaussian Density Covariance from Basis Functions Basis Function Representations Constructing
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationEcon 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines
Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the
More informationWorst-Case Bounds for Gaussian Process Models
Worst-Case Bounds for Gaussian Process Models Sham M. Kakade University of Pennsylvania Matthias W. Seeger UC Berkeley Abstract Dean P. Foster University of Pennsylvania We present a competitive analysis
More information20 Unsupervised Learning and Principal Components Analysis (PCA)
116 Jonathan Richard Shewchuk 20 Unsupervised Learning and Principal Components Analysis (PCA) UNSUPERVISED LEARNING We have sample points, but no labels! No classes, no y-values, nothing to predict. Goal:
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationSPECTRAL CLUSTERING AND KERNEL PRINCIPAL COMPONENT ANALYSIS ARE PURSUING GOOD PROJECTIONS
SPECTRAL CLUSTERING AND KERNEL PRINCIPAL COMPONENT ANALYSIS ARE PURSUING GOOD PROJECTIONS VIKAS CHANDRAKANT RAYKAR DECEMBER 5, 24 Abstract. We interpret spectral clustering algorithms in the light of unsupervised
More informationMultiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar
Multiple regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Multiple regression 1 / 36 Previous two lectures Linear and logistic
More informationDimension Reduction Techniques. Presented by Jie (Jerry) Yu
Dimension Reduction Techniques Presented by Jie (Jerry) Yu Outline Problem Modeling Review of PCA and MDS Isomap Local Linear Embedding (LLE) Charting Background Advances in data collection and storage
More information9.520: Class 20. Bayesian Interpretations. Tomaso Poggio and Sayan Mukherjee
9.520: Class 20 Bayesian Interpretations Tomaso Poggio and Sayan Mukherjee Plan Bayesian interpretation of Regularization Bayesian interpretation of the regularizer Bayesian interpretation of quadratic
More informationExchangeability. Peter Orbanz. Columbia University
Exchangeability Peter Orbanz Columbia University PARAMETERS AND PATTERNS Parameters P(X θ) = Probability[data pattern] 3 2 1 0 1 2 3 5 0 5 Inference idea data = underlying pattern + independent noise Peter
More informationCOMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017
COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS
More informationReview. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with
More informationContribution from: Springer Verlag Berlin Heidelberg 2005 ISBN
Contribution from: Mathematical Physics Studies Vol. 7 Perspectives in Analysis Essays in Honor of Lennart Carleson s 75th Birthday Michael Benedicks, Peter W. Jones, Stanislav Smirnov (Eds.) Springer
More informationMachine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.
Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning
More informationReproducing Kernel Hilbert Spaces
Reproducing Kernel Hilbert Spaces Lorenzo Rosasco 9.520 Class 03 February 12, 2007 About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert
More information