Probabilistic Reasoning in Deep Learning
Probabilistic Reasoning in Deep Learning
Dr Konstantina Palla, PhD
September 2017, Deep Learning Indaba, Johannesburg
Konstantina Palla 1 / 39
OVERVIEW OF THE TALK
Basics of Bayesian inference.
Bayesian reasoning inside DNNs.
> The Bayesian paradigm as an approach to account for uncertainty in DNNs.
Connection to other statistical models (nonparametric models).
> How kernels and Gaussian processes fit in the picture of NNs.
PART I
Bayesian Inference: Basics
BAYESIAN INFERENCE: BASICS
Data: observations Y = {y^(1), y^(2), ..., y^(N)}
Problem: how was it generated?
Answer: make distributional assumptions, y^(i) | θ ~ p(y | θ).
Example: y^(i) | θ ~ N(y; μ, σ²) with θ = {μ, σ²}.
Likelihood: the probability of the data under the model parameters θ,
p(Y | θ) = ∏_{i=1}^N p(y^(i) | θ) = ∏_{i=1}^N N(y^(i); μ, σ²)
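As a small sketch (not from the slides; the function name and numbers are illustrative), the Gaussian log-likelihood above can be evaluated in a few lines:

```python
import numpy as np

def log_likelihood(y, mu, sigma2):
    # log p(Y | theta) = sum_i log N(y_i; mu, sigma2) for i.i.d. observations
    n = len(y)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum((y - mu) ** 2) / (2 * sigma2)

y = np.array([1.2, 0.8, 1.1, 0.9])
# parameters closer to the data give a higher log-likelihood
print(log_likelihood(y, mu=1.0, sigma2=0.1))  # ≈ 0.429
```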
BAYESIAN INFERENCE: BASICS
Bayesian principle: all forms of uncertainty should be expressed by means of probability distributions.
Prior: θ ~ p(θ), our prior belief before seeing the data.
Posterior: our updated belief after having seen the data.
Bayes' rule: p(θ | Y) = p(Y | θ) p(θ) / p(Y)
BAYESIAN INFERENCE: BASICS
Bayes' rule:
posterior p(θ | Y) = likelihood p(Y | θ) × prior p(θ) / evidence p(Y)
The posterior is proportional to likelihood × prior: update my prior beliefs after I see the data.
BAYESIAN INFERENCE: BASICS
p(θ | Y) ∝ p(Y | θ) p(θ)
BAYESIAN INFERENCE: BASICS
Make predictions: for unseen y*, marginalise!
p(y* | Y) = ∫ p(y* | θ) p(θ | Y) dθ
The prediction does not depend on a point estimate of θ; it is expressed as a weighted average over all possible values of θ.
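A hedged illustration (not in the slides): with a Gaussian likelihood of known variance and a Gaussian prior on the mean, both the posterior update and the predictive integral above come out in closed form. The function name is ours:

```python
import numpy as np

def gaussian_posterior(y, m0, s0sq, sigma2):
    # Posterior over the mean mu of N(mu, sigma2), given prior mu ~ N(m0, s0sq).
    # Conjugacy makes Bayes' rule closed form.
    n = len(y)
    sn_sq = 1.0 / (1.0 / s0sq + n / sigma2)          # posterior variance
    mn = sn_sq * (m0 / s0sq + np.sum(y) / sigma2)    # posterior mean
    return mn, sn_sq

y = np.array([0.9, 1.1, 1.0])
mn, sn_sq = gaussian_posterior(y, m0=0.0, s0sq=1.0, sigma2=0.25)
# Predictive p(y* | Y) is the weighted average over all mu: it just
# widens the noise variance by the posterior uncertainty.
pred_mean, pred_var = mn, sn_sq + 0.25
```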
BAYESIAN INFERENCE: BASICS
Why is this useful?
> Uncertainty is taken into account: no point estimates, but a distribution over the unknown θ, p(θ | Y).
> Natural handling of missing data: a probability distribution is estimated for each missing value.
> Addresses issues like regularization and overfitting, useful in deep neural networks.
PART II
Bayesian Reasoning in (Deep) Neural Networks
Overview of (D)NNs
Bayesian reasoning in DNNs
OVERVIEW OF (DEEP) NEURAL NETWORKS
Input layer → hidden layers → output layer.
What is deep learning? A framework for constructing flexible models: a way of constructing outputs using functions on inputs.
OVERVIEW OF DEEP NNS
Repetition of building blocks.
OVERVIEW OF DEEP NNS
Building block: inputs x_1, ..., x_M with weights β_1, ..., β_M.
> Linear predictor: η = Σ_{m=1}^M β_m x_m = β^T x (linear); the coefficients β are the weights.
> Inverse link function: f(η), the non-linear activation function.
> Output: y = f(η)
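A minimal sketch of this building block in Python (the helper name `block` and the numbers are illustrative):

```python
import numpy as np

def block(x, beta, f):
    # One NN building block: linear predictor eta = beta^T x, then activation f(eta).
    eta = beta @ x
    return f(eta)

x = np.array([1.0, 2.0, -1.0])
beta = np.array([0.5, -0.25, 1.0])
y = block(x, beta, np.tanh)   # eta = 0.5 - 0.5 - 1.0 = -1.0, y = tanh(-1.0)
```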
OVERVIEW OF DEEP NNS
In each layer l, the input is linearly composed, η_l = β_l^T x_l, and a non-linear function is applied, f_l(η_l).
A deep neural network: a construction of recursive building blocks.
OVERVIEW OF DEEP NNS
x → η = β^T x → f(η) → ... → η_l = β_l^T x_l → f_l(η_l) → y
A deep neural network can represent functions of increasing complexity.
DEEP NNS AS A PROBABILISTIC MODEL
The network defines a conditional distribution p(y | x; β). Parametric!
Example: regression with identity activation, f(η) = η = β^T x:
y = β^T x + ε, ε ~ N(0, σ²), so p(y | x, β) = N(β^T x, σ²)
Train the network: learn β?
DEEP NNS AS A PROBABILISTIC MODEL
Learn β: the maximum likelihood principle. Minimize the negative log-likelihood:
J(β) = −log p(y | x, β), β_MLE = argmin_β J(β)
Regression, p(y | x, β) = N(β^T x, σ²): the squared loss function
J(β) = 1/2 Σ_{i=1}^N {y^(i) − β^T x^(i)}² + const.
> Solve using stochastic gradient descent, back-propagation, etc.
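A sketch of the recipe on synthetic data (hyperparameters and seed are illustrative choices, not from the slides): minimizing the squared loss by stochastic gradient steps recovers the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=200)

beta = np.zeros(3)
lr = 0.05
for _ in range(500):
    i = rng.integers(200)                  # pick one example (stochastic)
    grad = (beta @ X[i] - y[i]) * X[i]     # gradient of 1/2 (y_i - beta^T x_i)^2
    beta -= lr * grad

# beta approaches beta_true (up to noise)
```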
DEEP NNS AS A PROBABILISTIC MODEL
Maximum likelihood estimation: β is fixed but unknown (a point mass δ_β).
− Point estimates
− Overfitting
Bayesian approach: β is unknown/uncertain.
+ Account for uncertainty: p(β), p(β | y)
+ Regularisation
BAYESIAN REASONING IN DEEP LEARNING
Maximum a posteriori estimation: place a distribution over the network weights β.
β ~ p(β), y | x, β ~ p(y | x, β)
Update the belief over β with Bayes' rule:
p(β | Y) ∝ p(Y | β, X) p(β)  (likelihood × prior)
Maximum a posteriori: find the β_MAP that maximizes log p(β | Y), i.e. minimizes J(β) = −log p(β | Y):
β_MAP = argmin_β J(β)
BAYESIAN REASONING IN DEEP LEARNING
Regression: maximum a posteriori. Model y = β^T x + ε with a multivariate Gaussian prior over the weights:
p(β) = N(m_β, λ^{-1} I), p(y | x, β) = N(β^T x, σ²)
Loss function J(β) = −log p(β | Y):
J(β) = 1/2 Σ_{i=1}^N {y^(i) − β^T x^(i)}² + (λ/2) β^T β + const.
The (λ/2) β^T β term is weight decay: the prior over the weights allows a natural regularizer to arise, a Bayesian side effect.
β_MAP = argmin_β J(β)
Predictions: y* | Y, x* ~ N(β_MAP^T x*, σ²)
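Since the loss above is exactly ridge regression, β_MAP has a closed form. A small sketch (function name and data are ours) showing that a stronger prior, i.e. larger λ, shrinks the weights:

```python
import numpy as np

def beta_map(X, y, lam):
    # argmin_beta 1/2 ||y - X beta||^2 + (lam/2) ||beta||^2
    # => (X^T X + lam I) beta = X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

b_small = beta_map(X, y, 0.1)    # weak prior: close to the MLE solution
b_large = beta_map(X, y, 100.0)  # strong prior: weights shrunk toward zero
```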
BAYESIAN REASONING IN DEEP LEARNING
What we have seen so far...
> Being Bayesian over the weights β, i.e. putting a prior distribution on them, gives a better loss function for learning: natural regularization.
> (D)NNs don't handle missing data: with only p(y | x; β) learned from training data {x^(i), y^(i)}, i = 1, ..., N, a missing input x^(i) cannot be dealt with. We need p(x), with x ~ p(x).
> (D)NNs don't generate new examples. We need the joint p(x, y) = p(y | x) p(x): draw x^(i) ~ p(x), then draw y^(i) ~ p(y | x^(i)).
Need for a fully probabilistic model.
BAYESIAN REASONING IN DEEP LEARNING
Autoencoder as a fully probabilistic neural network
BAYESIAN REASONING IN DEEP LEARNING
Autoencoder: input y → encoder → code z → decoder → output ỹ.
> Two parts:
> Encoder: "encodes" the input into a hidden code (representation).
> Decoder: "decodes" the code back into the input.
> Minimize the reconstruction error L = ||y − ỹ||².
BAYESIAN REASONING IN DEEP LEARNING
Unsupervised method: experiences data y and attempts to learn the distribution over y, i.e. p(y | z).
BAYESIAN REASONING IN DEEP LEARNING
Encoder: p_φ(z | y); decoder: p_θ(y | z). {θ, φ} are the NNs' weights.
BAYESIAN REASONING IN DEEP LEARNING
Decoder = generative model, a latent variable model: z ~ p(z), ỹ ~ p_θ(y | z). Learn θ.
Encoder = inference mechanism: learn p_φ(z | y), i.e. φ.
Inference: learn the weights φ, θ!
BAYESIAN REASONING IN DEEP LEARNING
Variational inference. The posterior
p(z | y) = p(y | z) p(z) / p(y)
is intractable: we cannot sample from it or construct it using the NN's structure (weights φ).
Main principle: use an approximate distribution q_φ(z) ≈ p(z | y), and measure the gap with
KL[q_φ(z) || p(z | y)],
the information lost when using q to approximate p.
BAYESIAN REASONING IN DEEP LEARNING
Variational inference: cost function. A metric for how well {φ, θ} capture the data-generating distribution p_θ(y | z) and the latent distribution p_φ(z | y). Minimize the free energy:
F(φ, θ; y) = −E_{q_φ(z|y)}[log p_θ(y | z)] + KL[q_φ(z) || p(z | y)]
> Reconstruction cost: the expected log-likelihood measures how well samples from q_φ(z) are able to explain the data y.
> Penalty: this divergence measures how much information is lost (in units of nats) when using q_φ(z) to represent p(z | y). The penalty is derived from the model and does not need to be designed.
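A hedged aside: in the common case where q_φ is a diagonal Gaussian and the prior p(z) is a standard normal (the usual variational autoencoder choice, an assumption here rather than something the slides specify), the KL penalty against the prior has a closed form per latent dimension:

```python
import numpy as np

def kl_gauss_std_normal(mu, sigma2):
    # KL[ N(mu, sigma2) || N(0, 1) ], summed over latent dimensions:
    # 0.5 * (sigma2 + mu^2 - 1 - log sigma2) per dimension.
    return 0.5 * np.sum(sigma2 + mu ** 2 - 1.0 - np.log(sigma2))

mu = np.array([0.0, 0.5])
sigma2 = np.array([1.0, 1.0])
print(kl_gauss_std_normal(mu, sigma2))  # 0.5 * 0.25 = 0.125
```

Because the penalty is analytic, no extra regularizer needs to be designed; it falls out of the model, as the slide notes.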
BAYESIAN REASONING IN DEEP LEARNING
Bayesian & DL marriage: what have we gained?
++ Principled inference approach: the Bayesian penalty is an intrinsic part of the model!
++ Encode uncertainty
++ Impute missing data
Deep Learning as a kernel method
> So far... predictions made using point estimates of the weights β: parametric.
> Now... store the entire training set in order to make predictions: a kernel approach, non-parametric.
DEEP LEARNING AS A KERNEL METHOD
Regression example with basis functions φ(·):
y = β^T φ(x) = Σ_{m=1}^M β_m φ_m(x)
DEEP LEARNING AS A KERNEL METHOD
Parametric approach: the regularized squared loss
J(β) = 1/2 Σ_{n=1}^N {β^T φ(x_n) − y_n}² + (λ/2) β^T β
> MLE training gives point estimates of β. Setting ∇J(β) = 0:
β = −(1/λ) Σ_{n=1}^N {β^T φ(x_n) − y_n} φ(x_n)
> Predictions: y* = β^T φ(x*)
Predictions are based purely on the learned parameters: all the information from the data is transferred into a set of parameters. Parametric!
DEEP LEARNING AS A KERNEL METHOD
Change of scenery... let the data speak: the dual representation.
β = −(1/λ) Σ_{n=1}^N {β^T φ(x_n) − y_n} φ(x_n) = Σ_{n=1}^N α_n φ(x_n) = Φ^T α,
with the N × M design matrix Φ and dual variables α_n = −(1/λ)(β^T φ(x_n) − y_n).
Substituting into J and setting ∇J(α) = 0:
α = (K + λ I_N)^{-1} y,
where K is the N × N Gram matrix with entries K_lm = φ(x_l)^T φ(x_m) = k(x_l, x_m), the kernel function.
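A quick numerical check (with a linear kernel k(x, x') = φ(x)^T φ(x') and made-up data) that the dual solution predicts exactly what the primal ridge solution does:

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.normal(size=(20, 4))   # design matrix of features phi(x_n), N=20, M=4
y = rng.normal(size=20)
lam = 0.5

# primal: beta = (Phi^T Phi + lam I_M)^{-1} Phi^T y
beta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(4), Phi.T @ y)

# dual: alpha = (K + lam I_N)^{-1} y with Gram matrix K = Phi Phi^T
K = Phi @ Phi.T
alpha = np.linalg.solve(K + lam * np.eye(20), y)

phi_star = rng.normal(size=4)
pred_primal = beta @ phi_star
pred_dual = (Phi @ phi_star) @ alpha   # k(x_star)^T alpha
```

The two predictions agree to numerical precision, which is the push-through identity Φ^T(ΦΦ^T + λI)^{-1} = (Φ^TΦ + λI)^{-1}Φ^T in action.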
DEEP LEARNING AS A KERNEL METHOD
Why?
Parametric: β = −(1/λ) Σ_n {β^T φ(x_n) − y_n} φ(x_n); predictions y* = β^T φ(x*). A point estimate with no memory of the data: the memory is in β.
Non-parametric: α = (K + λ I_N)^{-1} y; predictions y* = k(x*)^T (K + λI)^{-1} y use all the data. Memory by actually storing all the data.
DEEP LEARNING AS A KERNEL METHOD
Deep networks ↔ kernel methods (the dual view)
DEEP LEARNING AS A KERNEL METHOD
Step back... y = β^T φ(x) = Σ_{m=1}^M β_m φ_m(x), i.e. y = f(x).
TASK: learn the best regression function f(·).
What about a bit more Bayesian reasoning? Consider f a multivariate variable and apply a distribution:
f = [f(x_1), f(x_2), ..., f(x_n)] ~ p(f)
DEEP LEARNING AS A KERNEL METHOD
Let the number of hidden units φ_m(x) go to infinity, M → ∞, with β ~ N(m, Σ). Then
f ~ N(0, K), K_ij = k(x_i, x_j)
When N → ∞, we sample from a Gaussian process, f ~ GP(0, K).
Predict y* = f(x*) by marginalising:
p(f* | X, y, x*) = ∫ p(f* | x*, f) p(f | X, y) df
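As a sketch of what f ~ GP(0, K) means in practice (the kernel choice and length-scale are our assumptions, not the slides'): evaluating the GP on a finite grid reduces it to the multivariate Gaussian f ~ N(0, K), which we can sample directly.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    # Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2))
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale ** 2)

x = np.linspace(-3, 3, 50)
K = rbf_kernel(x, x)
rng = np.random.default_rng(0)
# draw three sample functions f ~ N(0, K) from the GP prior
# (jitter on the diagonal keeps the covariance numerically positive definite)
f = rng.multivariate_normal(np.zeros(50), K + 1e-8 * np.eye(50), size=3)
```

Each row of `f` is one random smooth function; the kernel controls how correlated nearby function values are.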
DEEP LEARNING AS A KERNEL METHOD
As the number of hidden units φ_m(x) goes to infinity, M → ∞: Gaussian processes are infinite limits of deep networks.
Thank you!
More informationLecture 5: GPs and Streaming regression
Lecture 5: GPs and Streaming regression Gaussian Processes Information gain Confidence intervals COMP-652 and ECSE-608, Lecture 5 - September 19, 2017 1 Recall: Non-parametric regression Input space X
More informationLecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 2. Statistical Schools Adapted from slides by Ben Rubinstein Statistical Schools of Thought Remainder of lecture is to provide
More informationParametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a
Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a
More informationBayesian Deep Learning
Bayesian Deep Learning Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://emtiyaz.github.io emtiyaz.khan@riken.jp June 06, 2018 Mohammad Emtiyaz Khan 2018 1 What will you learn? Why is Bayesian inference
More informationBayesian Support Vector Machines for Feature Ranking and Selection
Bayesian Support Vector Machines for Feature Ranking and Selection written by Chu, Keerthi, Ong, Ghahramani Patrick Pletscher pat@student.ethz.ch ETH Zurich, Switzerland 12th January 2006 Overview 1 Introduction
More informationCSC411: Final Review. James Lucas & David Madras. December 3, 2018
CSC411: Final Review James Lucas & David Madras December 3, 2018 Agenda 1. A brief overview 2. Some sample questions Basic ML Terminology The final exam will be on the entire course; however, it will be
More informationThe connection of dropout and Bayesian statistics
The connection of dropout and Bayesian statistics Interpretation of dropout as approximate Bayesian modelling of NN http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf Dropout Geoffrey Hinton Google, University
More informationGaussian processes and bayesian optimization Stanisław Jastrzębski. kudkudak.github.io kudkudak
Gaussian processes and bayesian optimization Stanisław Jastrzębski kudkudak.github.io kudkudak Plan Goal: talk about modern hyperparameter optimization algorithms Bayes reminder: equivalent linear regression
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationIntroduction to Bayesian inference
Introduction to Bayesian inference Thomas Alexander Brouwer University of Cambridge tab43@cam.ac.uk 17 November 2015 Probabilistic models Describe how data was generated using probability distributions
More informationAuto-Encoding Variational Bayes
Auto-Encoding Variational Bayes Diederik P Kingma, Max Welling June 18, 2018 Diederik P Kingma, Max Welling Auto-Encoding Variational Bayes June 18, 2018 1 / 39 Outline 1 Introduction 2 Variational Lower
More informationStatistical learning. Chapter 20, Sections 1 4 1
Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More informationModel Selection for Gaussian Processes
Institute for Adaptive and Neural Computation School of Informatics,, UK December 26 Outline GP basics Model selection: covariance functions and parameterizations Criteria for model selection Marginal
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationLinear Models for Regression
Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationg-priors for Linear Regression
Stat60: Bayesian Modeling and Inference Lecture Date: March 15, 010 g-priors for Linear Regression Lecturer: Michael I. Jordan Scribe: Andrew H. Chan 1 Linear regression and g-priors In the last lecture,
More informationBayesian Interpretations of Regularization
Bayesian Interpretations of Regularization Charlie Frogner 9.50 Class 15 April 1, 009 The Plan Regularized least squares maps {(x i, y i )} n i=1 to a function that minimizes the regularized loss: f S
More informationLecture 4: Probabilistic Learning
DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods
More informationVariational Autoencoders (VAEs)
September 26 & October 3, 2017 Section 1 Preliminaries Kullback-Leibler divergence KL divergence (continuous case) p(x) andq(x) are two density distributions. Then the KL-divergence is defined as Z KL(p
More informationProbabilistic Graphical Models Lecture 20: Gaussian Processes
Probabilistic Graphical Models Lecture 20: Gaussian Processes Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 30, 2015 1 / 53 What is Machine Learning? Machine learning algorithms
More informationLarge-scale Collaborative Prediction Using a Nonparametric Random Effects Model
Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model Kai Yu Joint work with John Lafferty and Shenghuo Zhu NEC Laboratories America, Carnegie Mellon University First Prev Page
More informationProbabilistic modeling. The slides are closely adapted from Subhransu Maji s slides
Probabilistic modeling The slides are closely adapted from Subhransu Maji s slides Overview So far the models and algorithms you have learned about are relatively disconnected Probabilistic modeling framework
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters
More informationIntroduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak
Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,
More informationIntroduction to Machine Learning
Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationGaussian Mixture Models
Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some
More informationAn Introduction to Statistical and Probabilistic Linear Models
An Introduction to Statistical and Probabilistic Linear Models Maximilian Mozes Proseminar Data Mining Fakultät für Informatik Technische Universität München June 07, 2017 Introduction In statistical learning
More informationPosterior Regularization
Posterior Regularization 1 Introduction One of the key challenges in probabilistic structured learning, is the intractability of the posterior distribution, for fast inference. There are numerous methods
More informationy Xw 2 2 y Xw λ w 2 2
CS 189 Introduction to Machine Learning Spring 2018 Note 4 1 MLE and MAP for Regression (Part I) So far, we ve explored two approaches of the regression framework, Ordinary Least Squares and Ridge Regression:
More informationStochastic Variational Inference for Gaussian Process Latent Variable Models using Back Constraints
Stochastic Variational Inference for Gaussian Process Latent Variable Models using Back Constraints Thang D. Bui Richard E. Turner tdb40@cam.ac.uk ret26@cam.ac.uk Computational and Biological Learning
More informationAlgorithmisches Lernen/Machine Learning
Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines
More informationBayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationLatent Dirichlet Allocation Introduction/Overview
Latent Dirichlet Allocation Introduction/Overview David Meyer 03.10.2016 David Meyer http://www.1-4-5.net/~dmm/ml/lda_intro.pdf 03.10.2016 Agenda What is Topic Modeling? Parametric vs. Non-Parametric Models
More informationGWAS V: Gaussian processes
GWAS V: Gaussian processes Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS V: Gaussian processes Summer 2011
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationLearning Bayesian network : Given structure and completely observed data
Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution
More informationFully Bayesian Deep Gaussian Processes for Uncertainty Quantification
Fully Bayesian Deep Gaussian Processes for Uncertainty Quantification N. Zabaras 1 S. Atkinson 1 Center for Informatics and Computational Science Department of Aerospace and Mechanical Engineering University
More informationLecture 1b: Linear Models for Regression
Lecture 1b: Linear Models for Regression Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced
More informationLearning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014
Learning with Noisy Labels Kate Niehaus Reading group 11-Feb-2014 Outline Motivations Generative model approach: Lawrence, N. & Scho lkopf, B. Estimating a Kernel Fisher Discriminant in the Presence of
More informationComputational Biology Lecture #3: Probability and Statistics. Bud Mishra Professor of Computer Science, Mathematics, & Cell Biology Sept
Computational Biology Lecture #3: Probability and Statistics Bud Mishra Professor of Computer Science, Mathematics, & Cell Biology Sept 26 2005 L2-1 Basic Probabilities L2-2 1 Random Variables L2-3 Examples
More informationOverview. Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation
Overview Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation Probabilistic Interpretation: Linear Regression Assume output y is generated
More informationRegression. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh.
Regression Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh September 24 (All of the slides in this course have been adapted from previous versions
More informationLecture 3: Machine learning, classification, and generative models
EE E6820: Speech & Audio Processing & Recognition Lecture 3: Machine learning, classification, and generative models 1 Classification 2 Generative models 3 Gaussian models Michael Mandel
More informationProbabilistic Graphical Models & Applications
Probabilistic Graphical Models & Applications Learning of Graphical Models Bjoern Andres and Bernt Schiele Max Planck Institute for Informatics The slides of today s lecture are authored by and shown with
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationSTAT 518 Intro Student Presentation
STAT 518 Intro Student Presentation Wen Wei Loh April 11, 2013 Title of paper Radford M. Neal [1999] Bayesian Statistics, 6: 475-501, 1999 What the paper is about Regression and Classification Flexible
More informationExpectation Propagation in Dynamical Systems
Expectation Propagation in Dynamical Systems Marc Peter Deisenroth Joint Work with Shakir Mohamed (UBC) August 10, 2012 Marc Deisenroth (TU Darmstadt) EP in Dynamical Systems 1 Motivation Figure : Complex
More informationData Mining Techniques
Data Mining Techniques CS 6220 - Section 2 - Spring 2017 Lecture 6 Jan-Willem van de Meent (credit: Yijun Zhao, Chris Bishop, Andrew Moore, Hastie et al.) Project Project Deadlines 3 Feb: Form teams of
More informationBasic math for biology
Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood
More informationLatent Variable Models
Latent Variable Models Stefano Ermon, Aditya Grover Stanford University Lecture 5 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 5 1 / 31 Recap of last lecture 1 Autoregressive models:
More informationNeutron inverse kinetics via Gaussian Processes
Neutron inverse kinetics via Gaussian Processes P. Picca Politecnico di Torino, Torino, Italy R. Furfaro University of Arizona, Tucson, Arizona Outline Introduction Review of inverse kinetics techniques
More informationStatistical learning. Chapter 20, Sections 1 3 1
Statistical learning Chapter 20, Sections 1 3 Chapter 20, Sections 1 3 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More information