Lecture 11: Probability Distributions and Parameter Estimation
Intelligent Data Analysis and Probabilistic Inference
Lecture 11: Probability Distributions and Parameter Estimation
Recommended reading: Bishop, Chapters 1.2, , Appendix B
Duncan Gillies and Marc Deisenroth
Department of Computing, Imperial College London
February 10, 2016
Key Concepts in Probability Theory

Two fundamental rules:

  Sum rule (marginalization property):  p(x) = ∫ p(x, y) dy
  Product rule:                         p(x, y) = p(y | x) p(x)

Bayes' Theorem (probabilistic inverse):

  p(x | y) = p(y | x) p(x) / p(y),    x: hypothesis, y: measurement

  p(x | y): posterior belief
  p(x):     prior belief
  p(y | x): likelihood (measurement model)
  p(y):     marginal likelihood (normalization constant)

Probability Distributions and Parameter Estimation. IDAPI, Lecture 11. February 10, 2016
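As a quick numeric sketch of Bayes' theorem (all numbers here are invented for illustration, not from the lecture): a rare binary hypothesis x is probed by a noisy binary measurement y.

```python
# Bayes' theorem p(x | y) = p(y | x) p(x) / p(y) for a binary hypothesis.
# All numbers are illustrative: a noisy detector y for a rare event x.
p_x = 0.01               # prior belief p(x = 1)
p_y_given_x1 = 0.95      # likelihood p(y = 1 | x = 1)
p_y_given_x0 = 0.10      # likelihood p(y = 1 | x = 0), false-positive rate

# Marginal likelihood via the sum rule: p(y = 1) = sum_x p(y = 1 | x) p(x)
p_y = p_y_given_x1 * p_x + p_y_given_x0 * (1 - p_x)

# Posterior belief after observing y = 1
posterior = p_y_given_x1 * p_x / p_y
print(posterior)  # ~0.088: the event stays unlikely despite a positive reading
```

Note how the small prior tempers the strong likelihood: exactly the posterior = prior × likelihood / marginal-likelihood decomposition on the slide.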
Mean and (Co)Variance

Mean and covariance are often useful to describe properties of probability distributions (expected values and spread).

Definition:

  E_x[x] = ∫ x p(x) dx =: μ
  V_x[x] = E_x[(x − μ)(x − μ)^T] = E_x[x x^T] − E_x[x] E_x[x]^T =: Σ
  Cov[x, y] = E_{x,y}[x y^T] − E_x[x] E_y[y]^T

Linear/affine transformations: y = A x + b, where E_x[x] = μ, V_x[x] = Σ:

  E[y] = E_x[A x + b] = A E_x[x] + b = A μ + b
  V[y] = V_x[A x + b] = V_x[A x] = A V_x[x] A^T = A Σ A^T

If x, y are independent:  V_{x,y}[x + y] = V_x[x] + V_y[y]
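The affine-transformation rules can be checked empirically. A minimal NumPy sketch (the matrices are arbitrary illustrative choices) compares Monte Carlo moments of y = A x + b against A μ + b and A Σ A^T:

```python
import numpy as np

# Empirical check of E[Ax + b] = A mu + b and V[Ax + b] = A Sigma A^T.
# mu, Sigma, A, b are arbitrary illustrative choices.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
b = np.array([3.0, -1.0])

x = rng.multivariate_normal(mu, Sigma, size=200_000)  # samples of x, one per row
y = x @ A.T + b                                       # y = A x + b, row-wise

mean_pred = A @ mu + b        # analytic mean of y
cov_pred = A @ Sigma @ A.T    # analytic covariance of y

mean_emp = y.mean(axis=0)     # Monte Carlo estimates
cov_emp = np.cov(y.T)
```

With 200,000 samples, the empirical moments agree with the analytic ones to roughly two decimal places.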
Basic Probability Distributions
The Gaussian Distribution

  p(x | μ, Σ) = (2π)^{−D/2} |Σ|^{−1/2} exp( −(1/2) (x − μ)^T Σ^{−1} (x − μ) )

  Mean vector μ: average of the data
  Covariance matrix Σ: spread of the data

(Figures: a 1D density p(x) with data, mean, and 95% confidence interval; a 2D data set with mean and 95% confidence bound.)
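The density formula translates directly into code. A minimal sketch (not the lecture's code), with a 1D sanity check against the known standard-normal value:

```python
import numpy as np

# Multivariate Gaussian density, written out from the formula above.
def gaussian_pdf(x, mu, Sigma):
    D = mu.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    norm = (2 * np.pi) ** (-D / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm * np.exp(-0.5 * quad)

# Sanity check in 1D: the standard normal at x = 0 equals 1 / sqrt(2 pi)
p0 = gaussian_pdf(np.array([0.0]), np.array([0.0]), np.array([[1.0]]))
print(p0)  # ~0.3989
```

Using `np.linalg.solve` rather than an explicit inverse of Σ is the numerically preferred route; for repeated evaluations a Cholesky factorization would be better still.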
Conditional

Joint Gaussian:

  p(x, y) = N( [μ_x; μ_y], [[Σ_xx, Σ_xy]; [Σ_yx, Σ_yy]] )

Conditioning on an observation y:

  p(x | y) = N(μ_{x|y}, Σ_{x|y})
  μ_{x|y} = μ_x + Σ_xy Σ_yy^{−1} (y − μ_y)
  Σ_{x|y} = Σ_xx − Σ_xy Σ_yy^{−1} Σ_yx

The conditional p(x | y) is also Gaussian, which is computationally convenient.
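A minimal sketch of conditioning (and, below, marginalizing) a 2D joint Gaussian, with scalar x and y so the block formulas reduce to scalars; the numbers are illustrative:

```python
import numpy as np

# Condition a 2D joint Gaussian on y using the block formulas above.
mu = np.array([0.0, 2.0])                 # [mu_x, mu_y]
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])            # [[S_xx, S_xy], [S_yx, S_yy]]
y_obs = 3.0                               # observed value of y

S_xx, S_xy, S_yx, S_yy = Sigma[0, 0], Sigma[0, 1], Sigma[1, 0], Sigma[1, 1]

# Conditional p(x | y): mean and variance
mu_x_given_y = mu[0] + S_xy / S_yy * (y_obs - mu[1])
var_x_given_y = S_xx - S_xy / S_yy * S_yx

# Marginal p(x): just read off the corresponding block
mu_x, var_x = mu[0], S_xx

print(mu_x_given_y, var_x_given_y)  # ~0.4, ~0.68
```

Observing y above its mean pulls the conditional mean of x upward (positive Σ_xy) and shrinks its variance from 1.0 to 0.68.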
Marginal

  p(x, y) = N( [μ_x; μ_y], [[Σ_xx, Σ_xy]; [Σ_yx, Σ_yy]] )

Marginal distribution:

  p(x) = ∫ p(x, y) dy = N(μ_x, Σ_xx)

The marginal of a joint Gaussian distribution is Gaussian. Intuitively: ignore (integrate out) everything you are not interested in.
Bernoulli Distribution

Distribution for a single binary variable x ∈ {0, 1}. Governed by a single continuous parameter μ ∈ [0, 1] that represents the probability of x = 1.

  p(x | μ) = μ^x (1 − μ)^{1−x}
  E[x] = μ
  V[x] = μ(1 − μ)

Example: the result of flipping a coin.
Beta Distribution

Distribution over a continuous variable μ ∈ [0, 1], often used to represent the probability of some binary event (see Bernoulli distribution). Governed by two parameters α > 0, β > 0.

  p(μ | α, β) = Γ(α + β) / (Γ(α) Γ(β)) μ^{α−1} (1 − μ)^{β−1}
  E[μ] = α / (α + β)
  V[μ] = αβ / ( (α + β)^2 (α + β + 1) )
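The Beta moments can be verified numerically. A small sketch using a trapezoidal approximation of the integrals (α = 2, β = 5 chosen arbitrarily):

```python
import math
import numpy as np

# Beta density from the formula above, using the Gamma function from math.
def beta_pdf(mu, alpha, beta):
    coef = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return coef * mu ** (alpha - 1) * (1 - mu) ** (beta - 1)

def trapezoid(y, x):
    """Trapezoidal rule for samples y of a function on grid x."""
    return float(np.sum((y[:-1] + y[1:]) * np.diff(x)) / 2)

alpha, beta = 2.0, 5.0
grid = np.linspace(1e-6, 1 - 1e-6, 20_001)
pdf = beta_pdf(grid, alpha, beta)

total = trapezoid(pdf, grid)         # normalization, ~1
mean = trapezoid(grid * pdf, grid)   # E[mu], ~alpha / (alpha + beta) = 2/7
print(round(total, 4), round(mean, 4))
```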
Binomial Distribution

Generalization of the Bernoulli distribution to a distribution over integers: the probability of observing m occurrences of x = 1 in a set of N samples from a Bernoulli distribution, where p(x = 1) = μ ∈ [0, 1].

  p(m | N, μ) = C(N, m) μ^m (1 − μ)^{N−m}    (C(N, m): binomial coefficient)
  E[m] = Nμ
  V[m] = Nμ(1 − μ)

Example: what is the probability of observing m heads in N experiments if the probability of observing heads in a single experiment is μ?
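A short sketch of the pmf using Python's exact binomial coefficient (N = 10 and μ = 0.3 are arbitrary), checking that the probabilities sum to 1 and that the mean is Nμ:

```python
import math

# Binomial pmf from the formula above; math.comb gives C(N, m) exactly.
def binom_pmf(m, N, mu):
    return math.comb(N, m) * mu ** m * (1 - mu) ** (N - m)

N, mu = 10, 0.3
pmf = [binom_pmf(m, N, mu) for m in range(N + 1)]

total = sum(pmf)                                # probabilities sum to 1
mean = sum(m * p for m, p in enumerate(pmf))    # E[m] = N mu = 3
print(round(total, 12), round(mean, 12))  # 1.0 3.0
```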
Gamma Distribution

Distribution over positive real numbers τ > 0. Governed by parameters a > 0 (shape) and b > 0 (rate).

  p(τ | a, b) = b^a / Γ(a) τ^{a−1} exp(−bτ)
  E[τ] = a / b
  V[τ] = a / b^2
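A small sanity check of the Gamma density in the shape/rate parameterization used above (a = 3, b = 2 are arbitrary): for a > 1, the density peaks at the mode (a − 1)/b.

```python
import math

# Gamma density with shape a and rate b, as in the formula above.
def gamma_pdf(tau, a, b):
    return b ** a / math.gamma(a) * tau ** (a - 1) * math.exp(-b * tau)

a, b = 3.0, 2.0
mode = (a - 1) / b                    # analytic mode, 1.0 here
left = gamma_pdf(mode - 1e-3, a, b)
peak = gamma_pdf(mode, a, b)
right = gamma_pdf(mode + 1e-3, a, b)
print(peak > left and peak > right)   # True
```

Note that some libraries (e.g., SciPy's `gamma.pdf`) use a scale parameter instead, which corresponds to 1/b here.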
Conjugate Priors

  Posterior ∝ prior × likelihood

Specification of the prior can be tricky. Some priors are (computationally) convenient: if the posterior and the prior are of the same type (e.g., Beta), the prior is called conjugate. The likelihood is also involved.

Examples:

  Conjugate prior          Likelihood    Posterior
  Beta                     Bernoulli     Beta
  Gaussian-inverse-Gamma   Gaussian      Gaussian-inverse-Gamma
  Dirichlet                Multinomial   Dirichlet
Example

Consider a Binomial random variable x ~ Bin(m | N, μ), where

  p(x | μ) = C(N, m) μ^m (1 − μ)^{N−m} ∝ μ^a (1 − μ)^b

for some constants a, b.

We place a Beta prior on the parameter μ:

  Beta(μ | α, β) = Γ(α + β) / (Γ(α) Γ(β)) μ^{α−1} (1 − μ)^{β−1} ∝ μ^{α−1} (1 − μ)^{β−1}

If we now observe some outcomes x = (x_1, ..., x_N) of a repeated coin-flip experiment with h heads and t tails, we compute the posterior distribution on μ:

  p(μ | x = h) ∝ p(x | μ) p(μ | α, β)
              = μ^h (1 − μ)^t μ^{α−1} (1 − μ)^{β−1}
              = μ^{h+α−1} (1 − μ)^{t+β−1}
              ∝ Beta(h + α, t + β)
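The whole derivation collapses into a two-line update. A minimal sketch (prior parameters and coin-flip counts are invented for illustration):

```python
# Conjugate update from the derivation above: a Beta(alpha, beta) prior and
# h heads, t tails observed give a Beta(alpha + h, beta + t) posterior.
def beta_binomial_update(alpha, beta, h, t):
    return alpha + h, beta + t

# Start from a uniform prior Beta(1, 1) and observe 7 heads, 3 tails.
a_post, b_post = beta_binomial_update(1.0, 1.0, h=7, t=3)

posterior_mean = a_post / (a_post + b_post)   # E[mu | data] = 8/12
print(a_post, b_post, round(posterior_mean, 4))  # 8.0 4.0 0.6667
```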
Parameter Estimation
Parameter Estimation

Given a data set, we want to obtain good estimates of the parameters of the model that may have generated the data: the parameter estimation problem.

Example: x_1, ..., x_N ∈ R^D are i.i.d. samples from a Gaussian. Find the mean and covariance of p(x).
Maximum Likelihood Parameter Estimation (1)

Maximum likelihood estimation finds a point estimate of the parameters that maximizes the likelihood of the parameters.

In our Gaussian example, we seek μ, Σ:

  max_{μ,Σ} p(x_1, ..., x_N | μ, Σ)
    = max_{μ,Σ} ∏_{i=1}^N p(x_i | μ, Σ)                            (i.i.d.)
    = max_{μ,Σ} ∑_{i=1}^N log p(x_i | μ, Σ)
    = max_{μ,Σ} [ −(ND/2) log(2π) − (N/2) log|Σ|
                  − (1/2) ∑_{i=1}^N (x_i − μ)^T Σ^{−1} (x_i − μ) ]
Maximum Likelihood Parameter Estimation (2)

  μ_ML = arg max_μ [ −(ND/2) log(2π) − (N/2) log|Σ|
                     − (1/2) ∑_{i=1}^N (x_i − μ)^T Σ^{−1} (x_i − μ) ]
       = (1/N) ∑_{i=1}^N x_i

  Σ_ML = arg max_Σ [ −(ND/2) log(2π) − (N/2) log|Σ|
                     − (1/2) ∑_{i=1}^N (x_i − μ)^T Σ^{−1} (x_i − μ) ]
       = (1/N) ∑_{i=1}^N (x_i − μ_ML)(x_i − μ_ML)^T

The ML estimate Σ_ML is biased, but we can get an unbiased estimate as

  Σ̃ = 1/(N−1) ∑_{i=1}^N (x_i − μ_ML)(x_i − μ_ML)^T
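The closed-form estimators are one line of NumPy each. A sketch on synthetic data (the true parameters are arbitrary choices for the demonstration):

```python
import numpy as np

# ML estimates for a Gaussian from i.i.d. samples, matching the formulas above.
rng = np.random.default_rng(1)
true_mu = np.array([1.0, -1.0])
true_Sigma = np.array([[1.0, 0.3],
                       [0.3, 0.5]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=50_000)  # rows are x_i

N = X.shape[0]
mu_ml = X.mean(axis=0)                       # (1/N) sum_i x_i
diff = X - mu_ml
Sigma_ml = diff.T @ diff / N                 # biased ML estimate
Sigma_unbiased = diff.T @ diff / (N - 1)     # bias-corrected estimate
```

`Sigma_unbiased` matches `np.cov(X.T)`, which uses the N − 1 denominator by default.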
MLE: Properties

Asymptotic consistency: the MLE converges to the true value in the limit of infinitely many observations, plus a random error that is approximately normal. The sample size necessary to achieve these properties can be quite large. The error's variance decays as 1/N, where N is the number of data points. Especially in the small-data regime, MLE can lead to overfitting.
Example: MLE in the Small-Data Regime

(Figure: a true Gaussian with its true mean, compared with the MLE mean.)
Maximum A Posteriori Estimation

Instead of maximizing the likelihood, we can seek parameters that maximize the posterior distribution of the parameters:

  θ_MAP = arg max_θ p(θ | x) = arg max_θ [ log p(θ) + log p(x | θ) ]

This is MLE with an additional regularizer that comes from the prior: the MAP estimator.

Example: estimate the mean μ of a 1D Gaussian with known variance σ^2 after having observed N data points x_i. A Gaussian prior p(μ) = N(μ | m, s^2) on the mean yields

  μ_MAP = ( N s^2 / (N s^2 + σ^2) ) μ_ML + ( σ^2 / (N s^2 + σ^2) ) m
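The closed-form MAP mean can be sketched directly (the data points and prior values are illustrative, not from the lecture):

```python
import numpy as np

# MAP estimate of a Gaussian mean with known variance sigma^2 and a
# Gaussian prior N(m, s^2), following the formula above.
def map_mean(x, sigma2, m, s2):
    N = len(x)
    mu_ml = np.mean(x)
    w = N * s2 / (N * s2 + sigma2)    # weight on the sample mean
    return w * mu_ml + (1 - w) * m

x = np.array([2.1, 1.9, 2.4, 2.0])    # illustrative data, sample mean 2.1
mu_map = map_mean(x, sigma2=1.0, m=0.0, s2=0.25)
print(mu_map)  # ~1.05, shrunk from the sample mean toward the prior mean 0
```

With a very broad prior (large s^2) the weight w approaches 1 and the estimate reverts to the ML sample mean, as the next slide discusses.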
Interpreting the Result

  μ_MAP = ( N s^2 / (N s^2 + σ^2) ) μ_ML + ( σ^2 / (N s^2 + σ^2) ) m

Linear interpolation between the prior mean and the sample mean (ML estimate), weighted by their respective variances. In particular:

  The more data we have seen (larger N), the more we believe the sample mean.
  The higher the variance s^2 of the prior, the more we believe the sample mean; MAP converges to MLE as s^2 → ∞.
  The higher the variance σ^2 of the data, the less we believe the sample mean.
Example

(Figure: a true Gaussian with the prior on the mean, true mean, MLE mean, and MAP mean.)
Bayesian Inference (Marginalization)

An even better idea than MAP estimation: instead of estimating a parameter, integrate it out (according to the posterior) when making predictions:

  p(x | D) = ∫ p(x | θ) p(θ | D) dθ

where p(θ | D) is the parameter posterior.

This integral is often tricky to solve. Choose appropriate distributions (e.g., conjugate distributions) or solve approximately (e.g., sampling or variational inference). Bayesian inference works well (even in the small-data regime) and is robust to overfitting.
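For the Beta-Bernoulli model of the earlier coin-flip example, this predictive integral has a closed form: with a Beta(a, b) posterior on μ, p(x = 1 | D) = ∫ μ Beta(μ | a, b) dμ = a / (a + b). A minimal sketch (prior and counts invented for illustration):

```python
# Posterior predictive for the Beta-Bernoulli model: marginalizing the
# parameter mu under its Beta posterior is exact here, no approximation needed.
def predictive_heads(alpha_prior, beta_prior, h, t):
    a, b = alpha_prior + h, beta_prior + t    # conjugate posterior Beta(a, b)
    return a / (a + b)                        # integral of mu * Beta(mu | a, b)

p_heads = predictive_heads(1.0, 1.0, h=7, t=3)
print(round(p_heads, 4))  # 0.6667, Laplace's rule of succession (h+1)/(N+2)
```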
Example: Linear Regression

(A sequence of figures, adapted from PRML (Bishop, 2006). Blue: data; black: true function (unknown); red: posterior mean (MAP estimate); red-shaded: 95% confidence area of the prediction.)
References

[1] C. M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer-Verlag, 2006.
More informationLecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 2013-14 We know that X ~ B(n,p), but we do not know p. We get a random sample
More informationComputational Cognitive Science
Computational Cognitive Science Lecture 9: Bayesian Estimation Chris Lucas (Slides adapted from Frank Keller s) School of Informatics University of Edinburgh clucas2@inf.ed.ac.uk 17 October, 2017 1 / 28
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationIntroduction to Machine Learning
Introduction to Machine Learning Generative Models Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationDEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE
Data Provided: None DEPARTMENT OF COMPUTER SCIENCE Autumn Semester 203 204 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE 2 hours Answer THREE of the four questions. All questions carry equal weight. Figures
More informationLecture 5 September 19
IFT 6269: Probabilistic Graphical Models Fall 2016 Lecture 5 September 19 Lecturer: Simon Lacoste-Julien Scribe: Sébastien Lachapelle Disclaimer: These notes have only been lightly proofread. 5.1 Statistical
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 4, 2015 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationIntroduc)on to Bayesian methods (con)nued) - Lecture 16
Introduc)on to Bayesian methods (con)nued) - Lecture 16 David Sontag New York University Slides adapted from Luke Zettlemoyer, Carlos Guestrin, Dan Klein, and Vibhav Gogate Outline of lectures Review of
More informationMachine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014
Machine Learning Probability Basics Basic definitions: Random variables, joint, conditional, marginal distribution, Bayes theorem & examples; Probability distributions: Binomial, Beta, Multinomial, Dirichlet,
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 2. MLE, MAP, Bayes classification Barnabás Póczos & Aarti Singh 2014 Spring Administration http://www.cs.cmu.edu/~aarti/class/10701_spring14/index.html Blackboard
More informationFundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner
Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization
More informationLecture 4: Probabilistic Learning
DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods
More informationMachine Learning CMPT 726 Simon Fraser University. Binomial Parameter Estimation
Machine Learning CMPT 726 Simon Fraser University Binomial Parameter Estimation Outline Maximum Likelihood Estimation Smoothed Frequencies, Laplace Correction. Bayesian Approach. Conjugate Prior. Uniform
More informationGaussians Linear Regression Bias-Variance Tradeoff
Readings listed in class website Gaussians Linear Regression Bias-Variance Tradeoff Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University January 22 nd, 2007 Maximum Likelihood Estimation
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationMachine Learning using Bayesian Approaches
Machine Learning using Bayesian Approaches Sargur N. Srihari University at Buffalo, State University of New York 1 Outline 1. Progress in ML and PR 2. Fully Bayesian Approach 1. Probability theory Bayes
More informationParametric Techniques Lecture 3
Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to
More informationAarti Singh. Lecture 2, January 13, Reading: Bishop: Chap 1,2. Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell
Machine Learning 0-70/5 70/5-78, 78, Spring 00 Probability 0 Aarti Singh Lecture, January 3, 00 f(x) µ x Reading: Bishop: Chap, Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell Announcements Homework
More informationMLE/MAP + Naïve Bayes
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University MLE/MAP + Naïve Bayes MLE / MAP Readings: Estimating Probabilities (Mitchell, 2016)
More informationComputational Perception. Bayesian Inference
Computational Perception 15-485/785 January 24, 2008 Bayesian Inference The process of probabilistic inference 1. define model of problem 2. derive posterior distributions and estimators 3. estimate parameters
More informationProbabilistic and Bayesian Machine Learning
Probabilistic and Bayesian Machine Learning Lecture 1: Introduction to Probabilistic Modelling Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Why a
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 22, 2011 Today: MLE and MAP Bayes Classifiers Naïve Bayes Readings: Mitchell: Naïve Bayes and Logistic
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationA Brief Review of Probability, Bayesian Statistics, and Information Theory
A Brief Review of Probability, Bayesian Statistics, and Information Theory Brendan Frey Electrical and Computer Engineering University of Toronto frey@psi.toronto.edu http://www.psi.toronto.edu A system
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 89 Part II
More informationParametric Techniques
Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure
More informationDS-GA 1002 Lecture notes 11 Fall Bayesian statistics
DS-GA 100 Lecture notes 11 Fall 016 Bayesian statistics In the frequentist paradigm we model the data as realizations from a distribution that depends on deterministic parameters. In contrast, in Bayesian
More information6.867 Machine Learning
6.867 Machine Learning Problem set 1 Due Thursday, September 19, in class What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.
More information01 Probability Theory and Statistics Review
NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement
More informationCS 361: Probability & Statistics
October 17, 2017 CS 361: Probability & Statistics Inference Maximum likelihood: drawbacks A couple of things might trip up max likelihood estimation: 1) Finding the maximum of some functions can be quite
More informationBayesian Methods. David S. Rosenberg. New York University. March 20, 2018
Bayesian Methods David S. Rosenberg New York University March 20, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 March 20, 2018 1 / 38 Contents 1 Classical Statistics 2 Bayesian
More information6.867 Machine Learning
6.867 Machine Learning Problem set 1 Solutions Thursday, September 19 What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.
More informationHierarchical Models & Bayesian Model Selection
Hierarchical Models & Bayesian Model Selection Geoffrey Roeder Departments of Computer Science and Statistics University of British Columbia Jan. 20, 2016 Contact information Please report any typos or
More informationEstimation of reliability parameters from Experimental data (Parte 2) Prof. Enrico Zio
Estimation of reliability parameters from Experimental data (Parte 2) This lecture Life test (t 1,t 2,...,t n ) Estimate θ of f T t θ For example: λ of f T (t)= λe - λt Classical approach (frequentist
More informationNonparameteric Regression:
Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,
More informationMLE/MAP + Naïve Bayes
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University MLE/MAP + Naïve Bayes Matt Gormley Lecture 19 March 20, 2018 1 Midterm Exam Reminders
More informationDiscrete Binary Distributions
Discrete Binary Distributions Carl Edward Rasmussen November th, 26 Carl Edward Rasmussen Discrete Binary Distributions November th, 26 / 5 Key concepts Bernoulli: probabilities over binary variables Binomial:
More informationBayesian Hypotheses Testing
Bayesian Hypotheses Testing Jakub Repický Faculty of Mathematics and Physics, Charles University Institute of Computer Science, Czech Academy of Sciences Selected Parts of Data Mining Jan 19 2018, Prague
More informationCS 361: Probability & Statistics
March 14, 2018 CS 361: Probability & Statistics Inference The prior From Bayes rule, we know that we can express our function of interest as Likelihood Prior Posterior The right hand side contains the
More informationINTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP
INTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP Personal Healthcare Revolution Electronic health records (CFH) Personal genomics (DeCode, Navigenics, 23andMe) X-prize: first $10k human genome technology
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University August 30, 2017 Today: Decision trees Overfitting The Big Picture Coming soon Probabilistic learning MLE,
More informationIntroduction to Probability and Statistics (Continued)
Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:
More informationMathematics for Inference and Machine Learning
LECTURE NOTES IMPERIAL COLLEGE LONDON DEPARTMENT OF COMPUTING Mathematics for Inference and Machine Learning Instructors: Marc Deisenroth, Stefanos Zafeiriou Version: January 3, 207 Contents Linear Regression
More informationCS839: Probabilistic Graphical Models. Lecture 7: Learning Fully Observed BNs. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 7: Learning Fully Observed BNs Theo Rekatsinas 1 Exponential family: a basic building block For a numeric random variable X p(x ) =h(x)exp T T (x) A( ) = 1
More informationNaïve Bayes Introduction to Machine Learning. Matt Gormley Lecture 18 Oct. 31, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Naïve Bayes Matt Gormley Lecture 18 Oct. 31, 2018 1 Reminders Homework 6: PAC Learning
More informationOverview of Course. Nevin L. Zhang (HKUST) Bayesian Networks Fall / 58
Overview of Course So far, we have studied The concept of Bayesian network Independence and Separation in Bayesian networks Inference in Bayesian networks The rest of the course: Data analysis using Bayesian
More informationIntroduction to Machine Learning
Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574
More informationA.I. in health informatics lecture 2 clinical reasoning & probabilistic inference, I. kevin small & byron wallace
A.I. in health informatics lecture 2 clinical reasoning & probabilistic inference, I kevin small & byron wallace today a review of probability random variables, maximum likelihood, etc. crucial for clinical
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationParametric Models: from data to models
Parametric Models: from data to models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Jan 22, 2018 Recall: Model-based ML DATA MODEL LEARNING MODEL MODEL INFERENCE KNOWLEDGE Learning:
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationHomework 5: Conditional Probability Models
Homework 5: Conditional Probability Models Instructions: Your answers to the questions below, including plots and mathematical work, should be submitted as a single PDF file. It s preferred that you write
More informationSome Probability and Statistics
Some Probability and Statistics David M. Blei COS424 Princeton University February 13, 2012 Card problem There are three cards Red/Red Red/Black Black/Black I go through the following process. Close my
More informationIntroduction to Probability and Statistics (Continued)
Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:
More information