Lecture 11: Probability Distributions and Parameter Estimation


1 Intelligent Data Analysis and Probabilistic Inference
Lecture 11: Probability Distributions and Parameter Estimation
Recommended reading: Bishop: Chapters 1.2, , Appendix B
Duncan Gillies and Marc Deisenroth
Department of Computing, Imperial College London
February 10, 2016

2-4 Key Concepts in Probability Theory

Two fundamental rules:
$p(x) = \int p(x, y) \, dy$   (sum rule / marginalization property)
$p(x, y) = p(y \mid x) \, p(x)$   (product rule)

Bayes' Theorem (probabilistic inverse):
$p(x \mid y) = \dfrac{p(y \mid x) \, p(x)}{p(y)}$,   $x$: hypothesis, $y$: measurement

$p(x \mid y)$: posterior belief
$p(x)$: prior belief in $x$
$p(y \mid x)$: likelihood (measurement model)
$p(y)$: marginal likelihood (normalization constant)

5-9 Mean and (Co)Variance

Mean and covariance are often useful to describe properties of probability distributions (expected values and spread).

Definition:
$\mathbb{E}_x[x] = \int x \, p(x) \, dx =: \mu$
$\mathbb{V}_x[x] = \mathbb{E}_x[(x - \mu)(x - \mu)^\top] = \mathbb{E}_x[x x^\top] - \mathbb{E}_x[x]\,\mathbb{E}_x[x]^\top =: \Sigma$
$\mathrm{Cov}[x, y] = \mathbb{E}_{x,y}[x y^\top] - \mathbb{E}_x[x]\,\mathbb{E}_y[y]^\top$

Linear/affine transformations: $y = Ax + b$, where $\mathbb{E}_x[x] = \mu$, $\mathbb{V}_x[x] = \Sigma$:
$\mathbb{E}[y] = \mathbb{E}_x[Ax + b] = A\,\mathbb{E}_x[x] + b = A\mu + b$
$\mathbb{V}[y] = \mathbb{V}_x[Ax + b] = \mathbb{V}_x[Ax] = A\,\mathbb{V}_x[x]\,A^\top = A \Sigma A^\top$

If $x, y$ are independent: $\mathbb{V}_{x,y}[x + y] = \mathbb{V}_x[x] + \mathbb{V}_y[y]$
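
The affine-transformation identities are easy to sanity-check by Monte Carlo. A minimal sketch in Python (all numbers are illustrative assumptions, not from the slides):

```python
# Monte Carlo check of E[Ax + b] = A mu + b and V[Ax + b] = A Sigma A^T.
# mu, Sigma, A, b are made-up illustrative values.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
b = np.array([0.5, -1.0])

x = rng.multivariate_normal(mu, Sigma, size=200_000)  # E[x] = mu, V[x] = Sigma
y = x @ A.T + b                                       # y = Ax + b, row-wise

print(np.allclose(y.mean(axis=0), A @ mu + b, atol=1e-2))
print(np.allclose(np.cov(y, rowvar=False), A @ Sigma @ A.T, atol=1e-1))
```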

10 Basic Probability Distributions

11-13 The Gaussian Distribution

$p(x \mid \mu, \Sigma) = (2\pi)^{-D/2} \, |\Sigma|^{-1/2} \exp\!\left(-\tfrac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right)$

Mean vector $\mu$: average of the data
Covariance matrix $\Sigma$: spread of the data

[Figures: a 1D Gaussian density $p(x)$ and a 2D Gaussian, each showing data, the mean, and the 95% confidence bound.]
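
The density formula can be evaluated directly and checked against scipy; a small sketch with assumed values for $\mu$, $\Sigma$, and $x$:

```python
# Direct evaluation of the multivariate Gaussian density, checked against scipy.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])
x = np.array([0.5, 0.5])

D = mu.size
diff = x - mu
quad = diff @ np.linalg.solve(Sigma, diff)  # (x - mu)^T Sigma^{-1} (x - mu)
p = (2 * np.pi) ** (-D / 2) * np.linalg.det(Sigma) ** (-0.5) * np.exp(-0.5 * quad)

print(np.isclose(p, multivariate_normal(mu, Sigma).pdf(x)))  # True
```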

14-16 Conditional

$p(x, y) = \mathcal{N}\!\left( \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix} \right)$

$p(x \mid y) = \mathcal{N}(\mu_{x|y}, \Sigma_{x|y})$
$\mu_{x|y} = \mu_x + \Sigma_{xy} \Sigma_{yy}^{-1} (y - \mu_y)$
$\Sigma_{x|y} = \Sigma_{xx} - \Sigma_{xy} \Sigma_{yy}^{-1} \Sigma_{yx}$

The conditional $p(x \mid y)$ is also Gaussian, which is computationally convenient.

[Figure: the joint $p(x, y)$, an observation of $y$, and the resulting conditional $p(x \mid y)$.]
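
The conditioning formulas translate directly into code; a sketch with assumed block values (scalar $x$ and $y$ for readability, and a hypothetical observation):

```python
# Gaussian conditioning p(x | y) for an assumed joint over scalar x and y.
import numpy as np

mu_x, mu_y = np.array([0.0]), np.array([1.0])
S_xx = np.array([[2.0]])
S_xy = np.array([[0.9]])
S_yx = S_xy.T
S_yy = np.array([[1.5]])

y_obs = np.array([2.0])  # hypothetical observation of y

mu_cond = mu_x + S_xy @ np.linalg.solve(S_yy, y_obs - mu_y)  # mu_{x|y}
S_cond = S_xx - S_xy @ np.linalg.solve(S_yy, S_yx)           # Sigma_{x|y}
print(mu_cond, S_cond)  # mean shifts toward the observation; variance shrinks
```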

17-18 Marginal

$p(x, y) = \mathcal{N}\!\left( \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix} \right)$

Marginal distribution:
$p(x) = \int p(x, y) \, dy = \mathcal{N}(\mu_x, \Sigma_{xx})$

The marginal of a joint Gaussian distribution is Gaussian.
Intuitively: ignore (integrate out) everything you are not interested in.

[Figure: the joint $p(x, y)$ and the marginal $p(x)$.]
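
In code, marginalizing a joint Gaussian amounts to reading off sub-blocks of the mean and covariance; a sketch with an assumed 3D joint:

```python
# Marginal p(x) = N(mu_x, Sigma_xx): select the x-block of the joint.
import numpy as np

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.3, 0.5],
                  [0.3, 1.0, 0.2],
                  [0.5, 0.2, 1.5]])

idx = [0, 1]  # keep x = (x_1, x_2); integrate out the remaining variable
mu_x = mu[idx]
Sigma_xx = Sigma[np.ix_(idx, idx)]
print(mu_x)
print(Sigma_xx)
```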

19-20 Bernoulli Distribution

Distribution for a single binary variable $x \in \{0, 1\}$.
Governed by a single continuous parameter $\mu \in [0, 1]$ that represents the probability of $x = 1$.

$p(x \mid \mu) = \mu^x (1 - \mu)^{1 - x}$
$\mathbb{E}[x] = \mu$,  $\mathbb{V}[x] = \mu(1 - \mu)$

Example: the result of flipping a coin.

21 Beta Distribution

Distribution over a continuous variable $\mu \in [0, 1]$, which is often used to represent the probability for some binary event (see Bernoulli distribution).
Governed by two parameters $\alpha > 0$, $\beta > 0$:

$p(\mu \mid \alpha, \beta) = \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \, \mu^{\alpha - 1} (1 - \mu)^{\beta - 1}$
$\mathbb{E}[\mu] = \dfrac{\alpha}{\alpha + \beta}$,  $\mathbb{V}[\mu] = \dfrac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$
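
The moment formulas can be verified against scipy's Beta implementation; $\alpha$ and $\beta$ below are arbitrary assumed values:

```python
# Checking the Beta mean/variance formulas against scipy.stats.beta.
from scipy.stats import beta

a, b = 2.0, 5.0  # alpha, beta (assumed)
dist = beta(a, b)
print(dist.mean(), a / (a + b))                          # E[mu]
print(dist.var(), a * b / ((a + b) ** 2 * (a + b + 1)))  # V[mu]
```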

22-23 Binomial Distribution

Generalization of the Bernoulli distribution to a distribution over integers.
Probability of observing $m$ occurrences of $x = 1$ in a set of $N$ samples from a Bernoulli distribution, where $p(x = 1) = \mu \in [0, 1]$:

$p(m \mid N, \mu) = \dbinom{N}{m} \mu^m (1 - \mu)^{N - m}$
$\mathbb{E}[m] = N\mu$,  $\mathbb{V}[m] = N\mu(1 - \mu)$

Example: What is the probability of observing $m$ heads in $N$ experiments if the probability of observing heads in a single experiment is $\mu$?

[Figure: a Binomial probability mass function.]
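
The coin-flip question can be answered directly from the pmf; a sketch with assumed $N$, $\mu$, and $m$:

```python
# Probability of observing m heads in N flips of a mu-coin.
from scipy.stats import binom

N, mu = 10, 0.3
m = 4
print(binom.pmf(m, N, mu))        # p(m | N, mu) = C(N, m) mu^m (1 - mu)^(N - m)
print(binom.mean(N, mu), N * mu)  # E[m] = N mu
```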

24 Gamma Distribution

Distribution over positive real numbers $\tau > 0$.
Governed by parameters $a > 0$ (shape) and $b > 0$ (rate, i.e., inverse scale):

$p(\tau \mid a, b) = \dfrac{1}{\Gamma(a)} \, b^a \tau^{a - 1} \exp(-b\tau)$
$\mathbb{E}[\tau] = \dfrac{a}{b}$,  $\mathbb{V}[\tau] = \dfrac{a}{b^2}$
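
One caveat when experimenting: scipy parameterizes the Gamma by a scale, so the rate $b$ above corresponds to scale $= 1/b$. A sketch with assumed $a$, $b$:

```python
# scipy.stats.gamma uses shape a and scale = 1/b for the rate b used above.
from scipy.stats import gamma

a, b = 3.0, 2.0  # shape and rate (assumed)
dist = gamma(a, scale=1.0 / b)
print(dist.mean(), a / b)      # E[tau] = a / b
print(dist.var(), a / b ** 2)  # V[tau] = a / b^2
```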

25-27 Conjugate Priors

Posterior $\propto$ prior $\times$ likelihood
Specification of the prior can be tricky.
Some priors are (computationally) convenient.
If the posterior and the prior are of the same type (e.g., Beta), the prior is called conjugate. The likelihood is also involved...

Examples:

Conjugate prior          Likelihood    Posterior
Beta                     Bernoulli     Beta
Gaussian-inverse-Gamma   Gaussian      Gaussian-inverse-Gamma
Dirichlet                Multinomial   Dirichlet

28-34 Example

Consider a Binomial random variable $x \sim \mathrm{Bin}(m \mid N, \mu)$ where
$p(x \mid \mu) = \dbinom{N}{m} \mu^m (1 - \mu)^{N - m} \propto \mu^a (1 - \mu)^b$
for some constants $a, b$.

We place a Beta prior on the parameter $\mu$:
$\mathrm{Beta}(\mu \mid \alpha, \beta) = \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \, \mu^{\alpha - 1} (1 - \mu)^{\beta - 1} \propto \mu^{\alpha - 1} (1 - \mu)^{\beta - 1}$

If we now observe some outcomes $x = (x_1, \ldots, x_n)$ of a repeated coin-flip experiment with $h$ heads and $t$ tails, we compute the posterior distribution on $\mu$:

$p(\mu \mid x) \propto p(x \mid \mu) \, p(\mu \mid \alpha, \beta)$
$= \mu^h (1 - \mu)^t \, \mu^{\alpha - 1} (1 - \mu)^{\beta - 1}$
$= \mu^{h + \alpha - 1} (1 - \mu)^{t + \beta - 1}$
$\propto \mathrm{Beta}(h + \alpha, t + \beta)$
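
The derivation above reduces Bayesian updating to counting heads and tails; a minimal sketch with assumed prior parameters and simulated coin flips:

```python
# Conjugate Beta update: the posterior is Beta(h + alpha, t + beta).
import numpy as np
from scipy.stats import beta

alpha0, beta0 = 2.0, 2.0           # assumed prior Beta(2, 2)
rng = np.random.default_rng(1)
x = rng.binomial(1, 0.7, size=50)  # simulated flips with true mu = 0.7
h = int(x.sum())
t = len(x) - h

posterior = beta(alpha0 + h, beta0 + t)
print(posterior.mean())  # posterior mean of mu, pulled toward 0.7 as data grows
```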

35 Parameter Estimation

36-37 Parameter Estimation

Given a data set, we want to obtain good estimates of the parameters of the model that may have generated the data: the parameter estimation problem.

Example: $x_1, \ldots, x_N \in \mathbb{R}^D$ are i.i.d. samples from a Gaussian. Find the mean and covariance of $p(x)$.

[Figure: a regression-style data set with inputs $x$ and targets $t$.]

38-42 Maximum Likelihood Parameter Estimation (1)

Maximum likelihood estimation finds a point estimate of the parameters that maximizes the likelihood of the parameters.

In our Gaussian example, we seek $\mu, \Sigma$:

$\max_{\mu, \Sigma} \; p(x_1, \ldots, x_N \mid \mu, \Sigma) \stackrel{\text{i.i.d.}}{=} \max_{\mu, \Sigma} \prod_{i=1}^N p(x_i \mid \mu, \Sigma)$
$= \max_{\mu, \Sigma} \sum_{i=1}^N \log p(x_i \mid \mu, \Sigma)$
$= \max_{\mu, \Sigma} \; -\dfrac{ND}{2} \log(2\pi) - \dfrac{N}{2} \log|\Sigma| - \dfrac{1}{2} \sum_{i=1}^N (x_i - \mu)^\top \Sigma^{-1} (x_i - \mu)$

43-45 Maximum Likelihood Parameter Estimation (2)

$\mu_{\text{ML}} = \arg\max_{\mu} \; -\dfrac{ND}{2}\log(2\pi) - \dfrac{N}{2}\log|\Sigma| - \dfrac{1}{2}\sum_{i=1}^N (x_i - \mu)^\top \Sigma^{-1} (x_i - \mu) = \dfrac{1}{N}\sum_{i=1}^N x_i$

$\Sigma_{\text{ML}} = \arg\max_{\Sigma} \; -\dfrac{ND}{2}\log(2\pi) - \dfrac{N}{2}\log|\Sigma| - \dfrac{1}{2}\sum_{i=1}^N (x_i - \mu)^\top \Sigma^{-1} (x_i - \mu) = \dfrac{1}{N}\sum_{i=1}^N (x_i - \mu_{\text{ML}})(x_i - \mu_{\text{ML}})^\top$

The ML estimate $\Sigma_{\text{ML}}$ is biased, but we can get an unbiased estimate as
$\tilde{\Sigma} = \dfrac{1}{N-1}\sum_{i=1}^N (x_i - \mu_{\text{ML}})(x_i - \mu_{\text{ML}})^\top$
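
Both estimators are one-liners in practice; a sketch on data simulated from an assumed ground-truth Gaussian:

```python
# Gaussian MLE for mu and Sigma, plus the unbiased 1/(N-1) correction.
import numpy as np

rng = np.random.default_rng(2)
mu_true = np.array([1.0, -1.0])
Sigma_true = np.array([[1.0, 0.4],
                       [0.4, 2.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=500)
N = X.shape[0]

mu_ml = X.mean(axis=0)
diff = X - mu_ml
Sigma_ml = diff.T @ diff / N              # biased ML estimate (1/N)
Sigma_unbiased = diff.T @ diff / (N - 1)  # unbiased estimate (1/(N-1))

print(np.allclose(Sigma_unbiased, np.cov(X, rowvar=False)))  # np.cov uses 1/(N-1)
```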

46 MLE: Properties

Asymptotic consistency: the MLE converges to the true value in the limit of infinitely many observations, plus a random error that is approximately normal.
The sample size necessary to achieve these properties can be quite large.
The error's variance decays as $1/N$, where $N$ is the number of data points.
Especially in the small-data regime, MLE can lead to overfitting.

47 Example: MLE in the Small-Data Regime

[Figure: the true Gaussian and its true mean vs. the MLE mean estimated from few samples.]

48-51 Maximum A Posteriori Estimation

Instead of maximizing the likelihood, we can seek parameters that maximize the posterior distribution of the parameters:
$\theta_{\text{MAP}} = \arg\max_{\theta} \; p(\theta \mid x) = \arg\max_{\theta} \; \left[ \log p(\theta) + \log p(x \mid \theta) \right]$

This is MLE with an additional regularizer that comes from the prior: the MAP estimator.

Example: Estimate the mean $\mu$ of a 1D Gaussian with known variance $\sigma^2$ after having observed $N$ data points $x_i$. A Gaussian prior $p(\mu) = \mathcal{N}(\mu \mid m, s^2)$ on the mean yields
$\mu_{\text{MAP}} = \dfrac{N s^2}{N s^2 + \sigma^2} \mu_{\text{ML}} + \dfrac{\sigma^2}{N s^2 + \sigma^2} m$
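
The closed-form MAP estimate is a weighted average and is simple to compute; a sketch with assumed prior, noise variance, and simulated data:

```python
# MAP estimate of a 1D Gaussian mean with known variance sigma^2.
import numpy as np

rng = np.random.default_rng(3)
m, s2 = 0.0, 0.5       # prior N(m, s^2) on mu (assumed)
sigma2 = 1.0           # known data variance (assumed)
mu_true, N = 2.0, 10
x = rng.normal(mu_true, np.sqrt(sigma2), size=N)

mu_ml = x.mean()
w = N * s2 / (N * s2 + sigma2)
mu_map = w * mu_ml + (1 - w) * m
print(mu_ml, mu_map)   # MAP is shrunk from the sample mean toward the prior mean m
```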

52-55 Interpreting the Result

$\mu_{\text{MAP}} = \dfrac{N s^2}{N s^2 + \sigma^2} \mu_{\text{ML}} + \dfrac{\sigma^2}{N s^2 + \sigma^2} m$

Linear interpolation between the prior mean and the sample mean (ML estimate), weighted by their respective variances.
The more data we have seen ($N$), the more we believe the sample mean.
The higher the variance $s^2$ of the prior, the more we believe the sample mean; $\text{MAP} \xrightarrow{s^2 \to \infty} \text{MLE}$.
The higher the variance $\sigma^2$ of the data, the less we believe the sample mean.

56 Example

[Figure: the true Gaussian, the mean prior, and the true, MLE, and MAP means.]

57-58 Bayesian Inference (Marginalization)

An even better idea than MAP estimation: instead of estimating a parameter, integrate it out (according to the posterior) when making predictions:
$p(x \mid \mathcal{D}) = \int p(x \mid \theta) \, p(\theta \mid \mathcal{D}) \, d\theta$
where $p(\theta \mid \mathcal{D})$ is the parameter posterior.

This integral is often tricky to solve. Choose appropriate distributions (e.g., conjugate distributions) or solve approximately (e.g., sampling or variational inference).
Works well (even in the small-data regime) and is robust to overfitting.
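
In the conjugate Beta-Bernoulli case the predictive integral is tractable in closed form; a sketch with assumed prior parameters and observed counts:

```python
# Posterior predictive for the Beta-Bernoulli model: integrating out mu gives
# p(x = 1 | D) = (h + alpha) / (h + t + alpha + beta).
alpha, beta_ = 2.0, 2.0  # assumed prior
h, t = 7, 3              # assumed observed heads/tails

p_pred = (h + alpha) / (h + t + alpha + beta_)  # exact Bayesian prediction
p_mle = h / (h + t)                             # plug-in MLE for comparison
print(p_pred, p_mle)
```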

59-62 Example: Linear Regression

[Figures adapted from PRML (Bishop, 2006): a sequence of Bayesian linear-regression fits. Blue: data. Black: true function (unknown). Red: posterior mean (MAP estimate). Red-shaded: 95% confidence area of the prediction.]

63 References

[1] C. M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer-Verlag, 2006.
