CS 540: Machine Learning Lecture 2: Review of Probability & Statistics

Size: px
Start display at page:

Download "CS 540: Machine Learning Lecture 2: Review of Probability & Statistics"

Transcription

1 CS 540: Machine Learning Lecture 2: Review of Probability & Statistics AD January 2008 AD () January / 35

2 Outline Probability theory (PRML, Section 1.2) Statistics (PRML, Sections ) AD () January / 35

3 Probability Basics Begin with a set Ω-the sample space; e.g. 6 possible rolls of a die. ω 2 Ω is a sample point/possible event. A probability space or probability model is a sample space with an assignment P (ω) for every ω 2 Ω s.t. 0 P (ω) 1 and P (ω) = 1; ω2ω e.g. for a die P (1) = P (2) = P (3) = P (4) = P (5) = P (6) = 1/6. An event A is any subset of Ω P (A) = P (ω) ω2a e.g. P (die roll < 3) = P (1) + P (2) = 1/6 + 1/6 = 1/3. AD () January / 35

4 Random Variables A random variable is loosely speaking a function from sample points to some range; e.g. the reals. P induces a probability distribution for any r.v. X : P (X = x) = P (ω) fω2ω:x (ω)=x g e.g. P (Odd = true) = P (1) + P (3) + P (5) = 1/6 + 1/6 + 1/6 = 1/2. AD () January / 35

5 Why use probability? Probability theory is nothing but common sense reduced to calculation Pierre Laplace, In 1931, de Finetti proved that it is irrational to have beliefs that violate these axioms, in the following sense: If you bet in accordance with your beliefs, but your beliefs violate the axioms, then you can be guaranteed to lose money to an opponent whose beliefs more accurately re ect the true state of the world. (Here, betting and money are proxies for decision making and utilities.) What if you refuse to bet? This is like refusing to allow time to pass: every action (including inaction) is a bet. Various other arguments (Cox, Savage). AD () January / 35

6 Probability for continuous variables We typically do not work with probability distributions but with densities. We have so We have Z P (X 2 A) = Z Ω A p X (x) dx. p X (x) dx = 1 P (X 2 [x, x + δx]) p X (x) δx Warning: A density evaluated at a point can be bigger than 1. AD () January / 35

7 Univariate Gaussian (Normal) density If X N µ, σ 2 then the pdf of X is de ned as p X (x) =! 1 p exp (x µ) 2 2πσ 2 2σ 2 We often use the precision λ = 1/σ 2 instead of the variance. AD () January / 35

8 Statistics of a distribution Recall that the mean and variance of a distribution are de ned as Z E (X ) = xp X (x) dx V (X ) = E (X E (X )) 2 = Z (x E (X )) 2 p X (x) dx For a Gaussian distribution, we have Chebyshev s inequality E (X ) = µ, V (X ) = σ 2. P (jx E (X )j ε) V (X ) ε 2. AD () January / 35

9 Cumulative distribution function The cumulative distribution function is de ned as We have F X (x) = P (X x) = Z x lim X (x) x! = 0, lim X (x) x! = 1. p X (u) du AD () January / 35

10 Law of large numbers Assume you have X i i.i.d. p X () then n Z 1 lim n! n ϕ (X i ) = E [ϕ (X )] = ϕ (x) p X (x) dx The result is still valid for weakly dependent random variables. AD () January / 35

11 Central limit theorem i.i.d. Let X i p X () such that E [X i ] = µ and V (X i ) = σ 2 then! p n 1 D n n X i µ! N 0, σ 2. This result is (too) often used to justify the Gaussian assumption AD () January / 35

12 Delta method Assume we have! p n 1 n n X i µ D! N 0, σ 2 Then we have! p n 1 n g n X i g (µ)! D! N 0, g 0 (µ) 2 σ 2 This follows from the Taylor s expansion g! n 1 n X i g (µ) + g 0 (µ) 1 n! n X i µ + o 1 n AD () January / 35

13 Prior probability Consider the following random variables Sick2 fyes,nog and Weather2 fsunny,rain,cloudy,snowg We can set a prior probability on these random variables; e.g. P (Sick = yes) = 1 P (Sick = no) = 0.1, P (Weather) = (0.72, 0.1, 0.08, 0.1) Obviously we cannot answer questions like P (Sick = yesj Weather = rainy) as long as we don t de ne an appropriate distribution. AD () January / 35

14 Joint probability distributions We need to assign a probability to each sample point For (Weather, Sick), we have to consider 4 2 events; e.g. Weather / Sick sunny rainy cloudy snow yes no From this probability distribution, we can compute probabilities of the form P (Sick = yesj Weather = rainy). AD () January / 35

15 Bayes rule We have It follows that P (x, y) = P (xj y) P (y) = P (yj x) P (x) P (xj y) = = P (x, y) P (y) P (yj x) P (x) x P (yj x) P (x) We have P (Sick = yesj Weather = rainy) = = P (Sick = yes,weather = rainy) P (Weather = rainy) = 0.2 AD () January / 35

16 Example Useful for assessing diagnostic prob from causal prob P (Causej E ect) = P (E ectj Cause) P (Cause) P (E ect) Let M be meningitis, S be sti neck: P (mj s) = P (sj m) P (m) P (s) = = Be careful to subtle exchanging of P (Aj B) for P (Bj A). AD () January / 35

17 Example Prosecutor s Fallacy. A zealous prosecutor has collected an evidence and has an expert testify that the probability of nding this evidence if the accused were innocent is one-in-a-million. The prosecutor concludes that the probability of the accused being innocent is one-in-a-million. This is WRONG. AD () January / 35

18 Assume no other evidence is available and the population is of 10 million people. De ning A = The accused is guilty then P (A) = De ning B = Finding this evidence then P (Bj A) = 1 & P Bj A = Bayes formula yields P (Aj B) = = P (Bj A) P (A) P (Bj A) P (A) + P Bj A P A ( ) 0.1. Real-life Example: Sally Clark was condemned in the UK (The RSS pointed out the mistake). Her conviction was eventually quashed (on other grounds). AD () January / 35

19 You feel sick and your GP thinks you might have contracted a rare disease (0.01% of the population has the disease). A test is available but not perfect. If a tested patient has the disease, 100% of the time the test will be positive. If a tested patient does not have the diseases, 95% of the time the test will be negative (5% false positive). Your test is positive, should you really care? AD () January / 35

20 Let A be the event that the patient has the disease and B be the event that the test returns a positive result P (Aj B) = Such a test would be a complete waste of money for you or the National Health System. A similar question was asked to 60 students and sta at Harvard Medical School: 18% got the right answer, the modal response was 95%! AD () January / 35

21 General Bayesian framework Assume you have some unknown parameter θ and some data D. The unknown parameters are now considered RANDOM of prior density p (θ) The likelihood of the observation D is p (Dj θ). The posterior is given by p ( θj D) = p (Dj θ) p (θ) p (D) where Z p (D) = p (Dj θ) p (θ) dθ p (D) is called the marginal likelihood or evidence. AD () January / 35

22 Bayesian interpretation of probability In the Bayesian approach, probability describes degrees of belief, instead of limiting frequencies. The selection of a prior has an obvious impact on the inference results! However, Bayesian statisticians are honest about it. AD () January / 35

23 Bayesian inference involves passing from a prior p (θ) to a posterior p ( θj D).We might expect that because the posterior incorporates the information from the data, it will be less variable than the prior. We have the following identities E [θ] = E [E [ θj D]], V [θ] = E [V [ θj D]] + V [E [ θj D]]. It means that, on average (over the realizations of the data X ) we expect the conditional expectation E [ θj D] to be equal to E [θ] and the posterior variance to be on average smaller than the prior variance by an amount that depend on the variations in posterior means over the distribution of possible data. AD () January / 35

24 If (θ, X ) are two scalar random variables then we have Proof: V [θ] = E [V [ θj D]] + V [E [ θj D]]. V [θ] = E θ 2 E (θ) 2 = E E θ 2 D (E (E ( θj D))) 2 = E E θ 2 D E (E ( θj D)) 2 +E (E ( θj D)) 2 (E (E ( θj D))) 2 = E (V ( θj X )) + V (E ( θj X )). Such results appear attractive but one should be careful. Here there is an underlying assumption that the observations are indeed distributed according to p (D). AD () January / 35

25 Frequentist interpretation of probability Probabilities are objective properties of the real world, and refer to limiting relative frequencies (e.g. number of times I have observed heads.). Problem. How do you attribute a probability to the following event There will be a major earthquake in Tokyo on the 27th April 2013? Hence in most cases θ is assumed unknown but xed. We then typically compute point estimates of the form bθ = ϕ (D) which are designed to have various desirable quantities when averaged over future data D 0 (assumed to be drawn from the true distribution). AD () January / 35

26 Maximum Likelihood Estimation Suppose we have X i i.i.d. p ( j θ) for i = 1,..., N then The MLE is de ned as L (θ) = p (Dj θ) = N bθ = arg max L (θ) = arg max l (θ) p (x i j θ). where l (θ) = log L (θ) = N log p (x i j θ). AD () January / 35

27 MLE for 1D Gaussian Consider X i i.i.d. N µ, σ 2 then so p Dj µ, σ 2 = N N x i ; µ, σ 2 l (θ) = N N 2 log (2π) N 2 log σ2 1 2σ 2 (x i µ) 2 Setting l(θ) µ = l(θ) = 0 yields σ 2 bµ ML = 1 N N x i, N cσ 2 ML = 1 (x i bµ N ML ) 2. h i We have E [bµ ML ] = µ and E cσ 2 ML = N N 1 σ2. AD () January / 35

28 Consistent estimators De nition. A sequence of estimators bθ N = bθ N (X 1,..., X N ) is consistent for the parameter θ if, for every ε > 0 and every θ 2 Θ lim P θ bθ N θ < ε = 1 (equivalently lim P θ bθ N θ ε = 0). N! N! i.i.d. Example: Consider X i N (µ, 1) and bµ N = 1 N N X i then bµ N N µ, N 1 and P θ bθ N θ < ε = Z ε p N ε p N 1 p 2π exp u 2 2 du! 1. It is possible to avoid this calculations and use instead Chebychev s inequality b 2 E P θ bθ N θ ε = P θ bθ n θ 2 θ θ n θ ε 2 where E θ b θ n 2 2 θ = var θ b θ n + E θ b θ n θ. AD () January / 35 ε 2

29 Example of inconsistent MLE (Fisher) (X i, Y i ) N µi µ i The likelihood function is given by L (θ) = We obtain 1 (2πσ 2 ) n exp l (θ) = cste n log σ 2 " n 1 xi + y i 2σ We have bµ i = x i + y i 2 σ 2 0, 0 σ 2. n 1 h 2σ 2 (x i µ i ) 2 + (y i µ i ) 2i! µ i , c σ 2 = n (x i y i ) 2 4n # n (x i y i ) 2.! σ2 2. AD () January / 35

30 Consistency of the MLE Kullback-Leibler Distance: For any density f, g Z f (x) D (f, g) = f (x) log dx g (x) We have Indeed Z D (f, g) = D (f, g) 0 and D (f, f ) = 0. f (x) log Z g (x) dx f (x) g (x) f (x) f (x) 1 dx = 0 D (f, g) is a very useful distance and appears in many di erent contexts. AD () January / 35

31 Sketch of consistency of the MLE Assume the pdfs f (xj θ) have common support for all θ and p (xj θ) 6= p xj θ 0 for θ 6= θ 0 ; i.e. S θ = fx : p (xj θ) > 0g is independent of θ. Denote M N (θ) = 1 N N log p (X i j θ) p (X i j θ ) As the MLE bθ n maximizes L (θ), it also maximizes M N (θ). i.i.d. Assume X i p ( j θ ). Note that by the law of large numbers M N (θ) converges to Z p (X j θ) p (xj θ) E θ log = p (xj θ ) log p (X j θ ) p (xj θ ) dx = D (p ( j θ ), p ( j θ)) := M (θ). Hence, M N (θ) D (p ( j θ ), p ( j θ)) which is maximized for θ so we expect that its maximizer will converge towards θ. AD () January / 35

32 Asymptotic Normality Assuming bθ N is a consistent estimate of θ, we have under regularity assumptions p N b θ N θ ) N 0, [I (θ 1 )] where We can estimate I (θ ) through [I (θ)] k,l = 2 log p (xj θ) E θ k θ l [I (θ )] k,l = 1 N N 2 log p (X i j θ) θ k θ l b θ N AD () January / 35

33 Sketch of the proof We have 0 = l 0 bθ N l 0 (θ ) + b θ N ) p N b θ N θ = 1 p N l 0 (θ ) 1 N l 00 (θ ) θ l 00 (θ ) h Now l 0 (θ ) = n log p( X i jθ) θ where E log p( Xi jθ) θ θ θ h i log p( Xi jθ) var θ θ = I (θ ) so the CLT tells us that θ where by the law of large numbers 1 N l 00 (θ ) = 1 p N l 0 (θ ) D! N (0, I (θ )) 1 N N 2 log p (X i j θ) θ k θ l θ! I (θ ). θ i = 0 and AD () January / 35

34 Frequentist con dence intervals The MLE being approximately Gaussian, it is possible to de ne (approximate) con dence intervals. However be careful, θ has a 95% con dence interval of [a, b] does NOT mean that P ( θ 2 [a, b]j D) = It means P D 0 P ( jd ) θ 2 a D 0, b D 0 = 0.95 i.e., if we were to repeat the experiment, then 95% of the time, the true parameter would be in that interval. This does not mean that we believe it is in that interval given the actual data D we have observed. AD () January / 35

35 Bayesian statistics as an unifying approach In my opinion, the Bayesian approach is much simpler and more natural than classical statistics. Once you have de ned a Bayesian model, everything follows naturally. De ning a Bayesian model requires specifying a prior distribution, but this assumption is clearly stated. For many years, Bayesian statistics could not be implemented for all but the simple models but now numerous algorithms have been proposed to perform approximate inference. AD () January / 35

Machine Learning 4771

Machine Learning 4771 Machine Learning 4771 Instructor: Tony Jebara Topic 11 Maximum Likelihood as Bayesian Inference Maximum A Posteriori Bayesian Gaussian Estimation Why Maximum Likelihood? So far, assumed max (log) likelihood

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

Recitation 2: Probability

Recitation 2: Probability Recitation 2: Probability Colin White, Kenny Marino January 23, 2018 Outline Facts about sets Definitions and facts about probability Random Variables and Joint Distributions Characteristics of distributions

More information

Aarti Singh. Lecture 2, January 13, Reading: Bishop: Chap 1,2. Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell

Aarti Singh. Lecture 2, January 13, Reading: Bishop: Chap 1,2. Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell Machine Learning 0-70/5 70/5-78, 78, Spring 00 Probability 0 Aarti Singh Lecture, January 3, 00 f(x) µ x Reading: Bishop: Chap, Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell Announcements Homework

More information

Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric?

Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric? Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric? Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/

More information

CSC321 Lecture 18: Learning Probabilistic Models

CSC321 Lecture 18: Learning Probabilistic Models CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling

More information

COMP90051 Statistical Machine Learning

COMP90051 Statistical Machine Learning COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 2. Statistical Schools Adapted from slides by Ben Rubinstein Statistical Schools of Thought Remainder of lecture is to provide

More information

Theory of Maximum Likelihood Estimation. Konstantin Kashin

Theory of Maximum Likelihood Estimation. Konstantin Kashin Gov 2001 Section 5: Theory of Maximum Likelihood Estimation Konstantin Kashin February 28, 2013 Outline Introduction Likelihood Examples of MLE Variance of MLE Asymptotic Properties What is Statistical

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

More information

Introduction to Machine Learning. Lecture 2

Introduction to Machine Learning. Lecture 2 Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for

More information

CS 188: Artificial Intelligence Spring Today

CS 188: Artificial Intelligence Spring Today CS 188: Artificial Intelligence Spring 2006 Lecture 9: Naïve Bayes 2/14/2006 Dan Klein UC Berkeley Many slides from either Stuart Russell or Andrew Moore Bayes rule Today Expectations and utilities Naïve

More information

Lecture 25: Review. Statistics 104. April 23, Colin Rundel

Lecture 25: Review. Statistics 104. April 23, Colin Rundel Lecture 25: Review Statistics 104 Colin Rundel April 23, 2012 Joint CDF F (x, y) = P [X x, Y y] = P [(X, Y ) lies south-west of the point (x, y)] Y (x,y) X Statistics 104 (Colin Rundel) Lecture 25 April

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet. Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 CS students: don t forget to re-register in CS-535D. Even if you just audit this course, please do register.

More information

Probability, Statistics, and Bayes Theorem Session 3

Probability, Statistics, and Bayes Theorem Session 3 Probability, Statistics, and Bayes Theorem Session 3 1 Introduction Now that we know what Bayes Theorem is, we want to explore some of the ways that it can be used in real-life situations. Often the results

More information

2. A Basic Statistical Toolbox

2. A Basic Statistical Toolbox . A Basic Statistical Toolbo Statistics is a mathematical science pertaining to the collection, analysis, interpretation, and presentation of data. Wikipedia definition Mathematical statistics: concerned

More information

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

GAUSSIAN PROCESS REGRESSION

GAUSSIAN PROCESS REGRESSION GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The

More information

15-388/688 - Practical Data Science: Basic probability. J. Zico Kolter Carnegie Mellon University Spring 2018

15-388/688 - Practical Data Science: Basic probability. J. Zico Kolter Carnegie Mellon University Spring 2018 15-388/688 - Practical Data Science: Basic probability J. Zico Kolter Carnegie Mellon University Spring 2018 1 Announcements Logistics of next few lectures Final project released, proposals/groups due

More information

L2: Review of probability and statistics

L2: Review of probability and statistics Probability L2: Review of probability and statistics Definition of probability Axioms and properties Conditional probability Bayes theorem Random variables Definition of a random variable Cumulative distribution

More information

Bayesian Inference. Introduction

Bayesian Inference. Introduction Bayesian Inference Introduction The frequentist approach to inference holds that probabilities are intrinsicially tied (unsurprisingly) to frequencies. This interpretation is actually quite natural. What,

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

Probability and Information Theory. Sargur N. Srihari

Probability and Information Theory. Sargur N. Srihari Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal

More information

Bayesian decision theory. Nuno Vasconcelos ECE Department, UCSD

Bayesian decision theory. Nuno Vasconcelos ECE Department, UCSD Bayesian decision theory Nuno Vasconcelos ECE Department, UCSD Notation the notation in DHS is quite sloppy e.g. show that ( error = ( error z ( z dz really not clear what this means we will use the following

More information

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting

More information

COMP9414/ 9814/ 3411: Artificial Intelligence. 14. Uncertainty. Russell & Norvig, Chapter 13. UNSW c AIMA, 2004, Alan Blair, 2012

COMP9414/ 9814/ 3411: Artificial Intelligence. 14. Uncertainty. Russell & Norvig, Chapter 13. UNSW c AIMA, 2004, Alan Blair, 2012 COMP9414/ 9814/ 3411: Artificial Intelligence 14. Uncertainty Russell & Norvig, Chapter 13. COMP9414/9814/3411 14s1 Uncertainty 1 Outline Uncertainty Probability Syntax and Semantics Inference Independence

More information

Brief Review of Probability

Brief Review of Probability Brief Review of Probability Nuno Vasconcelos (Ken Kreutz-Delgado) ECE Department, UCSD Probability Probability theory is a mathematical language to deal with processes or experiments that are non-deterministic

More information

Lecture Stat Information Criterion

Lecture Stat Information Criterion Lecture Stat 461-561 Information Criterion Arnaud Doucet February 2008 Arnaud Doucet () February 2008 1 / 34 Review of Maximum Likelihood Approach We have data X i i.i.d. g (x). We model the distribution

More information

Probability Theory for Machine Learning. Chris Cremer September 2015

Probability Theory for Machine Learning. Chris Cremer September 2015 Probability Theory for Machine Learning Chris Cremer September 2015 Outline Motivation Probability Definitions and Rules Probability Distributions MLE for Gaussian Parameter Estimation MLE and Least Squares

More information

V7 Foundations of Probability Theory

V7 Foundations of Probability Theory V7 Foundations of Probability Theory Probability : degree of confidence that an event of an uncertain nature will occur. Events : we will assume that there is an agreed upon space of possible outcomes

More information

Review: Statistical Model

Review: Statistical Model Review: Statistical Model { f θ :θ Ω} A statistical model for some data is a set of distributions, one of which corresponds to the true unknown distribution that produced the data. The statistical model

More information

STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN

STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN Massimo Guidolin Massimo.Guidolin@unibocconi.it Dept. of Finance STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN SECOND PART, LECTURE 2: MODES OF CONVERGENCE AND POINT ESTIMATION Lecture 2:

More information

Uncertainty. Variables. assigns to each sentence numerical degree of belief between 0 and 1. uncertainty

Uncertainty. Variables. assigns to each sentence numerical degree of belief between 0 and 1. uncertainty Bayes Classification n Uncertainty & robability n Baye's rule n Choosing Hypotheses- Maximum a posteriori n Maximum Likelihood - Baye's concept learning n Maximum Likelihood of real valued function n Bayes

More information

Overview. Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation

Overview. Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation Overview Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation Probabilistic Interpretation: Linear Regression Assume output y is generated

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Parameter estimation Conditional risk

Parameter estimation Conditional risk Parameter estimation Conditional risk Formalizing the problem Specify random variables we care about e.g., Commute Time e.g., Heights of buildings in a city We might then pick a particular distribution

More information

ECE531 Lecture 10b: Maximum Likelihood Estimation

ECE531 Lecture 10b: Maximum Likelihood Estimation ECE531 Lecture 10b: Maximum Likelihood Estimation D. Richard Brown III Worcester Polytechnic Institute 05-Apr-2011 Worcester Polytechnic Institute D. Richard Brown III 05-Apr-2011 1 / 23 Introduction So

More information

Choosing among models

Choosing among models Eco 515 Fall 2014 Chris Sims Choosing among models September 18, 2014 c 2014 by Christopher A. Sims. This document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported

More information

DS-GA 1002 Lecture notes 11 Fall Bayesian statistics

DS-GA 1002 Lecture notes 11 Fall Bayesian statistics DS-GA 100 Lecture notes 11 Fall 016 Bayesian statistics In the frequentist paradigm we model the data as realizations from a distribution that depends on deterministic parameters. In contrast, in Bayesian

More information

Grundlagen der Künstlichen Intelligenz

Grundlagen der Künstlichen Intelligenz Grundlagen der Künstlichen Intelligenz Uncertainty & Probabilities & Bandits Daniel Hennes 16.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Uncertainty Probability

More information

Introduction to Machine Learning

Introduction to Machine Learning What does this mean? Outline Contents Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola December 26, 2017 1 Introduction to Probability 1 2 Random Variables 3 3 Bayes

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

Accouncements. You should turn in a PDF and a python file(s) Figure for problem 9 should be in the PDF

Accouncements. You should turn in a PDF and a python file(s) Figure for problem 9 should be in the PDF Accouncements You should turn in a PDF and a python file(s) Figure for problem 9 should be in the PDF Please do not zip these files and submit (unless there are >5 files) 1 Bayesian Methods Machine Learning

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

Introduction to Probability and Statistics (Continued)

Introduction to Probability and Statistics (Continued) Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Probabilities Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: AI systems need to reason about what they know, or not know. Uncertainty may have so many sources:

More information

Combining Macroeconomic Models for Prediction

Combining Macroeconomic Models for Prediction Combining Macroeconomic Models for Prediction John Geweke University of Technology Sydney 15th Australasian Macro Workshop April 8, 2010 Outline 1 Optimal prediction pools 2 Models and data 3 Optimal pools

More information

Probabilistic and Bayesian Machine Learning

Probabilistic and Bayesian Machine Learning Probabilistic and Bayesian Machine Learning Lecture 1: Introduction to Probabilistic Modelling Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Why a

More information

Discrete Probability and State Estimation

Discrete Probability and State Estimation 6.01, Fall Semester, 2007 Lecture 12 Notes 1 MASSACHVSETTS INSTITVTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.01 Introduction to EECS I Fall Semester, 2007 Lecture 12 Notes

More information

Uncertainty. Chapter 13

Uncertainty. Chapter 13 Uncertainty Chapter 13 Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes Rule Uncertainty Let s say you want to get to the airport in time for a flight. Let action A

More information

Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com

Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com 1 School of Oriental and African Studies September 2015 Department of Economics Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com Gujarati D. Basic Econometrics, Appendix

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 32 Lecture 14 : Variational Bayes

More information

An AI-ish view of Probability, Conditional Probability & Bayes Theorem

An AI-ish view of Probability, Conditional Probability & Bayes Theorem An AI-ish view of Probability, Conditional Probability & Bayes Theorem Review: Uncertainty and Truth Values: a mismatch Let action A t = leave for airport t minutes before flight. Will A 15 get me there

More information

10/18/2017. An AI-ish view of Probability, Conditional Probability & Bayes Theorem. Making decisions under uncertainty.

10/18/2017. An AI-ish view of Probability, Conditional Probability & Bayes Theorem. Making decisions under uncertainty. An AI-ish view of Probability, Conditional Probability & Bayes Theorem Review: Uncertainty and Truth Values: a mismatch Let action A t = leave for airport t minutes before flight. Will A 15 get me there

More information

Some slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2

Some slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2 Logistics CSE 446: Point Estimation Winter 2012 PS2 out shortly Dan Weld Some slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2 Last Time Random variables, distributions Marginal, joint & conditional

More information

Session 2B: Some basic simulation methods

Session 2B: Some basic simulation methods Session 2B: Some basic simulation methods John Geweke Bayesian Econometrics and its Applications August 14, 2012 ohn Geweke Bayesian Econometrics and its Applications Session 2B: Some () basic simulation

More information

Lecture Stat Bayesian Statistics

Lecture Stat Bayesian Statistics Lecture Stat 461-561 Bayesian Statistics AD March 2008 AD () March 2008 1 / 89 Introduction The following problem is taken from Pillow Problems by Lewis Carroll Consider a bag containing a ball of unknown

More information

Some Probability and Statistics

Some Probability and Statistics Some Probability and Statistics David M. Blei COS424 Princeton University February 13, 2012 Card problem There are three cards Red/Red Red/Black Black/Black I go through the following process. Close my

More information

Probability and Estimation. Alan Moses

Probability and Estimation. Alan Moses Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

Lecture 2: Basic Concepts of Statistical Decision Theory

Lecture 2: Basic Concepts of Statistical Decision Theory EE378A Statistical Signal Processing Lecture 2-03/31/2016 Lecture 2: Basic Concepts of Statistical Decision Theory Lecturer: Jiantao Jiao, Tsachy Weissman Scribe: John Miller and Aran Nayebi In this lecture

More information

ECE531: Principles of Detection and Estimation Course Introduction

ECE531: Principles of Detection and Estimation Course Introduction ECE531: Principles of Detection and Estimation Course Introduction D. Richard Brown III WPI 22-January-2009 WPI D. Richard Brown III 22-January-2009 1 / 37 Lecture 1 Major Topics 1. Web page. 2. Syllabus

More information

Estimation of reliability parameters from Experimental data (Parte 2) Prof. Enrico Zio

Estimation of reliability parameters from Experimental data (Parte 2) Prof. Enrico Zio Estimation of reliability parameters from Experimental data (Parte 2) This lecture Life test (t 1,t 2,...,t n ) Estimate θ of f T t θ For example: λ of f T (t)= λe - λt Classical approach (frequentist

More information

Parametric Models: from data to models

Parametric Models: from data to models Parametric Models: from data to models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Jan 22, 2018 Recall: Model-based ML DATA MODEL LEARNING MODEL MODEL INFERENCE KNOWLEDGE Learning:

More information

Pengju

Pengju Introduction to AI Chapter13 Uncertainty Pengju Ren@IAIR Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes Rule Example: Car diagnosis Wumpus World Environment Squares

More information

Point Estimation. Vibhav Gogate The University of Texas at Dallas

Point Estimation. Vibhav Gogate The University of Texas at Dallas Point Estimation Vibhav Gogate The University of Texas at Dallas Some slides courtesy of Carlos Guestrin, Chris Bishop, Dan Weld and Luke Zettlemoyer. Basics: Expectation and Variance Binary Variables

More information

Review of probabilities

Review of probabilities CS 1675 Introduction to Machine Learning Lecture 5 Density estimation Milos Hauskrecht milos@pitt.edu 5329 Sennott Square Review of probabilities 1 robability theory Studies and describes random processes

More information

Outline. Uncertainty. Methods for handling uncertainty. Uncertainty. Making decisions under uncertainty. Probability. Uncertainty

Outline. Uncertainty. Methods for handling uncertainty. Uncertainty. Making decisions under uncertainty. Probability. Uncertainty Outline Uncertainty Uncertainty Chapter 13 Probability Syntax and Semantics Inference Independence and ayes Rule Chapter 13 1 Chapter 13 2 Uncertainty et action A t =leaveforairportt minutes before flight

More information

Some Probability and Statistics

Some Probability and Statistics Some Probability and Statistics David M. Blei COS424 Princeton University February 12, 2007 D. Blei ProbStat 01 1 / 42 Who wants to scribe? D. Blei ProbStat 01 2 / 42 Random variable Probability is about

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 Nasser Sadeghkhani a.sadeghkhani@queensu.ca There are two main schools to statistical inference: 1-frequentist

More information

Estimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator

Estimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator Estimation Theory Estimation theory deals with finding numerical values of interesting parameters from given set of data. We start with formulating a family of models that could describe how the data were

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Lecture 2: Review of Basic Probability Theory

Lecture 2: Review of Basic Probability Theory ECE 830 Fall 2010 Statistical Signal Processing instructor: R. Nowak, scribe: R. Nowak Lecture 2: Review of Basic Probability Theory Probabilistic models will be used throughout the course to represent

More information

Bayesian Linear Regression [DRAFT - In Progress]

Bayesian Linear Regression [DRAFT - In Progress] Bayesian Linear Regression [DRAFT - In Progress] David S. Rosenberg Abstract Here we develop some basics of Bayesian linear regression. Most of the calculations for this document come from the basic theory

More information

Lecture 1: Bayesian Framework Basics

Lecture 1: Bayesian Framework Basics Lecture 1: Bayesian Framework Basics Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de April 21, 2014 What is this course about? Building Bayesian machine learning models Performing the inference of

More information

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adopted from Prof. H.R. Rabiee s and also Prof. R. Gutierrez-Osuna

More information

Probability Theory Review

Probability Theory Review Probability Theory Review Brendan O Connor 10-601 Recitation Sept 11 & 12, 2012 1 Mathematical Tools for Machine Learning Probability Theory Linear Algebra Calculus Wikipedia is great reference 2 Probability

More information

Uncertainty. Outline. Probability Syntax and Semantics Inference Independence and Bayes Rule. AIMA2e Chapter 13

Uncertainty. Outline. Probability Syntax and Semantics Inference Independence and Bayes Rule. AIMA2e Chapter 13 Uncertainty AIMA2e Chapter 13 1 Outline Uncertainty Probability Syntax and Semantics Inference Independence and ayes Rule 2 Uncertainty Let action A t = leave for airport t minutes before flight Will A

More information

Discrete Probability and State Estimation

Discrete Probability and State Estimation 6.01, Spring Semester, 2008 Week 12 Course Notes 1 MASSACHVSETTS INSTITVTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.01 Introduction to EECS I Spring Semester, 2008 Week

More information

Parameter Estimation. Industrial AI Lab.

Parameter Estimation. Industrial AI Lab. Parameter Estimation Industrial AI Lab. Generative Model X Y w y = ω T x + ε ε~n(0, σ 2 ) σ 2 2 Maximum Likelihood Estimation (MLE) Estimate parameters θ ω, σ 2 given a generative model Given observed

More information

Review (Probability & Linear Algebra)

Review (Probability & Linear Algebra) Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint

More information

PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality

More information

Primer on statistics:

Primer on statistics: Primer on statistics: MLE, Confidence Intervals, and Hypothesis Testing ryan.reece@gmail.com http://rreece.github.io/ Insight Data Science - AI Fellows Workshop Feb 16, 018 Outline 1. Maximum likelihood

More information

Space Telescope Science Institute statistics mini-course. October Inference I: Estimation, Confidence Intervals, and Tests of Hypotheses

Space Telescope Science Institute statistics mini-course. October Inference I: Estimation, Confidence Intervals, and Tests of Hypotheses Space Telescope Science Institute statistics mini-course October 2011 Inference I: Estimation, Confidence Intervals, and Tests of Hypotheses James L Rosenberger Acknowledgements: Donald Richards, William

More information

Probabilistic Reasoning

Probabilistic Reasoning Probabilistic Reasoning Philipp Koehn 4 April 2017 Outline 1 Uncertainty Probability Inference Independence and Bayes Rule 2 uncertainty Uncertainty 3 Let action A t = leave for airport t minutes before

More information

Probability and statistics; Rehearsal for pattern recognition

Probability and statistics; Rehearsal for pattern recognition Probability and statistics; Rehearsal for pattern recognition Václav Hlaváč Czech Technical University in Prague Czech Institute of Informatics, Robotics and Cybernetics 166 36 Prague 6, Jugoslávských

More information

Time Series and Dynamic Models

Time Series and Dynamic Models Time Series and Dynamic Models Section 1 Intro to Bayesian Inference Carlos M. Carvalho The University of Texas at Austin 1 Outline 1 1. Foundations of Bayesian Statistics 2. Bayesian Estimation 3. The

More information

Probability theory basics

Probability theory basics Probability theory basics Michael Franke Basics of probability theory: axiomatic definition, interpretation, joint distributions, marginalization, conditional probability & Bayes rule. Random variables:

More information

Probability Based Learning

Probability Based Learning Probability Based Learning Lecture 7, DD2431 Machine Learning J. Sullivan, A. Maki September 2013 Advantages of Probability Based Methods Work with sparse training data. More powerful than deterministic

More information

DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling

DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling Due: Tuesday, May 10, 2016, at 6pm (Submit via NYU Classes) Instructions: Your answers to the questions below, including

More information

Refresher on Discrete Probability

Refresher on Discrete Probability Refresher on Discrete Probability STAT 27725/CMSC 25400: Machine Learning Shubhendu Trivedi University of Chicago October 2015 Background Things you should have seen before Events, Event Spaces Probability

More information

Discrete Structures Prelim 1 Selected problems from past exams

Discrete Structures Prelim 1 Selected problems from past exams Discrete Structures Prelim 1 CS2800 Selected problems from past exams 1. True or false (a) { } = (b) Every set is a subset of its power set (c) A set of n events are mutually independent if all pairs of

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning CS4375 --- Fall 2018 Bayesian a Learning Reading: Sections 13.1-13.6, 20.1-20.2, R&N Sections 6.1-6.3, 6.7, 6.9, Mitchell 1 Uncertainty Most real-world problems deal with

More information

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007 Bayesian inference Fredrik Ronquist and Peter Beerli October 3, 2007 1 Introduction The last few decades has seen a growing interest in Bayesian inference, an alternative approach to statistical inference.

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information