CS 540: Machine Learning Lecture 2: Review of Probability & Statistics
1 CS 540: Machine Learning. Lecture 2: Review of Probability & Statistics. AD, January 2008.
2 Outline. Probability theory (PRML, Section 1.2). Statistics (PRML, Sections ).
3 Probability Basics. Begin with a set $\Omega$, the sample space; e.g. the 6 possible rolls of a die. $\omega \in \Omega$ is a sample point/possible event. A probability space or probability model is a sample space with an assignment $P(\omega)$ for every $\omega \in \Omega$ such that $0 \le P(\omega) \le 1$ and $\sum_{\omega \in \Omega} P(\omega) = 1$; e.g. for a die $P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6$. An event $A$ is any subset of $\Omega$, and $P(A) = \sum_{\omega \in A} P(\omega)$; e.g. $P(\text{die roll} < 3) = P(1) + P(2) = 1/6 + 1/6 = 1/3$.
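The die example above can be checked with a short sketch (a toy illustration; the function and event names are mine, not from the slides):

```python
from fractions import Fraction

# Sample space for a fair die: each outcome gets probability 1/6.
omega = {1, 2, 3, 4, 5, 6}
P = {w: Fraction(1, 6) for w in omega}

def prob(event):
    """P(A) = sum of P(omega) over the outcomes in the event A."""
    return sum(P[w] for w in event)

roll_less_than_3 = {w for w in omega if w < 3}
odd = {w for w in omega if w % 2 == 1}

print(prob(roll_less_than_3))  # 1/3
print(prob(odd))               # 1/2
```

Using exact fractions makes the axioms easy to verify: the probabilities sum to exactly 1.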
4 Random Variables. A random variable is, loosely speaking, a function from sample points to some range, e.g. the reals. $P$ induces a probability distribution for any r.v. $X$: $P(X = x) = \sum_{\{\omega \in \Omega : X(\omega) = x\}} P(\omega)$; e.g. $P(\text{Odd} = \text{true}) = P(1) + P(3) + P(5) = 1/6 + 1/6 + 1/6 = 1/2$.
5 Why use probability? "Probability theory is nothing but common sense reduced to calculation" (Pierre Laplace). In 1931, de Finetti proved that it is irrational to have beliefs that violate these axioms, in the following sense: if you bet in accordance with your beliefs, but your beliefs violate the axioms, then you can be guaranteed to lose money to an opponent whose beliefs more accurately reflect the true state of the world. (Here, betting and money are proxies for decision making and utilities.) What if you refuse to bet? This is like refusing to allow time to pass: every action (including inaction) is a bet. Various other arguments (Cox, Savage).
6 Probability for continuous variables. We typically do not work with probability distributions but with densities. We have $P(X \in A) = \int_A p_X(x)\, dx$, so $\int_\Omega p_X(x)\, dx = 1$. We also have $P(X \in [x, x + \delta x]) \approx p_X(x)\, \delta x$. Warning: a density evaluated at a point can be bigger than 1.
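The warning above can be made concrete: a Uniform(0, 0.5) density equals 2 on its support, exceeding 1 at every point, yet it still integrates to 1. A crude Riemann-sum check (my own illustration, not from the slides):

```python
# Uniform(0, 0.5) density: p(x) = 2 on [0, 0.5], 0 elsewhere.
def p(x):
    return 2.0 if 0.0 <= x <= 0.5 else 0.0

# Midpoint Riemann sum of p over [0, 1]; should come out close to 1.
n = 100000
dx = 1.0 / n
total = sum(p((i + 0.5) * dx) * dx for i in range(n))
print(p(0.25), total)  # density value 2.0, integral close to 1.0
```

The density value at a point is not a probability; only integrals of the density are.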
7 Univariate Gaussian (Normal) density. If $X \sim N(\mu, \sigma^2)$ then the pdf of $X$ is defined as $p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$. We often use the precision $\lambda = 1/\sigma^2$ instead of the variance.
8 Statistics of a distribution. Recall that the mean and variance of a distribution are defined as $E(X) = \int x\, p_X(x)\, dx$ and $V(X) = E\left[(X - E(X))^2\right] = \int (x - E(X))^2\, p_X(x)\, dx$. For a Gaussian distribution, we have $E(X) = \mu$, $V(X) = \sigma^2$. Chebyshev's inequality: $P(|X - E(X)| \ge \varepsilon) \le \frac{V(X)}{\varepsilon^2}$.
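Chebyshev's bound is valid for any distribution but is often loose; a quick Monte Carlo check for the Gaussian case (a sketch with parameters of my choosing):

```python
import random

random.seed(0)
mu, sigma, eps = 0.0, 1.0, 2.0
n = 200000
xs = [random.gauss(mu, sigma) for _ in range(n)]

# Empirical P(|X - E(X)| >= eps) versus the Chebyshev bound V(X)/eps^2.
tail = sum(1 for x in xs if abs(x - mu) >= eps) / n
bound = sigma**2 / eps**2
print(tail, bound)  # the empirical tail (roughly 0.05) sits well below 0.25
```

For the Gaussian the true two-sided tail at $2\sigma$ is about 0.0455, far below the bound of 0.25; Chebyshev trades tightness for generality.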
9 Cumulative distribution function. The cumulative distribution function is defined as $F_X(x) = P(X \le x) = \int_{-\infty}^{x} p_X(u)\, du$. We have $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$.
10 Law of large numbers. Assume you have $X_i \overset{\text{i.i.d.}}{\sim} p_X(\cdot)$; then $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \varphi(X_i) = E[\varphi(X)] = \int \varphi(x)\, p_X(x)\, dx$. The result is still valid for weakly dependent random variables.
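A quick simulation of the law of large numbers (my own example, with $X \sim \text{Uniform}(0,1)$ and $\varphi(x) = x^2$, so $E[\varphi(X)] = 1/3$):

```python
import random

random.seed(1)

# LLN: (1/n) * sum of phi(X_i) converges to E[phi(X)].
# X ~ Uniform(0, 1), phi(x) = x^2, so E[phi(X)] = integral of x^2 dx = 1/3.
n = 500000
avg = sum(random.random()**2 for _ in range(n)) / n
print(avg)  # close to 1/3
```

Increasing `n` shrinks the fluctuation around $1/3$ at the usual $1/\sqrt{n}$ rate.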
11 Central limit theorem. Let $X_i \overset{\text{i.i.d.}}{\sim} p_X(\cdot)$ such that $E[X_i] = \mu$ and $V(X_i) = \sigma^2$; then $\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right) \overset{D}{\to} N(0, \sigma^2)$. This result is (too) often used to justify the Gaussian assumption.
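The statement above can be checked by simulation even for a non-Gaussian $X$ (a sketch of my own, with $X \sim \text{Uniform}(0,1)$, so $\mu = 1/2$ and $\sigma^2 = 1/12$):

```python
import random
import statistics

random.seed(2)

# CLT: sqrt(n) * (sample mean - mu) is approximately N(0, sigma^2).
n, reps = 200, 5000
zs = []
for _ in range(reps):
    m = sum(random.random() for _ in range(n)) / n
    zs.append(n**0.5 * (m - 0.5))

print(statistics.mean(zs), statistics.variance(zs))  # near 0 and near 1/12
```

The empirical mean and variance of the rescaled sums match the limiting $N(0, 1/12)$ closely already at $n = 200$.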
12 Delta method. Assume we have $\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right) \overset{D}{\to} N(0, \sigma^2)$. Then we have $\sqrt{n}\left(g\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) - g(\mu)\right) \overset{D}{\to} N\left(0, g'(\mu)^2 \sigma^2\right)$. This follows from the Taylor expansion $g\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) \approx g(\mu) + g'(\mu)\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right) + o_P\left(\frac{1}{\sqrt{n}}\right)$.
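A numerical sanity check of the delta method (my own choice of example: $X \sim \text{Uniform}(0,1)$, $g = \exp$, so the limiting variance is $g'(\mu)^2 \sigma^2 = e^{2 \cdot 0.5}/12 = e/12$):

```python
import math
import random
import statistics

random.seed(3)

# Delta method: sqrt(n) * (g(mean) - g(mu)) ~ N(0, g'(mu)^2 * sigma^2).
# Here mu = 0.5, sigma^2 = 1/12, g = exp, so the limit variance is e/12.
n, reps = 400, 5000
vals = []
for _ in range(reps):
    m = sum(random.random() for _ in range(n)) / n
    vals.append(n**0.5 * (math.exp(m) - math.exp(0.5)))

print(statistics.variance(vals), math.e / 12)  # the two should be close
```

The simulated variance of the transformed, rescaled means matches $e/12 \approx 0.227$ to within Monte Carlo error.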
13 Prior probability. Consider the following random variables: Sick $\in$ {yes, no} and Weather $\in$ {sunny, rain, cloudy, snow}. We can set a prior probability on these random variables; e.g. $P(\text{Sick} = \text{yes}) = 1 - P(\text{Sick} = \text{no}) = 0.1$, $P(\text{Weather}) = (0.72, 0.1, 0.08, 0.1)$. Obviously we cannot answer questions like $P(\text{Sick} = \text{yes} \mid \text{Weather} = \text{rainy})$ as long as we don't define an appropriate distribution.
14 Joint probability distributions. We need to assign a probability to each sample point. For (Weather, Sick), we have to consider $4 \times 2$ events, i.e. a table of joint probabilities $P(\text{Weather} = w, \text{Sick} = s)$ for each combination of $w \in$ {sunny, rainy, cloudy, snow} and $s \in$ {yes, no}. From this probability distribution, we can compute probabilities of the form $P(\text{Sick} = \text{yes} \mid \text{Weather} = \text{rainy})$.
15 Bayes rule. We have $P(x, y) = P(x \mid y) P(y) = P(y \mid x) P(x)$. It follows that $P(x \mid y) = \frac{P(x, y)}{P(y)} = \frac{P(y \mid x) P(x)}{\sum_{x'} P(y \mid x') P(x')}$. We have $P(\text{Sick} = \text{yes} \mid \text{Weather} = \text{rainy}) = \frac{P(\text{Sick} = \text{yes}, \text{Weather} = \text{rainy})}{P(\text{Weather} = \text{rainy})} = 0.2$.
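The conditioning step can be sketched on a toy joint table. The numbers below are illustrative placeholders of my own (the slides' table entries were not transcribed), chosen only so that the rainy column reproduces the slide's answer of 0.2:

```python
# Toy joint distribution over (Weather, Sick); values are made up but sum to 1.
joint = {
    ("sunny", "yes"): 0.01, ("sunny", "no"): 0.71,
    ("rainy", "yes"): 0.02, ("rainy", "no"): 0.08,
    ("cloudy", "yes"): 0.03, ("cloudy", "no"): 0.05,
    ("snow", "yes"): 0.04, ("snow", "no"): 0.06,
}

def conditional(sick, weather):
    """P(Sick = sick | Weather = weather) = P(weather, sick) / P(weather)."""
    p_weather = sum(p for (w, s), p in joint.items() if w == weather)
    return joint[(weather, sick)] / p_weather

print(conditional("yes", "rainy"))  # 0.02 / 0.10 = 0.2
```

Marginalizing the joint over Sick gives $P(\text{Weather})$; dividing the joint entry by it is exactly Bayes-style conditioning.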
16 Example. Useful for assessing diagnostic probability from causal probability: $P(\text{Cause} \mid \text{Effect}) = \frac{P(\text{Effect} \mid \text{Cause})\, P(\text{Cause})}{P(\text{Effect})}$. Let $M$ be meningitis and $S$ be stiff neck: $P(m \mid s) = \frac{P(s \mid m)\, P(m)}{P(s)}$. Be careful about the subtle exchanging of $P(A \mid B)$ for $P(B \mid A)$.
17 Example: Prosecutor's Fallacy. A zealous prosecutor has collected evidence and has an expert testify that the probability of finding this evidence if the accused were innocent is one in a million. The prosecutor concludes that the probability of the accused being innocent is one in a million. This is WRONG.
18 Assume no other evidence is available and the population is of 10 million people. Defining $A$ = "the accused is guilty", then $P(A) = 10^{-7}$. Defining $B$ = "finding this evidence", then $P(B \mid A) = 1$ and $P(B \mid \bar{A}) = 10^{-6}$. Bayes' formula yields $P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B \mid A) P(A) + P(B \mid \bar{A}) P(\bar{A})} \approx 0.1$. Real-life example: Sally Clark was condemned in the UK (the RSS pointed out the mistake). Her conviction was eventually quashed (on other grounds).
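Working the fallacy through numerically with the slide's assumptions (population of 10 million with one guilty person, certain evidence if guilty, one-in-a-million false-positive rate):

```python
# Prosecutor's fallacy, worked out with Bayes' rule.
p_guilty = 1 / 10_000_000            # P(A): one guilty person in 10 million
p_ev_given_guilty = 1.0              # P(B | A)
p_ev_given_innocent = 1 / 1_000_000  # P(B | not A): one in a million

p_ev = (p_ev_given_guilty * p_guilty
        + p_ev_given_innocent * (1 - p_guilty))
posterior = p_ev_given_guilty * p_guilty / p_ev
print(posterior)  # about 0.09: nowhere near "guilty beyond reasonable doubt"
```

Even with damning-sounding evidence, the posterior probability of guilt is only about 9%, because roughly ten innocent people in the population would also match the evidence.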
19 You feel sick and your GP thinks you might have contracted a rare disease (0.01% of the population has the disease). A test is available but not perfect. If a tested patient has the disease, 100% of the time the test will be positive. If a tested patient does not have the disease, 95% of the time the test will be negative (5% false positives). Your test is positive; should you really care?
20 Let $A$ be the event that the patient has the disease and $B$ be the event that the test returns a positive result: $P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B \mid A) P(A) + P(B \mid \bar{A}) P(\bar{A})} \approx 0.002$. Such a test would be a complete waste of money for you or the National Health System. A similar question was asked of 60 students and staff at Harvard Medical School: 18% got the right answer; the modal response was 95%!
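The same computation as above, using the numbers stated on the previous slide (prior 0.01%, sensitivity 100%, false-positive rate 5%):

```python
# Rare-disease test: posterior probability of disease given a positive test.
p_disease = 0.0001          # 0.01% prevalence
p_pos_given_disease = 1.0   # sensitivity
p_pos_given_healthy = 0.05  # false-positive rate

p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))
posterior = p_pos_given_disease * p_disease / p_pos
print(posterior)  # about 0.002: a positive test still leaves ~0.2% chance
```

The tiny prior dominates: almost all positives come from the 5% false-positive rate applied to the healthy majority.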
21 General Bayesian framework. Assume you have some unknown parameter $\theta$ and some data $D$. The unknown parameters are now considered RANDOM, with prior density $p(\theta)$. The likelihood of the observation $D$ is $p(D \mid \theta)$. The posterior is given by $p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}$ where $p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta$. $p(D)$ is called the marginal likelihood or evidence.
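The prior-to-posterior update can be sketched on a discrete grid. This is my own minimal example (coin bias $\theta$, flat prior, 7 heads in 10 flips), not a model from the slides; the normalizing sum plays the role of the evidence $p(D)$:

```python
# Grid-based Bayesian update: posterior ∝ likelihood × prior.
thetas = [i / 100 for i in range(101)]           # grid over [0, 1]
prior = [1 / len(thetas)] * len(thetas)          # flat prior on the grid

def likelihood(theta, heads=7, flips=10):
    """Binomial-shaped likelihood (constant factors cancel in the posterior)."""
    return theta**heads * (1 - theta)**(flips - heads)

unnorm = [likelihood(t) * p for t, p in zip(thetas, prior)]
evidence = sum(unnorm)                           # discrete analogue of p(D)
posterior = [u / evidence for u in unnorm]

post_mean = sum(t * p for t, p in zip(thetas, posterior))
print(post_mean)  # near (7+1)/(10+2) = 2/3, the exact Beta-posterior mean
```

With a flat prior the exact posterior is Beta(8, 4) with mean $2/3$; the grid approximation recovers it closely.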
22 Bayesian interpretation of probability. In the Bayesian approach, probability describes degrees of belief, instead of limiting frequencies. The selection of a prior has an obvious impact on the inference results! However, Bayesian statisticians are honest about it.
23 Bayesian inference involves passing from a prior $p(\theta)$ to a posterior $p(\theta \mid D)$. We might expect that because the posterior incorporates the information from the data, it will be less variable than the prior. We have the following identities: $E[\theta] = E[E[\theta \mid D]]$ and $V[\theta] = E[V[\theta \mid D]] + V[E[\theta \mid D]]$. This means that, on average (over the realizations of the data $D$), we expect the conditional expectation $E[\theta \mid D]$ to be equal to $E[\theta]$, and the posterior variance to be on average smaller than the prior variance by an amount that depends on the variation of the posterior means over the distribution of possible data.
24 If $(\theta, D)$ are two scalar random variables then we have $V[\theta] = E[V[\theta \mid D]] + V[E[\theta \mid D]]$. Proof: $V[\theta] = E[\theta^2] - E[\theta]^2 = E[E[\theta^2 \mid D]] - (E[E[\theta \mid D]])^2 = E\left[E[\theta^2 \mid D] - E[\theta \mid D]^2\right] + E\left[E[\theta \mid D]^2\right] - (E[E[\theta \mid D]])^2 = E[V[\theta \mid D]] + V[E[\theta \mid D]]$. Such results appear attractive but one should be careful. Here there is an underlying assumption that the observations are indeed distributed according to $p(D)$.
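The law of total variance above can be checked by simulation on a conjugate Gaussian model of my choosing ($\theta \sim N(0,1)$, $D = \theta + \text{noise}$ with unit noise variance, so $E[\theta \mid D] = D/2$ and $V[\theta \mid D] = 1/2$ by standard Gaussian conjugacy):

```python
import random
import statistics

random.seed(4)

# Verify V[theta] = E[V[theta|D]] + V[E[theta|D]] by Monte Carlo.
reps = 200000
thetas, post_means = [], []
for _ in range(reps):
    theta = random.gauss(0, 1)
    d = theta + random.gauss(0, 1)
    thetas.append(theta)
    post_means.append(d / 2)          # E[theta | D] for this model

lhs = statistics.variance(thetas)                  # V[theta] = 1
rhs = 0.5 + statistics.variance(post_means)        # E[V[theta|D]] + V[E[theta|D]]
print(lhs, rhs)  # both close to 1
```

Both sides come out near 1: the posterior variance (0.5) plus the variability of the posterior mean across data sets (0.5) recovers the prior variance.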
25 Frequentist interpretation of probability. Probabilities are objective properties of the real world, and refer to limiting relative frequencies (e.g. the number of times I have observed heads). Problem: how do you attribute a probability to the event "There will be a major earthquake in Tokyo on the 27th April 2013"? Hence in most cases $\theta$ is assumed unknown but fixed. We then typically compute point estimates of the form $\hat{\theta} = \varphi(D)$, which are designed to have various desirable properties when averaged over future data $D'$ (assumed to be drawn from the true distribution).
26 Maximum Likelihood Estimation. Suppose we have $X_i \overset{\text{i.i.d.}}{\sim} p(\cdot \mid \theta)$ for $i = 1, \ldots, N$; then $L(\theta) = p(D \mid \theta) = \prod_{i=1}^{N} p(x_i \mid \theta)$. The MLE is defined as $\hat{\theta} = \arg\max L(\theta) = \arg\max l(\theta)$, where $l(\theta) = \log L(\theta) = \sum_{i=1}^{N} \log p(x_i \mid \theta)$.
27 MLE for 1D Gaussian. Consider $X_i \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$; then $p(D \mid \mu, \sigma^2) = \prod_{i=1}^{N} N(x_i; \mu, \sigma^2)$, so $l(\theta) = -\frac{N}{2}\log(2\pi) - \frac{N}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i - \mu)^2$. Setting $\frac{\partial l(\theta)}{\partial \mu} = \frac{\partial l(\theta)}{\partial \sigma^2} = 0$ yields $\hat{\mu}_{ML} = \frac{1}{N}\sum_{i=1}^{N} x_i$, $\hat{\sigma}^2_{ML} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \hat{\mu}_{ML})^2$. We have $E[\hat{\mu}_{ML}] = \mu$ and $E[\hat{\sigma}^2_{ML}] = \frac{N-1}{N}\sigma^2$.
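The bias factor $\frac{N-1}{N}$ of the variance MLE is easy to see by simulation (my own parameter choices: $\mu = 3$, $\sigma^2 = 4$, $N = 10$, so the expected MLE variance is $0.9 \times 4 = 3.6$):

```python
import random
import statistics

random.seed(5)

# Bias of the Gaussian variance MLE: E[var_ML] = (N-1)/N * sigma^2.
mu, sigma2, N, reps = 3.0, 4.0, 10, 50000
var_mles = []
for _ in range(reps):
    xs = [random.gauss(mu, sigma2**0.5) for _ in range(N)]
    mu_hat = sum(xs) / N                               # MLE of the mean
    var_mles.append(sum((x - mu_hat)**2 for x in xs) / N)  # MLE of the variance

print(statistics.mean(var_mles))  # near 3.6, not 4.0
```

Averaged over many data sets of size 10, the variance MLE systematically undershoots $\sigma^2 = 4$ by the factor $(N-1)/N = 0.9$.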
28 Consistent estimators. Definition: a sequence of estimators $\hat{\theta}_N = \hat{\theta}_N(X_1, \ldots, X_N)$ is consistent for the parameter $\theta$ if, for every $\varepsilon > 0$ and every $\theta \in \Theta$, $\lim_{N \to \infty} P_\theta\left(|\hat{\theta}_N - \theta| < \varepsilon\right) = 1$ (equivalently $\lim_{N \to \infty} P_\theta\left(|\hat{\theta}_N - \theta| \ge \varepsilon\right) = 0$). Example: consider $X_i \overset{\text{i.i.d.}}{\sim} N(\mu, 1)$ and $\hat{\mu}_N = \frac{1}{N}\sum_{i=1}^{N} X_i$; then $\hat{\mu}_N \sim N\left(\mu, \frac{1}{N}\right)$ and $P_\theta\left(|\hat{\mu}_N - \mu| < \varepsilon\right) = \int_{-\varepsilon\sqrt{N}}^{\varepsilon\sqrt{N}} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right) du \to 1$. It is possible to avoid this calculation and use instead Chebyshev's inequality: $P_\theta\left(|\hat{\theta}_N - \theta| \ge \varepsilon\right) \le \frac{E_\theta\left[(\hat{\theta}_N - \theta)^2\right]}{\varepsilon^2}$, where $E_\theta\left[(\hat{\theta}_N - \theta)^2\right] = \mathrm{var}_\theta(\hat{\theta}_N) + \left(E_\theta[\hat{\theta}_N] - \theta\right)^2$.
29 Example of inconsistent MLE (Fisher). $(X_i, Y_i) \sim N\left(\begin{pmatrix} \mu_i \\ \mu_i \end{pmatrix}, \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix}\right)$. The likelihood function is given by $L(\theta) = \frac{1}{(2\pi\sigma^2)^n} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[(x_i - \mu_i)^2 + (y_i - \mu_i)^2\right]\right)$, so $l(\theta) = \text{cste} - n\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[(x_i - \mu_i)^2 + (y_i - \mu_i)^2\right]$. We obtain $\hat{\mu}_i = \frac{x_i + y_i}{2}$ and $\hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(x_i - y_i)^2}{4n} \to \frac{\sigma^2}{2}$, so the MLE of $\sigma^2$ is inconsistent.
30 Consistency of the MLE. Kullback-Leibler distance: for any densities $f, g$, $D(f, g) = \int f(x) \log \frac{f(x)}{g(x)}\, dx$. We have $D(f, g) \ge 0$ and $D(f, f) = 0$. Indeed, $-D(f, g) = \int f(x) \log \frac{g(x)}{f(x)}\, dx \le \int f(x) \left(\frac{g(x)}{f(x)} - 1\right) dx = 0$, using $\log u \le u - 1$. $D(f, g)$ is a very useful distance and appears in many different contexts.
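A tiny discrete check of the two KL properties above (my own toy distributions):

```python
import math

# KL divergence for discrete distributions: D(f, g) = sum f * log(f / g).
def kl(f, g):
    return sum(p * math.log(p / q) for p, q in zip(f, g) if p > 0)

f = [0.5, 0.3, 0.2]
g = [0.4, 0.4, 0.2]

print(kl(f, g), kl(f, f))  # positive, and exactly 0.0
```

Note also that $D(f, g) \ne D(g, f)$ in general, which is why "distance" is a slight abuse of language; "divergence" is the more careful term.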
31 Sketch of consistency of the MLE. Assume the pdfs $p(x \mid \theta)$ have common support for all $\theta$ and $p(x \mid \theta) \ne p(x \mid \theta')$ for $\theta \ne \theta'$; i.e. $S_\theta = \{x : p(x \mid \theta) > 0\}$ is independent of $\theta$. Denote $M_N(\theta) = \frac{1}{N}\sum_{i=1}^{N} \log \frac{p(X_i \mid \theta)}{p(X_i \mid \theta^*)}$. As the MLE $\hat{\theta}_N$ maximizes $L(\theta)$, it also maximizes $M_N(\theta)$. Assume $X_i \overset{\text{i.i.d.}}{\sim} p(\cdot \mid \theta^*)$. Note that by the law of large numbers $M_N(\theta)$ converges to $E_{\theta^*}\left[\log \frac{p(X \mid \theta)}{p(X \mid \theta^*)}\right] = \int p(x \mid \theta^*) \log \frac{p(x \mid \theta)}{p(x \mid \theta^*)}\, dx = -D\left(p(\cdot \mid \theta^*), p(\cdot \mid \theta)\right) := M(\theta)$. Hence $M_N(\theta) \to -D\left(p(\cdot \mid \theta^*), p(\cdot \mid \theta)\right)$, which is maximized at $\theta = \theta^*$, so we expect that its maximizer will converge towards $\theta^*$.
32 Asymptotic Normality. Assuming $\hat{\theta}_N$ is a consistent estimate of $\theta^*$, we have under regularity assumptions $\sqrt{N}\left(\hat{\theta}_N - \theta^*\right) \Rightarrow N\left(0, [I(\theta^*)]^{-1}\right)$, where $[I(\theta)]_{k,l} = -E_\theta\left[\frac{\partial^2 \log p(x \mid \theta)}{\partial \theta_k\, \partial \theta_l}\right]$. We can estimate $I(\theta^*)$ through $[\hat{I}(\theta^*)]_{k,l} = -\frac{1}{N}\sum_{i=1}^{N} \left.\frac{\partial^2 \log p(X_i \mid \theta)}{\partial \theta_k\, \partial \theta_l}\right|_{\hat{\theta}_N}$.
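A simulation check of asymptotic normality (my own example: Bernoulli($\theta^*$), whose MLE is the sample mean and whose Fisher information is $I(\theta) = \frac{1}{\theta(1-\theta)}$, so the limiting variance is $\theta^*(1-\theta^*)$):

```python
import random
import statistics

random.seed(6)

# For Bernoulli(theta*): sqrt(N) * (theta_hat - theta*) ~ N(0, 1/I(theta*)),
# with 1/I(theta*) = theta* * (1 - theta*).
theta_star, N, reps = 0.3, 500, 20000
zs = []
for _ in range(reps):
    theta_hat = sum(random.random() < theta_star for _ in range(N)) / N
    zs.append(N**0.5 * (theta_hat - theta_star))

print(statistics.variance(zs), theta_star * (1 - theta_star))  # both near 0.21
```

The empirical variance of the rescaled MLE matches the inverse Fisher information $0.3 \times 0.7 = 0.21$.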
33 Sketch of the proof. We have $0 = l'(\hat{\theta}_N) \approx l'(\theta^*) + \left(\hat{\theta}_N - \theta^*\right) l''(\theta^*)$, so $\sqrt{N}\left(\hat{\theta}_N - \theta^*\right) \approx \frac{-\frac{1}{\sqrt{N}} l'(\theta^*)}{\frac{1}{N} l''(\theta^*)}$. Now $l'(\theta^*) = \sum_{i=1}^{N} \left.\frac{\partial \log p(X_i \mid \theta)}{\partial \theta}\right|_{\theta^*}$, where $E_{\theta^*}\left[\left.\frac{\partial \log p(X_i \mid \theta)}{\partial \theta}\right|_{\theta^*}\right] = 0$ and $\mathrm{var}_{\theta^*}\left[\left.\frac{\partial \log p(X_i \mid \theta)}{\partial \theta}\right|_{\theta^*}\right] = I(\theta^*)$, so the CLT tells us that $\frac{1}{\sqrt{N}} l'(\theta^*) \overset{D}{\to} N(0, I(\theta^*))$, while by the law of large numbers $\frac{1}{N} l''(\theta^*) = \frac{1}{N}\sum_{i=1}^{N} \left.\frac{\partial^2 \log p(X_i \mid \theta)}{\partial \theta^2}\right|_{\theta^*} \to -I(\theta^*)$.
34 Frequentist confidence intervals. The MLE being approximately Gaussian, it is possible to define (approximate) confidence intervals. However, be careful: "$\theta$ has a 95% confidence interval of $[a, b]$" does NOT mean that $P(\theta \in [a, b] \mid D) = 0.95$. It means $P_{D'}\left(\theta \in [a(D'), b(D')]\right) = 0.95$; i.e., if we were to repeat the experiment, then 95% of the time the true parameter would be in the (random) interval. This does not mean that we believe it is in that interval given the actual data $D$ we have observed.
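The repeated-experiment reading of coverage can be simulated directly (my own setup: Gaussian data with known unit variance, so the interval mean $\pm\, 1.96/\sqrt{N}$ has exact 95% coverage):

```python
import random

random.seed(7)

# Frequentist coverage: over many repeated data sets, the random interval
# [mean - 1.96/sqrt(N), mean + 1.96/sqrt(N)] contains the fixed true mu ~95%
# of the time (sigma = 1 known).
mu, N, reps = 2.0, 50, 20000
covered = 0
for _ in range(reps):
    xs = [random.gauss(mu, 1.0) for _ in range(N)]
    m = sum(xs) / N
    half = 1.96 / N**0.5
    covered += (m - half <= mu <= m + half)

print(covered / reps)  # close to 0.95
```

The probability statement is about the interval (which varies from data set to data set), not about $\theta$, which stays fixed throughout.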
35 Bayesian statistics as a unifying approach. In my opinion, the Bayesian approach is much simpler and more natural than classical statistics. Once you have defined a Bayesian model, everything follows naturally. Defining a Bayesian model requires specifying a prior distribution, but this assumption is clearly stated. For many years, Bayesian statistics could not be implemented for all but the simplest models, but now numerous algorithms have been proposed to perform approximate inference.
ECE 830 Fall 2010 Statistical Signal Processing instructor: R. Nowak, scribe: R. Nowak Lecture 2: Review of Basic Probability Theory Probabilistic models will be used throughout the course to represent
More informationBayesian Linear Regression [DRAFT - In Progress]
Bayesian Linear Regression [DRAFT - In Progress] David S. Rosenberg Abstract Here we develop some basics of Bayesian linear regression. Most of the calculations for this document come from the basic theory
More informationLecture 1: Bayesian Framework Basics
Lecture 1: Bayesian Framework Basics Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de April 21, 2014 What is this course about? Building Bayesian machine learning models Performing the inference of
More informationReview (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology
Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adopted from Prof. H.R. Rabiee s and also Prof. R. Gutierrez-Osuna
More informationProbability Theory Review
Probability Theory Review Brendan O Connor 10-601 Recitation Sept 11 & 12, 2012 1 Mathematical Tools for Machine Learning Probability Theory Linear Algebra Calculus Wikipedia is great reference 2 Probability
More informationUncertainty. Outline. Probability Syntax and Semantics Inference Independence and Bayes Rule. AIMA2e Chapter 13
Uncertainty AIMA2e Chapter 13 1 Outline Uncertainty Probability Syntax and Semantics Inference Independence and ayes Rule 2 Uncertainty Let action A t = leave for airport t minutes before flight Will A
More informationDiscrete Probability and State Estimation
6.01, Spring Semester, 2008 Week 12 Course Notes 1 MASSACHVSETTS INSTITVTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.01 Introduction to EECS I Spring Semester, 2008 Week
More informationParameter Estimation. Industrial AI Lab.
Parameter Estimation Industrial AI Lab. Generative Model X Y w y = ω T x + ε ε~n(0, σ 2 ) σ 2 2 Maximum Likelihood Estimation (MLE) Estimate parameters θ ω, σ 2 given a generative model Given observed
More informationReview (Probability & Linear Algebra)
Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint
More informationPATTERN RECOGNITION AND MACHINE LEARNING
PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality
More informationPrimer on statistics:
Primer on statistics: MLE, Confidence Intervals, and Hypothesis Testing ryan.reece@gmail.com http://rreece.github.io/ Insight Data Science - AI Fellows Workshop Feb 16, 018 Outline 1. Maximum likelihood
More informationSpace Telescope Science Institute statistics mini-course. October Inference I: Estimation, Confidence Intervals, and Tests of Hypotheses
Space Telescope Science Institute statistics mini-course October 2011 Inference I: Estimation, Confidence Intervals, and Tests of Hypotheses James L Rosenberger Acknowledgements: Donald Richards, William
More informationProbabilistic Reasoning
Probabilistic Reasoning Philipp Koehn 4 April 2017 Outline 1 Uncertainty Probability Inference Independence and Bayes Rule 2 uncertainty Uncertainty 3 Let action A t = leave for airport t minutes before
More informationProbability and statistics; Rehearsal for pattern recognition
Probability and statistics; Rehearsal for pattern recognition Václav Hlaváč Czech Technical University in Prague Czech Institute of Informatics, Robotics and Cybernetics 166 36 Prague 6, Jugoslávských
More informationTime Series and Dynamic Models
Time Series and Dynamic Models Section 1 Intro to Bayesian Inference Carlos M. Carvalho The University of Texas at Austin 1 Outline 1 1. Foundations of Bayesian Statistics 2. Bayesian Estimation 3. The
More informationProbability theory basics
Probability theory basics Michael Franke Basics of probability theory: axiomatic definition, interpretation, joint distributions, marginalization, conditional probability & Bayes rule. Random variables:
More informationProbability Based Learning
Probability Based Learning Lecture 7, DD2431 Machine Learning J. Sullivan, A. Maki September 2013 Advantages of Probability Based Methods Work with sparse training data. More powerful than deterministic
More informationDS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling
DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling Due: Tuesday, May 10, 2016, at 6pm (Submit via NYU Classes) Instructions: Your answers to the questions below, including
More informationRefresher on Discrete Probability
Refresher on Discrete Probability STAT 27725/CMSC 25400: Machine Learning Shubhendu Trivedi University of Chicago October 2015 Background Things you should have seen before Events, Event Spaces Probability
More informationDiscrete Structures Prelim 1 Selected problems from past exams
Discrete Structures Prelim 1 CS2800 Selected problems from past exams 1. True or false (a) { } = (b) Every set is a subset of its power set (c) A set of n events are mutually independent if all pairs of
More informationIntroduction to Machine Learning
Introduction to Machine Learning CS4375 --- Fall 2018 Bayesian a Learning Reading: Sections 13.1-13.6, 20.1-20.2, R&N Sections 6.1-6.3, 6.7, 6.9, Mitchell 1 Uncertainty Most real-world problems deal with
More informationBayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007
Bayesian inference Fredrik Ronquist and Peter Beerli October 3, 2007 1 Introduction The last few decades has seen a growing interest in Bayesian inference, an alternative approach to statistical inference.
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More information