CS395T Computational Statistics with Application to Bioinformatics


1. CS395T Computational Statistics with Application to Bioinformatics. Prof. William H. Press, Spring Term, 2010, The University of Texas at Austin. Unit 2: Estimation of Parameters.

2. Our next topic is Bayesian estimation of parameters. We'll ease into it with an example that looks a lot like the Monty Hall problem, the Jailer's Tip: Of 3 prisoners (A, B, C), 2 will be released tomorrow. A, who thinks he has a 2/3 chance of being released, asks the jailer for the name of one of the lucky prisoners other than himself. The jailer says, truthfully, "B". "Darn," thinks A, "now my chances are only 1/2, C or me." Is this like Monty Hall? Did the data ("B") change the probabilities?

3. Further, suppose (unlike Monty Hall) the jailer is not indifferent about responding "B" versus "C". Does that change your answer to the previous question? Let

$P(S_B \mid BC) = x, \quad 0 \le x \le 1$  (the probability that the jailer says "B" when B and C are the ones released)

$P(A \mid S_B) = P(AB \mid S_B) + P(AC \mid S_B) = \frac{P(S_B \mid AB)\,P(AB)}{P(S_B \mid AB)P(AB) + P(S_B \mid BC)P(BC) + P(S_B \mid CA)P(CA)} = \frac{1 \cdot \tfrac{1}{3}}{1 \cdot \tfrac{1}{3} + x \cdot \tfrac{1}{3} + 0 \cdot \tfrac{1}{3}} = \frac{1}{1+x}$

(The P(AC | S_B) term is zero, because if A and C were the released pair the jailer would have to say "C".) So if A knows the value x, he can calculate his chances. If x = 1/2 (like Monty Hall), his chances are 2/3, same as before, so (unlike Monty Hall) he got no new information. If x ≠ 1/2, he does get new information: his chances change. But what if he doesn't know x at all?
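(Not on the original slide: a quick Monte Carlo check of the 1/(1+x) result, in MATLAB since that is the course's tool. The value x = 0.7 is an arbitrary choice for illustration.)

% Monte Carlo check of P(A released | jailer says "B") = 1/(1+x)
x = 0.7;                          % jailer's prob. of saying "B" when B,C are released
Ntrials = 1e6;
pair = randi(3, Ntrials, 1);      % 1 = AB released, 2 = BC released, 3 = CA released
saysB = (pair == 1) | (pair == 2 & rand(Ntrials,1) < x);   % for CA he must say "C"
Arel  = (pair == 1) | (pair == 3);
fprintf('simulated %.4f   analytic %.4f\n', mean(Arel(saysB)), 1/(1+x));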

4. Marginalization (this is important!) When a model has unknown, or uninteresting, parameters we "integrate them out", multiplying by any knowledge of their distribution: at worst, just a prior informed by background information; at best, a narrower distribution based on data. This is not any new assumption about the world, it's just the law of de-anding (e.g., the Jailer's Tip):

$P(A \mid S_B I) = \int_x P(A \mid S_B x I)\, p(x \mid I)\, dx = \int_x \frac{1}{1+x}\, p(x \mid I)\, dx$

Law of de-anding: the x's are EME (exhaustive and mutually exclusive)!

5. (Repeating the previous equation:)

$P(A \mid S_B I) = \int_x P(A \mid S_B x I)\, p(x \mid I)\, dx = \int_x \frac{1}{1+x}\, p(x \mid I)\, dx$

This is the first time we've seen a continuous probability distribution, but we'll skip the obvious repetition of all the previous laws. We abbreviate $p(x) \equiv p(x \mid I)$. (Notice that p(x) is a probability of a probability! That is fairly common in Bayesian inference.) Normalization carries over from the discrete to the continuous case:

$\sum_i P_i = 1 \qquad \sum_i p(x_i)\, dx_i = 1 \qquad \int_x p(x)\, dx = 1$

6. (Repeating the previous equation:)

$P(A \mid S_B I) = \int_x P(A \mid S_B x I)\, p(x \mid I)\, dx = \int_x \frac{1}{1+x}\, p(x \mid I)\, dx$

What should Prisoner A take for p(x)? Maybe the uniform prior?

$p(x) = 1, \;\; 0 \le x \le 1: \qquad P(A \mid S_B I) = \int_0^1 \frac{dx}{1+x} = \ln 2 \approx 0.693$

Not the same as the prior massed at x = 1/2, a Dirac delta function, which just substitutes the value and removes the integral:

$p(x) = \delta(x - \tfrac{1}{2}), \;\; 0 \le x \le 1: \qquad P(A \mid S_B I) = \frac{1}{1 + 1/2} = \frac{2}{3}$
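(Not on the slide: a quick numeric check of the two priors, in plain MATLAB.)

% Uniform prior: marginalize 1/(1+x) over x in [0,1]
integral(@(x) 1./(1+x), 0, 1)      % 0.6931... = ln 2
% Prior massed at x = 1/2: the delta function just evaluates the integrand there
1/(1 + 0.5)                        % 0.6667 = 2/3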

7. Review where we are: we are trying to estimate a parameter

$x = P(S_B \mid BC), \quad 0 \le x \le 1$

$P(A \mid S_B I) = \int_x P(A \mid S_B x I)\, p(x \mid I)\, dx = \int_x \frac{1}{1+x}\, p(x \mid I)\, dx$

The form of our estimate is a (Bayesian) probability distribution of the parameter (itself here just happening to be a probability). This is a sterile exercise if it is just a debate about priors. What we need is data! Data might be a previous history of choices by the jailer in identical circumstances:

BCBCCBCCCBBCBCBCCCCBBCBCCCBCBCBBCCB
N = 35, N_B = 15, N_C = 20

(What's wrong with x = 15/35 ≈ 0.43? Hold on...) We hypothesize (and might later try to check) that these are i.i.d. (independent and identically distributed) Bernoulli trials and therefore informative about x. As good Bayesians, we now need P(data | x).
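(Not on the slide: tallying the jailer's history in MATLAB, just to confirm the counts.)

s  = 'BCBCCBCCCBBCBCBCCCCBBCBCCCBCBCBBCCB';
N  = numel(s)          % 35
NB = sum(s == 'B')     % 15
NC = sum(s == 'C')     % 20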

8. P(data | x) means different things in frequentist vs. Bayesian contexts, so this is a good time to understand the differences (we'll use both ideas as appropriate).

Frequentist: considers the universe of what might have been, imagining repeated trials even if they weren't actually tried, and needs no prior. Since the trials are i.i.d., only the N's can matter (a so-called "sufficient statistic"):

$P(\text{data} \mid x) = \binom{N}{N_B}\, x^{N_B}(1-x)^{N_C}, \qquad \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$

The binomial coefficient counts the number of equivalent arrangements; $x^{N_B}(1-x)^{N_C}$ is the probability of the exact sequence seen.

Bayesian: considers only the exact data seen, and has a prior:

$P(x \mid \text{data}) \propto x^{N_B}(1-x)^{N_C}\, p(x \mid I)$

No binomial coefficient, both conceptually and also because it is independent of x and absorbed in the proportionality. Use only the data you see, not "equivalent arrangements" that you didn't see. This issue is one we'll return to, not always entirely sympathetically to the Bayesians (e.g., goodness-of-fit). But we might first suppose that the prior is uniform.
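(Not on the slide: the binomial coefficient is a constant in x, so after normalization the frequentist likelihood and the Bayesian kernel with a uniform prior give the same curve. A small MATLAB check:)

nn = 35; nb = 15;
xx = linspace(0, 1, 1001);
kern  = xx.^nb .* (1-xx).^(nn-nb);       % Bayesian kernel, uniform prior
withC = nchoosek(nn, nb) * kern;         % frequentist likelihood, with binomial coeff.
post1 = kern  / trapz(xx, kern);
post2 = withC / trapz(xx, withC);
max(abs(post1 - post2))                  % ~0, differs only by round-off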

9. With the uniform prior, the Bayes numerator and denominator are

numerator: $x^{N_B}(1-x)^{N-N_B} \cdot 1$

denominator: $\int_0^1 x^{N_B}(1-x)^{N-N_B}\, dx = \frac{\Gamma(N_B+1)\,\Gamma(N-N_B+1)}{\Gamma(N+2)}$

[Slide figure: plot of numerator over denominator for N = 35, N_B = 15.] So this is what the data tells us about p(x).

10. You should learn to do calculations like this in MATLAB or Mathematica:

MATLAB (Symbolic Math Toolbox):
syms nn nb x
num = x^nb * (1-x)^(nn-nb)
    num = x^nb*(1-x)^(nn-nb)
denom = int(num, 0, 1)
    denom = gamma(nn-nb+1)*gamma(nb+1)/gamma(nn+2)
p = num / denom
    p = x^nb*(1-x)^(nn-nb)/gamma(nn-nb+1)/gamma(nb+1)*gamma(nn+2)
ezplot(subs(p, [nn, nb], [35, 15]), [0, 1])

Mathematica:
In[7]:=  num = x^nb (1 - x)^(nn - nb)
Out[7]=  (1 - x)^(-nb + nn) x^nb
In[8]:=  denom = Integrate[num, {x, 0, 1}, GenerateConditions -> False]
Out[8]=  (Gamma[1 + nb] Gamma[1 - nb + nn]) / Gamma[2 + nn]
In[9]:=  p[x_] = num / denom
Out[9]=  ((1 - x)^(-nb + nn) x^nb Gamma[2 + nn]) / (Gamma[1 + nb] Gamma[1 - nb + nn])
In[12]:= Plot[p[x] /. {nn -> 35, nb -> 15}, {x, 0, 1}, PlotRange -> All, Frame -> True]
Out[12]= -Graphics-
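(Not on the slide: a purely numeric alternative in case the Symbolic Math Toolbox isn't at hand. MATLAB's beta(a,b) = Γ(a)Γ(b)/Γ(a+b) gives the same denominator directly.)

nn = 35; nb = 15;
num   = @(x) x.^nb .* (1-x).^(nn-nb);
denom = beta(nb+1, nn-nb+1);       % = gamma(nb+1)*gamma(nn-nb+1)/gamma(nn+2)
p     = @(x) num(x) / denom;
xx = linspace(0, 1, 1000);
plot(xx, p(xx)); xlabel('x'); ylabel('p(x | data)');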

11. Find the mean, standard error, and mode of our estimate for x, where

$P(x \mid \text{data}) \propto x^{N_B}(1-x)^{N-N_B}$  (normalized by the denominator computed above)

Mode: setting $\frac{d}{dx} P(x \mid \text{data}) = 0$ gives $x = \frac{N_B}{N}$; the maximum likelihood (ML) answer is to estimate x as exactly the fraction seen.

Mean (the 1st moment): $\langle x \rangle = \int_0^1 x\, P(x \mid \text{data})\, dx = \frac{N_B + 1}{N + 2}$; notice it's different from ML!

Variance involves the 2nd moment:

$\mathrm{Var}(x) = \langle x^2 \rangle - \langle x \rangle^2 = \int_0^1 x^2\, P(x \mid \text{data})\, dx - \langle x \rangle^2 = \frac{(N_B+1)(N-N_B+1)}{(N+2)^2 (N+3)}$

This shows how p(x) gets narrower as the amount of data increases.
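(Not on the slide: a numeric check of these three summaries for N = 35, N_B = 15, in MATLAB.)

nn = 35; nb = 15;
xx = linspace(0, 1, 20001);
post = xx.^nb .* (1-xx).^(nn-nb);
post = post / trapz(xx, post);                 % normalize on the grid
[~, imax] = max(post);
fprintf('mode %.4f  vs  N_B/N         = %.4f\n', xx(imax), nb/nn);
m1 = trapz(xx, xx .* post);
fprintf('mean %.4f  vs  (N_B+1)/(N+2) = %.4f\n', m1, (nb+1)/(nn+2));
v  = trapz(xx, xx.^2 .* post) - m1^2;
fprintf('var  %.6f  vs  closed form   = %.6f\n', v, (nb+1)*(nn-nb+1)/((nn+2)^2*(nn+3)));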

12. (Let's leave behind the metaphor of the Jailer and Prisoner A.) What we are illustrating is called Bernoulli trials: two possible outcomes; i.i.d. events; a single parameter x (the probability of one outcome); a sufficient statistic is the pair of numbers N and N_B.

$P(\text{data} \mid x) = x^{N_B}(1-x)^{N-N_B}$  (in the Bayesian sense)

$P(x \mid \text{data}) \propto x^{N_B}(1-x)^{N-N_B}\, P(x \mid I)$

[Slide figure: portraits of Jacob and Johann Bernoulli.] For a uniform prior, the Bayes denominator is, as we've seen, easy to calculate:

$\int_0^1 x^{N_B}(1-x)^{N-N_B}\, dx = \frac{\Gamma(N_B+1)\,\Gamma(N-N_B+1)}{\Gamma(N+2)}$

13. Are there any other mathematical forms for the prior that would still leave the Bayes denominator easy to calculate? Yes! Try

$P(x \mid I) \propto x^{\beta}(1-x)^{\alpha}$

Choose α and β to give any desired center and width. Then

$P(x \mid \text{data}) \propto x^{N_B}(1-x)^{N-N_B} \cdot x^{\beta}(1-x)^{\alpha}$

$\int_0^1 x^{N_B+\beta}(1-x)^{N-N_B+\alpha}\, dx = \frac{\Gamma(N_B+\beta+1)\,\Gamma(N-N_B+\alpha+1)}{\Gamma(N+\alpha+\beta+2)}$

Priors that preserve the analytic form of p(x) are called conjugate priors. There is nothing special about them except mathematical convenience. If you start with a conjugate prior, you'll also be able to assimilate new data trivially, just by changing the parameters of your estimate. This is because every posterior is in the right analytic form to be the new prior!
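(Not on the slide: a sketch of how trivially a conjugate prior assimilates new data. Here the prior is written as Beta(a, b) ∝ x^(a-1) (1-x)^(b-1), so a plays the role of β+1 and b of α+1 above; the second data batch is invented purely for illustration.)

% Each new batch of Bernoulli data just increments the two Beta parameters.
a = 1;  b = 1;                    % uniform prior
a = a + 15;  b = b + (35 - 15);   % batch 1: the jailer data, 15 "B"s out of 35
a = a + 4;   b = b + (10 - 4);    % batch 2: a hypothetical further 10 trials, 4 "B"s
post_mean = a / (a + b)           % posterior mean after both batches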

14. By the way, if I show a special love of Bernoulli trials, it might be because I am an academic descendant of the Bernoulli brothers! Actually, this is not a very exclusive club: Gauss and the Bernoullis each have ~50,000 recorded descendants in the Mathematics Genealogy database, and probably many times more unrecorded.

15. Next example (with some biology): Individual identity, or ancestry, can be determined by variable-length short tandem repeats (STRs) in the genome. The mutation probability is ~0.5% per STR per generation (though highly variable). If you use the Y chromosome only, you get paternal ancestry. There are companies that sell certificates with your genotype. A bit opportunistic, since in a few years your whole genome will be sequenced by your health plan.

16. Margaret, my ex-wife, is really into the Towne family. (And she's neither a biologist nor a Towne.)

17. Here's data from Margaret on 8 recent Townes (identified only by T code). (We'll use this data several times in the next few lectures.) [Slide figures: a table of the STR genotypes, and a second table showing just the changes.]

18. Let's do a Bayesian estimation of the parameter r, the mutation rate per locus per generation. Let's assume no back mutations! (Their effect on this data set would be small.) [Slide figure: the Towne family tree, with each branch annotated by its number of generations, the number of STR changes observed along it, and the corresponding binomial factor, e.g. bin(0, 3×37, r), bin(1, 5×37, r), bin(0, 6×37, r), bin(1, 11×37, r). The full product of branch factors is the model given two slides below.]

19. Unraveling dependencies: we used ancestry as an intuitive example of how dependencies work because its causal mechanism is well known! Later, we'll do more general Bayesian networks to capture models with more complicated causal dependencies. [Slide figure: a small directed graph on nodes a, b, c, d, e.] Another important idea is conditional independence. Example: b and e are conditionally independent given a, while b and d are not conditionally independent given a. (Why?)

20. So we have a statistical model for the data, that is, a way to compute P(data | parameters). It is not exact, but statistical models rarely (never?) are: it neglects back mutations, assumes a single probability for all loci, etc. The model is

$P(\text{data} \mid r) = \mathrm{bin}(0, 3\cdot37, r)\,\mathrm{bin}(0, 3\cdot37, r)\,\mathrm{bin}(1, 5\cdot37, r)\,\mathrm{bin}(0, 5\cdot37, r)\,\mathrm{bin}(0, 6\cdot37, r)\,\mathrm{bin}(1, 11\cdot37, r)\,\mathrm{bin}(3, 10\cdot37, r)$

where bin(k, n, r) is the binomial probability of k events in n tries, each with probability r. Bayes estimation of the parameter:

$P(r \mid \text{data}) \propto P(\text{data} \mid r)\, P(r) \propto P(\text{data} \mid r)\, \frac{1}{r}$

What kind of prior is this??? It is called log-uniform. The log-uniform prior has equal probability in each order of magnitude:

$\int_r^{10r} P(r')\, dr' = \int_r^{10r} \frac{dr'}{r'} = \log 10$

It is often taken as the non-informative prior when you don't even know the order of magnitude of the (positive) quantity. It is an improper prior, since its integral is infinite. This is almost always OK, but it is possible to construct paradoxes with improper priors (e.g., the "marginalization paradox").
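(Not on the slide: a numeric sketch of this estimate in MATLAB. The grid over r and its bounds are arbitrary choices.)

% Posterior for the mutation rate r with the log-uniform (1/r) prior.
bin = @(k, n, r) nchoosek(n, k) .* r.^k .* (1-r).^(n-k);
r = linspace(1e-4, 0.02, 2000);                % grid of plausible rates
like = bin(0,3*37,r) .* bin(0,3*37,r) .* bin(1,5*37,r) .* bin(0,5*37,r) ...
    .* bin(0,6*37,r) .* bin(1,11*37,r) .* bin(3,10*37,r);
post = like ./ r;                              % multiply by the 1/r prior
post = post / trapz(r, post);                  % normalize numerically
plot(r, post); xlabel('r'); ylabel('P(r | data)');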

21. Here is the plot of the (normalized) P(r | data). This is (almost) real biology: we've measured the mutation probability, per locus per generation, of Y-chromosome STRs. This tells us something about the actual DNA replication machinery!

22. It really did matter (a bit) that we sorted out the conditional dependencies correctly. Here's a comparison to doing it wrong by assuming all the data independent: the true dependencies allow somewhat larger values of r, because we don't wrongly count the branches with 0 observed mutations multiple times. We'll come back to the Towne family for some fancier stuff later! Ignoring conditional dependencies and just multiplying the probabilities of the data as if they were independent is called naïve Bayes. People often do this. It is mathematically incorrect, but sometimes it is all you can do!

23. The basic paradigm of Bayesian parameter estimation: (1) Construct a statistical model for the probability of the observed data as a function of all parameters, treating dependency in the data correctly. (2) Assign prior distributions to the parameters, jointly or independently as appropriate, using the results of previous data if available. (3) Use Bayes' law to get the (multivariate) posterior distribution of the parameters. (4) Marginalize as desired to get the distributions of single (or a manageable few multivariate) parameters. [Slide figure: cosmological models are typically fit to many parameters; marginalization yields the distribution of the parameters of interest, here two, shown as contours.]
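(Not on the slide: a toy sketch of step 4, marginalizing a posterior evaluated on a grid. The two-parameter posterior here is made up purely for illustration.)

% Marginalize a 2-D gridded posterior down to each single parameter.
a = linspace(-3, 3, 201);  b = linspace(-3, 3, 201);
[A, B] = meshgrid(a, b);
post2d = exp(-(A.^2 - 1.2*A.*B + B.^2));       % unnormalized toy posterior
post2d = post2d / trapz(b, trapz(a, post2d, 2));
pa = trapz(b, post2d, 1);                      % integrate out b, leaving p(a)
pb = trapz(a, post2d, 2);                      % integrate out a, leaving p(b)
plot(a, pa); hold on; plot(b, pb); hold off;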

24. A small cloud: the way we trimmed the data mattered (and should trouble us a bit!). Here's the effect of including T11 and T13, both of which seemed to be outliers:

$P(\text{data} \mid r) = [\text{old model}] \times \mathrm{bin}(5, 9\cdot37, r)\,\mathrm{bin}(4, 10\cdot12, r)$  (the two appended factors are the ones that now include them)

Editing outliers is a tricky issue that we will return to when we learn about mixture models.
