CS395T Computational Statistics with Application to Bioinformatics
Slide 1: CS395T Computational Statistics with Application to Bioinformatics. Prof. William H. Press, Spring Term, 2010, The University of Texas at Austin. Unit 2: Estimation of Parameters.
Slide 2: Our next topic is Bayesian estimation of parameters. We'll ease into it with an example that looks a lot like the Monty Hall problem.

The Jailer's Tip: Of 3 prisoners (A, B, C), 2 will be released tomorrow. A, who thinks he has a 2/3 chance of being released, asks the jailer for the name of one of the lucky prisoners, other than himself. The jailer says, truthfully, "B". "Darn," thinks A, "now my chances are only 1/2: it's C or me."

Is this like Monty Hall? Did the data ("B") change the probabilities?
Slide 3: Further, suppose (unlike Monty Hall) the jailer is not indifferent about responding "B" versus "C". Does that change your answer to the previous question? Let

$$x \equiv P(S_B \mid BC), \qquad 0 \le x \le 1,$$

be the probability that the jailer says "B" when both B and C are to be released. Then

$$P(A \mid S_B) = P(AB \mid S_B) + P(AC \mid S_B) = \frac{P(S_B \mid AB)\,P(AB)}{P(S_B \mid AB)\,P(AB) + P(S_B \mid BC)\,P(BC) + P(S_B \mid CA)\,P(CA)} = \frac{1 \cdot \tfrac13}{1 \cdot \tfrac13 + x \cdot \tfrac13 + 0 \cdot \tfrac13} = \frac{1}{1+x}$$

(the $P(AC \mid S_B)$ term vanishes because the jailer, who will not name A, must say "C" when C and A are the lucky pair).

So if A knows the value of x, he can calculate his chances. If x = 1/2 (as in Monty Hall), his chances are 2/3, the same as before, so (unlike Monty Hall) he got no new information. If x ≠ 1/2, he does get new information: his chances change. But what if he doesn't know x at all?
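A quick Monte Carlo check of this result (a minimal sketch, not from the slides; variable names are mine). For x = 1/2 the simulated probability should come out near 2/3:

    % Monte Carlo check of the Jailer's Tip result P(A|S_B) = 1/(1+x).
    x = 0.5;                       % jailer's bias: P(says "B" | B and C released)
    n = 1e6;
    pair = randi(3, n, 1);         % 1 = AB, 2 = BC, 3 = CA, each with prob 1/3
    saysB = (pair == 1) | (pair == 2 & rand(n,1) < x);   % pair CA forces "C"
    mean(pair(saysB) ~= 2)         % P(A released | says "B"); ~2/3 when x = 1/2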
Slide 4: Marginalization (this is important!) When a model has unknown, or uninteresting, parameters, we integrate them out, multiplying by any knowledge of their distribution: at worst, just a prior informed by background information; at best, a narrower distribution based on data. This is not any new assumption about the world; it's just the law of de-anding (the x's are EME: exhaustive and mutually exclusive). For the Jailer's Tip:

$$P(A \mid S_B I) = \int_x P(A \mid S_B x I)\, p(x \mid I)\, dx = \int_x \frac{1}{1+x}\, p(x \mid I)\, dx$$
Slide 5: (Repeating the previous equation:)

$$P(A \mid S_B I) = \int_x P(A \mid S_B x I)\, p(x \mid I)\, dx = \int_x \frac{1}{1+x}\, p(x \mid I)\, dx$$

This is the first time we've seen a continuous probability distribution, but we'll skip the obvious repetition of all the previous laws; write $p(x) \equiv p(x \mid I)$. (Notice that p(x) is a probability of a probability! That is fairly common in Bayesian inference.) Normalization carries over from the discrete to the continuous case:

$$\sum_i P_i = 1 \quad\longrightarrow\quad \sum_i p(x_i)\,dx_i = 1 \quad\longrightarrow\quad \int_x p(x)\,dx = 1$$
Slide 6: (Repeating the previous equation:)

$$P(A \mid S_B I) = \int_x P(A \mid S_B x I)\, p(x \mid I)\, dx = \int_x \frac{1}{1+x}\, p(x \mid I)\, dx$$

What should prisoner A take for p(x)? Maybe the uniform prior,

$$p(x) = 1, \quad 0 \le x \le 1, \qquad P(A \mid S_B I) = \int_0^1 \frac{dx}{1+x} = \ln 2 \approx 0.693$$

This is not the same as the prior massed at x = 1/2, a Dirac delta function,

$$p(x) = \delta\!\left(x - \tfrac12\right), \qquad P(A \mid S_B I) = \frac{1}{1 + 1/2} = \frac{2}{3}$$

(substitute the value and remove the integral).
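Both marginalizations are one-liners to verify numerically (a sketch, using base MATLAB's integral):

    integral(@(x) 1./(1+x), 0, 1)   % uniform prior: ln(2) = 0.6931...
    1/(1 + 1/2)                     % delta-function prior at x = 1/2: 2/3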
Slide 7: Review where we are. We are trying to estimate a parameter,

$$x = P(S_B \mid BC), \quad 0 \le x \le 1, \qquad P(A \mid S_B I) = \int_x \frac{1}{1+x}\, p(x \mid I)\, dx$$

The form of our estimate is a (Bayesian) probability distribution of the parameter (which here just happens itself to be a probability). This is a sterile exercise if it is just a debate about priors. What we need is data! Data might be a previous history of choices by the jailer in identical circumstances:

BCBCCBCCCBBCBCBCCCCBBCBCCCBCBCBBCCB

so N = 35, N_B = 15, N_C = 20. (What's wrong with just taking x = 15/35 ≈ 0.43? Hold on...) We hypothesize (and might later try to check) that these are i.i.d. (independent and identically distributed) Bernoulli trials and therefore informative about x. As good Bayesians, we now need P(data | x).
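Tallying the jailer's recorded sequence is a one-line check (a minimal sketch):

    seq = 'BCBCCBCCCBBCBCBCCCCBBCBCCCBCBCBBCCB';
    N  = numel(seq)                 % 35
    NB = sum(seq == 'B')            % 15
    NC = sum(seq == 'C')            % 20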
Slide 8: P(data | x) means different things in frequentist vs. Bayesian contexts, so this is a good time to understand the differences. (We'll use both ideas as appropriate.)

A frequentist considers the universe of what might have been, imagining repeated trials even if they weren't actually tried, and needs no prior. Since the trials are i.i.d., only the counts can matter (a so-called "sufficient statistic"):

$$P(\text{data} \mid x) = \binom{N}{N_B}\, x^{N_B} (1-x)^{N_C}, \qquad \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

where the binomial coefficient is the number of equivalent arrangements and $x^{N_B}(1-x)^{N_C}$ is the probability of the exact sequence seen.

A Bayesian considers only the exact data seen, and has a prior:

$$P(x \mid \text{data}) \propto x^{N_B} (1-x)^{N_C}\, p(x \mid I)$$

There is no binomial coefficient, both conceptually (use only the data you see, not equivalent arrangements that you didn't see) and also because it is independent of x and absorbed in the proportionality. This issue is one we'll return to, not always entirely sympathetically to the Bayesians (e.g., goodness-of-fit). But we might first suppose that the prior is uniform.
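Because the binomial coefficient is constant in x, the frequentist likelihood and the uniform-prior Bayesian posterior have exactly the same normalized shape; a quick sketch:

    xx = linspace(0, 1, 1001);
    freq  = nchoosek(35,15) * xx.^15 .* (1-xx).^20;  % counts equivalent arrangements
    bayes = xx.^15 .* (1-xx).^20;                    % exact sequence seen only
    max(abs(freq/trapz(xx,freq) - bayes/trapz(xx,bayes)))  % ~0, up to roundoff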
Slide 9: With a uniform prior, the Bayes numerator and denominator are:

$$P(x \mid \text{data}) = \frac{x^{N_B}(1-x)^{N-N_B} \cdot 1}{\int_0^1 x^{N_B}(1-x)^{N-N_B}\,dx}, \qquad \int_0^1 x^{N_B}(1-x)^{N-N_B}\,dx = \frac{\Gamma(N_B+1)\,\Gamma(N-N_B+1)}{\Gamma(N+2)}$$

[Figure: plot of the numerator over the denominator for N = 35, N_B = 15.] So this is what the data tell us about p(x).
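The Gamma-function form of the denominator is easy to verify numerically before deriving it symbolically (a sketch; gamma and beta are MATLAB's standard special-function routines):

    integral(@(x) x.^15 .* (1-x).^20, 0, 1)  % direct quadrature, N = 35, N_B = 15
    gamma(16)*gamma(21)/gamma(37)            % Gamma(N_B+1) Gamma(N-N_B+1) / Gamma(N+2)
    beta(16, 21)                             % the same value: the Beta function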
Slide 10: You should learn to do calculations like this in MATLAB or Mathematica. In MATLAB (Symbolic Math Toolbox):

    syms nn nb x
    num = x^nb * (1-x)^(nn-nb)
    denom = int(num, 0, 1)
    % denom = gamma(nn-nb+1)*gamma(nb+1)/gamma(nn+2)
    p = num / denom
    % p = x^nb*(1-x)^(nn-nb)/gamma(nn-nb+1)/gamma(nb+1)*gamma(nn+2)
    ezplot(subs(p, [nn, nb], [35, 15]), [0, 1])

In Mathematica:

    num = x^nb (1 - x)^(nn - nb)
    denom = Integrate[num, {x, 0, 1}, GenerateConditions -> False]
    (* Gamma[1 + nb] Gamma[1 - nb + nn] / Gamma[2 + nn] *)
    p[x_] = num/denom
    Plot[p[x] /. {nn -> 35, nb -> 15}, {x, 0, 1}, PlotRange -> All, Frame -> True]
Slide 11: Find the mean, standard error, and mode of our estimate for x, where $P(x \mid \text{data}) \propto x^{N_B}(1-x)^{N-N_B}$.

Mode: setting $\frac{d}{dx}P(x \mid \text{data}) = 0$ gives $x = \frac{N_B}{N}$. The maximum likelihood (ML) answer is to estimate x as exactly the fraction seen.

Mean (the 1st moment):

$$\langle x \rangle = \int_0^1 x\, P(x \mid \text{data})\, dx = \frac{N_B + 1}{N + 2}$$

Notice it's different from ML! The variance involves the 2nd moment:

$$\mathrm{Var}(x) = \langle x^2 \rangle - \langle x \rangle^2 = \int_0^1 x^2\, P(x \mid \text{data})\, dx - \langle x \rangle^2 = \frac{(N_B+1)(N-N_B+1)}{(N+2)^2\,(N+3)}$$

This shows how p(x) gets narrower as the amount of data increases.
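Plugging in N = 35, N_B = 15 gives concrete numbers (a quick numeric sketch of the three formulas above):

    N = 35; NB = 15;
    xML   = NB/N                                   % mode (ML estimate): 0.4286
    xMean = (NB+1)/(N+2)                           % posterior mean: 0.4324
    xStd  = sqrt((NB+1)*(N-NB+1)/((N+2)^2*(N+3)))  % standard error: ~0.080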
Slide 12: (Let's leave behind the metaphor of the jailer and prisoner A.) What we are illustrating is called Bernoulli trials:
- two possible outcomes
- i.i.d. events
- a single parameter x (the probability of one outcome)
- a sufficient statistic is the pair of numbers N and N_B

$$P(\text{data} \mid x) = x^{N_B}(1-x)^{N-N_B} \quad \text{(in the Bayesian sense)}$$
$$P(x \mid \text{data}) \propto x^{N_B}(1-x)^{N-N_B}\, P(x \mid I)$$

[Portrait: Jacob and Johann Bernoulli.] For a uniform prior, the Bayes denominator is, as we've seen, easy to calculate:

$$\int_0^1 x^{N_B}(1-x)^{N-N_B}\,dx = \frac{\Gamma(N_B+1)\,\Gamma(N-N_B+1)}{\Gamma(N+2)}$$
Slide 13: Are there any other mathematical forms for the prior that would still leave the Bayes denominator easy to calculate? Yes! Try

$$P(x \mid I) \propto x^{\beta}(1-x)^{\alpha}$$

Choose α and β to get any desired center and width. Then

$$P(x \mid \text{data}) = \frac{x^{N_B+\beta}(1-x)^{N-N_B+\alpha}}{\int_0^1 x^{N_B+\beta}(1-x)^{N-N_B+\alpha}\,dx}, \qquad \int_0^1 x^{N_B+\beta}(1-x)^{N-N_B+\alpha}\,dx = \frac{\Gamma(N_B+\beta+1)\,\Gamma(N-N_B+\alpha+1)}{\Gamma(N+\alpha+\beta+2)}$$

Priors that preserve the analytic form of p(x) are called conjugate priors. There is nothing special about them except mathematical convenience. If you start with a conjugate prior, you'll also be able to assimilate new data trivially, just by changing the parameters of your estimate. This is because every posterior is in the right analytic form to be the new prior!
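In code, assimilating data under a conjugate prior is just bookkeeping on the two exponents (a minimal sketch; the second batch of counts is hypothetical, only to show the update):

    alpha = 0; beta = 0;                     % uniform prior: x^0 (1-x)^0
    beta  = beta + 15;  alpha = alpha + 20;  % first data set: N_B = 15, N_C = 20
    beta  = beta + 4;   alpha = alpha + 6;   % hypothetical new batch: 4 B's, 6 C's
    xMean = (beta + 1)/(alpha + beta + 2)    % posterior mean after both batches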
Slide 14: By the way, if I show a special love of Bernoulli trials, it might be because I am an academic descendant of the Bernoulli brothers! Actually, this is not a very exclusive club: Gauss and the Bernoullis each have ~50,000 recorded descendants in the Mathematics Genealogy database, and probably many times more unrecorded.
Slide 15: Next example (with some biology). Individual identity, or ancestry, can be determined from variable-length short tandem repeats (STRs) in the genome. The mutation probability is ~0.5% per STR per generation (though highly variable); if you use the Y chromosome only, you get paternal ancestry. There are companies that sell certificates with your genotype. A bit opportunistic, since in a few years your whole genome will be sequenced by your health plan.
Slide 16: Margaret, my ex-wife, is really into the Towne family. (And she's neither a biologist nor a Towne.)
Slide 17: Here's data from Margaret on 8 recent Townes (identified only by T code). We'll use this data several times in the next few lectures. [Table: the STR genotypes, and a second view showing just the changes; not reproduced here.]
Slide 18: Let's do a Bayesian estimation of the parameter r, the mutation rate per locus per generation. Let's assume no back mutations! (Their effect on this data set would be small.) [Figure: the Towne family tree, with independent branches of length N = 1, 3, 6, 9, 10, and 11 generations; each branch is annotated with its observed number of mutations and the corresponding binomial factor, e.g. bin(0, 3x37, r), bin(1, 5x37, r), bin(1, 11x37, r): the number of mutations seen in (generations x 37 loci) trials. One branch, with 4 changes (of 12 loci tested), looks like an outlier.]
Slide 19: Unraveling dependencies. We used ancestry as an intuitive example of how dependencies work, because its causal mechanism is well known! Later, we'll do more general Bayesian networks to capture models with more complicated causal dependencies. [Figure: a small directed graph on nodes a, b, c, d, e.] Another important idea is conditional independence. Example: b and e are conditionally independent given a, while b and d are not conditionally independent given a.
Slide 20: So we have a statistical model for the data, that is, a way to compute P(data | parameters). It is not exact, but statistical models rarely (never?) are: it neglects back mutations, assumes a single probability for all loci, etc. The model is:

$$P(\text{data} \mid r) = \mathrm{bin}(0, 3\cdot37, r)\,\mathrm{bin}(0, 3\cdot37, r)\,\mathrm{bin}(1, 5\cdot37, r)\,\mathrm{bin}(0, 5\cdot37, r)\,\mathrm{bin}(0, 6\cdot37, r)\,\mathrm{bin}(1, 11\cdot37, r)\,\mathrm{bin}(3, 10\cdot37, r)$$

Bayes estimation of the parameter:

$$P(r \mid \text{data}) \propto P(\text{data} \mid r)\, P(r) \propto P(\text{data} \mid r)\cdot \frac{1}{r}$$

What kind of prior is this? It is called log-uniform. The log-uniform prior has equal probability in each order of magnitude:

$$\int_r^{10r} P(r')\,dr' = \int_r^{10r} \frac{dr'}{r'} = \ln 10$$

It is often taken as the non-informative prior when you don't even know the order of magnitude of the (positive) quantity. It is an improper prior, since its integral is infinite. This is almost always OK, but it is possible to construct paradoxes with improper priors (e.g., the "marginalization paradox").
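A minimal grid evaluation of this posterior (a sketch: binopdf is the binomial PMF from MATLAB's Statistics Toolbox, and the grid range for r is an assumption):

    r = linspace(1e-4, 0.03, 3000);          % grid over plausible mutation rates
    like = binopdf(0,3*37,r) .* binopdf(0,3*37,r) .* binopdf(1,5*37,r) .* ...
           binopdf(0,5*37,r) .* binopdf(0,6*37,r) .* binopdf(1,11*37,r) .* ...
           binopdf(3,10*37,r);
    post = like ./ r;                        % log-uniform (improper) prior 1/r
    post = post / trapz(r, post);            % normalize numerically
    plot(r, post), xlabel('r'), ylabel('p(r | data)')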
Slide 21: Here is the plot of the (normalized) P(r | data). [Figure: the posterior density of r.] This is (almost) real biology: we've measured the mutation probability, per locus per generation, of Y-chromosome STRs. This tells us something about the actual DNA replication machinery!
Slide 22: It really did matter (a bit) that we sorted out the conditional dependencies correctly. Here's a comparison to doing it wrong by assuming all the data independent: [Figure: the two posteriors compared.] The true dependencies allow somewhat larger values of r, because we don't wrongly count the zero-mutation branches multiple times. We'll come back to the Towne family for some fancier stuff later!

Ignoring conditional dependencies and just multiplying the probabilities of the data as if they were independent is called naïve Bayes. People often do this. It is mathematically incorrect, but sometimes it is all you can do!
Slide 23: The basic paradigm of Bayesian parameter estimation:
- Construct a statistical model for the probability of the observed data as a function of all parameters, treating dependency in the data correctly.
- Assign prior distributions to the parameters, jointly or independently as appropriate, using the results of previous data if available.
- Use Bayes' law to get the (multivariate) posterior distribution of the parameters.
- Marginalize as desired to get the distributions of single (or a manageable few multivariate) parameters; a grid-based sketch of this last step follows below.

[Figure: cosmological models are typically fit to many parameters; marginalization yields the distribution of the parameters of interest, here two, shown as contours.]
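For the marginalization step, a grid-based sketch (the two-parameter posterior here is hypothetical, just to show the mechanics):

    a = linspace(0, 1, 201);  b = linspace(0, 2, 201);
    [A, B] = meshgrid(a, b);                        % rows index b, columns index a
    post = exp(-((A-0.4).^2/0.01 + (B-1).^2/0.1));  % a made-up joint posterior p(a,b)
    post = post / trapz(b, trapz(a, post, 2));      % normalize on the grid
    pa = trapz(b, post, 1);                         % marginal p(a): integrate out b
    plot(a, pa), xlabel('a'), ylabel('p(a | data)')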
Slide 24: A small cloud: the way we trimmed the data mattered. (And it should trouble us a bit!) Here's the effect of including T11 and T13, both of which seemed to be outliers:

$$P(\text{data} \mid r) = [\text{old model}] \times \mathrm{bin}(5, 9\cdot37, r)\,\mathrm{bin}(4, 10\cdot12, r)$$

[Figure: the posterior for r with and without the outlier individuals.] Editing outliers is a tricky issue that we will return to when we learn about mixture models.