An English translation of Laplace's Théorie Analytique des Probabilités is available online at the departmental website of Richard Pulskamp (Math Dept, Xavier U, Cincinnati). Ed Jaynes would have rejoiced to see this day! Pulskamp won't try to publish his translation (2014) in book form until/unless he (or another) annotates it, and meanwhile it is online. It is perhaps the greatest single work on probability ever published. Written in two parts: part 1 (routine) on generating fns, part 2 on probability theory. Pulskamp has translated the important part 2. Laplace also wrote a philosophical essay on probabilities which acts as the intro to the 3rd edition. This has been translated into English previously (twice!).

From samples, how many species in the population? Anton Garrett, Visiting Researcher, Cavendish Laboratory (Department of Physics), University of Cambridge.

How many nationalities are represented in a crowd, based on samples? How many subject categories are there in a library, based on inspection of books pulled out at random? (NB categories are written on book covers by name, not number; otherwise the largest category no. observed would be an immediate lower bound.) How many species of bacteria are in a pond? (Population ecology.) All of these, with and without sample replacement.

The Bayesian solution to this problem will be given: the posterior distribution for the number of classes represented, in terms of the prior distn for parameters relating to this number. Forms for the likelihood (ie, the sampling distribution) and the prior will be discussed. The problem is routine for Bayesians, but we shall also look at the harder problem in which we don't know how many countries/categories there are. This contains a twist! ONLY the sum and product rules of probability will be used. Bayes' theorem is a corollary. Extra parameters will be marginalised over. Marginalisation is another corollary.
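
With K the number of classes represented, θ the composition parameters, D the data and I the prior info (notation chosen for this write-up, not taken from the slides), the two corollaries combine as

$$p(K \mid D, I) \;\propto\; p(K \mid I) \sum_{\theta} p(\theta \mid K, I)\, p(D \mid \theta, K, I),$$

Bayes' theorem supplying the proportionality to the prior p(K|I) and marginalisation supplying the sum over the nuisance parameters θ.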

The answer depends on the prior info: a strength of Bayesianism, not a weakness. For, if we knew the answer in advance and were doing the sampling only under orders, our prior would be a discrete δ-fn at the answer. In Bayes' theorem a δ-fn prior carries through unchanged to the posterior (because the posterior is proportional to the prior, so the prior's zero value everywhere except at the δ-fn carries through). The variety of sampling-theoretical methods designed to "let the data speak for themselves" while ignoring the prior info give impossible answers (ie, nonzero prob away from the δ-fn) in such problems, and are therefore WRONG. Don't trust any method that fails in a simple problem.
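
A one-line check of the δ-fn claim, with N the unknown quantity, N₀ the value fixed by the prior and D the data:

$$p(N \mid D, I) \;\propto\; p(D \mid N, I)\,\delta_{N N_0} \;\;\Longrightarrow\;\; p(N \mid D, I) = \delta_{N N_0},$$

provided p(D | N₀, I) > 0, since after normalisation only the N = N₀ term survives.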

What is probability? p(A|B) measures how strongly B implies A. Formally, it is a measure of how strongly A is implied to be true upon supposing that B is true, according to relations known between their referents (A, B: binary propositions, true/false). Degree of implication is what you actually want in any problem involving uncertainty. RT Cox (1946), Knuth: if propositions obey Boolean algebra then the degrees of implication for them obey corresponding algebraic relations that turn out to be the sum and product rules (and hence Bayes' theorem). So let's call degree of implication "probability". But if frequentists (etc) object, then bypass them: calculate the degree of implication in each problem, because it's what you want in order to solve any problem. In defining probability there are no higher criteria than consistency and universality. No worries over belief or imaginary ensembles; all probs are automatically conditional. This viewpoint downplays "random": all it means is "unpredictable", but unpredictable by whom?
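
For reference (standard results, stated in the p(A|B) notation above), the rules in question are

$$p(A \mid C) + p(\bar{A} \mid C) = 1 \quad\text{(sum rule)}, \qquad p(AB \mid C) = p(A \mid BC)\,p(B \mid C) \quad\text{(product rule)},$$

and Bayes' theorem follows by writing p(AB|C) both ways round: p(A|BC) = p(B|AC) p(A|C) / p(B|C).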

You have to use the prior info, and you have to use Bayes' theorem to update it in the light of the data. Anything else is inequivalent to the sum and product rules, and they follow from the Boolean algebra of the propositions that are the arguments of the probabilities. This is real Bayesianism; accept no other!

Make this an urn problem. We are sampling ball bearings from an urn containing N ball bearings, identical except that each bearing is stamped lightly with its manufacturer's name. How many manufacturers are represented in the urn? The prob of sampling the observed no. of ball bearings from each manufacturer, supposing that we know the number in the urn from each manufacturer, and that ball bearings are replaced in the urn after sampling, is the multinomial distn (standard). Without replacement, it is the (multivariate) hypergeometric distribution (so called because its generating function is a hypergeometric function).
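
A minimal numerical illustration of the two sampling distributions using scipy; the urn composition and observed counts below are invented for the illustration, not taken from the talk:

```python
from scipy.stats import multinomial, multivariate_hypergeom  # scipy >= 1.5

# Hypothetical urn composition: ball bearings from each of three known
# manufacturers (illustrative numbers only).
urn = [500, 250, 250]
N = sum(urn)
p = [m / N for m in urn]      # with-replacement sampling probabilities

observed = [4, 3, 1]          # counts of each manufacturer seen in 8 draws
n = sum(observed)

# With replacement: multinomial sampling distribution.
print(multinomial.pmf(observed, n=n, p=p))

# Without replacement: (multivariate) hypergeometric sampling distribution.
print(multivariate_hypergeom.pmf(observed, m=urn, n=n))
```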

Attached to the variables representing the numbers of ball bearings is a suffix identifying the manufacturer. So our answer works when our prior info identifies every manufacturer that might be in the urn. This solution is a routine application of Bayesianism. (Sampling-theoretical approach??) But what if we have lost the list of manufacturers and their output capacities (the "key")? Or never had one? Suppose that after 20 samplings we have seen 15 ball bearings stamped by Smith, 3 by Jones and 2 by Davies. If we didn't know that these manufacturers even existed before the sampling, how can we have any prior info about them?

We can still make progress if we have statistical info about the manufacturers. If we know that (eg) one manufacturer has more ball bearings in the urn than the rest combined, that manufacturer is likely to be Smith. More generally, we can assign a probability to the manufacturers specified only statistically in the prior being the ones observed in the sampling. Then we borrow the analysis above, assuming that particular identification; then we marginalise over all possible identifications. Finally, we extract a posterior distn for the no. of manufacturers with ball bearings in the urn, using the counting trick with the δ-fn.
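
In symbols (my notation, not the speaker's): if n = (n_1, …, n_M) is the urn composition over the M statistically-specified manufacturers, one way to write the counting trick is

$$p(r \mid D, I) \;=\; \sum_{n} \delta\!\big(r,\ \#\{\,j : n_j > 0\,\}\big)\; p(n \mid D, I),$$

a sum of the joint posterior over all compositions, with the Kronecker δ picking out those in which exactly r manufacturers are present.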

Suppose we have statistical info that distinguishes between manufacturers even though we don't know which is which, or how many there are. The principles of economics might give a scaling law, such that (eg) twice as many manufacturers make 10,000 ball bearings/day as make 100,000, etc (logarithmic). This induces a prior (see following slides). The scaling law allows us to choose a labelling of manufacturers according to the expected output of each manufacturer (or any other variable that distinguishes them statistically).
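
One reading of that scaling law (an interpretation, not a quote from the slide): if the number of manufacturers per logarithmic interval of output doubles for each factor-of-ten decrease in output, then the density of manufacturers with respect to output μ is a power law,

$$\rho(\mu) \;\propto\; \mu^{-(1+\log_{10} 2)} \;\approx\; \mu^{-1.3},$$

and the prior labels j = 1, 2, … can be assigned in order of decreasing expected output μ₁ ≥ μ₂ ≥ ⋯.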

Another labelling can be generated from the stats of the samples (or just from the order in which the manufacturers came out of the urn). This labelling is an unknown subpermutation of the labelling of the manufacturers in the prior, which was based on the scaling law. Using Bayes' theorem we can get a posterior distn for which subpermutation it is (the prior over permutations is uniform). This variable enters the analysis as just another unknown that is ultimately to be marginalised over. Our answer is now a sum over a large number of quantities. (Care needed with normalisations!) But Bayesian computing continues to make progress too...
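
A brute-force sketch of that marginalisation, assuming the with-replacement likelihood, a fixed set of hypothetical prior urn shares, and a uniform prior over the injective maps (subpermutations) from observed names to prior labels; all numbers are illustrative:

```python
import itertools
from math import perm            # Python 3.8+

import numpy as np
from scipy.stats import multinomial

# Hypothetical expected urn shares for M statistically-specified manufacturers,
# labelled j = 0..M-1 in order of decreasing expected output (illustrative).
shares = np.array([0.5, 0.25, 0.125, 0.0625, 0.0625])
M = len(shares)

observed = [15, 3, 2]   # counts for the k = 3 distinct names seen in 20 draws
k = len(observed)
n = sum(observed)

# Marginalise the likelihood over every identification of the k observed names
# with k of the M prior labels (an unknown subpermutation), weighting each
# identification uniformly; manufacturers not identified with an observed name
# contribute a zero count.
total = 0.0
for ident in itertools.permutations(range(M), k):
    counts = np.zeros(M, dtype=int)
    counts[list(ident)] = observed
    total += multinomial.pmf(counts, n=n, p=shares)

marginal_likelihood = total / perm(M, k)   # uniform prior over identifications
print(marginal_likelihood)
```

The posterior weight of any particular identification is its term in the sum divided by the total; in the full problem this sum would sit inside further marginalisations over the composition and the number of manufacturers.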

NB This situation is not the same as the prior for a die that we know is weighted, when we don't know which face it is weighted towards. In that case we know the faces by name, and the prior is an exchangeable sum, with each term in the sum weighted toward a different face. Our prior might reasonably take a form in which the mean μ depends on the label j according to the scaling law, which gives a density of manufacturers wrt μ. The standard deviation, ie the variability of the output of the manufacturer, is assumed to be small. The urn is filled in proportion to factory output.
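
Purely for illustration (an assumed form, not necessarily the one on the slide): a prior matching this description, narrow about a mean set by the scaling law, might be

$$p(n_j \mid j, I) \;\propto\; \exp\!\left(-\frac{(n_j - \mu_j)^2}{2\sigma_j^2}\right), \qquad \sigma_j \ll \mu_j,$$

with n_j the number of ball bearings in the urn from the manufacturer carrying prior label j, and μ_j proportional to that manufacturer's expected output.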

The number of manufacturers was part of the conditioning info in this prior probability for the number of ball bearings in the urn by manufacturer. So we need a prior for the number of manufacturers in existence (of course!). Economics might also furnish this prior, given the size of the economy. Its tail will be important in answering how many manufacturers are/aren't in the urn.
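
As an illustration of why the tail matters (the form is an assumption, not from the talk): with a geometrically decaying prior on the number M of manufacturers in existence,

$$p(M \mid I) = (1-q)\,q^{M-1}, \qquad M = 1, 2, \dots,$$

the posterior probability that there exist manufacturers absent from the urn is governed largely by how slowly q^M decays beyond the number already observed.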

Conclusion: this is a nice "blind" problem, in which something apparently vital in defining the problem can be demoted to being known only probabilistically and marginalised out at the end (although the computations become formidable). Tricky, in that what is lost when this demotion happens is the labelling, without which you apparently cannot get the problem off the ground. Actually, you can: define different labellings from the prior stats and the data stats, and relate these labellings probabilistically. Are there other problems which this trick can solve?