
TEACHING INDEPENDENCE AND EXCHANGEABILITY
Lisbeth K. Cordani, Instituto Mauá de Tecnologia, Brasil
Sergio Wechsler, Universidade de São Paulo, Brasil
lisbeth@maua.br

Most of the statistical literature, especially that written at an elementary level, is based on the classical (frequentist) approach. The Bayesian school, although it originated in the 18th century, has only recently seen a strong development of its tools; this development, however, has not reached the basic level. Courses, as well as teachers, reflect the classical dominance, which reinforces the current paradigm. Although they have different starting points, both approaches, classical and Bayesian, offer tools to analyze data, and we should offer the choice to the student. This article deals with two important concepts: one very useful from the classical point of view, the concept of independence, and another related to Bayesian thought, the concept of exchangeability. Definitions and simple examples are presented to relate the two approaches from an elementary point of view.

INTRODUCTION
Statistical inference, at many school levels, is usually taught from the frequentist or classical point of view, following the theories developed by Sir Ronald Fisher and by Neyman and Pearson in the first half of the 20th century. The Bayesian school, although very old in its origins (Bayes' Theorem, 18th century), is now seeing great development in mathematical statistics, but the same has not happened in the didactic and pedagogical environment. The development of classical statistics was contemporary with the philosophical movement of positivism, and it may be argued that the former was influenced by positivistic ideas. Specifically, classical statistics rests on a belief in the objectivity of scientific results. The Bayesian school, on the other hand, had been criticized for its subjectivity and, due also to computational barriers, was put aside. Nowadays, given the advance of computational resources, this last objection is no longer raised. There is also the dominance of the classical approach in the school environment, not only in the available textbooks, for inertial reasons, but also in research in different fields of knowledge. As teachers have nothing different to present to their students, they almost always offer the classical version.
Statistics deals with uncertainty and partial information, and one of its aims should be to develop some degree of criticism in students. Most of the time, however, there is a kind of pragmatic usage among its users, who look for a special rule that guarantees the establishment of the truth, searching for definitive answers, typical of positivist behaviour. Both the classical and the Bayesian approaches offer tools for data analysis, and students should be able to decide which of them is better suited to solve a given inferential problem. But, as the current paradigm is the classical school, the concepts of Bayesian theory receive too little attention. Courses in statistics emphasise the concept of independence, whether among events or random variables, but rarely refer to the concept of exchangeability, which should be offered in the same circumstances. This is probably linked to the fact that the two schools are kept apart, the classical school using the notion of independence intensively, whereas the Bayesian one uses exchangeability.
In classical statistics, samples are often supposed to be formed by independent and identically distributed (iid) random variables, while in Bayesian statistics they can be considered as such only when conditioned on the parameter value, a view based on the notion of exchangeability. The word independence, in its colloquial sense, may mean freedom from any subordination, impartiality, or self-sufficiency. However, this does not help in understanding the probabilistic concept of independence, which is not common knowledge. Every teacher of basic probability has faced the students' difficulty in understanding the probabilistic meaning of the term independence, since this word is usually confused with the notion of exclusion (that is, there is some confusion between independence and incompatibility).

Apart from this, the concept of independence itself has little connection with the student's life: it is an abstract concept, with a certain empirical meaning, whose application and interpretation are not easy. On the other hand, the notion of permutability (also a mathematical one) has a more immediate empirical interpretation. The term exchangeable is said of factors that may be swapped without affecting the results. Here the factors may be instances, such as throws of a coin: the throws are exchangeable if the order in which they are made is irrelevant to the probabilities of the possible outcomes. A judgment of exchangeability of instances is a kind of confession from the observer that he cannot distinguish among them, since he believes them to be homogeneous. The examples that follow show that, apart from the fact that independence is mathematically stronger than exchangeability, the latter notion is more concrete and, therefore, easier for students to grasp. The notion of exchangeability, introduced by de Finetti in the 1930s and retrieved by Savage in the 1950s, is nevertheless rarely included in basic courses, which prevents a precise record of students' difficulties with it. Yet the word itself allows an almost automatic understanding of its meaning in terms of probability, which does not occur with the primitive notion of independence. In this article, we intend to introduce the concept of exchangeability to beginners at a basic level, showing them its simplicity and comparing it to the notion of independence.

DEFINITIONS
Independence
Two events A and B from a sample space S are said to be independent if P(A ∩ B) = P(A)·P(B). If P(B) > 0, the events A and B are independent if and only if P(A | B) = P(A), which is to say that the occurrence of B does not affect the probability of A. In general, a set of events A1, A2, ..., An (n ≥ 2) is said to be independent if

P(A_{i_1} ∩ A_{i_2} ∩ ... ∩ A_{i_m}) = P(A_{i_1}) · P(A_{i_2}) · ... · P(A_{i_m}),

for every choice of 1 ≤ i_1 < i_2 < ... < i_m ≤ n and every m = 2, 3, ..., n. A sequence of events A1, A2, ... is said to be mutually independent if, for every n ≥ 2, the events A1, A2, ..., An are independent. The terms stochastically independent, statistically independent and collectively independent are used interchangeably.

Exchangeability
Two events A and B are said to be exchangeable if P(A ∩ Bᶜ) = P(Aᶜ ∩ B), which expresses indifference with respect to order: both intersections describe the occurrence of exactly one of the two events, either at the first or at the second instance. A finite sequence of events A1, A2, ..., An is said to be exchangeable when the probability of the occurrence of exactly k of them, in whatever order, is always the same (0 ≤ k ≤ n). A sequence of events A1, A2, ... is exchangeable whenever A1, A2, ..., An are exchangeable for every n ≥ 1. Intuitively, this means that in n throws of the same coin, every particular sequence of Heads and Tails containing m Heads and n − m Tails has the same probability, depending only on m and n; the order is irrelevant. Both definitions were stated for sequences of events and may be extended to sequences of random variables (see Bernardo and Smith, 1994, for example).
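Both two-event definitions can be checked numerically. The following minimal sketch (ours, not from the article; the function names are hypothetical) tests them on the probabilities of two draws without replacement from the urn of Example 1 below.

from fractions import Fraction

def is_independent(p_ab, p_a_notb, p_nota_b):
    """Is P(A ∩ B) = P(A)·P(B)?"""
    p_a = p_ab + p_a_notb          # P(A) = P(A ∩ B) + P(A ∩ Bᶜ)
    p_b = p_ab + p_nota_b          # P(B) = P(A ∩ B) + P(Aᶜ ∩ B)
    return p_ab == p_a * p_b

def is_exchangeable(p_a_notb, p_nota_b):
    """Is P(A ∩ Bᶜ) = P(Aᶜ ∩ B)? (exactly one occurrence; order irrelevant)"""
    return p_a_notb == p_nota_b

# Two draws without replacement from an urn with 10 red and 5 white marbles;
# A = red at the 1st selection, B = red at the 2nd selection.
p_ab     = Fraction(10, 15) * Fraction(9, 14)    # P(A ∩ B)
p_a_notb = Fraction(10, 15) * Fraction(5, 14)    # P(A ∩ Bᶜ)
p_nota_b = Fraction(5, 15) * Fraction(10, 14)    # P(Aᶜ ∩ B)

print(is_independent(p_ab, p_a_notb, p_nota_b))  # False: dependent
print(is_exchangeable(p_a_notb, p_nota_b))       # True: exchangeable

Exact fractions are used so the equality tests are not spoiled by floating-point rounding.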
It is not easy to grasp the concept of independence directly, and introducing the mathematically weaker notion of exchangeability first can help. For example, it can be shown that withdrawals with replacement from an urn of unknown composition are merely exchangeable, not independent.

EXAMPLES
Example 1
Consider random sampling without replacement of marbles from an urn of known composition: 10 red and 5 white marbles. The marbles are selected one by one, and we define the event Ri as occurring when the i-th sampled marble is red. (Analogously, Wi can be used for the corresponding white-marble event.) Thus,

R1 = red marble at the 1st selection
R2 = red marble at the 2nd selection
R3 = red marble at the 3rd selection
...
Rk = red marble at the k-th selection

The calculation of the related probabilities is sometimes hard for beginning students to grasp, especially when too much combinatorial reasoning is involved. To obtain the probabilities P(R1), P(R2), P(R3), P(R1 ∩ R2), P(R2 ∩ R3), P(R1 ∩ R3) and P(R2 | R1), a tree diagram provides good motivation, and in a problem of this small size it is easy to construct. From the tree diagram for selections without replacement we read off:

P(R1) = 10/15
P(R2) = (10/15)(9/14) + (5/15)(10/14) = 10/15
P(R3) = ... = 10/15
P(R1 ∩ R2) = (10/15)(9/14) = 3/7
P(R2 ∩ R3) = ... = 3/7
P(R1 ∩ R3) = 3/7
P(R2 | R1) = 9/14

Since P(R2 | R1) = 9/14 ≠ 10/15 = P(R2), the events R1 and R2 are not independent: the information that R1 has occurred modifies the probability of R2. Nevertheless, the probabilities P(Ri) all equal 10/15 for i = 1, 2, 3, and all pairwise intersection probabilities of the events Rj equal 3/7. This happens because the events are exchangeable. It is a well-known example of dependence with exchangeability: the selections are indistinguishable but nevertheless dependent.
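Readers who want to verify these numbers without drawing the tree can enumerate its branches. The sketch below is ours, not from the article; the helper names branch_probs and prob are hypothetical.

from itertools import product
from fractions import Fraction

def branch_probs(n_red=10, n_white=5, draws=3):
    """Yield (sequence, probability) for every ordered colour sequence
    of draws without replacement, i.e. every branch of the tree diagram."""
    for seq in product('RW', repeat=draws):
        r, w, p = n_red, n_white, Fraction(1)
        for colour in seq:
            total = r + w
            if colour == 'R':
                p *= Fraction(r, total); r -= 1
            else:
                p *= Fraction(w, total); w -= 1
        yield seq, p

def prob(event):
    """P(event), where event maps a colour sequence to True/False."""
    return sum(p for seq, p in branch_probs() if event(seq))

print(prob(lambda s: s[0] == 'R'))                   # P(R1) = 2/3
print(prob(lambda s: s[1] == 'R'))                   # P(R2) = 2/3
print(prob(lambda s: s[0] == 'R' and s[1] == 'R'))   # P(R1 ∩ R2) = 3/7
print(prob(lambda s: s[1] == 'R' and s[2] == 'R'))   # P(R2 ∩ R3) = 3/7
# P(R2 | R1) = P(R1 ∩ R2) / P(R1) = 9/14
print(prob(lambda s: s[0] == 'R' and s[1] == 'R')
      / prob(lambda s: s[0] == 'R'))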

Example 2
Let us consider the same urn as in the previous example, that is, an urn with 10 red and 5 white marbles, but now sample the marbles with replacement. Letting again the event Ri occur when the i-th sampled marble is red, the probabilities P(R1), P(R2), P(R3), P(R1 ∩ R2), P(R2 ∩ R3), P(R1 ∩ R3) and P(R2 | R1) may be easily obtained from the same tree diagram with suitably modified values on every branch:

P(R1) = P(R2) = P(R3) = 10/15 = 2/3,
P(R1 ∩ R2) = P(R2 ∩ R3) = P(R1 ∩ R3) = 4/9, and
P(R2 | R1) = 10/15 = 2/3.

We obtain the same probabilities P(R1), P(R2) and P(R3) as in Example 1. The conditional probabilities, however, are now equal to the unconditional ones, e.g., P(R2 | R1) = P(R2). This is therefore a situation where the events are not only exchangeable but also independent. It should be stressed that in both examples above the urn has known composition. If the composition is unknown, the events are merely exchangeable, not independent, even when the draws are made with replacement.
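This last remark can be made concrete with a small sketch (our illustration, not from the article). We compare draws with replacement from the known urn of Example 2 with draws from an urn of unknown composition, modelled here, purely as a hypothetical assumption, as a 50/50 mixture of two possible compositions; seq_prob is our helper name.

from fractions import Fraction

def seq_prob(seq, urns):
    """P(colour sequence) for draws with replacement, where urns is a list
    of (p_red, weight) pairs describing beliefs about the composition."""
    total = Fraction(0)
    for p_red, weight in urns:
        p = weight
        for colour in seq:
            p *= p_red if colour == 'R' else 1 - p_red
        total += p
    return total

known   = [(Fraction(10, 15), Fraction(1))]        # Example 2's urn, known
unknown = [(Fraction(10, 15), Fraction(1, 2)),     # hypothetical mixture for
           (Fraction(5, 15),  Fraction(1, 2))]     # an unknown composition

for urns in (known, unknown):
    p_r1   = seq_prob('R', urns)
    p_r1r2 = seq_prob('RR', urns)
    p_r2   = seq_prob('RR', urns) + seq_prob('WR', urns)
    print(p_r2, p_r1r2 / p_r1)    # P(R2) vs P(R2 | R1)

# Known urn:   P(R2) = 2/3 and P(R2 | R1) = 2/3 -> independent.
# Unknown urn: P(R2) = 1/2 but P(R2 | R1) = 5/9 -> dependent, yet still
# exchangeable, since P(RW) = P(WR) = 2/9: the first draw teaches us
# something about the urn, and hence about the second draw.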
DISCUSSION
The property that defines exchangeability of n events is that the occurrence of any intersection of k of them has the same probability; that is, this probability does not depend on their positions but exclusively on k (and a sequence is exchangeable if the property holds for every n). It is a property of symmetry with respect to order (or to labels). In the examples presented above, the events are exchangeable even for urns of unknown composition. According to Barnett (1982), the concept of exchangeability plays the same role in subjectivist theory as random sampling does in von Mises' frequentist theory. In other words, it captures the notion of a sequence of similar events. Exchangeability turns the classical iid assumption into a more realistic structure. Furthermore, Bayesian statistical models use exchangeability instead of independence, the former being weaker: exchangeable withdrawals are used as an assumption in Bayesian modelling the same way independent withdrawals are used in classical statistics. Barnett also points out that inference under exchangeable events is usually robust with respect to the different a priori probability measures that may be expressed by different persons. Barnett has in mind the Law of Large Numbers, which is valid for sequences of exchangeable events: sequences of relative frequencies of exchangeable events converge with probability 1 to an unknown, random, limit.
In particular, this teaching approach eases the understanding of the concept of independence by stressing its mathematical foundation. Just as in geometry, where Pythagoras' Theorem, say, is the focus and the numerical values of a, b and c are irrelevant, in the calculus of probability the numerical values of distribution parameters are irrelevant and attention is given to theorems such as Chebyshev's Inequality. In statistical inference, however, not knowing the numerical values of the parameters makes an assumption of independence unrealistic. For example, if the probability p of Heads of a coin is unknown (as is the case in statistical inference), predictions about future results are strongly influenced by (dependent on) knowledge and learning of past results, contradicting the very definition of independence! In other words, in the calculus of probability the assumption of independence (of coin tosses, say) is innocuous for building connections to the real world, whereas in statistical inference such an assumption is absurd and, therefore, poorly understood by students. The correct way of teaching is to show that coin tosses are independent given the numerical value of p. Without knowledge of that value, what is to be taught is the exchangeability of the tosses, which in turn implies the aforementioned conditional independence. This is the essence of Bruno de Finetti's celebrated Representation Theorem from 1937 (see Kotz and Johnson, 1993, for example).
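The coin-tossing point can also be worked through numerically. In this sketch (ours, not from the article), the unknown p is given a hypothetical three-point prior; given p the tosses would be independent, but unconditionally they are only exchangeable, and past results inform future ones.

from fractions import Fraction

# Hypothetical discrete prior on p, the probability of Heads.
prior = {Fraction(1, 4): Fraction(1, 3),
         Fraction(1, 2): Fraction(1, 3),
         Fraction(3, 4): Fraction(1, 3)}

def seq_heads_prob(k, n):
    """P(a particular sequence with k Heads in n tosses) = E[p^k (1-p)^(n-k)];
    this depends only on k and n, which is exactly exchangeability."""
    return sum(w * p**k * (1 - p)**(n - k) for p, w in prior.items())

p_h1   = seq_heads_prob(1, 1)      # P(H1) = E[p]   = 1/2
p_h1h2 = seq_heads_prob(2, 2)      # P(H1 ∩ H2) = E[p^2] = 7/24
print(p_h1, p_h1h2 / p_h1)         # P(H2) = 1/2 vs P(H2 | H1) = 7/12

# Given p, P(H2 | H1, p) = p: conditional independence. Unconditionally,
# P(H2 | H1) = 7/12 > 1/2 = P(H2): observing Heads raises the probability
# of Heads on the next toss, so the tosses are exchangeable, not independent.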

After long experience, even without formal research, the authors suggest trying this approach to avoid initial difficulties (including the semantic confusion between independence and disjointness). It is easier for students to understand the broader concept of exchangeability (indistinguishability of instances) than the concept of independence. As a consequence, it becomes advantageous to teach exchangeability before independence, the latter being presented as a special case of the former.

ACKNOWLEDGEMENTS
The authors thank the referees for their careful reading and review, which led to an improved final version.
Note: An earlier version of this article has been accepted for publication in Spanish in Alternativas (Argentina), although the publishing date has not yet been fixed.

REFERENCES
Barnett, V. (1982). Comparative Statistical Inference (2nd edition). Chichester: Wiley.
Bernardo, J. M. and Smith, A. F. M. (1994). Bayesian Theory. New York: Wiley.
Kotz, S. and Johnson, N. L. (Eds.) (1993). Breakthroughs in Statistics, Volume I. New York: Springer-Verlag.