Basic Probability and Information Theory: quick revision
ML for NLP
Lecturer: S Luz
February 17, 2015

In these notes we review the basics of probability theory and introduce the information-theoretic notions which are essential to many aspects of machine learning in general, and to the induction of text classifiers in particular. We illustrate the abstract concepts introduced here with examples from a text categorisation (TC) task (e.g. spam filtering). The reader should note, however, that the techniques and results reported here apply to a much wider range of applications. The connection with TC will become clearer next week, when we cover feature selection and classifier induction.

Concise but useful introductions to probability theory can be found in (Russell and Norvig, 1995, chapter 14) and (Manning and Schütze, 1999, chapter 2). The latter also includes the basics of information theory, viewed mainly from a natural language processing perspective. A very good general introduction to probability theory is (Bertsekas and Tsitsiklis, 2002).

Why review Probability and Information Theory?

- Probability theory gives us a tool to model uncertainty.
- Probabilistic approaches (e.g. naive Bayes) are used in TC.
- Information theory plays an important role in various areas of machine learning. In particular:
  - Feature selection uses the information-theoretic notions of information gain and mutual information.
  - Learning algorithms, such as decision tree induction, use the information-theoretic concept of entropy to decide how to partition the document space.

Information theory originated in Claude Shannon's research on the capacity of noisy information channels. It is concerned with maximising the information one can transmit over an imperfect communication channel. The central concept of information theory is that of entropy. Entropy (which we will define formally below) measures the amount of uncertainty in a probability distribution. Reportedly, the term "entropy" was suggested to Shannon by John von Neumann: "You should call it entropy for two reasons: first, the function is already in use in thermodynamics under the same name; second, and more importantly, most people don't know what entropy really is, and if you use the word entropy in an argument you will win every time" (Hamming, 1991).

Probability theory: notation

    Notation   Set jargon              Probability jargon
    Ω          collection of objects   sample space
    ω          element of Ω            elementary event
    D          subset of Ω             event that some outcome in D occurs
    D̄          complement of D         event that no outcome in D occurs
    D ∩ E      intersection            both D and E
    D ∪ E      union                   D or E, or both
    D \ E      difference              D but not E
    D ⊆ E      inclusion               if D then E
    ∅          empty set               impossible event
    Ω          whole space             certain event

A notational variant of the above, which stresses the connection with logic, would treat set intersection as conjunction, set union as disjunction, etc. This variant is summarised below:

    Logic        Set theory
    P(A ∧ B)     P(A ∩ B)
    P(A ∨ B)     P(A ∪ B)
    P(false)     P(∅)
    P(true)      P(Ω)
Sample spaces

The set of all possible outcomes of an experiment is called the sample space. In text categorisation, for instance, one could regard the set of documents being processed as the sample space:

    Ω = D = {d_1, ..., d_|Ω|}

However, one could alternatively take sets of documents, rather than sets (or multisets, lists etc.) of words, to be the elementary events. In this case, Ω = 2^D would be the sample space. An experiment could be performed to determine, for instance, which documents belong to category c (say, which should be classified as spam). The outcome of that experiment would be a subset of Ω. Different ways of characterising sample spaces in TC will be presented in the lecture on naive Bayes text categorisation.

Here is an even more mundane example: the combinations of heads (H) and tails (T) resulting from tossing a coin three times can be represented by the following sample space:

    Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Discrete uniform probability law

Now that we have characterised the sample space, we would like to be able to quantify the likelihood of events.

Discrete uniform probability law: if the sample space consists of n possible outcomes which are equally likely, then the probability of any event D is given by

    P(D) = (no. of elements of D) / n

One can think of the probability of occurrence of an event D as the proportion of times event D occurs in a large number of trials:

    P(D) = (no. of occurrences of D) / (no. of trials)    (1)
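These two ways of assigning probabilities, counting under the uniform law and estimating frequencies over trials, can be checked with a small Python sketch. The event "at least two heads in three tosses" is our own illustrative choice:

```python
from itertools import product
from random import choice

# Sample space for three coin tosses: 8 equally likely outcomes.
omega = [''.join(t) for t in product('HT', repeat=3)]

# Event D: at least two heads.  Uniform law: P(D) = |D| / |Omega|.
D = [w for w in omega if w.count('H') >= 2]
p_uniform = len(D) / len(omega)
print(p_uniform)  # 0.5

# Frequentist estimate (1): proportion of occurrences of D in many trials.
trials = 100_000
hits = sum(1 for _ in range(trials)
           if ''.join(choice('HT') for _ in range(3)).count('H') >= 2)
print(hits / trials)  # close to 0.5
```

The two numbers agree up to sampling noise, which is the convergence behaviour the frequentist view relies on.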
The view of probabilities implicit above has been termed "frequentist". It relies on the empirical observation that the ratio between the observed occurrences of an event and the number of trials appears to converge to a limit as the number of trials increases. The frequentist approach is inadequate in many ways, but a thorough discussion of its merits and limitations is beyond the scope of this revision. For a very readable discussion of the philosophy of probability theory, see (Hamming, 1991).

We may illustrate this approach by calculating the probability that a document of corpus Ω is filed under category c as follows:

    P(c) = |D| / |Ω|,  where D = {ω ∈ Ω : f(ω, c) = T}

Visualising sample spaces

Sample spaces can be depicted in different ways. For events described by two rolls of a die, the sample space could be depicted as a 6 × 6 grid, with the first roll on one axis and the second roll on the other. In experiments of a sequential nature such as this, a tree representation (first roll, then second roll) is also informative. For the events

    E1 = {same result in both rolls}
    E2 = {at least one roll is a 5}

the grid makes it easy to read off P(E1) = 6/36 = 1/6 and P(E2) = 11/36 (adapted from Bertsekas and Tsitsiklis, 2002).

Sample space, in somewhat more formal terms

Defining σ-field:
A collection F of subsets of Ω is called a σ-field if it satisfies the following conditions:

1. ∅ ∈ F
2. if D_1, ..., D_n ∈ F then ⋃_{i=1}^n D_i ∈ F
3. if D ∈ F then D̄ ∈ F

Example: the smallest σ-field associated with Ω is the collection F = {∅, Ω}.

Probability spaces

We continue to add structure to our original set of events by defining P as a probability measure.

A probability measure P on ⟨Ω, F⟩ is a function P : F → [0, 1] satisfying:

1. P(Ω) = 1
2. if D_1, D_2, ... is a collection of disjoint members of F, in that D_i ∩ D_j = ∅ for all i ≠ j, then P(⋃_{i=1}^∞ D_i) = Σ_{i=1}^∞ P(D_i)

The triple ⟨Ω, F, P⟩ is called a probability space.

In the definition of probability measure presented here, F is of course to be understood as a σ-field. A probability measure is a special case of what is called, in probability theory, simply a measure. A measure is a function µ : F → [0, ∞) satisfying countable additivity (condition 2 above) and µ(∅) = 0. Some weight-assignment functions, such as the ones often used in decision trees, are measures, though they are not probability measures.
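The conditions above are easy to verify mechanically on a small finite space. The following sketch, our own illustration using the uniform law as the measure, checks the σ-field and probability-measure conditions for one die roll:

```python
from itertools import chain, combinations
from fractions import Fraction

# Tiny probability space: Omega = outcomes of one die roll, F = the
# power set of Omega (a sigma-field), P(D) = |D|/|Omega| (uniform law).
omega = frozenset(range(1, 7))
F = [frozenset(s) for s in chain.from_iterable(
        combinations(omega, r) for r in range(len(omega) + 1))]

def P(D):
    return Fraction(len(D), len(omega))

# sigma-field conditions: contains the empty set, closed under union
# and complement.
assert frozenset() in F
assert all(D | E in F and omega - D in F for D in F for E in F)

# Probability-measure conditions: P(Omega) = 1, additivity on disjoint sets.
assert P(omega) == 1
D, E = frozenset({1, 2}), frozenset({5})    # disjoint members of F
assert P(D | E) == P(D) + P(E)
print("axioms verified on this finite space")
```

Taking F to be the full power set is a choice; any sub-collection satisfying the three σ-field conditions would also do.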
Properties of probability spaces

The following hold:

    P(D̄) = 1 − P(D)                                         (2)
    if D ⊆ E then P(E) = P(D) + P(E \ D) ≥ P(D)             (3)
    P(A ∪ B ∪ C) = P(A) + P(Ā ∩ B) + P(Ā ∩ B̄ ∩ C)           (4)

Inclusion-exclusion principle:

    P(D ∪ E) = P(D) + P(E) − P(D ∩ E)                       (5)

or, more generally:

    P(⋃_{i=1}^n D_i) = Σ_i P(D_i) − Σ_{i<j} P(D_i ∩ D_j)
                       + Σ_{i<j<k} P(D_i ∩ D_j ∩ D_k) − ...
                       + (−1)^{n+1} P(D_1 ∩ ... ∩ D_n)

Proofs:

(2) D ∪ D̄ = Ω and D ∩ D̄ = ∅, so P(D ∪ D̄) = P(D) + P(D̄) = 1.

(3) If D ⊆ E, then E = D ∪ (E \ D), which is a union of disjoint sets. Therefore P(E) = P(D) + P(E \ D).

(5) The rationale for the inclusion-exclusion principle is easy to visualise by drawing a Venn diagram of (possibly intersecting) sets D and E. Simply adding the probabilities of D and E is as if we counted the probability of the intersection twice, so the result needs to be readjusted by subtracting the intersection:

    P(D ∪ E) = P((D \ E) ∪ (D ∩ E) ∪ (E \ D))                            (set theory)
             = P(D \ E) + P(D ∩ E) + P(E \ D)                            (disjoint sets)
             = P(D \ E) + P(D ∩ E) + P(E \ D) + P(D ∩ E) − P(D ∩ E)      (algebra)
             = P((D \ E) ∪ (D ∩ E)) + P((E \ D) ∪ (D ∩ E)) − P(D ∩ E)    (disjoint sets)
             = P(D) + P(E) − P(D ∩ E)                                    (set theory)
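The inclusion-exclusion principle (5) can also be verified numerically, e.g. on the two-dice events E1 and E2 from the earlier grid example:

```python
from fractions import Fraction
from itertools import product

# Two rolls of a fair die; uniform law over the 36 outcomes.
omega = list(product(range(1, 7), repeat=2))
P = lambda D: Fraction(len(D), len(omega))

D = {w for w in omega if w[0] == w[1]}     # same result in both rolls
E = {w for w in omega if 5 in w}           # at least one roll is a 5

# Inclusion-exclusion (5): P(D or E) = P(D) + P(E) - P(D and E)
assert P(D | E) == P(D) + P(E) - P(D & E)
print(P(D), P(E), P(D | E))  # 1/6 11/36 4/9
```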
Conditional probability

If P(E) > 0, then the conditional probability that D occurs given E is defined to be:

    P(D | E) = P(D ∩ E) / P(E)

Conditional probabilities can be illustrated in terms of set size: if you see probability measures as frequencies (proportions of occurrence), then the conditional probability is given by

    |D ∩ E| / |E| = (|D ∩ E| / |Ω|) / (|E| / |Ω|) = P(D ∩ E) / P(E)

Properties of conditional probabilities

1. For any events D and E s.t. 0 < P(E) < 1,

       P(D) = P(D | E) P(E) + P(D | Ē) P(Ē)

2. More generally, if E_1, ..., E_n is a partition of Ω s.t. P(E_i) > 0 for all i, then

       P(D) = Σ_{i=1}^n P(D | E_i) P(E_i)

3. Chain rule:

       P(D_1 ∩ ... ∩ D_n) = P(D_1) P(D_2 | D_1) P(D_3 | D_2 ∩ D_1) ... P(D_n | D_1 ∩ ... ∩ D_{n−1})

Proof (of 1): D = (E ∩ D) ∪ (Ē ∩ D), which is a union of disjoint sets. Thus

    P(D) = P(E ∩ D) + P(Ē ∩ D) = P(D | E) P(E) + P(D | Ē) P(Ē)
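Property 1 (total probability) can be illustrated with a small numeric sketch. All numbers below (a hypothetical spam proportion and word frequencies) are made up for illustration:

```python
# Total probability on a hypothetical spam-filtering setup: the event
# D = "message contains the word 'offer'", conditioned on the partition
# {spam, not spam}.  The probabilities are invented for illustration.
p_spam = 0.25
p_offer_given_spam = 0.60
p_offer_given_ham = 0.05

# P(D) = P(D|E)P(E) + P(D|E_bar)P(E_bar)
p_offer = p_offer_given_spam * p_spam + p_offer_given_ham * (1 - p_spam)
print(round(p_offer, 4))  # 0.1875
```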
Bayes' rule

Sometimes, as in the case of naive Bayes TC, it is easier to estimate the conditional probability of E given D than the other way around. In such cases, Bayes' rule can be used to simplify computation:

    P(E | D) = P(E ∩ D) / P(D)
             = P(D | E) P(E) / P(D)                         (6)

Proof: it follows trivially from the definition of conditional probability and the chain rule.

Independence

In general, the occurrence of an event E changes the probability that another event D occurs. When this happens, the initial (prior) probability P(D) gets updated to P(D | E). If the probability remains unchanged, i.e. P(D) = P(D | E), then we call D and E independent.

Events D_1, ..., D_n are called independent if

    P(⋂_{i∈S} D_i) = Π_{i∈S} P(D_i)   for every S ⊆ {1, 2, ..., n}

E.g. two fair coin tosses are independent. But note that, when we have more than two events, pairwise independence does not imply independence: from P(C | A) = P(C) and P(C | B) = P(C) you cannot conclude that P(A ∩ B ∩ C) = P(A) P(B) P(C). Neither is the latter a sufficient condition for the independence of A, B and C.

Examples (verify that they are actually the case, as an exercise):

1. Pairwise independence does not imply independence:

   A = {coin comes up heads on the first toss}
   B = {coin comes up heads on the second toss}
   C = {the two tosses have different results}
2. P(A ∩ B ∩ C) = P(A) P(B) P(C) is not sufficient for independence. Consider two throws of a fair die and the following events:

   A = {first roll is 1, 2 or 3}
   B = {first roll is 3, 4 or 5}
   C = {the sum of the two rolls is 9}

3. Similarly, for a set of random variables S = {X_1, ..., X_n}, having P(X_i | ⋂_{X_j ∈ S\{X_i}} X_j) = P(X_i) for each i does not imply independence for S. Again, consider two throws of a fair die and the following events:

   A = {first roll is 1, 3 or 4}
   B = {first roll is 1, 2 or 4}
   C = {the sum of the two rolls is 4}

   (Show that P(A | B ∩ C) = P(A), P(B | A ∩ C) = P(B) and P(C | B ∩ A) = P(C), but P(A ∩ B) ≠ P(A) P(B), etc.)

Conditional independence

Absolute independence (as described above) is a very strong requirement, which is seldom met. In practice, one often uses conditional independence:

    P(A ∩ B | C) = P(A | C) P(B | C)                        (7)

or, equivalently:

    P(A | B ∩ C) = P(A | C)                                 (8)

E.g. let A and B be two biased coins such that the probability of heads for A is .99 and for B is .01. Choose a coin randomly (with a .5 probability of choosing each) and toss it twice. The probability of heads in the 2nd toss is not independent of the probability of heads in the 1st, but they are independent given the choice of coin.

Some exercises

1. The dangers of overtesting. Domingos's (2012) review of ML used the following example to caution readers against overtesting: "... a mutual fund that beats the market ten years in a row looks very impressive, until you realize that, if there are 1000 funds and each has a 50% chance of beating the market on any given year, it's quite likely that one will succeed all ten times just by luck."
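One way to get a feel for the quoted claim is to simulate it. This is a sketch under the assumption that funds and years are independent:

```python
import random

# A quick simulation of Domingos's scenario: 1000 funds, each with an
# independent 50% chance of beating the market in each of 10 years.
random.seed(1)

def some_fund_wins_every_year(funds=1000, years=10):
    return any(all(random.random() < 0.5 for _ in range(years))
               for _ in range(funds))

runs = 2_000
estimate = sum(some_fund_wins_every_year() for _ in range(runs)) / runs
print(round(estimate, 2))  # roughly 0.62; compare with your answer to the question below
```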
Question: what is the actual probability that one mutual fund will succeed all 10 times by luck?

2. Monty Hall (from Wikipedia): "Suppose you're on a game show, and you're given the choice of three doors: behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, 'Do you want to pick door No. 2?' Is it to your advantage to switch your choice?"

Question: the best strategy (as Wikipedia will tell you) is to switch. Give an explanation of why that is the case, based on conditional probabilities and Bayes' rule.

Random variables

A random variable is a function X : Ω → R with the property that {ω ∈ Ω : X(ω) ≤ x} ∈ F, for each x ∈ R.

Random variables offer a convenient way of abstracting over event spaces. The notation P(X = x) is used to indicate the probability that a random variable X takes value x. For example, in text categorisation a category can be seen as a random variable defined in terms of the number of documents classified under a given category.

Example: assume that 5 documents, out of a 20-document corpus Ω, have been classified as spam. We are now regarding subsets of Ω (possibly the entire power set of Ω), defined for instance by the categories assigned to their elements, as a σ-field, and the resulting triple ⟨Ω, F, P⟩ as a probability space. Events in such a probability space will be things that denote
elements of F, such as the event that documents have been filed under category C. Let us also assume that the category spam denotes a set {d_1, ..., d_5} ∈ F. The probability associated with the event that documents have been filed under a category is summarised in the random variable and is given by P(C). The probability associated with the event that documents have been filed under category spam is given by a specific value of the random variable (recall that what we are calling a variable here is actually a function):

    P(C = spam) = 5/20 = 0.25

Discrete random variables

A discrete random variable is a random variable whose range is finite (or countably infinite). Discrete random variables are associated with a probability mass function (PMF). A PMF maps each numerical value that a random variable can take to a probability. A function of a discrete random variable defines another discrete random variable, whose PMF can be obtained from the PMF of the original one. A random variable can also be conditioned on another random variable (or on an event), and the notions of independence and conditional independence seen above apply here as well.

Probability mass functions

The PMF of a random variable X is the function p : R → [0, 1] given by

    p(x) = P(X = x)

For a discrete random variable:

    Σ_{i∈N} p(x_i) = Σ_i P(A_{x_i}) = P(Ω) = 1,   where A_{x_i} = {ω ∈ Ω : X(ω) = x_i}

So, to calculate the PMF of X we add the probabilities of all events X = x for each possible value x to get p(x).
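Computing a PMF by summing over elementary events, as described above, can be sketched as follows (here for the sum of two die rolls, an illustrative choice):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# PMF of X = sum of two rolls of a fair die, computed by adding the
# probabilities of all elementary events with X(omega) = x.
omega = list(product(range(1, 7), repeat=2))      # 36 equally likely outcomes
counts = Counter(a + b for a, b in omega)
pmf = {x: Fraction(n, len(omega)) for x, n in sorted(counts.items())}

assert sum(pmf.values()) == 1                     # normalisation property
print(pmf[7])   # 1/6, the most likely sum
print(pmf[2])   # 1/36
```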
E.g. if X is the number of heads obtained in two tosses of a fair coin, its PMF is:

    p(x) = .25  if x = 0 or x = 2
           .5   if x = 1
           0    otherwise

So the probability of at least one head is P(X > 0) = .75.

Continuous random variables

A continuous random variable is a random variable whose range is continuous (e.g. velocity, time intervals, etc.). A variable X is called continuous if there is a function f_X s.t., for every subset B of R:

    P(X ∈ B) = ∫_B f_X(x) dx

E.g. the probability that X falls within the interval [a, b] is

    P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx

f_X is called the probability density function (PDF) of X, provided that it is non-negative and has the normalisation property:

    ∫_{−∞}^{∞} f_X(x) dx = P(−∞ ≤ X ≤ ∞) = 1

Note that for a single value v, P(X = v) = ∫_v^v f_X(x) dx = 0, so the probability that X falls within the interval [a, b] is the same as the probability that X falls within [a, b), (a, b] or (a, b) (i.e. it makes no difference whether the endpoints are included or not). The probability that X falls within the interval [a, b] can be interpreted as the area under the PDF curve over that interval.

Cumulative distribution functions

Cumulative distribution functions (CDFs) subsume PDFs and PMFs under a single concept. The CDF F_X of X gives the probability P(X ≤ x), so that for every x:

    F_X(x) = P(X ≤ x) = Σ_{k≤x} p(k)           if X is discrete
                        ∫_{−∞}^x f_X(t) dt     if X is continuous
Since X ≤ x is always an event (and therefore has a well-defined probability), every random variable X associated with a given probability model has a CDF.

Moments, expectation, mean, variance

The expected value of a discrete random variable X with PMF p is given by

    E[X] = Σ_x p(x) x

For a continuous variable with PDF f we have

    E[X] = ∫_{−∞}^{∞} x f(x) dx

This is AKA the expectation, the mean or the first moment of X. In general, we define the n-th moment as E[X^n]. The variance of a random variable is defined as the expectation of the random variable (X − E[X])^2:

    var(X) = E[(X − E[X])^2] = Σ_x p(x) (x − E[X])^2        if X is discrete
                               ∫ (x − E[X])^2 f(x) dx       if X is continuous

Some discrete random variables

Bernoulli (parameter p): success (or failure) in a single trial:

    p(k) = p      if k = 1
           1 − p  if k = 0

    E[X] = p,  var(X) = p(1 − p)

Binomial (parameters p and n): number of successes in n independent Bernoulli trials:

    p(k) = (n choose k) p^k (1 − p)^{n−k},  k = 0, 1, ..., n

    E[X] = np,  var(X) = np(1 − p)

Geometric (parameter p): number of trials until the first success:

    p(k) = (1 − p)^{k−1} p,  k = 1, 2, ...

    E[X] = 1/p,  var(X) = (1 − p)/p^2

Poisson (parameter λ): approximation of the binomial PMF when n is large, p is small and λ = np:

    p(k) = e^{−λ} λ^k / k!,  k = 0, 1, ...

    E[X] = λ,  var(X) = λ
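The closed-form moments above can be checked empirically by sampling. The sketch below does this for the geometric distribution; the sample size and seed are arbitrary choices:

```python
import random

# Empirical check of the geometric distribution's moments: sample many
# values and compare with E[X] = 1/p and var(X) = (1 - p)/p^2.
random.seed(0)
p, N = 0.3, 200_000

def geometric(p):
    """Number of Bernoulli(p) trials until the first success."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

sample = [geometric(p) for _ in range(N)]
mean = sum(sample) / N
var = sum((x - mean) ** 2 for x in sample) / N

print(mean, 1 / p)               # sample mean vs E[X]
print(var, (1 - p) / p ** 2)     # sample variance vs var(X)
```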
Some continuous random variables

Uniform (over interval [a, b]):

    f(x) = 1/(b − a)  if a ≤ x ≤ b
           0          otherwise

    E[X] = (a + b)/2,  var(X) = (b − a)^2 / 12

Exponential (parameter λ): e.g. models the time until some event occurs:

    f(x) = λ e^{−λx}  if x ≥ 0
           0          otherwise

    E[X] = 1/λ,  var(X) = 1/λ^2

Normal or Gaussian (parameters µ and σ^2 > 0):

    f(x) = (1 / (√(2π) σ)) e^{−(x−µ)^2 / (2σ^2)}

    E[X] = µ,  var(X) = σ^2

Entropy

Entropy, AKA self-information, measures the average amount of uncertainty in a probability mass distribution. In other words, entropy is a measure of how much we learn when we observe an event occurring in accordance with that distribution. The entropy of a random variable measures the amount of information in that variable (we will always use log base 2 unless stated
otherwise):

    H(X) = H(p) = − Σ_{x∈X} p(x) log p(x) = Σ_{x∈X} p(x) log (1/p(x))

N.B.: we define 0 log 0 = 0.

Example: suppose we have a set of documents D = {d_1, ..., d_n}, each classified according to whether or not it belongs to a certain category c, say, spam. First, suppose you know that all documents in D are filed under spam (we represent that as P(spam) = 1). How much information would we gain if someone told us that a certain document d_i, drawn randomly from corpus D, has been filed under spam? Answer: zero, since we already knew this from the start! Now suppose that you know 80% of D (your incoming folder) is spam, and you randomly pick a message from D and find out that it is labelled spam. How much have you learned? Certainly more than before, although less than you would have learned if the proportion of spam to legitimate messages were 50-50. In the former case there was less uncertainty involved than in the latter.

Information gain

We may also quantify the reduction in uncertainty of a random variable due to knowing about another. This is known as expected mutual information:

    I(X; Y) = IG(X, Y) = Σ_{x,y} p(x, y) log ( p(x, y) / (p(x) p(y)) )      (9)

Entropies of different probability functions may also be compared by calculating the so-called information gain. In decision tree learning, for instance:

    G(D, F) = H(t) − Σ_{i=1}^n p_i H(t_i)                                   (10)

where t is the distribution of the mother node, t_i the distribution of daughter node i, and p_i the proportion of texts passed to node i if term F is used to split corpus D.
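The entropy and expected mutual information formulas above can be sketched directly; the distributions below are the ones from the spam examples:

```python
from math import log2

def entropy(p):
    """H(p) = sum_x p(x) log2(1/p(x)), with 0 log 0 = 0."""
    return sum(px * log2(1 / px) for px in p if px > 0)

# P(spam) = 1 gives zero uncertainty; a 50-50 split gives the maximum
# for two outcomes (1 bit); an 80-20 split lies in between.
print(entropy([1.0]))         # 0.0
print(entropy([0.5, 0.5]))    # 1.0
print(entropy([0.8, 0.2]))    # ~0.722

def mutual_information(pxy, px, py):
    """Expected mutual information I(X;Y), as in equation (9).
    pxy maps pairs (x, y) to joint probabilities."""
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in pxy.items() if p > 0)

# Independent variables carry no information about each other: I(X;Y) = 0.
px = py = {0: 0.5, 1: 0.5}
pxy = {(x, y): px[x] * py[y] for x in px for y in py}
print(mutual_information(pxy, px, py))  # 0.0
```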
Information theory quantifies such uncertainties by assuming that the amount of information learned from an event is inversely proportional to the probability of its occurrence. So, in the case where there is a 50% chance that d_i will be spam, the amount learned from observing d_i would be

    i(P(C = spam)) = log (1 / P(C = spam)) = log (1/0.5) = 1

i(.), as defined above, measures the uncertainty for a single value of random variable C. How would we measure the uncertainty over all possible values of C (in this case, {spam, ¬spam})? The answer is: we calculate the entropy of its probability mass function:

    H(C) = −(p(spam) log p(spam) + p(¬spam) log p(¬spam))
         = −(0.5 log 0.5 + 0.5 log 0.5) = 1

A more interesting corpus, where the probability of a document being labelled as spam is, say, 0.25, would have entropy

    H(C) = −(0.25 log 0.25 + 0.75 log 0.75) ≈ 0.811

N.B.: there is some confusion surrounding the related notions of expected mutual information and information gain. The definition in (9) corresponds to what some call information gain (Sebastiani, 2002). For the purposes of choosing where to split the instances in decision trees, the definition of information gain used is the one in (10), as defined in (Manning and Schütze, 1999, ch. 2). We will reserve the term expected mutual information, I(X; Y), for what Sebastiani (2002) calls information gain, though we will sometimes write it IG(X, Y). We will see how information gain scores are used in decision tree induction when we review the topic next week.

References

Bertsekas, D. and Tsitsiklis, J. (2002). Introduction to Probability. Athena Scientific.

Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10).
Hamming, R. W. (1991). The Art of Probability for Scientists and Engineers. Addison-Wesley.

Manning, C. D. and Schütze, H. (1999). Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, Massachusetts.

Russell, S. J. and Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs.

Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34(1).
More informationSingle Maths B: Introduction to Probability
Single Maths B: Introduction to Probability Overview Lecturer Email Office Homework Webpage Dr Jonathan Cumming j.a.cumming@durham.ac.uk CM233 None! http://maths.dur.ac.uk/stats/people/jac/singleb/ 1 Introduction
More informationWhy study probability? Set theory. ECE 6010 Lecture 1 Introduction; Review of Random Variables
ECE 6010 Lecture 1 Introduction; Review of Random Variables Readings from G&S: Chapter 1. Section 2.1, Section 2.3, Section 2.4, Section 3.1, Section 3.2, Section 3.5, Section 4.1, Section 4.2, Section
More informationNotes 1 Autumn Sample space, events. S is the number of elements in the set S.)
MAS 108 Probability I Notes 1 Autumn 2005 Sample space, events The general setting is: We perform an experiment which can have a number of different outcomes. The sample space is the set of all possible
More informationWeek 2. Section Texas A& M University. Department of Mathematics Texas A& M University, College Station 22 January-24 January 2019
Week 2 Section 1.2-1.4 Texas A& M University Department of Mathematics Texas A& M University, College Station 22 January-24 January 2019 Oğuz Gezmiş (TAMU) Topics in Contemporary Mathematics II Week2 1
More information1 Presessional Probability
1 Presessional Probability Probability theory is essential for the development of mathematical models in finance, because of the randomness nature of price fluctuations in the markets. This presessional
More informationPROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability
More informationLecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable
Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed
More informationClass 26: review for final exam 18.05, Spring 2014
Probability Class 26: review for final eam 8.05, Spring 204 Counting Sets Inclusion-eclusion principle Rule of product (multiplication rule) Permutation and combinations Basics Outcome, sample space, event
More informationDiscrete Random Variables
Discrete Random Variables An Undergraduate Introduction to Financial Mathematics J. Robert Buchanan 2014 Introduction The markets can be thought of as a complex interaction of a large number of random
More information1: PROBABILITY REVIEW
1: PROBABILITY REVIEW Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 1: Probability Review 1 / 56 Outline We will review the following
More information1 Random Variable: Topics
Note: Handouts DO NOT replace the book. In most cases, they only provide a guideline on topics and an intuitive feel. 1 Random Variable: Topics Chap 2, 2.1-2.4 and Chap 3, 3.1-3.3 What is a random variable?
More informationLecture 10: Probability distributions TUESDAY, FEBRUARY 19, 2019
Lecture 10: Probability distributions DANIEL WELLER TUESDAY, FEBRUARY 19, 2019 Agenda What is probability? (again) Describing probabilities (distributions) Understanding probabilities (expectation) Partial
More informationRandom Variables Example:
Random Variables Example: We roll a fair die 6 times. Suppose we are interested in the number of 5 s in the 6 rolls. Let X = number of 5 s. Then X could be 0, 1, 2, 3, 4, 5, 6. X = 0 corresponds to the
More informationLecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable
Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed
More informationWeek 12-13: Discrete Probability
Week 12-13: Discrete Probability November 21, 2018 1 Probability Space There are many problems about chances or possibilities, called probability in mathematics. When we roll two dice there are possible
More informationRecap of Basic Probability Theory
02407 Stochastic Processes Recap of Basic Probability Theory Uffe Høgsbro Thygesen Informatics and Mathematical Modelling Technical University of Denmark 2800 Kgs. Lyngby Denmark Email: uht@imm.dtu.dk
More informationClassification & Information Theory Lecture #8
Classification & Information Theory Lecture #8 Introduction to Natural Language Processing CMPSCI 585, Fall 2007 University of Massachusetts Amherst Andrew McCallum Today s Main Points Automatically categorizing
More informationRandomized Algorithms
Randomized Algorithms Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Basics A new 4 credit unit course Part of Theoretical Computer Science courses at the Department of Mathematics There will be 4 hours
More informationBandits, Experts, and Games
Bandits, Experts, and Games CMSC 858G Fall 2016 University of Maryland Intro to Probability* Alex Slivkins Microsoft Research NYC * Many of the slides adopted from Ron Jin and Mohammad Hajiaghayi Outline
More informationACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 2 MATH00040 SEMESTER / Probability
ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 2 MATH00040 SEMESTER 2 2017/2018 DR. ANTHONY BROWN 5.1. Introduction to Probability. 5. Probability You are probably familiar with the elementary
More informationProperties of Probability
Econ 325 Notes on Probability 1 By Hiro Kasahara Properties of Probability In statistics, we consider random experiments, experiments for which the outcome is random, i.e., cannot be predicted with certainty.
More information2. AXIOMATIC PROBABILITY
IA Probability Lent Term 2. AXIOMATIC PROBABILITY 2. The axioms The formulation for classical probability in which all outcomes or points in the sample space are equally likely is too restrictive to develop
More informationDiscrete Probability Refresher
ECE 1502 Information Theory Discrete Probability Refresher F. R. Kschischang Dept. of Electrical and Computer Engineering University of Toronto January 13, 1999 revised January 11, 2006 Probability theory
More informationRecap of Basic Probability Theory
02407 Stochastic Processes? Recap of Basic Probability Theory Uffe Høgsbro Thygesen Informatics and Mathematical Modelling Technical University of Denmark 2800 Kgs. Lyngby Denmark Email: uht@imm.dtu.dk
More informationProbability Theory and Applications
Probability Theory and Applications Videos of the topics covered in this manual are available at the following links: Lesson 4 Probability I http://faculty.citadel.edu/silver/ba205/online course/lesson
More informationStatistics for Economists. Lectures 3 & 4
Statistics for Economists Lectures 3 & 4 Asrat Temesgen Stockholm University 1 CHAPTER 2- Discrete Distributions 2.1. Random variables of the Discrete Type Definition 2.1.1: Given a random experiment with
More informationFormalizing Probability. Choosing the Sample Space. Probability Measures
Formalizing Probability Choosing the Sample Space What do we assign probability to? Intuitively, we assign them to possible events (things that might happen, outcomes of an experiment) Formally, we take
More informationLecture 9: Conditional Probability and Independence
EE5110: Probability Foundations July-November 2015 Lecture 9: Conditional Probability and Independence Lecturer: Dr. Krishna Jagannathan Scribe: Vishakh Hegde 9.1 Conditional Probability Definition 9.1
More informationDiscrete Random Variables
Discrete Random Variables An Undergraduate Introduction to Financial Mathematics J. Robert Buchanan Introduction The markets can be thought of as a complex interaction of a large number of random processes,
More informationChapter 14. From Randomness to Probability. Copyright 2012, 2008, 2005 Pearson Education, Inc.
Chapter 14 From Randomness to Probability Copyright 2012, 2008, 2005 Pearson Education, Inc. Dealing with Random Phenomena A random phenomenon is a situation in which we know what outcomes could happen,
More informationReview of Basic Probability Theory
Review of Basic Probability Theory James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 35 Review of Basic Probability Theory
More informationTopic 3: The Expectation of a Random Variable
Topic 3: The Expectation of a Random Variable Course 003, 2017 Page 0 Expectation of a discrete random variable Definition (Expectation of a discrete r.v.): The expected value (also called the expectation
More informationIntroduction to Probability 2017/18 Supplementary Problems
Introduction to Probability 2017/18 Supplementary Problems Problem 1: Let A and B denote two events with P(A B) 0. Show that P(A) 0 and P(B) 0. A A B implies P(A) P(A B) 0, hence P(A) 0. Similarly B A
More informationLecture Notes 1 Basic Probability. Elements of Probability. Conditional probability. Sequential Calculation of Probability
Lecture Notes 1 Basic Probability Set Theory Elements of Probability Conditional probability Sequential Calculation of Probability Total Probability and Bayes Rule Independence Counting EE 178/278A: Basic
More informationProbability COMP 245 STATISTICS. Dr N A Heard. 1 Sample Spaces and Events Sample Spaces Events Combinations of Events...
Probability COMP 245 STATISTICS Dr N A Heard Contents Sample Spaces and Events. Sample Spaces........................................2 Events........................................... 2.3 Combinations
More informationChapter 8: An Introduction to Probability and Statistics
Course S3, 200 07 Chapter 8: An Introduction to Probability and Statistics This material is covered in the book: Erwin Kreyszig, Advanced Engineering Mathematics (9th edition) Chapter 24 (not including
More informationMonty Hall Puzzle. Draw a tree diagram of possible choices (a possibility tree ) One for each strategy switch or no-switch
Monty Hall Puzzle Example: You are asked to select one of the three doors to open. There is a large prize behind one of the doors and if you select that door, you win the prize. After you select a door,
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Week #1
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Week #1 Today Introduction to machine learning The course (syllabus) Math review (probability + linear algebra) The future
More informationLecture 4: Probability and Discrete Random Variables
Error Correcting Codes: Combinatorics, Algorithms and Applications (Fall 2007) Lecture 4: Probability and Discrete Random Variables Wednesday, January 21, 2009 Lecturer: Atri Rudra Scribe: Anonymous 1
More informationEcon 325: Introduction to Empirical Economics
Econ 325: Introduction to Empirical Economics Lecture 2 Probability Copyright 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 3-1 3.1 Definition Random Experiment a process leading to an uncertain
More informationWhat is a random variable
OKAN UNIVERSITY FACULTY OF ENGINEERING AND ARCHITECTURE MATH 256 Probability and Random Processes 04 Random Variables Fall 20 Yrd. Doç. Dr. Didem Kivanc Tureli didemk@ieee.org didem.kivanc@okan.edu.tr
More informationFundamental Tools - Probability Theory II
Fundamental Tools - Probability Theory II MSc Financial Mathematics The University of Warwick September 29, 2015 MSc Financial Mathematics Fundamental Tools - Probability Theory II 1 / 22 Measurable random
More informationMathematical Foundations of Computer Science Lecture Outline October 18, 2018
Mathematical Foundations of Computer Science Lecture Outline October 18, 2018 The Total Probability Theorem. Consider events E and F. Consider a sample point ω E. Observe that ω belongs to either F or
More informationECO 317 Economics of Uncertainty Fall Term 2009 Notes for lectures 1. Reminder and Review of Probability Concepts
ECO 317 Economics of Uncertainty Fall Term 2009 Notes for lectures 1. Reminder and Review of Probability Concepts 1 States and Events In an uncertain situation, any one of several possible outcomes may
More informationCourse: ESO-209 Home Work: 1 Instructor: Debasis Kundu
Home Work: 1 1. Describe the sample space when a coin is tossed (a) once, (b) three times, (c) n times, (d) an infinite number of times. 2. A coin is tossed until for the first time the same result appear
More informationStatistics for Financial Engineering Session 2: Basic Set Theory March 19 th, 2006
Statistics for Financial Engineering Session 2: Basic Set Theory March 19 th, 2006 Topics What is a set? Notations for sets Empty set Inclusion/containment and subsets Sample spaces and events Operations
More informationMath 105 Course Outline
Math 105 Course Outline Week 9 Overview This week we give a very brief introduction to random variables and probability theory. Most observable phenomena have at least some element of randomness associated
More informationINF FALL NATURAL LANGUAGE PROCESSING. Jan Tore Lønning
1 INF4080 2018 FALL NATURAL LANGUAGE PROCESSING Jan Tore Lønning 2 Probability distributions Lecture 5, 5 September Today 3 Recap: Bayes theorem Discrete random variable Probability distribution Discrete
More informationDecision making and problem solving Lecture 1. Review of basic probability Monte Carlo simulation
Decision making and problem solving Lecture 1 Review of basic probability Monte Carlo simulation Why probabilities? Most decisions involve uncertainties Probability theory provides a rigorous framework
More informationProbability, Random Processes and Inference
INSTITUTO POLITÉCNICO NACIONAL CENTRO DE INVESTIGACION EN COMPUTACION Laboratorio de Ciberseguridad Probability, Random Processes and Inference Dr. Ponciano Jorge Escamilla Ambrosio pescamilla@cic.ipn.mx
More informationUC Berkeley Department of Electrical Engineering and Computer Science. EE 126: Probablity and Random Processes. Solutions 5 Spring 2006
Review problems UC Berkeley Department of Electrical Engineering and Computer Science EE 6: Probablity and Random Processes Solutions 5 Spring 006 Problem 5. On any given day your golf score is any integer
More informationChapter 4: An Introduction to Probability and Statistics
Chapter 4: An Introduction to Probability and Statistics 4. Probability The simplest kinds of probabilities to understand are reflected in everyday ideas like these: (i) if you toss a coin, the probability
More informationSTAT 712 MATHEMATICAL STATISTICS I
STAT 72 MATHEMATICAL STATISTICS I Fall 207 Lecture Notes Joshua M. Tebbs Department of Statistics University of South Carolina c by Joshua M. Tebbs TABLE OF CONTENTS Contents Probability Theory. Set Theory......................................2
More informationConditional Probability, Independence and Bayes Theorem Class 3, Jeremy Orloff and Jonathan Bloom
Conditional Probability, Independence and Bayes Theorem Class 3, 18.05 Jeremy Orloff and Jonathan Bloom 1 Learning Goals 1. Know the definitions of conditional probability and independence of events. 2.
More informationStatistics for Managers Using Microsoft Excel/SPSS Chapter 4 Basic Probability And Discrete Probability Distributions
Statistics for Managers Using Microsoft Excel/SPSS Chapter 4 Basic Probability And Discrete Probability Distributions 1999 Prentice-Hall, Inc. Chap. 4-1 Chapter Topics Basic Probability Concepts: Sample
More information