Mathematical Foundations


Mathematical Foundations. Introduction to Data Science Algorithms. Michael Paul and Jordan Boyd-Graber. January 23, 2017.

By the end of today... You'll be able to apply the concepts of distributions, independence, and conditional probabilities. You'll be able to derive joint, marginal, and conditional probabilities from each other. You'll be able to compute expectations and entropies.


Preface: Why make us do this? Probabilities (along with statistics) are the language we use to describe data. A reasonable (but geeky) definition of data science is "how to get probabilities we care about from data." Later classes will be about how to do this for different probability models, but first we need key definitions of probability (and it makes more sense to do it all at once). So pay attention!

Mathematical Foundations. INFO-2301: Quantitative Reasoning 2. Michael Paul and Jordan Boyd-Graber. January 23, 2017.



Engineering rationale behind probabilities. Encoding uncertainty: data are variables, and we don't always know the values of variables; probabilities let us reason about variables even when we are uncertain. Encoding confidence: the flip side of uncertainty, useful for decision making (should we trust our conclusion?). We can construct probabilistic models to boost our confidence, e.g., by combining polls.

Mathematical Foundations. INFO-2301: Quantitative Reasoning 2. Michael Paul and Jordan Boyd-Graber. Slides adapted from Dave Blei and Lauren Hannah.

Random variable. Probability is about random variables. A random variable is any probabilistic outcome. Examples of variables: yesterday's high temperature; the height of someone. Examples of random variables: tomorrow's high temperature; the height of someone chosen randomly from a population. We'll see that it's sometimes useful to think of quantities that are not strictly probabilistic as random variables: the high temperature on 03/04/1905, or the number of times "streetlight" appears in a document.

Random variable. Random variables take on values in a sample space. They can be discrete or continuous: coin flip: {H, T}; height: positive real values (0, ∞); temperature: real values (−∞, ∞); number of words in a document: positive integers {1, 2, ...}. We call the outcomes events. Denote the random variable with a capital letter and a realization of the random variable with a lower-case letter; e.g., X is a coin flip and x is the value (H or T) of that coin flip.

Discrete distribution. A discrete distribution assigns a probability to every event in the sample space. For example, if X is a coin, then P(X = H) = 0.5 and P(X = T) = 0.5, and probabilities have to be greater than or equal to 0. Probabilities of disjunctions are sums over part of the space; e.g., the probability that a die is bigger than 3 is P(D > 3) = P(D = 4) + P(D = 5) + P(D = 6). The probabilities over the entire space must sum to one: ∑_x P(X = x) = 1.
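The slides contain no code, but the die example above is easy to check directly. A minimal sketch in Python (the distribution as a dictionary is an illustrative choice, not from the slides):

```python
from fractions import Fraction

# A fair six-sided die as a discrete distribution: every event gets a probability.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities over the entire sample space must sum to one.
assert sum(die.values()) == 1

# A disjunction is a sum over part of the space: P(D > 3) = P(4) + P(5) + P(6).
p_greater_than_3 = sum(p for face, p in die.items() if face > 3)
print(p_greater_than_3)  # 1/2
```

Using exact fractions instead of floats avoids rounding noise when checking that probabilities sum to one.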

Events. An event is a set of outcomes to which a probability is assigned, e.g., drawing a black card from a deck of cards, or drawing a King of Hearts. Intersections and unions: Intersection (drawing a red card and a King): P(A ∩ B). Union (drawing a spade or a King): P(A ∪ B) = P(A) + P(B) − P(A ∩ B). [Venn diagrams on the slide illustrate the intersection and the union of A and B.]
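The union rule can be verified by brute force on the slide's card-deck example. A small sketch in Python (the deck representation is an illustrative assumption):

```python
from fractions import Fraction
from itertools import product

# A standard 52-card deck as a sample space of equally likely outcomes.
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['spades', 'hearts', 'diamonds', 'clubs']
deck = list(product(ranks, suits))

def prob(event):
    # With equally likely outcomes, P(event) = |event| / |sample space|.
    return Fraction(len(event), len(deck))

spade = {card for card in deck if card[1] == 'spades'}  # event A: drawing a spade
king = {card for card in deck if card[0] == 'K'}        # event B: drawing a King

# Union rule: P(A or B) = P(A) + P(B) - P(A and B).
lhs = prob(spade | king)
rhs = prob(spade) + prob(king) - prob(spade & king)
assert lhs == rhs
print(lhs)  # 4/13, i.e. 16 of the 52 cards
```

The subtraction of P(A ∩ B) prevents double-counting the King of Spades, which belongs to both events.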

Joint distribution. Typically, we consider collections of random variables. The joint distribution is a distribution over the configurations of all the random variables in the ensemble. For example, imagine flipping 4 coins. The joint distribution is over the space of all possible outcomes of the four coins: P(HHHH) = 0.0625, P(HHHT) = 0.0625, P(HHTH) = 0.0625, ... You can think of it as a single random variable with 16 values.
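The four-coin joint distribution above can be enumerated explicitly. A quick sketch in Python (illustrative, not from the slides):

```python
from itertools import product

# Joint distribution over four independent fair coin flips:
# each of the 2**4 = 16 configurations gets probability 0.5**4 = 0.0625.
joint = {outcome: 0.5 ** 4 for outcome in product('HT', repeat=4)}

assert len(joint) == 16                       # one entry per configuration
assert abs(sum(joint.values()) - 1.0) < 1e-12 # the whole space sums to one
print(joint[('H', 'H', 'H', 'H')])  # 0.0625
```

This makes concrete the slide's point that the ensemble behaves like a single random variable with 16 values.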

Visualizing a joint distribution. [The slide shows a diagram dividing the space into the four regions x, y; x, ~y; ~x, y; ~x, ~y.]


Marginalization. If we know a joint distribution of multiple variables, what if we want to know the distribution of only one of the variables? We can compute the distribution P(X) from P(X, Y, Z) through marginalization: ∑_y ∑_z P(X, Y = y, Z = z) = ∑_y ∑_z P(X) P(Y = y, Z = z | X) = P(X) ∑_y ∑_z P(Y = y, Z = z | X) = P(X). We'll explain this notation more next week; for now the formula is the most important part.

Marginalization (from Leyton-Brown). Joint distribution of temperature (T) and weather (W):

          T=Hot  T=Mild  T=Cold
W=Sunny    .10    .20     .10
W=Cloudy   .05    .35     .20

Marginalization allows us to compute distributions over smaller sets of variables: P(X, Y) = ∑_z P(X, Y, Z = z). This corresponds to summing out a table dimension, and the new table still sums to 1. Marginalizing out weather gives T=Hot .15, T=Mild .55, T=Cold .30; marginalizing out temperature gives W=Sunny .40, W=Cloudy .60.
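Summing out a table dimension is a one-loop operation. A sketch in Python using the slide's temperature/weather table (the dictionary layout and helper names are illustrative assumptions):

```python
from fractions import Fraction

def pct(n):
    # Express the table's two-decimal entries exactly, e.g. pct(10) = 0.10.
    return Fraction(n, 100)

# The joint distribution of weather (W) and temperature (T) from the slide.
joint = {
    ('Sunny', 'Hot'): pct(10), ('Sunny', 'Mild'): pct(20), ('Sunny', 'Cold'): pct(10),
    ('Cloudy', 'Hot'): pct(5), ('Cloudy', 'Mild'): pct(35), ('Cloudy', 'Cold'): pct(20),
}

def marginalize(joint, keep):
    # P(X) = sum_z P(X, Z = z): sum out every dimension except position `keep`.
    out = {}
    for outcome, p in joint.items():
        out[outcome[keep]] = out.get(outcome[keep], Fraction(0)) + p
    return out

p_temp = marginalize(joint, keep=1)     # marginalize out weather
p_weather = marginalize(joint, keep=0)  # marginalize out temperature

assert p_temp == {'Hot': pct(15), 'Mild': pct(55), 'Cold': pct(30)}
assert p_weather == {'Sunny': pct(40), 'Cloudy': pct(60)}
assert sum(p_temp.values()) == 1  # the new table still sums to one
```

The asserts reproduce the marginal rows and columns computed on the slide.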


Independence. Random variables X and Y are independent if and only if P(X = x, Y = y) = P(X = x) P(Y = y). Mathematical examples: If I flip a coin twice, is the second outcome independent from the first outcome? If I draw two socks from my (multicolored) laundry, is the color of the first sock independent from the color of the second sock?
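The definition above can be tested mechanically on a joint distribution. A sketch in Python using two fair dice (an illustrative example in the spirit of the slides, not code from them): the two dice pass the product check, while the first die and the sum of both fail it.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of two fair dice: all 36 ordered pairs equally likely.
joint = {pair: Fraction(1, 36) for pair in product(range(1, 7), repeat=2)}

def marginal(joint, f):
    # Distribution of f(outcome) under the joint distribution.
    out = {}
    for pair, p in joint.items():
        out[f(pair)] = out.get(f(pair), Fraction(0)) + p
    return out

first = marginal(joint, lambda pair: pair[0])
second = marginal(joint, lambda pair: pair[1])
total = marginal(joint, lambda pair: pair[0] + pair[1])

# Independent: P(first = a, second = b) = P(first = a) P(second = b) for all a, b.
assert all(joint[(a, b)] == first[a] * second[b] for a, b in joint)

# NOT independent: P(first = 1, sum = 12) = 0, yet P(first = 1) P(sum = 12) > 0.
p_both = sum(p for (a, b), p in joint.items() if a == 1 and a + b == 12)
assert p_both == 0
assert first[1] * total[12] > 0
```

A single failing pair of values is enough to rule out independence, since the definition must hold for every (x, y).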

Independence. Intuitive examples: Independent: you use a Mac / the Hop bus is on schedule; snowfall in the Himalayas / your favorite color is blue. Not independent: you vote for Mitt Romney / you are a Republican; there is a traffic jam on 25 / the Broncos are playing.

Independence. Sometimes we make convenient assumptions: the values of two dice (ignoring gravity!); the value of the first die and the sum of the values; whether it is raining and the number of taxi cabs; whether it is raining and the amount of time it takes me to hail a cab; the first two words in a sentence.


Expectation. An expectation of a random variable is a weighted average: E[f(X)] = ∑_x f(x) p(x) (discrete) or E[f(X)] = ∫ f(x) p(x) dx (continuous).

Expectation. Expectations of constants or known values: E[a] = a and E[Y | Y = y] = y.

Expectation intuition: the average outcome (which might not be a possible event: 2.4 children); the center of mass.

Expectation of die / dice. What is the expectation of the roll of a die? One die: 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6) = 3.5. What is the expectation of the sum of two dice? Two dice: 2·(1/36) + 3·(2/36) + 4·(3/36) + 5·(4/36) + 6·(5/36) + 7·(6/36) + 8·(5/36) + 9·(4/36) + 10·(3/36) + 11·(2/36) + 12·(1/36) = 7.
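Both dice calculations above can be reproduced by summing over the outcomes. A short sketch in Python (illustrative; the helper names are not from the slides):

```python
from fractions import Fraction
from itertools import product

def expectation(dist):
    # E[X] = sum_x x * p(x): a probability-weighted average of the outcomes.
    return sum(x * p for x, p in dist.items())

# One fair die: expectation 3.5 (note this is not itself a possible roll).
one_die = {face: Fraction(1, 6) for face in range(1, 7)}
assert expectation(one_die) == Fraction(7, 2)

# Sum of two fair dice: build its distribution by enumerating all 36 rolls.
two_dice = {}
for a, b in product(range(1, 7), repeat=2):
    two_dice[a + b] = two_dice.get(a + b, Fraction(0)) + Fraction(1, 36)
assert expectation(two_dice) == 7
```

Enumerating the 36 rolls recovers the 1/36, 2/36, ..., 6/36, ..., 1/36 weights used in the slide's hand calculation.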


Entropy. A measure of disorder in a system; in the real world, entropy in a system tends to increase. It can also be applied to probabilities: is one outcome (or a few) certain (low entropy), or are things equiprobable (high entropy)? In data science, we look for features that allow us to reduce entropy (decision trees), and, all else being equal, we seek models that have maximum entropy (Occam's razor).

Aside: Logarithms. lg(x) = b means 2^b = x, so lg(1) = 0, lg(2) = 1, lg(4) = 2, lg(8) = 3. Logarithms make big numbers small. One way to think about them: cutting a carrot. What about negative numbers? Non-integers?
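The questions the slide leaves open are easy to probe numerically. A sketch in Python, reading "negative numbers?" as "when is lg negative?" (one possible reading of the slide, not stated in it):

```python
import math

# lg(x) = b means 2**b = x; Python's math.log2 computes lg.
assert math.log2(1) == 0
assert math.log2(8) == 3

# Negative outputs: lg of a probability-like value in (0, 1) is negative,
# which is why entropy formulas carry a minus sign.
assert math.log2(0.25) == -2

# Non-integers: lg interpolates smoothly between the powers of two.
assert 1 < math.log2(3) < 2
```

The negative-output case matters for the entropy definition that follows: probabilities lie in [0, 1], so their logs are never positive.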

Entropy. Entropy is a measure of uncertainty that is associated with the distribution of a random variable: H(X) = −E[lg p(X)] = −∑_x p(x) lg p(x) (discrete) or −∫ p(x) lg p(x) dx (continuous). Entropy does not account for the values of the random variable, only the spread of the distribution: H(X) ≥ 0; the uniform distribution has the highest entropy and a point mass the lowest; and if P(X = 1) = p, P(X = 0) = 1 − p and P(Y = 100) = p, P(Y = 0) = 1 − p, then X and Y have the same entropy.
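The discrete entropy formula and the claims above can be checked directly. A sketch in Python (the helper is illustrative; p = 0.3 is an arbitrary choice for the X-versus-Y comparison):

```python
import math

def entropy(dist):
    # H(X) = -sum_x p(x) lg p(x), in bits; terms with p = 0 contribute nothing.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Uniform distribution has the highest entropy; a point mass has the lowest.
assert entropy({'H': 0.5, 'T': 0.5}) == 1.0  # one bit of uncertainty
assert entropy({'H': 1.0}) == 0.0            # no uncertainty at all

# Entropy ignores the values, only the spread: X over {0, 1} and Y over
# {0, 100} with the same probabilities have the same entropy.
p = 0.3
x = {1: p, 0: 1 - p}
y = {100: p, 0: 1 - p}
assert entropy(x) == entropy(y)
```

Skipping zero-probability terms implements the usual convention that 0 · lg 0 = 0.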

Wrap up. Probabilities are the language of data science. You'll need to manipulate probabilities and understand marginalization and independence. Thursday: working through probability examples. Next week: conditional probabilities.