Methods of Mathematics

Methods of Mathematics. Kenneth A. Ribet, UC Berkeley Math 10B. February 30, 2016

Office hours: Monday 2:10–3:10 and Thursday 10:30–11:30 in Evans; Tuesday 10:30–noon at the SLC. Welcome to March!

Meals:
March 2, 6:30 PM: dinner at Chengdu Style Restaurant (send email to reserve your place)
March 3, 8 AM: breakfast (full)
March 4, 12:30 PM: pop-up Faculty Club lunch (just show up!)
March 18, 8 AM: breakfast (send email to reserve your place)

Some variance calculations. If Ω = { T, H } and X(T) = 0, X(H) = 1, then E[X] = p, where p is the probability of a head. It follows (board or doc camera) that Var[X] = p(1 − p). We write σ² for Var[X], by the way. Now imagine the binomial distribution attached to n successive coin flips, and let X be the usual variable that counts the number of heads. Trick: we think of X as X_1 + ... + X_n, where X_i is 1 or 0 according as the ith coin flip is an H or a T. Cheat: we admit (without checking the definition in detail) that the variables X_1, X_2, ..., X_n are independent. We do this because the various coin flips have nothing to do with each other.
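One way to sanity-check the Bernoulli computation is a short simulation. The sketch below is plain Python (only the standard random module; the value of p and the sample size are made up for the check) and simply compares the empirical mean and variance with p and p(1 − p).

import random

p = 0.3                       # an arbitrary head probability for the check
N = 200_000                   # number of simulated flips
flips = [1 if random.random() < p else 0 for _ in range(N)]
mean = sum(flips) / N
var = sum((x - mean) ** 2 for x in flips) / N
print(mean, p)                # both close to 0.3
print(var, p * (1 - p))       # both close to 0.21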

It follows from the linearity of expected value that E[X] = E[X_1] + ... + E[X_n], and from the independence of the X_i that Var[X] = Var[X_1] + ... + Var[X_n]. On the other hand, each X_i is a simple Bernoulli variable with expected value p and variance p(1 − p). Thus: E[X] = np, Var[X] = np(1 − p). Now let X̄ := (X_1 + ... + X_n)/n, so X̄ represents the fraction of the time that our coin landed on heads. It is immediate that E[X̄] = p, Var[X̄] = p(1 − p)/n = σ²/n.
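As a hedged illustration (my own sketch, not the textbook's; plain Python with parameters chosen arbitrarily), one can simulate many n-flip experiments and check that E[X] ≈ np, Var[X] ≈ np(1 − p), and Var[X̄] = Var[X]/n² ≈ p(1 − p)/n.

import random

def count_heads(n, p):
    # number of heads in n independent flips of a p-coin
    return sum(1 if random.random() < p else 0 for _ in range(n))

n, p, trials = 50, 0.5, 20_000
counts = [count_heads(n, p) for _ in range(trials)]
mean_X = sum(counts) / trials
var_X = sum((c - mean_X) ** 2 for c in counts) / trials
print(mean_X, n * p)                    # both about 25
print(var_X, n * p * (1 - p))           # both about 12.5
print(var_X / n ** 2, p * (1 - p) / n)  # Var[Xbar] = Var[X]/n^2, about 0.005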

When we flip a coin n times, the number of heads divided by n is expected to be p; as n → ∞, the fraction in question is close to p with high probability. As an example, let's flip a fair coin (p = 1/2) n times. The probability that there are k heads is C(n, k)/2^n, where C(n, k) is the binomial coefficient. If we plot this number as a function of k, the graph looks more and more spiked as n gets big.
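To see the spiking numerically rather than pictorially, here is a minimal sketch (mine; it assumes Python 3.8+ for math.comb) that prints the peak value C(n, n/2)/2^n for a few values of n.

from math import comb

def pmf(n, k):
    # P(exactly k heads in n flips of a fair coin) = C(n, k) / 2^n
    return comb(n, k) / 2 ** n

for n in (10, 100, 1000):
    print(n, pmf(n, n // 2))   # peak at k = n/2: 0.246..., 0.0795..., 0.0252...
# The peak height shrinks, but the probability mass concentrates near k/n = 1/2.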

Flat. [Plot: the distribution of X̄ when n = 1.]

Curved. [Plot: the distribution of X̄ when n = 10.] Note that C(10, 5) = 252, so that C(10, 5)/2^10 is about 0.246.

Spiking. [Plot: the distribution of X̄ when n = 100.]

Spiked. [Plot: the distribution of X̄ when n = 1000.]

In this story, we started with a Bernoulli random variable X ("heads or tails?") and considered the average of a large number of copies. We could re-do the story with any random variable X, as long as the X_i continue to be independent copies of X. In stat lingo, the X_i are independent, identically distributed random variables. The Law of Large Numbers states roughly that X̄ approaches the expected value of X (written µ, typically) as n → ∞. The correct way to state the Law is to note that the probability space Ω grows as n → ∞. In our coin-flipping example, it has 2^n elements when there are n flips. In the limit, Ω acquires a probability structure that is built from the structures on its finite pieces. The Law states that the set of ω ∈ Ω for which X̄(ω) → µ is an event whose probability is 1.
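A short simulation makes the Law of Large Numbers concrete. This is my own sketch (plain Python, fair coin, checkpoints chosen arbitrarily): it prints the running fraction of heads as n grows.

import random

p, n_max = 0.5, 100_000
heads = 0
for n in range(1, n_max + 1):
    heads += random.random() < p             # True counts as 1
    if n in (10, 100, 1000, 10_000, 100_000):
        print(n, heads / n)                  # the running fraction drifts toward p = 0.5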

Central Limit Theorem. According to the textbook, Math 10A veterans will not be surprised by the introduction of Z := √n (X̄ − µ)/σ. Subtracting µ from X̄ gives you a random variable with mean 0; multiplying by √n/σ scales the variable so its variance is 1. In the examples that we've done pictorially, p = 1/2 and σ² = 1/4, so σ = 1/2. We are taking the values of X̄, which range from 0 to 1, and shifting them by subtracting 1/2, thereby getting numbers between −1/2 and +1/2. We are then multiplying by √n/σ = 2√n, so the values range between ±√n.
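As a check that the standardization does what is claimed (a sketch of mine, plain Python, fair coin with n = 100), one can simulate values of Z = √n (X̄ − µ)/σ and verify that their mean is near 0 and their variance near 1.

import random
from math import sqrt

n, p, trials = 100, 0.5, 20_000
mu, sigma = p, sqrt(p * (1 - p))   # mean and standard deviation of one flip
zs = []
for _ in range(trials):
    xbar = sum(random.random() < p for _ in range(n)) / n
    zs.append(sqrt(n) * (xbar - mu) / sigma)
mean_z = sum(zs) / trials
var_z = sum((z - mean_z) ** 2 for z in zs) / trials
print(mean_z, var_z)               # approximately 0 and 1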

The Central Limit Theorem states that Z is approximately normal for large n. The textbook refers to outside sources, and I'll do the same. There's something that I need to explain, at least to myself. If you plot together the bell curve (1/√(2π)) e^(−x²/2) and the probability distribution of Z, you'll see that Z looks much less tall than the bell curve (= normal curve). [Plot: the bell curve together with the distribution of Z.]

This needs some explanation. We will focus on the case of n flips of a fair coin: The plot of Z runs horizontally from −√n to √n and includes n + 1 points. If we were to estimate the area under the plot, we'd add together the areas of rectangles whose widths would be 2√n · (1/n) = 2/√n, in view of the fact that the n + 1 points divide an interval of length 2√n into n sub-intervals. The heights of the rectangles would be the various probabilities associated with the distribution of Z; these probabilities sum to 1. Thus our estimate for the area would be (2/√n) · 1 = 2/√n. We want to compare the plot of Z with the bell curve; the area under the bell curve is 1. Accordingly, we expect the plot of Z to be roughly 2/√n as tall as the bell curve. In particular, the maximum height of the Z plot should be around (2/√n) · (1/√(2π)) = 2/√(2πn).

For example, when n = 10, the maximum height of the Z-plot is around 0.246, as we saw earlier. According to Sage, the value of 2/√(2πn) when n = 10 is 0.252313252202016. For n = 100, 2/√(2πn) ≈ 0.08. That looks pretty much like the height of the dotted curve that we saw two slides back.
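The Sage computation is easy to reproduce in plain Python (my snippet; it assumes only the math module), and the predicted heights can be compared directly with the exact peaks of the distribution.

from math import comb, pi, sqrt

for n in (10, 100, 1000):
    predicted = 2 / sqrt(2 * pi * n)     # the estimate 2/sqrt(2*pi*n) from the previous slide
    exact = comb(n, n // 2) / 2 ** n     # exact maximum of the distribution
    print(n, predicted, exact)
# n = 10:  0.2523... vs 0.2460...
# n = 100: 0.0798... vs 0.0796...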

As I explained in class, it was a failure on my part not to include a graph showing the exponential (bell) curve together with the discrete plot, scaled up so that the y-axis is stretched by a factor of √n/2. Here is what happens when n = 100: [Plot: the bell curve together with the rescaled plot of Z for n = 100.] This is a pretty good fit!! The results for n = 1000 and n = 10000 are similar and perhaps even more dramatic.

We now have a complete attitude adjustment: we imagine trying to learn about X through sampling. For example, we might know that X corresponds to the flip of a biased coin and would like to know p, the probability of a head. We do lots of coin flips and compile data. The X_i are the same as before (so that X_i refers to the ith coin flip). The actual flips of our coin generate values of the functions X_i, and we write x_i for these actual values. Thus the x_i are numbers, whereas the X_i are functions on the probability space. Jargon: a statistic is a function g of n variables. Then g(X_1, ..., X_n) is a function on the probability space; it's a random variable. The quantity g(x_1, ..., x_n) is a number. More jargon: a point statistic is a function g that can be used to estimate the mean, variance or standard deviation of X.
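To make the jargon concrete, here is a tiny sketch (mine, with made-up data): a statistic g is just a function, and applying it to observed numbers x_1, ..., x_n produces a number.

def g(values):
    # the sample-mean statistic
    return sum(values) / len(values)

observed = [1, 0, 1, 1, 0, 1]    # hypothetical outcomes x_1, ..., x_6 of six flips
print(g(observed))               # the number g(x_1, ..., x_6) = 4/6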

Example: the function X̄ is a statistic that estimates the mean of X. If we flip a coin 1000 times and observe 678 heads, we would estimate that the coin is biased with p = 0.678. This blows my mind: the statistic (1/(n − 1)) Σ_{k=1}^{n} (X_k − X̄)² estimates Var[X]. This is upsetting since Var[X] = E[(X − µ)²] and since expected values are estimated by taking averages with n in the denominator. So why do we have n − 1? It's because E[Σ_{k=1}^{n} (X_k − X̄)²] = (n − 1) Var[X], as we're about to see.
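A simulation (my own sketch, plain Python, with a deliberately small n so the effect is visible) shows why dividing by n − 1 is the right choice: averaged over many samples, the n-denominator version underestimates Var[X], while the (n − 1)-denominator version lands on p(1 − p).

import random

p, n, trials = 0.5, 5, 100_000
avg_div_n, avg_div_n_minus_1 = 0.0, 0.0
for _ in range(trials):
    xs = [1 if random.random() < p else 0 for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)   # sum of squared deviations from the sample mean
    avg_div_n += ss / n
    avg_div_n_minus_1 += ss / (n - 1)
print(avg_div_n / trials)            # about 0.20 = (n-1)/n * Var[X]: biased low
print(avg_div_n_minus_1 / trials)    # about 0.25 = p(1-p) = Var[X]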

One point is that X̄ is not E[X] = µ, but only an estimate for µ. Moreover, X̄ = (X_1 + ... + X_n)/n involves the various X_k in its definition. As a result, the difference X_k − X̄ is a combination of the X_i in which the coefficient of X_k is (n − 1)/n and the coefficients of the other X_i are all −1/n. This proves nothing so far, but it presages a somewhat lengthy computation in which (n − 1)s are likely to pop up. Following the textbook, we will subtract µ from X, X̄ and the X_i. This makes their expected values all equal to 0 instead of µ and does not change the differences X_k − X̄ or the variance of X. Also, we note for distinct j and k that E[X_j X_k] = E[X_j] E[X_k] by independence. The right-hand expected values are both 0, so E[X_j X_k] = 0. The takeaway is that squares of sums will be sums of squares when taking expected values: cross terms won't contribute.

The next comment is that E[X_j²] = Var[X] for all j. That's because the variables X_j are all distributed like X. The upshot is that the quantity to be computed, E[Σ_{k=1}^{n} (X_k − X̄)²], is just C · Var[X], where C is the sum of the coefficients of the various X_j² that appear when you write out Σ_{k=1}^{n} (X_k − X̄)². It's easy to make mistakes calculating (as you'll see if I try to do this in front of you with the document camera), but you should get C = n − 1 if you persevere and don't get spooked by the subscripts.
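For anyone who wants to avoid the document-camera arithmetic, here is a symbolic check of C = n − 1 for a small n (my sketch; it assumes the sympy package is installed): expand Σ (X_k − X̄)² with centered variables and add up the coefficients of the squared terms.

from sympy import symbols, expand

n = 4
xs = symbols('x1 x2 x3 x4')      # stand-ins for the centered X_1, ..., X_4
xbar = sum(xs) / n
expr = expand(sum((x - xbar) ** 2 for x in xs))
C = sum(expr.coeff(x ** 2) for x in xs)   # total coefficient of the squared terms
print(C)                         # 3, i.e. n - 1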