Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda

Probability and statistics Probability: a framework for dealing with uncertainty. Statistics: a framework for extracting information from data by making probabilistic assumptions.

Probability
Probability basics: probability spaces, conditional probability, independence, conditional independence
Random variables: pmf, cdf, pdf, important distributions, functions of random variables
Multivariate random variables: joint pmf, joint cdf, joint pdf, marginal distributions, conditional distributions, independence, joint distribution of discrete/continuous random variables

Probability
Expectation: definition, mean, median, variance, Markov and Chebyshev inequalities, covariance, correlation coefficient, covariance matrix, conditional expectation
Random processes: definition, mean, autocovariance, important processes (iid, Gaussian, Poisson, random walk), Markov chains
Convergence: types of convergence, law of large numbers, central limit theorem, convergence of Markov chains
Simulation: motivation, inverse-transform sampling, rejection sampling, Markov-chain Monte Carlo

Statistics
Descriptive statistics: histogram, empirical mean/variance, order statistics, empirical covariance, principal component analysis
Statistical estimation: frequentist perspective, mean square error, consistency, confidence intervals
Learning models: method of moments, maximum likelihood, empirical cdf, kernel density estimation

Statistics
Hypothesis testing: definitions (null/alternative hypothesis, Type I/II errors), significance level, power, p-value, parametric testing, power function, likelihood-ratio test, permutation test, multiple testing, Bonferroni's method
Bayesian statistics: prior, likelihood, posterior, posterior mean/mode
Linear regression: linear models, least squares, geometric interpretation, probabilistic interpretation, overfitting

Random walk with a drift We define the random walk $X$ as the discrete-state discrete-time random process
$$X(0) := 0, \qquad X(i) := X(i-1) + S(i) + 1, \quad i = 1, 2, \ldots$$
where
$$S(i) = \begin{cases} +1 & \text{with probability } 1/2, \\ -1 & \text{with probability } 1/2, \end{cases}$$
is an iid sequence of steps.

Random walk with a drift What is the mean of this random process?
$$\operatorname{E}\left(X(i)\right) = \operatorname{E}\left(\sum_{j=1}^{i} \left(S(j) + 1\right)\right) = \sum_{j=1}^{i} \operatorname{E}\left(S(j)\right) + i = i$$

Random walk with a drift What is the autocovariance? Use the fact that the autocovariance of the random walk without drift $W$ that we studied in the lecture notes is $R_W(i, j) = \min\{i, j\}$.

Random walk with a drift We can write $X(i) = W(i) + i$, where $\operatorname{E}\left(W(i)\right) = 0$, so
$$\begin{aligned} R_X(i, j) &:= \operatorname{E}\left(X(i)\,X(j)\right) - \operatorname{E}\left(X(i)\right)\operatorname{E}\left(X(j)\right) \\ &= \operatorname{E}\left(\left(W(i)+i\right)\left(W(j)+j\right)\right) - \operatorname{E}\left(W(i)+i\right)\operatorname{E}\left(W(j)+j\right) \\ &= \operatorname{E}\left(W(i)\,W(j)\right) + i\operatorname{E}\left(W(j)\right) + j\operatorname{E}\left(W(i)\right) + ij - i\operatorname{E}\left(W(j)\right) - j\operatorname{E}\left(W(i)\right) - ij \\ &= \operatorname{E}\left(W(i)\,W(j)\right) = R_W(i, j) = \min\{i, j\} \end{aligned}$$
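
A quick way to sanity-check these formulas is to simulate the process. The sketch below is not part of the original slides; the number of simulated paths and the indices $i = 10$, $j = 20$ are arbitrary choices. It estimates the mean of $X(10)$ and the covariance between $X(10)$ and $X(20)$ by Monte Carlo.

```python
# Monte Carlo check of the mean and the autocovariance derived above.
# Simulation sketch; the number of paths and the indices i, j are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 100_000, 30
steps = rng.choice([-1, 1], size=(n_paths, n_steps))  # S(i) = +1 or -1 with probability 1/2
X = np.cumsum(steps + 1, axis=1)                      # X(i) = X(i-1) + S(i) + 1, X(0) = 0

i, j = 10, 20
print(X[:, i - 1].mean())                             # should be close to E[X(i)] = i = 10
print(np.cov(X[:, i - 1], X[:, j - 1])[0, 1])         # should be close to min(i, j) = 10
```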

Random walk with a drift Compute the first-order pmf of $X(i)$. Recall that the first-order pmf of the random walk $W$ equals
$$p_{W(i)}(x) = \begin{cases} \binom{i}{\frac{i+x}{2}} \frac{1}{2^i} & \text{if } i + x \text{ is even and } -i \le x \le i, \\ 0 & \text{otherwise.} \end{cases}$$

Random walk with a drift
$$p_{X(i)}(x) = \operatorname{P}\left(X(i) = x\right) = \operatorname{P}\left(W(i) = x - i\right) = p_{W(i)}(x - i) = \begin{cases} \binom{i}{\frac{x}{2}} \frac{1}{2^i} & \text{if } x \text{ is even and } 0 \le x \le 2i, \\ 0 & \text{otherwise.} \end{cases}$$
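
The same simulation idea can be used to check this first-order pmf. The following sketch (illustrative only; $i = 8$ and the sample size are arbitrary) compares the formula above with empirical frequencies.

```python
# Compare the first-order pmf of X(i) with empirical frequencies from simulation.
# Verification sketch; i = 8 and the sample size are arbitrary.
import numpy as np
from math import comb

rng = np.random.default_rng(1)
i = 8
steps = rng.choice([-1, 1], size=(200_000, i))
X_i = (steps + 1).sum(axis=1)                 # realizations of X(i)

for x in range(0, 2 * i + 1, 2):              # x even, 0 <= x <= 2i
    theoretical = comb(i, x // 2) / 2 ** i
    empirical = np.mean(X_i == x)
    print(x, round(theoretical, 4), round(empirical, 4))
```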

Random walk with a drift Does the process satisfy the Markov condition?
$$p_{X(i+1) \mid X(1), X(2), \ldots, X(i)}(x_{i+1} \mid x_1, x_2, \ldots, x_i) = p_{X(i+1) \mid X(i)}(x_{i+1} \mid x_i)$$

Random walk with a drift
$$p_{X(i+1) \mid X(1), X(2), \ldots, X(i)}(x_{i+1} \mid x_1, x_2, \ldots, x_i) = \operatorname{P}\left(x_i + S(i+1) + 1 = x_{i+1}\right) = p_{X(i+1) \mid X(i)}(x_{i+1} \mid x_i),$$
so the process satisfies the Markov condition.

Random walk with a drift We observe that X (10) = 16 and X (20) = 30. What is the best estimator for X (21) in terms of probability of error?

Random walk with a drift
$$p_{X(21) \mid X(10), X(20)}(x \mid 16, 30) = p_{X(21) \mid X(20)}(x \mid 30) = \begin{cases} \frac{1}{2} & \text{if } x = 32, \\ \frac{1}{2} & \text{if } x = 30, \\ 0 & \text{otherwise.} \end{cases}$$
Both 30 and 32 are equally likely, so either value minimizes the probability of error.

Markov chain Consider a Markov chain $X$ with transition matrix
$$T_X := \begin{bmatrix} a & 1 \\ 1 - a & 0 \end{bmatrix},$$
where $a$ is a constant between 0 and 1. We label the two states 0 and 1. The transition matrix $T_X$ has two eigenvectors
$$q_1 := \begin{bmatrix} \frac{1}{1-a} \\ 1 \end{bmatrix}, \qquad q_2 := \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$
The corresponding eigenvalues are $\lambda_1 := 1$ and $\lambda_2 := a - 1$.

Markov chain For what values of a is the Markov chain irreducible?

Markov chain For what values of a is the Markov chain periodic?

Markov chain Express the stationary distribution of $X$ in terms of $a$:
$$p_{\text{stat}} = \frac{1}{(q_1)_1 + (q_1)_2}\, q_1 = \frac{1}{2-a} \begin{bmatrix} 1 \\ 1 - a \end{bmatrix}$$
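
As a numerical check, one can extract the eigenvector of $T_X$ with eigenvalue 1 and normalize it so its entries sum to one. The sketch below does this for an arbitrary value $a = 0.3$ and compares the result with the closed-form expression.

```python
# Numerical check of the stationary distribution p_stat = [1, 1-a] / (2-a).
# Illustrative sketch; a = 0.3 is an arbitrary choice.
import numpy as np

a = 0.3
T_X = np.array([[a, 1.0],
                [1.0 - a, 0.0]])                     # columns sum to 1

eigvals, eigvecs = np.linalg.eig(T_X)
q1 = eigvecs[:, np.argmin(np.abs(eigvals - 1.0))]    # eigenvector with eigenvalue 1
p_stat = q1 / q1.sum()                               # normalize so the entries sum to 1
print(p_stat)                                        # [0.588..., 0.411...]
print(np.array([1.0, 1.0 - a]) / (2.0 - a))          # closed-form expression from the slides
```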

Markov chain Does the Markov chain always converge in probability for all values of a? Justify that this is the case or provide a counterexample.

Markov chain Express the conditional pmf of X (i) conditioned on X (1) = 0 as a function of a and i. (Hint: Computing q 1 + q 2 could be a helpful first step.) Evaluate the expression at a = 0 and a = 1. Does the result make sense?

Markov chain We have
$$q_1 + q_2 = \begin{bmatrix} \frac{1}{1-a} \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} \frac{2-a}{1-a} \\ 0 \end{bmatrix},$$
so
$$p_{X(0)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1-a}{2-a} \left( q_1 + q_2 \right).$$

Markov chain
$$\begin{aligned} p_{X(i)} &= T_X^{\,i}\, p_{X(0)} = T_X^{\,i}\, \frac{1-a}{2-a} \left( q_1 + q_2 \right) = \frac{1-a}{2-a} \left( \lambda_1^i q_1 + \lambda_2^i q_2 \right) \\ &= \frac{1-a}{2-a} \left( \begin{bmatrix} \frac{1}{1-a} \\ 1 \end{bmatrix} + (a-1)^i \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right) = \frac{1}{2-a} \begin{bmatrix} 1 - (a-1)^{i+1} \\ (1-a)\left(1 - (a-1)^i\right) \end{bmatrix} \end{aligned}$$

Markov chain For $a = 1$ we have $p_{X(i)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.

Markov chain For $a = 0$ we have
$$p_{X(i)} = \frac{1}{2} \begin{bmatrix} 1 - (-1)^{i+1} \\ 1 - (-1)^i \end{bmatrix} = \begin{cases} \begin{bmatrix} 0 \\ 1 \end{bmatrix} & \text{if } i \text{ is odd,} \\ \begin{bmatrix} 1 \\ 0 \end{bmatrix} & \text{if } i \text{ is even.} \end{cases}$$
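
The closed-form expression for $p_{X(i)}$ can also be checked against repeated multiplication by the transition matrix. The sketch below assumes the chain starts in state 0 and uses the arbitrary value $a = 0.3$; the two printed vectors should agree.

```python
# Compare T_X^i p_X(0) with the closed-form expression derived above.
# Sketch assuming the chain starts in state 0; a = 0.3 is arbitrary.
import numpy as np

a = 0.3
T_X = np.array([[a, 1.0],
                [1.0 - a, 0.0]])
p0 = np.array([1.0, 0.0])

for i in range(1, 6):
    iterated = np.linalg.matrix_power(T_X, i) @ p0
    closed_form = np.array([1.0 - (a - 1.0) ** (i + 1),
                            (1.0 - a) * (1.0 - (a - 1.0) ** i)]) / (2.0 - a)
    print(i, iterated, closed_form)           # the two vectors should agree
```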

Sampling from multivariate distributions We are interested in generating samples from the joint distribution of two random variables $X$ and $Y$. If we generate a sample $x$ according to the pdf $f_X$ and a sample $y$ according to the pdf $f_Y$, are these samples a realization of the joint distribution of $X$ and $Y$? Explain your answer with a simple example.

Sampling from multivariate distributions Now, assume that $X$ is discrete and $Y$ is continuous. Propose a method to generate a sample from the joint distribution using the pmf of $X$ and the conditional cdf of $Y$ given $X$, based on two independent samples from a distribution that is uniform between 0 and 1. Assume that the conditional cdf is invertible.

Sampling from multivariate distributions
1. Obtain two independent samples $u_1$ and $u_2$ from the uniform distribution.
2. Set $x$ to equal the smallest value $a$ such that $p_X(a) \neq 0$ and $u_1 \le F_X(a)$.
3. Define $F_x(\cdot) := F_{Y \mid X}(\cdot \mid x)$ and set $y := F_x^{-1}(u_2)$.
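
The procedure is easy to implement once $F_X$ and the conditional cdf are specified. The sketch below uses a hypothetical example that is not part of the original problem: $X$ is Bernoulli with $p_X(1) = 0.3$ and, given $X = x$, $Y$ is exponential with rate $1 + x$.

```python
# Sketch of the two-step inverse-transform procedure above for a hypothetical example:
# X is Bernoulli with p_X(1) = 0.3 and, given X = x, Y is exponential with rate 1 + x.
import numpy as np

def sample_joint(rng):
    u1, u2 = rng.uniform(size=2)
    # Step 2: smallest value a with p_X(a) != 0 and u1 <= F_X(a); here F_X(0) = 0.7.
    x = 0 if u1 <= 0.7 else 1
    # Step 3: invert the conditional cdf F_{Y|X}(y|x) = 1 - exp(-(1 + x) y).
    y = -np.log(1.0 - u2) / (1.0 + x)
    return x, y

rng = np.random.default_rng(2)
print([sample_joint(rng) for _ in range(5)])
```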

Sampling from multivariate distributions Explain how to generate samples from a random variable with pdf
$$f_W(w) = 0.1\, \lambda_1 \exp(-\lambda_1 w) + 0.9\, \lambda_2 \exp(-\lambda_2 w), \qquad w \ge 0,$$
where $\lambda_1$ and $\lambda_2$ are positive constants, using two iid uniform samples between 0 and 1.

Sampling from multivariate distributions Let us define a Bernoulli random variable $X$ with parameter 0.9, such that if $X = 0$ then $Y$ is exponential with parameter $\lambda_1$ and if $X = 1$ then $Y$ is exponential with parameter $\lambda_2$. The marginal distribution of $Y$ is
$$f_Y(w) = p_X(0)\, f_{Y \mid X}(w \mid 0) + p_X(1)\, f_{Y \mid X}(w \mid 1) = 0.1\, \lambda_1 \exp(-\lambda_1 w) + 0.9\, \lambda_2 \exp(-\lambda_2 w).$$

Sampling from multivariate distributions
1. We obtain two independent samples $u_1$ and $u_2$ from the uniform distribution.
2. If $u_1 \le 0.1$ we set
$$w := \frac{1}{\lambda_1} \log\left(\frac{1}{1 - u_2}\right),$$
otherwise we set
$$w := \frac{1}{\lambda_2} \log\left(\frac{1}{1 - u_2}\right).$$
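
A minimal sketch of this two-step procedure is given below, with arbitrary values for $\lambda_1$ and $\lambda_2$; the last line compares the empirical mean of the samples with the exact mean $0.1/\lambda_1 + 0.9/\lambda_2$.

```python
# Sketch of the mixture-sampling procedure above; lambda_1 and lambda_2 are arbitrary.
import numpy as np

def sample_w(rng, lambda_1=0.5, lambda_2=2.0):
    u1, u2 = rng.uniform(size=2)
    rate = lambda_1 if u1 <= 0.1 else lambda_2   # choose the mixture component with u1
    return np.log(1.0 / (1.0 - u2)) / rate       # inverse cdf of the exponential applied to u2

rng = np.random.default_rng(3)
w = np.array([sample_w(rng) for _ in range(100_000)])
print(w.mean(), 0.1 / 0.5 + 0.9 / 2.0)           # empirical mean vs. exact mean 0.65
```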

Convergence Let $U$ be a random variable uniformly distributed between 0 and 1. If we define the discrete random process $X$ by $X(i) := U$ for all $i$, does $X$ converge to $1 - U$ in probability?

Convergence Does $X$ converge to $1 - U$ in distribution?

Convergence You draw some iid samples $x_1, x_2, \ldots$ from a Cauchy random variable. Will the empirical mean $\frac{1}{n} \sum_{i=1}^{n} x_i$ converge in probability as $n$ grows large? Explain why briefly and, if the answer is yes, state what it converges to.

Convergence You draw $m$ iid samples $x_1, x_2, \ldots, x_m$ from a Cauchy random variable. Then you draw iid samples $y_1, y_2, \ldots$ uniformly from $\{x_1, x_2, \ldots, x_m\}$ (each $y_i$ is equal to each element of $\{x_1, x_2, \ldots, x_m\}$ with probability $1/m$). Will the empirical mean $\frac{1}{n} \sum_{i=1}^{n} y_i$ converge in probability as $n$ grows large? Explain why very briefly and, if the answer is yes, state what it converges to.

Earthquake We are interested in learning a model for the occurrence of earthquakes. We decide to model the time between earthquakes as an exponential random variable with parameter $\lambda$. Compute the maximum-likelihood estimate of $\lambda$ given $t_1, t_2, \ldots, t_n$, which are interarrival times for past earthquakes. Assume that the data are iid.

Earthquake
$$L(\lambda) := f_{T(1), \ldots, T(n)}(t_1, \ldots, t_n) = \prod_{i=1}^{n} \lambda \exp(-\lambda t_i) = \lambda^n \exp\left(-\lambda \sum_{i=1}^{n} t_i\right)$$
$$\log L(\lambda) = n \log \lambda - \lambda \sum_{i=1}^{n} t_i$$
$$\frac{d \log L(\lambda)}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} t_i, \qquad \frac{d^2 \log L(\lambda)}{d\lambda^2} = -\frac{n}{\lambda^2} < 0,$$
so the log-likelihood is concave and its maximizer is
$$\lambda_{\text{ML}} = \frac{1}{\frac{1}{n} \sum_{i=1}^{n} t_i} = \frac{n}{\sum_{i=1}^{n} t_i}.$$
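
As a sanity check, the sketch below draws synthetic exponential interarrival times with a known rate (2.0, an arbitrary choice) and verifies that $n / \sum_i t_i$ recovers it.

```python
# Check that the ML estimate n / sum(t_i) recovers a known rate on synthetic data.
# Sketch; the true rate 2.0 is arbitrary.
import numpy as np

rng = np.random.default_rng(4)
true_rate = 2.0
t = rng.exponential(scale=1.0 / true_rate, size=10_000)   # iid interarrival times
lambda_ml = len(t) / t.sum()                               # reciprocal of the empirical mean
print(lambda_ml)                                           # close to 2.0
```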

Earthquake Find an approximate 0.95 confidence interval for $\lambda$ based on the central limit theorem. Assume that you know a bound $b$ on the standard deviation (i.e., the variance of the exponential, $1/\lambda^2$, is bounded by $b^2$) and express your answer using the Q function. (Hint: Express the ML estimate in terms of the empirical mean.) (See solutions.)

Earthquake What is the posterior distribution of the parameter $\Lambda$ if we model it as a random variable with a uniform distribution between 0 and $u$? Express your answer in terms of the sum $\sum_{i=1}^{n} t_i$, $u$, and the marginal pdf of the data evaluated at $t_1, t_2, \ldots, t_n$, $c := f_{T(1), \ldots, T(n)}(t_1, \ldots, t_n)$.

Earthquake
$$f_{\Lambda \mid T(1), \ldots, T(n)}(\lambda \mid t_1, \ldots, t_n) = \frac{f_\Lambda(\lambda)\, \lambda^n \exp\left(-\lambda \sum_{i=1}^{n} t_i\right)}{f_{T(1), \ldots, T(n)}(t_1, \ldots, t_n)} = \frac{1}{u\, c}\, \lambda^n \exp\left(-\lambda \sum_{i=1}^{n} t_i\right)$$
for $0 \le \lambda \le u$ and zero otherwise.
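
The posterior can also be evaluated numerically: compute $\lambda^n \exp(-\lambda \sum_i t_i)$ on a grid over $[0, u]$ and normalize, which plays the role of the constant $u\,c$ above. The sketch below uses synthetic data and an arbitrary value of $u$; the posterior mode it prints should be close to $n / \sum_i t_i$.

```python
# Numerical version of the posterior: evaluate lambda^n exp(-lambda sum(t_i)) on [0, u]
# and normalize on a grid. Sketch with synthetic data; u = 10 is arbitrary.
import numpy as np

rng = np.random.default_rng(5)
t = rng.exponential(scale=0.5, size=20)          # synthetic interarrival times (rate 2)
u = 10.0                                         # upper limit of the uniform prior
grid = np.linspace(0.0, u, 2001)
unnormalized = grid ** len(t) * np.exp(-grid * t.sum())
posterior = unnormalized / (unnormalized.sum() * (grid[1] - grid[0]))  # numerical normalization
print(grid[np.argmax(posterior)])                # posterior mode, close to n / sum(t_i)
print(len(t) / t.sum())
```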

Earthquake [Figure: plot of the posterior density $f_{\Lambda \mid T(1), \ldots, T(n)}(\lambda \mid t_1, \ldots, t_n)$ as a function of $\lambda$.]

Earthquake Explain how you would use the answer in the previous question to construct a confidence interval for the parameter

Chad You hate a coworker and want to predict when he is in the office from the temperature. The observed temperatures are

Chad:    61 65 59 61 61 65 61 63 63 59
No Chad: 68 70 68 64 64

You model his presence using a random variable $C$ which is equal to 1 if he is there and 0 if he is not. Estimate $p_C$.

Chad The empirical pmf is $p_C(0) = \frac{5}{15} = \frac{1}{3}$, $p_C(1) = \frac{10}{15} = \frac{2}{3}$.

Chad You model the temperature using a random variable T. Sketch the kernel density estimator of the conditional distribution of T given C using a rectangular kernel with width equal to 2.

Chad [Figure: kernel density estimates $f_{T \mid C}(t \mid 0)$ and $f_{T \mid C}(t \mid 1)$ plotted for temperatures between 55 and 75; the vertical axis ranges from 0 to 0.20.]
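
For reference, a rectangular-kernel density estimate with width 2 can be computed directly from the data. The sketch below uses a half-open kernel support, an assumption made here so that the values at points exactly on a kernel boundary match those used in the ML and MAP computations below (0.2 and 0.1 at $T = 64$).

```python
# Rectangular-kernel density estimate with width 2 for the two conditional distributions of T.
# The half-open kernel support is an assumption made so that boundary values match the slides.
import numpy as np

chad = np.array([61, 65, 59, 61, 61, 65, 61, 63, 63, 59])   # temperatures with Chad present (C = 1)
no_chad = np.array([68, 70, 68, 64, 64])                     # temperatures with Chad absent (C = 0)

def kde_rect(t, data, width=2.0):
    # Average of rectangular kernels of height 1/width supported on [x_i - width/2, x_i + width/2).
    inside = (t >= data - width / 2) & (t < data + width / 2)
    return inside.mean() / width

for t in [57, 64, 68]:
    print(t, kde_rect(t, no_chad), kde_rect(t, chad))         # f_{T|C}(t|0), f_{T|C}(t|1)
```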

Chad If $T = 68$ what is the ML estimate of $C$? We have $f_{T \mid C}(68 \mid 0) = 0.2$ and $f_{T \mid C}(68 \mid 1) = 0$, so the ML estimate is $C = 0$ (Chad is not in the office).

Chad If $T = 64$ what is the MAP estimate of $C$?
$$p_{C \mid T}(0 \mid 64) = \frac{p_C(0)\, f_{T \mid C}(64 \mid 0)}{p_C(0)\, f_{T \mid C}(64 \mid 0) + p_C(1)\, f_{T \mid C}(64 \mid 1)} = \frac{\frac{1}{3} \cdot 0.2}{\frac{1}{3} \cdot 0.2 + \frac{2}{3} \cdot 0.1} = \frac{1}{2}$$
$$p_{C \mid T}(1 \mid 64) = 1 - p_{C \mid T}(0 \mid 64) = \frac{1}{2}$$
Both values are equally probable, so the MAP estimate is not unique.

Chad What happens if the temperature is 57? Explain how using parametric estimation may alleviate this problem.

3-point shooting The New York Knicks hire you as a data analyst. Your first task is to come up with a way to determine whether a 3-point shooter is any good. You will use the following graph of the function $g(\theta, n) = \theta^n$. [Figure: $g(\theta, n)$ plotted against $\theta$ for $0.5 \le \theta \le 1$, with one curve for each of $n$ = 4, 9, 14, 19, and 24; horizontal gridlines mark values from 0.005 to 0.95.]

3-point shooting
1. Interpret $g(\theta, n)$.
2. The coach tells you: I want to make sure that the guy has a shooting percentage over 80%. What is your null hypothesis?
3. What number of shots does a player need to make in a row for you to reject the null hypothesis at a significance level of 5%? 14
4. A player makes 9 shots in a row. What is the corresponding p-value? Do you declare him a good shooter? The p-value is about 0.14, which is above 0.05, so no.
5. What is the probability that you do not declare a player who has a shooting percentage of 90% as a good shooter? $1 - g(0.9, 14) \approx 0.76$
6. You apply the test on 10 players. You adapt the threshold applying Bonferroni's method. What is the new threshold? The significance level becomes $0.05 / 10 = 0.005$, which corresponds to $n = 24$ makes in a row.
7. With the correction, what is the probability that you do not declare a player who has a shooting percentage of 90% as a good shooter? $1 - g(0.9, 24) \approx 0.92$
8. What is the advantage of adapting the threshold? What is the disadvantage?
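
The numerical answers above follow from $g(\theta, n) = \theta^n$; they can be reproduced, up to the precision of reading the graph, with a few lines of code, sketched below.

```python
# Reproduce the numerical answers above; g(theta, n) = theta**n is the probability
# of making n shots in a row for a shooter with shooting percentage theta.
alpha = 0.05

n_reject = next(n for n in range(1, 100) if 0.8 ** n <= alpha)
print(n_reject)                          # 14 shots in a row needed at the 5% level

print(0.8 ** 9)                          # p-value for 9 makes in a row, roughly the 0.14 read off the graph

print(1 - 0.9 ** n_reject)               # type II error for a 90% shooter, close to the 0.76 from the graph

alpha_bonferroni = alpha / 10            # Bonferroni-corrected level for 10 players
n_bonferroni = next(n for n in range(1, 100) if 0.8 ** n <= alpha_bonferroni)
print(n_bonferroni)                      # 24
print(1 - 0.9 ** n_bonferroni)           # about 0.92
```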