Statistical and Mathematical Methods DS-GA 1002, December 8, 2015. Sample Final Problems: Solutions


1. Short questions

a. Ax = b has a solution if b is in the range of A. The dimension of the range of A is n because A has n linearly independent columns. Since b ∈ R^m and the dimension of R^m is larger than n, there are vectors in R^m that are not in the range of A. If b is equal to one of these vectors, the equation does not have a solution.

b. In this case the dimension of the range of A is m, because the rank of the matrix is m and hence the matrix has m linearly independent columns. The range is consequently equal to R^m and includes every possible vector b. The equation always has a solution.

c. The linear model can be written in matrix form as Ax = b, where A ∈ R^{50×50} contains the average temperatures of each state as its columns and b ∈ R^{50} contains the team's scores. The system of equations has a solution as long as A is invertible, which is the case if the vectors of temperatures from each state are linearly independent. Of course, this does not mean that the model will generalize: we are just overfitting the data.

d. We use a random variable X to represent the income. Applying Markov's inequality,

    P(X > 50000) ≤ E(X) / 50000 = 1/5.

Applying Chebyshev's inequality,

    P(X > 50000) ≤ P(|X - 10000| > 40000) ≤ Var(X) / 40000^2 = 0.04.

e. Type I errors are false positives. We want to minimize them when our aim is to make sure that a phenomenon that we are observing is not just generated by random fluctuations in the data.

f. Type II errors are false negatives. We want to minimize them when our aim is to detect patterns in the data (which may or may not be the product of random fluctuations).

g. First we find the cdf:

    F_Y(y) = P(Y ≤ y) = P(X ≤ √y) = ∫_0^√y dx = √y.

The pdf of Y is the derivative of the cdf,

    f_Y(y) = 1/(2√y) if 0 ≤ y ≤ 1, and 0 otherwise.
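The change of variables in part g is easy to check by simulation. Below is a minimal Monte Carlo sketch in Python, assuming (as the solution implies) that X is uniform on [0, 1] and Y = X^2, so that F_Y(y) = √y:

    import numpy as np

    # Empirical cdf of Y = X^2 for X uniform on [0, 1], compared with sqrt(y).
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, size=1_000_000)
    y = x**2
    for t in [0.1, 0.25, 0.5, 0.9]:
        print(t, (y <= t).mean(), np.sqrt(t))  # the last two columns should match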

2. Chad

a. The empirical pmf is

    p_C(0) = 5/15 = 1/3,
    p_C(1) = 10/15 = 2/3.

b. The kernel density estimator is of the form

    f_{T|C}(t|0) = (1/5) Σ_{i=1}^{5} Π(t - d_{0,i}),
    f_{T|C}(t|1) = (1/10) Σ_{i=1}^{10} Π(t - d_{1,i}),

where Π is a rectangular kernel with unit width, d_{0,1}, ..., d_{0,5} are the temperatures when Chad is not there and d_{1,1}, ..., d_{1,10} the temperatures when he is there. The estimator is shown in Figure 1.

c. We have

    f_{T|C}(68|0) = 0.2 > 0 = f_{T|C}(68|1),

so the ML estimate is that Chad is not at the office.

d. Applying Bayes' rule,

    p_{C|T}(0|64) = p_C(0) f_{T|C}(64|0) / (p_C(0) f_{T|C}(64|0) + p_C(1) f_{T|C}(64|1))
                  = (1/3 · 0.2) / (1/3 · 0.2 + 2/3 · 0.1)
                  = 1/2,

    p_{C|T}(1|64) = 1 - p_{C|T}(0|64) = 1/2.

The MAP estimate is that there is a 50% chance that Chad is there.

e. Both f_{T|C}(57|0) and f_{T|C}(68|1) are zero, so the ML and MAP estimates are inconclusive. If we use a parametric distribution such as a Gaussian to fit the data, then f_{T|C}(57|0) and f_{T|C}(68|1) would not be set to zero, as long as the distribution has nonzero values on all of the real line (as is the case for a Gaussian pdf). This would allow us to apply MAP or ML estimation. A nonparametric solution would be to use a kernel with a larger width.

[Figure 1 here: the two kernel density estimates f_{T|C}(t|0) and f_{T|C}(t|1), plotted for temperatures between 55 and 75; the vertical axis runs from 0.00 to 0.20.]

Figure 1: Kernel density estimator for Problem 2.
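The estimates in parts b-d can be reproduced in a few lines of Python. The temperature samples below are hypothetical placeholders, chosen only to be consistent with the values read off Figure 1 (the actual data are available only in the figure):

    import numpy as np

    # Hypothetical samples: 5 days without Chad, 10 days with him.
    d0 = np.array([59.0, 61.0, 63.0, 64.0, 68.0])
    d1 = np.array([57.0, 59.0, 60.0, 61.0, 62.0,
                   63.0, 64.0, 65.0, 66.0, 70.0])

    def kde_rect(t, data):
        """KDE with a rectangular kernel of unit width: (1/n) * count of nearby points."""
        return np.mean(np.abs(t - data) <= 0.5)

    # MAP estimate at T = 64, with priors from the empirical pmf.
    prior0, prior1 = 1/3, 2/3
    lik0, lik1 = kde_rect(64.0, d0), kde_rect(64.0, d1)   # 0.2 and 0.1
    post0 = prior0 * lik0 / (prior0 * lik0 + prior1 * lik1)
    print(post0)   # 0.5, as in part d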

3. 3-point shooting

a. Under the assumption that the shots are independent, the probability that the player makes n or more shots in a row, if the probability that he makes each shot is θ, is g(θ, n) = θ^n.

b. The null hypothesis is that the player's shooting percentage is below 80%, i.e. θ < 0.8.

c. From the figure, the player needs to make at least 14 shots.

d. The p-value is around 13%. This is not enough to reject the null hypothesis, so you do not declare him to be a good shooter.

e. The probability that the number of made shots is less than 14 if the player has a 90% shooting percentage is 1 - g(0.9, 14) = 0.76. You miss such a player 76% of the time!

f. The new threshold is 24, because we only reject the null hypothesis if the p-value is below 0.05/10 = 0.005.

g. The probability is now 1 - g(0.9, 24) = 0.92.

h. The advantage is that we reduce the probability of false positives: we control for the fact that among many players there is bound to be one that makes many shots in a row just by sheer luck. The disadvantage is that the threshold is now very strict. It is highly unlikely that we will reject the null hypothesis for any player, even if he has a high shooting percentage.

[Figure 2 here: g(θ, n) plotted as a function of θ ∈ [0.5, 1.0] for n = 4, 9, 14, 19 and 24; the vertical axis marks values from 0.005 to 0.950, including the thresholds 0.050 and 0.005.]

Figure 2: Graph for Problem 3.
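The thresholds and error probabilities in parts c-g follow directly from g(θ, n) = θ^n, so they can be checked without reading the figure. A quick sketch in pure Python (values match the text up to the rounding used when reading off the graph):

    g = lambda theta, n: theta**n   # P(player makes n or more shots in a row)

    # (c) and (f): smallest n whose p-value under theta = 0.8 clears each threshold.
    for alpha in (0.05, 0.05 / 10):
        n_min = next(n for n in range(1, 100) if g(0.8, n) < alpha)
        print(alpha, n_min)                  # 0.05 -> 14, 0.005 -> 24

    # (e) and (g): probability of missing a 90% shooter at each threshold.
    print(1 - g(0.9, 14), 1 - g(0.9, 24))    # approx. 0.77 and 0.92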

4. Linear regression with an intercept

a. The least-squares problem is

    min_{α,β} ‖y - βx - α1‖_2.

b. {(1/√n) 1} is a basis of span(1). Taking into account that 1^T 1 = n, we have

    P_{span(1)}(x) = (1/√n) 1 ((1/√n) 1)^T x = (1^T x / n) 1,

so

    x̃ := x - P_{span(1)}(x) = x - (1^T x / n) 1.

Projecting onto the orthogonal complement of span(1) is equivalent to subtracting the average value of x from each entry.

c. We can write x = x̃ + (1^T x / n) 1, so any vector w in the span of 1 and x,

    w = a1 + bx = (a + b 1^T x / n) 1 + b x̃,

also belongs to the span of 1 and x̃. To show that any vector in the span of 1 and x̃ belongs to the span of 1 and x we apply the same argument, since x̃ = x - (1^T x / n) 1.

d. The least-squares problem is

    min_{α̃,β̃} ‖y - β̃ x̃ - α̃ 1‖_2.
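Part b is easy to confirm numerically: the projection onto span(1) replaces every entry of x with its average, so the residual of the projection is the centered vector x̃. A minimal check with NumPy, using nothing beyond the formulas above:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=6)
    ones = np.ones(6)

    proj = ones * (ones @ x) / 6                 # (1'x / n) 1 = mean(x) * 1
    print(np.allclose(proj, x.mean() * ones))    # True
    print(np.allclose(x - proj, x - x.mean()))   # True: residual = centered vector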

e. The solution is

    (β̃_LS, α̃_LS) = ([x̃ 1]^T [x̃ 1])^{-1} [x̃ 1]^T y.

Since x̃^T 1 = 0, the matrix [x̃ 1]^T [x̃ 1] is diagonal with entries x̃^T x̃ and n, so

    β̃_LS = x̃^T y / x̃^T x̃,    α̃_LS = (1/n) Σ_{i=1}^{n} y_i.

f. Since β̃_LS and α̃_LS solve the least-squares problem,

    P_{span(1,x)}(y) = β_LS x + α_LS 1,
    P_{span(1,x̃)}(y) = β̃_LS x̃ + α̃_LS 1.

Recall from part c that span(1, x) = span(1, x̃), so the two projections coincide:

    β_LS x + α_LS 1 = β̃_LS x̃ + α̃_LS 1.

This establishes the equality.
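Parts e and f say that regressing y on x with an intercept, and regressing y on the centered x̃, produce the same fitted values. A quick confirmation on synthetic data (the data-generating choices below are arbitrary):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 50
    x = rng.normal(size=n)
    y = 2.0 * x + 1.0 + rng.normal(size=n)       # synthetic example

    xt = x - x.mean()                            # centered regressor
    beta_t = xt @ y / (xt @ xt)                  # slope from part (e)
    alpha_t = y.mean()                           # intercept from part (e)

    M = np.column_stack([x, np.ones(n)])         # uncentered design matrix
    beta, alpha = np.linalg.lstsq(M, y, rcond=None)[0]

    print(np.allclose(beta, beta_t))                               # True
    print(np.allclose(beta * x + alpha, beta_t * xt + alpha_t))    # True: same fit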

5. Foxes and rabbits

a. The number of rabbits in year n + 1 is equal to 1.06 times the number of rabbits in year n, minus the number of foxes in year n times 0.16. The number of foxes in year n + 1 is equal to 0.84 times the number of foxes in year n, plus the number of rabbits in year n times 0.06. In matrix form, x_{n+1} = A x_n with

    A = [1.06, -0.16; 0.06, 0.84].

The eigenvalues of A are λ_1 = 1 and λ_2 = 0.9, with corresponding eigenvectors u_1 = [0.8; 0.3] and u_2 = [1; 1].

b. If that is the case, the population becomes extinct. Let the number of rabbits and the number of foxes both be equal to an arbitrary number m. Since [m; m] = m u_2,

    lim_{n→∞} [r_n; f_n] = lim_{n→∞} A^n [m; m] = m lim_{n→∞} A^n u_2 = m lim_{n→∞} λ_2^n u_2 = [0; 0],

because |λ_2| < 1.

c. Using the formula for the inverse of a 2 × 2 matrix,

    [u_1 u_2]^{-1} = [0.8, 1; 0.3, 1]^{-1} = (1/(0.8 - 0.3)) [1, -1; -0.3, 0.8] = [2, -2; -0.6, 1.6].

Since

    [2, -2; -0.6, 1.6] [200; 100] = [200; 40],
    [2, -2; -0.6, 1.6] [100; 200] = [-200; 260],

the two initial populations can be written as

    x^{(1)} = 200 u_1 + 40 u_2,
    x^{(2)} = -200 u_1 + 260 u_2.

d. The populations tend to 160 rabbits and 60 foxes.

e. The populations tend to

    lim_{n→∞} A^n x^{(1)} = lim_{n→∞} (200 λ_1^n u_1 + 40 λ_2^n u_2) = 200 u_1 = [160; 60]

in the first case and

    lim_{n→∞} A^n x^{(2)} = lim_{n→∞} (-200 λ_1^n u_1 + 260 λ_2^n u_2) = -200 u_1 = [-160; -60]

in the second. Since the populations cannot become negative, the rabbits and the foxes disappear.
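These limits can be confirmed by iterating the dynamics directly, using the matrix A from part a and the two initial populations above:

    import numpy as np

    A = np.array([[1.06, -0.16],
                  [0.06,  0.84]])
    print(np.linalg.eigvals(A))      # 1.0 and 0.9, as stated above

    x1 = np.array([200.0, 100.0])    # first initial population (rabbits, foxes)
    x2 = np.array([100.0, 200.0])    # second initial population
    for _ in range(500):
        x1, x2 = A @ x1, A @ x2

    print(x1)   # -> [160., 60.]
    print(x2)   # -> [-160., -60.]: the linear model goes negative, i.e. extinction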

f. For an arbitrary population of r rabbits and f foxes,

    [2, -2; -0.6, 1.6] [r; f] = [2(r - f); 1.6f - 0.6r],

so

    [r; f] = 2(r - f) u_1 + (1.6f - 0.6r) u_2.

The populations tend to

    lim_{n→∞} A^n [r; f] = lim_{n→∞} (2(r - f) λ_1^n u_1 + (1.6f - 0.6r) λ_2^n u_2)
                        = 2(r - f) u_1 = [1.6(r - f); 0.6(r - f)].

This number is positive only if r - f > 0, i.e. if there are more rabbits than foxes.

6. Defective pixels

a. We model the number of defective pixels in each TV by a random variable D_i, and the average number of defective pixels by the random variable D̄ = (1/100) Σ_{i=1}^{100} D_i. By linearity of expectation,

    E(D̄) = (1/100) Σ_{i=1}^{100} E(D_i) = 10.

Assuming that the TVs are sampled independently,

    Var(D̄) = (1/100^2) Σ_{i=1}^{100} Var(D_i) = 4^2/100 = 0.16.

Now, applying Chebyshev's inequality,

    P(D̄ ≤ 9.1) ≤ P(|D̄ - E(D̄)| ≥ 0.9) ≤ Var(D̄)/0.9^2 = 0.16/0.81 ≈ 0.20.

b. The null hypothesis is that the mean number of defective pixels is greater than or equal to 10.
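The Chebyshev bound in part a is quite loose, as a simulation shows; compare it with the CLT approximation computed in part d below. Only the mean 10 and standard deviation 4 of each D_i are given, so the normal distribution used here is a hypothetical choice:

    import numpy as np

    rng = np.random.default_rng(3)
    # Hypothetical per-TV distribution with mean 10 and standard deviation 4.
    d = rng.normal(loc=10.0, scale=4.0, size=(50_000, 100))
    d_bar = d.mean(axis=1)                # 50,000 simulated values of the average

    print((d_bar <= 9.1).mean())          # approx. 0.012, the CLT value of part (d)
    print(0.16 / 0.9**2)                  # approx. 0.198, the looser Chebyshev bound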

c. The result is not statistically significant, because the bound obtained in part a on the probability of the observed result under the null hypothesis is 0.2 > 0.05.

d. By the CLT, D̄ can be approximated by a Gaussian random variable with mean 10 and variance 0.16. Equivalently, U := (D̄ - 10)/0.4 is a Gaussian with zero mean and unit variance. As a result,

    P(D̄ ≤ 9.1) = P(0.4U + 10 ≤ 9.1)
               = P(U ≤ -0.9/0.4)
               = Q(0.9/0.4)    (by symmetry)
               = Q(2.25) ≈ 1.2 · 10^{-2}.

Note that considering any mean greater than 10 would yield a smaller p-value by the same argument. This implies that the probability of the observed results under the null hypothesis is bounded by 1.2 · 10^{-2} < 0.05, so the result is significant (under the assumption that the CLT approximation is valid).

e. Bonferroni's method is meant for situations in which we perform several tests at the same time. Here we only have one test that involves several TVs.

7. Camera measurement

a. Note that E(A) = E(A^2) = p and E(X^2) = µ^2 + σ_X^2. We have

    E(Y) = E(AX + Z) = E(A) E(X) + E(Z) = pµ

by independence of A and X (and because Z has zero mean). Similarly,

    E(Y^2) = E((AX + Z)^2)
           = E(A^2 X^2 + Z^2 + 2AXZ)
           = E(A^2) E(X^2) + E(Z^2) + 2 E(A) E(X) E(Z)    (by independence of A, X and Z)
           = p(µ^2 + σ_X^2) + σ_Z^2,

so

    Var(Y) = E(Y^2) - E(Y)^2 = p(µ^2 + σ_X^2) + σ_Z^2 - p^2 µ^2 = p σ_X^2 + p(1 - p) µ^2 + σ_Z^2.

b. We have

    Cov(X, Y) = E(XY) - E(X) E(Y)
              = E(AX^2 + XZ) - pµ^2
              = E(A) E(X^2) + E(X) E(Z) - pµ^2
              = p(µ^2 + σ_X^2) - pµ^2
              = p σ_X^2.

c. The best linear MSE estimate is given by

    X_LMMSE = (Cov(X, Y) / Var(Y)) (Y - E(Y)) + E(X)
            = p σ_X^2 (Y - pµ) / (p σ_X^2 + p(1 - p) µ^2 + σ_Z^2) + µ.

d. When p = 0, X_LMMSE = µ. In this case Y = Z with probability 1, which implies that X and Y are independent, so the best MSE estimate is E(X | Y) = E(X) = µ.
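The moment formulas and the LMMSE estimator can be sanity-checked by simulation. The parameter values and the Gaussian choices for X and Z below are hypothetical; the solution only relies on the means and variances:

    import numpy as np

    rng = np.random.default_rng(4)
    p, mu, sx, sz = 0.7, 2.0, 1.5, 0.5   # hypothetical parameter values
    n = 1_000_000

    a = rng.binomial(1, p, size=n)       # A ~ Bernoulli(p)
    x = rng.normal(mu, sx, size=n)       # X with mean mu, standard deviation sx
    z = rng.normal(0.0, sz, size=n)      # Z with zero mean, standard deviation sz
    y = a * x + z                        # Y = AX + Z

    var_y = p * sx**2 + p * (1 - p) * mu**2 + sz**2   # formula from part (a)
    cov_xy = p * sx**2                                # formula from part (b)
    print(np.var(y), var_y)                           # should agree closely
    print(np.cov(x, y)[0, 1], cov_xy)                 # should agree closely

    x_hat = cov_xy / var_y * (y - p * mu) + mu        # LMMSE estimate, part (c)
    print(np.mean((x_hat - x)**2))                    # its mean squared error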

c. The best MSE liear estimate is give by X LMMSE Cov(X, Y ) (Y E(Y )) + E(X) (93) Var(Y ) pσx (Y pµ) pσx + p( + µ. (94) p)µ + σz d. Whe p 0 the X LMMSE µ. I this case Y 0 with probability. This implies that X ad Y are idepedet, so the best MSE estimate is E (X Y ) E (X) µ. Sample Fial Problems Solutios Page 9 of 9